Apache Pig
Apache Pig is a high-level platform developed to simplify querying large data sets in Apache Hadoop and MapReduce. Its language layer, Pig Latin, is a procedural data-flow language that enables SQL-like queries to be performed on distributed datasets within Hadoop applications.
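To make the idea of a step-by-step, SQL-like query concrete, the following is a minimal sketch in plain Python of the kind of data flow a short Pig Latin script describes: load records, filter them, group them, and aggregate each group. The sample records, the field names (user, bytes), the threshold, and the function name are all assumptions made for illustration, and the Pig Latin statements in the comments are an illustrative script rather than one taken from any particular project. The Python code runs on a single machine and only mimics the logic; Pig itself would compile such a script into jobs that run on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical input records: (user, bytes). In Pig Latin this might be loaded as:
#   logs = LOAD 'logs.txt' AS (user:chararray, bytes:int);
records = [
    ("alice", 120),
    ("bob", 40),
    ("alice", 300),
    ("carol", 500),
    ("bob", 250),
]


def total_bytes_by_user(rows, min_bytes):
    """Mimic a filter -> group -> aggregate data flow on in-memory rows."""
    # big = FILTER logs BY bytes > 100;
    big = [(user, size) for user, size in rows if size > min_bytes]

    # grouped = GROUP big BY user;
    grouped = defaultdict(list)
    for user, size in big:
        grouped[user].append(size)

    # totals = FOREACH grouped GENERATE group, SUM(big.bytes);
    return {user: sum(sizes) for user, sizes in grouped.items()}


totals = total_bytes_by_user(records, 100)
print(totals)
```

Each named intermediate result (big, grouped, totals) corresponds to one statement in the script, which is what distinguishes this procedural, data-flow style from a single declarative SQL query.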
Pig originated as a Yahoo Research initiative for creating and executing MapReduce jobs on very large data sets. In 2007, Pig became an open source project of the Apache Software Foundation.
Read Also:
- Apache HBase
Apache HBase (HBase) is the Hadoop database. It is a distributed, scalable, big data store. HBase is a sub-project of the Apache Hadoop project and is used to provide real-time read and write access to your big data. According to The Apache Software Foundation, the primary objective of Apache HBase is the hosting of very […]
- Apache Hive
Apache Hive (Hive) is a data warehouse system for the open source Apache Hadoop project. Hive features a SQL-like HiveQL language that facilitates data analysis and summarization for large datasets stored in Hadoop-compatible file systems. Hive originated as a Facebook initiative before becoming a sub-project of Hadoop. Hive is currently an open source volunteer top-level […]
- Private Cloud Project
Companies initiate private cloud projects to enable their IT infrastructure to become more capable of quickly adapting to continually evolving business needs and requirements. Private cloud projects can also be connected to public clouds to create hybrid clouds. Unlike a public cloud, a private cloud project remains within the corporate firewall and under the control […]
- Hortonworks
Hortonworks is an enterprise software firm that specializes in open source Apache Hadoop development and support. Hortonworks was launched in 2011 by Yahoo and Benchmark Capital, and its flagship product is Hortonworks Data Platform, which is powered by Apache Hadoop. Hortonworks Data Platform is designed as an open source platform that facilitates integrating Apache Hadoop with an […]
- Hadoop MapReduce
Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on compute clusters of commodity hardware. It is a sub-project of the Apache Hadoop project. The framework takes care of scheduling tasks, monitoring them and re-executing any failed tasks. According to The Apache Software Foundation, the primary objective of Map/Reduce […]