Apache Spark


Apache Spark is an open-source engine for large-scale data processing and analytics. Spark can read data from a variety of sources, including the Hadoop Distributed File System (HDFS), OpenStack Swift, Amazon S3 and Cassandra.

Apache Spark is designed to accelerate analytics on Hadoop and comes with a suite of complementary tools: a full-featured machine learning library (MLlib), a graph processing engine (GraphX) and a stream processing component (Spark Streaming).

Apache Spark originated at UC Berkeley’s AMPLab in 2009 and was donated in 2013 to the Apache Software Foundation, where it has become one of the most active projects in terms of contributions.

One of the key reasons behind Apache Spark’s popularity, both with developers and in enterprises, is its speed and efficiency. The project reports that Spark runs programs up to 100 times faster than Hadoop MapReduce when the data is held in memory, and up to 10 times faster when running on disk. Because Spark is designed to keep working data in memory, it is well suited to iterative analysis, where the same dataset is processed repeatedly, and it makes data crunching faster and less expensive.
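
The benefit of in-memory processing for iterative analysis can be illustrated with a toy sketch in plain Python. This is not Spark code and does not use any Spark API: the function names (`write_dataset`, `read_dataset`, `iterate_from_disk`, `iterate_in_memory`, `demo`) are hypothetical and exist only for this illustration. The sketch assumes a small text file of integers and counts how often each approach goes back to disk over several passes.

```python
import os
import tempfile


def write_dataset(path, n):
    """Write the integers 0..n-1 to a text file, one per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


def read_dataset(path, counter):
    """Read the file into a list and record that a disk read happened."""
    counter["disk_reads"] += 1
    with open(path) as f:
        return [int(line) for line in f]


def iterate_from_disk(path, passes):
    """Disk-based style: every pass reloads the input from disk first."""
    counter = {"disk_reads": 0}
    total = 0
    for _ in range(passes):
        data = read_dataset(path, counter)
        total += sum(data)
    return total, counter["disk_reads"]


def iterate_in_memory(path, passes):
    """In-memory style: load once, then reuse the same data on every pass."""
    counter = {"disk_reads": 0}
    data = read_dataset(path, counter)
    total = 0
    for _ in range(passes):
        total += sum(data)
    return total, counter["disk_reads"]


def demo():
    """Run both styles over the same temporary file and return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.txt")
        write_dataset(path, 1000)
        return iterate_from_disk(path, 5), iterate_in_memory(path, 5)


if __name__ == "__main__":
    from_disk, in_memory = demo()
    print("reload each pass:", from_disk)
    print("load once:       ", in_memory)
```

Both approaches compute the same total, but the second one touches the disk once instead of once per pass. In Spark itself, the comparable step is asking the engine to keep a dataset in memory (for example with `cache()`) before running repeated computations over it.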

