Apache Spark
Spark Overview:
Spark is a MapReduce-like cluster computing framework designed for low-latency iterative jobs and interactive use from an interpreter. It provides clean, language-integrated APIs in Scala and Java, with a rich set of parallel operators. Spark can run on top of the Apache Mesos cluster manager, on YARN, on Amazon EC2, or in its own standalone mode.
Apache Spark:
Spark is a fast, easy-to-use, and flexible data processing framework. It is an open-source engine developed specifically for large-scale data processing and analytics.
About Apache Spark:
Spark started in 2009 as a research project in UC Berkeley's AMPLab, led by Matei Zaharia. It was open-sourced in 2010 under a BSD license, donated to the Apache Software Foundation in 2013, and became a top-level Apache project in February 2014.
Spark features:
Speed − Spark can run applications in a Hadoop cluster up to 100 times faster in memory, and up to 10 times faster when running on disk. It achieves this by reducing the number of read/write operations to disk and keeping intermediate processing data in memory.
Multiple programming language support − Spark provides built-in APIs in Java, Scala, and Python, so you can write applications in the language of your choice. Spark also offers more than 80 high-level operators for interactive querying.
Analytics and more − Beyond MapReduce-style processing, Spark also supports SQL queries (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib), and graph algorithms (GraphX).
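To make the high-level operators mentioned above concrete, here is a plain-Python sketch of Spark's classic word-count pipeline. This is not actual Spark code: a real PySpark job would express the same flow against an RDD (e.g. `sc.textFile(...).flatMap(...).map(...).reduceByKey(...)`); here each operator is simulated with ordinary Python so the data flow is visible without a cluster.

```python
from collections import defaultdict
from itertools import chain

# Sample input standing in for lines read from a distributed file
lines = ["spark is fast", "spark is flexible"]

# flatMap: split each line into words and flatten the results
words = list(chain.from_iterable(line.split() for line in lines))

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts for each distinct word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'spark': 2, 'is': 2, 'fast': 1, 'flexible': 1}
```

In real Spark, each of these stages runs in parallel across the cluster, and intermediate results (the flattened words, the key/value pairs) can be kept in memory between stages, which is what enables the speedups described above.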