
A Case Study of Spark: Understanding the Analytics Engine for Big Data and ML

Fintelics
4 min read · Jan 5, 2022


Apache Spark is an open-source, distributed processing system and unified computing engine for big data workloads. It uses in-memory caching and optimized query execution to run fast queries against data of any size. In simple terms, Spark is a high-speed, general-purpose computing engine for large-scale data processing.

The high-speed part means that Spark is faster than earlier approaches to big data processing, such as Hadoop MapReduce. The main reason is that Spark keeps working data in Random Access Memory (RAM) wherever possible, instead of writing intermediate results to disk between processing steps. Reading from memory is much faster than reading from disk drives.
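To make the memory-versus-disk contrast concrete, here is a toy sketch in plain Python. It does not use Spark's API. The two-stage word count, the function names, and the sample sentences are all made up for illustration. One pipeline saves its intermediate result to a file between stages, the other passes it along in memory.

```python
import json
import os
import tempfile


def stage_tokenize(lines):
    """Stage 1: split each line into lowercase words."""
    return [line.lower().split() for line in lines]


def stage_count(token_lists):
    """Stage 2: count how often each word occurs."""
    counts = {}
    for tokens in token_lists:
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
    return counts


def run_disk_between_stages(lines, workdir):
    """Write the stage 1 output to disk, then read it back for stage 2."""
    path = os.path.join(workdir, "stage1.json")
    with open(path, "w") as f:
        json.dump(stage_tokenize(lines), f)
    with open(path) as f:
        tokens = json.load(f)
    return stage_count(tokens)


def run_in_memory(lines):
    """Hand the stage 1 output straight to stage 2 without touching disk."""
    return stage_count(stage_tokenize(lines))


# Hypothetical sample input
lines = ["Spark keeps data in memory", "MapReduce writes data to disk"]

with tempfile.TemporaryDirectory() as workdir:
    disk_result = run_disk_between_stages(lines, workdir)
memory_result = run_in_memory(lines)

print(disk_result == memory_result)
```

Both pipelines produce the same counts. The only difference is the extra write and read between stages, and that round trip is the cost an in-memory engine tries to avoid on large datasets.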

The general part implies that it can be used for accomplishing different tasks such as running distributed SQL, employing machine learning (ML) algorithms, building data pipelines, working with graphs or data streams, ingesting data into a database, and much more.

Three key components make Spark well suited to solving big data problems at scale, which encourages many businesses that work with huge volumes of unstructured data to include Apache Spark in their technology stack.

