Hello Sumit,

Hope you are doing well.

Regarding the use case, please give us some time; we will get back to you at the earliest.

Spark is used for processing data, including data stored in HDFS. Amazon EMR is Amazon's managed cluster service, and you can run Spark on it to work with that data. Spark generally accesses and processes data from HDFS much faster than a MapReduce program in Hadoop, largely because it keeps working data in memory, so Spark is a good fit on EMR.


Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark and Presto in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

Amazon EMR securely and reliably handles your big data use cases, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.


Apache Spark on Amazon EMR

Apache Spark is an open-source, distributed processing system commonly used for big data workloads. Apache Spark utilizes in-memory caching and optimized execution for fast performance, and it supports general batch processing, streaming analytics, machine learning, graph databases, and ad hoc queries.

Apache Spark on Hadoop YARN is natively supported in Amazon EMR, and you can quickly and easily create managed Apache Spark clusters from the AWS Management Console, AWS CLI, or the Amazon EMR API. You can also take advantage of other Amazon EMR features, including fast Amazon S3 connectivity using the Amazon EMR File System (EMRFS), integration with the Amazon EC2 Spot market, and resize commands to easily add or remove instances from your cluster.
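
To give a rough idea of what creating a Spark cluster through the Amazon EMR API can look like, here is a minimal Python sketch using the boto3 library's run_job_flow call. The cluster name, release label (emr-6.15.0), instance type, region, and the helper names build_spark_cluster_request and launch_cluster are assumptions made for illustration, and the default EMR roles are assumed to already exist in your account. Please treat it as a starting point under those assumptions, not as a recommended configuration.

```python
def build_spark_cluster_request(name, release_label="emr-6.15.0",
                                instance_type="m5.xlarge", core_count=2,
                                use_spot_for_core=False):
    """Build the keyword arguments for boto3's EMR run_job_flow call.

    All default values here are illustrative assumptions; adjust them
    to match your own account and workload.
    """
    core_market = "SPOT" if use_spot_for_core else "ON_DEMAND"
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Ask EMR to install Spark on the cluster.
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    # Core nodes can optionally use the EC2 Spot market.
                    "Market": core_market,
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
            # Keep the cluster running after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR roles; assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Send the request to EMR and return the new cluster (job flow) ID.

    Requires boto3 and valid AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so building a request does not need boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Building the request separately from sending it lets you review the settings first. For example, build_spark_cluster_request("spark-demo", use_spot_for_core=True) returns the settings with the core nodes on Spot pricing, and passing that result to launch_cluster starts the cluster.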