Big Data for Data Science

Standard Course
Advanced
Early Access

About the Course

Explore the foundations of cloud computing and big data using AWS and Spark. This course guides you through building and managing end-to-end data projects, equipping you with the skills to solve real-world big data challenges and pursue careers in data science and analytics.

Learning Outcomes

By the end of this course, participants will be able to:

  • Explain the key principles of cloud computing and describe the core AWS services for compute, storage, data streaming, and machine learning.
  • Set up, configure, and manage AWS EC2 instances to run scalable and flexible cloud applications.
  • Use Spark and EMR to process and analyze large datasets through distributed computing, and perform data operations using Spark DataFrames.
  • Develop, train, and deploy machine learning models using AWS SageMaker, and apply them in real-world scenarios.
  • Use AWS services like Athena, QuickSight, and Boto3 to create end-to-end data pipelines for querying, analyzing, and visualizing data.

Curriculum

  • Module 1: Intro to AWS EC2 (Amazon Elastic Compute Cloud)

    Overview:

    This module introduces participants to the AWS ecosystem and to one of its essential services, EC2.

    Topics to Cover:

    • Why Cloud Computing?
    • What is AWS?
    • Create and connect to an EC2 instance
  • Module 2: AWS S3 (Simple Storage Service)

    Overview:

    This module covers S3, AWS's essential object storage service.

    Topics to Cover:

    • Creating S3 buckets
    • Understanding the restrictions of S3 buckets
    • Working with S3 buckets through the AWS CLI
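As a small taste of the second topic above, the core S3 bucket-naming restrictions (3–63 characters; lowercase letters, digits, dots, and hyphens; must begin and end with a letter or digit; must not be formatted like an IP address) can be checked with a short validator. This is an illustrative sketch, not course material or an AWS SDK function:

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Check a proposed S3 bucket name against the core naming rules:
    3-63 chars; lowercase letters, digits, dots, hyphens; must begin and
    end with a letter or digit; must not look like an IP address."""
    if not 3 <= len(name) <= 63:
        return False
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name):
        return False
    # Names formatted like IP addresses (e.g. 192.168.1.1) are rejected.
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True

print(is_valid_bucket_name("my-data-bucket"))  # True
print(is_valid_bucket_name("My_Bucket"))       # False: uppercase and underscore
print(is_valid_bucket_name("192.168.1.1"))     # False: looks like an IP address
```

Note that AWS enforces additional rules (e.g. global uniqueness); this sketch covers only the syntactic ones.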
  • Module 3: AWS Kinesis & Firehose

    Overview:

    This module examines the power of Kinesis and data streaming.

    Topics to Cover:

    • What is data streaming?
    • Understanding Kinesis Firehose
    • Working with Kinesis Firehose
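The key idea behind Firehose — buffering incoming records until a size threshold is reached, then delivering the batch to a destination such as S3 — can be modeled in a few lines. `FirehoseBufferSketch` is a hypothetical toy class for illustration only (the real service also flushes on a time interval, which is omitted here):

```python
class FirehoseBufferSketch:
    """Toy model of Kinesis Firehose buffering: records accumulate until
    a size threshold is reached, then the batch is 'delivered' (here,
    appended to a list standing in for an S3 destination)."""

    def __init__(self, buffer_size_bytes: int = 1024):
        self.buffer_size_bytes = buffer_size_bytes
        self.buffer = []
        self.buffered_bytes = 0
        self.delivered_batches = []  # stand-in for the S3 destination

    def put_record(self, record: bytes) -> None:
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.buffer_size_bytes:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.delivered_batches.append(self.buffer)
            self.buffer = []
            self.buffered_bytes = 0

stream = FirehoseBufferSketch(buffer_size_bytes=100)
for _ in range(12):
    stream.put_record(b"x" * 30)   # twelve 30-byte records
# 4 records reach 120 bytes >= 100, so a flush happens after every 4th record.
print(len(stream.delivered_batches))  # 3
```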

  • Module 4: EMR (Hadoop & Hive)

    Overview:

    This module focuses on big data and distributed computing.

    Topics to Cover:

    • What is Big Data?
    • What is Distributed Computing?
    • Working with EMR through the AWS CLI
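The distributed-computing model behind Hadoop (and therefore EMR) is MapReduce: each node processes its own chunk of the data, and the partial results are merged. A conceptual sketch in plain Python, with lists standing in for cluster nodes:

```python
from collections import Counter
from functools import reduce

def map_phase(chunk: str) -> Counter:
    """Each simulated 'node' counts the words in its own chunk of data."""
    return Counter(chunk.split())

def reduce_phase(partial_counts: list[Counter]) -> Counter:
    """Partial counts from all nodes are merged into the final result."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())

# The dataset is split across three simulated nodes.
chunks = ["big data big", "data is big", "spark and hadoop"]
partials = [map_phase(c) for c in chunks]
totals = reduce_phase(partials)
print(totals["big"])   # 3
print(totals["data"])  # 2
```

On a real EMR cluster the map work runs in parallel on separate machines; the logic, however, is exactly this split-count-merge pattern.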

  • Module 5: Intro to Apache Spark & Databricks

    Overview:

    This module introduces participants to Apache Spark and to working with Databricks.

    Topics to Cover:

    • Examining the similarities between Spark and EMR
    • Understanding the structures of Spark
    • Working with basic Spark queries

  • Module 6: Spark DataFrame

    Overview:

    This module examines a key application of Spark: the Spark DataFrame.

    Topics to Cover:

    • Structure and functions of the DataFrame
    • Working with essential Spark DataFrame functions
    • Understanding the built-in functions

  • Module 7: Databricks Spark Machine Learning

    Overview:

    This module teaches participants to work with Spark Machine Learning.

    Topics to Cover:

    • What is Spark Machine Learning?
    • Using a language model example in Spark ML
    • The NLP workflow in Spark ML
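A typical Spark ML NLP pipeline chains a tokenizer with a hashed term-frequency step. The pattern can be sketched in plain Python — `tokenize` and `hashing_tf` below are simplified analogues of Spark ML's `Tokenizer` and `HashingTF` stages, not the actual library:

```python
def tokenize(text: str) -> list[str]:
    """Analogue of Spark ML's Tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()

def hashing_tf(tokens: list[str], num_features: int = 16) -> list[int]:
    """Analogue of Spark ML's HashingTF: hash each token into a fixed
    number of buckets and count occurrences per bucket."""
    vec = [0] * num_features
    for tok in tokens:
        vec[hash(tok) % num_features] += 1
    return vec

docs = ["Spark makes big data simple", "Big data needs Spark"]
features = [hashing_tf(tokenize(d)) for d in docs]
print(len(features), len(features[0]))  # 2 16
print(sum(features[0]))                 # 5 (five tokens in the first document)
```

Each document becomes a fixed-length feature vector, which is the form downstream ML algorithms expect. (Exact bucket positions vary run to run here because Python salts string hashes; Spark's HashingTF is deterministic.)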

  • Module 8: AWS SageMaker

    Overview:

    This module focuses on the use of SageMaker in AWS.

    Topics to Cover:

    • What is SageMaker?
    • Applications of SageMaker
    • Deploying ML models using SageMaker

  • Module 9: AWS Boto3, Athena & Quicksight

    Overview:

    This final module teaches participants to use Boto3 to connect to AWS services through Python. It also covers Athena for querying data and QuickSight for building real-time dashboards.

    Topics to Cover:

    • What are Athena and QuickSight?
    • Connecting data from an S3 bucket to Athena and then to QuickSight
    • Setting up a pipeline for data flow
    • Working with EC2 instances and S3 buckets through Boto3
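Working with S3 through Boto3 usually starts with splitting an `s3://bucket/key` URI into the `Bucket` and `Key` arguments the client expects. An illustrative sketch (the `list_keys` helper is hypothetical; it imports `boto3` lazily and needs AWS credentials to actually run):

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key) -- the two values
    Boto3 calls such as s3.get_object(Bucket=..., Key=...) expect."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

def list_keys(bucket: str, prefix: str = "") -> list[str]:
    """Illustrative Boto3 usage: paginate through the objects under a
    prefix in a bucket. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    s3 = boto3.client("s3")
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(
            Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys

print(parse_s3_uri("s3://my-bucket/raw/2024/data.csv"))
# ('my-bucket', 'raw/2024/data.csv')
```

The same client-and-call pattern extends to EC2 (`boto3.client("ec2")`) and the other services covered in this module.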

Tools

AWS
EC2 Instances
AWS S3 Buckets
AWS EMR
SageMaker
AWS Athena
AWS QuickSight
Databricks
Spark
Python
SQL
Linux
Jupyter
Ready to start learning?

Get access to top-rated courses, real projects, and job-ready skills.

Have questions?

We’re here to help. Talk to our advisors. 


Recommended if you're interested in Big Data for Data Science
Standard Course

AI Automation

Standard Course

Introduction to GitHub Actions

Standard Course

GCP Fundamentals

Standard Course

Introduction to Large Language Models

Learning Track

DevOps Engineering Track

Learning Track

MLOps Engineering Track

Learning Track

Cloud Engineering Track

Learning Track

Artificial Intelligence (AI) Engineering Track

Common Questions

Find answers to common questions about our courses and plans
  • Standard Courses: Focused, short courses that build foundational or intermediate skills through hands-on exercises, enabling you to apply what you learn immediately.
  • Track Courses: Structured learning paths that guide you from beginner to advanced levels. They include practical projects that integrate multiple tools and workflows, aligned with industry best practices, helping you gain the skills and confidence to tackle real-world challenges.

No. Track Courses are only accessible through the Professional or Unlimited+ subscription plans.

  • Standard Plan gives you access to all Standard Courses.
  • Professional Plan gives you access to both Standard and Track Courses within your chosen domain.
  • Unlimited+ Plan provides full access to all courses — both Standard and Track — across all domains.


Yes, all courses are designed to be self-paced. Learn when it fits your schedule.

Each course includes prerequisites if needed. Many Standard Courses are beginner-friendly.

Still have questions?

If you have other queries or specific concerns, don’t hesitate to let us know. Your feedback is important to us, and we aim to provide the best support possible.

Your Learning Journey Awaits 🚀

Grow your skills, build projects you’ll be proud of, and unlock new opportunities — all at your pace.

Download Big Data for Data Science Course Package