Big Data for Data Science

Online | Self-paced | Start Anytime
Intermediate
Early Access

About the Course

Curious about Cloud Computing and Big Data? The Big Data for Data Science course is the perfect fit. In this course, you’ll explore essential Amazon Web Services (AWS) tools and Spark, learning to build and manage an end-to-end data project. By the end of the course, you’ll be well-equipped to tackle cloud computing challenges and ready to pursue careers in big data.

Curriculum

  • Module 1: Intro to AWS EC2 (Amazon Elastic Compute Cloud)

    Overview:

    This module introduces participants to the AWS ecosystem and one of its essential services, EC2.

    Topics to Cover:

    • Why Cloud Computing?
    • What is AWS?
    • Set up credentials for EC2
    • Create and connect to an EC2 instance
  • Module 2: AWS S3 (Simple Storage Service)

    Overview:

    This module covers the essential storage service of AWS known as S3.

    Topics to Cover:

    • Creating S3 buckets
    • Understand the restrictions of S3 buckets
    • Connect and work with S3 buckets through the AWS CLI
  • Module 3: AWS Kinesis & Firehose

    Overview:

    This module examines the power of Kinesis and data streaming.

    Topics to Cover:

    • What is data streaming?
    • Understand Kinesis Firehose
    • Work with Kinesis Firehose through boto3
    • Using sample data to emulate scraping social media

  • Module 4: EMR (Hadoop & Hive)

    Overview:

    This module focuses on the usage of big data and distributed computing.

    Topics to Cover:

    • What is Big Data?
    • What is Distributed Computing?
    • Understand the workings of a cluster
    • Working with EMR through AWS CLI

  • Module 5: Intro to Apache Spark & Databricks

    Overview:

    This module teaches participants the language of Spark and how to use Databricks.

    Topics to Cover:

    • Creating a community edition Databricks account
    • Examine the similarities between Spark and EMR
    • Understand the structures of Spark
    • Working with basic Spark queries

  • Module 6: Spark DataFrame

    Overview:

    This module examines an application of Spark through Spark DataFrame.

    Topics to Cover:

    • The structures and functions of DataFrame
    • Creating and working with essential functions in Spark DataFrame
    • Understand the different embedded functions

  • Module 7: Databricks Spark Machine Learning

    Overview:

    This module teaches participants to work with Spark Machine Learning.

    Topics to Cover:

    • What is Spark Machine Learning?
    • Using a language model example in Spark ML
    • NLP process in Spark ML

  • Module 8: AWS SageMaker

    Overview:

    This module focuses on the use of SageMaker in AWS.

    Topics to Cover:

    • What is SageMaker?
    • Creating and using SageMaker
    • Applications of SageMaker
    • Deploying ML models using SageMaker

  • Module 9: AWS Boto3, Athena & Quicksight

    Overview:

    This final module teaches participants to use boto3 to connect to AWS services through Python. It also covers Athena for organizing data and Quicksight for building real-time dashboards.

    Topics to Cover:

    • What are Athena and Quicksight?
    • Connecting data from an S3 bucket to Athena, then to Quicksight
    • Setting up a pipeline for data flow
    • Working with EC2 instances and S3 buckets through Boto3

Learning Outcomes

By the end of this course, participants will be able to:

  • Explain the key principles of cloud computing and describe the core AWS services for compute, storage, data streaming, and machine learning.
  • Set up, configure, and manage AWS EC2 instances to run scalable and flexible cloud applications.
  • Create and manage S3 buckets for efficient data storage, and interact with them using the AWS CLI for file operations.
  • Implement real-time data streaming pipelines with Kinesis Firehose and process data programmatically using Python’s boto3 library.
  • Use Spark and EMR to process and analyze large datasets through distributed computing, and perform data operations using Spark DataFrames.
  • Develop, train, and deploy machine learning models using AWS SageMaker, and apply them in real-world scenarios.
  • Use Athena, Quicksight, and boto3 to create end-to-end data pipelines that enable querying, analyzing, and visualizing data.

Tools

AWS
EC2 Instances
AWS S3 Buckets
AWS EMR
SageMaker
AWS Athena
AWS Quicksight
Databricks
Spark
Python
SQL
Linux
Jupyter

Price: $200.00 (original price: $399.00)

Recommended if you're interested in Big Data for Data Science

  • Learning Track: MLOps Engineer Track
  • Learning Track: Big Data Engineer Track
  • Learning Track: Cloud Engineer Track
  • Learning Track: Large Language Model (LLM) Engineer Track
  • Short Course: Data Streaming
  • Short Course: Data Migration
  • Short Course: Data Lake Architecture
  • Short Course: AI Automation and RPA

Career Track to Advance Your Career

Join our comprehensive career tracks, designed to accelerate your professional growth and help you achieve your goals.

Unlock Your Potential with Expert Guidance

Our mentorship services provide personalized support and insights from industry experts to help you navigate your career journey with confidence.

Empower Your Workforce

Enhance your team’s skills and productivity with our tailored corporate training courses, designed to meet your organization’s unique needs.

FAQ

Frequently asked questions about the bootcamp

The course is structured into weekly modules, each containing video lectures, reading materials, assignments, and quizzes. You can complete the modules at your own pace, but we recommend following the weekly schedule to stay on track.

You can get support in multiple ways:

  • TA Support on Slack: Our teaching assistants are available on Slack to answer your questions and provide guidance.
  • Peer Community on Discord: Join our Discord community to discuss course topics, share ideas, and collaborate with fellow students.

TAs are available on Slack from 9 AM to 6 PM (ET) Monday to Friday. Outside these hours, you can still post your questions, and TAs will respond as soon as they are back online.

After enrolling in the course, you will receive an invitation link to join the Discord community. Follow the link to create an account or log in to your existing account.

The Discord community offers peer-to-peer support, where you can discuss course topics, share resources, collaborate on projects, and network with fellow learners.

The optional mentoring service includes one-on-one sessions with an experienced mentor who can provide personalized guidance, feedback on your progress, and help you set and achieve your learning goals.

Mentorship services are available at an additional cost. Please talk to our Program Advisors to sign up.

Yes, you will have lifetime access to the course materials, including any updates made to the content in the future.

We accept all major credit cards, PayPal, and bank transfers. You can choose your preferred payment method at checkout.

Ready to kick-start your career?

Contact our advisors now to learn more about our programs and courses. They are here to answer all your questions and help you embark on a successful journey.
