Data Engineering Bootcamp (Self-paced)

Combine self-paced materials with mentor sessions for a comprehensive and flexible learning experience

Your search for a comprehensive self-paced learning program to become a data engineer ends here with WeCloudData’s Data Engineering Bootcamp. Crafting an effective data engineering program is challenging because the tools and platforms in the data landscape evolve constantly, a pace that traditional university programs often struggle to match. At WeCloudData, our industry-seasoned instructors provide hands-on experience, teaching the essential skills for a seamless transition into a modern data engineering role. The program emphasizes building a robust project portfolio and stands out with post-graduation job support, mentorship, and referrals. Enroll now and discover why WeCloudData is the best choice for your data engineering journey.

Explore our Program Package to find:

Access our comprehensive ✨self-paced✨ learning materials 24/7

Learn on your own terms and schedule. Our self-paced approach allows you to delve into the world of data without the pressure of deadlines. Take control of your learning journey, progressing at a pace that suits your lifestyle.

Learn Anytime, Anywhere

Learn at your own pace without deadlines, tailoring your journey to fit your lifestyle.

Interactive Modules

Dive deep into data concepts through videos, quizzes, and hands-on exercises that bring theory to life.

TA Support

Get dedicated support from experienced teaching assistants for guidance whenever needed.

Real-World Applications

Our course content is rooted in practical, industry-relevant scenarios, providing insights directly applicable to your professional endeavors.


About the Program

The Applied Data Engineering Certificate Program has been tailored for new graduates, IT professionals, and career switchers aiming to enter the dynamic field of data engineering. Our self-paced learning option provides the flexibility to acquire new skills at your own pace, supported by on-demand TA assistance, weekly mentoring sessions, and interactive review workshops. This program is designed to accommodate the schedules of working professionals, allowing you to balance learning with your existing job commitments on weekdays or weekends. Unlike the full-time program, the self-paced option empowers you to progress at your convenience, and real client projects are available to help you gain practical experience. While this self-paced journey demands dedication, the learning outcomes promise to be exceptionally rewarding!

What you will get
Top data bootcamp voted by students and partners

A Seamless and Enriching Learning Experience

Start Learning Anytime

Engage with self-paced materials designed for learning anytime, anywhere, and at your own pace.

Get TA Support

Teaching Assistants (TAs) are readily available on Slack to assist you with any questions during the bootcamp.

Review Session

Participate in live, interactive workshops with instructors for dynamic review and reinforcement of the covered materials in each module.


Personalized Mentorship

Book a weekly meeting with a mentor who will guide you through your learning and career journey, providing personalized support.


Explore Our Comprehensive Curriculum

WeCloudData Bootcamps are designed to be project-based. We not only cover essential theories, but also teach how to apply tools and platforms that are in high demand today. Our program curriculum is also highly adaptive to the latest market trends. 

  • Linux and Docker
    Module 1
    Linux and Docker
    This module teaches students the fundamentals of the Linux operating system and containerization. We train students to build solid command-line skills so that they can work with containers, automation, and cloud CLIs, equipping them to take on projects involving big data, cloud computing, and data pipeline automation.
    • Become familiar with Linux operating systems
    • Write bash/shell scripts to automate repetitive tasks
    • Create, build, and deploy Docker containers and images
    • Run applications in a Docker container
    • Deploy applications using Docker Compose
    • Work on a small yet complex project to apply what’s covered in this module
    • Linux Commands
    • Shell Scripting
    • Docker Commands
    • Docker File
    • Docker Compose
    • Flask Application
  • Python for Data Engineering
    Module 2
    Python for Data Engineering
    Python is one of the core skills of a data engineer and is highly sought after in the job market. In this module, students will learn how to use Python for different data engineering tasks and to interact with cloud containers, servers, and serverless tools. Students will also work with several AWS services, including EC2, S3, Lambda, and IAM.
    • Use different Python libraries for various data engineering use cases
    • Build and deploy Python applications on Cloud instances
    • Deploy Serverless applications using Python for AWS Lambda
    • Complete two mini-projects to improve Python and AWS skills
    • Python
    • AWS EC2
    • AWS S3
    • AWS Lambda
    • Docker
    • Python OOP
    • Python Logging
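To illustrate the serverless side of this module, here is a minimal sketch of an AWS Lambda-style handler; the event shape and object keys are simplified assumptions, not the full S3 event format:

```python
import json

# Minimal sketch of an AWS Lambda-style handler. The event below is a
# stripped-down, hypothetical S3 notification used only for illustration.
def handler(event, context=None):
    """Summarize the records in an incoming event payload."""
    records = event.get("Records", [])
    keys = [r["s3"]["object"]["key"] for r in records]
    body = {"processed": len(keys), "keys": keys}
    # Lambda proxy integrations expect a statusCode and a JSON string body
    return {"statusCode": 200, "body": json.dumps(body)}

# Local invocation with a fake event, the way you might unit-test a handler
fake_event = {"Records": [{"s3": {"object": {"key": "raw/2024/data.csv"}}}]}
print(handler(fake_event))
```

Testing handlers locally like this, before deploying to Lambda, is a habit the module reinforces.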
  • Modern Data Stack
    Module 3
    Modern Data Stack
    The data warehouse is a core piece of data infrastructure at most companies. This module focuses on the modern data stack: Airbyte, Snowflake, dbt, and reverse ETL. Students will learn how to work with modern data warehouses such as Snowflake and Amazon Redshift, create data models, and use dbt to orchestrate SQL-based ELT transformation pipelines.
    • Learn the internals of relational databases (RDBMS)
    • Build data models and work with modern data warehouses such as Snowflake and Redshift
    • Understand data connectors and ingestion tools such as Fivetran and Airbyte
    • Write dbt SQL workflows to transform data in the data warehouse
    • Understand the basics of reverse ETL and different business use cases
    • Complete two mini-projects 
    • ELT
    • ETL
    • Reverse ETL
    • Data Connectors
    • Data Modelling
    • Data Warehouse
    • Dimensional Modeling
    • OBT (One-Big-Table)
    • Wide Tables
    • Snowflake
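The dimensional-modelling and OBT ideas in this module can be sketched with a toy star schema. The snippet below uses SQLite purely for illustration (the table and column names are invented), but the SELECT is the kind of SQL a dbt model would materialize in Snowflake or Redshift:

```python
import sqlite3

# Illustrative star schema: a fact table plus a dimension, flattened into a
# "one big table" (OBT). All names here are made up for the example.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'NA'), (2, 'EU');
    INSERT INTO fct_orders VALUES (100, 1, 25.0), (101, 1, 40.0), (102, 2, 10.0);
""")

# This SELECT is what you would put in a dbt model file; dbt would
# materialize it as a table or view in the warehouse.
con.execute("""
    CREATE TABLE obt_orders AS
    SELECT o.order_id, o.amount, c.region
    FROM fct_orders o
    JOIN dim_customer c USING (customer_id)
""")
rows = con.execute(
    "SELECT region, SUM(amount) FROM obt_orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # -> [('EU', 10.0), ('NA', 65.0)]
```

The trade-off between wide tables like this and normalized dimensional models is a recurring theme in the module.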
  • Big Data and Data Lake
    Module 4
    Big Data and Data Lake
    In this module, students will learn to work with big data technologies such as Apache Spark and Hadoop. The data lake concept will be introduced so that students understand the different use cases for big data storage. Students then learn how to develop Spark applications to process big data. Spark jobs will be deployed in local mode, on AWS EMR, and on the Databricks platform. This module goes in-depth on Spark internals and Spark job optimization.
    • Learn the principles of big data and distributed systems
    • Understand the pros and cons of Data Lake vs Data Warehouse
    • Learn different use cases of Data Lake and how to set up staging, processed, and production zones
    • Develop Spark ETL scripts and submit jobs to Databricks and AWS EMR
    • Deploy Serverless Spark jobs to AWS Glue
    • Process big data using federated query services such as Athena and Presto
    • Complete three mini-projects to showcase your end-to-end big data processing skills
    • PySpark
    • Spark Optimization
    • EMR
    • MapReduce
    • Hadoop
    • Hive
    • Presto
    • Athena
    • Databricks
    • Spark Job Tuning
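The map/shuffle/reduce pattern that Hadoop and Spark distribute across a cluster can be illustrated in a few lines of single-process Python (a conceptual sketch, not how you would write a Spark job):

```python
from collections import defaultdict
from itertools import chain

# Plain-Python sketch of the map -> shuffle -> reduce pattern that MapReduce
# and Spark generalize across a cluster; here everything runs in one process.
docs = ["spark makes big data simple", "big data needs big tools"]

# Map: emit (word, 1) pairs from each document
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)

# Shuffle: group values by key (on a cluster this is the network-heavy step)
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group independently
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts["big"])  # -> 3
```

Understanding where the shuffle happens is also the key to the Spark job tuning covered later in the module.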
  • Build Data Pipelines
    Module 5
    Build Data Pipelines
    In this module, students will learn how to build and deploy end-to-end data pipelines for data integration and ETL. We will introduce the most popular ways of building dataflows and compare the leading tools.
    • Deploy and configure Apache Airflow in production environment
    • Get familiar with managed Airflow services on AWS
    • Develop Airflow DAGs (Directed Acyclic Graphs) and set up dependencies among different operators
    • Orchestrate end-to-end data pipelines using Airflow and run complex ETL jobs
    • Understand the current landscape of data pipelining and orchestration. 
    • Understand the pros and cons of Airflow compared to Dagster and Prefect.
    • Learn how to orchestrate Serverless dataflows using AWS Lambda and Step Functions
    • Airflow Deployment
    • Data Pipelines
    • Pipeline Orchestration
    • Data Automation
    • AWS Lambda
    • AWS Step Function
    • Prefect
    • Dagster
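At its core, what an orchestrator like Airflow resolves is a dependency graph of tasks. A minimal sketch of that idea in plain Python, with illustrative task names:

```python
from graphlib import TopologicalSorter

# A pipeline expressed as a dependency graph: each task maps to the set of
# tasks it depends on. In Airflow you would express the same chain with
# operators: extract >> transform >> load >> notify
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# A scheduler's job, reduced to its essence: find a valid execution order
order = list(TopologicalSorter(dag).static_order())
print(order)  # -> ['extract', 'transform', 'load', 'notify']
```

Airflow, Dagster, and Prefect all build on this same idea, differing in how tasks are defined, scheduled, and monitored.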
  • NoSQL Database
    Module 6
    NoSQL Database
    In this module, students will learn how to work with NoSQL databases. We will help students understand the CAP theorem and motivation behind NoSQL databases. Since there are many NoSQL database engines, we choose to focus on DynamoDB and Elasticsearch.
    • Understand the CAP theorem
    • Understand the NoSQL use cases
    • Survey the NoSQL database landscape
    • Learn how to do data modelling in DynamoDB and Elasticsearch
    • Learn how to ingest data into NoSQL databases
    • Understand log file ingestion and log file analysis with Elasticsearch and the ELK stack
    • Learn how to scale applications using DynamoDB
    • CAP Theorem
    • NoSQL
    • DynamoDB
    • Elasticsearch
    • ELK
    • Log Analysis
    • Data Modelling
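The single-table design pattern commonly used with DynamoDB can be sketched with plain dictionaries. The PK/SK naming follows the common single-table convention, but the entities and the in-memory query helper below are invented for illustration (with boto3 you would issue a real Query against the table):

```python
# Sketch of DynamoDB-style single-table modelling: different entity types
# share one table, distinguished by partition key (PK) and sort key (SK).
items = [
    {"PK": "USER#42", "SK": "PROFILE",       "name": "Ada"},
    {"PK": "USER#42", "SK": "ORDER#2024-01", "total": 25.0},
    {"PK": "USER#42", "SK": "ORDER#2024-02", "total": 40.0},
]

def query(items, pk, sk_prefix=""):
    """Mimic a DynamoDB Query: exact partition key, begins_with on sort key."""
    return [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]

# Fetch all orders for one user in a single, index-friendly access pattern
orders = query(items, "USER#42", "ORDER#")
print(len(orders))  # -> 2
```

Designing keys around access patterns like this, rather than around entities, is the central mindset shift from relational modelling.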
  • Data Lakehouse and Streaming
    Module 7
    Data Lakehouse and Streaming
    A data lakehouse is a data architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data. In this module, students will learn how to set up Change Data Capture (CDC), data ingestion, Kafka, Apache Hudi/Iceberg, and Spark Streaming.
    • Set up CDC using Debezium
    • Set up Hudi or Iceberg on AWS EMR
    • Ingest data into Apache Kafka
    • Manage upserts with Apache Hudi/Iceberg
    • Work with streaming data using Spark Streaming
    • Complete an end-to-end Data Lakehouse project
    • Spark Streaming
    • Data Lakehouse
    • Upserts
    • Change Data Capture
    • CDC
    • Debezium
    • Apache Spark
    • Apache Flink
    • Apache Kafka
    • Streaming Data Processing
    • Real-time Data
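The upsert semantics at the heart of CDC-driven lakehouse tables, which Hudi and Iceberg manage at scale, can be sketched in plain Python; the event format below is a simplified assumption, not Debezium's actual payload:

```python
# Toy sketch of applying a CDC event stream to a table via upserts.
# Real lakehouse engines (Hudi/Iceberg) do this against columnar files
# with ACID guarantees; here the "table" is just a dict keyed by id.
table = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}

cdc_events = [
    {"op": "u", "id": 1, "status": "shipped"},  # update
    {"op": "c", "id": 3, "status": "new"},      # insert
    {"op": "d", "id": 2},                       # delete
]

for ev in cdc_events:
    if ev["op"] == "d":
        table.pop(ev["id"], None)
    else:
        # create and update collapse into a single upsert operation
        table[ev["id"]] = {"id": ev["id"], "status": ev["status"]}

print(sorted(table))  # -> [1, 3]
```

Replaying a change stream into a consistent table state like this is exactly what the module's end-to-end lakehouse project automates with Debezium, Kafka, and Hudi/Iceberg.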
  • Career Preparation
    Module 8
    Career Preparation
    Before entering 1-on-1 career mentoring, students will learn about the data engineering job market and build job-search skills. Career coaches teach graduates how to structure resumes, apply for jobs, and ace interviews. Students work in groups for peer mock-interview practice.

    Career services included in the bootcamp:

    • Resume workshops
    • Group interview practice
    • Portfolio project mentoring
    • Coding interview practice and additional resources (Leetcode/hackerrank)
    • Peer programming practice and code reviews

    Career services included after graduation (6 months)

    • One-on-one career mentoring sessions with data engineering mentors for 6 months after graduation
    • One-on-one resume critique
    • One-on-one mock interview sessions with data engineering mentors
    • Job referrals and networking sessions
    • Research
    • Leetcode
    • System Design
    • Networking
    • Salary Negotiation

Nail your job search journey via 1-on-1 mentoring

From the beginning of your self-paced bootcamp, our experienced mentors are here to guide you every step of the way. Have questions about the materials? Need career advice to navigate your chosen field? Your dedicated mentor is ready to provide insights, support, and valuable guidance. Together, we’ll shape your learning experience and pave the way for your successful career.

Tailored Acceleration and Support:

Your mentor is there to address questions promptly, ensuring a tailored and efficient learning journey.

Enhanced Motivation:

Stay motivated and engaged with your studies as your mentor provides encouragement, feedback, and a supportive presence throughout your self-paced bootcamp.

Industry Insights:

Gain valuable career advice, insights, and industry knowledge from your mentor, enhancing your understanding and preparing you for success in your chosen field.

Real Client Projects

Gain Hands-on Experience with Real-client Projects ✨

One of the best ways to gain the experience needed for a data and tech career is to start with a project, and WeCloudData is one of the few companies that offers this opportunity. In our bootcamp, we’ll give you something many graduates don’t get: the chance to work on meaningful, important problems right away. You will even be able to contribute ideas and solutions that make a real impact!

*Please note that the Real Client Project is now optional and not included in the Self-paced bootcamp tuition.

Interested in Real Client Project?
Portfolio Projects

Build real project experience to differentiate yourself

We also have a capstone project that gives students the chance to synthesize their learning and build a portfolio piece they can showcase on their resume or LinkedIn profile. This helps them stand out from other applicants when applying for jobs.


Tuition and Scholarship


We believe in investing in the future of our industry and supporting individuals in their professional journeys. That’s why WeCloudData is proud to offer scholarships for individuals looking to pursue professional development or make a career change. 

The Women in Tech scholarship supports and empowers women pursuing careers in the technology industry. This scholarship aims to address gender diversity and underrepresentation in tech by providing financial assistance, mentorship, and access to our comprehensive bootcamp. We celebrate the achievements of aspiring women technologists and equip them with the skills to thrive in the tech sector.

The Laid-off Support program is a special offering designed to provide financial assistance and support to individuals who have recently been laid off from their jobs. We understand the challenges and uncertainties that come with job loss, and we want to help those affected continue their professional development in the technology sector. The program provides reduced pricing and access to our comprehensive range of courses, allowing laid-off individuals to acquire new skills, enhance their knowledge, and increase their chances of finding new opportunities in a competitive job market.

The Fresh Grads Scholarship offers financial assistance to outstanding individuals who have completed their undergraduate or postgraduate studies within the past 12 months and are looking to kickstart their careers in data and tech. The scholarship aims to bridge the gap between academia and industry, empowering fresh graduates to become competent data professionals through an intensive and comprehensive bootcamp.

Student Success

What our graduates are saying

Laura Vieira

Graduated 2021 | Reviewed on 17 October 2021

“Amazing course and support”

The course was really great a little too fast if you are not in a technical field already. You will need to study hard reviewing classes and making the labs and assignments but you always have the support of the TAs (they are awesome) or even the instructors and your own classmates that are always helping each other via Slack. There will be a final project where you will need to present a pipeline that you created (don’t worry they will be helping you!). Then, they will be helping you to find a job. Shaohua is always looking for the best for his students, he wants to make sure that you have the best experience with them.


Let WeCloud Accelerate Your Career in Tech

Start your career today!

Want more details about this program? Unsure about which path to take? Apply now to reserve a spot or make an appointment with our learning advisor. 

View our Data Engineering Bootcamp (Self-paced) course package



Bootcamp Application

"*" indicates required fields

Why are you interested in joining this bootcamp?
Our Program Advisor team will reach out to you to schedule an online meeting. Please note that meeting time is in Eastern Time (ET).