
Building Digital Marketing Dashboard Using Python, Docker, Airflow in Google Cloud (Part 1)

October 28, 2019

This blog series is posted by WeCloudData's Data Science Immersive Bootcamp student Bob Huang (LinkedIn).

OVERVIEW:

The digital marketing project gives you the ability to manage and analyze your marketing data from different platforms such
as Google Analytics, Gmail, Eventbrite, and Google Ads. You can filter your emails by sent status, campaign, and
type to easily create and edit your email content. You can visualize summaries of public marketing event data and analyze
conversion rates. You can also create customized dashboards from the acquired data for your own purposes.

Part 1 of this blog will mainly focus on the tools and data pipeline infrastructure.

FEATURE SUMMARY:

The prospective scope of this project is close to Software as a Service (SaaS). According to Wikipedia, software as a service
is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted.
It is sometimes referred to as “on-demand software”, and was formerly referred to as “software plus services” by Microsoft.
The project features the following aspects:

  1. Low cost: monthly GCP service costs start from $40.
  2. Customizable: we select only the useful data during ingestion, choose the storage methods, customize the dashboard
    layout, run small machine learning projects on the data, merge marketing data with other data, etc.
  3. Secure: by using a Kubernetes cluster, no authentication keys are exposed to the public.
  4. Data ownership: you own all the data retrieved through the various platforms' APIs. If you don't
    want ongoing data updates, we can provide a program that queries all historical data once so you can analyze it
    with traditional tools like Excel.
  5. Open source: all programs, services, and applications are open-source products with no licensing cost. If you have
    a strong technical team, you can maintain the services after deployment without our support.
  6. Easy to use: Superset is easy to learn, so non-technical users can build customized dashboards.
  7. Insightful: we provide recommendations on how to use your data to build meaningful visualizations and perform useful statistical
    or machine learning analysis. Details will be in the second part of this blog post.
  8. Big data ready: all the components this project uses are scalable. For example, Kubernetes handles scaling
    and load balancing automatically.

PROCEDURES:

  1. Collect data from different sources: build a Docker container that hosts Apache Airflow with various DAGs
    that gather emails, event registrations, and other information from different sources and store them in Google BigQuery.
  2. Visualize with Apache Superset: build a Docker container that hosts Apache Superset. Connect Superset
    to BigQuery, then create dashboards to display the data.
  3. Host the Docker applications on Google Cloud: create a Google Compute Engine instance with Kubernetes that hosts the multiple
    Docker containers serving the entire project. The advantages of Kubernetes include auto-scaling, application isolation,
    and good security.
  4. Extensions: consider creating more Docker containers for applications such as a machine learning model
    based on the email data, a Dash/Plotly application, etc.
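Step 1 above is an extract-and-load flow. A minimal pure-Python sketch of that flow, with the marketing platform API and BigQuery stubbed out (all function names, field names, and data here are illustrative, not the author's actual code):

```python
# Hypothetical stand-ins for the real clients: in the actual DAGs, the extract
# step would call the Eventbrite/Gmail/Google Ads APIs, and the load step would
# use the google-cloud-bigquery client.
def fetch_event_registrations(api_response):
    """Extract: pull only the fields we care about out of a raw API response."""
    return [
        {"event": r["event_name"], "email": r["email"], "date": r["created"]}
        for r in api_response["registrations"]
    ]

def load_rows(table, rows):
    """Load: in production this would insert the rows into a BigQuery table."""
    table.extend(rows)
    return len(rows)

# Simulated raw response from a marketing platform.
response = {
    "registrations": [
        {"event_name": "Data Science Info Session", "email": "a@x.com", "created": "2019-10-01"},
        {"event_name": "Big Data Workshop", "email": "b@y.com", "created": "2019-10-02"},
    ]
}

bigquery_table = []  # stand-in for the BigQuery table
inserted = load_rows(bigquery_table, fetch_event_registrations(response))
print(inserted)  # 2
```

In the real project, each such extract-and-load pair becomes one Airflow DAG (one source per DAG), which also gives you retries and scheduling for free.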

FLOWCHART:

(The architecture flowchart from the original post is not reproduced here.)

PROJECT COMPONENTS:

  1. Google Cloud Platform – project monitoring (https://cloud.google.com/). Google Cloud Platform, offered by Google, is a suite of cloud computing services that run on the same infrastructure that Google uses internally for its end-user products. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics, and machine learning.
  2. Apache Superset – front end (https://superset.incubator.apache.org/). Superset is an easy-to-use data visualization tool with fantastic templates. Non-technical people can quickly learn it and create customizable dashboards for business purposes. It supports various database connections, has security modules, and recently added support for BigQuery connections. To containerize Superset, we refer to this GitHub example (https://github.com/amancevice/superset/blob/master/Dockerfile). To set up the Superset–BigQuery connection and create tables, follow the official Superset documentation.
  3. BigQuery – back end (https://cloud.google.com/bigquery/). Follow Google's official instructions to create datasets (equivalent to databases) in BigQuery. Generate keys with different permissions to read or write data. For the backend we could also use Cloud SQL, MySQL, Redshift, MongoDB, PostgreSQL, etc., and will adjust according to customers' needs.
  4. Apache Airflow – automation (https://airflow.apache.org/). Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative. We build a Docker image that hosts Airflow, then write DAGs that gather data from the different sources using their credentials (one source per DAG), plus DAGs that delete obsolete data. To build the image, we mainly refer to this guide (https://medium.com/@shahnewazk/dockerizing-airflow-58a8888bd72d), which uses supervisor, and otherwise follow the documents in this GitHub repository. Each DAG gets data from one source and stores it in multiple tables, whose schemas are predetermined by inspecting the data. In the Python code, we need to parse the full query response and store the fields in the different columns of the table. DAG properties such as retry count, failure email, and run frequency can be specified in the .py script, and all DAGs are stored in AIRFLOW_HOME/dags. Since all the authentication files, passwords, and tokens live in this Docker container, it must not be exposed to the public; Kubernetes provides a ClusterIP service type that keeps the Airflow container internal to the cluster.
  5. Kubernetes – container hosting (https://kubernetes.io/). Create a Google Compute Engine instance with Kubernetes that hosts the multiple Docker containers serving this project, one application per container. We build the Docker images locally, push them to Google Container Registry, and then deploy them to the Kubernetes cluster.
  6. Docker – containerization (https://www.docker.com/). Docker is a program that performs operating-system-level virtualization, also known as “containerization”. It is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud.
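The Airflow notes above mention parsing the full query response into separate table columns. A sketch of what that flattening might look like for one nested record (the field names and schema are hypothetical, chosen only to illustrate the idea):

```python
def flatten_response(record):
    """Map one nested API record onto the flat columns of a BigQuery table."""
    return {
        "campaign_id": record["campaign"]["id"],
        "campaign_name": record["campaign"]["name"],
        "sent_status": record["status"],
        # Guard against division by zero for campaigns with no sends.
        "open_rate": record["stats"]["opens"] / max(record["stats"]["sent"], 1),
    }

# Simulated nested record from a marketing platform API.
record = {
    "campaign": {"id": 7, "name": "October Newsletter"},
    "status": "sent",
    "stats": {"opens": 120, "sent": 400},
}
row = flatten_response(record)
print(row["open_rate"])  # 0.3
```

Each DAG would apply a function like this to every record in the response before inserting the resulting rows into its predetermined BigQuery table.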
Sample Dockerfile for Apache Airflow:

    We set up environment variables, copy files, install dependencies, and run the commands needed to build the image.
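The original post showed the Dockerfile as an image. A minimal sketch of what a Dockerfile along these lines could look like, assuming the Airflow 1.x CLI that was current when this post was written (base image, package list, and paths are illustrative, not the author's actual file):

```dockerfile
FROM python:3.7-slim

# Environment variable for Airflow's home directory
ENV AIRFLOW_HOME=/usr/local/airflow

# Install dependencies (the real image pins versions and adds per-source API clients)
RUN pip install --no-cache-dir apache-airflow google-cloud-bigquery

# Copy the DAG scripts into the image; credentials are also baked in,
# which is why the container is kept internal behind a ClusterIP service
COPY dags/ ${AIRFLOW_HOME}/dags/

# Initialize the metadata database, then start the webserver; the original
# post uses supervisor to run the scheduler and webserver together instead
CMD airflow initdb && airflow webserver
```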

To find out more about the courses our students have taken to complete these projects and what you can learn from WeCloudData, click here to see our upcoming course schedule.

