Consulting Case Study: Lookalike Models for Audience Expansion

October 19, 2021

Background

Our client is one of the largest news publishers in North America. With print and digital formats that reach millions of readers every week, they lead the national discussion by engaging audiences through prestigious coverage of news, politics, business, investing, and lifestyle topics across multiple platforms.

The WeCloudData team worked with the client’s digital marketing and data analytics team on an audience segmentation and expansion project for customer acquisition.

Problem Statement

WeCloudData helped the client set ML strategies for generating look-alike users for specific types of customers with similar behaviours or interests, and provided guidance on marketing and bidding decisions based on the most up-to-date and precise information.

The key challenge of the project is that the client collects hundreds of millions of session records, generated by millions of readers, on a daily basis. To drive subscriptions, the client hopes to target anonymous users who are likely to become high-LTV subscribers. Preliminary data cleaning and analysis had to be done first. Hence, we organized our work around the following aspects:

  1. Preliminary data transformation and analysis
  2. Look-alike model development
  3. Model evaluations and testing
  4. Workflow automation

Tools used: Snowflake, Spark on Databricks, AWS (S3, EC2, Airflow), Machine Learning

Milestones

  1. Similarity-based Look-alike Model: Nearest Neighbors (NN) + Clustering
    1. Simple and easy to understand
    2. Difficult to test (A/B testing required)
    3. No feature importance to interpret
    4. Not highly precise, but effective at separating “Neighbors” (targeted customers from the pool) from “Strangers” (unwanted customers for this segment) using the defined similarity score
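
To make the idea of a similarity score concrete, here is a minimal sketch of a similarity-based look-alike check. It is an illustration only, not the project's implementation: the feature columns, the choice of cosine similarity, the top-k averaging, and the 0.9 threshold are all assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_score(candidate, seeds, k=2):
    """Mean cosine similarity to the k most similar seed users (assumed scoring rule)."""
    sims = sorted((cosine_similarity(candidate, s) for s in seeds), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)

def label_pool(pool, seeds, threshold=0.9, k=2):
    """Label each pool user as a neighbor or a stranger by similarity score."""
    labels = {}
    for user_id, features in pool.items():
        score = similarity_score(features, seeds, k)
        labels[user_id] = ("neighbor" if score >= threshold else "stranger", score)
    return labels

# Hypothetical features: [sessions per week, business articles read, sports articles read]
seeds = [[5.0, 8.0, 0.0], [4.0, 9.0, 1.0], [6.0, 7.0, 0.0]]
pool = {
    "anon_1": [5.0, 7.0, 1.0],  # behaves like the seed segment
    "anon_2": [1.0, 0.0, 9.0],  # mostly sports, unlike the seeds
}
labels = label_pool(pool, seeds)
```

Comparing every pool user against every seed user like this is quadratic in the number of users, which is why scalability becomes a concern at this data volume.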

To solve the scalability problem, we also introduced a hashing algorithm, Locality Sensitive Hashing (LSH), to reduce the computational cost of calculating distances.
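
As a rough illustration of how LSH cuts down the number of distance calculations, the sketch below uses random-hyperplane hashing, one common LSH family for cosine similarity. The function names, the number of hyperplanes, and the toy vectors are assumptions for this example; the project itself ran on Spark, and this pure-Python version only shows the bucketing idea.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes; each contributes one bit to the signature."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )

def build_buckets(users, hyperplanes):
    """Group user ids by signature so similar vectors tend to share a bucket."""
    buckets = defaultdict(list)
    for user_id, vec in users.items():
        buckets[signature(vec, hyperplanes)].append(user_id)
    return buckets

def candidates_for(query, buckets, hyperplanes):
    """Return only the users in the query's bucket, not the whole pool."""
    return buckets.get(signature(query, hyperplanes), [])

hyperplanes = make_hyperplanes(num_planes=8, dim=3)
users = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],     # same direction as "a"
    "c": [-1.0, -2.0, -3.0],  # opposite direction
}
buckets = build_buckets(users, hyperplanes)
matches = candidates_for([4.0, 8.0, 12.0], buckets, hyperplanes)
```

Exact distances then only need to be computed against the users in `matches`. In practice several hash tables are usually combined, because a single table can miss true neighbors that fall just across a hyperplane.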

Precision vs. recall: in this use case, the cost of targeting the wrong user is much smaller than the cost of failing to target the right user. We still don’t want to waste resources on the wrong users, though, so finding a balance is important.
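
One way to express this trade-off in code is to sweep the similarity threshold and compare precision, recall, and an F-beta score with beta greater than 1, which weights recall more heavily than precision. This is a sketch under assumptions: the scores and labels are made up, and the choice of F-beta with beta = 2 is ours for illustration, not a metric reported by the project.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when users with score >= threshold are targeted."""
    tp = fp = fn = 0
    for score, is_subscriber in zip(scores, labels):
        targeted = score >= threshold
        if targeted and is_subscriber:
            tp += 1
        elif targeted and not is_subscriber:
            fp += 1
        elif not targeted and is_subscriber:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 favors recall over precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical similarity scores and whether each user later subscribed.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
labels = [1, 1, 0, 1, 0, 0]

strict = precision_recall(scores, labels, threshold=0.85)
loose = precision_recall(scores, labels, threshold=0.65)
strict_f2 = f_beta(*strict)
loose_f2 = f_beta(*loose)
```

On this toy data the looser threshold gives up some precision but captures every eventual subscriber, so it gets the higher F2 score, which matches the stated preference for not missing the right users.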

  2. Classification Models
    1. More explanatory power: feature importance and confusion matrix
    2. Randomly sampling users is difficult and introduces bias; training models in Spark will improve reliability significantly
    3. Easier to evaluate results on test data
  3. Model Deployment and Data Flow
    1. The model automation:
      1. Audience segment creation in the cloud
      2. A batch job runs daily or hourly to find lookalikes to augment the segment size (real-time list generation is possible)
      3. User selects the number of lookalikes based on similarity score
      4. New users are appended back to the original segment and sent to a third-party Ad Manager
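
The selection and append steps of the automation can be sketched as a small function. This is a hypothetical illustration: `expand_segment`, the user ids, and the scores are invented for the example, and the export to the Ad Manager is left out because the source does not describe it.

```python
def expand_segment(segment, scored_pool, num_lookalikes):
    """Append the top-N scored pool users to the segment, skipping existing members."""
    existing = set(segment)
    ranked = sorted(
        ((user_id, score) for user_id, score in scored_pool.items()
         if user_id not in existing),
        key=lambda item: item[1],
        reverse=True,
    )
    new_users = [user_id for user_id, _ in ranked[:num_lookalikes]]
    return list(segment) + new_users

# Hypothetical original segment and similarity scores from the batch job.
segment = ["u1", "u2"]
scored_pool = {"u2": 0.99, "a": 0.91, "b": 0.97, "c": 0.40}

expanded = expand_segment(segment, scored_pool, num_lookalikes=2)
```

Here `u2` is skipped because it is already in the segment, so the two highest-scoring new users, `b` and `a`, are appended.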

Future work

  • Add the ability to adjust the metrics that determine the “similarity score” based on future business needs
  • Test on different segments and larger samples as more data is gathered
  • Continue engineering features for model interpretation
  • Optimize the AI data pipeline

Conclusion

Beam Data successfully delivered this half-year project in the digital media industry. It showed our capability in handling large amounts of data and providing data-driven insights in new areas. Throughout the project, one of the biggest challenges was acquiring a wide variety of domain knowledge in a short time and communicating with cross-functional teams to convey the tasks. In addition, we quickly adapted to the client’s tech stack so that our work was compatible and delivered smoothly.

Beam Data then gained the client’s trust and continued the relationship with the same project team on further work, such as AI model optimization, pipeline design, and other data inquiries.
