Consulting Case Study: Recommender Systems

October 19, 2021

Client Info

Our client is one of Canada’s most well-established and decorated news outlets. They have been the recipient of numerous journalism awards and have a reach of millions of readers for their print and digital content across all news categories.

In the early to mid 2010s, our client began shifting its focus towards its digital platform. With a significant weekly readership and a rapid transition to digital content, the client first built a data pipeline to collect and store the millions of rows of clickstream data its users generate daily. Next, to leverage this clickstream data and enhance the online user experience, the client tasked the WeCloudData team with developing recommender system models that give users more personalized article recommendations.

Problem Statement

Our client aims to utilize a recommender system in order to:

  1. Increase user website engagement through the recommendation of more relevant articles
  2. Grow their current userbase and retain subscribed users long-term

Given that our client handles millions of users on a daily basis, leveraging big data tools was necessary in order to process the raw data and generate user-specific recommendations in a timely manner.


To meet the technical requirements for recommender system development, as well as other emerging data needs, the client built a mature data pipeline on cloud platforms: AWS to store user clickstream data and Databricks to process the raw data. With these tools in place, the WeCloudData team was able to:

  1. Process the raw user clickstream data with Python & Spark to develop an array of recommender models. These models utilized traditional methods like content-based filtering and collaborative filtering, as well as more advanced deep learning techniques with BERT.
  2. Generate user article recommendations and write the recommendations back to a NoSQL database.
  3. Automate article recommendation generation through Databricks' built-in job scheduler.
  4. A/B test the article recommendations generated by our models against the current champion model.
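
To make the collaborative filtering idea in step 1 more concrete, here is a minimal sketch of item-based collaborative filtering on clickstream data. It is an illustration under stated assumptions, not the models built for the client: the `clicks` data, the article and user IDs, and the function names are all hypothetical, and it runs in plain Python on a toy dataset, whereas the real workload ran on Spark. Articles are treated as similar when the same users clicked on both, using cosine similarity over binary click vectors.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy clickstream: (user_id, article_id) pairs.
clicks = [
    ("u1", "a1"), ("u1", "a2"),
    ("u2", "a1"), ("u2", "a2"), ("u2", "a3"),
    ("u3", "a2"), ("u3", "a3"),
    ("u4", "a4"),
]


def build_user_articles(clicks):
    """Group clicked article IDs by user."""
    user_articles = defaultdict(set)
    for user, article in clicks:
        user_articles[user].add(article)
    return user_articles


def item_similarities(user_articles):
    """Cosine similarity between articles over binary click vectors."""
    readers = defaultdict(int)   # article -> number of distinct readers
    co_reads = defaultdict(int)  # (article, article) -> shared readers
    for articles in user_articles.values():
        for a in articles:
            readers[a] += 1
        for a, b in combinations(sorted(articles), 2):
            co_reads[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), shared in co_reads.items():
        score = shared / sqrt(readers[a] * readers[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(user, user_articles, sims, k=3):
    """Rank unseen articles by summed similarity to the user's reads."""
    seen = user_articles.get(user, set())
    scores = defaultdict(float)
    for a in seen:
        for b, score in sims.get(a, {}).items():
            if b not in seen:
                scores[b] += score
    # Highest score first; ties broken by article ID for determinism.
    return sorted(scores, key=lambda art: (-scores[art], art))[:k]


user_articles = build_user_articles(clicks)
sims = item_similarities(user_articles)
print(recommend("u1", user_articles, sims))
```

In this toy data, user `u4` has read only an article that nobody else has read, so the sketch returns no recommendations for them. This cold-start gap is one reason content-based methods, which rely on article text and metadata rather than shared readership, are often used alongside collaborative filtering.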


This architecture demonstrates how data collected from our client’s website is stored and fed into Databricks for model development. The recommendations generated by our models are then written back to a NoSQL database and served on the website via an API.


Over the course of this project, the WeCloudData team developed several recommender models using the collected user clickstream data and article metadata, with the goal of generating more personalized article recommendations and increasing user engagement. Given that these models are run several times a day to update each user’s recommendations, subsequent projects will focus on further optimizing them to maximize performance while minimizing costs.
