Building Data Pipeline in AWS for Retail Data
The blog is posted by WeCloudData’s Data Engineering course student Rupal Bhatt. Here is a Donut Chart prepared from processed data. Our data passes through several processes before meeting a dashboard and giving us a full picture like the one above. This is an attempt to show you one way of processing such data. Big […]
Kijiji House Price Analysis using Python

This is the first project that I have done for WeCloudData. The purpose of this project is to find the relationship between housing prices in Toronto(GTA) in relation to location, house size, number of bedrooms and number of bathrooms. We start by scraping data from Kijiji through the URL requests. Then we parse our data source […]
Predictive Churn Modeling Using Python

This blog is posted by WeCloudData’s Data Science Bootcamp student Austin Jung. Customer churn is a common business problem in many industries. Losing customers is costly for any business, so identifying unhappy customers early on gives you a chance to offer them incentives to stay. In this post, I am going to talk about machine […]
Building Superset Dashboard and Pipeline using Apache Airflow and Google Cloud SQL

The blog is posted by WeCloudData’s Data Science Bootcamp student Ryan Kang. Like Amazon AWS, Google Cloud is a popular cloud used by data analytics companies. Google Cloud allows continuous automation of workflow and big data computation. In this blog, I will briefly introduce how I set up Google Cloud for workflow. Each Google Cloud account […]
Web Scraping – Fishing Ontario

The blog is posted by WeCloudData’s Data Science Bootcamp student Weichen Lu. Once, I was talking with my colleague about outdoor activities, and he told me that he is a fishing enthusiast. It didn’t bring up my attention at first since I am not a fishing guy. However, he proposed an idea to use Google […]
Visualizing New York City Taxi Data

[Student Project] Visualizing New York City Taxi Data This blog is created by WeCloudData’s Data Science Bootcamp alumni Yaoyu Cui. Please find the complete dashboard on https://goo.gl/gXGTEw Tableau has been one of the most popular visualization tools among the Data Science community. Besides its ability of data preprocessing and programming, it also provides powerful mapping […]
Credit Scoring Using Machine Learning

The credit score is a numeric expression measuring people’s creditworthiness. The banking usually utilizes it as a method to support the decision-making about credit applications. In this blog, I will talk about how to develop a standard scorecard with Python (Pandas, Sklearn), which is the most popular and simplest form for credit scoring, to measure […]
Fraud Analytics: ML Tutorial on Dealing with an Imbalanced Dataset

This blog is posted by WeCloudData’s Immersive Bootcamp student Anthony Chen. Fraud analytics provide a certain challenge that people may glance over at first. The problem of the imbalanced dataset. How do we approach it? What angle should we start at? What kind of performance measures do we use? The goal of this article is […]
Building Digital Marketing Dashboard Using Python, Docker, Airflow in Google Cloud (Part-2)

This blog series is posted by WeCloudData’s Data Science Immersive Bootcamp student Bob Huang (Linkedin) Continuing from the first half of the digital marketing blog post, This is Part 2 that mainly focusing on the data analysis business insights of different social platforms Email: There are lots of information in our emails. We can write codes […]
Building Digital Marketing Dashboard Using Python, Docker, Airflow in Google Cloud (Part 1)

This blog series is posted by WeCloudData’s Data Science Immersive Bootcamp student Bob Huang (Linkedin) OVERVIEW: The digital marketing project gives you the ability to manage and analyze your marketing data from different platforms such as Google Analytic, Gmail, Eventbrite, and Google Ad. You can find your emails based on their sent status, campaign, and […]
