Live Twitter Sentiment Analysis

The blog is posted by WeCloudData’s Big Data course student Udayan Maurya. This Live Twitter Sentiment Analyzer helps track current sentiment for a given track word. In this document, I will describe the workflow I followed to develop this SaaS app. Contents: Data Pipeline Map, Data Collection, Preparing Data for Data Analysis, Training the […]
From Web Scraping to Useful Data Frames — How to Scrape a Website

The blog is posted by WeCloudData’s Big Data course student Laurent Risser. Toronto is known for its crazy housing market. It’s getting harder and harder to find an affordable and convenient place. Searching for “How to find an apartment in Toronto” on Google leads to dozens of pages of advice, which is a pretty good indicator […]
An Introduction to Big Data & ML Pipeline in AWS

The blog is posted by WeCloudData’s Big Data course student Abhilash Mohapatra. This story presents an easy path for the following items in AWS: Build a Big Data pipeline for both static and streaming data. Process data in Apache Hadoop using Hive. Load processed data to a data warehouse solution like Redshift and RDS like MySQL. […]
An Introduction to Data Pipeline with Spark in AWS

The blog is posted by WeCloudData’s Big Data course student Abhilash Mohapatra. This story presents an easy path to transform data using PySpark. Along with transformation, Spark memory management is also taken care of. Here, Freddie-Mac Acquisition and Performance Data from 1999–2018 is used to create a single output file, which can then be used for data analysis or building Machine […]
Eric’s Career Switch Journey from Civil to Data

It has been approximately one year since I decided to make a career switch from Civil Engineering to Data Science. After working as a Data Analyst at Slalom for 3 months, I think now would be a good time to share my experience. I will try to present this blog as 3 distinct parts: […]
Kijiji House Price Analysis using Python

This is the first project that I have done for WeCloudData. The purpose of this project is to find the relationship between housing prices in Toronto (GTA) and location, house size, number of bedrooms, and number of bathrooms. We start by scraping data from Kijiji through URL requests. Then we parse our data source […]
Predictive Churn Modeling Using Python

This blog is posted by WeCloudData’s Data Science Bootcamp student Austin Jung. Customer churn is a common business problem in many industries. Losing customers is costly for any business, so identifying unhappy customers early on gives you a chance to offer them incentives to stay. In this post, I am going to talk about machine […]
Building Superset Dashboard and Pipeline using Apache Airflow and Google Cloud SQL

The blog is posted by WeCloudData’s Data Science Bootcamp student Ryan Kang. Like Amazon AWS, Google Cloud is a popular cloud platform used by data analytics companies. Google Cloud allows continuous automation of workflows and big data computation. In this blog, I will briefly introduce how I set up Google Cloud for my workflow. Each Google Cloud account […]
Web Scraping – Fishing Ontario

The blog is posted by WeCloudData’s Data Science Bootcamp student Weichen Lu. Once, I was talking with my colleague about outdoor activities, and he told me that he is a fishing enthusiast. It didn’t catch my attention at first, since I am not into fishing. However, he proposed an idea to use Google […]
Visualizing New York City Taxi Data

[Student Project] This blog was created by WeCloudData’s Data Science Bootcamp alum Yaoyu Cui. Please find the complete dashboard at https://goo.gl/gXGTEw. Tableau has been one of the most popular visualization tools in the data science community. Besides its data preprocessing and programming capabilities, it also provides powerful mapping […]
Credit Scoring Using Machine Learning

The credit score is a numeric expression measuring people’s creditworthiness. Banks usually use it to support decision-making about credit applications. In this blog, I will talk about how to develop a standard scorecard, the most popular and simplest form of credit scoring, with Python (Pandas, Sklearn), to measure […]
Fraud Analytics: ML Tutorial on Dealing with an Imbalanced Dataset

This blog is posted by WeCloudData’s Immersive Bootcamp student Anthony Chen. Fraud analytics presents a challenge that people may gloss over at first: the problem of the imbalanced dataset. How do we approach it? What angle should we start from? What kind of performance measures do we use? The goal of this article is […]
Building Digital Marketing Dashboard Using Python, Docker, Airflow in Google Cloud (Part 2)

This blog series is posted by WeCloudData’s Data Science Immersive Bootcamp student Bob Huang (LinkedIn). Continuing from the first half of the digital marketing blog post, this is Part 2, which mainly focuses on the data analysis and business insights of different social platforms. Email: There is a lot of information in our emails. We can write code […]
Building Digital Marketing Dashboard Using Python, Docker, Airflow in Google Cloud (Part 1)

This blog series is posted by WeCloudData’s Data Science Immersive Bootcamp student Bob Huang (LinkedIn). OVERVIEW: The digital marketing project gives you the ability to manage and analyze your marketing data from different platforms such as Google Analytics, Gmail, Eventbrite, and Google Ads. You can find your emails based on their sent status, campaign, and […]
Interview with Shaohua Zhang, Data Scientist and CEO of WeCloudData – by Reena Shaw

This is a repost of Reena Shaw’s interview with our CEO published on Medium. Thanks, Reena (LinkedIn, Medium), for doing this interview! During my interviews with various data scientists, Shaohua Zhang is someone who struck me as unique for two reasons: 1) his incredible commitment and generosity to share his experience, and 2) his transition […]
Introduction to Machine Learning In Healthcare

Machine learning applications in healthcare were a great hit with the NYC audience. At least 130 enthusiastic attendees joined the Bots and AI Meetup on December 10th, with the crowd extending far to the back of the room. Lucy He of Flatiron Health kicked off the night with an examination of machine learning’s impact in medical study cohort selection. […]