
Visualizing New York City Taxi Data

October 28, 2019


This blog post was written by WeCloudData’s Data Science Bootcamp alumnus Yaoyu Cui.

The complete dashboard is available at https://goo.gl/gXGTEw

Tableau has been one of the most popular visualization tools in the data science community. Besides its data preprocessing and programming capabilities, it also provides powerful mapping functionality. This blog post works through a specific task: analyzing one New York taxi company’s pickup data from 2014. The task called for Python, SQL tools, local weather data, and Tableau. To make it more interesting and to demonstrate Tableau’s mapping functionality, I found a shapefile of New York City (link below).

https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-nynta.page

This is what the shapefile looks like in Tableau:

The task data contains three months (April to June 2014) of pickups, with only three fields: pickup location (latitude and longitude), Date/Time, and Base (see image below). A separate weather file was also provided, including date, temperature, humidity, wind speed, precipitation, etc.

Data Preprocessing:
Before setting the stage, we must ask: how are these datasets linked, and is there any useful information we can get out of them? To answer these questions, the data was broken down into finer-grained pieces as follows:
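A minimal sketch of that breakdown in plain Python, assuming the raw Date/Time values look like `4/1/2014 0:11:00` (month/day/year, 24-hour clock). The format string and every field name except `week` are our own assumptions for illustration, not taken from the original project:

```python
from datetime import datetime

# Assumed raw timestamp format, e.g. "4/1/2014 0:11:00"; adjust to match the actual file.
RAW_FORMAT = "%m/%d/%Y %H:%M:%S"


def split_datetime(raw: str) -> dict:
    """Break one raw Date/Time string into finer-grained analysis fields."""
    ts = datetime.strptime(raw, RAW_FORMAT)
    return {
        "date": ts.date().isoformat(),  # e.g. "2014-04-01"; key for joining the weather file
        "month": ts.strftime("%B"),     # month name, e.g. "April"
        "week": ts.strftime("%A"),      # day of the week, e.g. "Tuesday"
        "hour": ts.hour,                # hour of day, 0-23
    }


if __name__ == "__main__":
    print(split_datetime("4/1/2014 0:11:00"))
```

Keeping `date` as an ISO string makes it a simple, unambiguous key when the pickups are later linked to the daily weather records.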

The ‘week’ column holds the day of the week. Note also that the raw latitude and longitude values were mapped to their Neighborhood Tabulation Area (NTA) names from the NYC shapefile for later use. This was done in Python with a package called GeoPandas. The shapefile provided by NYC uses an uncommon Coordinate Reference System (CRS), and it took me quite a while to figure out the corresponding CRS code:
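The post did the neighborhood mapping with GeoPandas. To keep the example dependency-free, the sketch below swaps in a plain ray-casting point-in-polygon test, which is the containment check a spatial join performs for each pickup. The polygons and names (`TOY_NTAS`, `assign_nta`) are hypothetical; real NTA boundaries are far more detailed and include multipolygons, which this sketch does not handle:

```python
# Toy "NTA layer": name -> polygon as a list of (lon, lat) vertices.
# These squares and names are hypothetical; they only make the example runnable.
TOY_NTAS = {
    "Toy NTA A": [(-74.00, 40.74), (-73.97, 40.74), (-73.97, 40.76), (-74.00, 40.76)],
    "Toy NTA B": [(-73.97, 40.74), (-73.94, 40.74), (-73.94, 40.76), (-73.97, 40.76)],
}


def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle `inside` each time a ray heading east from the point crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge meets that horizontal line (y1 != y2 here).
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_nta(lon, lat, nta_polygons):
    """Return the name of the first polygon containing the point, or None if none does."""
    for name, polygon in nta_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


if __name__ == "__main__":
    print(assign_nta(-73.985, 40.75, TOY_NTAS))  # falls inside "Toy NTA A"
```

Both inputs must share one CRS: the pickups are raw longitude/latitude, so the shapefile has to be reprojected (or the points projected) before any containment test, which is why the CRS code mattered. In GeoPandas this would be roughly `ntas = geopandas.read_file(path).to_crs(epsg=4326)` followed by `geopandas.sjoin(points, ntas, predicate="within")`, where `points` is a GeoDataFrame built from `geopandas.points_from_xy(lon, lat)` with `crs="EPSG:4326"`. Our understanding is that the NYC planning shapefiles use NAD83 / New York Long Island State Plane in US feet (EPSG:2263), but the post does not state which code the author settled on.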

The three files were then joined in Tableau, and more columns were generated using Tableau functions:
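The joins themselves were done in Tableau. As a rough Python equivalent, assuming each pickup has already been reduced to an ISO date string and the weather file has been loaded into a dictionary keyed by the same date (both structures, and the `precipitation` field name, are our assumptions), the pickup-to-weather link is a left join on date followed by a per-day count:

```python
from collections import Counter


def daily_pickups_with_weather(pickup_dates, weather_by_date):
    """Count pickups per date and attach that day's weather fields (left join on date)."""
    counts = Counter(pickup_dates)
    rows = []
    for date in sorted(counts):
        weather = weather_by_date.get(date, {})  # no weather record -> no extra fields
        rows.append({"date": date, "pickups": counts[date], **weather})
    return rows


if __name__ == "__main__":
    weather = {"2014-04-01": {"precipitation": 0.0}, "2014-04-02": {"precipitation": 0.5}}
    for row in daily_pickups_with_weather(["2014-04-01", "2014-04-02", "2014-04-01"], weather):
        print(row)
```

Days with no weather record keep their pickup count but gain no weather fields, which is the behaviour of a left join with pickups on the left.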

Visualization:

The image below shows the final dashboard for Manhattan in April:

Note that Tableau provides many powerful interaction options. The dashboard is built from three sheets, and a filter applied on one sheet updates every sheet that uses the same data source. Hovering shows a tooltip with summary information. Any neighborhood, day, or hour can act as a filter, and several filters can be active at the same time (the image below selects the rush hour of a certain day in a certain neighborhood):
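Tableau evaluates these filters internally; the sketch below only models the behaviour described above, where each active filter keeps a set of allowed values and a row must satisfy every filter at once. The function and field names (`apply_filters`, `nta`, `week`, `hour`) are hypothetical:

```python
def apply_filters(rows, **selected):
    """Keep rows that match every active filter.

    Each keyword maps a field name to the set of allowed values; several values
    within one field are alternatives, while different fields must all match.
    """
    return [
        row for row in rows
        if all(row[field] in allowed for field, allowed in selected.items())
    ]


if __name__ == "__main__":
    pickups = [
        {"nta": "A", "week": "Friday", "hour": 17},
        {"nta": "A", "week": "Monday", "hour": 17},
        {"nta": "B", "week": "Friday", "hour": 8},
    ]
    # Evening rush hour on Fridays in neighborhood "A"
    print(apply_filters(pickups, nta={"A"}, week={"Friday"}, hour={16, 17, 18}))
```

With no keyword arguments every row passes, matching a dashboard with no active selection.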

Data Analysis:

Now let’s look at the data and what we found (Tableau provides data summaries at the sheet level, but not on the dashboard):

The data contains 1.8 million pickups over three months; 81% of them are from Manhattan, and 18.76% are from Midtown South in Manhattan.
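Shares like these are per-area counts divided by the total. A minimal sketch, assuming each pickup has already been labelled with an area name (the labels and counts in the usage example are made up):

```python
from collections import Counter


def pickup_shares(areas):
    """Percentage of all pickups that fall under each area label."""
    counts = Counter(areas)
    total = sum(counts.values())
    return {area: 100 * n / total for area, n in counts.items()}


if __name__ == "__main__":
    print(pickup_shares(["Manhattan"] * 81 + ["Brooklyn"] * 19))
```

Applied to the rounded totals above, 81% of 1.8 million is about 1.46 million Manhattan pickups; if the 18.76% is a share of all pickups rather than of Manhattan alone, Midtown South accounts for roughly 0.1876 × 1.8 million ≈ 338,000 pickups.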

From April to June 2014, New York City had seven rainy spells, each lasting about two days. Five of the seven coincided with obvious abnormal pickup peaks in Manhattan; the exceptions were May 10th and June 9th, when pickups showed no increase at all.
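The peaks above were identified by eye on the dashboard. One hypothetical way to make “abnormal” explicit is to compare each day with the average of all days that fall on the same weekday (day of the week is the strongest non-weather effect, as discussed below) and flag days that exceed that baseline by some margin. The 1.2 threshold and the `flag_peaks` name are our own choices, not part of the original analysis:

```python
from collections import defaultdict
from statistics import mean


def flag_peaks(daily, threshold=1.2):
    """Flag unusually busy days.

    daily: list of (date, weekday, pickups) tuples.
    A day is flagged when its pickups exceed `threshold` times the mean
    pickups of all days sharing the same weekday.
    """
    by_weekday = defaultdict(list)
    for _, weekday, pickups in daily:
        by_weekday[weekday].append(pickups)
    baseline = {wd: mean(values) for wd, values in by_weekday.items()}
    return [date for date, weekday, pickups in daily if pickups > threshold * baseline[weekday]]


if __name__ == "__main__":
    days = [("d1", "Tuesday", 100), ("d2", "Tuesday", 100), ("d3", "Tuesday", 160)]
    print(flag_peaks(days))
```

The flagged dates can then be intersected with the rainy dates from the weather file. Because each baseline includes the day being tested, an extreme day slightly inflates its own baseline; excluding it would make the check more sensitive.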

Other than the weather, the most influential factor is the day of the week. Pickups are always lowest on Mondays, while the peaks fall on Fridays and Saturdays. Within a day, a local peak appears during the morning rush hour; by 14:00, pickups already exceed that morning peak, and at the 17:00 rush hour they reach about three times the morning peak, at roughly 3 pickups per minute.
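The average pickups per minute for a given hour is the number of pickups in that hour divided by the number of days times 60. A minimal sketch, assuming the hour of day of every pickup in the current selection is available (the function name is hypothetical):

```python
from collections import Counter


def pickups_per_minute_by_hour(hours, n_days):
    """Average pickups per minute for each hour of the day.

    hours: hour of day (0-23) for every pickup in the selection.
    n_days: number of days the selection covers.
    """
    counts = Counter(hours)
    return {h: counts[h] / (n_days * 60) for h in range(24)}


if __name__ == "__main__":
    rates = pickups_per_minute_by_hour([17] * 360 + [8] * 120, n_days=2)
    print(rates[8], rates[17])
```

Read the other way, 3 pickups per minute at 17:00 is about 180 pickups in that hour per day in the selected view, and a peak three times the morning one implies roughly 1 pickup per minute during the morning rush.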

Conclusion:

Tableau is a convenient tool for data science and analytics tasks, and it works well with SQL databases. Its built-in data preprocessing and programming functions save a considerable amount of editing time. Tableau also performs very well with geographic data and map visualizations. Above all, it provides many audience-friendly interaction features.

