Visualizing New York City Taxi Data

October 28, 2019

This blog post was written by Yaoyu Cui, a graduate of WeCloudData’s Data Science Bootcamp.

The complete dashboard is available at https://goo.gl/gXGTEw

Tableau is one of the most popular visualization tools in the data science community. Besides its built-in data preprocessing and programming features, it also provides powerful mapping functionality. For this project, I was given one New York taxi company’s pickup data for 2014, and the task called for Python, SQL tools, local weather data, and Tableau. To make it more interesting and to demonstrate Tableau’s mapping functionality, I found a shapefile of New York City (link below).

https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-nynta.page

This is what it looks like in Tableau:

The task data contains four months of pickups with only three fields: location (latitude and longitude), Date/Time, and Base (see image below). A separate weather file was also provided, with date, temperature, humidity, wind speed, precipitation, and so on.

[Image: weather info]

Data Preprocessing:
Before building anything, we must ask: how are these datasets linked, and what useful information can we get out of them? To answer these questions, the data was broken down into more columns:

[Image: data preprocessing]
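As a rough illustration of this kind of breakdown, the sketch below splits one raw Date/Time string into separate date, month, day-of-week, and hour fields using only the Python standard library. The timestamp format and the output field names are assumptions made for the example, not taken from the original files.

```python
from datetime import datetime

# Day names indexed by datetime.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def split_timestamp(raw, fmt="%m/%d/%Y %H:%M:%S"):
    """Break a raw Date/Time string into the pieces used for filtering.

    The default format is an assumption about how the raw file stores
    timestamps; adjust it to match the actual data.
    """
    ts = datetime.strptime(raw, fmt)
    return {
        "date": ts.date().isoformat(),
        "month": ts.month,
        "week": DAY_NAMES[ts.weekday()],  # day of the week
        "hour": ts.hour,
    }


row = split_timestamp("4/1/2014 0:11:00")
print(row)
```

Keeping the day of the week and the hour as their own columns is what later lets each of them act as an independent filter.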

The ‘week’ column holds the day of the week. The raw latitude and longitude were also mapped to the name of the containing neighborhood (NTA, Neighborhood Tabulation Area) in the NYC shapefile, for later use. This was done in Python with the GeoPandas package. The shapefile provided by NYC uses an uncommon Coordinate Reference System (CRS), and it took me quite a while to figure out the corresponding CRS code:

[Image: data preprocessing, part 2]
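A minimal sketch of this kind of lookup with GeoPandas is shown here. It is not the author's code. It assumes the pickup columns are named Lat and Lon, that the neighborhood name column is NTAName, and that the shapefile is in EPSG:2263 (NAD83 / New York Long Island, US feet), which is common for NYC planning data. Two stand-in rectangles replace the real shapefile so the example is self-contained.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def tag_pickups_with_nta(pickups, nta, name_col="NTAName"):
    """Attach the name of the containing polygon to each pickup row.

    `pickups` needs "Lat" and "Lon" columns in plain WGS84 degrees
    (assumed names). `nta` may be in any CRS; it is reprojected to
    EPSG:4326 so both layers share one coordinate system before joining.
    """
    points = gpd.GeoDataFrame(
        pickups.copy(),
        geometry=gpd.points_from_xy(pickups["Lon"], pickups["Lat"]),
        crs="EPSG:4326",
    )
    nta_wgs84 = nta.to_crs("EPSG:4326")
    joined = gpd.sjoin(
        points, nta_wgs84[[name_col, "geometry"]],
        how="left", predicate="within",
    )
    # Drop the helper columns; pickups outside every polygon keep NaN.
    return pd.DataFrame(
        joined.drop(columns=["geometry", "index_right"], errors="ignore")
    )


# Stand-in polygons (hypothetical), stored in EPSG:2263 to mimic the shapefile.
nta_demo = gpd.GeoDataFrame(
    {"NTAName": ["West block", "East block"]},
    geometry=[box(-74.02, 40.70, -74.00, 40.72),
              box(-74.00, 40.70, -73.98, 40.72)],
    crs="EPSG:4326",
).to_crs("EPSG:2263")

pickups_demo = pd.DataFrame(
    {"Lat": [40.71, 40.71, 40.90], "Lon": [-74.01, -73.99, -73.50]}
)
tagged = tag_pickups_with_nta(pickups_demo, nta_demo)
print(tagged)
```

With the real data, the stand-in layer would be replaced by the result of gpd.read_file on the downloaded shapefile. If the CRS is not picked up from the file, it has to be set explicitly before reprojecting, which is where knowing the right CRS code matters.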

The three files were then joined in Tableau, and more columns were generated using Tableau functions:

[Image: the three merged files]
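The join itself was done in Tableau, but the idea can be sketched in pandas for readers who prefer code: each pickup row picks up the weather values for its date. The column names and values here are invented for illustration only.

```python
import pandas as pd

# Hypothetical pickup rows, already tagged with a date and a neighborhood.
pickups = pd.DataFrame({
    "date": ["2014-04-01", "2014-04-01", "2014-04-02"],
    "NTAName": ["West block", "East block", "West block"],
})

# Hypothetical daily weather rows, one per date.
weather = pd.DataFrame({
    "date": ["2014-04-01", "2014-04-02"],
    "precipitation": [0.0, 0.4],
})

# Left join: keep every pickup and look up the weather for its date.
# validate= raises an error if a date appears more than once in `weather`.
merged = pickups.merge(weather, on="date", how="left", validate="many_to_one")
print(merged)
```

A left join keeps pickups even on dates with no weather record, so missing weather shows up as empty values rather than silently dropping rows.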

Visualization:

The image below shows the final dashboard for Manhattan in April:

[Image: dashboard for Manhattan in April]

Note that Tableau provides many powerful interaction options. The dashboard is built from three sheets, and a filter applied on one sheet updates every sheet that uses the same data source. A tooltip with summary info appears on hover. Any neighborhood, day, or hour can act as a filter, and several filters can be active at once (the image below selects the rush hour of a certain day in a certain neighborhood):

[Image: dashboard with filters applied]

Data Analysis:

Now let’s talk about the data and what we found (Tableau provides a data summary at the sheet level, but not at the dashboard level):

The data contains 1.8 million pickups over three months; 81% of them are from Manhattan, and 18.76% are from Midtown South in Manhattan.
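Shares like these are simply each group's pickup count divided by the total. A small sketch with made-up labels (not the real data) shows the arithmetic:

```python
from collections import Counter


def share_by_group(labels):
    """Return each label's share of all rows, in percent, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: round(100 * n / total, 2)
            for name, n in counts.most_common()}


# Made-up borough labels, one per pickup.
demo_labels = ["Manhattan"] * 4 + ["Brooklyn"]
shares = share_by_group(demo_labels)
print(shares)
```

The same function works for neighborhood names, as long as the shares are computed against the same total.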

From April to June 2014, New York City had seven rainy periods, each lasting about two days. Five of the seven coincided with clearly abnormal pickup peaks in Manhattan. The exceptions were May 10th and June 9th, when pickups showed no increase at all.

Other than the weather, the most influential factor is the day of the week. Pickups are always lowest on Mondays, while the peaks fall on Fridays and Saturdays. Regarding the hour, a local peak appears during the morning rush hour. By 14:00, pickups already surpass the morning peak, and by the 17:00 rush hour they reach triple the morning peak, at about 3 pickups per minute.
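An hourly profile of this kind can be sketched as a count of pickups per hour of day, divided by the number of days and by 60 minutes. The timestamps here are synthetic and chosen only so the arithmetic is easy to follow.

```python
from collections import Counter
from datetime import datetime


def pickups_per_minute_by_hour(timestamps):
    """Average pickups per minute for each hour of the day.

    Counts are divided by the number of distinct days in the data and
    by 60, so a value of 3.0 means three pickups per minute on average
    during that hour.
    """
    hour_counts = Counter(ts.hour for ts in timestamps)
    n_days = len({ts.date() for ts in timestamps})
    return {hour: hour_counts[hour] / (n_days * 60)
            for hour in sorted(hour_counts)}


# Synthetic data: two days, 60 pickups at 08:xx and 180 at 17:xx per day.
demo = []
for day in (1, 2):
    demo += [datetime(2014, 4, day, 8, minute) for minute in range(60)]
    demo += [datetime(2014, 4, day, 17, i // 3, i % 3) for i in range(180)]

rates = pickups_per_minute_by_hour(demo)
print(rates)
```

Grouping by day of the week instead of hour would give the weekday profile in the same way.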

Conclusion:

Tableau is a convenient tool for data science and analytics tasks, and it works well with SQL databases. Its built-in data preprocessing and programming functions save a considerable amount of editing time. Tableau performs very well with geographical data and visualizations. On top of all that, it provides many audience-friendly interaction features.

To see Yao’s original blog post please click here. To follow and see Yao’s latest blog posts, please click here.
