Building a Data Science Portfolio

Career switchers who want to become data scientists need project experience, PERIOD.

There are a few exceptions to the bold statement we made above:

  • If you’re a CS or Stats graduate and do well in Python and SQL challenges, you have a better chance of getting into FAANG-type tech firms. But you will still need to demonstrate strong data aptitude.
  • If you already work in the industry as a data analytics professional, your related work experience is more important than a project portfolio.

Unfortunately, most career switchers don’t come from a CS or Stats background. In the previous chapter on the data science job market, we showed that the market can be quite competitive and demanding. To stand out, you need to show concrete proof of your skills and of the effort you’ve put in. Ultimately, you’re proving to hiring managers that you can get the job done.

Finding a good data scientist is not an easy task from a hiring company’s perspective. It’s unlike hiring a software developer, where coding skill and knowledge of data structures and algorithms are some of the best indicators. Data scientists also need a strong sense of data and business use cases, and those are usually harder to test.

That’s why managers are increasingly welcoming the idea of a take-home quiz/assignment that the candidates need to complete. We’ve learned two things from it:

  • You need to have worked on many hands-on projects in order to perform well in take-home assignments with a tight time limit (usually ranging from a few hours to a few days)
  • Having a strong project portfolio can make you stand out and instill a lot of confidence in the hiring manager that you can get the job done

Data Science Portfolio

What is a portfolio? A portfolio is a collection of projects, code, documents, and other materials that help you showcase your skills. These usually go beyond degrees and certifications and show the practical skills a candidate has acquired.

  • Personal Projects
    • Data pipeline diagrams
    • Project summaries
    • Working notebooks and scripts
    • Visualizations related to the project
  • GitHub
    • Code committed and pushed to your GitHub branches that employers can look into
  • Blogs
    • Blog posts that document your learning journey and your views on the data science world. We suggest a blogging service such as Medium or WordPress. You can also set up your own GitHub Pages site.
Data Science Blogs on Medium
  • Presentation
    • PowerPoint decks and presentation videos that may help potential employers gain more insight into your soft skills

You can probably tell that preparing a portfolio takes a lot of effort. It’s not just about writing the code and training the ML models… the same as data science projects in real life. It’s definitely worth the effort.

  • Would you hire a graphic designer without seeing her past work?
  • Would you hire a web developer without seeing the apps he could build?

If you want to differentiate yourself, start building a portfolio.

How to Get Started with a Data Science Portfolio

Here’s a brief introduction to the procedure one can follow to build a project:

Data Science portfolio project procedure
  1. To begin with, you want to pick a domain. Select something you’re interested in, and don’t just follow other people on Kaggle. Employers often ask about your motivation and why you chose certain topics.
  2. Then decide which data you want to use for analysis. It’s great if you can collect data on your own through web scraping or web data APIs, since that makes the project more complete and interesting. You also need to think about how to store the data and where to ingest it, which will add more engineering flavour to your project. Try to use a database if possible and write SQL queries to manipulate the data, as this is how things are done most of the time in real life.
  3. Apply a few different ML algorithms when you’re building models. Make sure you build ML pipelines and spend effort on feature engineering and model interpretation.
  4. Lastly, make some conclusions and summarize things you’ve learned about this dataset and particular use case. Try to build a dashboard if possible and present your project to a friend, colleague, or your fellow students.
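
As a rough illustration of step 2, here is a minimal sketch of storing collected records in a database and summarizing them with SQL. It uses Python’s built-in sqlite3 module purely for convenience. The sample rows, the postings table, and the function names are all hypothetical and stand in for whatever data you actually scrape or pull from an API.

```python
import sqlite3

# Hypothetical sample records standing in for data you scraped or pulled from an API.
RAW_POSTINGS = [
    ("Data Scientist", "Toronto", 95000),
    ("Data Scientist", "Vancouver", 105000),
    ("Data Analyst", "Toronto", 70000),
    ("Data Analyst", "Toronto", 74000),
]


def load_postings(conn, rows):
    """Create the (hypothetical) postings table if needed and insert the raw rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS postings (
            title  TEXT NOT NULL,
            city   TEXT NOT NULL,
            salary INTEGER NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO postings (title, city, salary) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def average_salary_by_title(conn):
    """Aggregate in SQL rather than in Python, as you would against a real database."""
    cursor = conn.execute(
        """
        SELECT title, COUNT(*) AS n_postings, AVG(salary) AS avg_salary
        FROM postings
        GROUP BY title
        ORDER BY title
        """
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # swap for a file path to persist the data
    load_postings(conn, RAW_POSTINGS)
    for title, n_postings, avg_salary in average_salary_by_title(conn):
        print(f"{title}: {n_postings} postings, average salary {avg_salary:.0f}")
    conn.close()
```

In a real project you would likely replace the in-memory database with a hosted one and feed it from your own ingestion script, but the pattern of loading raw data first and then querying it with SQL stays the same.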

Here are some projects WeCloudData’s graduates built during their bootcamp learning phase:

Real-time Twitter Sentiment Dashboard (pipeline built with streaming data on AWS and Spark)

Stock price prediction pipeline using real-time Finance data and social media data
An end-to-end job recommendation web app built on top of AWS and Docker containers. Pipelines automated using Apache Airflow.

If you’re interested in portfolio projects but don’t know where to begin, check out WeCloudData’s portfolio project course. You will work with experienced mentors, instructors, and TAs to learn how to build a portfolio.

Importance of real project experience

Nothing beats real experience! In this section, we will discuss why having real company project experience can help you stand out. Here are some of the benefits of real client projects:

  • You will gain experience working with real companies and clients, which gives you concrete work to talk about in interviews and helps you build credibility
  • You will get more interviews because of the real experience
  • You will be able to negotiate for a higher starting salary
  • For career switchers, real projects help you close the experience gap.

For career switchers coming from non-data and non-tech backgrounds, your past work experience doesn’t carry much weight and oftentimes puts you at a disadvantage. It’s pretty easy for hiring managers to make biased assumptions that you’re not a good fit without even looking into your DS skills and capabilities. That’s why Candidate B below will have a much better chance of getting interviews.

Working on real client projects means the work you do will carry more weight on your resume and help build more trust. Employers will have more confidence when they are deciding among different candidates. You will also learn things that you don’t have the opportunity to practice on your own:

  • Project scoping and requirements gathering
  • Business communications
  • Task prioritization
  • Business/Data presentations
  • Associating your work with measurable business value

A pipeline diagram of one of the real projects WeCloudData students work on.

Want to work on real projects? WeCloudData has a few options for you:

In the next chapter, we will share some tips and tricks on job search.