Upon completing the course, you will be able to:
Use key components of the Apache Hadoop ecosystem (HDFS, MapReduce with Hadoop Streaming, Hive, and Spark) to store and process very large amounts of data.
Import, clean, and query data using Spark SQL and Spark RDDs.
Use the Spark Machine Learning Library (MLlib) to build machine learning models.
Use Amazon Web Services (AWS) to deploy Hadoop and Spark clusters.
Use a cluster to process datasets too large to fit on your personal computer.
It is hard, but you can do it. Plus, we're here to help.
Chief Instructor and Co-founder
A self-trained data scientist and an expert in applied big data technologies, Shaohua has nine years of experience in applied data science and a reputation for building high-performance data science teams. He is currently a senior data scientist at Kik Interactive Inc., helping the billion-dollar Canadian tech unicorn grow its big data initiative. Prior to Kik, Shaohua built a high-performance data science team at BlackBerry that developed innovative data science solutions for marketing, CRM, and product teams. He specializes in user interest graph modelling, targeted advertising, scalable location intelligence, and large-scale recommendation engines for mobile personalization. He has also collaborated with Ryerson’s Data Science Lab on several big data research projects and helped build the big data course at Ryerson University, where he trained more than 150 professionals in data science and big data technologies such as Hadoop and Spark.