Consulting Case Study: Real-time Data Streaming Pipeline Optimization

October 19, 2021

Background

Our client provides advanced agriculture tools and digital information that help farmers become more profitable. The company uses sensor solutions to deliver real-time, actionable insights and gives farmers the power to control their operating costs. Its product saves farms over $20,000 annually by improving energy efficiency and reducing machine maintenance through predictive analytics.

The WeCloudData team provided two main services:

  1. Comprehensive data streaming pipeline optimization
  2. Real-time data visualization using QuickSight

The proposed pipeline proved considerably more efficient and functional at collecting large volumes of data, visualizing it, and sending timely notifications to end users.

Problem Statement

The client uses AWS as its main cloud provider. Kinesis Data Firehose and AWS Lambda transform and store the data that the devices collect. The data is served to the client’s app via RDS and DynamoDB. The app provides time-series analytics, including energy consumption and its associated cost.

However, under pressure from a growing volume of real-time data and the need to analyze it promptly, the client wanted to update the pipeline infrastructure to make it more robust, reliable, and scalable. The existing pipeline broke at random, took a long time to process data for frontend users, and was constrained by DynamoDB’s rate limit. WeCloudData proposed several changes to improve the pipeline’s reliability and scalability.

Tools used: AWS (IoT Core, Kinesis Data Firehose, Kinesis Data Analytics, S3, Lambda, DynamoDB, API Gateway, SNS, Athena, QuickSight)

Challenges

The existing pipeline is quite sophisticated, and it took some time to understand how the data is transformed and consumed by end users. The infrastructure is loosely coupled, so it required a detailed review and a complete understanding of the entire data flow.

Original way of data collection and storage

During the review of the pipeline, a few flaws were discovered and patched immediately. The WeCloudData team also proposed several pipeline design changes to improve reliability and reduce infrastructure cost.

Key results

We discovered a few AWS Glue crawlers running every hour on buckets related to some devices. These crawlers contributed substantially to the increase in infrastructure cost. We recommended pausing the crawlers and enabling the Glue metadata registry at the Kinesis level. This approach significantly reduces the time spent on idle tasks and makes the pipeline less reliant on Glue crawlers.

We proposed a few design changes. One of the most suitable approaches is to use Athena and QuickSight for data analytics and data visualization (see the appendix for the dashboard that the WeCloudData team created).

Proposed pipeline to address issues with DynamoDB and provide visualization to end-users.

The team also added a step that uses Kinesis Data Analytics to pre-aggregate the data per minute instead of saving each per-second data point from each device. This should make computing some statistics less intensive and reduce cost. We also recommended differentiating devices by type, which streamlines the process of deploying new devices.
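The per-minute pre-aggregation can be illustrated with a minimal Python sketch. It is not the client's Kinesis Data Analytics application. The record shape (device ID, epoch seconds, value), the function name, and the chosen statistics are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_minute(readings):
    """Group per-second readings into one-minute tumbling windows per device.

    `readings` is an iterable of (device_id, epoch_seconds, value) tuples.
    This record shape is hypothetical, not the client's actual schema.
    Returns a dict keyed by (device_id, window_start_epoch_seconds).
    """
    buckets = defaultdict(list)
    for device_id, ts, value in readings:
        # Round the timestamp down to the start of its minute.
        window_start = ts - (ts % 60)
        buckets[(device_id, window_start)].append(value)

    # Keep one summary row per device per minute instead of up to 60 raw rows.
    return {
        key: {
            "count": len(values),
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in buckets.items()
    }
```

With one reading per second, each device produces one stored row per minute rather than sixty, which is where the reduction in storage and downstream computation would come from.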

In addition to aggregation, we deployed a pre-trained anomaly detection model provided by Amazon that is built on the Random Cut Forest algorithm. This extra functionality outputs an anomaly score for each appliance that the device is connected to. A Lambda function checks for abnormal scores and notifies the user by text message through SNS.
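The score check inside the Lambda function might look roughly like the sketch below. The record fields (`appliance`, `anomaly_score`), the threshold value, and the function name are all hypothetical. The `publish` argument stands in for the SNS call so that the logic can run without AWS credentials.

```python
def notify_on_anomalies(records, publish, threshold=3.0):
    """Send one message for every record whose anomaly score is abnormal.

    `records` is a list of dicts with hypothetical keys "appliance" and
    "anomaly_score". `threshold` is an assumed cut-off, not a value from
    the case study. `publish` is any callable that accepts a message string.
    In a deployed Lambda it could wrap the boto3 SNS client's `publish` call.
    Returns the list of messages that were sent.
    """
    sent = []
    for record in records:
        if record["anomaly_score"] > threshold:
            message = (
                f"Abnormal reading on {record['appliance']}: "
                f"score {record['anomaly_score']:.2f}"
            )
            publish(message)
            sent.append(message)
    return sent
```

Passing the notifier in as a parameter keeps the thresholding logic testable on its own, for example by passing `outbox.append` with `outbox = []` and inspecting the list afterwards.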

Prototype pipeline to aggregate data and detect anomalies

Conclusion

Testing showed that the proposed changes to the data infrastructure significantly reduce infrastructure cost and make the pipeline more resilient. Through this engagement, WeCloudData gained consulting experience in the smart agriculture industry, which applies IoT solutions.

Appendix: Dashboard of the real-time voltage usage (demo)

The client has a subscription-based app for farmers, where they can log in and visualize key information about their energy expenditure. This information is collected by sensors provided by the client and stored in AWS S3 and AWS DynamoDB.

The client requested QuickSight dashboard templates that could provide valuable KPIs and metrics to their customers about their energy expenditure. During our first contact, the client also mentioned that predictions and forecasts would be nice to have among these metrics.
