WeCloudData is one of the fastest-growing Data & AI training companies in the world. Since 2016, WeCloudData has trained and helped thousands of students and clients level up their data skills and mature their data organizations. Understanding the job market is a central business need for many organizations and for all HR departments and recruiters. By leveraging data engineering techniques combined with a cloud toolchain, WeCloudData helped a client achieve a continuous flow of current job market data, with analytical capabilities and dashboards to drive the business forward and stay competitive.
The client required current (but also historical) job data to help:
- Address stakeholder questions and communicate job market trends
- Enable the leadership team to make data-driven business decisions
- Match employees & clients with the most relevant jobs
Given these business needs, the client could not rely on manual ad hoc searches of job boards once a month or once a week. Nor do publicly available job boards allow their data to be combined and aggregated into custom graphs or dashboards. The client needed to build its own internal data pipeline, flexible enough to meet the business requirements for a job market analysis platform & dashboard.
To meet these requirements, WeCloudData helped the client leverage a suite of cloud platforms & tools to build a data pipeline in multiple stages:
- Ingest job data from multiple sources and store the raw data in a cloud data lake
- Process the raw data with Python & Spark
- Load the intermediate and final data sets into the data lake, a Postgres database, and Redshift
- Push the data to downstream analytics, BI and dashboard applications
This architecture is flexible enough to ingest from a variety of data sources and lets different business units use the analytics, BI, or dashboard tool of their choice to pull, aggregate, and query the jobs data daily, or to query historical snapshots of the entire database.
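A daily roll-up of the kind a dashboard tile would chart might look like the following sketch. The row shape (`posted_date`, `title`) is a hypothetical stand-in for records pulled from Postgres or Redshift; in SQL the same aggregation would be a `GROUP BY posted_date, title` with `COUNT(*)`.

```python
from collections import Counter
from typing import Iterable, Dict, Tuple

def daily_job_counts(rows: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    # Count job postings per (posted_date, title) pair -- the kind of
    # roll-up a dashboard would plot as a trend line over time.
    return dict(Counter((r["posted_date"], r["title"]) for r in rows))
```

Because the aggregation is expressed over plain rows, the same logic can run against a daily extract or against a historical snapshot of the whole table.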
WeCloudData helped the client build a flexible data pipeline to address the needs of multiple business units requiring different sets, views, and timelines of job market data. The team achieved this by combining cloud and open-source tools in a modular setup, taking advantage of relatively cheap cloud storage, Python's versatility, and Spark's powerful processing engine. The client intends to build on and improve this data pipeline by moving toward a more serverless architecture and adding DevOps tools & workflows.