Data engineering exists as a separate role, rather than being folded into software engineering, mainly because data engineers need to understand data. A good data engineer needs solid knowledge of systems and software development as well as a deep understanding of data. In other words, you need to be skilled with both the data itself and the code of the systems that process it. To become a data engineer, you need to learn a few things.
Data: A data engineer is expected to know data modeling very well, that is, how to turn a business requirement into schemas and tables in a database or data warehouse. Data engineers must also be strong in SQL; the SQL work they face is generally harder than what data analysts and BI roles deal with. So if you want to become a data engineer, learning SQL well is a must. Finally, data engineers must be sensitive to data. When you look at raw data, you should be able to quickly spot its patterns and some of its problems. This matters because all data engineering work is built on that raw data. It is also a trait that distinguishes data engineers from software engineers.
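As a minimal sketch of turning a business requirement into tables, assume a hypothetical requirement: "report each customer's total spend and the date of their most recent order". The table names, column names, and sample rows below are invented for illustration, and SQLite only stands in for whatever database or data warehouse a real project would use.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical model: one row per customer, one row per order,
# linked by customer_id.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,  -- ISO 8601 date, so text order matches date order
    amount      REAL NOT NULL
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (10, 1, "2024-01-05", 20.0),
        (11, 1, "2024-02-10", 35.0),
        (12, 2, "2024-01-20", 15.0),
    ],
)

# The requirement expressed as SQL: join, aggregate, sort.
report = conn.execute("""
SELECT c.name,
       SUM(o.amount)     AS total_spend,
       MAX(o.order_date) AS last_order
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spend DESC
""").fetchall()

print(report)
```

Keeping customers and orders in separate tables means a customer's name is stored once, and the report is derived with a join instead of being stored redundantly.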
Programming: Data engineers also need strong coding skills; after all, projects are delivered with code. So it is necessary to learn several programming languages, the most important of which are SQL and Python. The importance of SQL was covered above, and Python plays an equally important role in data engineering. In a data engineering project, data engineers use Python for data extraction, transformation, loading, and so on. In many cases, they use SQL and Python together. When working with big data, data engineers use Spark APIs such as PySpark and Spark SQL. More advanced big data development may also use Scala, Java, and other languages.
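To make the extract, transform, load flow concrete, here is a small sketch that uses Python for the first two steps and SQL for the final aggregation. The CSV content, the `orders` table, and the cleaning rules (trim and lowercase names, skip rows with no amount) are assumptions made up for this example, not a prescribed pipeline.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with typical problems: stray spaces,
# inconsistent casing, and a missing amount.
RAW_CSV = """order_id,customer,amount
1, ana ,20.50
2,Ben,
3,ANA,4.50
4,ben,10
"""


def extract(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize customer names and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, skipped in this sketch
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, records):
    """Create the target table and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# SQL takes over once the data is clean and loaded.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Splitting the work into `extract`, `transform`, and `load` functions keeps each step testable on its own. A real pipeline would usually log or quarantine the skipped rows instead of silently dropping them.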
Cloud services: Most current projects run on cloud platforms such as AWS, Azure, and GCP. A data engineer must therefore be proficient with some of their services, for example EC2, S3, Lambda, and EMR on AWS.
Databases: The industry uses many databases, both relational and NoSQL. As a data engineer, you need to know how to work with them: how to query them, how to download data from them, and how to upload data to them.
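The upload, query, and download round trip can be sketched with Python's standard library. SQLite is used here only because it needs no server; the `sensors` table, its rows, and the threshold are hypothetical, and a production database would be reached through its own driver.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id TEXT, reading REAL)")

# Upload: bulk-insert with placeholders instead of string formatting.
rows = [("a1", 20.5), ("a2", 31.0), ("a3", 18.25)]
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO sensors VALUES (?, ?)", rows)

# Query: let the database do the filtering and sorting.
threshold = 20.0
cursor = conn.execute(
    "SELECT sensor_id, reading FROM sensors WHERE reading > ? ORDER BY sensor_id",
    (threshold,),
)

# Download: write the column names and the result rows as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([column[0] for column in cursor.description])
writer.writerows(cursor)
exported = buffer.getvalue()

print(exported)
```

Passing values through `?` placeholders lets the driver handle quoting, which avoids SQL injection and formatting mistakes.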
Of course, data engineers need more than what is mentioned here, such as software engineering and documentation skills. But these are the basic skills you should have to become a data engineer.