Learning how to become a Data Engineer is a worthwhile endeavor in this day and age due to the high demand for the skills. The great thing is this demand is set to increase so there’s still plenty of room for more Data Engineers.
The purpose of this doc is to provide all the relevant links you’ll need on the path to becoming a Certified Data Engineer in South Africa, as well as a few pointers that helped me fast-track the process.
Steps To Take
Step #1: Understand what a Google Cloud Platform (GCP) Data Engineer is.
Before spending your time and resources on pursuing the GCP Professional Data Engineer Certification, make sure that it’s the right role for what you want to do. The roles of a Data Engineer, Data Scientist and Data Analyst may overlap in some aspects, but they are different roles with different expectations in terms of the value you bring to the team.
Secondly, make sure that GCP is the platform you are interested in and not one of the other well-recognized platforms like AWS or Azure. Those platforms have their own certification material that you would need to look at instead.
Step #2: Get the necessary funds.
Once you are certain about the role, talk to a current or potential employer/sponsor about it and see if they would be willing to cover the cost associated with attaining the certification. Otherwise, make sure you have enough funds to complete the process. The main costs are the exam registration, which is $200 at the time of writing, and enrolling for the online course on Coursera, which is $49 per month. At most it should take 4 months to comfortably complete the course and have an understanding of the work, so all in all you need a budget of about $400 ($200 for the exam plus 4 months at $49, which comes to $396).
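If you expect to finish the course faster or slower than that, the budget is easy to recompute. Here is a minimal sketch in Python, assuming the prices quoted above still hold; the constant and function names are my own, made up for illustration.

```python
# Prices quoted in this post at the time of writing (assumed, may have changed).
EXAM_FEE_USD = 200
COURSERA_MONTHLY_USD = 49


def certification_budget(months: int) -> int:
    """Return the total cost in USD: one exam fee plus the monthly course fee."""
    if months < 0:
        raise ValueError("months must not be negative")
    return EXAM_FEE_USD + COURSERA_MONTHLY_USD * months


if __name__ == "__main__":
    # Show the budget for finishing the course in 1 to 4 months.
    for months in range(1, 5):
        print(f"{months} month(s): ${certification_budget(months)}")
```

Every month you shave off the course saves $49, so it pays to set aside regular study time before you subscribe.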
A quick run through the content
The exam covers 4 main topics, which are discussed in the Exam Guide. Below I’ll give a brief description of what to expect from each topic, as well as a few tips when I have any.
Designing data processing systems
This topic is all about understanding the products on GCP so we can string them together to solve real world problems, while following best practices in a Proof of Concept (POC) environment.
Google does not expect you to memorize every little detail about each product. Just know a product well enough that if a client were explaining a problem to you, the right products for the job would start coming to mind immediately.
Tip: As you go through the course and get exposed to different products, start thinking about how you would use the products to solve some problems that people around you have.
Building and operationalizing data processing systems
It’s one thing to get everything to work in a POC environment and a completely different thing to get something into production. A Data Engineer is most valued for having the ability to do just that.
Being able to do this is something you gain with experience, but until then, attention to detail and following best practices can get you a long way.
Operationalizing machine learning models
There are thousands of Machine Learning models in existence, but only a few make it to production and it’s harder than it looks. Not for a Data Engineer though.
You’re not expected to build models, but after the Data Scientists have had their fun and developed a model, you’re the one who applies best practices to make it a usable product. This also involves understanding the products at a level that can only be reached with some hands-on experience, so make use of the labs on Qwiklabs to get this experience.
Tip: Not all the labs on Qwiklabs are free. However, you get a few credits when you sign up, so use them wisely and focus on the products that you’re finding the most difficult to understand.
Ensuring solution quality
Once we pass the hurdles of getting into production, we need to make sure that our systems and models are secure, scalable and reliable for all our users.
Google does a lot of the heavy lifting in this regard thanks to how their products are built. However, we’ll always get those few special cases where something extra is required. That’s when the knowledge of a Data Engineer comes in handy.
Next is a list of the products you need to know. Based on my experience, I’m listing them starting with the ones that you are most likely to encounter in the exam.
BigQuery, Vision AI, Text-to-Speech, Speech-to-Text, Cloud Natural Language API, Cloud Translation, Cloud Dataflow, Cloud Pub/Sub, Cloud Dataproc, Cloud Storage, Cloud Bigtable, Cloud Spanner, Cloud SQL, Cloud Datastore, Transfer Appliance, Virtual Private Cloud, Cloud Data Transfer Service, Cloud Composer, Hybrid Connectivity, Cloud Load Balancing, Cloud Memorystore, Cloud Armor
The approach I took during preparation was to, at the very least, read the overview for each product to understand what it does at a high level. Only dig deeper into the documentation of the products that confuse you the most. For the rest, the online course gives enough detail on each product to pass, so listen carefully.
Wrapping it up
The fact that you’ve already shown interest in getting certified is a good indicator that something about this field resonates with you on a deep level. Trust that gut feeling and believe that you’re already a Data Engineer, because you are. You just need the piece of paper to prove it. So just do you throughout the preparation process and don’t wait until after the exam to start feeling like a Data Engineer.