Data is complex, intricate, and more often than not, a complete mess without the right tools to clean it up.
Meet Hao, from the Shopee Data Infrastructure (DI) Engineering team — the crew behind Shopee’s massive data platform, and the ones who ensure that the right data is delivered to the right people.
“Getting data to where it needs to be.”
Hao (H): The DI team’s objective in Shopee is simple, but its importance cannot be overstated. We ensure that data flows smoothly, efficiently and accurately to the teams that need it, via the platforms, systems and processes we have put in place.
Cleaning up data using the tools that we have developed in-house is arguably the first and most important step in making sure we are using the heaps of data we’ve accumulated for the right purposes.
In our current age of digitalisation, the need for data is undeniable. With the right data insight, companies are able to identify trends and patterns, potentially detect issues and drive business decisions.
“We build the foundations for data flow.”
H: To be clear, we are not the Data Engineering (DE) team.
In Shopee, the DE team provides customised data solutions within the company to different users who might require our data in different forms.
On the other hand, a DI Engineer works on providing and improving the underlying infrastructure and systems that other teams, including DE, might use to help them fulfil their own data-centric requirements and goals.
We are constantly developing and building upon these data platforms and systems, to allow data to flow just a little faster, cheaper and more accurately for everyone in the company.
“A culture of growth.”
H: One of the greatest challenges we face is that many of our systems are highly interconnected. Oftentimes, when one of our systems has a performance issue, the cause is likely to lie in another interlinked system.
Having a good understanding of all our systems and how they interact with each other is one of the most crucial strengths our team must build in order to overcome these issues. To do so, we conduct sharing sessions with the team to introduce these various systems, and help one another understand the intricacies and capabilities of each platform.
A culture of learning is something we prioritise in the DI team. We never blame each other for our mistakes; instead, we choose to learn from them.
“Open up to open source.”
H: In the Shopee DI team, we are very open to using a variety of popular open-source technologies such as Kafka, Hadoop, Presto, HBase, Spark, Hive, and Druid to build our data platform.
By using open-source technologies, we are able to draw on the wide range of tools we’ve been exposed to, and adapt to different challenges as efficiently as possible.
Working on Shopee’s data platform is a unique opportunity, as there aren’t many places experiencing growth as rapid as Shopee’s. To be prepared for any challenges that may come our way in a hyper-growth environment, we are constantly exploring new systems and looking out for ways to improve our current infrastructure.
One such project that I’m immensely proud of is our new in-house, one-stop cluster management system for Shopee’s data resources and assets.
“Know your tools.”
H: We’re always looking for fresh faces to join the DI team. Having the technical knowledge to work with and transform data using SQL, Scala, Java and Python is fundamental in the team.
However, what really sets a great candidate apart from a good candidate is their passion for tackling technical challenges. You have to be excited about your work, and possess an innate curiosity to learn not only how to use complex tools, but also how these tools work behind the scenes.
Here at the Shopee Engineering Team, we are passionate about solving real-world problems using the right technology.
If you love technology as much as we do, join our team today.
Keen to know what the Data Engineering team works on? Check out their feature here.
*All photos were taken before the implementation of COVID-19 Safety Measures.