Befriend domain knowledge if you want to succeed in a world full of data

Mehul Batra
3 min read · Jan 19, 2021

Some of the crucial aspects of domain knowledge that a data specialist should keep in mind and be keen to learn

The source problem the business is trying to resolve: A data specialist needs to plan ahead with a supportable, logical business strategy before moving on to implementation. For example, suppose we build a machine learning model to predict the fastest delivery routes without knowing where our warehouses are located (or will be set up) or where the hotspots of our business clients are. The answers from such a model will be too generic to be useful. Adapting to the business sector and learning the domain, including the number of warehouses, their exact geo-locations, the expansion areas, and the hotspots of our customer base, lets us build a model that suits the requirements, rather than using our technical abilities to build the prediction algorithm right away and letting a powerful library such as scikit-learn do all the heavy lifting. The key takeaway from this experience is the importance of domain knowledge in shaping your decisions. Applying a model is relatively straightforward; the true value comes from questioning your decisions and evaluating them carefully.
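To make the warehouse example concrete, here is a minimal sketch of how domain knowledge can become a model input. The warehouse names and coordinates are hypothetical placeholders; in practice they would come from the business. The sketch only computes the distance from a customer to the nearest warehouse, which could serve as one feature for a delivery model. It is not a routing algorithm.

```python
import math

# Hypothetical warehouse coordinates as (latitude, longitude).
# Real values would be supplied by the business, not guessed by the modeler.
WAREHOUSES = {
    "north_hub": (28.7041, 77.1025),
    "south_hub": (12.9716, 77.5946),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_warehouse(customer, warehouses=WAREHOUSES):
    """Return (name, distance_km) of the warehouse closest to the customer."""
    name = min(warehouses, key=lambda w: haversine_km(customer, warehouses[w]))
    return name, haversine_km(customer, warehouses[name])
```

Calling `nearest_warehouse((13.0827, 80.2707))` returns the closest hub and its distance for that customer location. Without the warehouse list, the model has no access to a signal like this.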

Set of specialized information held by the business: In most organizations, there’s a tremendous amount of legacy business information hidden in the company data. Without that domain knowledge, the information in the data is often missed, leading to data quality issues. Additionally, without domain and business expertise, it’s difficult to imagine the work of the data specialist aligning very well with the strategy of the business.

Domain-specific data collection understanding: By commonly cited estimates, about 1.7 MB of data is generated per person every second, and roughly 2.5 quintillion bytes of data are created each day. That’s a whole lot of data to harness and process. Understanding which portion of that data to process, and how and when to process it, is important. Doing so reduces inefficiency in business operations, and since timelines and costs are the biggest constraints for a business, trimming the data down to the bare minimum required for the analysis also cuts costs and processing time. As we advance into the era of edge computing, every millisecond of processing time matters.
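As a rough illustration of trimming data down to the bare minimum, the sketch below keeps only the columns and rows an analysis needs while streaming through a CSV export. The column names, the sample rows, and the `trim_rows` helper are all hypothetical; which fields actually matter is exactly what domain knowledge tells you.

```python
import csv
import io

def trim_rows(source, needed_columns, keep_row):
    """Yield only the needed columns of the rows that pass keep_row."""
    for row in csv.DictReader(source):
        if keep_row(row):
            yield {col: row[col] for col in needed_columns}

# Hypothetical order export; a real one would be a file on disk.
raw = io.StringIO(
    "order_id,region,amount,notes\n"
    "1,north,120.50,gift wrap\n"
    "2,south,80.00,\n"
    "3,north,42.25,call first\n"
)

# Keep two columns, and only the orders from the region under analysis.
trimmed = list(
    trim_rows(raw, ["order_id", "amount"], lambda r: r["region"] == "north")
)
```

Because `trim_rows` is a generator, rows are filtered one at a time rather than loading the full export into memory first.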

Why does the data science team need to work with data engineers as a team, not just as a data provider?

Data engineers must be able to work effectively with data scientists and to communicate results and recommendations to colleagues. Often an ML model is prepared by the data science team but productionized by the engineering team; without proper knowledge transfer and a shared understanding, the model might land in trouble with respect to metrics, performance, and optimizations. Most problems with data are team issues, not technical issues (at least not at the start). Technology usually gets blamed because it’s far easier to blame technology than to single out the team itself. Until you solve your team issues, you won’t hit the really tough technical issues or create the value from big data that you set out to create.

With many organizations shifting from monolithic applications to microservices, I think a data engineering strategy becomes more and more important; without a data strategy and collaboration, a data scientist will be left to try and piece together information from a number of disconnected DBs.

What I learned from my mentor, and what is going to stay with me forever: “Teamwork divides the task and multiplies the success.”