Why I believe in an artisanal approach to data science
Neural networks and deep learning - some of the most advanced machine learning techniques - emulate the way the human brain works. What eludes many, however, is that these approaches emulate the fast, effortless pattern recognition that is the domain of the human subconscious as well as of animals. What sets humans apart from animals, and has arguably enabled the intellectual achievements of humankind, including science and art, is logical, conscious thinking. There is no doubt that the subconscious - with its pattern-recognition machinery - accounts for most of the thousands of decisions a human being makes every day. However, just as a company's employees escalate decisions they cannot resolve to the CEO, who drives the strategy and is therefore the highest-paid employee, the subconscious escalates decisions its machinery cannot handle to the conscious mind.
The development of predictive models and other algorithms is very similar. While most aspects of it can be automated by machine learning and other techniques, the data scientist plays a crucial role in supervising the machine's work and intervening whenever a novel problem is encountered or the automatically generated algorithms are flawed. The best models are built when machines only do low-risk grunt work while the data scientist keeps firm control of the overall mechanics of the model and of the features used to predict outcomes, as the sketch below illustrates. This heavy involvement of the data scientist is what I call an artisanal approach.
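To make this division of labour concrete, here is a minimal sketch in Python. The feature names, toy data, and whitelist are purely hypothetical; the point is only the pattern: an automated routine does the grunt work of ranking candidate features, while the data scientist decides what actually enters the model and keeps the model structure deliberately simple.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data standing in for a real credit dataset (purely illustrative).
X = pd.DataFrame({
    "debt_to_income": rng.normal(0.4, 0.1, 500),
    "months_on_book": rng.integers(1, 120, 500).astype(float),
    "zip_code_avg_income": rng.normal(50_000, 10_000, 500),  # a proxy the expert may reject
})
y = (X["debt_to_income"] + rng.normal(0, 0.05, 500) > 0.45).astype(int)

# The machine's grunt work: rank all candidate features by mutual information with the outcome.
scores = mutual_info_classif(X, y, random_state=0)
ranking = sorted(zip(X.columns, scores), key=lambda pair: pair[1], reverse=True)

# The data scientist keeps control: only features on a curated whitelist enter the model,
# and the model structure itself stays simple and transparent.
whitelist = {"debt_to_income", "months_on_book"}
approved = [name for name, _ in ranking if name in whitelist]

model = LogisticRegression().fit(X[approved], y)
print(approved, model.coef_)
```

In this setup the automated ranking can be rerun on every data refresh, but a feature only reaches production once a human has approved it - the machine proposes, the data scientist disposes.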
For example, when building a model for a bank that estimates recovery rates on defaulted debt (a so-called LGD model), I often encounter extreme data scarcity for some collateral types (in a given year, banks auction off very few satellites or dairy factories that had been pledged as collateral by their defaulted customers), while other collateral types may have plenty of data in other domains (e.g., there is a liquid market in second-hand forklifts, even if my banking client is not a very active participant). An artisanal approach can combine logic-driven structural approaches (e.g., an expert-driven estimate of recovery on rare collateral types) with analytics wherever suitable data is available (e.g., a sophisticated model that estimates the resale value of forklifts from externally sourced market data). Such approaches have helped my clients rapidly grow in existing markets and even enter completely new ones (from micro-lending in Southeast Asia to digital auto insurance in Latin America) with highly predictive models, even though neither machine learning nor traditional statistical approaches would have found enough data.
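A stylized sketch of such a hybrid LGD component might look as follows. All collateral types, recovery rates, and market data points are invented for illustration; a real model would be calibrated to the bank's portfolio and to properly sourced external market data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Logic-driven component: expert-set recovery rates (as a share of exposure) for
# collateral types where there is almost no auction data. Values are invented.
EXPERT_RECOVERY_RATE = {"satellite": 0.30, "dairy_factory": 0.45}

# Data-driven component: forklift resale value fitted on (invented) external market
# data -- resale price as a fraction of the price when new, by age in years.
age_years = np.array([[1.0], [2.0], [3.0], [5.0], [8.0]])
resale_fraction = np.array([0.80, 0.65, 0.55, 0.40, 0.25])
forklift_resale_model = LinearRegression().fit(age_years, resale_fraction)

def expected_recovery(collateral_type, exposure, age=None, price_when_new=None):
    """Blend expert judgement and the fitted market model, depending on collateral type."""
    if collateral_type == "forklift":
        fraction = float(forklift_resale_model.predict(np.array([[age]]))[0])
        return min(exposure, fraction * price_when_new)
    return EXPERT_RECOVERY_RATE[collateral_type] * exposure

# Loss given default (LGD) = 1 - recovery / exposure
exposure = 40_000.0
recovery = expected_recovery("forklift", exposure, age=4.0, price_when_new=60_000.0)
print(f"Forklift LGD: {1 - recovery / exposure:.2f}")
```

The structure is the point: where data is rich, the data-driven branch can be made as sophisticated as warranted; where data is scarce, an explicit, reviewable expert assumption takes its place instead of a model extrapolating from a handful of observations.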
If you want to know more about my artisanal approach to developing algorithms and decision models, you may want to read my book on Algorithmic Bias - whose introduction gives an overview of my approach and whose chapters introduce specific practices and tools that put my concept of "artisanal" into practice - and read up on my latest thinking about building analytical solutions for Risk Management.