This article was originally published by Forbes.com
The insights available through effective machine learning models can help businesses develop customer personas, clarify marketing efforts, improve the user experience for their products and services, and much more. However, none of that can happen if an ML model isn’t carefully and correctly set up from the beginning. A flawed ML model can produce misleading outputs that may lead to costly missteps.
Most of us are familiar with the concept of “garbage in, garbage out”—that is, we know it’s important to input quality data to get quality information out. But there’s more to developing a truly effective ML model than that. It’s important to understand what even the best ML models can—and can’t—do and to be aware that it’s far from a “fix it and forget it” process. Here, 16 members of Forbes Technology Council share essential steps in the creation and maintenance of an effective machine learning model.
1. Begin With Operational Leaders’ Insights
Operational leaders understand what’s working for their business. However, that business sense alone isn’t scalable or repeatable enough to drive impact. That’s where ML models excel. When creating models, companies should start with operational leaders’ insights and then build, test and deploy the model in production. The models must also be built for actionability to be fully effective. – Vasudeva Akula, VOZIQ
2. Identify Your Problem And Test Your Assumptions
The most essential step in creating an effective machine learning model is to identify your problem and test your assumptions before jumping into data or algorithms. Understand the key decisions that will be made with the help of the machine learning model, as this can impact people and communities, and determine the constraints and trade-offs of the machine learning model. – Swathi Young, TechNotch Solutions
3. Narrow Down And Simplify The Problem
Narrow down and simplify the problem as much as possible. Models work best on issues where the boundaries are clearly defined and suitably small. Additionally, developers will have significantly fewer issues collecting, managing and feeding data to get the expected result. Machine learning models are black boxes already; our job should be to make them as transparent as possible. – Julius Černiauskas, Oxylabs
4. Define What ‘Effective’ Means
Put a clear definition in place of what “effective” means—essentially, a domain-specific Turing test for the model. Can it perform its specific task at a human level or better? General-purpose models are far into the future. For now, when building ML models, we are working to capture the expertise of the humans already within the business in a codified form to help productivity scale beyond the person-hours available. – Tishampati Dhar, Aerometrex LTD.
5. Use A Unique And Diverse Dataset
The most crucial component of an effective machine learning model is a unique and diverse dataset, combined with a unique model of reinforcement learning. The way data is collected and processed is very important in a world where machine learning is becoming widely accessible and where everyone can use a variety of generic datasets. – Andrey Ustyugov, Planner 5D
6. Ensure There’s High-Quality, Consistent Metadata And Reference Data
Models have become commodities, and one can “pip install” models and grab them off the shelf, placing state-of-the-art models just an API call away. The most important task of MLOps is to make high-quality data available and ensure there is consistency when it comes to the metadata and reference data. Systematic improvement of data quality on a basic model is better than chasing state-of-the-art models with low-quality data. – Amit Verma, Neuron7.ai
7. Be Selective About The Data Points Used
We live in an age where there’s never a lack of data—in fact, having too many data points is a common problem I’ve seen. This is where feature selection can help. Feature selection is the process of choosing which features will be used by the algorithm. This step is important because it can help improve your model’s performance by removing features that are not relevant to the task at hand. – Patrick Zhang, Protecht Inc.
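One simple way to picture feature selection is a filter that keeps only the features whose correlation with the target clears a threshold. The sketch below is illustrative only: the column names, the toy data and the 0.3 cutoff are assumptions, not values from the contributor, and real projects often use richer methods than a single correlation score.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, min_abs_corr=0.3):
    """Keep features whose absolute correlation with the target meets the cutoff."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(name for name, score in scores.items() if score >= min_abs_corr)

# Hypothetical toy data: two informative columns, one arbitrary ID, one constant.
columns = {
    "monthly_spend": [10, 20, 30, 40, 50, 60],
    "support_calls": [5, 4, 4, 2, 1, 0],
    "row_id": [3, 1, 6, 2, 5, 4],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
target = [0, 0, 0, 1, 1, 1]  # e.g. whether the customer renewed

selected = select_features(columns, target)
print(selected)
```

Here the arbitrary ID and the constant column are dropped because they carry little or no signal about the target, which is the kind of pruning the tip describes.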
8. Follow Test-Driven Development Methods
Test-driven development has been around for a while as a standard engineering discipline. ML models need the same approach; without it, the costs of correction could range from writing off development costs to reputational damage and suboptimal outcomes. This test-first approach will be the forcing function to start with the right data for both training and testing the model. – Ravi Nemalikanti, NCR
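A test-first workflow for a model might begin with acceptance tests that state what "good enough" means before any training code exists. The sketch below assumes a deliberately tiny one-feature threshold classifier and an assumed 90% holdout accuracy bar; the function names and data are hypothetical and stand in for a real training pipeline.

```python
import unittest

def train_threshold_model(rows):
    """Fit a one-feature classifier: predict 1 when the value reaches the threshold."""
    positives = [value for value, label in rows if label == 1]
    negatives = [value for value, label in rows if label == 0]
    threshold = (min(positives) + max(negatives)) / 2
    return lambda value: 1 if value >= threshold else 0

def accuracy(model, rows):
    return sum(model(value) == label for value, label in rows) / len(rows)

# Separate training and holdout data are fixed up front, as the tests require.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
HOLDOUT = [(1.5, 0), (2.5, 0), (7.5, 1), (8.5, 1)]

class ModelAcceptanceTest(unittest.TestCase):
    def test_meets_minimum_holdout_accuracy(self):
        model = train_threshold_model(TRAIN)
        self.assertGreaterEqual(accuracy(model, HOLDOUT), 0.9)

    def test_output_is_always_a_valid_label(self):
        model = train_threshold_model(TRAIN)
        for value in (-100.0, 0.0, 5.0, 100.0):
            self.assertIn(model(value), (0, 1))

# Run the suite explicitly so the result can be inspected programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ModelAcceptanceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Writing the tests first forces an early decision about which data is reserved for evaluation and what threshold counts as acceptable.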
9. Establish A Model Validation Framework
The purpose of a machine learning model is to provide consistent, understandable and actionable value in production. An essential step to this is correctly simulating how a model will perform. Establishing a model validation framework that can test performance over historical data builds confidence that the right model was chosen and ensures high-quality results in production. – Tom Shea, OneStream Software
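Testing performance over historical data is often done with walk-forward validation, where each candidate is trained only on observations that precede the window it is scored on. The following is a minimal sketch under assumed settings: the two baseline forecasters, the sales figures, and the window sizes are all hypothetical.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices); training data always precedes the test window."""
    start = initial_train
    while start + horizon <= n_samples:
        yield list(range(0, start)), list(range(start, start + horizon))
        start += horizon

def mean_forecast(train_values):
    return sum(train_values) / len(train_values)

def last_value_forecast(train_values):
    return train_values[-1]

def backtest(series, forecaster, initial_train=4, horizon=2):
    """Return the mean absolute error of a forecaster across all walk-forward windows."""
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(series), initial_train, horizon):
        prediction = forecaster([series[i] for i in train_idx])
        errors.extend(abs(series[i] - prediction) for i in test_idx)
    return sum(errors) / len(errors)

monthly_sales = [100, 104, 108, 112, 116, 120, 124, 128]  # hypothetical history
scores = {
    "mean": backtest(monthly_sales, mean_forecast),
    "last_value": backtest(monthly_sales, last_value_forecast),
}
best = min(scores, key=scores.get)
print(scores, best)
```

Because every candidate is scored on the same historical windows, the comparison gives a like-for-like basis for choosing which model to promote.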
10. Understand Which Features Are Important To The Model
Ensuring a model isn’t biased or unfair is one of the hardest but most critical steps. In order to do this, you need to understand how the model works, which means you need to understand which features are important to the model. Tools such as Captum and Transformers-Interpret can help you understand how your model is working, giving you the insight to improve it over time and remove biases. – Pete Hanlon, Moneypenny
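The tools named above have their own APIs, which are not reproduced here. As a library-free illustration of the same idea, the sketch below uses permutation importance: shuffle one feature at a time and measure how much accuracy drops. The `model` function is a hypothetical stand-in for a trained model, and the feature names and data are assumptions.

```python
import random

def model(row):
    """Hypothetical stand-in for a trained model: approves when income is high."""
    income, age, postcode_group = row
    return 1 if income >= 50 else 0

def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_features, repeats=20, seed=0):
    """Average accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances

feature_names = ["income", "age", "postcode_group"]
rows = [(30, 25, 0), (40, 52, 1), (60, 31, 0), (80, 47, 1), (20, 60, 1), (90, 22, 0)]
labels = [0, 0, 1, 1, 0, 1]

importances = permutation_importance(model, rows, labels, n_features=3)
for name, score in zip(feature_names, importances):
    print(f"{name}: {score:.3f}")
```

A feature the model ignores scores zero, so a nonzero score on a sensitive attribute or a close proxy for one (such as a postcode grouping) would be a signal to investigate for bias.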
11. Deliver Regular Doses Of Quality Data
Algorithms in ML need regular doses of quality data. Effectiveness depends on the ability to constantly monitor this process. Selecting and delivering relevant data determines the quality of the machine learning model used by the organization. – Robert Strzelecki, TenderHut
12. Remember Random Resampling
The right level of sampling is an essential but not sufficient step toward building an effective ML model. Normal sampling may not detect rare events and is less effective around minority classes (such as high-value customers), which are many times more important to the business than the majority classes. Random resampling with deliberate over/under sampling helps in real-world ML applications, where imbalanced datasets prevail. – Pramod Konandur Prabhakar, Pelatro PLC
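Random oversampling, one of the resampling approaches described above, can be sketched as duplicating randomly chosen minority-class rows until every class matches the size of the largest one. The class labels and customer IDs below are hypothetical, and in practice the resampling would typically be applied to the training split only so that evaluation data keeps its real-world class balance.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target_size = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the largest class.
        extra = [rng.choice(members) for _ in range(target_size - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical imbalanced data: 8 standard customers, 2 high-value customers.
rows = [f"customer_{i}" for i in range(10)]
labels = ["standard"] * 8 + ["high_value"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels), Counter(balanced_labels))
```

Undersampling is the mirror image: randomly discard majority-class rows instead of duplicating minority-class ones.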
13. Include A Feedback Loop
It is essential to have a feedback loop mechanism built into the machine learning model. Machine learning is a journey, not a destination. Once the ML model is put to use, we will learn more about how it is performing, and the feedback loop ensures that the model is constantly updated and gets better. – Selva Pandian, DemandBlue
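One possible shape for such a feedback loop is a monitor that compares predictions with the outcomes that arrive later and flags the model for retraining when recent accuracy slips. This is a sketch under assumptions: the `FeedbackMonitor` class, the window of five outcomes and the 80% floor are all hypothetical choices.

```python
from collections import deque

class FeedbackMonitor:
    """Track recent prediction outcomes and flag when accuracy falls below a floor."""

    def __init__(self, window=5, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # oldest outcomes fall off automatically
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Wait for a full window so a single early miss does not trigger retraining.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rolling_accuracy() < self.min_accuracy

monitor = FeedbackMonitor(window=5, min_accuracy=0.8)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0), (1, 1)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()

# Two recent misses push rolling accuracy down to 3 out of 5.
for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_retraining()
print(healthy, degraded)
```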
14. Test Outcomes Against Real-World Evidence
ML is limited by the defined variables and filters. As a result, the relevance of the output can be compromised by blind spots and biases. To ensure a more robust approach to methodology, it’s useful to periodically test the outcome against real-world evidence. This allows for adjustments to filters and variable hierarchies and/or reinterpretation of learnings. – Susan Lang, XIL Health, LLC
15. Determine When Predictions Can Be Considered Accurate
One key part of developing a machine learning model is understanding the limitations of the model so that you can define the conditions under which its predictions can and cannot be considered accurate. AI is a helpful tool, but it’s not a panacea, and too often businesses employ a model for use cases it wasn’t designed to handle but take the results as gospel, leading to misinformed business decisions. – Ethan Kellough, Highlight
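One narrow way to encode "conditions under which predictions can be considered accurate" is to have the model decline to answer for inputs outside the range it was trained on. The sketch below assumes a single numeric feature; the `guarded_predict` helper, the stand-in model and the data are hypothetical, and real systems usually combine several such checks.

```python
def fit_valid_range(training_values):
    """Record the span of inputs the model actually saw during training."""
    return min(training_values), max(training_values)

def guarded_predict(model, value, valid_range):
    """Return a prediction only inside the training range; otherwise return None."""
    low, high = valid_range
    if not (low <= value <= high):
        return None  # caller should route this case to a human or a fallback rule
    return model(value)

# Hypothetical model trained on order sizes between 10 and 500 units.
training_order_sizes = [10, 50, 120, 300, 500]
valid_range = fit_valid_range(training_order_sizes)

def price_model(units):
    """Stand-in for a trained model: assumed per-unit price of 2.5."""
    return units * 2.5

in_range = guarded_predict(price_model, 200, valid_range)
out_of_range = guarded_predict(price_model, 5000, valid_range)
print(in_range, out_of_range)
```

Returning an explicit "no answer" makes it harder for downstream users to treat an extrapolated result as reliable.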
16. Be Aware Of ML’s Limitations
Understanding the limitations of machine learning models is important. Solutions that depend solely on ML don’t generally solve the most complex problems; in fact, missing/conflicting data may even render these models unusable. Hybrid solutions that combine the best of ML with other AI approaches should be considered when dealing with real-world challenges faced by real humans. – AJ Abdallat, Beyond Limits