Having a solid foundation for real-world ML is a key driver of success for new initiatives, and it is a stimulating area of research and engineering in its own right. Yet executing ML well can be challenging even for companies with mature engineering strength, and it goes without saying that there are pitfalls and misconceptions in attempts to bridge the gap between machine learning research and ML in production environments. An often overlooked and under-appreciated aspect of getting it right is the infrastructure that enables robust, well-managed research and serves customers in production applications.
The Benefits & Practice
It’s critical to make sure the proper protocols and practices are well established in order to keep reaping the benefits of a well-designed ML foundation.
One area is proper governance. This covers everything from ethical concerns to regulatory requirements. You should aim to make the governance process as smooth as possible. Likewise, historical tracking is another key component here and helps mitigate temporal drift. Tracking models over time is hard and requires fine-grained temporal data; a distributed model logging framework, such as an internal tool we built called Rubicon, can help keep track of your model training, testing, and deployment history.
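The core idea of such a logging framework can be sketched with a minimal, hypothetical run log (this is an illustration of the concept, not Rubicon's actual API; the class and field names here are assumptions):

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class ModelRun:
    """One training, testing, or deployment event for a model."""
    model_name: str
    stage: str                              # "train", "test", or "deploy"
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

class RunLog:
    """Append-only, timestamped history of model runs."""
    def __init__(self):
        self._runs = []

    def log(self, run: ModelRun) -> None:
        self._runs.append(run)

    def history(self, model_name: str) -> list:
        """All runs for one model, oldest first."""
        return sorted(
            (r for r in self._runs if r.model_name == model_name),
            key=lambda r: r.timestamp,
        )

    def to_json(self) -> str:
        """Serialize the full log, e.g. for a shared store."""
        return json.dumps([asdict(r) for r in self._runs])

log = RunLog()
log.log(ModelRun("churn", "train", params={"lr": 0.01}, metrics={"loss": 0.42}))
log.log(ModelRun("churn", "test", metrics={"auc": 0.88}))
print(len(log.history("churn")))  # → 2
```

A distributed version would persist each `ModelRun` to shared storage rather than an in-memory list, but the timestamped, append-only structure is what makes the later drift analysis possible.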
With historical tracking, retrain and loss thresholds are user-provided and are used to automatically refresh models over time. In turn, this leads to more consistent model reproducibility, meaning the ability to reproduce historical models on demand for validation against current data conditions, and a strong understanding of where drift has occurred along with the areas it has affected. Also, practicing journaled knowledge retention mitigates context loss (how many of us have returned to even simple projects and asked “what is going on here?!”), and guarantees that even though models are being retrained and published automatically based on time, changes to the underlying code and simple updates are easily identified.
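As a minimal sketch of how user-provided thresholds can drive automatic refresh, the decision reduces to two checks: has the loss drifted past its tolerance, or has too much time passed since the last training run? (The function and parameter names here are illustrative assumptions, not part of any specific tool.)

```python
def needs_retrain(current_loss: float, baseline_loss: float,
                  loss_threshold: float, last_trained: float,
                  now: float, retrain_interval: float) -> bool:
    """Retrain when loss drifts past the user-provided threshold,
    or when the model is older than the retrain interval."""
    drifted = (current_loss - baseline_loss) > loss_threshold
    stale = (now - last_trained) > retrain_interval
    return drifted or stale

# Loss has drifted well past the 0.3 tolerance: retrain.
print(needs_retrain(0.9, 0.4, 0.3, last_trained=0, now=10, retrain_interval=100))   # → True
# Loss is fine, but the model is 200 time units old against a 100-unit interval: retrain.
print(needs_retrain(0.41, 0.4, 0.3, last_trained=0, now=200, retrain_interval=100))  # → True
# Loss is fine and the model is fresh: leave it alone.
print(needs_retrain(0.41, 0.4, 0.3, last_trained=0, now=10, retrain_interval=100))   # → False
```

Running this check on a schedule against the logged history is what turns the thresholds into the repeated, automatic refresh described above.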
The burden on model developers here is three-fold:
1. Explicit test creation is required. This means configuring variables such as the time period of the training data and the hyper-parameter selection. It isn’t intended to prevent human error, but it is a cost that is amortized over time.
2. Success/accuracy definitions must be set in advance. This amounts to an acceptable range of model variance over time, defined as a compromise between business requirements and technical limitations.
3. Knowledge of the language of the model implementation is required. While this is a technical hurdle, it allows for very flexible test definitions.
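The three requirements above can be captured in a small declarative test spec plus a check written in the model's implementation language. This is a hypothetical sketch; the spec keys, metric names, and thresholds are all assumptions chosen for illustration:

```python
# A declarative test spec: an explicit training window, pinned
# hyper-parameters, and an accuracy definition agreed on in advance.
TEST_SPEC = {
    "training_window": {"start": "2023-01-01", "end": "2023-06-30"},
    "hyperparameters": {"learning_rate": 0.05, "max_depth": 6},
    "accuracy": {"metric": "auc", "min": 0.80, "max_variance": 0.02},
}

def check_model(spec: dict, observed_auc: float, previous_auc: float) -> bool:
    """Pass if accuracy meets the agreed floor and run-to-run
    variance stays within the agreed tolerance."""
    acc = spec["accuracy"]
    return (observed_auc >= acc["min"]
            and abs(observed_auc - previous_auc) <= acc["max_variance"])

print(check_model(TEST_SPEC, 0.85, 0.84))  # → True
print(check_model(TEST_SPEC, 0.85, 0.80))  # → False (variance exceeds tolerance)
```

Keeping the spec declarative is what lets the success criteria be reviewed by business stakeholders, while the check itself lives in the same language as the model.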
Finally, to democratize machine learning and fully harness its potential, organizations must be able to create experiments that are repeatable and automatically verified. Setting up this foundational environment is what allows us to build effective algorithms that deliver value at scale, and I hope it can serve as a guide for your organization, too.