A wave of enthusiasm for all things machine learning (ML) has swept over the technology and business communities, and society more broadly, in the last several years, and reasonably so: ML-enabled products and services can offer myriad benefits to an organization, not least the ability to harness large swaths of data to make previously tedious tasks easier and more efficient.
A key lever in setting the foundation for a successful ML program is building a culture and an environment that let you run these efforts at scale: accelerating the rate of scientific experimentation on the road to production and, eventually, to business value. The cloud is an integral part of these efforts, and it can enable teams to develop and deploy well-governed, accurate ML models to high-volume production environments. Beyond production deployments, a solid infrastructure paves the way for large-scale testing of models and frameworks, allows for greater exploration of the interactions of deep learning tools, and empowers teams to rapidly onboard new developers and ensure that future model changes do not have hidden effects.
Here, we’ll outline some tactical and procedural guidelines for setting the foundation to bring effective machine learning to production across your enterprise through automated model integration/deployment (MI/MD).
High-Level Challenges & Production ML Concerns
Machine learning is difficult enough in production environments, and more so when it must account for adversarial learning (a subfield of ML that studies its application under hostile conditions) in domains such as cybersecurity and money laundering. Adversarial attacks, from causative to exploratory, push your model to change in response to carefully devised inputs, reducing its efficacy.
In cybersecurity and other complex domains, decision boundaries often need robust context for human interpretation, and modern enterprises of any size generate far more data than humans can examine. Even absent such adversarial concerns, user activity, network deployments, and the ordinary advance of technology all cause data to drift over time.
With this in mind, production ML concerns are nearly universal. Data and model governance affect all models, and retraining is a fact of life, so automating the production process is key to sustainable performance.
Common production concerns that must be solved for when building an ML foundation include:
- Model problems in production. Models need to be trained, updated, and deployed seamlessly, but issues can arise with disparate data sources, multiple model types in production (supervised/unsupervised), and multiple implementation languages.
- Temporal drift. Data changes over time.
- Context loss. Model developers forget their reasoning over time.
- Technical debt. This is a known problem in production learning environments. ML models can be hard for even their creators to fully understand, and harder still for employees who are not ML experts. Automating the production process can lessen technical debt.
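Temporal drift in particular lends itself to an automated check. The sketch below uses the population stability index (PSI), one common drift measure, to compare a feature's training-time sample against a recent production sample. The function name, the bin count, and the equal-width binning are illustrative assumptions, not part of any specific tool described here.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def bucket_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A scheduled job could compute this per feature and flag a model for retraining when the value crosses a threshold. A frequently quoted rule of thumb treats values above roughly 0.25 as significant drift, but the right cutoff depends on the data and should be tuned.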
The ideal system can address these overarching ML production considerations while also addressing common adversarial concerns, including:
- Historical data and models
- Model monitoring and accuracy tracking over time
- Ability to work with distributed training systems
- Custom tests per model to validate accuracy
- Deployment to production model servers
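As a minimal sketch of a custom per-model test, the snippet below gates a model on a minimum accuracy over a labeled holdout set. The names `accuracy` and `passes_accuracy_gate`, and the idea of a single numeric threshold per model, are assumptions made for illustration; a real pipeline would likely track several metrics.

```python
from typing import Any, Callable, Sequence


def accuracy(
    predict: Callable[[Any], Any],
    examples: Sequence[Any],
    labels: Sequence[Any],
) -> float:
    """Fraction of holdout examples the model labels correctly."""
    if not examples or len(examples) != len(labels):
        raise ValueError("examples and labels must be non-empty and equal length")
    correct = sum(1 for x, y in zip(examples, labels) if predict(x) == y)
    return correct / len(examples)


def passes_accuracy_gate(
    predict: Callable[[Any], Any],
    examples: Sequence[Any],
    labels: Sequence[Any],
    min_accuracy: float,
) -> bool:
    """True when the model meets its own (hypothetical) accuracy threshold."""
    return accuracy(predict, examples, labels) >= min_accuracy
```

A CI job could call `passes_accuracy_gate` after training and fail the build when it returns `False`, so a model that regresses never reaches the production model server.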
Model Management & Setting a Technical Foundation
While each organization differs, these are high-level considerations for effective model management:
- Historical training data with fine-grained time controls
- Distributed training functionality
- Ability to support multiple languages
- Robust testing and reporting support
- Model accuracy must be easy to understand
- Model feature-set, methodology, and code tracking
- Data provenance and definitions for internal data
- Open-source tooling
- Custom retraining and loss functions on a cron-like schedule to refresh stale models
- Minimal impact on model developers and dedicated ML engineers
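Two of the considerations above, fine-grained time controls over historical training data and cron-like refreshing of stale models, can be sketched with a few lines of standard-library code. The `Record` type, the half-open window, and the `max_age` policy are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    timestamp: datetime
    features: tuple
    label: int


def training_window(
    records: Iterable[Record], start: datetime, end: datetime
) -> list[Record]:
    """Select records in the half-open interval [start, end)."""
    return [r for r in records if start <= r.timestamp < end]


def is_stale(last_trained: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when a model is older than its allowed age and is due for retraining."""
    return now - last_trained > max_age
```

A scheduled job (cron, or a CI server's timer trigger) could call `is_stale` for each model and, when it returns `True`, retrain on the output of `training_window` for the desired period.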
On the technical side, several tools/processes will be critical in meeting these requirements:
- A strong CI/CD server. For example, Jenkins has excellent support, with build, reporting, and command plugins for almost all use cases, and its distributed functionality can be a future benefit.
- Flexible platform for cloud service deployment. AWS’s EC2, S3, and EMR are good examples.
- Git integration. This is important for ensuring that code is tagged against specific versions for production release artifacts.
- Model accuracy. Submit accuracy and test results to an external server, for example over gRPC.
- Integration. Add a model-serving layer to streaming applications.
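To tie the Git and accuracy-reporting points together, one option is to attach the release commit to every metrics report. The sketch below only builds the payload; the field names and the `build_report` helper are assumptions, and the transport to the external server (gRPC, HTTP, or otherwise) is deliberately left out.

```python
import json
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_report(
    model_name: str,
    git_commit: str,
    metrics: Mapping[str, float],
    reported_at: Optional[datetime] = None,
) -> str:
    """Serialize one accuracy report, keyed to the code version that produced it."""
    when = reported_at or datetime.now(timezone.utc)
    payload = {
        "model": model_name,
        "git_commit": git_commit,  # hypothetical field: tag or hash of the release
        "reported_at": when.isoformat(),
        "metrics": dict(metrics),
    }
    return json.dumps(payload, sort_keys=True)
```

Because each report carries the commit, a drop in accuracy can be traced back to the exact code and feature set that produced the model.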