How to avoid false starts and get ahead of everyone else
Are you ready for machine learning? Do you have a plan? In this article, Atakan Cetinsoy from BigML goes over six things that every organization needs to be aware of when they’re devising their own machine learning strategy.
As of the end of 2017, the top five public companies in the world by market capitalization were Apple, Alphabet, Microsoft, Amazon, and Facebook, all digital natives. One trait these companies share is that they deal in 1s and 0s rather than tangible assets, and they possess vast amounts of already digitized business and consumer data. Add large R&D budgets, access to elite academic talent, and an open-mindedness towards systematic experimentation, and it is no surprise that, to date, they have been able to leverage machine learning to its fullest by launching numerous internal and end-user-facing smart applications.
Having noticed these positive examples, most business leaders in other industries have by now figured out that this machine learning thing really matters, and it ain’t going away anytime soon. So the waves of automation and data-driven decision making have recently started crashing on their shores as these businesses slowly but surely make headway with their digital transformation initiatives. This includes banks, insurance companies, manufacturers, healthcare organizations, and the consumer discretionary and consumer staples industries, among others. But no matter where you are in the process of adopting machine learning, there is one critical thing you need to know if you want to avoid a mess of false starts and get ahead of everyone else.
The very idea of a platform, if implemented correctly, is that it is designed so that all aspects of the workflow are accounted for and work well together. The current experience of ML projects taking months and years to find their way to production is unacceptable and not scalable. To a large extent, these projects rely on building bespoke ML systems by letting expensive experts cobble together open-source components. And even then, once the experts finally have a system working, they are not done, because there is no easy way to put the models they have handcrafted into production.
Figure 1: What companies really need from ML.
To be clear, open source is not the problem itself. There are lots of great open-source tools. In fact, BigML relies on open source as well for certain aspects. The problem is that the open-source tools are typically focused on one thing, so you end up with a big puzzle of incompatible pieces that have to be glued together with custom code that likely won’t stand the test of time. And you get to write the glue.
With an MLaaS platform like BigML, you get powerful machine learning, a full API, visualizations that make exploration and rapid prototyping easy, and white-box models that can be quickly put into production. Everything is already put together, saving you all that time and any future headaches due to accumulating technical debt.
In a custom ML implementation scenario, the Pareto principle implies that 80% of the system will be produced by just 20% of your team. So, in a team of five people, there is almost assuredly one critical employee. When that person leaves, no one left on the team will fully know how to finish or maintain the system. And guess what? That one person is in huge demand. Are you certain that you can keep them long enough to finish the project? For that matter, are you sure that what they are building has been tested?
When you adopt a machine learning platform, you are joining a community of tens of thousands of users all over the world, who have built hundreds of millions of models. You can rest easy knowing that the platform has been thoroughly tested and has survived the ravages of real-world data in all its varieties.
Figure 2: Explore more ML!
If you decide to roll your own solution based on disparate open-source components, how many of your employees are going to understand how to use it? The truth is that machine learning is the modern spreadsheet for massive data, and most knowledge workers can benefit from it. This insight hasn’t escaped the attention of tech players like Facebook, Airbnb, LinkedIn, and Google, all of which rely on ML for future innovation:
“In late 2014, we set out to redefine Machine Learning platforms at Facebook from the ground up and to put state-of-the-art algorithms in AI and ML at the fingertips of every Facebook engineer.”
That should be the vision of your company, as well — except, you don’t need to spend millions of dollars and years of research to build your own easy-to-use ML tools.
Speaking of everyone using machine learning, adopting a machine learning platform has another significant advantage: it makes it easy to collaborate. Resources, like models, can be shared with a secret link: you send someone a URL that, when clicked, lets them interact with the model you built and just as easily use it to make predictions. Commonly used resources, like a dataset, can also be shared in a gallery, making it possible for a small team to curate data and then share it for everyone to use. In a private deployment, which allows your company to operate its MLaaS in a private cloud or even on-premises, these resources can be shared privately with everyone within your organization. This is especially conducive to better interpretability and risk management in the new era of data privacy regulations like the European Union’s GDPR, which goes into effect later in 2018.
Beyond training a model, there are often other steps that need to be performed, like transforming your data, filtering it, augmenting it with new features, etc. It is extremely rare that a real-world problem will be solvable without implementing a workflow composed of such steps. The good news is that these workflows are often reusable, running the same series of steps over and over with new data. The bad news is that if you are rolling your own solution, then you are rebuilding these workflows every time. State-of-the-art MLaaS platforms also come with their own data transformation language and workflow automation capabilities, which make it possible to separate the workflow logic from the data. The beauty of this is that these workflows can then be easily shared and reused, extending the functionality of the platform.
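The idea of separating workflow logic from data can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the step functions, the field names (`spend`, `visits`, `spend_per_visit`), and the sample rows are all hypothetical, and the sketch does not use any platform's transformation language.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Step = Callable[[List[Row]], List[Row]]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Filtering step: keep only rows in which every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_ratio(rows: List[Row]) -> List[Row]:
    """Augmenting step: add a derived feature (hypothetical field names)."""
    return [
        {**r, "spend_per_visit": r["spend"] / r["visits"] if r["visits"] else 0.0}
        for r in rows
    ]


def run_workflow(steps: List[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order; the workflow knows nothing about a specific dataset."""
    for step in steps:
        rows = step(rows)
    return rows


# The workflow is defined once, independently of any data.
WORKFLOW: List[Step] = [drop_missing, add_ratio]

# The same workflow is then reused on new data as it arrives (made-up rows).
january = [{"spend": 100.0, "visits": 4}, {"spend": None, "visits": 2}]
february = [{"spend": 30.0, "visits": 3}]

print(run_workflow(WORKFLOW, january))
print(run_workflow(WORKFLOW, february))
```

Because `WORKFLOW` is just a list of steps, it can be shared and reused without copying any of the data it was first developed on.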
The importance of automation can’t be overstated. Your data is not static; it will change, and you need to build a system that can adapt along with it. If refreshing your models means reinventing the wheel and takes weeks or months, the models may no longer be relevant by the time you update them! However, by automating the entire workflow with MLaaS scripting capabilities, you can rebuild models every day if needed. This is possible because MLaaS platforms such as BigML include APIs. In fact, the API is in some ways the core product. This means that every single action can be done programmatically, with bindings in many languages to make it as easy as possible to get started.
This rounds out the conceptual reasons why MLaaS makes sense, but as the saying goes, the proof is in the pudding. On that front, BigML customers keep adding to the creative ways machine learning applications are deployed without expensive consultants and custom glue code. For instance, NDA Lynn recently launched its automated NDA checker service, training its models first on hundreds and then on thousands of variations of non-disclosure agreements.
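To make the point about programmatic rebuilding concrete, here is a minimal sketch of a scheduled rebuild job. It assumes a hypothetical client interface with two methods, `create_dataset` and `create_model`; these names and the `FakeClient` stand-in are illustrative only and are not taken from any real platform's bindings. With a real MLaaS binding, the stand-in would be swapped for the vendor's client.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Protocol


class MLClient(Protocol):
    """Hypothetical interface; a real platform binding would differ in detail."""

    def create_dataset(self, rows: List[dict]) -> str: ...
    def create_model(self, dataset_id: str) -> str: ...


@dataclass
class FakeClient:
    """In-memory stand-in so the sketch runs without a network or credentials."""

    calls: List[str] = field(default_factory=list)

    def create_dataset(self, rows: List[dict]) -> str:
        self.calls.append("create_dataset")
        return f"dataset/{len(rows)}"

    def create_model(self, dataset_id: str) -> str:
        self.calls.append("create_model")
        return f"model/for-{dataset_id}"


def rebuild_model(client: MLClient, rows: List[dict], today: date) -> Dict[str, str]:
    """One rebuild: upload fresh data, train on it, and record what was built."""
    dataset_id = client.create_dataset(rows)
    model_id = client.create_model(dataset_id)
    return {"date": today.isoformat(), "dataset": dataset_id, "model": model_id}


# A scheduler (cron, for example) could call rebuild_model once a day.
client = FakeClient()
result = rebuild_model(client, [{"x": 1}, {"x": 2}], date(2018, 1, 15))
print(result)
```

Since the whole rebuild is one function call, running it daily is a scheduling decision rather than a new engineering project.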
This collection of data produced interesting patterns that can serve as early warning signs for NDA Lynn customers looking to address any undue risks before agreeing to the terms stated in their NDA. This simple, narrow-AI example will likely find its way to many other types of contracts over time as digital data samples increase in size and the need to manage risks in a quantifiable way mounts in today’s ultra-competitive legal marketplace.
Hopefully, you can see the benefits of bringing a machine learning platform to your company. We find that this platform message resonates with people who are innovators, the doers in a company who want actionable results, and, more often than not, the people who are specifically tasked with evaluating new technologies for their company.
However, it’s still the early days of machine learning. It is just like the early days of e-commerce, when everyone who wanted a shopping cart would hack together some CGI and HTML into a custom system. The people who could code those monstrosities were in high demand and paid handsomely for their effort. Sound familiar?
But who does that now? Well, no one.
That’s because the entire process has been commoditized. And this is a good thing because those early days saw a lot of repetitive work and wasted time. The same thing is happening with machine learning right now. It should be clear which choice is more important to the success of your company. Resistance to adopting a platform will fade eventually, but by then everyone will be using machine learning!