Price: Free for subscribers
Available from: April 2018
With all the hype around machine learning, plenty of people are asking what it actually is. If you want a quick primer on what's important, read this.
The main goal of machine learning is to study, engineer, and improve mathematical models that can be trained with context-related data (provided by a generic environment) to infer the future and make decisions without complete knowledge of all influencing elements (external factors). In other words, an agent (a software entity that receives information from an environment, picks the best action to reach a specific goal, and observes the results) adopts a statistical learning approach, trying to determine the right probability distributions and using them to compute the action (value or decision) that is most likely to succeed with the least error.
I prefer using the term inference instead of prediction, if only to avoid the weird (but not uncommon) idea that machine learning is a sort of modern magic. Moreover, it's possible to introduce a fundamental statement: an algorithm can extrapolate general laws and learn their structure with relatively high precision only if those laws actually affect the data. So the term prediction can be freely used, but with the same meaning adopted in physics or system theory. Even in the most complex scenarios, such as image classification with convolutional neural networks, every piece of information (geometry, color, peculiar features, contrast, and so on) is already present in the data, and the model has to be flexible enough to extract and learn it permanently.
Let’s take a look at a number of different machine learning models, beginning with supervised learning.
In supervised learning, algorithms run on a training set made up of inputs and expected outputs. Starting from this information, the agent gradually corrects its parameters to reduce the magnitude of a global loss function. After each iteration, if the algorithm is flexible enough and the data elements are coherent, the overall accuracy increases and the difference between predicted and expected values approaches zero. Of course, in a supervised scenario, the goal is to train a system that must also work with samples it has never seen before. This makes it necessary to let the model develop a generalization ability and avoid a common problem called overfitting, in which excessive capacity causes over-learning (we discuss this in more detail later; for now, suffice it to say that one of the main effects of such a problem is that the model predicts correctly only the samples used for training, while the error on the remaining ones stays very high).
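To make the train/test contrast concrete, here is a minimal sketch (not from the original text, and assuming scikit-learn with its bundled digits dataset). An unconstrained decision tree has enough capacity to memorize the training samples, so near-perfect training accuracy paired with noticeably lower test accuracy is exactly the overfitting symptom described above:

# Minimal sketch: supervised training plus an overfitting check (illustrative only).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A very deep tree has enough capacity to memorize the training set.
model = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower => overfitting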
Examples of machine learning applications include:
Unsupervised learning is, as the name suggests, based on the absence of any supervisor and therefore of absolute error measures. It’s useful when it’s necessary to learn how a set of elements can be grouped (clustered) according to their similarity (or distance measure).
There are a number of different ways unsupervised learning is applied in the world, such as:
Even if there are no actual supervisors, reinforcement learning is also based on feedback provided by the environment. In this case, however, the information is more qualitative and doesn't help the agent determine a precise measure of its error. In reinforcement learning, this feedback is usually called a reward (a negative one is sometimes called a penalty), and it's useful for understanding whether a certain action performed in a given state is positive or not. The sequence of most useful actions is a policy that the agent has to learn, so that it can always make the best decision in terms of the highest immediate and cumulative reward. In other words, an individual action can be imperfect as long as, in terms of the global policy, it offers the highest total reward. This concept is based on the idea that a rational agent always pursues the objectives that can increase its wealth. The ability to see over a distant horizon is a distinguishing mark of advanced agents, while short-sighted ones are often unable to correctly evaluate the consequences of their immediate actions, so their strategies are always sub-optimal.
Reinforcement learning is particularly effective when the environment is not completely deterministic, when it's very dynamic, and when it's impossible to have a precise error measure. Over the last few years, classical algorithms of this kind have been combined with deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state (usually a screenshot or a memory dump).
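As an illustrative sketch only (nothing here comes from the article, and the state/action counts are arbitrary), tabular Q-learning captures the core idea: nudge the value of a state-action pair towards the immediate reward plus the discounted value of the best follow-up action.

# Tabular Q-learning sketch: learn action values from rewards alone.
import random

n_states, n_actions = 16, 4
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate

def choose_action(state):
    if random.random() < epsilon:                              # explore occasionally
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])    # otherwise exploit current knowledge

def update(state, action, reward, next_state):
    # Move Q(s, a) towards the immediate reward plus the discounted best future value.
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])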
This piece has been taken from Machine Learning Algorithms.
Learn more about Machine Learning – explore Packt’s Machine Learning eBooks and Videos.
This article is part of the latest JAX Magazine issue – "The state of Machine Learning":
Machine learning is all the rage these days and if your company is not in the conversation, perhaps you want to hear how trivago, BigML, Studio.ML, Udacity, AWS and Skymind put this technology to good use.
We recently talked about the importance of ethics education among developers so now it’s time to have the “how to develop ML responsibly” talk with you. Last but not least, if you thought machine learning and DevOps don’t mix well, think again.
I wish you a happy reading!
Want to learn machine learning? Or do you need to brush up on basic concepts? Today, we go over two essential reference materials for anyone just starting out on their machine learning adventure: the machine learning glossary and the rules of ML.
Things can be confusing if you’re just starting out on your machine learning journey. ML might be on the bleeding edge, but it can be hard for developers in different fields to catch up. However, the rewards for doing so are fairly substantial: we talk all the time about how well ML specialists are compensated for their skills.
So, what’s a developer to do if they want to level up their ML credentials? While you can always take a course or a boot camp, those can be expensive. We already went over some of the great open source options for machine learning, artificial intelligence, and more that are all available online. There’s a whole internet full of open source tools for machine learning, including OpenAI and TensorFlow.
Today, we’re taking a look at two useful tools from the Google Developers team: the Rules of ML and the Machine Learning Glossary. I highly recommend reading the whole rule book by Martin Zinkevich; it’s an incredible resource for anyone working on machine learning, whether they’re a beginner or just brushing up on their ML skills.
Machine learning is a pretty new discipline, so there really aren’t a whole lot of hard and fast rules. However, there are an awful lot of guidelines and helpful generalizations to follow.
“Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.”
Making things work in machine learning has a lot to do with engineering and less to do with algorithms. That’s not to say ML algorithms aren’t necessary and useful, it’s just that many of the problems that you as a developer will face will be solvable with a background in engineering or computer science.
Martin Zinkevich has a very basic approach to all ML problems:
Following this general approach covers a lot of ground. Increasing complexity means you’re throwing up future roadblocks. Remember the golden rule of all development projects – keep it simple, stupid.
Not to be outdone by a set of simple guidelines, they also give budding ML specialists three simple rules to follow before they even start out with machine learning. The rules carry on well into developing your first pipeline, feature engineering, and refining complex models, but we’re only going to focus on the foundation today.
Rule #1: Don’t be afraid to launch a product without machine learning.
Do you need machine learning? Do you really, really need it? Sure, ML is super cool and extremely topical in tech right now, but don’t let it become a solution in search of a problem. ML has very defined parameters of success; it might not work out for what your project needs.
Besides, by definition, ML needs an awful lot of data. You might not have access to the right sort of datasets, or even access to any datasets.
Rule #2: Design and implement metrics.
Metrics are important. Without any kind of measuring stick, how can you tell if your project is working? How can you determine if there are any problems?
This is where data collection comes into play. When you’re designing a project, see if there are ways to gather data from the start, if only because it’s easier to get permission from users from the get-go. Having a wealth of historical data makes it easier to prove if that one initiative or tweak to the system actually did anything.
Now would also be a good time to invest in a decent storage system for all that data you’ll be collecting.
Rule #3: Choose machine learning over complex heuristics.
A heuristic is any practical approach to problem solving. Simple heuristics are easy to implement; complex ones less so. Machine learning is easier to update than a complex heuristic.
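As a toy illustration (the spam example, keywords, and tiny dataset are invented here, and scikit-learn is an assumption), compare a hand-written heuristic with a simple learned model: every new edge case means another branch in the heuristic, whereas the model is updated just by retraining on fresh data.

# Toy contrast: hand-written heuristic vs. a simple learned classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def heuristic_is_spam(text):
    # Every new spam pattern means another hand-maintained rule here.
    return "free money" in text.lower() or "winner" in text.lower()

texts = ["free money now", "meeting at 10am", "you are a winner", "project update"]
labels = [1, 0, 1, 0]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)                       # updating = retraining on new labeled data
print(model.predict(["claim your free money"]))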
Not to be outdone, the Google Development team has also released a comprehensive Machine Learning Glossary. Terminology in technology is complex; let’s simplify things with a very helpful reference sheet that clearly explains what we mean by cross-entropy, one-hot encoding, or a softmax.
Frankly, I find this to be incredibly useful, if only because there are a lot of overlapping terms in computer science. Clarity is crucial to writing clean code. Writing clean code isn’t just efficient; it helps out future developers who follow in your footsteps.
Machine learning may be difficult, but there are a lot of options out there to make it easier for anyone just starting out. These tools from the Google Development team are incredibly useful for beginners as well as anyone looking to brush up on their ML skills.
Remember, keep it simple and good machine learning comes from good engineers. You can do it!
Every year, Stack Overflow surveys the state of the developer community. What trends, tools, and technologies did they find? Julia Silge, a data scientist at Stack Overflow, dives deep into the data to show the most loved technologies of 2018.
As a data scientist at Stack Overflow, I use machine learning in my day-to-day work to make our community the best place possible for developers to learn, share, and grow their careers. It has been amazing to see the increasing interest and investment of the software industry as a whole in machine learning over the past few years. Skilled data scientists can use analysis and predictive modeling to help decision makers understand where they are and where they can go. In March, Stack Overflow released its 2018 Developer Survey results, the eighth year we have surveyed the developer community; this year we had over 100,000 qualified respondents and it was clear that machine learning in software development is an important trend that’s here to stay. But what are the key tools and technologies to watch?
Our survey was about 30 minutes long and covered a diverse range of topics, from demographics to job priorities, but a large section focused on technology choices. We asked respondents what technologies they have done extensive development work in over the past year, and which they want to work with over the next year. We can understand how popular a technology is with this kind of question, but by combining the questions, we can understand how loved or dreaded a technology is, in the sense of what proportion of developers that currently work with a technology do or do not want to continue to do so.
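As a rough illustration of that combined metric (the column names and responses below are invented, and the survey's real methodology is more involved than this), the "loved" percentage is simply the share of current users of a technology who also want to keep working with it:

# Hypothetical pandas sketch of the "loved" metric.
import pandas as pd

responses = pd.DataFrame({
    "worked_with_tensorflow":  [True, True, True, False, True],
    "want_to_work_tensorflow": [True, True, False, True, True],
})

used = responses["worked_with_tensorflow"]
loved = responses.loc[used, "want_to_work_tensorflow"].mean() * 100
print(f"% loved: {loved:.0f}%")   # share of current users who want to continue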
The most loved technology this year among our list of frameworks, libraries, and tools is TensorFlow, a machine learning library released as open source by Google in 2015. TensorFlow won out this year over beloved, popular web frameworks like React and Node.js, last year’s winners. We didn’t ask about TensorFlow on last year’s survey because it had just started to gain wide popularity. TensorFlow’s emergence onto the scene has been so dramatic that it exhibits one of the highest year-over-year growth rates ever in questions asked on Stack Overflow.
TensorFlow is typically used for deep learning (a specific kind of machine learning usually based on neural networks) and its status in our survey is a demonstration of the rise of tools for machine learning. Notice that PyTorch is the third most loved framework; PyTorch is another open source deep learning framework, but one developed and released by researchers from Facebook.
As a data scientist at Stack Overflow, I spend a lot of time thinking about how technologies are related to each other, and we can specifically think about that in the context of machine learning technologies on the Developer Survey this year. If we look at all the technologies we asked about on the survey, from languages to databases to IDEs to platforms, which were used most often in the context of machine learning? For example, which technologies are most highly correlated with TensorFlow?
Here we see which technologies are most likely to be used by a developer who also uses TensorFlow, compared to those who do not. The most highly correlated technology is Torch/PyTorch; this is interesting because it is effectively a competing framework. Next comes the popular Jupyter Notebook IDE used by many data scientists, and then the two big language players when it comes to machine learning, Python and R. Python has a larger user base, but I personally am an R developer. Most developers interact with TensorFlow via the Python API, but R has excellent support for TensorFlow as well. The other technologies here include other IDEs that focus on data science and/or Python work, like RStudio and PyCharm, and big data technologies such as Apache Spark, Apache Hadoop, and Google BigQuery.
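The underlying idea can be sketched with a small, hypothetical example (the data below is invented, and the survey's actual analysis is more sophisticated): with one boolean column per technology and one row per respondent, pairwise correlation shows which tools tend to co-occur with TensorFlow.

# Hypothetical sketch: which technologies co-occur with TensorFlow usage?
import pandas as pd

tech_usage = pd.DataFrame({            # one row per respondent, invented values
    "TensorFlow": [1, 1, 0, 1, 0, 0],
    "PyTorch":    [1, 1, 0, 0, 0, 0],
    "Python":     [1, 1, 1, 1, 0, 1],
    "R":          [0, 1, 0, 1, 1, 0],
})

corr_with_tf = tech_usage.corr()["TensorFlow"].drop("TensorFlow")
print(corr_with_tf.sort_values(ascending=False))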
Python is the programming language most correlated with TensorFlow, and in fact, Python has a solid claim to being the fastest-growing major programming language. This year Python again climbed in the popularity ranks on our survey, passing C# this year much like it surpassed PHP last year. Software developers are aware of this, and this year we found that Python was the most wanted language, meaning that out of developers who are not working with each technology, the highest percentage want to start this coming year.
This plot shows the top 15 languages that were most wanted. What languages were least wanted this year? We find that VBA, Delphi/Object Pascal, Cobol, and Visual Basic 6 are the least attractive today. These languages certainly lack the name recognition of Python, but more substantively, they do not have the large, vibrant communities working on modern problems like machine learning. June 2017 was the first month that Python was the most visited tag on Stack Overflow in high-income countries like the United States and the United Kingdom. We find that the incredible growth of Python is driven largely by data science and machine learning, rather than web development or systems administration.
Machine learning offers organizations the opportunity to use their data to make good decisions; our own data at Stack Overflow demonstrates that our industry is embracing this possibility and that the use of machine learning is on the rise. If you are a developer interested in machine learning, TensorFlow and deep learning are likely not the best place to start. Instead, focus on gaining statistical competency and putting it into practice in your daily work.
The future of digital technology is here. 2017 saw incredible progression for things like data science, artificial intelligence, and machine learning. Where will they go in 2018? In this article, Maria Thomas explores the future of data science and how well it can be combined with predictive analysis.
With the onset of digital technology, data is growing aggressively, and data science is emerging at a rapid pace. It is an integrative blend of scientific methods, designs, and systems for extracting knowledge from data; as a result, qualitative analysis is gradually overlapping with quantitative analysis. Machine learning, on the other hand, is a set of algorithms that learn from a data set to make forecasts or take actions that improve some system. Once these algorithms run automatically, with no human intervention and only mechanized control, they are more popularly known as artificial intelligence (AI).
In 2017, we witnessed data science making a pathway for AI and machine learning at center stage of the technology cycle. Artificial Intelligence and machine learning (ML) had been trending topics for the whole year. AI applications have been increasingly used in numerous industries, including security, financial services, entertainment, automobiles, and more. There was a drastic growth in a variety of platforms like cloud machine learning and data science platforms.
In 2018, AI is gaining momentum across various process developments, and more practitioners are rising to the challenge of demonstrating the positive benefits of AI to the skeptics. This is the year in which we are likely to see a new focus of big data, AI, and ML on areas like customer service, machine intelligence, process automation, workforce transformation, and more. In previous years, it was essential for data scientists and analysts to know exactly which algorithm fits the bill. Now, machine learning and automation have made it easier for analysts to evaluate different algorithms. This year we are also likely to see advancements in IoT, such as improved security features, more interoperable platforms, and edge analytics.
Utilizing data science, AI, and ML as a process is increasingly popular and is being embraced across a wide range of industries and applications. Most businesses are inclined to use open source applications and data management software to build critical neural network systems, expedite their supply chain procedures, or anticipate customer expectations.
According to McAfee Labs' 2018 Threats Report, machine learning will increasingly be applied to cyber-intrusion, scam, and spam detection. It can also be used to detect malware at machine speed in serverless environments. With the growing number of cyber-attacks, AI and ML are both helping companies improve their security approaches. Developers might also be able to implement blockchain as a feasible way to counter network intrusion and ensure safe data keeping.
The combination of predictive analytics with data science allows enterprises to reap all sorts of benefits. For example, an organization can adopt a predictive approach to recruiting and save millions in turnover and attrition costs. What used to take days to execute can now be done in a matter of seconds using AI and machine learning techniques.
Big data, artificial intelligence, and machine learning are likely to generate new job opportunities in 2018. This year we should see a steep upward trend in demand for specialists with professional competency in emerging technologies such as big data, artificial intelligence, and machine learning. Although big data and analytics remain the most sought-after professional skill sets for companies across different sectors, AI and ML are not far behind.
AI and ML hold a lot of promise for many companies in the current year. It is estimated that almost one in five companies will use AI for decision-making purposes this year. AI will assist companies in offering personalized solutions and real-time guidance to employees. With deep learning, it will be easier for companies to analyze both structured and unstructured data on text analytics platforms.
We can remember 2017 as the year in which new automated analytics platforms emerged, primarily focused on combining sophisticated, automated capabilities that rejuvenate every facet of data science. With the evolution of digitization, the analytics industry witnessed the emergence of artificial intelligence and machine learning in 2017. In the coming years, this automated technology will likely continue to grow seamlessly and deliver on the promise of the most sophisticated and intelligent automated analytics solutions of the digital era.
Machine learning inevitably adds black boxes to automated systems and there is clearly an ethical debate about the acceptability of appropriating ML for a number of uses. The risks can be mitigated with five straightforward principles.
Controversy arose not long ago when it became clear that Google had provided its TensorFlow APIs to the US Department of Defense’s Project Maven. Google’s assistance was needed to flag images from hours of drone footage for analyst review.
There is clearly an ethical debate about the acceptability of appropriating ML for military uses. More broadly, there is also an important debate about the unintended consequences of ML that has not been weaponized but simply behaves unexpectedly, which results in damage or loss.
Adding ML to a system inevitably introduces a black box and that generates risks. ML is most usefully applied to problems beyond the scope of human understanding where it can identify patterns that we never could. The tradeoff is that those patterns cannot be easily explained.
In current applications that are not fully autonomous and generally keep humans in the loop, the risks might be limited. Your Amazon Echo might just accidentally order cat food in response to a TV advert. However, as ML is deployed more widely and in more critical applications, and as those autonomous systems (or AIs) become faster and more efficient, the impact of errors also scales – structural discrimination in training data can be amplified into life-changing impacts entirely unintentionally.
Since Asimov wrote his Three Laws of Robotics in 1942, philosophers have debated how to ensure that autonomous systems are safe from unintended consequences. As the capabilities of AI have grown, driven primarily by recent advances in ML, academics and industry leaders have stepped up their collaboration in this area, notably at the Asilomar conference on Beneficial AI in 2017 (where attendees produced 23 principles to ensure AI is beneficial) and through the work of the Future of Life Institute and the OpenAI organisation.
As AI use cases and risks have become more clearly understood, the conversation has entered the political sphere. The Japanese government was an early proponent of harmonized rules for AI systems, proposing a set of 8 principles to members of the G7 in April 2016.
In December 2016, the White House published a report summarizing its work on "Artificial Intelligence, Automation, and the Economy", which followed an earlier report titled "Preparing for the Future of Artificial Intelligence". Together, these highlighted opportunities and areas that need to be advanced in the USA.
In February 2017, the European Parliament Legal Affairs Committee made recommendations about EU wide liability rules for AI and robotics. MEPs also asked the European Commission to review the possibility of establishing a European agency for robotics and AI. This would provide technical, ethical and regulatory expertise to public bodies.
The UK's House of Commons conducted a Select Committee investigation into robotics and AI and concluded that it was too soon to set a legal or regulatory framework. However, it did highlight the following priorities that would require public dialogue and eventually standards or regulation: verification and validation; decision-making transparency; minimizing bias; privacy and consent; and accountability and liability. This is now being followed by a further Lords Select Committee investigation, which will report in Spring 2018.
The domain of autonomous vehicles, being somewhat more tangible than many other applications for AI, seems to have seen the most progress on developing rules. For example, the Singaporean, US and German governments have outlined how the regulatory framework for autonomous vehicles will operate. These are much more concrete than the general principles being talked about for other applications of AI.
In response to a perceived gap in the response from legislators, many businesses are putting in place their own standards to deal with legal and ethical concerns. At an individual business level, Google DeepMind has its own ethics board and Independent Reviewers. At an industry level, the Partnership on AI between Amazon, Apple, Google Deepmind, Facebook, IBM, and Microsoft was formed in early 2017 to study and share best practice. It has since been joined by academic institutions and more commercial partners like eBay, Salesforce, Intel, McKinsey, SAP, Sony as well as charities like UNICEF.
Standards are also being developed. The Institute of Electrical and Electronics Engineers (IEEE) has rolled out a standards project (“P7000 — Model Process for Addressing Ethical Concerns During System Design”) to guide how AI agents handle data and ultimately to ensure that AI will act ethically.
As long as these bottom-up, industry-led efforts prevent serious accidents and problems from happening, policymakers are unlikely to put much priority on setting laws and regulations. That, in turn, could benefit developers by preventing innovation from being stifled by potentially heavy-handed rules. On the other hand, this might just store up a knee-jerk reaction for later – accidents are perhaps inevitable, and the goals of businesses and governments are not necessarily completely aligned.
As the most significant approach in modern AI, ML development needs to abide by some principles that mitigate its risks. It is not clear who will ultimately impose rules, if any are imposed at all. Nonetheless, some consensus seems to have emerged that the principles identified by the various groups above are the important ones to capture in law and in working practices:
Together, these principles, whether enshrined in standards, rules, or regulations, would give a framework for ML to flourish and continue to contribute to exciting applications whilst minimizing risks to society from unintended consequences. Putting this into practice for ML would start with establishing a clear scope of work and a responsible person for each ML project. Developers will need to evaluate architectures that enable explainability to the maximum extent possible and develop processes to filter out inaccurate and unreliable inputs from training and validation sets. This would be underpinned by audit procedures that can be understood and trusted.
This is not an exhaustive list and continued debate is required to understand how data can be used fairly and how much explainability is required. Neither will all risks be eliminated but putting the above principles into practice will help minimize them.
How can developers learn to utilize machine learning in their DevOps practice? In this article, Prasanthi Korada goes over some basic approaches that can help developers apply cutting edge tech like machine learning to their everyday work.
DevOps methodologies are spreading rapidly and generating vast, diverse data sets across the entire application life cycle, including development, deployment, and performance management. Only a robust analysis and monitoring layer can harness this data for the ultimate DevOps goal: end-to-end automation.
The rise of machine learning and its related capabilities, such as artificial intelligence and predictive analytics, has pushed organizations to explore new analysis models that rely mainly on mathematical algorithms. The overall impact of these tools on data-driven automation is still limited, however, due to busy DevOps teams and a lack of practitioners who genuinely understand machine learning, AI, and predictive analytics.
Machine learning's black-box approach also runs counter to conventional analytics procedures, which let the analyst adjust an algorithm iteratively until it becomes sufficiently accurate. Today, it is essential for DevOps engineers to know how infrastructure works, how to utilize DBaaS, and how to code in the cloud. Since most DevOps engineers are not mathematicians, adding machine learning algorithms to this skill set is not an easy thing.
Despite the obstacles and challenges, the adoption of machine learning is only going to grow as high salaries push a number of IT engineers into this space. Although several DevOps vendors have added machine learning to their products, this does not exempt enterprises from the need to write their code in order to optimize their automation capabilities.
Logs alone can take up gigabytes of storage per week, far more data than teams can manage by hand. Most of the data generated in DevOps processes comes from application deployment, server logs, and the transaction traces produced by application monitoring. The most practical way to analyze data at this scale in real time is machine learning. Let's have a look at how machine learning enhances DevOps practices.
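As a hedged sketch of what that might look like in practice (the metric names and numbers are invented, and scikit-learn is an assumption rather than anything named in the article), an Isolation Forest can flag anomalous windows in operational metrics derived from logs:

# Illustrative sketch: flag anomalous minutes in log-derived server metrics.
import pandas as pd
from sklearn.ensemble import IsolationForest

metrics = pd.DataFrame({
    "requests_per_min": [120, 118, 125, 119, 900, 121],      # one obvious spike
    "error_rate":       [0.01, 0.02, 0.01, 0.01, 0.35, 0.02],
    "avg_latency_ms":   [210, 205, 220, 215, 1800, 208],
})

detector = IsolationForest(contamination=0.2, random_state=0).fit(metrics)
metrics["anomaly"] = detector.predict(metrics)   # -1 marks suspected anomalies
print(metrics[metrics["anomaly"] == -1])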
DevOps teams cannot realistically analyze the entire data set, as there is a plethora of data. For this reason, they set thresholds as a condition for action and concentrate primarily on outliers instead of focusing on substantial data chunks. The problem is that outliers usually provide indications but don't paint the detailed picture.
DevOps teams do, occasionally, make mistakes, and they cannot always resolve problems while they are in the middle of an incident. Machine learning systems can help them analyze the data and show what has happened recently. They can examine everything from daily to monthly trends and provide a bird's-eye view of the application at any point in time.
Professional DevOps teams use more than one tool to view and act upon a given set of data. Each tool monitors the application from its own angle, considering parameters like application health and performance. Machine learning systems are capable of collecting the inputs from all these tools and painting an integrated view.
If you need to measure your orchestration process adequately, you can use machine learning to assess team performance. Poor orchestration leads to limitations, so looking at these characteristics can help you improve both tools and processes.
This is about investigating failure patterns. Developers need to be proactive about looking for faults. If you know that your systems deliver specific readings in the event of a failure, a machine learning application can search for the patterns that accompany a particular kind of fault. Once you understand the underlying cause of the failure, you can find a way to prevent it from happening.
Giving teams a chance to properly fix performance or availability issues bodes well for the quality of the application. Most often, teams don't research failures completely because they focus on getting back online as soon as possible. If automation gets things running again, the root cause mostly gets lost. Be careful not to let this slide.
Without the advent of big data, AI and machine learning models would have remained just models and would never have been implemented. IoT and cloud computing have an interdependent relationship; likewise, the real-time effectiveness of machine learning systems relies on the DevOps processes that enable agile software development. Hence, applying machine learning to DevOps enhances teams' capability to perform cloud-based operations more efficiently.
Are you ready for machine learning? Do you have a plan? In this article, Atakan Cetinsoy from BigML goes over six things that every organization needs to be aware of when they’re devising their own machine learning strategy.
As of the end of 2017, the top five public companies of the world by market capitalization were Apple, Alphabet, Microsoft, Amazon, and Facebook — all digital natives. One of the common traits between these companies is the fact that they deal with 1s and 0s instead of tangible assets and they are in possession of vast amounts of already digitalized business and consumer data. Add on top large R&D budgets, access to elite academic talent, open-mindedness towards systematic experimentation and it is no surprise that, to date, they have been able to leverage machine learning to its fullest by launching numerous internal and end-user facing smart applications.
Having noticed these positive examples, most business leaders in other industries have by now figured out that this machine learning thing really matters, and it ain't going away anytime soon. So the waves of automation and data-driven decision making have recently started crashing on their shores as these businesses slowly but surely make headway with their digital transformation initiatives. This includes banks, insurance companies, manufacturers, healthcare organizations, and the consumer discretionary and staples industries, among others. But no matter where you are in the process of adopting machine learning, there is one critical thing you need to know if you want to avoid a mess of false starts and get ahead of everyone else.
The very idea of a platform, if implemented correctly, is that it is designed so that all aspects of the workflow are accounted for and work well together. The current experience of ML projects taking months or years to find their way to production is unacceptable and not scalable. To a large extent, such projects rely on building bespoke ML systems by letting expensive experts cobble together open source components. And even then, once they finally have it working, they are not done, because there is no easy way to put the hand-crafted models into production.
Figure 1: What companies really need from ML.
To be clear, open source is not the problem itself. There are lots of great open-source tools. In fact, BigML relies on open source as well for certain aspects. The problem is that the open-source tools are typically focused on one thing, so you end up with a big puzzle of incompatible pieces that have to be glued together with custom code that likely won’t stand the test of time. And you get to write the glue.
With an MLaaS platform like BigML, you get powerful machine learning, a full API, visualizations that make exploration and rapid prototyping easy, and white-box models that can be quickly put into production. Everything is already put together, saving you all that time and any future headaches due to accumulating technical debt.
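To give a flavor of what "everything already put together" means, the sketch below walks through the canonical source-to-prediction flow with BigML's Python bindings. Treat it as illustrative: the file name and input fields are placeholders, and the current bindings documentation is the authority on exact behavior.

# Illustrative only: source -> dataset -> model -> prediction with the BigML
# Python bindings (pip install bigml). Credentials are read from the
# BIGML_USERNAME / BIGML_API_KEY environment variables; 'churn.csv' and the
# input fields below are placeholders.
from bigml.api import BigML

api = BigML()                                    # uses environment variables for auth
source = api.create_source("churn.csv")
api.ok(source)                                   # wait until the resource is ready
dataset = api.create_dataset(source)
api.ok(dataset)
model = api.create_model(dataset)
api.ok(model)
prediction = api.create_prediction(model, {"plan": "basic", "monthly_minutes": 410})
print(prediction["object"]["output"])            # the predicted value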
In a custom ML implementation scenario, Pareto's Principle implies that 80% of the system will be produced by just 20% of your team. So, in a team of five people, there is almost assuredly one critical employee. When that person leaves, no one left on the team will fully know how to finish or maintain it. And guess what? That one person is in huge demand. Are you certain that you can keep them long enough to finish the project? For that matter, are you sure what they are building has been tested?
When you adopt a machine learning platform, you are joining a community of tens of thousands of users all over the world, who have built hundreds of millions of models. You can rest easy knowing that the platform has been thoroughly tested and has survived the ravages of real-world data in all its varieties.
Figure 2: Explore more ML!
If you decide to roll your own solution based on disparate open source components, how many of your employees are going to understand how to use it? The truth is, machine learning is the modern spreadsheet for massive data, and most knowledge workers can benefit from it. This insight hasn't escaped the attention of tech players like Facebook, Airbnb, LinkedIn, and Google, all of whom rely on ML for future innovation:
“In late 2014, we set out to redefine Machine Learning platforms at Facebook from the ground up and to put state-of-the-art algorithms in AI and ML at the fingertips of every Facebook engineer.”
That should be the vision of your company, as well — except, you don’t need to spend millions of dollars and years of research to build your own easy-to-use ML tools.
Speaking of everyone using machine learning, adopting a Machine Learning platform has another significant advantage: it makes it easy to collaborate. Resources, like models, can be shared with a secret link making it possible to send someone a URL that when clicked lets them interact with the model you built and then just as easily use it to make predictions. Commonly used resources, like a dataset, can also be shared in a gallery making it possible for a small team to curate data and then share it for everyone to use. In a private deployment, which allows your company to operate its MLaaS in a private cloud or even on-premises, these resources can be shared privately with everyone within your organization. This is especially conducive for better interpretability and risk management in the new era of data privacy regulations like the GDPR of European Union that goes into effect later in 2018.
There are often other steps that need to be performed, like transforming your data, filtering it, augmenting it with new features, and so on. It is extremely rare that a real-world problem will be solvable without implementing a workflow composed of such steps. The good news is that these workflows are often reusable, running the same series of steps over and over with new data. The bad news is that if you are rolling your own solution, then you are rebuilding these workflows every time. State-of-the-art MLaaS platforms also come with their own data transformation languages and workflow automation capabilities that make it possible to separate the workflow logic from the data. The beauty of this is that these workflows can then be easily shared and reused, extending the functionality of the platform.
The importance of automation can't be overstated. Your data is not static; it will change, and you need to build a system that can adapt along with it. If refreshing your models requires reinventing the wheel and takes weeks or months, the updated models may no longer be relevant by the time they ship! By automating the entire workflow with MLaaS scripting capabilities, models can be rebuilt every day if needed. This is possible because MLaaS platforms such as BigML include APIs. In fact, the API is in some ways the core product. This means that every single action can be done programmatically, with bindings in many languages to make it as easy as possible to get started.
This rounds up the conceptual reasons why MLaaS makes sense, but as the saying goes, the proof is in the pudding. On that front, BigML customers keep adding to the creative ways machine learning applications are deployed without expensive consultants and custom glue code. For instance, NDA Lynn recently launched its automated NDA checker service, training its models first on hundreds and then on thousands of variations of non-disclosure agreements.
This collection of data produced interesting patterns that can serve as early warning signs for NDA Lynn customers looking to address any undue risks before agreeing to the terms stated in their NDA. This simple, narrow-AI example will likely find its way to many other types of contracts over time as digital data samples increase in size and the need to manage risks in a quantifiable way mounts in today’s ultra-competitive legal marketplace.
Hopefully, you can see the benefits of bringing a machine learning platform to your company. We find that this platform message resonates with people who are innovators, the doers in a company that wants actionable results, and more often than not, the people who are specifically tasked with evaluating new technologies for their company.
However, it’s still the early days of machine learning. Just like in the early days of e-commerce sites when everyone who wanted a shopping cart would hack together some CGI and HTML into a custom system. And the people that could code those monstrosities were in high demand and paid handsomely for their effort. Sound familiar?
But who does that now? Well, no one.
That's because the entire process has been commoditized. And this is a good thing, because those early days saw a lot of repetitive work and wasted time. The same thing is happening with machine learning right now, and it should be clear which choice is better for the success of your company. Resistance to this shift will fade eventually, but by then everyone will be using machine learning!
Amazon has been using predictive models for decades. Now, they want to put machine learning in the hands of every developer with their AWS AI division. In this article, Cyrus Vahid, an AI specialist from AWS, explains some basic models for deep learning and goes over how Amazon has a service for every use case.
Machine learning, in short, is software that can learn and improve patterns and rules from experience (a.k.a. data) without having those rules explicitly implemented. In machine learning, instead of implementing rules, we fit a model. Let us look at a simple example of a linear model. If our data points are scattered in the following distribution, we can fit a model to the data, shown by the green line.
Figure 1: Linear regression
A 2D line can be represented by the following equation:
f(x)=ax+b
in which x is the input data. Fitting the model means finding optimal parameters a and b so that the function f best represents unseen data points, which gives the model the ability to generalize beyond direct experience.
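A minimal sketch of that fitting step (the data points below are synthetic, and NumPy is an assumption) uses a least-squares routine to recover a and b and then predicts a point it has never seen:

# Fit f(x) = a*x + b to synthetic data and predict an unseen point.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])        # roughly y = 2x + 1 plus noise

a, b = np.polyfit(x, y, deg=1)                  # optimal slope and intercept
print(f"a={a:.2f}, b={b:.2f}")
print("prediction for x=5:", a * 5 + b)         # generalizing beyond the seen points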
Amazon has been building such predictive models since the mid-1990s. Over the years, ML has become a core component of Amazon and consequently of AWS. The mission of the AWS AI team is to "put machine learning in the hands of every developer and data scientist".
To achieve this goal, AWS services cover a large set of use cases including Computer Vision, Natural Language Processing, pre-trained algorithms for most common use cases, ML tools and frameworks, GPU, and FPGA.
Complex tasks such as vision often require solving differential equations in high-dimensional spaces. The mathematics of such equations is simply not solvable directly. Such problems are addressed through function approximation, in which we find a function that produces the same results within an acceptable error threshold ϵ. More formally, if f is our high-dimensional function, we want to find a function F so that:
|F(x)-f(x)|<ϵ
Deep Neural Networks are proven universal function approximators and are, thus, capable of solving high dimensional equations through approximation given that certain conditions are met.
As in most fields of science and engineering, when solving a complex problem, evolution is often the best source of inspiration, and the human brain was the natural place to look. Our brain encodes information in a distributed network of neurons connected through a network of connections, or synapses. When a stimulus evokes a recall, all related components of a specific memory are activated together and the information is reconstructed. Connections between neurons have different strengths, which controls how much a piece of information participates in the recall.
In the most abstract way, neural networks mimic this process by encoding data in floating point vector representations, connecting these nodes to one another, and assigning a weight to each connection to represent the importance of a node for activation.
A deep neural network has an input layer, an output layer, and one or more "hidden" layers, which act as intermediary computational nodes. The value of each node is computed by multiplying the inputs to the node by their associated weights, i.e. computing a weighted sum of all the inputs to the node.
Figure 2: Multilayer Perceptron
The value of hidden node h11 is computed as h11 = Φ(I1·w11 + I2·w21 + ⋯ + In·wn1), in which Φ is a non-linear function called the activation function. More generally, if the input to a node is X = {x1, x2, …, xn} and the associated weight vector is W = {w1, w2, …, wn}, then f(X, W) = Φ(b + ∑i wi·xi).
The learning process involves: (1) computing output values layer by layer in the forward pass, (2) comparing the computed output to the expected output, (3) measuring how much the computed output differs from the expected output, and (4) adjusting the weights on the backward pass by distributing blame on the basis of the distance calculated in step (3).
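A compact sketch of those four steps for a single layer with a sigmoid activation, written in plain NumPy (this is illustrative code, not taken from the article, and the data is synthetic):

# Single-layer forward/backward pass: forward, compare, measure, adjust.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                      # 8 samples, 3 inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

W = rng.normal(size=(3, 1))
b = np.zeros((1, 1))
phi = lambda z: 1.0 / (1.0 + np.exp(-z))         # activation function Φ (sigmoid)

for _ in range(200):
    out = phi(X @ W + b)                         # (1) forward pass
    err = out - y                                # (2)+(3) compare and measure the error
    grad_z = err * out * (1 - out)               # (4) distribute blame backwards
    W -= 0.5 * X.T @ grad_z / len(X)             #     and adjust the weights
    b -= 0.5 * grad_z.mean(axis=0, keepdims=True)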
Deep learning has become an essential part of machine learning in recent times due to the development of GPU processors performing over 10^12 floating-point operations per second (teraFLOPS) and an explosion in available data. These improvements have resulted in the development of powerful algorithms that solve complex problems such as translation.
On the software side, the mathematics of deep learning is now being abstracted in libraries and deep learning frameworks. Some of the most popular frameworks include:
Deep learning frameworks are built to optimize parallel processing on GPUs and to tune training computation for the underlying hardware. Using these frameworks, a complex image classification problem can be reduced to a few hundred lines of code without requiring any knowledge of the mathematics behind the scenes, while leaving optimization largely to the framework. In short, deep learning frameworks turn a research problem into a programming task.
AWS provides a Deep Learning AMI that supports all the above-mentioned frameworks and more. The AMI is available as a community AMI and comes in Ubuntu and Amazon Linux flavors. Customers like ZocDoc use the Deep Learning AMI to build patient confidence using TensorFlow.
Some of the most researched areas of deep learning are:
Computer Vision
Image classification, scene detection, face detection, and many more use cases for both videos and still pictures fall under this category. The application of computer vision in medical imaging and diagnosis is on the rise, and self-driving cars make heavy use of computer vision algorithms. There are several ways for developers to benefit from computer vision in the products they build without having to implement models themselves.
The simplest method is the Amazon Rekognition API for images and videos, which provides an SDK and REST endpoints for various computer vision tasks.
Figure 3: Example for image labelling using Amazon Rekognition Java API
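The figure shows the Java API; a rough Python equivalent using boto3 looks like the sketch below, where the bucket, object key, and thresholds are placeholders:

# Hedged sketch: label an image stored in S3 with Amazon Rekognition via boto3.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/beach.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)
for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))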
If the API-based service is not sufficient, then there are further choices available to explore.
The next best choice is to use the Image Classification algorithm from among the Amazon SageMaker built-in algorithms. With this algorithm, you can use your own dataset and labels to fit the model to your data. Training a model is then no more than a few lines of code using the Amazon SageMaker Python SDK or the SageMaker APIs.
Figure 4: Image Classification in Amazon SageMaker
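Roughly, training the built-in image classification algorithm with the SageMaker Python SDK follows the outline below. The container URI, IAM role, S3 paths, and hyper-parameters are placeholders, and parameter names differ slightly between SDK versions, so treat this as a sketch rather than copy-paste code.

# Outline: train a SageMaker built-in algorithm with the Python SDK (placeholders throughout).
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
estimator = Estimator(
    image_uri="<image-classification-container-uri>",   # region-specific built-in image
    role="<your-sagemaker-execution-role>",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/output/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(num_classes=10, num_training_samples=50000, epochs=10)
estimator.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})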
Even if this solution is not sufficient, there is always the possibility of building your own algorithms or using a model zoo. Amazon SageMaker provides you with a fully managed, end-to-end, zero-setup ML environment which you can use to develop your ML code, train your model, and publish the endpoint to an elastic environment based on Amazon Elastic Container Service. Training and hosting a model in Amazon SageMaker is a single line of code per task using the Python SDK, and a few lines should you choose to use the SageMaker API. You can always start from an existing code base and alter the model to fit your problem.
Using MXNet and gluon model_zoo, you have access to a variety of pre-trained models. You can simply import a model from model_zoo, train it with your data and perform inference. There are several available models in model_zoo.
Figure 5: Using mxnet.gluon.model_zoo.vision.squeezenet for a computer vision task
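A sketch of that workflow in Python (the image path is a placeholder, and the preprocessing values are the usual ImageNet statistics) loads a pre-trained SqueezeNet from the gluon model zoo and runs inference:

# Sketch: classify an image with a pre-trained SqueezeNet from the gluon model zoo.
import mxnet as mx
from mxnet.gluon.model_zoo import vision
from mxnet.gluon.data.vision import transforms

net = vision.squeezenet1_1(pretrained=True)      # downloads pre-trained weights

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = mx.image.imread("example.jpg")              # placeholder image path
batch = transform(img).expand_dims(axis=0)        # add a batch dimension
probs = mx.nd.softmax(net(batch))
print("predicted class index:", int(probs.argmax(axis=1).asscalar()))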
If you would still like to go deeper, you can start your model development from scratch. The following is sample code for building a simple network for handwritten-digit recognition using Apache MXNet and gluon.
Figure 6: Gluon sample code for MNIST
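The figure itself is not reproduced here; a minimal gluon training loop for MNIST along those lines might look like the following sketch (the hyper-parameters are arbitrary):

# Minimal gluon MLP for MNIST, as a stand-in for the figure (illustrative only).
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.gluon import nn

def to_tensor(data, label):
    return data.astype("float32") / 255.0, label

train_data = gluon.data.DataLoader(
    gluon.data.vision.MNIST(train=True, transform=to_tensor), batch_size=64, shuffle=True)

net = nn.Sequential()
net.add(nn.Flatten(), nn.Dense(128, activation="relu"), nn.Dense(10))
net.initialize(mx.init.Xavier())

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})

for epoch in range(2):
    for data, label in train_data:
        with autograd.record():                    # forward pass with gradient tracking
            loss = loss_fn(net(data), label)
        loss.backward()                            # backward pass
        trainer.step(batch_size=data.shape[0])     # update the weights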
The final depth you might be interested in is to implement your own code if you choose not to use MXNet or TensorFlow. You can still develop your model elsewhere, using your platform of choice and host it on Amazon SageMaker.
Other notable use cases for ML are:
Natural Language Processing
AWS provides Translation, Conversational Chat Bot, Text to Speech, and Voice to Text (ASR) services as API services, while Amazon SageMaker built-in algorithms provide Topic Modeling, Word Embedding, and Translation algorithms.
Recommendation
Amazon SageMaker built-in algorithms include a state-of-the-art distributed factorization machine that can train across several GPU instances.
Forecasting
Amazon SageMaker built-in algorithms include a state-of-the-art forecasting algorithm called DeepAR. Using this algorithm, you can perform time-series prediction on your own dataset.
There are other algorithms implemented as part of the SageMaker built-in algorithms. I encourage you to refer to the documentation for further details.
Amazon Rekognition Video can track people, detect activities, and recognize objects, faces, etc. in videos. Its API is powered by computer vision models that are trained to accurately detect thousands of objects and activities, and extract motion-based context from both live video streams and video content stored in Amazon S3. The solution can automatically tag specific sections of video with labels and locations (e.g. beach, sun, child), detect activities (e.g. running, jumping, swimming), recognize, and analyze faces, and track multiple people, even if they are partially hidden from view in the video.
AWS DeepLens is a deep-learning enabled, fully programmable video camera. It can run sophisticated deep learning computer vision models in real-time and comes with sample projects, example code, and pre-trained models so developers with no machine learning experience can run their first deep learning model in a very short time.
Amazon SageMaker makes model building and training easier by providing pre-built development notebooks, popular machine learning algorithms optimized for petabyte-scale datasets, and automatic model tuning. It simplifies and accelerates the training process, automatically provisioning and managing the infrastructure to both train models and run inference to make predictions using these models.
Amazon Translate uses neural machine translation techniques to provide highly accurate translation of text from one language to another. Currently it supports translation between English and six other languages (Arabic, French, German, Portuguese, Simplified Chinese, and Spanish), with many more to come this year.
Amazon Transcribe converts speech to text, allowing developers to turn audio into accurate, fully punctuated text. It supports English and Spanish with more languages to follow. In the coming months, Amazon Transcribe will have the ability to recognize multiple speakers in an audio file, and will also allow developers to upload custom vocabulary for more accurate transcription for those words.
Amazon Lex is a service for building conversational interfaces into any application using voice and text. The service provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions. Amazon Lex enables you to easily build sophisticated, natural language, conversational bots (“chatbots”).
Amazon Polly turns text into lifelike speech, allowing programmers to create applications that talk, and build entirely new categories of speech-enabled products. It uses advanced deep learning technologies to sound like a human voice.
With dozens of voices across a variety of languages, developers can select the ideal voice and build speech-enabled applications that work in different countries.
Amazon Comprehend can understand natural language text from documents, social network posts, articles, or any other textual data. The service uses deep learning techniques to identify text entities (e.g. people, places, dates, organizations), the language the text is written in, the sentiment expressed in the text, and key phrases with concepts and adjectives, such as ‘beautiful,’ ‘warm,’ or ‘sunny.’ Amazon Comprehend has been trained on a wide range of datasets, including product descriptions and customer reviews from Amazon.com, to build language models that extract key insights from text. It also has a topic modeling capability that helps applications extract common topics from a corpus of documents.
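As a brief hedged sketch (the sample sentence is invented), calling Amazon Comprehend from Python with boto3 looks like this:

# Sentiment and entity detection with Amazon Comprehend via boto3.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")
text = "The hotel was beautiful and the staff in Lisbon were wonderfully warm."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
entities = comprehend.detect_entities(Text=text, LanguageCode="en")

print(sentiment["Sentiment"])                                  # e.g. POSITIVE
print([(e["Text"], e["Type"]) for e in entities["Entities"]])  # e.g. [('Lisbon', 'LOCATION')]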
There are a number of issues that arise when data scientists and ML researchers meet DevOps to try to deploy, audit, and maintain state-of-the-art AI models in a production and commercial environment. Peter Zhokhov and Arshak Navruzyan discuss a new open source software tool called Studio.ML, which offers a number of solutions to this problem.
Machine learning (ML) and artificial intelligence (AI) enabled technologies permeate most industries these days. Much like software in the early computer era, ML and AI have stopped being a toy for researchers and have become a serious revenue-generating instrument for businesses. Similar to how software development is driven by business goals, there is a need to make the cycle from development to production of ML models shorter, more robust, and more easily reproducible. This need often collides with the "new and shiny thing" syndrome, a known anti-pattern in software engineering. But for ML/AI-driven enterprises, the ability to use the results of the latest research often means a competitive edge and a tangible profit.
In other words, if your company’s data scientists and platform engineers make each other’s lives miserable, don’t be alarmed — it is not unusual. In fact, that is exactly the situation we found ourselves in at Sentient Technologies. When we tried to build a production-robust deep learning and distributed computing framework to work with Java, we found that all of our newly hired data scientists preferred building the models using Python instead.
To elaborate on the origin of the problem, consider the typical data science and DevOps pipelines in Fig. 1. To be clear, we are not claiming to cover all possible use cases with this simplified picture, only those most frequent and relevant to productizing AI/ML models. In the data science world, as seen in row A, the typical process starts with gathering the data, iterating on the data, and debugging the code. Then the heavy compute usually kicks in, optimizing hyper-parameters and neural network architecture, either by hand or using modern automatic methods such as neuro-evolution. Finally, the process yields the best trained model. After that, data scientists usually do a nebulous thing called "pushing the model to production", which usually comes down to handing over the best model to the DevOps or engineering team.
Figure 1: The typical data science and DevOps pipeline.
At the same time, in the DevOps world shown in row B, the process starts with a container. The container has to pass a bunch of unit and regression tests, get promoted to staging, pass soak tests, and then go to production. The model container then becomes a part of the microservice architecture and starts interacting with other production system components before it gets autoscaled and load-balanced. With these two worlds in mind, one can see the cracks that things tend to fall into: when the model is not tied to the container well enough, it bounces around when the container is shipped over rough seas.
DevOps tends to test a container functionality nominally — without checking if the model functions correctly — whereas data scientists’ tests do not take into account the fact that the model will be working in a container. The pre-processing functions locations of weight files, etc. may be in different folders. Our experience tells us that it is these types of small things that end up being overlooked.
This DevOps and data science dichotomy has pushed us to create the open-source project Studio.ML. Studio.ML aims to bridge the gap between research-y data science that has to chase the new and shiny thing as often as needed and the kind of good software engineering or DevOps practices that make results fully reproducible and avoid “runs on my machine” problems. Studio.ML also automates menial data science tasks such as hyper-parameter optimization and leverages cloud computing resources in a seamless way without any extra cognitive load on the data scientist to learn about containers or instance AMIs.
The core idea of the project is to be as non-invasive into the data scientist’s or researcher’s code as possible, storing experiment data centrally to make experiments reproducible, shareable, and comparable. In fact, Studio.ML can usually provide substantial value with no code changes whatsoever. The only requirement is that the code is in Python. Since our early days of building deep learning frameworks in Java and experimenting with Lua, Python has become the de facto standard for the machine learning community, thanks to its vast set of data analysis and deep learning libraries. So if you are a researcher working on the cutting edge of ML/AI, the requirement of writing Python code will likely be fulfilled naturally.
From a researcher’s perspective, if the following command line
python my_awesome_training_script.py arg1 arg2
trains the model, then
studio run my_awesome_training_script.py arg1 arg2
runs the training and stores the information necessary to reproduce the experiment, such as the set of Python packages with their versions, the command line, and the state of the current working directory. The logs of the experiment (i.e. the stdout and stderr streams) are stored and displayed in a simple UI:
Figure 2: How logs look in Studio.ML
If another researcher would like to re-run the same experiment, that can be done via
studio rerun <experiment_key>
Packaging experiments to be reproducible turns out to be an incredibly powerful idea. If an experiment can be reproduced on another researcher’s machine, it can also be run on a powerful datacenter machine with many GPUs or on a cloud machine. For example,
studio run --cloud=ec2 --gpus=1 my_awesome_training_script.py arg1 arg2
will run our experiment in Amazon EC2 using an instance with one GPU, just as if it were run locally. Note that to get the same result otherwise, the researcher has to either cooperate with a DevOps engineer or learn about EC2 AMIs, instance and tenancy types, GPU driver installation, and more.
Add to that other features such as hyper-parameter search, cheap spot/preemptible cloud instances, and integration with Jupyter notebooks, and it looks like Studio.ML is out there to make data scientists’ lives easier. But what about the other side of the barricades: the DevOps engineers? Studio.ML comes with serving capabilities as well, so a trained model can be served with a single command line:
studio serve <experiment_key>
On the one hand, this allows for simple containerization and deployment of the built models. But more importantly, simple serving enables unit/regression tests to be run by the data scientists themselves, eliminating a frequent failure mode in which the model behaves well in training and validation, but serving uses slightly different preprocessing code that is never tested. Another Studio.ML feature that makes DevOps engineers’ lives easier is built-in fault-tolerant data pipelines that can do batch inference on GPUs at high rates, while weeding out bad data.
Under the hood, Studio.ML consists of several loosely coupled components (experiment packer, worker, queueing system, metadata and artifact storage) that can be swapped to tailor Studio.ML to the individual needs of a project, such as custom compute farms or in-house storage of sensitive experiment data.
Studio.ML is still a fairly early-stage project, and it has been shaped a lot by the machine learning community. Even at this stage, it provides reproducible experiments with much less friction (code changes and cognitive load on data scientists) than mature services such as Google Cloud ML or Amazon SageMaker. If you’re interested in the data behind this claim, see our blog post about reproducing state-of-the-art AI models using SageMaker and Studio.ML.
In summary, the modern AI/ML-driven enterprise requires a combination of industry-grade reliability and reproducibility from its ML models and the research agility to leverage and contribute to state-of-the-art data science. Studio.ML addresses these demands in a concise and non-intrusive manner. In our vision, it will continue to bridge the gap between data scientists and DevOps engineers by introducing more and more advanced ML automation features.
Machine learning’s growth continues as it permeates industries far from its origins. Travel booking might not seem like a good fit at first, but Wilco van Duinkerken of trivago explains how ML is innovating the way you find and book your next holiday.
Everyone’s heard how machine learning has huge potential, how it could upend existing systems and change the world. But that only tells us so much — to really understand the potential of machine learning, you’ve got to focus on applications and outcomes. Travel isn’t the first industry that comes to mind when you think about machine learning. However, there are impressive innovations coming out of the travel sector with a foundation in machine learning and AI technologies, which should be an inspiration to other sectors in the future.
The travel industry has changed a lot thanks to the internet. Where we once went to brick-and-mortar travel agents to book a holiday, we now book our flights and accommodation online. Or so you’d think; as recently as 2016, only 33% of people actually booked their hotels online. This is a stunning figure when you consider how much of our work and personal lives has been digitized.
Travel is a deeply personal choice. Where you choose to go on holiday, where you stay, and even what airline you fly with are all choices that say something about you and your personal preferences. For many, looking at a list of hotels in a web browser hasn’t always been as good an experience as speaking with a real person in a travel agency or on the phone.
Making improvements to user experience and offering enhanced personalization are two key ways of improving customers’ online travel buying experience. Machine learning presents an exciting opportunity to accelerate this change.
Today’s online consumers are producing unprecedented amounts of data. This ‘data exhaust’ is increasingly being used in innovative new ways to provide personalized services for customers. Companies like Amazon and Netflix have already shown how effective product personalization can be in driving engagement and return visits, and the travel sector is moving in the same direction.
The goal is always to offer the traveler the best possible experience. At a company like trivago, this means optimizing the number of steps (i.e. clicks) it takes for a customer to get to what they’re looking for. Machine learning technologies can achieve this by helping to personalize what the customer sees. Natural language processing can be used on the hotel side to analyze hotel descriptions and customer reviews, as well as isolate the most popular features and key points of feedback. This data can then be fed into a database where it can be matched with existing customer preference data.
The information that hotels input to our platform is only part of what can be used to personalize results. Images accompanying listings can also be analyzed using neural networks, a subset of machine learning. For hotels that don’t have the time or the technical know-how to input all the relevant data, analysis of the images that accompany the listing can also yield valuable data around amenities, ambience, and scenery, all of which can be matched with user preferences to develop a more tailored results page.
Let’s look at an example of these technologies in action: say a customer wants to see hotels with family-friendly pools. Presenting the customer with hotels that have pools is relatively straightforward, but pools that are specifically family-friendly? That’s much more challenging. To start to narrow down the list, natural language processing can be deployed on hundreds of user reviews, measuring the proximity of words like “clean”, “quiet”, “family” or “safety”. But often the words we’re looking for are not posted immediately next to each other, so it becomes important to understand syntactic relationships and how terms relate to each other. This is something that can only be done through advanced semantic technologies and specialized databases. A simplified version of the word-proximity idea is sketched below.
The end goal is to make searching for travel products more of a search for an exciting experience than a technical process of selecting features and on/off toggles. Machine learning is critical in helping platforms like trivago isolate the most unique and attractive aspects of a hotel and suggest those experiences to customers who have already signaled their interest.
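To make the word-proximity idea concrete, here is a minimal Scala sketch (purely illustrative, not trivago’s actual pipeline) that counts how often a target word such as “family” appears within a small window of an anchor word such as “pool” across a set of reviews; production systems rely on far richer semantic models than simple co-occurrence counts.
// Count how often `target` appears within `window` tokens of `anchor` across reviews.
object ReviewProximity {
  def cooccurrences(reviews: Seq[String], anchor: String, target: String, window: Int = 10): Int =
    reviews.map { review =>
      val tokens = review.toLowerCase.split("\\W+").toVector
      val anchorIdx = tokens.zipWithIndex.collect { case (t, i) if t == anchor => i }
      val targetIdx = tokens.zipWithIndex.collect { case (t, i) if t == target => i }
      anchorIdx.count(a => targetIdx.exists(t => math.abs(t - a) <= window))
    }.sum

  def main(args: Array[String]): Unit = {
    val reviews = Seq(
      "The pool was clean and quiet, perfect for our family",
      "Great location but the pool area felt unsafe at night"
    )
    // "pool" and "family" co-occur in the first review only, so this prints 1.
    println(cooccurrences(reviews, "pool", "family"))
  }
}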
The conversation around machine learning often focuses on raw computing power, but not enough attention is paid to the significant ways in which we need to change our working patterns. Things that were manual processes not very long ago are now automated. Machine learning systems can generate sophisticated suggestions that were previously not available to teams.
This presents unique challenges and requires new specializations within teams to make the most out of machine learning. It’s not enough to keep working the way you’re used to, whether that’s Agile or Scrum. It’s important to distribute your machine learning resources throughout your teams and to make sure different product teams share an understanding of how those resources will be used and what the goals of your machine learning implementation are.
Finally, a word on user data. Machine learning needs user data. There’s no getting around it and it’s important to be honest and upfront with your users about it. Your customers’ data is currency: it’s valuable and it should be respected. It’s important to be upfront and transparent with customers about what data they are providing and what their data will be used for. If you present your customers with a clear and transparent choice over what to share and what they stand to gain if they do share, then they will be more open-minded.
Thanks to their flexible nature, neural networks and deep learning have transformed data science. Briton Park explains how to forecast oceanic temperatures by designing, training, and evaluating a neural network model with Eclipse Deeplearning4j. This tutorial presents a proof of concept, demonstrating the flexibility of neural networks and their potential to impact a variety of real-world problems.
Neural networks have achieved breakthrough accuracy in use cases as diverse as textual sentiment analysis, fraud detection, cybersecurity, image processing, and voice recognition. One of the main reasons for this is the wide variety of flexible neural network architectures that can be applied to any given problem. In this way, deep learning (as deep neural networks are called) has transformed data science: engineers apply their knowledge about a problem to the selection and design of model architectures, rather than to feature engineering.
For example, convolutional networks use convolutions and pooling to capture spatially local patterns (nearby pixels are more likely to be correlated than those far apart) and translational invariances (a cat is still a cat if you shift the image left by four pixels). Building these sorts of assumptions directly into the architecture enables convolutional networks to achieve state-of-the-art results on a variety of computer vision tasks, often with far fewer parameters.
Recurrent neural networks (RNNs), which have experienced similar success in natural language processing, add recurrent connections between hidden state units, so that the model’s prediction at any given moment depends on the past as well as the present. This enables RNNs to capture temporal patterns that can be difficult to detect with simpler models.
In this tutorial, we focus on the problem of forecasting ocean temperatures across a grid of locations. Like many problems in the physical world, this task exhibits a complex structure including both spatial correlation (nearby locations have similar temperatures) and temporal dynamics (temperatures change over time). We tackle this challenging problem by designing a hybrid architecture that includes both convolutional and recurrent components that can be trained in end-to-end fashion directly on the ocean temperature time series data.
We share the code to design, train, and evaluate this model using Eclipse Deeplearning4j (DL4J), as well as a link to the data set and the Zeppelin notebook with the complete tutorial. We briefly review key concepts from deep learning and DL4J. Both the DL4J website and its companion O’Reilly book, Deep Learning: A Practitioner’s Approach, provide a more comprehensive review.
Using an open-source framework such as DL4J can significantly accelerate the development of machine learning applications. Such frameworks typically solve problems such as integration with other frameworks, coordination of parallel hardware for the distributed training of algorithms, and machine learning model deployment. Rather than building their own machine learning stack from scratch, a project as complicated as creating an operating system, developers can go straight to building the application that will produce the predictions they need.
The first step in any machine learning project involves formulating the prediction problem or task. We begin by informally stating the problem we want to solve and explaining any intuitions we might have. In this project, our aim is to model and predict the average daily ocean temperature at locations around the globe. Such a model has a wide range of applications. Accurate forecasts of next weekend’s coastal water temperatures can help local officials and businesses in beach communities plan for crowds. A properly designed model can also provide insights into physical phenomena, like extreme weather events and climate change.
Slightly more formally, we define a two-dimensional (2-D) 13-by-4 grid over a regional sea, such as the Bengal Sea, yielding 52 grid cells. At each grid location, we observe a sequence of daily mean ocean temperatures. Our task is to forecast tomorrow’s daily mean temperature at each location given a recent history of temperatures at all locations. As shown in the figure below, our model will begin by reading the grid of temperatures for day 1 and predicting temperatures for day 2. It will then read day 2 and predict day 3, read day 3 and predict day 4, and so on.
Figure 1: How the model works
In this tutorial, we apply a variant of a convolutional long short-term memory (LSTM) RNN to this problem. As we explain in detail below, the convolutional architecture is well-suited to model the geospatial structure of the temperature grid, while the RNN can capture temporal correlations in sequences of variable length.
Understanding and describing our data is a critical early step in machine learning. Our data consist of mean daily temperatures of the ocean from 1981 to 2017, originating from eight regional seas, including the Bengal, Mediterranean, Korean, Black, Bohai, Okhotsk, Arabian, and Japan seas. We focus on these areas because coastal areas contain richer variation in sea temperatures throughout the year, compared to the open ocean.
Figure 2: A map of the seas that the model is sampling.
The original data are stored as CSV files, with one file for each combination of sea and year, ranging from 1981 to 2017. We further preprocess that data by extracting non-overlapping subsequences of 50 days from each sea, placing each subsequence in a separate, numbered CSV file. As a result, each file contains 50 contiguous days’ worth of temperatures from a single location. Otherwise, we discard information about exact time or originating sea.
The preprocessed data (available here) are organized into two directories, features and targets. “Features” is machine learning jargon for model input, while “targets” refer to the model’s expected output during training (targets are often referred to as labels in classification or as dependent variables in statistics). Each directory contains 2,089 CSV files with filenames 1.csv to 2089.csv. The feature sequences and the corresponding target sequences have the same file names, correspond to the same locations in the ocean, and both contain 51 lines: a header and 50 days of temperature grids. The fourth line (excluding the header) of a feature file contains temperatures from day 4. The fourth line of a target file contains temperatures from day 5, which we want to predict having observed temperatures through day 4. We will frequently refer to lines in the CSV file as “time steps” (common terminology when working with time series data).
Each line in the CSV file has 52 fields corresponding to the 52 cells in the temperature grid. These fields constitute a “vector” (a 1-D list of numerical values) with 52 elements. The grid cells appear in this vector in column-major order (cells in the first column occupy the first 13 elements, cells in the second column occupy the next 13 elements, etc.). If we append all 50 time steps from the CSV, we get a 50-by-52 2-D array or “matrix.” Finally, if we reshape each vector back into a grid, we get a 13-by-4-by-50 3-D array or “tensor.” This is similar to an RGB image with three dimensions (height, width, color channel), except here our dimensions represent relative latitude, longitude, and time.
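As a small illustration of that column-major layout (plain Scala, independent of the tutorial’s actual vectorization code, using a random stand-in for one CSV row):
// Rebuild the 13-by-4 grid from one 52-element CSV line stored in column-major order:
// element k of the vector lands at row k % 13, column k / 13.
val line: Array[Double] = Array.fill(52)(scala.util.Random.nextDouble()) // stand-in for one CSV row
val grid: Array[Array[Double]] = Array.tabulate(13, 4)((row, col) => line(col * 13 + row))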
After we have formulated our prediction task and described our data, our next step is to specify our model, or in the case of deep learning, our neural network architecture. We plan to use a variant of a convolutional LSTM, which we briefly describe here.
Convolutional networks are based on the convolution operation, which preserves spatial relationships by applying the same filtering operation to each location within a raw signal, such as sliding a box-shaped filter over a row of pixels from left to right. We treat our grid-structured temperature data like 2-D images: at each grid cell, we apply a 2-D discrete convolution that consists of taking a dot product between a weight matrix and a small window around that location. The output of the filter is a scalar value for each location, indicating the filter’s “response” there. During training, the weights in the kernel are optimized to detect relevant spatial patterns over a small region, such as an elevated average temperature or a sharp change in temperature between neighboring locations in, e.g., the Mediterranean Sea. After the convolution, we apply a nonlinear activation function, a rectified linear unit in our case.
An LSTM is a variant of a recurrent layer (henceforth referred to as an RNN, which can refer to either the layer itself or any neural network that includes a recurrent layer). Like most neural network layers, RNNs include hidden units whose activations result from multiplying a weight matrix by a vector of inputs, followed by element-wise application of an activation function. Unlike hidden units in a standard feedforward neural network, hidden units in an RNN also receive input from hidden units at past time steps. To make this concrete with a simple example, an RNN estimating the temperature in the Black Sea on day 3 might have two inputs: the value of the hidden state on day 1 and the raw temperature on day 2. Thus, the recurrent neural network uses information from both the past and the present. The LSTM is a more complex RNN designed to address problems that arise when training RNNs, specifically the vanishing gradient problem.
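To make the filtering operation concrete, here is a tiny plain-Scala sketch of a single convolution response, i.e. the dot product between a k-by-k kernel and the window whose top-left corner sits at (row, col); DL4J’s ConvolutionLayer performs this operation (plus striding and weight learning) for us.
// Dot product between a k-by-k kernel and the grid window anchored at (row, col).
def convResponse(grid: Array[Array[Double]], kernel: Array[Array[Double]], row: Int, col: Int): Double = {
  val k = kernel.length
  var sum = 0.0
  for (i <- 0 until k; j <- 0 until k)
    sum += kernel(i)(j) * grid(row + i)(col + j)
  sum
}
// Applying math.max(0.0, convResponse(...)) afterwards corresponds to the ReLU activation.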
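For intuition, a single step of a plain (non-LSTM) recurrent unit with scalar state can be sketched in a couple of lines; an actual LSTM adds gates and an internal cell state on top of this idea, and the weights and temperatures below are arbitrary placeholders.
// New hidden state = activation of (weighted current input + weighted previous hidden state).
def rnnStep(x: Double, hPrev: Double, wx: Double, wh: Double, b: Double): Double =
  math.tanh(wx * x + wh * hPrev + b)

// Read day 1's temperature to form the first state, then day 2's while carrying that state, and so on.
val h1 = rnnStep(x = 28.4, hPrev = 0.0, wx = 0.05, wh = 0.8, b = 0.0)
val h2 = rnnStep(x = 28.9, hPrev = h1, wx = 0.05, wh = 0.8, b = 0.0)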
A convolutional LSTM network combines aspects of both convolutional and LSTM networks. Our network architecture is a simplified version of the model described in this NIPS 2015 paper on precipitation nowcasting, with only one variable measured per grid cell and no convolutions applied to the hidden states. The overall architecture is shown in the figure below.
Figure 3: The model’s architecture
At any given time step, the network accepts two inputs: the grid of current temperatures (x in the figure) and a vector of network hidden states (h in the figure) from the previous time step. We process the grid with one or more convolutional filters and flatten the output. We then pass both this flattened output and the previous hidden states to an LSTM RNN layer, which updates its gate functions and its internal state (c’ in the figure). Finally, the LSTM emits an output (h’ in the figure), which is then reshaped into a grid and used both to predict temperatures at the next step and as an input at the next time step (h in the figure).
A convolutional structure is appropriate for this task due to the nature of the data. Heat dissipates through convection, meaning that temperatures across the ocean will tend to be “smooth” (i.e., temperatures of nearby grid cells will be similar). Thus, if neighboring cells have a high (or low) temperature, then a given cell is likely to have a high (or low) temperature as well. A convolutional network is likely to capture this local correlational structure.
On the other hand, an LSTM RNN structure is also appropriate because of the presence of short- and long-term temporal dependencies. For example, sea temperatures are unlikely to change drastically on a daily basis but rather follow a trend over days or weeks (short-to-medium-term dependencies). In addition, ocean temperatures also follow a seasonal pattern (long-term dependency): year to year, a single location is likely to follow a similar pattern of warmer and colder seasons over the course of the year. Note that our preprocessing (which generated sequences that are 50 days long) would have to be modified to allow our network to capture this type of seasonality. Specifically, we would have to use longer sequences covering multiple years.
Because of these two properties of the data, namely spatial and temporal dependencies, a convolutional LSTM structure is well-suited to this problem and data.
Now that we have completed our preparatory steps (problem formulation, data description, architecture design), we are ready to begin modeling! The full code that extracts the 50-day subsequences, performs vectorization, and builds and trains the neural network is available in a Zeppelin notebook using Scala. In the following sections, we will guide you through the code.
Before we get to the model, we first need to write some code to transform our data into a multidimensional numerical format that a neural network can read, i.e. n-dimensional arrays, also known as NDArrays or tensors. This process has much in common with traditional “extract, transform, and load” (ETL) from databases, so it is often referred to as “machine learning ETL,” or, perhaps more precisely, “vectorization.” To accomplish this, we apply tools from the open source Eclipse DataVec suite, a full-featured machine learning ETL and vectorization suite associated with DL4J.
Recall that our data is contained in CSV files, each of which contains 50 days of mean temperatures at 52 locations on a 2-D geospatial grid. The CSV file stores this as 50 rows (days) with 52 columns (location). The target sequences are contained in separate CSV files with similar structure. Our vectorization code is below.
// Record readers for the training split: files 1.csv-1936.csv (years 1981-2014)
val trainFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
trainFeatures.initialize(new NumberedFileInputSplit(featureBaseDir + "%d.csv", 1, 1936));
val trainTargets = new CSVSequenceRecordReader(numSkipLines, ",");
trainTargets.initialize(new NumberedFileInputSplit(targetBaseDir + "%d.csv", 1, 1936));
// Pair features with targets and batch them into DataSets for training
val train = new SequenceRecordReaderDataSetIterator(trainFeatures, trainTargets, batchSize, 10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
To process these CSV files, we begin with a RecordReader. RecordReaders are used to parse raw data into a structured record-like format (elements indexed by a unique id). DataVec offers a variety of record readers tailored to storage formats used commonly in machine learning, e.g., CSV and SVMLight, and Hadoop, e.g., MapFiles. It is also straightforward to implement the RecordReader interface for other file formats. Because our records are in fact sequences stored in CSV format (one sequence per file), we use the CSVSequenceRecordReader.
DL4J neural networks do not accept records but rather DataSets, which collect features and targets as NDArrays and provide convenient methods for accessing and manipulating them. To convert records into DataSets, we use a RecordReaderDataSetIterator. In DL4J, a DataSetIterator is responsible for traversing a collection of DataSet objects and providing them to, e.g., a neural network, during training or evaluation. DataSetIterators provide methods for returning batches of examples (represented as DataSets) and on-the-fly preprocessing, among other things. This tutorial illustrates the most common DL4J vectorization pattern: using a RecordReader in combination with a RecordReaderDataSetIterator. However, DataSetIterators can be used without record readers. See the DL4J machine learning ETL and vectorization guide for more information.
As shown above, we create two CSVSequenceRecordReaders, one for the inputs and one for the targets. The code above does this for the training data split, which we define to include files 1-1936, covering the years 1981-2014.
Since each pair of feature and target sequences has an equal number of time steps, we pass the AlignmentMode.EQUAL_LENGTH flag (see this post for an example of what to do if you have feature and target sequences of different length, such as in time series classification). Once the DataSetIterator is created, we are ready to configure and train our neural network.
We configure our DL4J neural network architecture using the NeuralNetConfiguration class, which provides a builder API via the public inner Builder class. Using this builder, we can specify our optimization algorithm (nearly always stochastic gradient descent), an optional custom updater like ADAM, the number and type of hidden layers, and other hyperparameters, such as the learning rate, activation functions, etc.
Critically, before adding any layers, we must first call the list() method to indicate that we are building a multilayer network. A multilayer network has a simple directed graph structure with one path through the layers; we can specify more general architectures with branching by calling graphBuilder(). Calling build() returns a MultiLayerConfiguration for a multilayer neural network, as in the code below.
val conf = new NeuralNetConfiguration.Builder()
  .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
  .seed(12345)
  .weightInit(WeightInit.XAVIER)
  .list()
  // Layer 0: 2-D spatial convolution applied independently at each time step
  .layer(0, new ConvolutionLayer.Builder(kernelSize, kernelSize)
    .nIn(1) // 1 channel
    .nOut(7) // 7 convolutional filters
    .stride(2, 2)
    .learningRate(0.005)
    .activation(Activation.RELU)
    .build())
  // Layer 1: LSTM over the flattened convolutional features
  .layer(1, new GravesLSTM.Builder()
    .activation(Activation.SOFTSIGN)
    .nIn(84)
    .nOut(200)
    .learningRate(0.0005)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(10)
    .build())
  // Layer 2: regression output, one value per grid cell (52 in total)
  .layer(2, new RnnOutputLayer.Builder(LossFunction.MSE)
    .activation(Activation.IDENTITY)
    .nIn(200)
    .learningRate(0.0005)
    .nOut(52)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(10)
    .build())
  // Reshape the 52-element input vectors into grids for the convolution,
  // then flatten the convolutional output again before the LSTM
  .inputPreProcessor(0, new RnnToCnnPreProcessor(V_HEIGHT, V_WIDTH, numChannels))
  .inputPreProcessor(1, new CnnToRnnPreProcessor(6, 2, 7))
  .pretrain(false).backprop(true)
  .build();
We use the configuration builder API to add two hidden layers and one output layer. The first is a 2-D convolutional layer whose filter size is determined by the variable kernelSize. Because it is our first layer, we must define the size of our input, specifically the number of input channels (one, because our temperature grid has only two dimensions) and the number of output filters. Note that it is not necessary to set the width and height of the input. The stride of two means that the filter will be applied to every other grid cell. Finally, we use a rectified linear unit activation function (nonlinearity). We want to emphasize that this is a 2-D spatial convolution applied at each time step independently; there is no convolution over the sequence.
The next layer is a Graves LSTM RNN with 200 hidden units and a softsign activation function. The final layer is an RnnOutputLayer with 52 outputs, one per temperature grid cell. DL4J OutputLayers combine the functionality of a basic dense layer (weights and an activation function) with a loss function, and are thus equivalent to a DenseLayer followed by a LossLayer. The RnnOutputLayer is an output layer that expects a sequential (rank 3) input and also emits a sequential output. Because we are predicting a continuous value (temperature), we do not use a nonlinear activation function (identity). For our loss function, we use mean squared error, a traditional loss for regression tasks.
There are several other things to note about this network configuration. For one, DL4J enables the user to define many hyperparameters, such as learning rate or weight initialization, for both the entire model and individual layers (layer settings override model settings).
In this example, we use Xavier weight initializations for the entire model but set a separate learning rate for each layer (though we use the same value for each). We also add regularization (gradient clipping to prevent gradients from growing too large during backpropagation through time) for the LSTM and output layers.
Finally, we observe that when reading our data from CSV files, we get sequences of vectors (with 52 elements), but our convolutional layer expects sequences of 13-by-4 grids. Thus, we need to add an RnnToCnnPreProcessor for the first layer that reshapes each vector into a grid before applying the convolutional layer. Likewise, we use a CnnToRnnPreProcessor to flatten the output from the convolutional layer before passing it to the LSTM.
After building our neural network configuration, we initialize a neural network by passing the configuration to the MultiLayerNetwork constructor and then calling the init() method, as below.
val net = new MultiLayerNetwork(conf);
net.init();
It is now time to train our new neural network. Training for this forecasting task is straightforward: we define a for loop with a fixed number of epochs (complete passes through the entire data set), calling fit on our training data iterator each time. Note that it is necessary to call reset() on the iterator at the end of each iteration.
for(epoch <- 1 to 25) {
  println("Epoch " + epoch);
  net.fit(train);
  train.reset();
}
This is the simplest possible training loop with no form of monitoring or sophisticated model selection. The official DL4J documentation and examples repository provide many examples of how to visualize and debug neural networks using the DL4J training UI, use early stopping to prevent overfitting, add listeners to monitor training, and save model checkpoints.
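For example, attaching a score listener to the net object defined above is a one-line way to watch the training loss from the console (a minimal sketch; see the DL4J documentation for the training UI and early stopping):
import org.deeplearning4j.optimize.listeners.ScoreIterationListener

// Print the current training score (loss) every 10 iterations.
net.setListeners(new ScoreIterationListener(10));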
Once our model is trained, we want to evaluate it on a held-out test set. We will not go into detail about the importance of a proper training/test split here (for that, we recommend the excellent discussion in Deep Learning: A Practitioner’s Approach). Suffice it to say that it is critical to evaluate a model on data that it did not see during training. When working with time-sensitive data, we should nearly always train on the past and test on the future, mimicking the way the model would likely be used in practice. Splitting our data without taking time into account is likely to produce misleadingly optimistic accuracy as a result of so-called “future leakage,” where a network makes predictions about one moment based on knowledge of subsequent moments, a circumstance the model never encounters once deployed.
DL4J defines a variety of tools and classes for evaluating prediction performance on a number of tasks (multiclass and binary classification, regression, etc.). Here, our task is regression, so we use the RegressionEvaluation class. After initializing our regression evaluator, we can loop through the test set iterator and use the evalTimeSeries method. At the end, we can simply print out the accumulated statistics for metrics including mean squared error, mean absolute error, and correlation coefficient.
The code below shows how to set up the test set record readers and iterator, create a RegressionEvaluation object, and then apply it to the trained model and test set.
val testFeatures = new CSVSequenceRecordReader(numSkipLines, ",");
testFeatures.initialize(new NumberedFileInputSplit(featureBaseDir + "%d.csv", 1937, 2089));
val testTargets = new CSVSequenceRecordReader(numSkipLines, ",");
testTargets.initialize(new NumberedFileInputSplit(targetBaseDir + "%d.csv", 1937, 2089));
val test = new SequenceRecordReaderDataSetIterator(testFeatures, testTargets, batchSize, 10, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.EQUAL_LENGTH);
val eval = net.evaluateRegression(test);
test.reset();
println(eval.stats());
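Equivalently, the evaluation loop mentioned above can be written out explicitly with evalTimeSeries, reusing the net and test objects defined earlier (a sketch; accessor names such as getFeatures vs. getFeatureMatrix vary slightly between DL4J versions):
import org.deeplearning4j.eval.RegressionEvaluation

// Accumulate regression statistics over the test iterator, one mini-batch of sequences at a time.
val manualEval = new RegressionEvaluation(52); // 52 output columns, one per grid cell
test.reset();
while (test.hasNext) {
  val ds = test.next();
  val predicted = net.output(ds.getFeatures, false); // inference mode
  manualEval.evalTimeSeries(ds.getLabels, predicted);
}
println(manualEval.stats());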
In the figure below, we show the test set accuracy for a handful of grid cells. We can see that the errors in temperature predictions at points on the grid are correlated with those of their neighbors. For example, points on the top left edge of the grid appear to have higher errors than the rest of the points shown below, which are closer to the center of the sea. We expect to see these kinds of correlations in the model errors because of the spatial dependencies previously noted. We also observe that the convolutional LSTM outperforms simple linear autoregressive models by large margins, with a mean squared error that is typically 20-25% lower. This suggests that the complex spatial and temporal interactions captured by the neural net (but not by the linear model) provide predictive power.
Figure 4: Results from the neural network model.
We have shown how to use Eclipse DL4J to build a neural network for forecasting sea temperatures across a large geographic region. In doing so, we demonstrated a standard machine learning workflow that began with formulating the prediction task, moved on to vectorization and training, and ended with evaluating predictive accuracy on a held-out test set. When architecting our neural network, we added convolutional and recurrent components designed to take advantage of two important properties of the data: spatial and temporal correlations.
Despite recent high profile successes and the fact that millions of people on a daily basis use products built around machine learning (for example, speech recognition on mobile phones), its impact on our lives remains relatively narrow. One of the major frontiers in machine learning is achieving similar success in high-impact domains, such as healthcare and climate sciences. This tutorial presents a proof of concept, demonstrating the flexibility of neural networks and their potential to impact a variety of real-world problems.
The availability of open-source, well-documented machine learning frameworks makes training models easier than ever. However, training models is not an end in and of itself; our real objective is to deploy and use these models to help us make decisions in the world. The relative ease of training predictive models belies the difficulty of deploying them. Integrating predictive models with other software introduces engineering challenges that differ from those encountered during training. Deployed models often run on different hardware; e.g., they may perform inference on a smartphone vs. training on a cluster of GPU servers, and must satisfy requirements for latency and availability.
As we consider the future of machine learning, the fatal accident involving an Uber automated vehicle raises important questions, such as: How do we decide whether the model is ready for deployment? If our test set is outdated or otherwise fails to reflect the real world, we may discover that our model underperforms once deployed. Timely detection of underperforming models requires real-time monitoring of not just uptime but also prediction accuracy. Further, through soft deployments and controlled experiments such as A/B tests, we may be able to catch a flawed model before taking it live. Few, if any, open source tools provide this sort of functionality out of the box.
This is usually the moment in the article where you would read a sentence about the risks of machine learning looming as large as its potential, an evocation of promise tempered by an admonition to be cautious. But that’s not how we feel about it. Yes, machine learning is hard. Sure, getting it right requires a great deal of effort and diligence. But we need to get it right, because with machine learning we are able to make smarter decisions, and as a species, we desperately need to do that. And the decisions we make about data only matter when they influence our behavior, which means we need to take machine learning out of the lab and into production environments as efficiently and safely as possible. Best of luck!
The field of artificial intelligence has shown tremendous progress in the past decade. But there’s more to AI than chess-playing robots. Mat Leonard, the Head of Udacity's School of AI, explains how the history of deep learning is the history of a programming revolution. Are you ready for Software 2.0?
The last few years have seen amazing progress in the field of artificial intelligence. AlphaGo beat the world’s Go grandmasters, a feat thought impossible. The next version, AlphaZero, became the world’s best chess player in four hours. Cars are driving themselves, smartphones can identify skin cancer as well as trained dermatologists, and we’re on the verge of human-level universal translation. All of this is the result of not just machine learning, but actually a subset technology of machine learning called deep learning. But what is deep learning, what can it do, and how did we get to this point?
When people talk about AI these days, they are actually talking about deep learning. Artificial intelligence is a broad field that includes pathfinding algorithms such as A*, logic and planning algorithms, and machine learning. The field of machine learning consists of various algorithms with internal parameters found from example data through optimization. Deep learning is a branch of machine learning utilizing “deep” neural networks, that is, artificial neural networks with dozens of layers and millions of parameters. The recent advances in AI such as speech recognition, realistic image generation, and AlphaZero are all based on deep learning models.
Artificial neural networks have existed since the 1950s, when they were known as perceptrons. The perceptron was an algorithm that roughly approximates the way neurons operate. An individual neuron (or “unit”) summed up its input values, each value multiplied by some connection strength or weight. That weighted sum was then passed through an activation function to get the output of the neuron. The neurons could be combined into layers with multiple neurons in each layer, using the output of neurons in one layer as the input to neurons in the next layer. The weights would be set such that the network performs some specified behavior. Most of the time, setting the weights by hand is practically impossible. Instead, the network was “trained” using example data. That is, the input data is labeled and the weights are adjusted such that the network is able to reproduce the correct labels from the data.
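In code, a single perceptron-style unit is just a weighted sum of its inputs plus a bias, passed through an activation function (a schematic Scala sketch with a step activation and made-up numbers):
// One artificial neuron: weighted sum of inputs plus a bias, then a step activation.
def neuron(inputs: Array[Double], weights: Array[Double], bias: Double): Double = {
  val sum = inputs.zip(weights).map { case (x, w) => x * w }.sum + bias
  if (sum > 0) 1.0 else 0.0
}

println(neuron(Array(0.5, -1.0), Array(0.8, 0.2), 0.1)) // 0.4 - 0.2 + 0.1 = 0.3 > 0, so prints 1.0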
After the initial excitement, researchers were blocked because they couldn’t train neural networks with more than two layers, restricting the ability of the networks to perform complex behaviors. Two decades later in the 1980s, a solution was found in the backpropagation algorithm, which allowed information to flow through the network from the output layers back to the input layers. Suddenly, researchers could train deeper neural networks with multiple layers. However, the process of training was computationally expensive and while there were some successes, neural networks weren’t seen as better alternatives to other machine learning algorithms.
In the 2012 ImageNet competition, Alex Krizhevsky and Ilya Sutskever from Geoffrey Hinton’s lab trained a deep neural network with 60 million parameters on two GPUs for a week. The goal of the competition was to identify objects in images with the lowest error rate, using a dataset of 1.2 million images as training examples. Their algorithm dominated the field with an error rate of 15%, beating the next best attempt by ten percentage points. Afterwards, deep neural networks became the only choice for computer vision problems. This combination of massive neural networks, trained on gigantic datasets and using GPUs to increase computational efficiency, is the basis of deep learning and all the amazing breakthroughs we’re seeing in AI.
Deep learning has created a new paradigm for software development. Traditional development involves a programmer building applications line by line, instructing the computer to perform specific behaviors. With deep learning, the software is written in the internal parameters of a neural network. These parameters are found by specifying the desired behavior of the program (usually with example data) and optimizing the network to reproduce that behavior. Andrej Karpathy, the Director of AI at Tesla, calls this “Software 2.0”, contrasted with “Software 1.0”, the familiar procedural and object-oriented paradigms.
Karpathy notes, “It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data (or more generally, identify a desirable behavior) than to explicitly write the program.” This can be clearly seen in the domain of computer vision. Researchers worked for decades on hard coding feature detectors so that computers could understand the contents of images. Deep learning models called convolutional neural networks learn these features from the images themselves, performing far better than any algorithm written procedurally. By collecting a large dataset of labeled images, deep learning researchers made the entire history of computer vision research obsolete.
Fifteen years ago, cloud computing and AWS didn’t exist, but today they are essential tools for programmers. Five years ago, Docker didn’t exist; now it’s ubiquitous and another tool that developers are expected to know. Similarly, five to ten years from now, deep learning will likely be an essential tool. The best solution for a large range of applications will be to collect and label data for a deep learning model, rather than the traditional process of hard-coding behavior. Karpathy continues, “A large portion of the programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.” For developers and engineers, learning deep learning is becoming an essential skill.
Today, there is a prevalent but incorrect assumption that only people with PhDs can understand deep learning. It is true that for the most part deep learning practitioners have come from academia wielding PhDs in computer science. However, modern deep learning frameworks such as PyTorch and Keras allow anyone with programming experience to build their own deep learning applications. Working developers also have a strong advantage because they have shipped code to production. The skill most lacking in the AI field is deploying deep learning models to production and maintaining those models afterwards. In many cases, an experienced developer who learns how to build deep learning models is more employable than a computer science PhD.
The most popular frameworks are written in Python (TensorFlow, Keras, PyTorch) but frameworks are available in other languages such as Java, C++, and JavaScript. The general workflow is to develop and train a deep learning model in a Python framework, then export the trained model to a different framework such as CoreML for iOS. Also, many companies are providing deep learning services on their platforms. For example, IBM Watson and AWS both have services for automated speech recognition that developers can incorporate into their applications. However, much of the time, developers will want to build an end-to-end deep learning application that deploys on their own platform or hardware.
There are many options for learning deep learning. Bootcamps such as Galvanize are teaching deep learning as part of their data science curriculum. There are also online options such as Google’s Machine Learning Crash Course and Udacity’s School of AI. A great way to gain experience is to work on Kaggle competitions. One can also find deep learning papers on arXiv and implement the models; you’ll often be able to find implementations on GitHub to guide you. Regardless of the education method, it’s important to practice lifelong learning to keep up with the rapidly evolving technology industry.
Deep learning has fundamentally changed how humans interact with machines and it’s clear that AI will impact nearly every industry. Within the next five to ten years, deep learning will be another essential skill in a developer’s toolkit. Now is the perfect time to get involved in the deep learning community, just as the new era in software begins.
Open source and machine learning go together like peanut butter and jelly. But why? In this article, Kayla Matthews explores why many of the best machine learning tools are open source.
Machine learning is an extremely promising technology. Interestingly, many of the most widely used tools in the machine learning community are open source — such as Google’s TensorFlow and OpenAI, which is partially funded by Elon Musk.
Because machine learning is such an exciting and potentially lucrative technology, people are often surprised that developers and companies build their offerings as open source, in effect giving them away for free. However, there are several reasons why that decision makes sense for machine learning.
Open source tools give developers the ability to tinker with them, thereby increasing the chances of rapid improvements or experimentation that could expand the usage or features of tools. Machine learning is a quickly evolving technology, and the more people that are working on tools related to it, the more likely it is that visionary ideas become realities.
With any technology that captures so much interest from the tech sector and the public alike, being able to bring products to the market quickly and know they’ll work as intended is essential. Open source tools allow both those things to happen.
SEE ALSO: A basic introduction to Machine Learning
Facebook is an example of a prolific company that makes a habit of releasing open source tools. One of them is Infer, which autonomously scans the code associated with Facebook’s mobile apps to catch bugs before release.
Facebook’s representatives say that when a greater number of people work on projects with the intent to make them better, the issues become resolved faster than they could otherwise. That’s a definite advantage for machine learning tools as well as other types of applications.
Google is another company that embraces open source software for machine learning, particularly with its DeepMind technologies. DeepMind is a company specializing in creating neural networks based on how humans learn.
Last spring, Google announced it was open sourcing Sonnet, a DeepMind project that’s an object-oriented neural network library. Among its reasons for doing so was to encourage ongoing research from developers that could support Google’s internal best practices for research and provide material for future research papers. Additionally, open source permits people to continually give back to Sonnet by perpetually using it for their projects.
The lack of open source software accompanying research papers in the machine learning community is a longstanding issue, brought up in a research paper published in 2007 that argued open source projects could reduce the problem. It suggested offering the software mentioned in a research paper under an open source license at the same time the academic material gets released.
In that same publication, the scientific team also mentioned how the nature of open source tools allows developers to take their projects with them even if they change employers. That characteristic makes people theoretically more likely to contribute to Sonnet or any other machine-learning platform without worrying that they might lose the tools they’ve been using and improving for months or even years.
SEE ALSO: Achieving real-time machine learning and deep learning with in-memory computing
Analysts say the Internet of Things (IoT) and its associated connected devices will generate 44 trillion gigabytes of data by 2020. Understandably, people wonder what the overall value of that data will be and how they can make it maximally advantageous. The speed and accuracy that’s possible with machine learning both potentially make it easier to sort through data and find the most relevant material within it.
One of the persistent barriers to machine learning adoption is the shortage of people with end-to-end experience. Although some engineers may understand particular aspects of it, it’s still very hard to find highly qualified people who grasp all or most of the main components of the field.
However, on the other side of things, open source projects feature a relatively small, standardized set of approaches that developers can start working with almost immediately, even if they aren’t familiar with all aspects of the technology yet. As a result, people in the industry accept those approaches on a widespread basis, and developers can now accomplish in a few days, thanks to open source technologies, things that used to take months.
People who work with open source machine learning tools also find they have thriving online communities at their disposal that allow them to tap into collective thinking when they run into unexpected difficulties.
Those forums currently have hundreds of answers to common problems, and as machine learning tools become even more popular, the knowledge base will expand.
These are just some of the numerous reasons why many companies and developers recognize the value in open source machine learning offerings. Their decision to provide them could give sustained momentum to the technology at large.
This article is part of The state of Machine Learning JAX Magazine issue:
Machine learning is all the rage these days and if your company is not in the conversation, perhaps you want to hear how trivago, BigML, Studio.ML, Udacity, AWS and Skymind put this technology to good use.
We recently talked about the importance of ethics education among developers so now it’s time to have the “how to develop ML responsibly” talk with you. Last but not least, if you thought machine learning and DevOps don’t mix well, think again.
I wish you happy reading!
Companies are scrambling to find enough programmers capable of coding for ML and deep learning. Are you ready? Here are five of our top picks for machine learning libraries for Java.
The long AI winter is over. Instead of being a punchline, machine learning is one of the hottest skills in tech right now. Companies are scrambling to find enough programmers capable of coding for ML and deep learning. While no one programming language has won the dominant position, here are five of our top picks for ML libraries for Java.
It comes as no surprise that Weka is our number one pick for the best Java machine learning library. Weka 3 is a fully Java-based workbench best used for machine learning algorithms. Weka is primarily used for data mining, data analysis, and predictive modelling. It’s completely free, portable, and easy to use with its graphical interface.
“Weka’s strength lies in classification, so applications that require automatic classification of data can benefit from it, but it also supports clustering, association rule mining, time series prediction, feature selection, and anomaly detection,” said Prof. Eibe Frank, an Associate Professor of Computer Science at the University of Waikato in New Zealand.
Weka’s collection of machine learning algorithms can be applied directly to a dataset or called from your own Java code, as in the sketch below. The toolkit supports several standard data mining tasks, including data preprocessing, classification, clustering, visualization, regression, and feature selection.
We’re big fans of MOA here at JAXenter.com. MOA is open-source software designed specifically for machine learning and data mining on data streams in real time. Developed in Java, it can easily be used with Weka while scaling to more demanding problems. MOA’s collection of machine learning algorithms and tools for evaluation are useful for regression, classification, outlier detection, clustering, recommender systems, and concept drift detection. MOA is well suited to large evolving datasets and data streams, as well as the data produced by Internet of Things (IoT) devices.
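As a hint of what calling Weka from your own code looks like, here is a minimal Scala sketch (the dataset path is a placeholder; it assumes Weka 3 is on the classpath):
import weka.core.converters.ConverterUtils.DataSource
import weka.classifiers.trees.J48
import weka.classifiers.Evaluation

// Load an ARFF dataset, train a J48 decision tree, and cross-validate it.
val data = DataSource.read("iris.arff") // placeholder path to an ARFF file
data.setClassIndex(data.numAttributes() - 1) // the last attribute is the class label

val tree = new J48()
tree.buildClassifier(data)

val eval = new Evaluation(data)
eval.crossValidateModel(tree, data, 10, new java.util.Random(1)) // 10-fold cross-validation
println(eval.toSummaryString())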
MOA is specifically designed for machine learning on data streams in real time. It aims for time- and memory-efficient processing. MOA provides a benchmark framework for running experiments in the data mining field by providing several useful features including an easily extendable framework for new algorithms, streams, and evaluation methods; storable settings for data streams (real and synthetic) for repeatable experiments; and a set of existing algorithms and measures from the literature for comparison.
Last year the JAXenter community nominated Deeplearning4j as one of the most innovative contributors to the Java ecosystem. Deeplearning4j is a commercial grade, open-source distributed deep-learning library in Java and Scala brought to us by the good people (and semi-sentient robots!) of Skymind. Its mission is to bring deep neural networks and deep reinforcement learning together for business environments.
Deeplearning4j is meant to serve as DIY tool for Java, Scala and Clojure programmers working on Hadoop, the massive distributed data storage system with enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. The deep neural networks and deep reinforcement learning are capable of pattern recognition and goal-oriented machine learning. All of this means that Deeplearning4j is super useful for identifying patterns and sentiment in speech, sound and text. Plus, it can be used for detecting anomalies in time series data like financial transactions.
Developed primarily by Andrew McCallum and students from UMASS and UPenn, MALLET is an open-source java machine learning toolkit for language to text. This Java-based package supports statistical natural language processing, clustering, document classification, information extraction, topic modelling, and other machine learning applications to text.
MALLET’s specialty includes sophisticated tools for document classification such as efficient routines for converting text. It supports a wide variety of algorithms (including Naïve Bayes, Decision Trees, and Maximum Entropy) and code for evaluating classfier performance. Also, MALLET includes tools for sequence tagging and topic modelling.
The Environment for Developing KDD-Applications Supported by Index Structures (ELKI for short) is an open-source data mining software for Java. ELKI’s focus is in research in algorithms, emphasizing unsupervised methods in cluster analysis, database indexes, and outlier detection. ELKI allows an independent evaluation of data mining algorithms and data management tasks by separating the two. This feature is unique among other data mining frameworks like Weta or Rapidminer. ELKI also allows arbitrary data types, file formats, or distance or similarity measures.
Designed for researchers and students, ELKI provides a large collection of highly configurable algorithm parameters, which allows fair and easy evaluation and benchmarking of algorithms. This makes ELKI particularly useful for data science; it has been used to cluster sperm whale vocalizations, spaceflight operations, bike sharing redistribution, and traffic prediction. Pretty useful for any grad students out there looking to make sense of their datasets!
Do you have a favorite machine learning library for Java that we didn’t mention? Tell us in the comments and explain why it’s a travesty we forgot about it!
What is word2vec? This neural network algorithm has a number of interesting use cases, especially for search. In this excerpt from Deep Learning for Search, Tommaso Teofili explains how you can use word2vec to map datasets with neural networks.
Word2vec is a neural network algorithm. Although it’s fairly easy to understand its basics, it’s also fascinating to see the good results — in terms of capturing the semantics of words in a text — that you can get out of it. But what does it do, and how is it useful for our synonym expansion use case?
What we want to do is set up a word2vec model, feed it the text of the song lyrics we want to index, get an output vector for each word, and use those vectors to find synonyms.
You might have heard about the usage of vectors in the context of search. In a sense, word2vec also generates a vector space model whose vectors (one for each word) are weighted by the neural network during the learning process. Let’s use the song “Aeroplane” as an example; if we feed its text to word2vec, we’ll get a vector for each of our words:
0.7976110753441061, -1.300175666666296, i
-1.1589942649711316, 0.2550385962680938, like
-1.9136814615251492, 0.0, pleasure
-0.178102361461314, -5.778459658617458, spiked
0.11344064895365787, 0.0, with
0.3778008406249243, -0.11222894354254397, pain
-2.0494382050792344, 0.5871714329463343, and
-1.3652666102221962, -0.4866885862322685, music
-12.878251690899361, 0.7094618209959707, is
0.8220355668636578, -1.2088098678855501, my
-0.37314503461270637, 0.4801501371764839, aeroplane
We can see them in the coordinate plane.
Image 1: Coordinate plane for the “Aeroplane” words
In the example output above I decided to use two dimensions to make those vectors easier to plot on a graph. In practice, it’s common to use a higher number of dimensions, such as one hundred or more, because higher dimensionality captures more information as the amount of data grows; dimensionality reduction algorithms like Principal Component Analysis or t-SNE are then used to obtain 2-3 dimensional vectors that can be plotted more easily.
If we use cosine similarity to measure the distance between the generated vectors, we find some interesting results:
music -> song, view
looking -> view, better
in -> the, like
sitting -> turning, could
As you can see, we extracted the two nearest vectors for a few randomly chosen words; some results are good, others not so much.
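For reference, the cosine similarity used to rank these neighbours can be computed in a few lines of plain Java; this is just a sketch, not code from the book excerpt.

// cosine similarity between two word vectors of equal length
static double cosineSimilarity(double[] a, double[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// usage with the two-dimensional vectors for "music" and "pleasure" from the listing above:
// cosineSimilarity(new double[]{-1.3652666102221962, -0.4866885862322685},
//                  new double[]{-1.9136814615251492, 0.0});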
What’s the problem here; is word2vec not up to the task? Two things are at play: the model was trained on a tiny corpus (the lyrics of a single song) and with only two dimensions.
Let’s build the word2vec model again, this time using one hundred dimensions and a larger set of song lyrics taken from the Billboard Hot 100 Dataset, found here.
music -> song, sing
view -> visions, gaze
sitting -> hanging, lying
in -> with, into
looking -> lookin, lustin
We can now see that the results are much better and more appropriate: we could use almost all of them as synonyms in the context of search. You can imagine using such a technique either at query time or at indexing time. There’d be no more dictionaries or vocabularies to keep up to date; the search engine could learn to generate synonyms from the data it handles.
A couple of questions you might have right about now: how does word2vec work? How can I integrate it, in practice, in my search engine? The original paper (found here) from Tomas Mikolov and others describes two different neural network models for learning such word representations: one is called Continuous Bag of Words and the other is called Continuous Skip-gram Model.
Word2vec performs unsupervised learning of word representations, which is good; these models simply need to be fed a sufficiently large text, properly encoded. The main concept behind word2vec is that the neural network is given a piece of text, which is split into fragments of a certain size (also called the window). Every fragment is fed to the network as a pair of target word and context. In the case below, the target word is aeroplane and the context is composed of the words music, is, my.
Image 2: Model of word2vec
The hidden layer of the network holds a set of weights for each word (in the case above, 11 weights, one per hidden-layer neuron). These weight vectors are used as the word representations when learning ends.
An important trick about word2vec is that we don’t care too much about the outputs of the neural network. Instead, we extract the internal state of the hidden layer at the end of the training phase, which yields exactly one vector representation for each word.
During training, a portion of each fragment is used as the target word while the rest is used as context. In the case of the Continuous Bag of Words model, the target word is used as the output of the network, and the remaining words of the text fragment (the context) are used as inputs. The Continuous Skip-gram Model is the opposite: the target word is used as input and the context words as outputs (as in the example above).
For example, given the text “she keeps Moet et Chandon in her pretty cabinet let them eat cake she says” from the song “Killer Queen” by Queen and a window of 5, a word2vec model based on CBOW receives a sample for each five-word fragment in there. For the fragment | she | keeps | moet | et | chandon |, the input is made of the words | she | keeps | et | chandon | and the output consists of the word moet.
Image 3: CBOW model based on Killer Queen
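To make the windowing step concrete, here is a small plain-Java sketch (not DL4J code; the class name is a placeholder) that slides a five-word window over the lyric and prints the (context, target) pairs a CBOW model would be trained on.

import java.util.ArrayList;
import java.util.List;

public class WindowSketch {
    public static void main(String[] args) {
        String[] tokens = ("she keeps moet et chandon in her pretty cabinet "
                + "let them eat cake she says").split(" ");
        int window = 5;
        for (int start = 0; start + window <= tokens.length; start++) {
            // the target is the middle word of the fragment, as in the "Killer Queen" example
            String target = tokens[start + window / 2];
            List<String> context = new ArrayList<>();
            for (int j = start; j < start + window; j++) {
                if (j != start + window / 2) {
                    context.add(tokens[j]); // e.g. she, keeps, et, chandon
                }
            }
            System.out.println(context + " -> " + target); // e.g. [she, keeps, et, chandon] -> moet
        }
    }
}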
As you can see from the figure above, the neural network is composed of an input layer, a hidden layer, and an output layer. This kind of neural network—with only one hidden layer—is called shallow, as opposed to networks with more than one hidden layer, which are called deep neural networks.
The neurons in the hidden layer have no activation function; they linearly combine weights and inputs (multiply each input by its weight and sum all of these results together). The input layer has one neuron for each word in the text; in fact, word2vec requires each word to be represented as a one-hot encoded vector.
Now let’s see what a one-hot encoded vector looks like. Imagine we’ve got a dataset with three words [cat, dog, mouse]; we’ll have three vectors, each of them having all values set to zero except one, which is set to one, and that is the position that identifies the specific word.
dog : [0,0,1]
cat : [0,1,0]
mouse : [1,0,0]
If we add the word ‘lion’ to the dataset, the one-hot encoded vectors for this dataset have 4 dimensions:
lion : [0,0,0,1]
dog : [0,0,1,0]
cat : [0,1,0,0]
mouse : [1,0,0,0]
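A minimal plain-Java sketch of this encoding, using the four-word vocabulary above (the class name is a placeholder), could look like this:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OneHotSketch {
    public static void main(String[] args) {
        String[] vocabulary = {"mouse", "cat", "dog", "lion"};
        Map<String, double[]> oneHot = new HashMap<>();
        for (int i = 0; i < vocabulary.length; i++) {
            double[] v = new double[vocabulary.length]; // all zeros...
            v[i] = 1.0;                                 // ...except the position that identifies the word
            oneHot.put(vocabulary[i], v);
        }
        System.out.println(Arrays.toString(oneHot.get("dog"))); // [0.0, 0.0, 1.0, 0.0]
    }
}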
If you have one hundred words in your input text, each word is represented as a 100-dimensional vector. Consequently, in the CBOW model, you’ll have one hundred input neurons multiplied by the window parameter minus one. If you’ve got a window of 4, you’ll have 300 input neurons.
The hidden layer instead has a number of neurons equal to the desired dimensionality of the resulting word vectors. This is a parameter that must be set by whoever sets up the network.
The size of the output layer is equal to the number of words in the input text, in this example one hundred. A word2vec CBOW model for an input text with one hundred words, dimensionality equal to fifty, and the window parameter set to 4 has 300 input neurons, fifty hidden neurons, and one hundred output neurons.
In the word2vec CBOW model, inputs are propagated through the network by first multiplying the one-hot encoded vectors of the input words by the input-to-hidden weights; you can imagine a matrix containing a weight for each connection between an input neuron and a hidden neuron. These results are then combined (multiplied) with the hidden-to-output weights, producing the outputs, which are finally passed through a Softmax function. Softmax “squashes” a K-dimensional vector (our output vector) of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that add up to 1, representing a probability distribution. The network is telling us the probability of each output word being selected, given the context (the network input).
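As a small illustration of that last step, here is a plain-Java sketch of the Softmax function (the max-subtraction is a standard numerical-stability trick, not something the text mentions):

// turn K arbitrary real-valued scores into probabilities that sum to 1
static double[] softmax(double[] scores) {
    double max = Double.NEGATIVE_INFINITY;
    for (double s : scores) {
        max = Math.max(max, s); // subtracting the max avoids overflow in Math.exp
    }
    double sum = 0.0;
    double[] probabilities = new double[scores.length];
    for (int i = 0; i < scores.length; i++) {
        probabilities[i] = Math.exp(scores[i] - max);
        sum += probabilities[i];
    }
    for (int i = 0; i < probabilities.length; i++) {
        probabilities[i] /= sum;
    }
    return probabilities;
}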
After this forward propagation, the back-propagation learning algorithm adjusts the weights of each neuron in the different layers so that the network produces a more accurate result for each new fragment. Phrased more loosely: the weights are “adjusted a little bit so that they’d produce a better result next time”.
With the objective of reducing the output error, the probability distribution the network produces over the output words is compared with the actual target word the network was given, and that information is used to adjust the weights going backwards. After the learning process has been completed for all the text fragments with the configured window, the hidden-layer weights hold the vector representation for each word in the text.
The Continuous Skip-gram Model looks reversed with respect to the CBOW model.
Image 4: Skip-gram for “She keeps Moet et Chandon”
The same concepts apply to Skip-gram. The input vectors are one-hot encoded (one for each word), so the input layer has a number of neurons equal to the number of words in the input text. The hidden layer has the dimensionality of the desired resulting word vectors, and the output layer has a number of neurons equal to the number of words multiplied by the window size minus one. Using the same example as for CBOW, with the text “she keeps moet et chandon in her pretty cabinet let them eat cake she says” and a window of 5, a word2vec model based on the Continuous Skip-gram model receives a first sample for | she | keeps | moet | et | chandon | where the input is made of the word moet and the output consists of the words | she | keeps | et | chandon |.
Here’s an example excerpt of word vectors calculated by word2vec on the text of the Hot 100 Billboard dataset. It shows a small subset of the words plotted, so you can appreciate how some word semantics are expressed geometrically.
Figure 5: word2vec of Hot 100 Billboard dataset
You can notice the expected regularities between ‘me’ and ‘my’ with respect to ‘you’ and ‘your’. You can also look at groups of similar words, or words that are at least used in similar contexts, which are good candidates for synonyms.
Now that we’ve learned a bit about how the word2vec algorithm works, let’s get some code and see it in action. Then we’ll be able to combine it with our search engine for synonym expansion.
Deeplearning4j is a deep learning library for the JVM. It has good adoption in the Java community and a not-too-steep learning curve for early adopters. It also comes with an Apache 2 license, which is handy if you want to use it within a company and include it in a possibly non-open-source product. DL4J also has tools to import models created with other frameworks such as Keras, Caffe, TensorFlow, Theano, etc.
Deeplearning4j can be used to implement neural-network-based algorithms; let’s see how we can use it to set up a word2vec model. DL4J has an out-of-the-box implementation of word2vec, based on the Continuous Skip-gram model. What we need to do is set up its configuration parameters and pass it the input text we want to index in our search engine.
Keeping our song lyrics use case in mind, we’re going to feed word2vec with the Billboard Hot 100 text file.
String filePath =
    new ClassPathResource("billboard_lyrics_1964-2015.txt").getFile().getAbsolutePath(); 1
SentenceIterator iter = new BasicLineIterator(filePath); 2
Word2Vec vec = new Word2Vec.Builder() 3
    .layerSize(100) 4
    .windowSize(5) 5
    .iterate(iter) 6
    .build();
vec.fit(); 7
String[] words = new String[]{"guitar", "love", "rock"};
for (String w : words) {
  Collection<String> lst = vec.wordsNearest(w, 2); 8
  System.out.println("2 Words closest to '" + w + "': " + lst);
}
1 read the corpus of text containing the lyrics
2 set up an iterator over the corpus
3 create a configuration for word2vec
4 set the number of dimensions the vector representations should have
5 set the window parameter
6 set word2vec to iterate over the selected corpus
7 train the word2vec model on the corpus
8 obtain the two words closest to an input word and print them
We obtain the following output, which sounds good enough.
2 Words closest to 'guitar': [giggle, piano]
2 Words closest to 'love': [girl, baby]
2 Words closest to 'rock': [party, hips]
As you can see it’s straightforward to set up such a model and obtain results in a reasonable amount of time (training of the word2vec model took around 30s on a “normal” laptop). Keep in mind that we aim to use this in conjunction with the search engine, which should give us a better synonym expansion algorithm.
Now that we have this powerful tool in our hands, we need to be careful! When using WordNet we have a constrained set of synonyms, which prevents blowing up the index. With the word vectors generated by word2vec, we could ask the model to return the closest words for every word to be indexed. That might be unacceptable from a performance perspective (for both runtime and storage), so we must come up with a strategy for using word2vec responsibly. One thing we can do is constrain the type of words we send to word2vec to get their nearest words.
In natural language processing, it is common to tag each word with a part of speech (PoS), which labels its syntactic role in a sentence. Common parts of speech are NOUN, VERB, and ADJ, but there are also more fine-grained ones like NP or NC (proper or common noun). We could decide to use word2vec only for words whose PoS is either NC or VERB and avoid bloating the index with synonyms for adjectives. Another technique is to look first at how much information a document contains. A short text has a relatively poor probability of hitting a query because it’s composed of only a few terms, so we could decide to focus on such documents and expand synonyms eagerly there, rather than in longer documents.
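As a rough sketch of the PoS idea (an assumption of mine, not from the book excerpt: it uses Apache OpenNLP with its pre-trained en-pos-maxent.bin model and Penn Treebank tags such as NN and VB rather than NC/VERB; the class name and model path are placeholders):

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class PosFilterSketch {
    public static void main(String[] args) throws Exception {
        // path to OpenNLP's pre-trained English PoS model is an assumption
        try (InputStream in = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(in));
            String[] tokens = {"music", "is", "my", "aeroplane"};
            String[] tags = tagger.tag(tokens);
            for (int i = 0; i < tokens.length; i++) {
                // only consider nouns and verbs for synonym expansion
                if (tags[i].startsWith("NN") || tags[i].startsWith("VB")) {
                    System.out.println("candidate for expansion: " + tokens[i]);
                }
            }
        }
    }
}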
What’s a ‘term weight’ again? The “informativeness” of a document doesn’t depend only on its size, however. Other techniques can therefore be used, such as looking at term weights (the number of times a term appears in a piece of text) and skipping terms with a low weight. Additionally, we can use word2vec results only if they have a good similarity score. If we use cosine distance to find the nearest neighbours of a certain word vector, those neighbours could be quite far away (a low similarity score) yet still be the nearest ones; in that case we can decide not to use them.
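A rough sketch of both ideas follows, assuming the trained vec model from the earlier listing, a lyrics string holding the text to index, and hypothetical threshold values:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

double minSimilarity = 0.35; // hypothetical similarity threshold
Map<String, Integer> termFrequency = new HashMap<>();
for (String token : lyrics.toLowerCase().split("\\s+")) {
    termFrequency.merge(token, 1, Integer::sum); // crude term weight: raw frequency
}
String term = "music";
List<String> synonyms = new ArrayList<>();
if (termFrequency.getOrDefault(term, 0) >= 2) { // skip expansion for very rare terms
    for (String candidate : vec.wordsNearest(term, 5)) {
        if (vec.similarity(term, candidate) >= minSimilarity) {
            synonyms.add(candidate); // keep only sufficiently close neighbours
        }
    }
}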
A token filter takes the terms provided by a tokenizer and possibly performs some operations on them, such as filtering them out or, as in this case, adding other terms to be indexed. A Lucene TokenFilter is based on the incrementToken API, which returns a boolean value that is false at the end of the token stream; implementers of this API consume one token at a time (e.g. by filtering or expanding a certain token). Earlier in this article you saw a diagram of how word2vec synonym expansion works. Before being able to use word2vec, we need to configure and train it on the data to be indexed. In the song lyrics example, we decided to use the Billboard Hot 100 Dataset, which we pass as plain text to the word2vec algorithm, as shown in the previous code listing.
Once we’re done with word2vec training, we create a synonym filter that uses the learned model to predict term synonyms during filtering. Lucene’s APIs for token filtering require us to implement the incrementToken method. By API contract, this method returns true if there are still tokens to consume from the token stream and false if there are no more tokens left to consider for filtering. The basic idea is that our token filter returns true for all the original tokens and for all the related synonyms that we get from word2vec.
protected W2VSynonymFilter(TokenStream input, Word2Vec word2Vec) { 1
  super(input);
  this.word2Vec = word2Vec;
}

@Override
public boolean incrementToken() throws IOException { 2
  if (!outputs.isEmpty()) {
    ... 3
  }
  if (!SynonymFilter.TYPE_SYNONYM.equals(typeAtt.type())) { 4
    String word = new String(termAtt.buffer()).trim();
    List<String> list = word2Vec.similarWordsInVocabTo(word, minAccuracy); 5
    int i = 0;
    for (String syn : list) {
      if (i == 2) { 6
        break;
      }
      if (!syn.equals(word)) {
        CharsRefBuilder charsRefBuilder = new CharsRefBuilder();
        CharsRef cr = charsRefBuilder.append(syn).get(); 7
        State state = captureState(); 8
        outputs.add(new PendingOutput(state, cr)); 9
        i++;
      }
    }
  }
  return !outputs.isEmpty() || input.incrementToken();
}
1 create a token filter that takes an already trained word2vec model
2 implement the Lucene API for token filtering
3 add cached synonyms to the token stream (see next code listing)
4 only expand a token if it’s not a synonym itself (to avoid loops in expansion)
5 for each term find the closest words using word2vec which have an accuracy higher than a minimum (e.g. 0.35)
6 don’t record more than two synonyms for each token
7 record the synonym value
8 record the current state of the original term (not the synonym) in the token stream (e.g. starting and ending position)
9 create an object to contain the synonyms to be added to the token stream after all the original terms have been consumed
The piece of code above traverses all the terms, and when it finds a synonym it puts it in a list of pending outputs to expand (the outputs list). Those pending terms (the actual synonyms) are emitted after each original term has been processed, as shown in the code below.
if (!outputs.isEmpty()) {
  PendingOutput output = outputs.remove(0); 1
  restoreState(output.state); 2
  termAtt.copyBuffer(output.charsRef.chars,
      output.charsRef.offset, output.charsRef.length); 3
  typeAtt.setType(SynonymFilter.TYPE_SYNONYM); 4
  return true;
}
1 get the first pending output to expand
2 retrieve the state of the original term, including its text, its position in the text stream, etc.
3 set the synonym text to the one given by word2vec and previously saved in the pending output
4 set the type of the term as synonym
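For completeness, here is a minimal sketch (not part of the book excerpt) of how the W2VSynonymFilter could be wired into a Lucene Analyzer, assuming the trained vec model from the earlier listing and that this code lives in the same package as the filter (its constructor is protected):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;

Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = new WhitespaceTokenizer();
    // expand each token with up to two word2vec synonyms
    TokenStream stream = new W2VSynonymFilter(tokenizer, vec);
    return new TokenStreamComponents(tokenizer, stream);
  }
};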
We applied the word2vec technique so that its output results are used as synonyms only if their accuracy is above a certain threshold. If you want to learn more about applying deep learning techniques to search, read the first chapter of Deep Learning for Search here, and see this slide deck.
A team of researchers at Oak Ridge National Laboratory wrote a paper in which they argue that machines will write most of their own code by 2040. Does this mean that humans won't be writing code at all? What will coding in 2040 look like? We talked with Jay Jay Billings, one of the authors, about these ideas and the future of machine learning.
JAXenter: In your paper, you state that machines (with the help of ML, AI, natural language processing, and code generation technologies) will write most of their own code by 2040. Does this mean humans won’t be writing code at all? What will coding in 2040 look like?
Jay Jay Billings: I think humans will still write code in 2040. I think humans will write sophisticated code that addresses very specific problems that cannot be addressed easily by machine-generated code (MGC). The argument in the paper is that most code that we write now — what we might consider “everyday code” — will be written with code generators that are fed by machine learning, AI, and natural language processing.
So, using the Starbucks coffee example in the paper, we imagine that the heat conduction code could be auto-generated and things such as material properties pulled from databases or regression-based models. Or maybe by 2040, there will be a large database of the way coffee cools in different cups! ;-) However, if someone wanted to do something specific like test out new ways of solving the heat equation, then they would just write the code as usual or maybe do some pair programming with a really smart code generator.
I have two views on the way writing code will look in 2040.
JAXenter: In this context, is machine learning a help to coding or is it a threat?
Most code that we write now — what we might consider “everyday code” — will be written with code generators that are fed by machine learning, AI, and natural language processing.
Jay Jay Billings: MGC would be a huge help, in my opinion. Right now we can’t meet the demand for skilled programmers. There’s just too much code to write.
I think machine learning is helpful because it will essentially learn what should be written and determine the best way(s) of writing it. If you think of the Design Patterns book by the Gang of Four, they went out and discovered all those patterns manually and, in my opinion, completely changed the discussion in software engineering. There are numerous efforts right now to apply machine learning to doing the same thing and if that lets us write more code, then it is very helpful.
I think the most helpful thing though—and this doesn’t exactly come out in the paper—is that code generators are becoming increasingly sophisticated. The more good code I can generate, the better, because it lets me focus on writing the parts of the code that MGC can’t help with. I think good code generators will be the most helpful and useful tools for coding by 2040.
JAXenter: Machine learning can do all sorts of things; it can even help discover new exoplanets. What’s next?
Jay Jay Billings: I think the one that is really, really going to make people think is machine learning as applied to medicine. Something you often see in the news these days is talk about personal genomics and personalized medicine, which means that in the next few years there will be so much data about you as a patient that very specific, personal treatments can be developed, for example, based on your DNA or the microbiome in your gut.
I think machine learning is going to be huge in this area because there’s simply no way that doctors can analyze that data on their own, but large medical companies could easily develop models based on our genetic information and use machine learning to greatly improve our care.
JAXenter: What should one take into consideration when planning to use machine learning? What are the dos and don’ts?
Jay Jay Billings: I think the biggest misconception is that it is magic or otherwise so automatic that virtually no work is required. Those new to the field should be prepared to do some of the hardest work of their lives to get the data they need, clean and treat it appropriately so that good features can be extracted, and then spend the time necessary to build and train the model without overfitting it.
SEE ALSO: Machines expected to write most of their own code by 2040
JAXenter: The CERT Division of Carnegie Mellon University’s Software Engineering Institute has published an updated list of technologies that might give us headaches in the security department and machine learning is on it. Are we ready to bear the brunt of machine learning’s risks?
Jay Jay Billings: We’ve certainly already made a step in the right direction by recognizing that there are risks.
JAXenter: Gartner estimates that machine learning is within 2–5 years of mainstream adoption and the CERT/CC expects this to be “one of the most aggressive and quickly adopted technology trends over the next several years.” What do you think? How will we know when machine learning has gone mainstream? What signs should we look for?
Jay Jay Billings: Oh, I think it is already mainstream. It is in our mapping tools, our streaming media services, and all the personal tracking used by companies on the internet. The big thing for me is whether or not the uninitiated can use it to their advantage, and the number of children ordering things with their parents’ Echoes and Alexas from Amazon certainly suggests that’s true.
We asked Jay Jay Billings to finish the following sentences:
In 50 years’ time, machine learning will be old news in that everyone will take it for granted, but still an area where people can make good careers.
If machines become more intelligent than humans, then I’ll be surprised and look forward to a cozy retirement.
Compared to a human being, a machine will never be limited to functioning on what the Star Trek universe describes as a Type-M planet.
Without the help of machine learning, mankind would never have made it into the modern era. The simplest form of machine learning is something called regression and we used it from the earliest days of computing to create new models of all sorts of important things.
Women are underrepresented in the tech sector—myth or reality? In addition to the Women in Tech survey, we also launched a diversity series aimed at bringing the most inspirational and powerful women in the tech scene to your attention. Today, we'd like you to meet Sepideh Nasiri, founder & CEO of Persian Women In Tech.
Is tech a boys-only club? So it seems. But the light of smart and powerful women is finally shining bright. We root for excellence and justice and, above all, we want meritocracy to win. This is our way of giving women in tech a shout-out.
A research study by The National Center for Women & Information Technology showed that “gender diversity has specific benefits in technology settings,” which could explain why tech companies have started to invest in initiatives that aim to boost the number of female applicants, recruit them in a more effective way, retain them for longer, and give them the opportunity to advance. But is it enough?
Women in Tech — The Survey
We would like to get to the bottom of why gender diversity remains a challenge for the tech scene. Therefore, we invite you all to fill out our diversity survey. Share your experiences with us!
Your input will help us identify the diversity-related issues that prevent us from achieving gender equality in technology workplaces.
Without further ado, we would like to introduce Sepideh Nasiri, founder & CEO of Persian Women In Tech.
Sepideh (“Sepi”) Nasiri is an award-winning entrepreneur, the founder of Persian Women In Tech, a former Vice President at Women 2.0, and an advocate for Women + Diversity and Inclusion. Ms. Nasiri started her career as a co-founder and managing editor of a digital and print Los Angeles magazine and later joined a number of successful startups in Silicon Valley. Currently, Ms. Nasiri advises early-stage startups, is an avid technology enthusiast, and mentors many female entrepreneurs and founders in technology around the globe.
I believe it started in high school, at Monta Vista High School in Cupertino, CA, where Apple and Sun Microsystems were just down the street. The student body was exposed to tech and computers at a young age.
I come from an international, entrepreneurial family. Having moved from Iran to Germany and then to the US for better opportunities and life, I had the foundation to start a company at a young age right after college.
I never had the desire to have a traditional 9-to-5 job, but always wanted to use my skills in education and technology to help change the ways people live and think. My parents, particularly my father, have always supported me, but it took a little time to convince him that entrepreneurship—though not the safest career route—was the best option for me to live a happy, successful life. I also had mentors from a young age, from my professors and cofounders to networks I found later on in my career. Role models, mentors, and sponsors are essential to success, especially for women in STEM.
Everyone has obstacles and challenges in life, but one of my biggest challenges was being an immigrant. Legal knowledge and the right immigration status can make or break your career path. That being said, I was lucky enough to find ways around these issues and forge solutions that ended up benefitting my career.
I founded Persian Women In Tech, a platform whose mission is to build the profile of Iranian women working in technology while elevating their voices. Persian Women in Tech also provides resources like mentorship, which has helped create a larger sense of community.
I’m proud that I have the opportunity to help others, especially other women who work in the tech space.
There are many women in tech and STEM, but they tend to leave the tech workforce due to the company’s environment, lack of support systems, lack of employee support and lack of career growth opportunities.
When women aren’t well represented in STEM fields, the entire world loses out on their talents and innovations. As with many industries, STEM fields are considered a man’s world. When women and people of color have opportunities for leadership roles in STEM careers and are given a seat at the table, it will pave the way for a diversity of ideas and talent, and only drive more innovation and workplace equality.
The discussion about diversity is gaining momentum in a way that is already creating results. Large technology companies like Google and Facebook are beginning to be held accountable for their diversity and sexism problems, and it’s clear that tech companies can no longer hide behind their ‘hip’ facade as cover for their discriminatory hiring and promotion practices. I am inspired by fellow women and people of color in STEM fields who are quite literally pioneers in an industry that is largely white and male.
Many companies lack several key components, namely infrastructure to support female employees, mentorship programs, career growth opportunities, and sponsors.
About 10 years ago, tech was a lonely space for a woman; the industry was rife with male-dominated positions, making it all the more difficult—especially for women of color—to break through and have a seat at the table. My advice is that it is absolutely crucial for women in tech to connect, remain connected, and empower one another, particularly in times of frustration with the industry as a whole, and to lift each other up with access to resources and opportunities when needed. That’s why Persian Women in Tech strives to keep growing a global network of over 2,500 women, with at least 100 people in each of our chapters across the world.
Aside from fostering relationships with other women in the STEM space, my advice would be to never be afraid to speak up or ask for more, and to educate the people around you—particularly men—in the workplace about how promoting diversity in the tech workforce is simply good for business.
Don’t miss our Women in Tech profiles: