
IoT and Machine Learning: A networking perspective

by Trusted Contributor on 09-10-2017 08:01 PM

It’s well understood that IoT is going to provide a lot of data. And that data will be used to feed applications. Machine learning will help provide the algorithms that convert that data into action. And that action is what we, the consumers, will benefit from.
 
This means that IoT and machine learning will need to intersect. How is that likely to happen?
 
Some background
I am writing this after watching some Twitter action last week, when Randy Bias commented on a Tyler Britten tweet.
 
Before I explain why, let me just say that I agree with Randy’s take on this. And I think in the IoT cases, this is going to be especially true. And in everyone’s favorite AI example (autonomous driving), I just don’t think there is any other way. 
 
Machine learning, data, and models
There continues to be an undercurrent of confusion when non-machine-learning people talk about machine learning (and to be fair, I am not an expert—this is just me observing other non-experts talk). 
 
In normal-people speak, there are a couple of things to understand about machine learning. The algorithms that drive behavior require data. That data has to be collected and then used to build models. You then train the models with more data. This is what hardens the learning part. 
 
But most applications aren’t constantly tuning the models. Once they have a model, they just execute it. There is not a continuous need to train the models at the point of execution. That’s not to say that data won’t continue to be important as models are revised, but the actual application using the data isn’t simultaneously doing what it’s designed to do and learning on the fly based on local data (in the general case). 
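
As a rough illustration of that split, here is a minimal sketch in Python, using scikit-learn purely as a stand-in; the model type, file path, and function names are illustrative assumptions, not a prescription. Training happens once, offline, against collected data; the deployed application only loads the result and executes it.

# Offline/central: collect data, fit a model once, and persist it.
import pickle
from sklearn.linear_model import LogisticRegression

def train_model(features, labels, path="model.pkl"):
    """The 'learning' step: fit on collected data and save the result."""
    model = LogisticRegression()
    model.fit(features, labels)
    with open(path, "wb") as f:
        pickle.dump(model, f)

# At the point of execution: load the already-trained model and just run it.
def run_inference(sample, path="model.pkl"):
    """No training happens here; the application only executes the model."""
    with open(path, "rb") as f:
        model = pickle.load(f)
    return model.predict([sample])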
 
An example
In the autonomous driving example, it’s not going to be the case that your car is constantly siphoning data from its surroundings and locally tuning its driving algorithms. This would be horrifically inefficient. Not only would you have to deploy mini datacenters inside cars to handle the models (forgetting for a moment the human component of this exercise), but the data would be local, so every car would have slightly different models from which to work.
 
So even if we could solve the datacenter-in-a-car problem, we wouldn’t want the behavior drift that would occur due to everyone using their own data. While there will be cases where local data does matter, it won’t be for use cases that are ubiquitously deployed. The more consistency there must be, the stronger the case for centralizing the models and then distributing to the edge.
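
To make the centralize-then-distribute idea concrete, here is a hypothetical sketch of the edge side: the device never trains anything, it just checks a central model registry and pulls a new version when one appears. The registry URL, file paths, and JSON layout are all assumptions for illustration.

# Hypothetical edge-side updater: fetch the latest centrally trained model
# instead of retraining locally. URL and paths are illustrative assumptions.
import json
import urllib.request

MODEL_SERVER = "https://models.example.com"     # assumed central model registry
LOCAL_MODEL = "/var/lib/gateway/model.pkl"
LOCAL_VERSION = "/var/lib/gateway/model.version"

def current_version():
    try:
        with open(LOCAL_VERSION) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None

def update_model():
    """Pull a new model only when the central version changes."""
    with urllib.request.urlopen(MODEL_SERVER + "/latest.json") as resp:
        latest = json.load(resp)                # e.g. {"version": "42", "url": "..."}
    if latest["version"] != current_version():
        urllib.request.urlretrieve(latest["url"], LOCAL_MODEL)
        with open(LOCAL_VERSION, "w") as f:
            f.write(latest["version"])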
 
Implications
This basically means that the point of application execution is likely to be where models are used, but unlikely to be where models are trained. So some of the talk about using edge resources as machine learning centers is a bit overblown. 
 
In fact, the primary job of the edge will be data collection and relay. All these IoT devices will generate data. Not all of that data will be relevant, so doing some data grooming at the edge would be nice. This means there will be applications for putting compute resources on the edge, but it won't be GPU farms doing heavy computation. Simple filtering will be enough. And when you consider that broadband speeds in the places where some of these IoT clusters will live might not always be great, filtering and even buffering for bulk transport will be important functions. It's not hard to imagine local storage with periodic rolling uploads, in a use case that looks a bit like data replication for DR. 
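
A hypothetical sketch of that gateway behavior, with the filter rule, buffer path, and batch size as made-up placeholders: readings are filtered on arrival, appended to local storage, and shipped upstream in bulk once enough have accumulated.

# Hypothetical gateway loop: filter readings at the edge, buffer locally,
# and upload in periodic batches. Names and thresholds are assumptions.
import json

BUFFER_PATH = "/var/lib/gateway/buffer.jsonl"
BATCH_SIZE = 500

def relevant(reading):
    """Simple filtering: drop readings that carry no useful signal."""
    return reading.get("value") is not None and abs(reading["value"]) > 0.01

def buffer_reading(reading):
    """Append a relevant reading to local storage for later bulk transport."""
    if relevant(reading):
        with open(BUFFER_PATH, "a") as f:
            f.write(json.dumps(reading) + "\n")

def upload_batch(upload_fn):
    """Ship the buffered data upstream in bulk, then roll the local buffer."""
    with open(BUFFER_PATH) as f:
        lines = f.readlines()
    if len(lines) >= BATCH_SIZE:
        upload_fn(lines)                        # e.g. POST to the central collector
        open(BUFFER_PATH, "w").close()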
 
And of course this data will be a mix of sensitive and non-sensitive data, depending on the source, which will mean that IoT gateways will also have to encrypt. Let's assume these IoT gateways are low-end ARM-type platforms or even low-end x86 devices (think Atom and below, not necessarily big Broadwell-8 or larger). There won't be a ton of resources on-box to handle heavy-duty encryption, which will favor intelligent packet handling: basic policy decisions at the edge about what needs protecting, which essentially means we will have to solve policy management as well. It's a good thing SDN is everywhere at this point and that multi-vendor orchestration is well-solved (that's a joke, btw).
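
As a loose sketch of what that policy step might look like in software, the gateway could tag readings by source sensitivity and encrypt only the sensitive ones before upload. The policy table and the choice of the third-party Python 'cryptography' package (Fernet) are assumptions for illustration, not a recommendation for low-end hardware.

# Hypothetical policy + encryption step at the gateway: classify readings
# by source and encrypt only the sensitive ones before upload.
import json
from cryptography.fernet import Fernet

KEY = Fernet.generate_key()        # in practice the key would be provisioned, not generated here
cipher = Fernet(KEY)

SENSITIVE_SOURCES = {"badge_reader", "camera"}   # assumed policy table

def prepare_for_upload(reading):
    """Return an upload record, encrypting the payload when policy requires it."""
    payload = json.dumps(reading).encode()
    if reading.get("source") in SENSITIVE_SOURCES:
        return {"sensitive": True, "data": cipher.encrypt(payload).decode()}
    return {"sensitive": False, "data": payload.decode()}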
 
Exceptions
There will, of course, be exceptions. People can conceive of applications that are extremely local in their behavior, and these might not need centralized data to get the benefit of a distributed set of data sources. It's still a bit of a stretch to think that anything of real consequence can fit on some cheap, in-the-field, hardened device, though. If the data sets are small enough to handle on little boxes or in smaller clusters, I kind of wonder if machine learning is really a big part of the solution. That said, someone will conceive of these types of applications, and they will have to be accounted for. 
 
I tend to think the bigger reason for local model training is going to be security requirements around the data (think military use cases, or nuclear reactors), or limitations in broadband (mining fields out in the middle of nowhere using satellite links). In these cases, the issue is less about the models and more about the WAN connectivity. But SD-WAN should solve this (again, joking).
 
The bottom line
I think IoT becomes a great enabler for machine learning. Having rich, contextual data is going to be hugely useful. That data intersection is pretty obvious. But from an infrastructure point of view, I think the intersection is really going to be secure data transport from IoT gateways to machine learning clouds, which could be centralized or distributed. 
 
In fact, if I were a service provider (especially a regional service provider with local presence), this would be a key area that I would be planning to exploit. With every announcement of a company adding sensors to their equipment, there is an opportunity to be the cloud that handles that data. John Deere is going to need infrastructure and cloud partners. Caterpillar is going to need partners. Oil and gas companies will need partners. 
 
And this will all create opportunities for net-new relationships around a new technology combo that admittedly will look a lot like things we have seen before. 

About the Author
  • Mike is currently acting as a Senior Director of Strategic Marketing at Juniper Networks. Mike spent 12 years at Juniper in a previous tour of duty, running product management, strategy, and marketing for Junos Software. In that role, he was responsible for driving Juniper's automation ambitions and incubating efforts across emerging technology spaces (notably SDN, NFV, virtualization, portable network OS, and DevOps). After the first Juniper stint, Mike joined datacenter switching startup Plexxi as the head of marketing. In that role, he was named a top social media personality for SDN. Most recently, Mike was responsible for Brocade's datacenter business as VP of Datacenter Routing and Switching, and then Brocade's software business as VP of Product Management, Software Networking.