This section describes how machine learning packages are integrated with the IoTPy package to detect important patterns in data streams. This section is a part of Rahul Bachal's undergraduate thesis in Computer Science at the California Institute of Technology. The thesis was supervised by Prof. K. Mani Chandy.

# Introduction

Machine learning (ML) attempts to learn the characteristics of a system or process when there is known to be a pattern relating the system's inputs to its outputs, but the mathematical relationship between them is not known and cannot be derived directly [6]. The pattern is learned by training a machine learning model: a mathematical function, chosen from a defined hypothesis space, that predicts output values from input values.

A machine learning algorithm has two components: training and prediction. Given a training dataset, the algorithm first trains a model and then uses the model to generate predictions. For example, to predict the center of an earthquake from sensor shake data, we train a model on data from past earthquakes, using the shake data and sensor locations as inputs and the center of the earthquake as the output. Once trained, the model can be applied to new shake data to predict the location of an earthquake that has just started [10].

There are two main classes of machine learning: supervised learning and unsupervised learning. In supervised learning, the training dataset contains labeled outputs, and the goal is to learn a mapping from inputs to outputs that generalizes to new inputs. The earthquake example above uses supervised learning. In unsupervised learning, the training dataset has no labeled outputs, and the goal is to discover patterns in the data. For example, in credit card fraud detection, the training data does not indicate whether each transaction was an anomaly. Instead, the algorithm learns the underlying patterns in the transaction history and uses them to determine whether the current transaction is an anomaly.

To abstract this process, we split machine learning algorithms into these two components: training and prediction. The framework lets you plug in any machine learning algorithm by accepting user-defined functions for training and for prediction. To use the framework effectively, take the following steps:

1. Determine the type of function you want to learn.

2. Implement a training function that trains this function on a training dataset.

3. Implement a prediction function that uses the trained hypothesis function to generate a prediction for an input.

These steps apply to machine learning in general; to use the framework, you must be able to separate your machine learning algorithm into these specific components.

The ML framework is extremely flexible, supporting both supervised and unsupervised learning. You only need to ensure that the training and prediction functions provided are consistent with each other.

There are two main ways to use this framework: you can use functions defined in existing machine learning modules with little coding, or you can write your own machine learning functions. We focus first on the simpler option, using functions defined in other machine learning modules.

# Part 1: Using machine learning modules with IoTPy

There are many Python modules for machine learning. For example, scikit-learn offers various functions and algorithms for supervised and unsupervised learning. To use these functions, you write training and prediction functions that wrap them.

### Structure of a Training Function

A training function has the following structure:

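```python
def train_function(x, y, model, window_state):
    if not model:
        # Initialize the model; Model is a placeholder for your model class,
        # and its constructor may take different arguments
        model = Model()
    # Computations: update the model using the window's x and y
    # Return the updated model so it is passed to the next window
    return model
```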

All training functions have this signature. The `model` argument holds the state that the function maintains between calls, namely the machine learning model. The function first checks whether this state has been initialized. If it has not, it initializes it by creating an instance of a user-defined `Model` class, which can hold whatever variables the machine learning model needs. At the end, the function returns the updated model. This return value is passed to the training function for the next window, so the state is preserved between window transitions.

### Structure of a Prediction Function

A prediction function has the following structure:

```python
def predict_function(x, y, model):
    # Computations: use the trained model to predict a value for x
    # Return the prediction value; do not modify model
    return value
```

All prediction functions have this signature. The `model` argument is the current state, that is, the machine learning model as last updated by the training function. The prediction function uses it to compute and return a prediction for a given input. It must not modify the state: only the training function does that.
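To make this division of responsibilities concrete, here is a minimal sketch of a training/prediction pair that follows the two signatures, plus a driver loop that threads the model from one window to the next as described above. `RunningMean` and `run_windows` are hypothetical names for illustration only; this is not how IoTPy's `stream_learn` is implemented, and `window_state` is simply passed as `None` because its role is not specified here.

```python
class RunningMean:
    """Hypothetical model: the mean of every y value seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0


def train_function(x, y, model, window_state):
    if not model:
        # No state yet (first window): create the model
        model = RunningMean()
    # Fold this window's outputs into the state carried between windows
    model.total += sum(y)
    model.count += len(y)
    return model


def predict_function(x, y, model):
    # Read the model without modifying it: predict the running mean for each input
    mean = model.total / model.count
    return [mean for _ in x]


def run_windows(xs, ys, size):
    """Hypothetical driver over non-overlapping windows (step == size), so
    no value is counted twice. Mimics how state is threaded between windows."""
    model = None
    predictions = []
    for start in range(0, len(xs) - size + 1, size):
        x, y = xs[start:start + size], ys[start:start + size]
        # window_state is passed as None; its role is not specified above
        model = train_function(x, y, model, window_state=None)
        predictions.append(predict_function(x, y, model))
    return model, predictions


model, predictions = run_windows([0, 1, 2, 3], [1, 2, 3, 4], size=2)
print(predictions)  # [[1.5, 1.5], [2.5, 2.5]]
```

The second window's prediction (2.5) reflects all four outputs seen so far, which is only possible because `train_function` returned its state and received it back on the next call.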

We begin with a simple example of supervised learning.

### Supervised Learning

Linear regression is a machine learning algorithm that learns a linear model of the relationship between inputs and outputs. The figure below shows an example of linear regression:

*[Figure: example of linear regression]*

To run linear regression from scikit-learn, you must first write a training function. This function looks like:

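```python
from sklearn import linear_model

def train(x, y, model, window_state):
    # Fit a new ordinary least squares model on the current data
    regr = linear_model.LinearRegression()
    regr.fit(x, y)
    return regr
```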

This function uses scikit-learn to fit a new linear model to the data `x` and `y` and returns it. Because it ignores the incoming `model` argument, it retrains from scratch on every call.

Next, you must write a prediction function. This function looks like:

```python
def predict(x, y, model):
    # model is the fitted LinearRegression returned by train
    return model.predict(x)
```

The parameter model refers to the model returned from the training function.

You can use any function from scikit-learn or other modules similarly.
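As an illustration of wrapping an unsupervised algorithm, the sketch below puts scikit-learn's `KMeans` into the same training/prediction structure. The choice of two clusters, `n_init=10`, and `random_state=0` are illustrative assumptions; `y` is ignored because clustering uses no labels.

```python
import numpy as np
from sklearn.cluster import KMeans


def train(x, y, model, window_state):
    # Unsupervised: y is unused. Fit a fresh 2-cluster model on this window.
    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    model.fit(x)
    return model


def predict(x, y, model):
    # Index of the nearest learned cluster center for each row of x
    return model.predict(x)


# Usage: two well-separated groups of 2-D points
x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
model = train(x, None, None, None)
labels = predict(x, None, model)
```

Like the linear regression wrapper, this `train` refits on every window rather than reusing the previous model.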

# Part 2: Incremental Algorithms

The ML framework lets users run any machine learning algorithm, either by writing their own training and prediction functions or by wrapping existing libraries such as scikit-learn. We now describe two machine learning algorithms that we developed from scratch for streaming data. They provide the same functionality as their standard counterparts written for static data, but we show that they are more efficient and achieve better prediction accuracy. They also illustrate how expert developers who understand streaming systems can extend ML.

The design of incremental algorithms can be explained with a simple example. When a sliding window moves over a stream, most of the values in the window stay the same. Consider a sliding window over the stream [1, 5, 3, 6, 4, 2, ...]. A window with size 5 and step size 1 first contains [1, 5, 3, 6, 4], then [5, 3, 6, 4, 2], and so on. From the first window to the next, one value, 1, leaves from the beginning and one value, 2, is added at the end; the rest of the values stay the same. We can therefore reuse computation between windows to learn more efficiently. This motivates native streaming machine learning algorithms that learn incrementally. These provide better performance and, in some cases, improved prediction accuracy because of the incremental learning.
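The reuse idea can be illustrated with something simpler than a learning algorithm: the sum of each window. In this minimal sketch (the name `sliding_sums` is ours, for illustration), each shift subtracts the value that leaves the window and adds the value that enters, so a shift costs O(1) instead of O(size).

```python
from collections import deque


def sliding_sums(stream, size):
    """Yield the sum of each window of `size` values (step size 1),
    reusing the previous window's sum instead of re-adding every value."""
    window = deque()
    total = 0
    for value in stream:
        window.append(value)
        total += value
        if len(window) > size:
            # The oldest value leaves the window: remove its contribution
            total -= window.popleft()
        if len(window) == size:
            yield total


print(list(sliding_sums([1, 5, 3, 6, 4, 2], size=5)))  # [19, 20]
```

The incremental learning algorithms apply the same principle to the statistics a model depends on, rather than to a plain sum.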

We developed two such algorithms: linear regression and k-means. We chose these algorithms for native streaming implementation because they are widely used in machine learning applications.

### Linear Regression

Linear regression is an ML algorithm that fits a linear function to data.

Ordinary least squares linear regression is solved by the equation

$$w = (X^{T}X)^{-1}X^{T}y$$

where $X$ is the $n \times m$ data matrix, $y$ is the $n \times 1$ output vector, and $w$ is the $m \times 1$ weight vector, with $n$ the number of data points and $m$ the number of features. For data with a single feature, we solve this equation using incremental matrix inversion. For data with more than one feature, however, the cost of matrix inversion makes least squares linear regression impractical, so we use stochastic gradient descent (SGD) instead. SGD repeatedly updates the model parameters in the direction of the negative gradient of the error function, one data point at a time, until it reaches a local minimum. Our implementation assumes that the local minimum for the weight vector does not change significantly when the sliding window shifts, so the weights learned on one window are a good starting point for the next. This assumption holds as long as the data in consecutive windows come from similar distributions.
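The two strategies can be sketched in NumPy. This is an illustration under our own assumptions, not the `LinearRegressionStream` implementation. `SlidingOLS1D` handles the single-feature case (with an intercept column of ones in $X$) by keeping $X^TX$ and $X^Ty$ as running sums, which a sliding window can update in O(1) per shift before inverting the 2×2 matrix in closed form. `sgd_fit` runs plain SGD on squared error and accepts the previous window's weights as a warm start; the learning rate, epoch count, and both names are illustrative choices.

```python
import numpy as np


class SlidingOLS1D:
    """Single-feature least squares, y ~ a*x + b, over a sliding window.

    Keeps X^T X = [[n, sum x], [sum x, sum x^2]] and X^T y = [sum y, sum xy]
    as running sums, so adding or removing one point costs O(1)."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def _update(self, x, y, sign):
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y

    def add(self, x, y):
        self._update(x, y, +1)

    def remove(self, x, y):
        self._update(x, y, -1)

    def solve(self):
        # Invert the 2x2 matrix X^T X in closed form (needs 2+ distinct x values)
        det = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / det
        b = (self.sy - a * self.sx) / self.n
        return a, b


def sgd_fit(X, y, w=None, lr=0.01, epochs=20, seed=0):
    """SGD on squared error for y ~ X @ w. Passing the previous window's
    weights as `w` warm-starts the search instead of starting from zero."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1]) if w is None else np.array(w, dtype=float)
    for _ in range(epochs):
        for j in rng.permutation(len(y)):
            error = X[j] @ w - y[j]
            w -= lr * error * X[j]  # gradient of 0.5 * error**2 w.r.t. w
    return w


# Single feature: window over x = 0..3, then slide by one to x = 1..4
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
ols = SlidingOLS1D()
for xi, yi in zip(xs[:4], ys[:4]):
    ols.add(xi, yi)
ols.remove(xs[0], ys[0])
ols.add(xs[4], ys[4])
print(ols.solve())  # (2.0, 1.0)

# Several features: fit one window, then warm-start on the shifted window
rng = np.random.default_rng(1)
X = rng.normal(size=(101, 2))
Y = X @ np.array([2.0, -3.0])
w = sgd_fit(X[:100], Y[:100])
w_next = sgd_fit(X[1:], Y[1:], w=w, epochs=1)
```

Because the shifted window shares 99 of its 100 points with the previous one, a single warm-started epoch is enough here, whereas a single epoch from zero would still be far from the solution.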

### KMeans

The KMeans algorithm works by initializing centers randomly and then assigning each point to its closest center. Each center is then updated to the mean of the data assigned to it. The algorithm iterates until the centroid assignment stabilizes. Mathematically, the algorithm performs the following steps:

1. Initialize k cluster centers $m_1, \ldots, m_k$ randomly.

2. Partition the data into k sets $S_1, \ldots, S_k$, where for each $i$,

   $$S_i = \{\, x_j : \|x_j - m_i\|^2 \le \|x_j - m_l\|^2 \ \ \forall l,\ 1 \le l \le k \,\}$$

3. Update each cluster center to the mean of its set:

   $$m_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j$$

The algorithm repeats Steps 2 and 3 until the assignment of points to centroids stabilizes. The standard batch algorithm runs all of these steps from scratch for each shift of the window. In streaming data, however, when the data distribution stays the same, the centroids move little between windows. We exploit this by using the previous window's centroids as the initial centers for the next window instead of random ones. When the new centroids are close to the previous ones, the algorithm then needs only a few iterations to converge.
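A minimal NumPy sketch of the three steps, with an optional warm start from the previous window's centroids. The function name `kmeans`, the use of randomly chosen data points as the random initial centers, the iteration cap, and the toy data are assumptions for illustration; this is not the `ML.KMeans.KMeansStream` implementation.

```python
import numpy as np


def kmeans(X, k, centers=None, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (centers, labels, iterations).

    If `centers` is given, e.g. the previous window's centroids, it is used
    as the starting point (warm start) instead of random data points."""
    if centers is None:
        # Step 1: use k distinct, randomly chosen data points as initial centers
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
    centers = np.array(centers, dtype=float)  # copy; the caller's array is untouched
    labels = None
    for iteration in range(1, max_iter + 1):
        # Step 2: assign each point to the center at the smallest squared distance
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # the assignment has stabilized
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it
        for i in range(k):
            members = X[labels == i]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[i] = members.mean(axis=0)
    return centers, labels, iteration


# Two tight, well-separated groups; the next window is the same groups shifted slightly
window1 = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                    [9.8, 10.1], [10.2, 9.9], [10.0, 10.3]])
window2 = window1 + 0.05

centers1, _, _ = kmeans(window1, k=2)  # cold start from random points
centers2, labels2, iters2 = kmeans(window2, k=2, centers=centers1)  # warm start
print(iters2)  # 2: one pass to assign and update, one pass to confirm
```

On this toy data the warm start converges in the minimum number of passes the loop allows; a cold start can need more passes, and with unlucky random centers it can also settle in a poorer local optimum.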

To use a machine learning algorithm, call `stream_learn` with `function=name`, where `name` is the name of a class that contains `train` and `predict` functions. To use the incremental algorithms, call `stream_learn` with `function="ML.LinearRegression.LinearRegressionStream"` for linear regression or `function="ML.KMeans.KMeansStream"` for k-means.