The Central Idea

Build stream applications by using

Python + Streams + Agents


Streams: A stream is a sequence of values of arbitrary length that can be modified only by appending values to its tail. A stream is a Python data structure, similar to a list. Later, we will see that a stream can also be implemented as a NumPy array of unbounded size.

s = Stream()

creates a stream s. You can give a stream a name and set its initial value as in:

s = Stream(name='temperature', initial_value=[75, 77])

which gives the name ‘temperature’ to the stream and sets its initial value to the list [75, 77]. A stream’s name can be useful for debugging but is not used in the code.
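
The append-only behavior described above can be illustrated with a small stand-in class. This is a minimal sketch and not IoTPy's implementation: the class ToyStream and its methods append, extend and values are hypothetical names chosen for illustration; only the name and initial_value arguments come from the text above.

```python
class ToyStream(object):
    """Illustrative stand-in for a stream (hypothetical, not the IoTPy class).

    Values can only be added at the tail; earlier values never change.
    """

    def __init__(self, name='NoName', initial_value=None):
        self.name = name  # useful only for debugging output
        self._values = list(initial_value or [])

    def append(self, value):
        # Add a single value to the tail of the stream.
        self._values.append(value)

    def extend(self, values):
        # Add several values, in order, to the tail of the stream.
        self._values.extend(values)

    def values(self):
        # Return a copy so that callers cannot modify earlier elements.
        return list(self._values)


s = ToyStream(name='temperature', initial_value=[75, 77])
s.append(78)
s.extend([80, 79])
print('%s: %s' % (s.name, s.values()))  # temperature: [75, 77, 78, 80, 79]
```

The sketch offers no way to delete or overwrite a value, which is the property that distinguishes a stream from an ordinary list.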


Agents: An agent is a persistent object that reads and modifies streams. An agent executes forever, extending its output streams when its input streams are extended. You can create an agent by calling a wrapper function from the IoTPy library.

a = wrapper(func, in_streams, out_streams, ....)

where wrapper is a function from the IoTPy library, func is any Python function (for example, one from a library such as SciPy), in_streams is a list of the agent’s input streams, and out_streams is a list of the agent’s output streams. This statement creates an agent, a, which executes forever, waiting for its input streams to be extended and then extending its output streams by executing func on a portion of its input streams. func is an ordinary function: it terminates and operates on standard data structures such as lists and arrays.
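
The relationship between an ordinary function and an agent can be sketched without the library. The code below is an illustration under stated assumptions, not IoTPy's API: ToyStream, subscribe and toy_map_wrapper are hypothetical names, and for brevity the toy wrapper takes a single input stream and a single output stream rather than lists of streams.

```python
class ToyStream(object):
    """Hypothetical append-only stream that notifies the agents reading it."""

    def __init__(self, name='NoName'):
        self.name = name
        self.values = []
        self._subscribers = []

    def subscribe(self, callback):
        # Register an agent step to run whenever the stream is extended.
        self._subscribers.append(callback)

    def extend(self, new_values):
        new_values = list(new_values)
        self.values.extend(new_values)
        # Wake up every agent that reads this stream.
        for callback in self._subscribers:
            callback(new_values)


def toy_map_wrapper(func, in_stream, out_stream):
    """Hypothetical wrapper: applies func to each new value of in_stream
    and appends the results to out_stream."""
    def step(new_values):
        out_stream.extend([func(v) for v in new_values])
    in_stream.subscribe(step)
    return step


def fahrenheit_to_celsius(f):
    # An ordinary function: it takes a number, returns a number, and terminates.
    return round((f - 32) * 5.0 / 9.0, 1)


fahrenheit = ToyStream(name='fahrenheit')
celsius = ToyStream(name='celsius')
agent = toy_map_wrapper(fahrenheit_to_celsius, fahrenheit, celsius)

# Each extension of the input stream causes an extension of the output stream.
fahrenheit.extend([32, 212])
fahrenheit.extend([50])
print(celsius.values)  # [0.0, 100.0, 10.0]
```

Note that fahrenheit_to_celsius knows nothing about streams; the wrapper alone is responsible for reacting to new input and extending the output.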

Multicore and Distributed Systems: A process and a VM (virtual machine) are instances of agents: they have input and output streams. An application for a multicore computer is built by connecting output streams of processes to input streams of processes. A distributed system consists of a network of multicore computers that communicate by passing messages. A multicore computer is represented as a VM, and a distributed application is created by connecting output streams of VMs to input streams of VMs.
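
The idea of building an application by connecting the output of one concurrent component to the input of another can be sketched with the standard library alone. This is a hedged illustration, not how IoTPy implements processes or VMs: threads stand in for processes, queues stand in for streams, and the names square_stage, running_sum_stage and STOP are hypothetical. The STOP sentinel exists only so that this finite demonstration terminates; real agents run forever.

```python
import threading
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7

STOP = object()  # sentinel marking the end of this finite demonstration


def square_stage(in_q, out_q):
    # Reads values from its input and writes their squares to its output.
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        out_q.put(item * item)


def running_sum_stage(in_q, out_q):
    # Reads values from its input and writes the running total to its output.
    total = 0
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        total += item
        out_q.put(total)


# The output of the first stage is the input of the second stage.
source, middle, sink = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=square_stage, args=(source, middle)),
    threading.Thread(target=running_sum_stage, args=(middle, sink)),
]
for t in stages:
    t.start()

for value in [1, 2, 3, 4]:
    source.put(value)
source.put(STOP)

for t in stages:
    t.join()

results = []
while True:
    item = sink.get()
    if item is STOP:
        break
    results.append(item)
print(results)  # [1, 5, 14, 30]
```

Neither stage refers to the other; the application is defined entirely by which queue is passed as the output of one stage and the input of the next.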

Getting started

You need only basic Python 2.7 to build applications that operate on streams.

Download IoTPy and look at some of the examples. Download NumPy and other libraries only when you need them. The "Getting Started" paragraphs in each section tell you which parts to read first to start building applications. You can execute and modify code in the examples package. For a quick overview of the package, look at the application “Aggregation and Analysis of Multiple Streams” in examples.

Rationale for IoTPy

1. Python stream-processing applications for beginners

Anybody who can write Python functions can use IoTPy to write multicore, distributed applications that analyze streams.

2. Extensive Python libraries for non-streaming applications

Powerful Python libraries exist in a variety of domains including data analytics, artificial intelligence, signal processing and science. Programs in these libraries apply a function to input data to obtain output and then terminate. By contrast, streaming applications are perpetual processes that analyze endless data streams without terminating. IoTPy helps make the wealth of non-streaming Python software available for streaming applications; it does so by providing a collection of methods that encapsulate non-streaming programs to build applications that operate on endless streams.

3. Ubiquity of data streams

Sensors, social media, news feeds, webcams and other sources generate streams of data. Many applications ingest and analyze data streams to control actuators, generate alerts, and feed continuous displays. IoTPy helps to build such applications.

4. Processing at the edge and throughout the network

Data-streaming applications are most efficient when data is processed right where it is generated, for example in a Raspberry Pi connected to a sensor. Even inexpensive computers, such as the Raspberry Pi 3, have multiple cores. IoTPy uses a single, simple construct, connecting output streams to input streams, to create applications with multiple threads, multiple processes and multiple VMs.

Disadvantages of IoTPy

IoTPy is limited to Python; it is not a general multi-language framework for streaming applications.

IoTPy has not been optimized for performance.  IoTPy does not have its own security protocols for execution on distributed computing systems; it uses the protocols of the platforms on which it runs. IoTPy uses a simple, but restricted, mechanism for concurrent programs; it does not exploit the full power of Python concurrency. 

IoTPy is a work in progress. The code will be released in the summer of 2019. Feel free to download and use the code while it is being tested and modified.

Example Applications

For a list of applications, see "Event Processing: Designing IT Systems for Agile Companies" and "Event Processing in Action".

APPLICATIONS: THE INTERNET OF THINGS

IoT in factories, mines, offshore oil platforms, and traffic in cities generates streams of data. IoTPy helps engineers use existing libraries for engineering and business to build applications that detect critical events in streams of IoT data.

APPLICATIONS: CITIZEN SCIENCE

Many citizen-science projects ingest and analyze streams of data generated by sensors. For example, the Caltech Community Seismic Network (CSN) uses dense networks of inexpensive sensors to monitor buildings, campuses, and regions. IoTPy helps scientists use existing, widely used software libraries to build applications that continuously monitor and respond to the environment.

APPLICATIONS: ANALYZING SOCIAL MEDIA

Twitter and other social media sites generate streams of data. Analyzing this data often requires natural language processing (NLP), graph algorithms, and mechanisms for dealing with "Big Data." IoTPy helps developers use non-streaming software from NLP and other domains to build applications that detect important events in streams of data generated by social media.

NEXT

Next, read the section that introduces the idea of wrappers.

WRAPPERS TO CREATE AGENTS