Developing a toolbox for ML with time series { Outreachy }


Photo by Laura Ockel on Unsplash

This summer I'm working on Sktime, a unified framework for machine learning with time series. It's a Python toolbox of time series algorithms for building, tuning and validating time series models. Now, let's break it down:

What's a time series?

A time series is a data set that tracks a variable over a period of time. The variable can be anything that changes over time: global temperature, the number of airline passengers, heart rate. The length of the period and the spacing between the points in time can also vary. A data set can represent monthly temperature fluctuations over the last hundred years, or a heart rate over 15 minutes, measured at 0.5-second intervals.

A time series: monthly totals of international airline passengers (in thousands), 1949 to 1960.
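At its core, a time series is a list of values paired with points in time. The following plain-Python sketch uses only the standard library. It builds the time axes for the two examples above: a heart rate sampled every 0.5 seconds for 15 minutes, and monthly observations over a hundred years. The start time, the 1920-2019 range and the three readings are made-up placeholders, and `regular_timestamps` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def regular_timestamps(start, interval, duration):
    """Evenly spaced time points: one every `interval`, covering `duration`."""
    n_points = int(duration / interval)  # timedelta / timedelta gives a float
    return [start + i * interval for i in range(n_points)]


# Heart rate over 15 minutes, measured every 0.5 seconds (start time is made up).
heart_rate_times = regular_timestamps(
    datetime(2020, 6, 1, 9, 0), timedelta(seconds=0.5), timedelta(minutes=15)
)

# Monthly temperatures over 100 years: one (year, month) label per observation.
monthly_labels = [(year, month) for year in range(1920, 2020) for month in range(1, 13)]

# A time series pairs each time point with a value; these readings are placeholders.
series = list(zip(heart_rate_times[:3], [72.0, 73.5, 73.0]))
```

The two examples differ only in spacing and length. Fifteen minutes at 0.5-second intervals gives 1,800 points, and a hundred years of monthly data gives 1,200.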

Machine learning + time series data

Using machine learning tools, we can analyze a time series to uncover the logic and patterns behind its values; in other words, we build a model. We can then use that model to solve a machine learning problem.

For instance, say you're trying to predict tomorrow's temperature; in other words, you want to forecast future values from the data available. You use an algorithm to find the best predictive model for the data you have. Here's a simple model: the algorithm tries different slopes and shifts of a straight line and checks how close each line is to the existing values. Once you're satisfied with the fit, you extend the line into the future (see figure below), and there are your predictions! This simple model, called linear regression, is just one of many ways to make predictions.

Blue dots: available values we use to build the model. We predict future values (green dots) by continuing the line.
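The line-fitting idea can be sketched in a few lines of plain Python. Rather than trying slopes and shifts one by one, this sketch uses the closed-form least-squares solution. That solution directly finds the line with the smallest total squared distance to the points. Time is simply the observation index, the data lie exactly on y = 2t + 1 for clarity, and the function names are hypothetical.

```python
def fit_line(ts, ys):
    """Least-squares slope and intercept for y = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept


def squared_error(slope, intercept, ts, ys):
    """How close a line is to the points: sum of squared vertical distances."""
    return sum((y - (slope * t + intercept)) ** 2 for t, y in zip(ts, ys))


def forecast(slope, intercept, future_ts):
    """Continue the fitted line into the future."""
    return [slope * t + intercept for t in future_ts]


ts = [0, 1, 2, 3, 4]             # time steps of the known values (blue dots)
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # made-up observations on y = 2t + 1
slope, intercept = fit_line(ts, ys)
predictions = forecast(slope, intercept, [5, 6])  # green dots: [11.0, 13.0]
```

`squared_error` is the "how close is the line" check. A brute-force search over candidate slopes and shifts would minimize the same quantity, but least squares gets there in one step.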

Implementing a model from scratch isn't easy. That's where toolboxes like Sktime come in! The library provides a collection of models that you can train on the data at hand and then validate, without having to understand their implementation details.
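The train-then-validate workflow can be sketched with a deliberately trivial model. Everything here is hypothetical: the `LastValueForecaster` class, `temporal_split`, `mean_absolute_error` and the data. None of it is Sktime code, although Sktime's forecasters follow a similar `fit`/`predict` pattern. The key detail is that the split keeps time order, so the model is trained on older values and validated on the most recent ones.

```python
class LastValueForecaster:
    """Hypothetical toy model: predicts that every future value equals the last one seen."""

    def fit(self, y):
        self.last_ = y[-1]  # "training" just memorizes the last observation
        return self

    def predict(self, horizon):
        return [self.last_] * horizon


def temporal_split(y, test_size):
    """Split without shuffling: the test set is the most recent `test_size` values."""
    return y[:-test_size], y[-test_size:]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


y = [10, 12, 13, 15, 16, 18]  # made-up series
y_train, y_test = temporal_split(y, test_size=2)
model = LastValueForecaster().fit(y_train)
y_pred = model.predict(horizon=len(y_test))
error = mean_absolute_error(y_test, y_pred)  # how well the model did on unseen values
```

Shuffling, which is common for other kinds of data, would let the model peek at the future. That is why a time-ordered split is the usual choice for forecasting.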

Building these ML tools is what the development team works on. There are many Python libraries, and each focuses on a certain problem scope. NumPy, for instance, provides high-level mathematical functions for working with large arrays and matrices. In Sktime's case, we develop tools for machine learning on one specific kind of data: time series.

Contributing to the library

Photo by Michael Marais on Unsplash

As part of the team, I work mainly on the code, either implementing new algorithms or refactoring existing code. New tools usually come from research. Say a new forecasting model has been published: I read the paper and write the code that implements the algorithm. It can be anything from a data transformation recipe to hyperparameter optimization. Often an implementation already exists in another language, such as R, and then I just "translate" it to Python.
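As an example of "translating" a data transformation, here is a hand-written Python version of R's base `diff()` function, which computes lagged differences, together with its inverse. Differencing is a common way to remove a trend before modeling. This is an illustrative sketch, not Sktime's implementation, and the names `diff` and `undiff` are hypothetical.

```python
def diff(x, lag=1):
    """Lagged differences, like R's diff(x, lag = lag): x[i] - x[i - lag]."""
    return [x[i] - x[i - lag] for i in range(lag, len(x))]


def undiff(d, initial):
    """Invert diff(): rebuild the series from its first `lag` values (`initial`)."""
    out = list(initial)
    for i, step in enumerate(d):
        # d[i] == x[i + lag] - x[i], so x[i + lag] == x[i] + d[i]
        out.append(out[i] + step)
    return out


x = [3, 5, 4, 8, 10]                  # made-up series
first_diff = diff(x)                  # [2, -1, 4, 2]
restored = undiff(first_diff, x[:1])  # back to the original series
```

Keeping the first `lag` values is what makes the transformation reversible. Forecasts made on the differenced series can then be turned back into the original scale.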

My other task is refactoring: restructuring existing code to make it modular and compatible with new components. For instance, supporting a new input data format might require adding a function that handles it. Another reason to refactor is to keep the library consistent and easy to use. Ideally, users shouldn't have to care about the internal structure of different models, because they all support the same functionality.
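The idea of a uniform interface can be sketched with an abstract base class. Every model implements the same `fit`/`predict` methods, and input conversion lives in one shared place. Supporting a new data format then means changing one helper rather than every model. This is a simplified, hypothetical illustration, not Sktime's actual class hierarchy. `BaseForecaster`, `MeanForecaster`, `DriftForecaster` and `forecast_with` are names chosen for the example.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Hypothetical common interface: every model exposes fit() and predict()."""

    def fit(self, y):
        self._fit(self._to_list(y))
        return self

    def predict(self, horizon):
        return self._predict(horizon)

    @staticmethod
    def _to_list(y):
        # Supporting a new input format means extending this one converter,
        # not every model. Here: any sequence, or a dict mapping time -> value.
        if isinstance(y, dict):
            return [y[t] for t in sorted(y)]
        return list(y)

    @abstractmethod
    def _fit(self, y): ...

    @abstractmethod
    def _predict(self, horizon): ...


class MeanForecaster(BaseForecaster):
    """Predicts the historical mean for every future step."""

    def _fit(self, y):
        self.mean_ = sum(y) / len(y)

    def _predict(self, horizon):
        return [self.mean_] * horizon


class DriftForecaster(BaseForecaster):
    """Extends the line through the first and last observations."""

    def _fit(self, y):
        self.last_ = y[-1]
        self.slope_ = (y[-1] - y[0]) / (len(y) - 1)

    def _predict(self, horizon):
        return [self.last_ + self.slope_ * h for h in range(1, horizon + 1)]


def forecast_with(model, y, horizon):
    # Works with any BaseForecaster: the caller never looks inside the model.
    return model.fit(y).predict(horizon)
```

Swapping `MeanForecaster()` for `DriftForecaster()` in `forecast_with` changes the predictions but not a single line of the calling code, and both accept a list or a time-keyed dict.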

To me this work is very fulfilling, because it suits my abilities well and is at the same time quite challenging. I'm very glad to be a member of a team that develops a library. First, this software is very versatile, since it can be applied to many problems. Second, it's open source, so virtually anyone can use it, and I like the idea of having a big impact. Third, I love how friendly and enthusiastic the community is. It doesn't feel like a group of random people each working on their own stuff; it feels like a real team. We communicate, give each other feedback and support, and discuss our progress and blockers. All this is fascinating and exciting, and I'm thankful for this opportunity. I've learned, and am still learning, so much!

Cheers