ONNX Model format in Flink

Alexey Novakov published on January 10, 2025

7 min, 1352 words

The most popular ecosystem for training ML models these days is Python together with C/C++ based libraries. Examples of model training include a Logistic Regression algorithm from the scikit-learn package, or more advanced neural network algorithms offered by TensorFlow, PyTorch and others. There are lots of tools and libraries in the Python world to facilitate model training and serving.

In order to bring models trained in Keras or scikit-learn into a Flink application, we can use cross-platform file formats such as ONNX or, in the past, PMML. These formats also come with language runtimes. In the Flink case, we select the JVM SDK to run inference logic inside the Flink job.

Let's look at an example of how to train Logistic Regression in Python using Keras and then use the trained model in ONNX format inside Flink.

Categories: scala

Tags: flink machine learning scala tensorflow onnx

Machine Learning with Flink

Alexey Novakov published on November 23, 2024

14 min, 2777 words

If you want to add Machine Learning capabilities to your Flink job, then this article is for you. As Flink runs on the Java Virtual Machine, we are constrained by the tools which the JVM supports. However, there are still plenty of options to choose from in order to perform model training and inference as part of a Flink job.

Categories: scala

Tags: flink machine learning scala tensorflow pytorch

Decision Tree from scratch

Alexey Novakov published on December 21, 2021

5 min, 817 words

Cropped view of one of the regions in the middle of the tree we will build further

The Decision Tree classifier is one of the simplest algorithms to implement from scratch. One of the benefits of this algorithm is that it can be trained without spending too much effort on data preparation, and it is fast compared to more complex algorithms like Neural Networks. In this blog post we are going to implement the CART algorithm, which stands for Classification and Regression Trees. There are many other algorithms in the decision tree space, but we will not describe them in this blog post.

Data science practitioners often use decision tree algorithms to compare their performance with more advanced algorithms. Although a decision tree is fast to train, its accuracy is usually lower than that of other algorithms, such as deep feed forward networks or something more advanced, on the same dataset. However, you do not always need a high accuracy value, so using CART and other decision tree ensemble algorithms may be enough for solving a particular problem.
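To make the idea of a CART split concrete, here is a minimal sketch of the Gini impurity criterion that CART commonly uses for classification. It is only an illustration under stated assumptions: a single numeric feature, integer class labels, and the hypothetical names `GiniSketch`, `gini`, `splitImpurity` and `bestThreshold`, none of which come from the article itself.

```scala
object GiniSketch {
  // Gini impurity of a set of class labels: 1 - sum over classes of p_k^2.
  def gini(labels: Seq[Int]): Double =
    if (labels.isEmpty) 0.0
    else {
      val n = labels.size.toDouble
      val sumOfSquares = labels
        .groupBy(identity)
        .values
        .map { group =>
          val p = group.size / n
          p * p
        }
        .sum
      1.0 - sumOfSquares
    }

  // Size-weighted impurity of the two partitions produced by `feature < threshold`.
  // Each row is (feature value, class label); rows must be non-empty.
  def splitImpurity(rows: Seq[(Double, Int)], threshold: Double): Double = {
    val (left, right) = rows.partition { case (x, _) => x < threshold }
    val n = rows.size.toDouble
    (left.size / n) * gini(left.map(_._2)) +
      (right.size / n) * gini(right.map(_._2))
  }

  // Tries every distinct feature value as a threshold and keeps the purest split.
  def bestThreshold(rows: Seq[(Double, Int)]): Double =
    rows.map(_._1).distinct.minBy(t => splitImpurity(rows, t))
}
```

A full tree would apply `bestThreshold` recursively to the left and right partitions until a stopping rule is met, such as a maximum depth or a pure node.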

Categories: scala

Tags: machine learning algorithms

Face Identification with VGGFace and OpenCV

Alexey Novakov published on May 22, 2021

6 min, 1108 words

Face detection and recognition is one of the areas where Deep Learning is incredibly useful. There are many studies and datasets related to human faces and their detection/recognition. In this article we will implement a Machine Learning pipeline for face detection and recognition using a few libraries and a CNN model.

Categories: scala

Tags: deep learning machine learning computer vision cnn

Convolutional Neural Network in Scala

Alexey Novakov published on April 02, 2021

7 min, 1255 words

Last time we used an ANN to train a Deep Learning model for image recognition using the MNIST dataset. This time we are going to look at a more advanced network called Convolutional Neural Network, or CNN for short.

A CNN is designed to tackle the image recognition problem, although it can be used for other tasks as well. As we have seen last time, an ANN using just hidden layers can learn quite well on MNIST. However, for real-life use cases we need higher accuracy. The main idea of a CNN is to learn how to recognise objects in their different shapes and positions using specific features of the image data. The goal of a CNN is better model regularisation by using convolution and pooling operations.
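The two operations mentioned above can be sketched in a few lines of plain Scala. This is a simplified illustration, not the article's implementation: it assumes a single-channel image, no padding, stride 1 for the convolution and non-overlapping windows for max pooling. The names `ConvSketch`, `conv2d` and `maxPool` are hypothetical.

```scala
object ConvSketch {
  type Matrix = Array[Array[Double]]

  // "Valid" 2D convolution with stride 1 (strictly a cross-correlation,
  // which is what deep learning libraries usually call convolution).
  def conv2d(image: Matrix, kernel: Matrix): Matrix = {
    val kh = kernel.length
    val kw = kernel(0).length
    val outH = image.length - kh + 1
    val outW = image(0).length - kw + 1
    Array.tabulate(outH, outW) { (i, j) =>
      var sum = 0.0
      for (a <- 0 until kh; b <- 0 until kw)
        sum += image(i + a)(j + b) * kernel(a)(b)
      sum
    }
  }

  // Max pooling over non-overlapping size x size windows.
  // Rows and columns that do not fill a whole window are dropped.
  def maxPool(input: Matrix, size: Int): Matrix = {
    val outH = input.length / size
    val outW = input(0).length / size
    Array.tabulate(outH, outW) { (i, j) =>
      val window =
        for (a <- 0 until size; b <- 0 until size)
          yield input(i * size + a)(j * size + b)
      window.max
    }
  }
}
```

Convolution slides a small kernel over the image so the same weights are reused at every position, and pooling shrinks the resulting feature map, which is one way to see why a CNN needs far fewer parameters than a fully connected network on the same input.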

Categories: scala

Tags: deep learning machine learning convolution computer vision image recognition

MNIST image recognition using Deep Feed Forward Network

Alexey Novakov published on March 12, 2021

8 min, 1593 words

A Deep Feed Forward Neural Network is one type of Artificial Neural Network, which is also able to classify computer images. In order to feed pixel data in RGB, greyscale or another format into the neural net, one can map every pixel to a network input. That means every pixel becomes a feature. It may sound scary and highly inefficient to feed, let's say, an image 28 pixels high and 28 pixels wide, which gives 784 features to learn from. However, neural networks can learn from the pixel data successfully and classify unseen data. We are going to prove this.
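As a small illustration of the pixel-to-feature mapping, the sketch below flattens a greyscale image into one feature vector. The scaling of pixel values from 0..255 to 0..1 is an assumption added here for the example, and the names `PixelFeatures` and `toFeatures` are hypothetical.

```scala
object PixelFeatures {
  // Flattens a height x width greyscale image row by row into a single
  // feature vector, one feature per pixel.
  // Assumption: pixel values are in 0..255 and are scaled to 0..1.
  def toFeatures(image: Array[Array[Int]]): Array[Double] =
    image.flatten.map(_ / 255.0)
}
```

For a 28 by 28 image this yields a vector of 784 values, which is the size of the network's input layer.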

Please note, there are other types of networks which are more efficient in image classification, such as the Convolutional Neural Network, but we are going to talk about that next time.

Dataset

Wikipedia MnistExamples

Categories: scala

Tags: deep learning machine learning MNIST images

Linear Regression with Adam Optimizer

Alexey Novakov published on February 24, 2021

6 min, 1077 words

Adam is one more optimization algorithm used in neural networks. It is based on adaptive estimates of lower-order moments. It has more hyper-parameters to tune externally than classic Gradient Descent.

Good default settings for the tested machine learning problems are:

α = 0.001, // learning rate. We have already seen this one in classic Gradient Descent.
β₁ = 0.9,
β₂ = 0.999,
eps = 10⁻⁸.
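With these defaults, a single Adam update for one scalar parameter can be sketched as follows. This follows the standard Adam update rule with bias correction. The names `AdamSketch`, `State` and `step` are hypothetical and not taken from the article's code.

```scala
object AdamSketch {
  // Running first moment (m), second moment (v) and step counter (t).
  final case class State(m: Double, v: Double, t: Int)

  // Performs one Adam update of parameter `w` given its gradient `grad`.
  def step(
      w: Double,
      grad: Double,
      s: State,
      alpha: Double = 0.001,
      beta1: Double = 0.9,
      beta2: Double = 0.999,
      eps: Double = 1e-8
  ): (Double, State) = {
    val t = s.t + 1
    // Exponential moving averages of the gradient and the squared gradient.
    val m = beta1 * s.m + (1 - beta1) * grad
    val v = beta2 * s.v + (1 - beta2) * grad * grad
    // Bias correction compensates for m and v starting at zero.
    val mHat = m / (1 - math.pow(beta1, t))
    val vHat = v / (1 - math.pow(beta2, t))
    val updated = w - alpha * mHat / (math.sqrt(vHat) + eps)
    (updated, State(m, v, t))
  }
}
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly α regardless of the gradient's magnitude.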

Categories: scala

Tags: deep learning machine learning linear regression Adam Picta

Linear Regression with Gradient Descent

Alexey Novakov published on February 20, 2021

5 min, 914 words

In this article we are going to use the Scala mini-library for Deep Learning that we developed earlier in order to study a basic linear regression task. We will learn model weights using a perceptron model, which will be our single-unit network layer that emits the target value. This model will predict a target value yHat based on two trained parameters: weight and bias. Both are scalar numbers. Weight optimization is going to be based on the implemented Gradient Descent algorithm:

Model equation:

y = bias + weight * x
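A minimal sketch of how the two parameters of this equation could be learned with batch gradient descent is shown below. It assumes mean squared error as the loss, which the excerpt does not state, and it does not use the article's mini-library. The names `LinearGd` and `fit` are hypothetical.

```scala
object LinearGd {
  // Fits yHat = bias + weight * x by batch gradient descent on mean squared error.
  // Returns the learned (weight, bias), both starting from 0.0.
  def fit(xs: Seq[Double], ys: Seq[Double], lr: Double, epochs: Int): (Double, Double) = {
    val n = xs.size.toDouble
    (0 until epochs).foldLeft((0.0, 0.0)) { case ((weight, bias), _) =>
      // Prediction error for every sample with the current parameters.
      val errors = xs.zip(ys).map { case (x, y) => (bias + weight * x) - y }
      // Gradients of MSE with respect to weight and bias.
      val weightedErrors = errors.zip(xs).map { case (e, x) => e * x }
      val gradWeight = 2.0 / n * weightedErrors.sum
      val gradBias = 2.0 / n * errors.sum
      (weight - lr * gradWeight, bias - lr * gradBias)
    }
  }
}
```

For data generated from y = 1 + 2 * x, for example `fit(Seq(1.0, 2.0, 3.0, 4.0), Seq(3.0, 5.0, 7.0, 9.0), lr = 0.05, epochs = 5000)`, the returned weight and bias should approach 2 and 1.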

Categories: scala

Tags: deep learning machine learning linear regression

TensorFlow Scala - Linear Regression via ANN

Alexey Novakov published on February 13, 2021

8 min, 1550 words

TensorFlow Scala is a strongly-typed Scala API for the TensorFlow core C++ library, developed by Anthony Platanios. This library integrates with the native TensorFlow library via JNI, so no intermediate official or unofficial Java libraries are used.

Categories: scala

Tags: deep learning machine learning tensorflow

Artificial Neural Network in Scala - part 2

Alexey Novakov published on February 05, 2021

10 min, 1981 words

In this article we are going to implement an ANN from scratch in Scala. It is a continuation of the first article, which describes the theory of ANNs.

This implementation will consist of:

Categories: scala

Tags: deep learning machine learning gradient descent