Face Identification with VGGFace and OpenCV

Alexey Novakov published on

10 min, 1853 words

Categories: scala

Face detection and recognition is one of the areas where Deep Learning is incredibly useful. There are many studies and datasets related to human faces and their detection/recognition. In this article, we will implement a machine learning pipeline for face detection and recognition using a few libraries and a CNN model.


One part of the pipeline will be implemented with the very popular C++ library OpenCV, which has been around for a long time. It has many modules for image processing, object classification, neural networks and more. We are going to use its Java wrapper, JavaCV.

OpenCV comes with a human face detector module called the "Haar Cascade" classifier. This class takes an image and returns a Rect object for each detected face in it. The Rect object is a data structure that holds the X and Y coordinates of the top-left corner plus the width and height of the region of interest. In our case, the rectangular area is a face. For example:

Original photo:

Cropped with Haar Cascade:

Once we get the area of the detected face(s), we can compare its pixel data with face features of known faces extracted in advance. The comparison algorithm calculates the Euclidean distance between the detected face vector and the vectors of the known faces. We then take the label of the known face with the smallest distance as the result of face identification. Our workflow will look like this:

Photo Preparation

In order to extract person features, we will prepare a separate folder with images for each person:

Input directory:

├── guy_ritchie
└── tom_arraya

Output folder:

├── guy_ritchie
└── tom_arraya

Plus, I am adding my own photos to the dataset-people directory:

├── alexey
├── guy_ritchie
└── tom_arraya

The program below reads photos from the local folder and crops faces, saving them to a separate folder:


import org.bytedeco.opencv.opencv_core.{Rect, Mat, Size}
import org.bytedeco.opencv.global.opencv_imgproc.resize
import org.bytedeco.opencv.global.opencv_imgcodecs.{imread, imwrite}

import java.nio.file.{Files, Paths}

def createIfNotExists(path: String) =
  if !Files.exists(Paths.get(path)) then
    Files.createDirectories(Paths.get(path))

def crop() =
  val datasetDir = "dataset-people"

  val dirs = Paths.get("raw_photos").toFile.listFiles.filter(f => !f.getName.startsWith("."))

  for dir <- dirs do
    val label = dir.getName
    println(s"Extracting faces for '$label' label")

    createIfNotExists(Paths.get(datasetDir, label).toString)
    val images = dir.listFiles.filter(_.toString.endsWith(".jpg"))

    for file <- images do
      println(s"Reading file: $file")
      val image = imread(file.toString)
      val faces = detectFaces(image)  

      for (face, i) <- faces.get.zipWithIndex do        
        val crop = Rect(face.x, face.y, face.width, face.height)
        val cropped = Mat(image, crop)
        resize(cropped, cropped, Size(ImageHeight, ImageWidth))  
        val filename = Paths.get(datasetDir, label, s"$i-${file.getName}").toString
        println(s"Writing $filename")
        imwrite(filename, cropped)

The same code on GitHub is here.

Get ONNX model

Now we can extract face features for all three persons. We are going to use a CNN model that was trained on the VGGFace dataset. The easiest way to access a model trained with Keras (the Python high-level API to TensorFlow) from Scala is to export it to the ONNX format. Let's proceed:

  1. Create a SavedModel file:
  • Instantiate the Python VGGFace class from the keras-vggface library with include_top=False to skip the last layer of the CNN.
  • Save the instantiated model in the TensorFlow SavedModel format.
from keras_vggface.vggface import VGGFace

# image dimensions expected by VGG16 (values assumed; not shown in the original listing)
IMAGE_HEIGHT, IMAGE_WIDTH, COLOR_CHANNELS = 224, 224, 3

output_path = 'vggface_model'
model = VGGFace(model='vgg16',
                input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, COLOR_CHANNELS),
                include_top=False,
                pooling='avg')  # average pooling yields a flat 512-element feature vector
model.save(output_path)

Important: we save the VGGFace model without its last layer. We do not need the predictions the VGGFace model was originally supposed to output, since we are going to use the extracted features to recognise persons other than those in the original VGGFace dataset.

  2. Convert the saved model to ONNX format via the tensorflow-onnx library:
python -m tf2onnx.convert \
  --saved-model vggface_model \
  --output data/model.onnx \
  --tag serve

After converting the SavedModel to ONNX, we get the file data/model.onnx of approx. 56 MB in size.

Extracting Features

Now we can use the ONNX model from Scala code. In this step, we use the VGGFace model to extract features of all three persons' faces and save them into a file as a HashMap for the next step.

First, we implement common functions, which we will reuse later in the real-time face identification algorithm:

// common.scala
import io.kjaer.compiletime.*

import org.bytedeco.javacpp.indexer.{UByteIndexer, FloatRawIndexer}
import org.bytedeco.opencv.global.opencv_core.*
import org.bytedeco.opencv.opencv_core.{Mat, Scalar, RectVector, UMat}
import org.bytedeco.opencv.global.opencv_imgproc.*
import org.bytedeco.opencv.opencv_objdetect.CascadeClassifier

import org.emergentorder.compiletime.*
import org.emergentorder.onnx.Tensors.*
import org.emergentorder.onnx.backends.*

import java.nio.file.{Files, Paths, Path}
import io.bullet.borer.Cbor
import java.io.{ByteArrayOutputStream, File}

val OutputSize: Dimension = 512
val FeatureFilePath = "data/precomputed_features.cbor"

type Features = Map[String, Array[Float]]

def saveFeatures(features: Features) =
 val file = File(FeatureFilePath)
 Files.write(file.toPath, Cbor.encode(features).toByteArray)

def getModel(path: Path = Paths.get("data", "model.onnx")) =
 val bytes = Files.readAllBytes(path)
 ORTModelBackend(bytes)

def predict(
 images: Array[Float],
 model: ORTModelBackend,
 batch: Dimension = 1,
 outputSize: Dimension = OutputSize
) =
 // denotations are labels attached to the tensor by the type-safe ONNX-Scala API
 val tensorDenotation: String & Singleton = "Image"
 val tensorShapeDenotation = "Batch" ##: "Height" ##: "Width" ##: "Channel" ##: TSNil
 val input = Tensor(images, tensorDenotation, tensorShapeDenotation, shape(batch))
 model.fullModel[
   Float,
   "Image",
   "Batch" ##: "Features" ##: TSNil,
   batch.type #: outputSize.type #: SNil](Tuple(input))

def scale(img: Mat): Mat =
 val out = Mat()
 img.assignTo(out, CV_32FC4)
 subtract(out, Scalar(93.5940f, 104.7624f, 129.1863f, 0f)).asMat

def toArray(mat: Mat): Array[Float] =
 val w = mat.cols
 val h = mat.rows
 val c = mat.channels
 val rowStride = w * c

 val result = new Array[Float](rowStride * h)
 val indexer = mat.createIndexer[FloatRawIndexer]()
 var off = 0
 var y = 0
 while y < h do
   indexer.get(y, result, off, rowStride)
   off += rowStride
   y += 1
 indexer.release()
 result

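The code in this article also references a few helpers and constants that did not make it into the listing above: the shape function used by predict, the detectFaces function used during cropping and in the demo, the loadFeatures counterpart of saveFeatures, and the ImageHeight/ImageWidth constants. A sketch of how they could look, reconstructed from their usage (the cascade file path and the constant values are assumptions):

```scala
// reconstructed helpers; names match their usage in the rest of the article
val ImageHeight: Dimension = 224
val ImageWidth: Dimension = 224
val Channels: Dimension = 3

// input shape of the VGG16-based model: batch x height x width x channels
def shape(batch: Dimension) =
  batch #: ImageHeight #: ImageWidth #: Channels #: SNil

// Haar Cascade face detector; the XML file ships with OpenCV releases
lazy val faceCascade = CascadeClassifier("data/haarcascade_frontalface_default.xml")

def detectFaces(image: Mat): RectVector =
  val faces = RectVector()
  faceCascade.detectMultiScale(image, faces)
  faces

// counterpart of saveFeatures: decode the CBOR file back into a Map
def loadFeatures: Features =
  val bytes = Files.readAllBytes(Paths.get(FeatureFilePath))
  Cbor.decode(bytes).to[Features].value
```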

At this point, we can run the extraction step:

import io.kjaer.compiletime.*
import org.emergentorder.compiletime.*
import org.emergentorder.onnx.Tensors.*
import org.emergentorder.onnx.backends.*
import org.bytedeco.opencv.global.opencv_imgcodecs.*
import org.bytedeco.opencv.opencv_core.{Mat, Scalar}

import java.nio.file.{Files, Paths}
import java.io.File
import javax.imageio.ImageIO
import scala.collection.parallel.CollectionConverters.*

def label2Features(dirs: Array[File]) = 
  lazy val model = getModel()
  val batchSize = 16

  dirs.par.map { dir =>
    val label = dir.getName
    println(s"Extracting features for '$label' at $dir folder")

    val groups = dir.listFiles.grouped(batchSize)
    val features = groups.map { files =>
      val images = files.map(f => toArray(scale(imread(f.toString)))).flatten
      val currentBatch = files.length.asInstanceOf[Dimension]
      val out = predict(images, model, currentBatch)
      // split the batch output into one feature vector per image
      out.data.grouped(OutputSize).toList
    }
    label -> features.toList.flatten
  }

def extract =
  val dirs = Paths.get("dataset-people").toFile.listFiles

  val avgFeatures = label2Features(dirs).map {
    (label, features) =>
      val count = features.length
      val sum = features.reduce((a, b) => a.zip(b).map(_ + _))
      label -> sum.map(_ / count)
  }
  saveFeatures(avgFeatures.seq.toMap)


In the extract function, we do element-wise addition of all extracted vectors for each person. Then the summed vector is divided by the number of images for that person to get the average values of the extracted features. As the result, we get a value of the following type alias:

type Features = Map[String, Array[Float]]
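The per-person averaging done inline in extract can be illustrated with a small, self-contained example (average is a hypothetical helper mirroring what extract does):

```scala
// element-wise average of several equally-sized feature vectors
def average(vectors: List[Array[Float]]): Array[Float] =
  val sum = vectors.reduce((a, b) => a.zip(b).map(_ + _))
  sum.map(_ / vectors.length)

// e.g. average(List(Array(1f, 2f), Array(3f, 4f))) yields Array(2.0f, 3.0f)
```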

Identify Faces

Finally, we can use the extracted features to identify a person's face. Keep in mind that our prediction algorithm knows only three persons' faces. Any other person may be confused with one of the three known faces, or they may stay unknown. If we want more people to be identifiable, we need their features as well, so we go back to step one of our pipeline to collect those people's photos and extract their features.

// main.scala
import org.bytedeco.opencv.opencv_core.{Mat, Size, Rect, Point, Scalar}
import org.bytedeco.opencv.global.opencv_imgproc.*
import org.bytedeco.opencv.opencv_videoio.VideoCapture
import org.bytedeco.javacv.{CanvasFrame, OpenCVFrameConverter}

import javax.swing.WindowConstants

def createCavasFrame = 
 val frame = CanvasFrame("Detected Faces")
 frame.setCanvasSize(1280, 720)
 frame.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE)
 frame

def calcLabel(face: Array[Float], features: Features, threshold: Int = 100) = 
 features.foldLeft("?", Float.MaxValue){ 
   case ((label, min), (l, f)) => 
     val d = distance(face, f)
     if d < threshold && d < min then (l, d)
     else (label, min)
 }._1

def drawLabel(label: String, frame: Mat, topLeft: Point) =
 val font = FONT_HERSHEY_PLAIN // assumption: defined at module level in the full project
 val x = math.max(topLeft.x - 10, 0)
 val y = math.max(topLeft.y - 10, 0)
 val thickness = 2
 val fontScale = 1.0
 val baseline = new Array[Int](2)
 val size = getTextSize(label, font, fontScale, thickness, baseline)
 val rectColor = Scalar(255, 0, 0, 0)
 // filled background rectangle behind the label text
 rectangle(
   frame,
   Point(x, y - size.height() - thickness),
   Point(x + size.width() - thickness, y + 10),
   rectColor,
   CV_FILLED,
   LINE_8,
   0
 )
 val fontColor = Scalar(0, 255, 0, 0)
 putText(frame, label, Point(x, y), font, fontScale, fontColor, thickness, CV_FILLED, false)

def toModelInput(crop: Rect, frame: Mat) =
 val cropped = Mat(frame, crop)
 resize(cropped, cropped, Size(ImageHeight, ImageWidth))
 toArray(scale(cropped))

def drawRectangle(face: Rect, frame: Mat) =
 rectangle(
   frame,
   Point(face.x, face.y),
   Point(face.x + face.width, face.y + face.height),
   Scalar(0, 255, 0, 1)
 )
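The calcLabel function above relies on a distance helper that is not shown in the listing. A minimal Euclidean distance implementation, matching the comparison described earlier, could look like this (the original in the full project may differ):

```scala
// Euclidean distance between a detected face vector and a known face vector
def distance(a: Array[Float], b: Array[Float]): Float =
  var sum = 0.0
  var i = 0
  while i < a.length do
    val d = a(i) - b(i)
    sum += d * d
    i += 1
  math.sqrt(sum).toFloat
```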

In the main function, we run an infinite loop that captures video frames as images. Each captured image is then used to detect and identify faces.

def demo() =
 val capture = VideoCapture(0)
 val canvasFrame = createCavasFrame  
 val frame = Mat()
 val converter = OpenCVFrameConverter.ToMat()
 val model = getModel()
 val features = loadFeatures

 while capture.read(frame) do
   val faces = detectFaces(frame)
   for face <- faces.get do
     drawRectangle(face, frame)
     val crop = Rect(face.x, face.y, face.width, face.height)
     val image = toModelInput(crop, frame)
     val faceFeatures = predict(image, model).data
     val label = calcLabel(faceFeatures, features)
     drawLabel(label, frame, crop.tl)
   canvasFrame.showImage(converter.convert(frame))


Demo time

If I put more faces into the frame, the identification algorithm may confuse some of them and identify someone with a beard as Tom Araya, or someone with bright white skin as Guy Ritchie. In order to overcome this issue, you need to add more different faces. We would also need to tune the threshold parameter, which is used to discard faces that are far away from those we are interested in. The level of certainty is relative, of course: there may still be many people in the world with face features very similar to Tom's, Guy's or mine.


We have made a powerful application with very little code to identify persons' faces. Several libraries were used to get face identification working: OpenCV for face detection, ONNX-Scala to access the ONNX model from Scala, and the Borer library to save and load face features as a Scala object (a HashMap) between memory and disk.

The current approach of identifying faces by calculating the Euclidean distance between input faces and pre-calculated ones is not the only one. We could also train a custom CNN or a VGGFace-based model with a new layer to predict labels for Tom, Guy and myself. However, such an approach is compute-intensive and actually gave me quite bad results. If you know something crucial to make that approach work well, please let me know.

Other approaches to the face identification task that you may want to get familiar with are the Triplet Loss function and Siamese Networks. Perhaps I will try one of them next time.


You can find the complete project code on GitHub: