
Feature extraction

All articles dealing with handwriting recognition discuss and describe, in more or less detail, the same steps: preprocessing, segmentation, feature extraction, classification and post-processing.

The purpose of preprocessing is to produce data adapted to the handwriting recognition system so that it operates as precisely as possible. The operations involved are binarization, noise reduction, stroke width normalization, skew correction, and slant removal or measurement.

Segmentation is the decomposition of the graphic elements of the scanned document into lines, then words and finally, if necessary, characters or symbols.

Georgios Vamvakas [1], of the National Center for Scientific Research “Demokritos” in Athens, tells us that “in the feature extraction stage each character, symbol, is represented as a feature vector, which becomes its identity”.

He adds: “the major goal of feature extraction is to extract a set of features, which maximizes the recognition rate with the least amount of elements. Due to the nature of handwriting, with its high degree of variability and imprecision, obtaining these features is a difficult task.”

What are these features, and how do we build the so-called feature vector to use it with Gaussian Mixture Models and Hidden Markov Models?

Tin Merzouga, Algerian Sahara near the borders of Libya and Niger, Photo by George Steinmetz, some rights reserved.

Reading after reading, we try to understand the methods and techniques used in handwriting recognition systems, and to implement them. The A2iA team [2] describes in their article a “recognizer based on a modeling of each letter with a hidden Markov model (HMM) using Gaussian distribution mixtures for the observation probabilities.”

Our goal is to implement such models to complement our previous approaches based on neural networks and CTC.
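
As a first, purely illustrative sketch, here is how a sequence of feature vectors could be fed to a letter model of this kind with the GMMHMM class of the hmmlearn Python library; the numbers of states, mixtures and features below are arbitrary assumptions, not values taken from the A2iA article.

    import numpy as np
    from hmmlearn.hmm import GMMHMM

    # Arbitrary sizes, chosen for the example only.
    n_states = 5      # HMM states of the letter model
    n_mixtures = 3    # Gaussian components per state
    n_features = 20   # dimension of each feature vector

    # A fake training sequence: 100 feature vectors of dimension 20.
    X = np.random.rand(100, n_features)
    lengths = [100]   # one sequence of 100 frames

    model = GMMHMM(n_components=n_states, n_mix=n_mixtures,
                   covariance_type="diag", n_iter=10)
    model.fit(X, lengths)

    # Log-likelihood of a new sequence under this letter model.
    print(model.score(np.random.rand(30, n_features)))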

The features

In his presentation, G. Vamvakas talks about three types of features and their representations:

1. Statistical features

Statistical features represent a symbol image by the statistical distribution of its points, absorbing style variations to some extent. The major features mentioned are zoning, projections and profiles, and crossings and distances.

1.1 Zoning

The symbol image is divided into N×M zones from which features are extracted to build the feature vector. Zoning provides local characteristics instead of global ones.

There are many ways to cut out an image to build a feature vector, as shown below. Here, the image size is not normalized to the size of the font face, as it should be to optimize the feature vector.

Figure 1 – Feature extraction: square zones

Figure 2 – Feature extraction: vertical strips

Figure 3 – Feature extraction: horizontal strips

The values of the feature vector can be the sum of the pixel values in the zone, the sum of normalized pixel values, the number of foreground pixels, or the normalized number of foreground pixels.
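
As an illustration, here is a minimal sketch of zoning in Python with NumPy, assuming a binarized image whose foreground pixels have the value 1; each zone contributes its normalized number of foreground pixels.

    import numpy as np

    def zoning_features(image, n_rows=4, n_cols=4):
        """Split a binary image into n_rows x n_cols zones and return,
        for each zone, its normalized number of foreground pixels."""
        h, w = image.shape
        features = []
        for i in range(n_rows):
            for j in range(n_cols):
                zone = image[i * h // n_rows:(i + 1) * h // n_rows,
                             j * w // n_cols:(j + 1) * w // n_cols]
                # Normalized count of foreground (value 1) pixels.
                features.append(zone.sum() / zone.size)
        return np.array(features)

    # 4 x 4 zones give a feature vector of length 16.
    vector = zoning_features(np.random.randint(0, 2, (32, 32)))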

1.2 Projections and profiles

Images are 2-D signals that can be represented as 1-D signals. These features, although relatively insensitive to noise and deformation, depend on rotation. Projection histograms count the number of pixels in each column and row of a symbol image, or sum the value of each pixel in each column and row. The pixel values can be normalized.

Projection histograms can help to separate close symbols like “m” and “n” and other ambiguous pairs of characters.

Figure 4 – Projection histogram
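
Under the same binary-image assumption, the projection histograms reduce to two sums along the axes; a sketch:

    import numpy as np

    def projection_histograms(image):
        """Count the foreground pixels of a binary image in each
        column and in each row (foreground = 1)."""
        vertical = image.sum(axis=0)    # one count per column
        horizontal = image.sum(axis=1)  # one count per row
        return vertical, horizontal

Concatenating the two histograms gives a feature vector of length width + height; dividing each count by the image height or width normalizes the values.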

A profile counts the number of pixels between the bounding box of the character image and the edge of the character. Profiles describe the external shape of characters and make it possible to distinguish between a great number of letters and symbols, such as “p” and “q”.

Figure 5 – Projection histogram of the symbol E

Profiles can also be used to follow the contour of the character image, to locate the uppermost and lowermost points of the contour, and to calculate the in and out profiles of the contour.

We have written a function called getExternalProfiles to extract these features. Its arguments and the object it returns are described below.

The arguments start and leap are used to refine the length of the returned arrays: in our opinion, not every row and column measurement is necessary.

The function first makes a copy of the image, then gets its sizes.

Next, it looks for a pixel value in the image to detect the transitions, depending on the background tone.

The image is then normalized so that pixel values lie between 0 and 1.

Finally, a first loop builds the upper and lower profiles, and a second loop builds the left and right profiles.
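
The original listing is not reproduced here; the following Python sketch, using NumPy, follows the same steps (the default values of start and leap are our own assumption):

    import numpy as np

    def getExternalProfiles(image, start=0, leap=1):
        """Upper, lower, left and right external profiles of a character
        image: for each sampled column (or row), the distance from the
        bounding box to the first foreground pixel."""
        # Make a copy of the image and get its sizes.
        img = np.array(image, dtype=float)
        height, width = img.shape

        # Normalize the image to have pixel values between 0 and 1.
        if img.max() > 1:
            img = img / img.max()

        # Detect the transitions depending on the background tone:
        # if the corner pixel is light, the foreground is dark.
        if img[0, 0] > 0.5:
            img = 1.0 - img
        foreground = img > 0.5

        # First loop: build the upper and lower profiles.
        upper, lower = [], []
        for col in range(start, width, leap):
            rows = np.flatnonzero(foreground[:, col])
            upper.append(rows[0] if rows.size else height)
            lower.append(height - 1 - rows[-1] if rows.size else height)

        # Second loop: build the left and right profiles.
        left, right = [], []
        for row in range(start, height, leap):
            cols = np.flatnonzero(foreground[row, :])
            left.append(cols[0] if cols.size else width)
            right.append(width - 1 - cols[-1] if cols.size else width)

        return upper, lower, left, right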

These functions are simple to write, but they iterate over the image many times. Repeating the traversal to extract each piece of information therefore leads to a lack of optimization. Moreover, we have not reduced the size of the area to be analyzed.

1.3 Crossings and distances

Crossings count the number of transitions from background to foreground pixels along vertical and horizontal lines through the symbol image. Distances measure, along vertical lines, the distance from the upper and lower boundaries of the image to the first pixel of the symbol, and, along horizontal lines, the same distance from the left and right boundaries.

A mixture of crossings and distances could also be used, giving first the distance from the edge to the first pixel of the symbol, then the distance of each following transition: for a height of 20, for instance, 3, 6, 10, 12, 15, 18, and for a width of 16: 4, 7, 12, 15. In this case, the resulting feature vectors are more complex.

Figure 6 – Crossing and distance features

As we did to extract the profiles, we wrote a function to extract the distances and crossings.
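
That function is not reproduced either; a possible sketch, still on a binary image, handles the vertical lines, and the horizontal lines are obtained by transposing the image.

    import numpy as np

    def crossings_and_distances(image):
        """For each column of a binary image: number of background-to-
        foreground transitions, and distances from the top and bottom
        edges to the first foreground pixel."""
        height, width = image.shape
        crossings, top_dist, bottom_dist = [], [], []
        for col in range(width):
            column = image[:, col].astype(int)
            # Count the 0 -> 1 transitions along the column.
            crossings.append(int(np.sum(np.diff(column) == 1)))
            rows = np.flatnonzero(column)
            top_dist.append(rows[0] if rows.size else height)
            bottom_dist.append(height - 1 - rows[-1] if rows.size else height)
        return crossings, top_dist, bottom_dist

    img = np.random.randint(0, 2, (20, 16))
    v_crossings, top_dist, bottom_dist = crossings_and_distances(img)
    h_crossings, left_dist, right_dist = crossings_and_distances(img.T)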

2. Structural features

Structural features are based on topological and geometrical properties of the symbol, such as aspect ratio, cross points, loops, branch points, strokes and their directions, inflections between two points, horizontal curves at top or bottom, and so on.

We have not studied this type of feature yet.

3. Global transformations and moments

These features are provided by the computation of the Fourier transform of the contour of the image: the first n coefficients of the transform can be used to reconstruct the contour, and these n coefficients are considered an n-dimensional feature vector that represents the symbol. Moreover, the calculation of central and Zernike moments makes it possible to completely reconstruct the original image.

After reading several articles and theses about Fourier transforms and about Hu and Zernike moments, we are inclined to think that their use for handwriting recognition can lead to interesting results.
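
As an illustration of the first idea, here is a minimal sketch of Fourier descriptors computed with NumPy from a contour assumed to have been extracted beforehand (for instance with OpenCV's findContours); the normalization scheme is one common choice among several.

    import numpy as np

    def fourier_descriptors(contour, n_coeffs=16):
        """Represent a closed contour, given as an array of (x, y)
        points, by the first n_coeffs Fourier coefficients."""
        # Encode each contour point as a complex number x + iy.
        signal = contour[:, 0] + 1j * contour[:, 1]
        coeffs = np.fft.fft(signal)
        # Drop the first coefficient (position) and divide by the
        # magnitude of the second (scale) for translation and scale
        # invariance; keeping magnitudes adds rotation invariance.
        descriptors = coeffs[1:n_coeffs + 1] / np.abs(coeffs[1])
        return np.abs(descriptors)

For Hu moments, OpenCV provides cv2.moments and cv2.HuMoments, which compute the seven invariants directly from a binary image.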

As Richard O. Duda, Peter E. Hart and David G. Stork wrote in the introduction of their book Pattern Classification, we have to answer three questions:

How do we know which features are most promising? Are there ways to automatically learn which features are best for the classifier? How many shall we use?

We need to delve deeper into a number of topics before going further with handwriting recognition in the context of transcribing the decennial tables of the civil status registers.

References

[1] – G. Vamvakas, B. Gatos, I. Pratikakis, N. Stamatopoulos, A. Roniotis and S.J. Perantonis, “Hybrid Off-Line OCR for Isolated Handwritten Greek Characters”, The Fourth IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA 2007), ISBN: 978-0-88986-646-1, pp. 197-202, Innsbruck, Austria, February 2007.

[2] – Farès Menasri, Jérôme Louradour, Anne-Laure Bianne-Bernard and Christopher Kermorvant, The A2iA French handwriting recognition system at the Rimes-ICDAR2011 competition, [2012].

[3] – Gernot A. Fink, Markov Models for Pattern Recognition: From Theory to Applications, [2008] Springer, ISBN 978-3-540-71766-9.

[4] – John Alexander Edwards, Easily Adaptable Handwriting Recognition in Historical Manuscripts, [2007] Berkeley, No. UCB/EECS-2007-76.

[5] – Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner, Gradient-Based Learning Applied to Document Recognition, [1998] IEEE.

