
Orientation detection

After detecting and correcting the skew of the scanned document, a question arises. Is it in the right direction? Is it right side up or upside down?

In 2010, in their article “Combined orientation and skew detection using geometric text-line modeling”, Joost van Beusekom, Faisal Shafait and Thomas M. Breuel [7] cite various works and methods to detect skew, to detect orientation or to detect both of them in digitized documents.

Orientation detection methods:

  • In 1998, R.S. Caprari [8] used an asymmetry measure computed from projection profiles.
  • In 2005, H. Aradhye [9] took advantage of the ratio between the left and right openings of text blobs.
  • In 2006, Lu et al. [10] employed the distribution of the number and position of white-to-black transitions of the elements in the line.

Combined skew and orientation detection methods:

  • In 1995, Bloomberg et al. [11] presented a method for orientation detection that uses the ratio of the number of ascender characters to the number of descender characters used in English.
  • In 2005, Avila et al. [12] used the x-height, the baseline, and the number of ascenders and descenders to detect the orientation.
  • In 2006, Lu et al. [13] employed the ascender-to-descender ratio to detect orientation.

We have to introduce some elements of typography before talking about the method we implemented to detect the orientation of our scanned document.


Mang children, Phong Tho, Lai Chau, Vietnam, 1997

Calligraphy and Typography metrics

In calligraphy and typography, the metrics of a font have been defined for a long time. Both use the concepts of baseline, ascender, descender, capital, serif, slant, height, width and spacing.

The picture below is borrowed from the MIT website.

Text-line anatomy

Figure 1 – The anatomy of a text-line in Roman script.

In her monograph “Metrics and neatening of handwritten characters”, María Teresa Infante Velázquez [14] describes the metrics.

Baseline

Everything starts with the baseline. Infante Velázquez says:

The baseline is the imaginary line on which most of the characters are based. Generally, contiguous characters retain their alignment using the baseline as reference. Although certain characters extend below or above this line, it continues as the imaginary base. This line can be considered as the most important, since it is the guide for writing and it is used also as reference to obtain all the heights of the characters.

Typography baseline

Figure 2 – Typography baseline

Mean line and x-height

The mean line is used to define the height (also called mean height or x-height) of the lowercase letters and it is located over the baseline. It is important to mention that not all the lowercase letters lie only between these two lines, some extend above the mean line or below the baseline. The characters that fall in those circumstances continue to take the mean line as a reference, since it provides the information to obtain their height.

Ascender line and ascender height

The ascender line helps to measure the height of lowercase letters (ascender height) which go beyond the mean line, as in the case of the letters “l”, “h”, “t”, to name a few. It is not to be confused with the line that is used to calculate the height of capital letters. This line is generally located above all the other lines used in character metrics. The ascender height is the space between the baseline and the ascender line.

Descender line and descender height

The descender line is located underneath the baseline. It is useful for characters such as the letters “j”, “g” or “y”, among others, as it helps to mark the distance that these types of characters descend below the baseline.
The descender height refers to the distance between the baseline and the descender line, and not to the whole character as in the previously mentioned heights.

The picture below shows all the lines for metrics.

lines for metrics

Figure 3 – Typography lines for metrics.

The next one shows the heights.

character metrics

Figure 4 – Typography character metrics.

We will discuss other metrics in future articles about handwriting tilt detection and word and letter segmentation.

The orientation can be detected by “measuring” the shape of the signal of each line of a digitized document. Therefore, we have to locate the baselines as accurately as possible.

Finding the baseline

We will use the method presented by R.S. Caprari [8] in 1998, or an approximation of it, to locate the baselines in the page. That is, we will use a horizontal projection profile. Because the writing in the registers is rather regular, we can also calculate the average leading and use it to correct the first measurement.

The projection profile is defined as follows by Saurav Jha:

The Projection profile of an image in a particular direction refers to the running sum of the pixels in that direction.

In context of text processing, horizontal projection profile is needed to identify or separate out the lines of a text. since, the profile exhibits valley points at line boundaries and the location of these minima points mark the line boundaries. For binary ( black and white) images, these are the points where the profile goes to zero.
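As a minimal sketch of this definition, the horizontal projection profile of a grayscale page stored as a NumPy array is simply the sum of each row. The synthetic page and the function name horizontal_projection_profile are illustrative assumptions, not part of the original script. Note the convention used here: dark text is stored as low values, so the rows that cross a text line give the lowest sums, whereas the quoted definition counts ink pixels and therefore finds its minima between the lines.

```python
import numpy as np


def horizontal_projection_profile(gray):
    """Sum the pixel values of every row of a 2-D grayscale image."""
    return gray.sum(axis=1)


# Synthetic 12x8 "page": white background (255) with two black text bands (0).
page = np.full((12, 8), 255, dtype=np.int64)
page[2:4, :] = 0   # first text line
page[7:9, :] = 0   # second text line

profile = horizontal_projection_profile(page)
# Rows that cross a text band sum to 0; background rows sum to 8 * 255.
text_rows = np.flatnonzero(profile == 0)
```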

The image below illustrates the horizontal projection profile of a page after conversion to grayscale and thresholding. Because dark pixels have low values, the hollow points of the profile mark the darkest rows, that is, the text lines.

Horizontal projection profile

Figure 5 – Horizontal projection profile

Using the cspline2d B-spline function from scipy.signal gives a smoother representation of the projection profile. The plots below show the histogram, the cspline2d output and its derivatives.

The red dots on the cspline2d plot of the image mark its darkest rows, and the blue dots the lightest ones.

The red dots on the first derivative of the B-spline function mark the boundaries between dark and light areas, which approximate the baselines of the text. We will use these hollow points to find our baselines.

HPP cspline2d

Figure 6 – Horizontal projection profile, cspline2d and derivatives

Finding the mean line or x-height line

To find the mean line, and therefore the x-height, we also use the horizontal projection profile of a line. The red dot

Horizontal projection profile

Figure 7 – Horizontal projection profile

As you can see in the following image, the height of lowercase letters is not constant. An approximation of this height can be determined for the line. For more precision, the line has to be segmented into words.

A rough approximation of the metrics of the line is shown below.

line metrics

Figure 8 – line metrics

A Python script to find the orientation

The method is implemented in the script page_find_orientation.py, described below.

It is used as follows: python page_find_orientation.py --image image --output folder

In the original script, lines 1-21 are comments.

We put some functions, such as convert_to_grayscale, find_orientation and analyze_image_sample, into the “utils” module.

The script takes the usual arguments: the path to the image to load and the path to the folder where the corrected image is saved.

Line 37, we load the image to process.

Line 40, we call the function h7as.find_orientation to find the orientation of the page.

Line 45, we change the orientation if the image was upside down.
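Since the listing itself is not reproduced here, the following is only a sketch of the command-line handling and of the final correction step. The --image and --output flags come from the usage line above; the helper names build_parser and correct_orientation, and the choice of a NumPy array to hold the image, are assumptions made for illustration.

```python
import argparse

import numpy as np


def build_parser():
    """Command-line interface matching the usage line of the article."""
    parser = argparse.ArgumentParser(
        description="Detect and correct the up/down orientation of a page.")
    parser.add_argument("--image", required=True,
                        help="path to the image to load")
    parser.add_argument("--output", required=True,
                        help="folder where the corrected image is saved")
    return parser


def correct_orientation(image, upside_down):
    """Rotate the image by 180 degrees when it was found upside down."""
    return np.rot90(image, 2) if upside_down else image


args = build_parser().parse_args(["--image", "page.png", "--output", "out"])
tiny_page = np.array([[1, 2], [3, 4]])
fixed_page = correct_orientation(tiny_page, upside_down=True)
```

Loading and saving the image are left out on purpose, because the article does not say which imaging library the script uses.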

The find_orientation function takes two parameters: the image to process and the number of samples to analyze.

To find the orientation of the document, we start from the hypothesis that it is upside down. After preprocessing, we search for the baselines in a rough way, then we check this hypothesis twice: first on samples extracted from the right of the document, then on samples extracted from the left.

Lines 15-23, we prepare the image to compute the third-order B-spline coefficients.

Line 27, we get those coefficients. The lambda argument is set to 8.0 and specifies the amount of smoothing.

Line 30, we compute the horizontal projection profile of those coefficients. Line 38, we get the hollows of the signal with the function argrelextrema. The argument np.less indicates that we are looking for the valley points, and the argument order=16 gives the number of points on each side to use for the comparison. Depending on the value of this parameter, we will find more or fewer hollows. Here, we are looking for the main variations.
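To make the role of np.less and order=16 concrete, here is a small sketch that applies scipy.signal.argrelextrema to a synthetic profile. The cosine profile stands in for the smoothed projection profile of a real page and is purely an assumption; the cspline2d smoothing step is not reproduced.

```python
import numpy as np
from scipy.signal import argrelextrema

# Synthetic profile: one text line every 40 rows, darkest at rows 20, 60, ...
rows = np.arange(200)
profile = 100.0 + 50.0 * np.cos(2.0 * np.pi * rows / 40.0)

# np.less keeps the points that are strictly lower than their 16 neighbours
# on each side, i.e. the main valleys of the signal.
valleys = argrelextrema(profile, np.less, order=16)[0]
```

A smaller order would also keep shallow local minima on a noisy profile, while a larger one keeps only the dominant valleys.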

Lines 44-48, we use the valley points found to estimate the leading. The estimate is not very precise, but it is sufficient to determine the orientation of the document with reasonable reliability. In another article we’ll see how to improve accuracy.
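One possible way to turn valley points into a leading is sketched below. The function name estimate_leading is hypothetical, and the use of the median of the gaps (rather than a plain average) is an assumption chosen here because it tolerates a few spurious valleys.

```python
from statistics import median


def estimate_leading(valley_rows):
    """Estimate the leading as the median gap between consecutive valleys."""
    ordered = sorted(valley_rows)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("at least two valley points are needed")
    return median(gaps)


# One spurious valley (row 70) barely affects the median-based estimate.
leading = estimate_leading([20, 60, 70, 100, 140, 180])
```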

Lines 54-79 and Lines 93-110, we analyze samples of the document and calculate the orientation.

Lines 119-121, we compare the results of each analysis, then return the most likely. 
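The comparison itself is not shown in this article. One plausible way to combine the two analyses is a simple majority over all the samples, sketched below; the function name, the string labels and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def most_likely_orientation(right_side_results, left_side_results):
    """Combine per-sample results from both sides of the page by majority."""
    tally = Counter(right_side_results) + Counter(left_side_results)
    # Assumption: a tie keeps the starting hypothesis (upside down).
    if tally["right_side_up"] > tally["upside_down"]:
        return "right_side_up"
    return "upside_down"


decision = most_likely_orientation(
    ["right_side_up", "right_side_up", "upside_down"],
    ["right_side_up", "upside_down", "right_side_up"],
)
```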

Lines 60 and 99, we call the function analyze_image_sample, so let's have a look at it.

This function needs to be rewritten and simplified for clarity, but for now we will keep it as it is.

Lines 9-12 and 24-27, we calculate approximate typography/calligraphy metrics as follows for an upside down document:

  • baseline: we move the baseline up slightly compared to the value found previously,
  • descender: we calculate this value as the baseline +/- the average line height multiplied by 0.64. In typography this ratio depends on the font size. In calligraphy this ratio depends, for each font, on the nib width. We measured several samples and found a ratio of 0.44 to 0.64 in relation to the leading.
  • ascender: we calculate this value as the baseline +/- the average line height multiplied by 0.84. We measured several samples and found a ratio of 0.35 to 0.84 in relation to the leading.
  • x-height: we calculate this value as the baseline +/- the average line height multiplied by 0.4. We measured several samples and found a ratio of 0.14 to 0.4 in relation to the leading.
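The arithmetic behind these metrics can be sketched as follows, using the three ratios quoted above. The function name approximate_line_metrics is hypothetical, the small upward shift of the baseline is left out, and the sign convention (which side of the baseline each line falls on) is an assumption, since the article only writes "+/-".

```python
# Ratios relative to the leading, taken from the upper bounds quoted above.
DESCENDER_RATIO = 0.64
ASCENDER_RATIO = 0.84
X_HEIGHT_RATIO = 0.4


def approximate_line_metrics(baseline, leading, upside_down=True):
    """Return approximate row positions of the metric lines of one text line.

    Row indices grow downwards. Assumption: on an upside-down page the
    ascenders and the x-height zone lie below the baseline (larger rows)
    and the descenders above it; the signs flip for a right-side-up page.
    """
    direction = 1 if upside_down else -1
    return {
        "baseline": baseline,
        "descender": round(baseline - direction * DESCENDER_RATIO * leading),
        "ascender": round(baseline + direction * ASCENDER_RATIO * leading),
        "x_height": round(baseline + direction * X_HEIGHT_RATIO * leading),
    }


metrics = approximate_line_metrics(baseline=100, leading=40)
```

With a baseline at row 100 and a leading of 40 rows, the sample window for an upside-down line would span roughly rows 74 to 134.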

Lines 15-20 and 30-32, we get a sample from the grayscale image.

Lines 37-47, we calculate the cspline2d and its derivatives to get the peaks and hollows of the signal (horizontal projection profile). We are looking for the greatest number of hollows and peaks.

Lines 50-63, we search for the highest peak and its predecessor.

Lines 66-71, we search for the deepest hollow. This valley point is usually not far from the baseline.

Lines 75-89, we calculate the distance between the baseline and the highest peak and its predecessor to estimate the orientation. On right side up lines, the highest peak is after the deepest valley point. It’s the opposite for the upside down lines.
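The decision rule stated in this paragraph can be illustrated with a deliberately simplified sketch. It only compares the positions of the highest peak and the deepest valley of a 1-D signal; the distances to the baseline and the predecessor of the peak, which the real function also uses, are ignored. The function name and the toy profiles are assumptions.

```python
def estimate_line_orientation(signal):
    """Guess the orientation of one text line from a 1-D profile.

    Rule taken from the text: on a right-side-up line the highest peak
    comes after the deepest valley; on an upside-down line it comes before.
    """
    peak_row = max(range(len(signal)), key=signal.__getitem__)
    valley_row = min(range(len(signal)), key=signal.__getitem__)
    return "right_side_up" if peak_row > valley_row else "upside_down"


# Toy profiles (hypothetical values): rows run from top to bottom.
right_side_up_profile = [5, 4, 1, 6, 9, 5]   # valley at row 2, peak at row 4
upside_down_profile = [5, 9, 6, 1, 4, 5]     # peak at row 1, valley at row 3
```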

The plots below show three samples, one is upside down and the others are right side up.

For the first one, the highest peaks of the horizontal projection profile and of the second derivative are before the deepest valley point.

Upside down line

Figure 9 – Upside down line

For the second one, the highest peaks of the horizontal projection profile and of the second derivative are after the deepest valley point.

Right side up line

Figure 10 – Right side up line

For the third one, the highest peak of the horizontal projection profile is before the deepest valley point, whereas the highest peak of the second derivative is after it. The analysis function gives an erroneous result, which can easily be corrected by taking the values of the second curve into account.

Right side up line

Figure 11 – Right side up line, but wrong analysis.

Although the method gives fair results, it is not very reliable, for three reasons.

  • The first concerns the search for baselines, which is not accurate enough.
    • To solve this inaccuracy, we will search for the baselines in two passes, correcting the average line height and calculating the distances between the deepest hollows and the moderately deep ones. We will remove some lines from our list.
  • The second is related to sampling, which is rough.
    • If baselines are more accurate, so will sampling be. But we need to determine more precise ratios.
  • The third is that in the sample analysis function we only consider the second derivative of the profile.
    • For this third point, we will see to what extent it is necessary to use the other curves.
    • Reliability has already been improved between the script used to produce the plots and the function shown.

In this article we have presented some elements to consider when detecting the orientation of a scanned document. This is a first draft; we will work on improving it in the rest of this study.

References

[1] – Bruno STUNER, Clément CHATELAIN, Thierry PAQUET. Handwriting recognition using Cohort of LSTM and lexicon verification with extremely large lexicon. Normandie University, 2017. English arXiv:1612.07528v4.

[2] – R. Plamondon, S. N. Srihari, Online and off-line handwriting recognition: a comprehensive survey, IEEE PAMI 22 (1) (2000) 63–84.

[3] – Théodore Bluche. Deep Neural Networks for Large Vocabulary Handwritten Text Recognition. Computers and Society [cs.CY]. Université Paris Sud – Paris XI, 2015. English. <NNT : 2015PA112062>.

[4] – Jesper Dürebrandt. Segmentation and Beautification of Handwriting using Mobile Devices, Uppsala Universitet, 2015, English. <UPTEC F 15016>.

[5] – Ivor Uhliarik. Handwritten Character Recognition Using Machine Learning Methods, Comenius University, 2013, English.

[6] – Antoni Buades, Bartomeu Coll, Jean-Michel Morel – Non-Local Means Denoising, IPOL CNRS, 2011, ISSN 2105–1232. 

[7] – van Beusekom J., Shafait F., Breuel T. M., Combined orientation and skew detection using geometric text-line modeling, in: Proceedings of the ICDAR, 2010, pp. 79-92. 

[8] – R. S. Caprari, Algorithm for text page up/down orientation determination, Pattern Recognition Letters 21 (4) (2001) 311-317.

[9] – H. Aradhye, A generic method for determining up/down orientation of text in roman and non-roman scripts, Pattern Recognition 38 (11) (2005) 2114-2131. 

[10] – S. Lu, C. L. Tan, Automatic document orientation detection and categorization through document vectorization, in: Proc. of the 14th ACM Int. Conf. on Multimedia, New York, NY, USA, 2006, pp. 113-116. 

[11] – D. S. Bloomberg, G. E. Kopec, L. Dasari, Measuring document image skew and orientation, in: Proc. of SPIE Document Recognition and Retrieval II, San Jose, CA, USA, 1995, pp. 302-316. 

[12] – B. T. Avila, R. D. Lins, A fast orientation and skew detection algorithm for monochromatic document images, in: Proc. of the 5th ACM symposium on Document engineering, New York, NY, USA, 2005, pp. 118-126. 

[13] – S. Lu, J. Wang, C. Tan, Fast and accurate detection of document skew and orientation, in: Proc. of the 9th Int. Conf. on Document Analysis and Recognition, pp. 684-688. 

[14] – María Teresa Infante Velázquez, Metrics and neatening of handwritten characters, University of Western Ontario, 2010, pp. 18-23.

Resources

[1] IAM handwriting database – http://www.fki.inf.unibe.ch/databases/iam-handwriting-database

[2] HASYv2 – handwritten symbol database – https://zenodo.org/record/259444#.WdUj3CVPlPY

[3] Rimes handwritten database – http://www.a2ialab.com/doku.php?id=rimes_database:start

[4] Bentham handwritten database – http://www.transcriptorium.eu/~tsdata/BenthamR0/
