IT Engineering

After detecting and correcting the skew of the scanned document, a question remains: is it in the right direction? Is it right side up or upside down?

In their 2010 article “Combined orientation and skew detection using geometric text-line modeling”, Joost van Beusekom, Faisal Shafait and Thomas M. Breuel [7] review various works and methods for detecting skew, orientation, or both in digitized documents.

- In 1998, R. S. Caprari [8] used an asymmetry measure computed from projection profiles.
- In 2005, H. Aradhye [9] took advantage of the ratio between the left and right openings of text blobs.
- In 2006, Lu et al. [10] employed the distribution of the number and position of white-to-black transitions of the elements in a line.

- In 1995, Bloomberg et al. [11] presented an orientation detection method that uses the ratio of the number of ascender characters to the number of descender characters in English text.
- In 2005, Avila et al. [12] used the x-height, the baseline, and the number of ascenders and descenders to detect the orientation.
- In 2006, Lu et al. [13] employed the ascender-to-descender ratio to detect orientation.

Before discussing the method we implemented to detect the orientation of our scanned document, we have to introduce some elements of typography.

In calligraphy and typography, the metrics of a font have been defined for a long time. Both use the concepts of baseline, ascender, descender, x-height, capital height, serif, slant, width and spacing.

The picture below is borrowed from the M.I.T. website.

In her monograph “Metrics and neatening of handwritten characters”, María Teresa Infante Velázquez [14] describes these metrics.

It all starts with the baseline. Infante Velázquez says:

The baseline is the imaginary line on which most of the characters are based. Generally, contiguous characters retain their alignment using the baseline as reference. Although certain characters extend below or above this line, it continues as the imaginary base. This line can be considered as the most important, since it is the guide for writing and it is used also as reference to obtain all the heights of the characters.

The mean line is used to define the height (also called mean height or x-height) of the lowercase letters and it is located over the baseline. It is important to mention that not all the lowercase letters lie only between these two lines, some extend above the mean line or below the baseline. The characters that fall in those circumstances continue to take the mean line as a reference, since it provides the information to obtain their height.

The ascender line helps to measure the height of lowercase letters (the ascender height) which go beyond the mean / x-line, as in the case of the letters “l”, “h” and “t”, to name a few. It should not be confused with the line used to calculate the height of capital letters, which will be addressed next. The ascender line is generally located above all the other lines used in character metrics. The ascender height is the space between the baseline and the ascender line.

The descender line is located underneath the baseline. It is useful for characters such as “j”, “g” or “y”, among others, as it helps to mark the distance that this type of character descends below the baseline.

The descender height refers to the distance between the baseline and the descender line, and not to the whole character as in the previously mentioned heights.

The picture below shows all the metric lines.

The next one shows the heights.

We will discuss other metrics in future articles about handwriting tilt detection and word and letter segmentation.

The orientation can be detected by “measuring” the shape of the signal of each line of a digitized document. We therefore have to locate these baselines as accurately as possible.

The method presented by R. S. Caprari [8] in 1998, or an approximation of it, will be used to locate the baselines in the page: we will use a horizontal projection profile. Since the writing in the registers is rather regular, we can compute the average leading to correct the first measurement.

The projection profile is defined as follows by Saurav Jha:

The projection profile of an image in a particular direction refers to the running sum of the pixels in that direction. In the context of text processing, the horizontal projection profile is needed to identify or separate the lines of a text, since the profile exhibits valley points at line boundaries, and the location of these minima marks the line boundaries. For binary (black and white) images, these are the points where the profile goes to zero.

The image below illustrates the horizontal projection profile of a page after conversion to grayscale and thresholding. The hollow points mark the darker parts of the page, that is, the text lines.
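As a minimal sketch of this computation (the page array below is a made-up toy, not one of our scans; the real profile is computed the same way, row by row):

```python
import numpy as np

# toy "page": white background (255) with two dark "text lines"
page = np.full((12, 20), 255, dtype=np.uint8)
page[2:4, :] = 0   # first text line
page[7:9, :] = 0   # second text line

# invert so that ink counts positively, then take the running sum per row
profile = (255 - page).sum(axis=1)

# rows belonging to text lines dominate the profile; blank rows stay at 0
text_rows = np.flatnonzero(profile > 0)
print(text_rows.tolist())  # [2, 3, 7, 8]
```

On the raw grayscale page, without the inversion, the same text rows show up as the valleys of the profile, which is the convention used in the rest of the article.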

When using the B-spline functions from scipy.signal, we get a smoother representation of the projection profile. The plots below show the histogram, the cspline2d output and its derivatives.

The red dots in the representation of the cspline2d function of the image show its darkest rows, and the blue dots the lighter ones.

The red dots in the representation of the first derivative of the B-spline function show boundaries between dark and light areas, an approximation of the baselines of the text. We will use these hollow points to find our baselines.
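As a rough 1-D sketch of this step (the article applies signal.cspline2d to the whole image; cspline1d plays the same role on a single profile, and the pure sine below stands in for the alternation of dark and light rows):

```python
import numpy as np
from scipy import signal

# synthetic projection profile: a slow oscillation standing in for the
# alternation of dark and light rows (period of about 63 rows)
rows = np.arange(200)
profile = np.sin(rows / 10.0)

# cubic B-spline coefficients; lamb controls the amount of smoothing,
# like the second argument of the article's cspline2d call
coeffs = signal.cspline1d(profile, lamb=8.0)

# valley points of the smoothed signal, i.e. candidate baselines
valleys = signal.argrelextrema(coeffs, np.less, order=16)[0]
print(valleys)  # minima of sin(rows / 10) lie near rows 47, 110 and 173
```

On a real, noisy profile the smoothing is what keeps small ink fluctuations from producing spurious valleys.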

To find the mean line and the x-height, we also use a horizontal projection profile, this time of a single line.

As you can see in the following image, the height of the lowercase letters is not constant. An approximation of this height can nevertheless be determined for the line. For more precision, segmenting the line into words would be necessary.

A rough approximation of the metrics of the line is shown below.

Below is the script page_find_orientation.py, where the method is implemented.

Its usage is as follows: `python page_find_orientation.py --image image --output folder`

In the original script, lines 1-21 are comments.

```python
22  # import the necessary packages
23  import argparse                  # argument parser
24  import cv2                       # computer vision image processing
25  import matplotlib.pyplot as plt  # 2D plotting
26  import h7as.utils as h7as       # some functions defined as this study progresses
```

We put some functions like convert_to_grayscale, find_orientation and analyze_image_sample into the “utils” module.

```python
28  # construct the argument parser and parse the arguments
29  ap = argparse.ArgumentParser()
30  ap.add_argument("-i", "--image", required=True,
31                  help="path to the input image")
32  ap.add_argument("-o", "--output", required=True,
33                  help="path to the output folder")
34  args = vars(ap.parse_args())
```

We find the usual arguments: the path of the image to load and the path of the folder where the corrected image will be saved.

```python
36  # load the image
37  img = cv2.imread(args["image"])
38
39  if img is not None:
40      orientation, confidence = h7as.find_orientation(img)
41      print("Orientation = %d, confidence = %.4f" % (orientation, confidence))
42
43      # flip the image if it is upside down
44      if orientation < 0:
45          imgfliped = cv2.flip(img, -1)
46      else:
47          imgfliped = img
48
49      # create the figure
50      f, axarr = plt.subplots(1, 1)
51      axarr.imshow(imgfliped, 'gray')
52      axarr.set_title("image hopefully right side up")
53      plt.show()
54  else:
55      print("Image not found: ", args["image"])
```

**Line 37**, we load the image to process.

**Line 40**, we call the function h7as.find_orientation to find the orientation of the page.

**Line 45**, we flip the image if it was upside down.

The function takes two parameters: the image to process and the number of samples to analyze.

To find the orientation of the document, we start from the hypothesis that it is upside down. After preprocessing, we locate the baselines in a rough way, then we perform one verification on samples extracted from the right of the document and another on samples extracted from the left.

```python
  1  # function to find the orientation of an image
  2  # returns
  3  #   orientation: -2 <= m < 0 upside down,
  4  #                 0 <= m <= 2 right side up
  5  #   confidence:   0 <= n <= 1
  6  def find_orientation(img, sampling=7):
  7      # variables initialization
  8      orientation, confidence = 0, 0.
  9      right_side_up, up_side_down = 0, 0
 10      avg_line_height = 0
 11
 12      # the image must be upside down or right side up ==> angle 0.0
 13
 14      # convert the image to gray scale
 15      imggrayed = convert_to_grayscale(img)
 16      # remove noise
 17      imggrayed = cv2.fastNlMeansDenoising(imggrayed, h=3,
 18                                           templateWindowSize=7,
 19                                           searchWindowSize=21)
 20
 21      # apply threshold
 22      ret, imggrayed = cv2.threshold(imggrayed, 200, 255, cv2.THRESH_TRUNC)
 23
 24      # compute cspline2d of the image
 25      # (smooth histogram)
 26      # horizontal projection profile
 27      imgck = signal.cspline2d(imggrayed, 8.0)
 28
 29      # compute the projection profile
 30      peaks_hollows = imgck.sum(axis=1)
 31
 32      # locate the peaks of the signal
 33      # maxInd = signal.argrelextrema(peaks_hollows, np.greater, order=18)
 34
 35      # locate the hollows of the signal
 36      # order : int, optional — how many points on each side to use for
 37      # the comparison to consider comparator(n, n+x) to be True
 38      minInd = signal.argrelextrema(peaks_hollows, np.less, order=16)
 39
 40      # compute the avg_line_height without correction;
 41      # this could be the object of a function that would take the minInd
 42      # list as a parameter and return the average height of a line and
 43      # a list of baselines
 44      for idx in range(len(minInd[0][:])):
 45          # cv2.line(img, (0, y), (img.shape[1], y), (0, 0, 255), 1)
 46          if idx != 0:
 47              avg_line_height += minInd[0][idx] - minInd[0][idx - 1]
 48      avg_line_height = avg_line_height // len(minInd[0])
 49
 50      # -------------------- look up orientation, step one
 51      # checking up side down on the right of the page:
 52      # select a sampling of lines, for each extract a part
 53      # of the line and compute its VPP
 54      for _ in range(sampling):
 55          line_to_analyze = secrets.randbelow(len(minInd[0]))
 56          if line_to_analyze == 0:
 57              line_to_analyze = len(minInd[0]) // 2
 58
 59          # analyze the sample
 60          sample_orientation = analyze_image_sample(img, imggrayed,
 61                                                    avg_line_height, minInd,
 62                                                    line_to_analyze,
 63                                                    "up-side-down")
 64
 65          # modify the accumulators
 66          if sample_orientation == -2:
 67              orientation -= 2
 68              up_side_down += 1
 69          elif sample_orientation == 2:
 70              orientation += 2
 71              right_side_up += 1
 72
 73      # compute the mean of the orientation found
 74      orientation = orientation // sampling
 75      # compute the confidence of the orientation found
 76      if orientation > 0:
 77          confidence = right_side_up / sampling
 78      else:
 79          confidence = up_side_down / sampling
 80
 81      # -------------------- look up orientation, step two
 82      # keep the result of the first step
 83      right_orientation = orientation
 84      right_confidence = confidence
 85
 86      # variables initialization
 87      orientation, confidence = 0, 0.
 88      right_side_up, up_side_down = 0, 0
 89
 90      # checking up side down on the left of the page:
 91      # select a sampling of lines, for each extract a part
 92      # of the line and compute its VPP
 93      for _ in range(sampling):
 94          line_to_analyze = secrets.randbelow(len(minInd[0]))
 95          if line_to_analyze == 0:
 96              line_to_analyze = len(minInd[0]) // 2
 97
 98          # analyze the sample
 99          sample_orientation = analyze_image_sample(img, imggrayed,
100                                                    avg_line_height, minInd,
101                                                    line_to_analyze,
102                                                    "right-side-up")
103
104          # modify the accumulators
105          if sample_orientation == -2:
106              orientation -= 2
107              up_side_down += 1
108          elif sample_orientation == 2:
109              orientation += 2
110              right_side_up += 1
111
112      # compute the mean of the orientation found
113      orientation = orientation // sampling
114      # compute the confidence of the orientation found
115      confidence = (right_side_up / sampling if orientation > 0
116                    else up_side_down / sampling)
117
118      # which orientation predominates
119      if right_orientation < 0 and orientation >= 0 and confidence < right_confidence:
120          confidence = right_confidence
121          orientation = right_orientation
122
123      return orientation, confidence
```

**Lines 15-23**, we prepare the image to compute the third-order B-spline coefficients.

**Line 27**, we get those coefficients. The lambda argument is set to 8.0 and specifies the amount of smoothing.

**Line 30**, we compute the horizontal projection profile of those coefficients, and at **line 38** we get the hollows of the signal with the function argrelextrema. The argument np.less says that we are looking for the valley points, and the argument order=16 gives the number of points on each side to use for the comparison. Depending on the value of this parameter, we will find more or fewer hollows; here we are looking for the main variations.
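The effect of order can be seen on a made-up toy signal: with a small order, every little dip counts as a hollow; with a larger one, only the main valleys remain.

```python
import numpy as np
from scipy import signal

# toy signal with two deep valleys (indices 2 and 10) and a shallow
# dip in between (index 6)
sig = np.array([3, 2, 1, 2, 3, 2.5, 2, 2.5, 3, 2, 1, 2, 3])

narrow = signal.argrelextrema(sig, np.less, order=1)[0]  # every local dip
wide = signal.argrelextrema(sig, np.less, order=4)[0]    # main valleys only
print(narrow.tolist(), wide.tolist())  # [2, 6, 10] [2, 10]
```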

**Lines 44-48**, we use the valley points found to calculate a leading. It is not very precise, but it is sufficient to determine the orientation of the document with correct reliability. In another article we will see how to improve accuracy.
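With made-up valley rows, the leading estimate of these lines amounts to taking the mean gap between consecutive baselines:

```python
import numpy as np

# valley rows found by the projection profile (made-up values);
# the mean gap between consecutive valleys approximates the leading
baseline_rows = np.array([40, 82, 121, 165, 204])
avg_line_height = int(np.diff(baseline_rows).mean())
print(avg_line_height)  # 41
```

Note that the article's loop divides the summed gaps by the number of valleys rather than by the number of gaps, which slightly underestimates the leading; the mean of the gaps is the cleaner version of the same idea.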

**Lines 54-79 and Lines 93-110**, we analyze samples of the document and calculate the orientation.

**Lines 119-121**, we compare the results of each analysis, then return the most likely one.

**Lines 60 and 99**, we call the function analyze_image_sample, so let's have a look at it.

This function needs to be rewritten and simplified for clarity, but for now we will keep it as it is.

```python
  1  # function to analyze an image sample to get the orientation of an image
  2  # returns the estimated orientation of the sample
  3  def analyze_image_sample(img, imggrayed, avg_line_height, minInd,
  4                           line_num, way="right-side-up"):
  5      _orientation = 0
  6      if way == "up-side-down":
  7          # assumption: the image is up side down,
  8          # so the base line is moved up
  9          base_line = minInd[0][line_num] - int(avg_line_height * 0.15)
 10          descender = max(base_line - int(avg_line_height * 0.64), 0)
 11          ascender = base_line + int(avg_line_height * 0.84)
 12          x_height = base_line + int(avg_line_height * 0.4)
 13
 14          # sampling the image
 15          start_x = secrets.randbelow(img.shape[1] // 4) + (img.shape[1] // 2)
 16          end_x = start_x + (img.shape[1] // 4)
 17          if descender == 0:
 18              img_line = imggrayed[descender:ascender + 2, start_x:end_x]
 19          else:
 20              img_line = imggrayed[descender - 2:ascender + 2, start_x:end_x]
 21      else:
 22          # assumption: the image is right side up,
 23          # so the base line is moved down
 24          base_line = minInd[0][line_num] + int(avg_line_height * 0.15)
 25          descender = base_line + int(avg_line_height * 0.64)
 26          ascender = base_line - int(avg_line_height * 0.84)
 27          x_height = base_line - int(avg_line_height * 0.4)
 28
 29          # sampling the image
 30          start_x = secrets.randbelow(img.shape[1] // 4)
 31          end_x = start_x + (img.shape[1] // 3)
 32          img_line = imggrayed[ascender - 2:descender + 2, start_x:end_x]
 33
 34      # compute cspline2d of the sample (smooth histogram)
 35      imgck_slice = signal.cspline2d(img_line, 8.0)
 36
 37      # compute the first derivative of the sample
 38      # (find transitions between black and white)
 39      derfilt = np.array([1.0, -2, 1.0], dtype=np.float32)
 40      imgfderiv = (signal.sepfir2d(imgck_slice, derfilt, [1]) +
 41                   signal.sepfir2d(imgck_slice, [1], derfilt))
 42
 43      # compute the second derivative of the sample
 44      # (find transitions between black and white)
 45      imgsderiv = (signal.sepfir2d(imgfderiv, derfilt, [1]) +
 46                   signal.sepfir2d(imgfderiv, [1], derfilt))
 47
 48      # get peaks and hollows from the second derivative
 49      peaks_hollows_sd = imgsderiv.sum(axis=1)
 50      maxInd_sd = signal.argrelextrema(peaks_hollows_sd, np.greater, order=6)
 51      minInd_sd = signal.argrelextrema(peaks_hollows_sd, np.less, order=8)
 52
 53      # locate the max peak and its predecessor
 54      max_peak_sd = 0
 55      max_i_sd = 0
 56      for i in maxInd_sd[0]:
 57          if peaks_hollows_sd[i] > max_peak_sd:
 58              max_peak_sd = peaks_hollows_sd[i]
 59              max_i_sd = i
 60      # find its index number
 61      idx_max_i_sd = maxInd_sd[0].tolist().index(max_i_sd)
 62      # get the previous peak location
 63      peak_before = idx_max_i_sd - 1 if idx_max_i_sd != 0 else None
 64
 65      # locate the max hollow (baseline)
 66      max_hollow_sd = peaks_hollows_sd[np.argmax(peaks_hollows_sd)]
 67      max_j_sd = 0
 68      for j in minInd_sd[0]:
 69          if peaks_hollows_sd[j] < max_hollow_sd:
 70              max_hollow_sd = peaks_hollows_sd[j]
 71              max_j_sd = j
 72
 73      # compute the distance between points:
 74      # baseline, max peak and its predecessor
 75      diff_baseline_peak = max_i_sd - max_j_sd
 76      diff_ppeak_baseline = 0
 77      if peak_before is not None:
 78          diff_ppeak_baseline = max_j_sd - maxInd_sd[0][peak_before]
 79
 80      # the orientation is defined by the distance between the
 81      # baseline (hollow) and the two peaks surrounding it
 82      if diff_baseline_peak < 0:
 83          _orientation = -2   # upside down
 84      elif diff_ppeak_baseline > diff_baseline_peak:
 85          _orientation = 2    # right side up
 86      elif diff_ppeak_baseline < diff_baseline_peak:
 87          _orientation = -2   # upside down
 88      else:
 89          _orientation = 0
 90
 91      return _orientation
```

**Lines 9-12 and 24-27**, we calculate approximate typographic/calligraphic metrics, as follows for an upside-down document:

- baseline: we slightly move the baseline up compared to the value found previously;
- descender: we calculate this value as the baseline +/- the average line height multiplied by 0.64. In typography this ratio depends on the font size; in calligraphy it depends, for each hand, on the nib width. We measured several samples and found ratios of 0.44 to 0.64 in relation to the leading;
- ascender: we calculate this value as the baseline +/- the average line height multiplied by 0.84. We measured several samples and found ratios of 0.35 to 0.84 in relation to the leading;
- x-height: we calculate this value as the baseline +/- the average line height multiplied by 0.4. We measured several samples and found ratios of 0.14 to 0.4 in relation to the leading.
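Put together, the right-side-up branch of these formulas reads as follows (baseline 100 and line height 40 are made-up values; the ratios are the ones above):

```python
# sketch: metric lines for a right-side-up line, computed from a detected
# baseline row and the average line height, using the article's ratios
def line_metrics(base_line, avg_line_height):
    return {
        "descender": base_line + int(avg_line_height * 0.64),
        "ascender": base_line - int(avg_line_height * 0.84),
        "x_height": base_line - int(avg_line_height * 0.4),
    }

print(line_metrics(100, 40))  # {'descender': 125, 'ascender': 67, 'x_height': 84}
```

For the upside-down branch, the signs are simply mirrored around the baseline.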

**Lines 15-20 and 30-32**, we extract a sample from the grayscale image.

**Lines 37-47**, we calculate the cspline2d coefficients and their derivatives to get the peaks and hollows of the signal (the horizontal projection profile). We are looking for the greatest number of hollows and peaks.
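The derfilt kernel [1, -2, 1] passed to sepfir2d is a discrete second derivative: it cancels on constant or linearly varying intensity and responds at transitions. A tiny 1-D example on made-up arrays:

```python
import numpy as np

derfilt = np.array([1.0, -2.0, 1.0])

ramp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # linear: no transition
kink = np.array([0.0, 1.0, 2.0, 1.0, 0.0])  # a peak: a sharp transition

print(np.convolve(ramp, derfilt, mode="valid"))  # zeros everywhere
print(np.convolve(kink, derfilt, mode="valid"))  # responds at the peak
```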

**Lines 50-63**, we search the highest peak and its predecessor.

**Lines 66-71**, we search for the deepest hollow. This valley point is usually not far from the baseline.

**Lines 75-89**, we calculate the distance between the baseline and the highest peak, and between the baseline and that peak's predecessor, to estimate the orientation. On right-side-up lines, the highest peak comes after the deepest valley point; it is the opposite for upside-down lines.
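Condensed, the decision made in these lines boils down to the following sketch (the row indices are made up; rows grow downward, so “after” means a larger index):

```python
# decision rule: compare the distances from the baseline valley to the
# highest peak and to the peak preceding it
def orientation_from_extrema(prev_peak, valley, peak):
    diff_baseline_peak = peak - valley
    diff_ppeak_baseline = valley - prev_peak
    if diff_baseline_peak < 0:
        return -2   # highest peak above the baseline: upside down
    if diff_ppeak_baseline > diff_baseline_peak:
        return 2    # right side up
    if diff_ppeak_baseline < diff_baseline_peak:
        return -2   # upside down
    return 0        # undecided

print(orientation_from_extrema(prev_peak=10, valley=30, peak=35))  # 2
print(orientation_from_extrema(prev_peak=30, valley=35, peak=20))  # -2
```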

The plots below show three samples: one is upside down and the others are right side up.

For the first one, the highest peak on the horizontal projection profile and the second derivative is before the deepest valley point.

For the second one, the highest peak on the horizontal projection profile and the second derivative is after the deepest valley point.

For the third one, the highest peak on the horizontal projection profile is before the deepest valley point, while the highest peak on the second derivative is after it. The analysis function gives an erroneous result, which could easily be corrected by taking into account the values of the second curve.

While the method gives fair results, it is not very reliable, for three reasons.

- The first concerns the search for baselines, which is not accurate enough. To solve this inaccuracy, we will search for the baselines in two passes, with a correction of the average line height and the calculation of the distances between the deepest hollows and the moderately deep ones; we will also remove some lines from our list.

- The second is related to the sampling, which is clumsy and rough. If the baselines become more accurate, so will the sampling; but we also need to determine more precise ratios.

- The third is that in the sample analysis function we only consider the second derivative of the profile. We will see to what extent it is necessary to use the other curves. Reliability has already been improved between the script used to produce the plots and the function shown here.

In this article we have presented some elements to consider when detecting the orientation of a scanned document. This is a first draft; we will work on improving it in the rest of this study.

[1] – Bruno STUNER, Clément CHATELAIN, Thierry PAQUET. Handwriting recognition using Cohort of LSTM and lexicon verification with extremely large lexicon. Normandie University, 2017. English arXiv:1612.07528v4.

[2] – R. Plamondon, S. N. Srihari, Online and off-line handwriting recognition: a comprehensive survey, IEEE PAMI 22 (1) (2000) 63–84.

[3] – Théodore Bluche. Deep Neural Networks for Large Vocabulary Handwritten Text Recognition. Computers and Society [cs.CY]. Université Paris Sud – Paris XI, 2015. English. <NNT : 2015PA112062>.

[4] – Jesper Dürebrandt. Segmentation and Beautification of Handwriting using Mobile Devices, Uppsala Universitet, 2015, English. <UPTEC F 15016>.

[5] – Ivor Uhliarik. Handwritten Character Recognition Using Machine Learning Methods, Comenius University, 2013, English.

[6] – Antoni Buades, Bartomeu Coll, Jean-Michel Morel – Non-Local Means Denoising, IPOL CNRS, 2011, ISSN 2105–1232.

[7] – van Beusekom J., Shafait F., Breuel T. M., Combined orientation and skew detection using geometric text-line modeling, in: Proceedings of the ICDAR, 2010, pp. 79-92.

[8] – R. S. Caprari, Algorithm for text page up/down orientation determination, Pattern Recognition Letters 21 (4) (2001) 311-317.

[9] – H. Aradhye, A generic method for determining up/down orientation of text in roman and non-roman scripts, Pattern Recognition 38 (11) (2005) 2114-2131.

[10] – S. Lu, C. L. Tan, Automatic document orientation detection and categorization through document vectorization, in: Proc. of the 14th ACM Int. Conf. on Multimedia, New York, NY, USA, 2006, pp. 113-116.

[11] – D. S. Bloomberg, G. E. Kopec, L. Dasari, Measuring document image skew and orientation, in: Proc. of SPIE Document Recognition and Retrieval II, San Jose, CA, USA, 1995, pp. 302-316.

[12] – B. T. Avila, R. D. Lins, A fast orientation and skew detection algorithm for monochromatic document images, in: Proc. of the 5th ACM symposium on Document engineering, New York, NY, USA, 2005, pp. 118-126.

[13] – S. Lu, J. Wang, C. Tan, Fast and accurate detection of document skew and orientation, in: Proc. of the 9th Int. Conf. on Document Analysis and Recognition, pp. 684-688.

[14] – María Teresa Infante Velázquez, Metrics and neatening of handwritten characters, University of Western Ontario, 2010, pp. 18-23.

[1] IAM handwriting database – http://www.fki.inf.unibe.ch/databases/iam-handwriting-database

[2] HASYv2 – handwritten symbol database – https://zenodo.org/record/259444#.WdUj3CVPlPY

[3] Rimes handwritten database – http://www.a2ialab.com/doku.php?id=rimes_database:start

[4] Bentham handwritten database – http://www.transcriptorium.eu/~tsdata/BenthamR0/
