IT Engineering

In the previous article, we used a method to detect the skew that works, but not for all documents. If the document had different characteristics, such as higher contrast or a single block of text, the threshold used in the function did not detect lines of text. In addition, the search for horizontal or vertical structural elements failed for inclinations on the order of 40 to 60 degrees, clockwise or counterclockwise.

While the Hough transform is a great way to find lines in an image, we have to vary its parameters to get enough lines to compute an accurate skew angle.

In this article, we change the method to detect the skew and correct it with the Hough transform but without morphological operations.

Since we do not know the inclination or its direction, we have to make operating assumptions. Between a first document with an inclination of 45° and a second with an inclination of -135°, we could not determine the rotation to carry out without analyzing the orientation of the text.

We assume that the majority of the documents will have an inclination between -45 and 45 degrees with text orientation right side up. If the skew is outside those limits, the document could be upside down after the skew correction. The analysis of the orientation of the document will later make it possible to restore it.

The following graph shows the principles that will be used to detect the inclination of a document.

For each line, if the angle theta given by the HoughLines function is:

- less than pi/2, then the skew is classified as clockwise, otherwise counterclockwise,
- less than pi/4 or more than 3 pi/4, then the line is classified as horizontal,
- greater than pi/4 and less than 3 pi/4, then the line is classified as vertical.
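These rules can be sketched as a small helper (classify_theta is a hypothetical name for illustration, not part of the article's script):

```python
import math

def classify_theta(theta):
    """Classify a Hough line angle theta (radians, in [0, pi))."""
    # less than pi/2: clockwise skew, otherwise counterclockwise
    direction = "clockwise" if theta < math.pi / 2 else "counterclockwise"
    # between pi/4 and 3 pi/4: vertical line, otherwise horizontal
    if math.pi / 4 < theta < 3 * math.pi / 4:
        orientation = "vertical"
    else:
        orientation = "horizontal"
    return direction, orientation
```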

Below is the script, called page_findskew.py, where the method is implemented.

It is used as follows: `python page_findskew.py --image image --output folder`

In the original script, lines 1-21 are comments.

```python
# import the necessary packages
import argparse                  # argument parser
import numpy as np               # fundamental package for scientific computing
import math                      # mathematical functions defined by the C standard
import cv2                       # computer vision image processing
import matplotlib.pyplot as plt  # 2D plotting
import os                        # operating system dependent functionality
import h7as.utils as h7as        # some functions defined as this study progresses
```

We have put shared functions like convert_to_grayscale or puttext_with_bgcolor in the module h7as.utils.

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the input image")
ap.add_argument("-o", "--output", required=True, help="path to the output folder")
args = vars(ap.parse_args())
```

We find the usual arguments: the path to the image to load and the path to the folder where the corrected image will be saved.

```python
# load the image
img = cv2.imread(args["image"])
if img is not None:
    rc, skew, imgenhanced = h7as.find_skew_and_straighten(img, skew_limit=np.pi/4)
    if rc:
        cv2.imwrite(os.path.join(args["output"], args["image"].split("/")[-1]), imgenhanced)
    # display original image and enhanced one side by side
    # create the figure and the axes
    f, axarr = plt.subplots(1, 2)
    axarr[0].imshow(img, 'gray')
    axarr[0].set_title("Original image")
    axarr[1].imshow(imgenhanced, 'gray')
    axarr[1].set_title("enhanced image")
    plt.suptitle("Handwritten from registers of civil status - image enhancement - angle %.4f"
                 % (math.degrees(skew) - 90))
    # adjust spacing
    f.subplots_adjust(hspace=0.3)
    plt.show()
else:
    print("Image not found: ", args["image"])
```

We start by loading the image, then we check that the image has been found.

After this check, we call the skew detection and correction function with the following parameters:

- the image
- the tilt limit, either pi/4 or pi.

Either the search is carried out between pi/4 and 3 pi/4, or between 0 and pi.

If everything went fine, we save the straightened image and display the result as shown below.

Let’s go a little further and look at the find_skew_and_straighten function.

```python
# function to find the skew of an image if any and untilt it
def find_skew_and_straighten(img, skew_limit=np.pi/4):
    # find skew
    rc, skew = find_skew(img, skew_limit=skew_limit)
    # set rotation angle
    theta_to_rotate = (math.degrees(skew) - 90)
    # rotate the image if skewed
    if skew != 0.0:
        (h, w) = img.shape[:2]
        center = (w // 2, h // 2)
        map_matrix = cv2.getRotationMatrix2D(center, theta_to_rotate, 1.0)
        imgrotated = cv2.warpAffine(img, map_matrix, (w, h), flags=cv2.INTER_CUBIC,
                                    borderMode=cv2.BORDER_REPLICATE)
    else:
        imgrotated = img
    return rc, skew, imgrotated
```

This function calls another function, find_skew, with the same parameters, then rotates the image if the skew found is not zero. The rotation angle is calculated in degrees relative to the desired orientation of pi/2.

Afterwards, it returns a status code, the angle in radians, and the image, rotated or not.
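As a quick sanity check of that conversion, here is a tiny sketch (rotation_degrees is a hypothetical helper name introduced for illustration, not part of the script): a skew of exactly pi/2 means the document is already straight, so the rotation is zero.

```python
import math

def rotation_degrees(skew):
    # convert the skew returned by find_skew (radians, centred on pi/2)
    # into the rotation angle in degrees passed to cv2.getRotationMatrix2D
    return math.degrees(skew) - 90

straight = rotation_degrees(math.pi / 2)                  # ~0: no rotation needed
tilted = rotation_degrees(math.pi / 2 + math.radians(5))  # ~5 degrees
```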

The OpenCV 3.3 documentation contains all the explanations for setting the rotation matrix and the affine transformation.

We choose to replicate the border when rotating the image.

Now, let’s go through the main function for tilt search: find_skew.

```python
# function to find the skew of an image if any
def find_skew(img, upper_threshold=600, angle_res=np.pi/180, skew_limit=np.pi/4):
    # angle in radian
    angle = 0.
    # convert the image to gray scale
    imggrayed = convert_to_grayscale(img)
    # adaptive thresholding on reversed gray scale image
    # ADAPTIVE_THRESH_MEAN_C: threshold value is the mean of the neighbourhood area.
    # Block Size - decides the size of the neighbourhood area.
    # C - a constant subtracted from the mean or weighted mean calculated.
    imgbw = cv2.adaptiveThreshold(~imggrayed, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 15, -2)
    # variables initialization
    left_tendency, right_tendency, no_tendency = 0, 0, 0
    anglestoleft, anglestoright = 0., 0.
    lhorizontal_count, lvertical_count = 0, 0
    indicator = ""
    half_pi = math.pi / 2
    quarter_pi = math.pi / 4
    three_quarter_pi = 3 * (math.pi / 4)
    # look for lines
    for thresh in range(upper_threshold, 100, -10):
        lines = cv2.HoughLines(imgbw, 1, angle_res, thresh)
        if lines is not None and len(lines) > 20:
            break
    # find if the rotation is clockwise or counterclockwise
    try:
        for line in lines:
            for rho, theta in line:
                # is it a vertical line or a horizontal one?
                # this is just for information, not used
                if theta >= quarter_pi and theta <= three_quarter_pi:
                    lvertical_count += 1
                else:
                    lhorizontal_count += 1
                # is the skew clockwise or counterclockwise?
                if theta in (0., half_pi, math.pi, three_quarter_pi):
                    no_tendency += 1
                    indicator += "-N"
                elif theta > three_quarter_pi:
                    continue
                elif theta > half_pi:
                    left_tendency += 1
                    anglestoleft += theta
                else:
                    right_tendency += 1
                    anglestoright += theta
        # compute the mean of the angles to rotate the image
        # depending on left and right tendency
        mean_anglestoleft = anglestoleft / left_tendency if left_tendency > 0 else 0
        mean_anglestoright = anglestoright / right_tendency if right_tendency > 0 else 0
        # images with a skew greater than 45° or less than -45° could be upside
        # down after rotation if skew_limit == np.pi / 4;
        # if skew_limit == np.pi, all clockwise skews will be upside down
        if no_tendency > right_tendency and no_tendency > left_tendency:
            angle = half_pi
        elif right_tendency > left_tendency:
            angle = mean_anglestoright
        else:
            if skew_limit == np.pi / 4:
                angle = mean_anglestoleft
            elif mean_anglestoleft > math.pi:
                angle = math.pi - mean_anglestoleft
            else:
                angle = (mean_anglestoleft - mean_anglestoright) / 2
    except TypeError:
        print('lines is empty. No lines found.')
        return False, angle
    return True, angle
```

This function takes four parameters:

- The image to process,
- The accumulator starting threshold used to search for lines with the Hough transform (default value 600),
- The angle resolution of the accumulator in radians (default value pi/180),
- The skew limit searched (default value pi/4).

**Lines 7 to 13**, we convert the image to a binary one for the Hough transform function.

**Line 7**, we convert the image to gray scale, and **line 13** we apply an adaptive threshold on the inverted gray-scale image.
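To make the adaptive threshold concrete, here is a rough pure-NumPy sketch of what ADAPTIVE_THRESH_MEAN_C with a negative C does. This is illustrative only and far slower than cv2.adaptiveThreshold; adaptive_mean_threshold is a hypothetical helper, and the input is assumed to be already inverted, as ~imggrayed is in the script.

```python
import numpy as np

def adaptive_mean_threshold(gray, block=15, c=-2, maxval=255):
    """Per-pixel threshold: maxval if pixel > (neighbourhood mean - c), else 0."""
    h, w = gray.shape
    pad = block // 2
    padded = np.pad(gray, pad, mode='edge')
    out = np.zeros_like(gray)
    for y in range(h):
        for x in range(w):
            mean = padded[y:y + block, x:x + block].mean()
            # with c == -2, the local threshold is the neighbourhood mean + 2
            out[y, x] = maxval if gray[y, x] > mean - c else 0
    return out

# a single bright "ink" pixel on a dark (inverted) page survives thresholding
gray = np.zeros((20, 20), dtype=np.uint8)
gray[10, 10] = 255
bw = adaptive_mean_threshold(gray)
```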

**Lines 16-18**, we initialize our variables:

- left_tendency, right_tendency are accumulators to count the clockwise and counterclockwise lines,
- anglestoleft, anglestoright sum the clockwise and counterclockwise theta given by the HoughLines function,
- lhorizontal_count, lvertical_count are accumulators to count horizontal and vertical lines.

**Lines 25-28**, we loop on the HoughLines function until we find at least 20 lines, which is enough to find a mean skew angle. The accumulator threshold is decreased by 10 at each iteration, and we leave the loop if we have not found more than 20 lines when reaching the lower limit of 100. Is it a greedy function? Certainly. We will fine-tune it later, if needed.

In our case, we mostly found more than 20 lines at the first iteration. When testing this function, in some cases we had to go down to a threshold of 110 to find what we were looking for. You may notice that with this code we could get fewer than 20 lines at the last iteration. Does it matter?
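The decreasing-threshold loop can be sketched in isolation. Here, fake_hough_lines is a made-up stand-in for cv2.HoughLines that simply yields more lines as the accumulator threshold drops; the loop itself has the same shape as in the script.

```python
def fake_hough_lines(threshold):
    # pretend the accumulator yields one line per vote below 400
    return list(range(max(0, 400 - threshold)))

lines = None
for thresh in range(600, 100, -10):
    lines = fake_hough_lines(thresh)
    if lines is not None and len(lines) > 20:
        break
# the loop stops at the first threshold yielding strictly more than 20 lines
```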

Below we show an example of what we get.

**Lines 31-77**, we analyze the results returned by the HoughLines function, if any. If that function does not return any lines, we leave find_skew, returning False and an angle of zero.

**Lines 32-46**, we iterate through the results and count the lines against the distribution scheme shown above.

**Lines 49-50**, we compute the average of the angles to the right and to the left. We could use another computation to get a more accurate result. We will see later whether we really need better accuracy.
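As a worked example of this averaging, here is a small sketch with hypothetical theta values; the variable names mirror the script, but the data is made up.

```python
import math

# three hypothetical clockwise line angles from HoughLines, given in degrees
# for readability and converted to radians as the script accumulates them
thetas_right = [math.radians(d) for d in (80, 85, 84)]
anglestoright = sum(thetas_right)
right_tendency = len(thetas_right)
mean_anglestoright = anglestoright / right_tendency if right_tendency > 0 else 0
# the mean is (80 + 85 + 84) / 3 = 83 degrees, i.e. a 7-degree clockwise skew
```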

**Lines 55-74**, the result of the function is calculated from the number of horizontal and vertical lines found, the estimated direction of rotation, and, of course, our limits. As we said before, the result could still be an upside-down document. The next function, which finds the right orientation, will solve this issue.

This solution is a little more flexible than the previous one, but we could certainly find a better one. For the time being, it meets our objective as part of our study.

In our next article, we will see a solution to determine the orientation of a document.

