opencv - Scan Business Card Tesseract and Leptonica iOS


I am trying to scan a business card using Tesseract OCR. All I am doing is sending the image in with no pre-processing. Here's the code I'm using.

    Tesseract* tesseract = [[Tesseract alloc] initWithLanguage:@"eng+ita"];
    tesseract.delegate = self;
    // Restrict recognition to digits, letters, and a few punctuation marks.
    [tesseract setVariableValue:@"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@.-()" forKey:@"tessedit_char_whitelist"];
    [tesseract setImage:[UIImage imageNamed:@"card.jpg"]]; // image to check
    [tesseract recognize];
    NSLog(@"Here is the text %@", [tesseract recognizedText]);

Picture of the card:

This is the output:

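As you can see, the accuracy is not 100%, but that is not what I'm concerned about; I figure I can fix that with some simple pre-processing. However, if you notice, it mixes the two text blocks at the bottom, which splits up the address, and possibly other information on other cards.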

How can I use Leptonica (or something else, maybe OpenCV) to group the text somehow? Could I send regions of text on the image individually to Tesseract to scan? I've been stuck on this problem for a while, so any possible solutions are welcome!

I would recommend using an algorithm called the "Run Length Smoothing Algorithm" (RLSA). This algorithm is used in a lot of document image processing systems, though not every system exposes it as part of its API.

The original paper was published in 1982 and requires payment. However, the same algorithm is cited by many other papers on document image processing, where you can find implementation details and improvements.

One such paper is this: http://www.sciencedirect.com/science/article/pii/s0262885609002005

The basic idea is to scan the document image row by row, recording the width of the gaps between letters.
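To make the row scan concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C project). It assumes the image has already been binarized and is stored row-major with one byte per pixel, 1 for text and 0 for background. The function name `gap_histogram` and that pixel layout are illustrative assumptions, not part of Tesseract, Leptonica, or the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count background runs by width. For every run of g background pixels that
 * sits between two text pixels on the same row, hist[g] is incremented.
 * Runs wider than max_width are all counted in hist[max_width].
 * hist must have room for max_width + 1 entries. */
void gap_histogram(const uint8_t *img, int w, int h,
                   unsigned *hist, int max_width)
{
    assert(img != NULL && hist != NULL && w > 0 && h > 0 && max_width > 0);

    for (int g = 0; g <= max_width; g++)
        hist[g] = 0;

    for (int y = 0; y < h; y++) {
        const uint8_t *row = img + (size_t)y * w;
        int last_fg = -1;               /* column of the last text pixel seen */
        for (int x = 0; x < w; x++) {
            if (!row[x])
                continue;
            int gap = x - last_fg - 1;  /* background pixels since last text */
            if (last_fg >= 0 && gap > 0)
                hist[gap < max_width ? gap : max_width]++;
            last_fg = x;
        }
    }
}
```

Background before the first text pixel or after the last one on a row is deliberately not counted, since it is margin rather than a gap between letters. Looking at where the counts cluster may help you pick gap thresholds for the smoothing step.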

Then, nearby text characters can be combined by filtering on the width of the gaps and setting the small gaps to the same color as the text. The result will be large connected components that represent:

  • words,
    • by closing gaps between characters,
  • text lines,
    • by closing gaps between words, and
  • paragraphs,
    • by scanning column by column and closing vertical gaps between text lines.
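The gap-filling step could look roughly like this in plain C. It assumes a binarized, row-major image with one byte per pixel (1 for text, 0 for background). The names `rlsa_line`, `rlsa_horizontal`, and `rlsa_vertical` are hypothetical, and this is a simplified variant that only fills gaps bounded by text on both sides; it is not the exact procedure from the 1982 paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smooth one line of n pixels that are `stride` bytes apart: every run of
 * background pixels no longer than max_gap that lies between two text pixels
 * is set to text. Filling in place is safe because only pixels behind the
 * current position are written. */
static void rlsa_line(uint8_t *px, int n, size_t stride, int max_gap)
{
    int last_fg = -1;                   /* index of the last text pixel seen */
    for (int i = 0; i < n; i++) {
        if (px[(size_t)i * stride] == 0)
            continue;
        int gap = i - last_fg - 1;
        if (last_fg >= 0 && gap > 0 && gap <= max_gap) {
            for (int j = last_fg + 1; j < i; j++)
                px[(size_t)j * stride] = 1;
        }
        last_fg = i;
    }
}

/* Row-by-row pass: merges characters into words, or words into text lines,
 * depending on how large max_gap is. */
void rlsa_horizontal(uint8_t *img, int w, int h, int max_gap)
{
    assert(img != NULL && w > 0 && h > 0 && max_gap >= 0);
    for (int y = 0; y < h; y++)
        rlsa_line(img + (size_t)y * w, w, 1, max_gap);
}

/* Column-by-column pass: merges text lines into paragraphs. */
void rlsa_vertical(uint8_t *img, int w, int h, int max_gap)
{
    assert(img != NULL && w > 0 && h > 0 && max_gap >= 0);
    for (int x = 0; x < w; x++)
        rlsa_line(img + x, h, (size_t)w, max_gap);
}
```

Run it on a copy of the binarized card, not on the image you pass to the OCR engine: the smoothed copy is only there to find the blocks. A small `max_gap` in the horizontal pass gives word-level blobs, a larger one gives line-level blobs, and a vertical pass afterwards groups lines into blocks.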

If you do not have access to any document image analysis library that exposes this functionality, you can mimic the effect by:

  • using morphological operations (morphological closing), and then
  • performing connected-component labeling on the result.

Most image processing libraries, such as OpenCV, provide such functionality. It might be less efficient to take this approach, because you will have to re-run the algorithm with different text gap sizes to achieve different levels of clustering, unless the user provides your application with the text gap sizes.
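For illustration, here is a library-free sketch in plain C of that fallback: a horizontal morphological closing followed by 4-connected component labeling. It assumes a binarized, row-major image with one byte per pixel (1 for text, 0 for background). The names `morph_h`, `close_h`, and `label_components` are made up for this example; in practice you would probably call your image library's closing and labeling routines instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Dilate (grow = 1) or erode (grow = 0) with a horizontal structuring
 * element of width 2*r + 1. Taps that fall outside the image are ignored,
 * so gaps narrower than r next to the left or right border can get filled. */
static void morph_h(const uint8_t *src, uint8_t *dst, int w, int h, int r, int grow)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint8_t v = grow ? 0 : 1;
            for (int k = -r; k <= r; k++) {
                int xx = x + k;
                if (xx < 0 || xx >= w)
                    continue;
                uint8_t p = src[y * w + xx];
                if (grow) v |= p; else v &= p;
            }
            dst[y * w + x] = v;
        }
}

/* Closing = dilation followed by erosion. Fills horizontal gaps of up to
 * 2*r pixels in place. Returns 0 on success, -1 if allocation fails. */
int close_h(uint8_t *img, int w, int h, int r)
{
    assert(img != NULL && w > 0 && h > 0 && r >= 0);
    uint8_t *tmp = malloc((size_t)w * h);
    if (!tmp)
        return -1;
    morph_h(img, tmp, w, h, r, 1);
    morph_h(tmp, img, w, h, r, 0);
    free(tmp);
    return 0;
}

/* Label 4-connected text components with an explicit-stack flood fill.
 * labels[] receives 0 for background and 1..n for components.
 * Returns n, or -1 if allocation fails. */
int label_components(const uint8_t *img, int *labels, int w, int h)
{
    assert(img != NULL && labels != NULL && w > 0 && h > 0);
    int n = 0;
    int *stack = malloc(sizeof *stack * (size_t)w * h);
    if (!stack)
        return -1;
    memset(labels, 0, sizeof *labels * (size_t)w * h);
    for (int start = 0; start < w * h; start++) {
        if (!img[start] || labels[start])
            continue;
        int top = 0;
        labels[start] = ++n;            /* new component found */
        stack[top++] = start;
        while (top > 0) {
            int p = stack[--top], x = p % w, y = p / w;
            int nb[4] = { x > 0 ? p - 1 : -1, x < w - 1 ? p + 1 : -1,
                          y > 0 ? p - w : -1, y < h - 1 ? p + w : -1 };
            for (int i = 0; i < 4; i++) {
                int q = nb[i];
                if (q >= 0 && img[q] && !labels[q]) {
                    labels[q] = n;      /* mark on push so each pixel is pushed once */
                    stack[top++] = q;
                }
            }
        }
    }
    free(stack);
    return n;
}
```

The radius `r` plays the role of the text gap size: a larger `r` merges more text into each component, which is why different clustering levels need separate runs. The bounding box of each label is then a candidate region to crop and pass to the OCR engine on its own.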

