Analysis of “Grouping Text Lines in Freeform Handwritten Notes”
Comments Made Elsewhere:
Summary:
The authors discuss earlier attempts that borrowed techniques from document image analysis, which did not do well on freeform ink unless they relied on non-scalable heuristics. Their own technique defines a global cost function that is optimized over the candidate partitions of strokes. The “goodness of a line” of possible text is measured with linear regression and with “horizontal and vertical compactness” (the largest gaps between strokes in x and y). Temporal information is used to form the initial groups.
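To make the “goodness of a line” idea concrete, here is a minimal sketch of the two ingredients mentioned above: a least-squares line fit and the largest x and y gaps between strokes. The stroke representation (a list of (x, y) points), the function names, and the use of RMS residual as the fit error are my own assumptions for illustration, not the paper's definitions.

```python
from math import sqrt

def fit_line(points):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, rms_error)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # vertical point sets fall back to a flat line
    intercept = mean_y - slope * mean_x
    rms = sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in points) / n)
    return slope, intercept, rms

def largest_gap(intervals):
    """Largest empty gap between (lo, hi) intervals; 0.0 if they all overlap or touch."""
    gap = 0.0
    reach = None  # right-most coordinate covered so far
    for lo, hi in sorted(intervals):
        if reach is not None and lo > reach:
            gap = max(gap, lo - reach)
        reach = hi if reach is None else max(reach, hi)
    return gap

def line_features(strokes):
    """Hypothetical feature set for one candidate text line (a list of strokes)."""
    points = [p for stroke in strokes for p in stroke]
    _, _, rms = fit_line(points)
    x_spans = [(min(x for x, _ in s), max(x for x, _ in s)) for s in strokes]
    y_spans = [(min(y for _, y in s), max(y for _, y in s)) for s in strokes]
    return {"fit_error": rms, "x_gap": largest_gap(x_spans), "y_gap": largest_gap(y_spans)}
```

Under these assumptions, a candidate line with a small fit error and small gaps would count as a “good” line; how the paper actually combines the measures is not covered in these notes.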
They also calculate a “configuration consistency” for each recognized line of text, using a neighborhood graph that connects the recognized lines of text that have no non-text element between them. Longer lines have more weight: whether a line is configured correctly is a function of the sum of its length-weighted neighbors. “Model complexity” is the number of lines in the partition. The cost function is the sum of all of these terms.
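A rough sketch of how such a cost could be assembled is below. The notes above do not record what is actually compared between neighboring lines, so the sketch assumes orientation angle; the `Line` fields, the normalization by total neighbor length, and the `w_*` weights are likewise hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    length: float      # length of the fitted text line
    angle: float       # orientation in radians (assumed comparison feature)
    fit_error: float   # per-line "goodness" term, lower is better
    neighbors: list = field(default_factory=list)  # indices of neighboring lines

def configuration_penalty(lines, i):
    """Length-weighted average disagreement between line i and its neighbors."""
    line = lines[i]
    total_weight = sum(lines[j].length for j in line.neighbors)
    if total_weight == 0:
        return 0.0  # isolated line: nothing to be inconsistent with
    weighted = sum(lines[j].length * abs(line.angle - lines[j].angle)
                   for j in line.neighbors)
    return weighted / total_weight

def global_cost(lines, w_fit=1.0, w_config=1.0, w_complexity=1.0):
    """Sum of line-fit, configuration-consistency, and model-complexity terms."""
    fit = sum(line.fit_error for line in lines)
    config = sum(configuration_penalty(lines, i) for i in range(len(lines)))
    complexity = len(lines)  # "model complexity" = number of lines in the partition
    return w_fit * fit + w_config * config + w_complexity * complexity
```

With this weighting, a long neighbor that disagrees with a line raises its penalty more than a short one does, which is the intuition behind “longer lines have more weight.”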
Strokes are grouped initially by temporal data. Alternative “hypotheses” are then generated by merging line segments and by accounting for “high configuration energy” (e.g. the dot of an ‘i’ or the cross of a ‘t’ added late in the game).
The algorithm iteratively regroups the recognized “lines” until the global cost function is optimized.
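The merge-and-re-evaluate loop can be sketched as a simple greedy descent. This is only an illustration: the notes do not say which search strategy the paper uses, and `greedy_merge`, the `cost` callback, and the toy cost in the usage example are all hypothetical.

```python
from itertools import combinations

def greedy_merge(groups, cost):
    """Repeatedly apply the pairwise merge that lowers cost(groups) the most.

    groups: initial partition (e.g. from temporal grouping), a list of lists.
    cost:   callable mapping a partition to a number; lower is better.
    Stops when no single merge improves the cost (a local optimum).
    """
    groups = [list(g) for g in groups]
    best = cost(groups)
    while True:
        candidate = None
        for i, j in combinations(range(len(groups)), 2):
            # Hypothesis: same partition with groups i and j merged into one.
            merged = [g for k, g in enumerate(groups) if k not in (i, j)]
            merged.append(groups[i] + groups[j])
            c = cost(merged)
            if c < best:
                best, candidate = c, merged
        if candidate is None:
            return groups, best
        groups = candidate

# Toy usage: each "stroke" is just a y-coordinate, and the cost is
# (number of groups) + (vertical spread within each group).
def toy_cost(groups):
    return len(groups) + sum(max(g) - min(g) for g in groups)

lines, final_cost = greedy_merge([[0.0], [0.1], [5.0], [5.2]], toy_cost)
```

In the toy run, the strokes near y = 0 merge into one line and the strokes near y = 5 into another, while merging everything is rejected because the spread term outweighs the saving in line count.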
Discussion:
They admit to hand-tuning their parameters for their test set, but overall their process is very good (i.e. start with temporal data, group similar neighbors, and account for unique cases). Hopefully we can borrow some of their intuition.