Analysis of “Distinguishing Text from Graphics in On-line Handwritten Ink”
Comments Made Elsewhere:
Summary:
Again, the goal is to classify strokes as text or shape so that our system can pass each stroke to the correct recognizer. This work uses stroke features along with temporal and contextual features. Rather than producing a hard classification, it uses machine learning to output probabilities.
A total least squares (TLS) model was fitted to each stroke. Instead of robust corner finding, the stroke is segmented at local maxima of curvature, and the resulting pieces are called “fragments”. TLS is also fitted to the largest fragment. In all, 11 features are extracted from each stroke. If the selected fragment is really large and has a high TLS ratio, the stroke is probably a shape (non-text).
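A minimal Python sketch of the stroke geometry described above, assuming a stroke is a list of (x, y) points. The TLS line is taken as the principal axis of the points' covariance matrix. I am assuming that "TLS ratio" means the ratio of the larger to the smaller covariance eigenvalue, so a long, straight fragment scores high; the paper may define it differently. The 30° cut threshold and all function names are my own, and this does not reproduce the paper's 11 features.

```python
import math


def tls_fit(points):
    """Fit a total least squares (TLS) line to a list of (x, y) points.

    Returns (angle, elongation): the angle of the principal axis in radians
    and the ratio of the major to the minor covariance eigenvalue (large for
    a long, straight fragment; 1.0 for a symmetric blob).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + disc
    lam_minor = max(half_trace - disc, 1e-12)  # avoid dividing by zero
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis direction
    return angle, lam_major / lam_minor


def turning_angles(points):
    """Absolute change of direction (radians) at each point; 0 at the ends."""
    angles = [0.0]
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi) before taking its size.
        angles.append(abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi))
    angles.append(0.0)
    return angles


def fragment(points, min_angle=math.radians(30)):
    """Split a stroke at local maxima of turning angle above min_angle."""
    k = turning_angles(points)
    cuts = [i for i in range(1, len(points) - 1)
            if k[i] >= min_angle and k[i] > k[i - 1] and k[i] >= k[i + 1]]
    fragments, start = [], 0
    for c in cuts:
        fragments.append(points[start:c + 1])  # the cut point ends this fragment
        start = c                              # and also starts the next one
    fragments.append(points[start:])
    return fragments


def arc_length(points):
    """Total length of the polyline through the points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    stroke = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]  # an "L"
    largest = max(fragment(stroke), key=arc_length)
    print(arc_length(largest), tls_fit(largest))
```

Cutting at local maxima of turning angle is a cheap stand-in for robust corner finding: it needs no model fitting, at the cost of over-segmenting noisy strokes.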
This feature vector is then fed into a multilayer perceptron (MLP, a neural network). With the targets coded as 1 = text and 0 = shape, the network outputs a value in between that can be read as the probability that the stroke is text. Ten-fold cross-validation is used to avoid overfitting.
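A hedged sketch of the classifier's shape: a one-hidden-layer MLP with a logistic output, so every feature vector maps to a number between 0 (shape) and 1 (text), plus a helper that produces ten-fold cross-validation splits. The layer sizes, tanh hidden units and weights are illustrative assumptions; training (e.g. by backpropagation on a cross-entropy error) is omitted.

```python
import math
import random


def mlp_text_probability(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP; returns P(text) in [0, 1].

    Hidden units use tanh; the single output unit is a logistic sigmoid,
    read here as P(text | features), with 1 - P as P(shape | features).
    """
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    # Numerically stable logistic sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ten_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

One typical use of the folds is to choose settings such as the number of hidden units: train on `train`, score on `test`, and average the error over all ten splits.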
Next, an HMM is trained on the training set to capture the correlation between the classes of successively drawn strokes. The Viterbi algorithm (dynamic programming) is then used to find the most probable sequence of states, i.e. the text or shape label of each stroke.
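A minimal Viterbi sketch for a two-state (text/shape) HMM, working in log space. I am assuming the per-stroke emission scores come from the MLP by dividing its posterior by the class prior, a common trick in hybrid neural network/HMM systems; the transition probabilities and MLP outputs in the demo are made up.

```python
import math

STATES = ("text", "shape")


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emissions, transition, initial):
    """Most probable state sequence for an HMM, computed in log space.

    emissions[t][s]:  likelihood score of stroke t under state s
    transition[r][s]: P(state s for stroke t | state r for stroke t-1)
    initial[s]:       P(state s for the first stroke)
    """
    n = len(initial)
    score = [_log(initial[s]) + _log(emissions[0][s]) for s in range(n)]
    back = []  # back[t-1][s] = best predecessor of state s at stroke t
    for e in emissions[1:]:
        pointers, new_score = [], []
        for s in range(n):
            r = max(range(n), key=lambda q: score[q] + _log(transition[q][s]))
            pointers.append(r)
            new_score.append(score[r] + _log(transition[r][s]) + _log(e[s]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    s = max(range(n), key=lambda q: score[q])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    return [STATES[q] for q in reversed(path)]


if __name__ == "__main__":
    p_text = [0.9, 0.45, 0.8, 0.2, 0.1]  # made-up MLP outputs, one per stroke
    prior = 0.5                          # assumed P(text) in the training set
    # Scaled likelihoods: posterior / prior for each class.
    emissions = [[p / prior, (1 - p) / (1 - prior)] for p in p_text]
    transition = [[0.9, 0.1], [0.1, 0.9]]  # strokes tend to keep their class
    print(viterbi(emissions, transition, [0.5, 0.5]))
    # ['text', 'text', 'text', 'shape', 'shape']
```

The ambiguous second stroke (0.45, leaning slightly toward shape) is pulled to text by its confident neighbours, which is the benefit of modelling the correlation between successive strokes.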
The same steps were followed to create features and an HMM for the gaps between strokes.
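A sketch of what gap features might look like, assuming each stroke is a list of (x, y, t) samples. The three features here are hypothetical examples, not the paper's set. `p_same` stands for the output of a hypothetical gap classifier (trained like the stroke MLP) giving the probability that the strokes on either side of the gap share a class; one way to use it is to build a separate transition matrix for every gap instead of a single fixed one.

```python
import math


def gap_features(prev_stroke, next_stroke):
    """Hypothetical features for the gap between two consecutive strokes.

    Each stroke is a list of (x, y, t) samples in drawing order.
    """
    x0, y0, t0 = prev_stroke[-1]  # pen-up point of the earlier stroke
    x1, y1, t1 = next_stroke[0]   # pen-down point of the later stroke
    xs = [x for x, _, _ in prev_stroke]
    ys = [y for _, y, _ in prev_stroke]
    size = math.hypot(max(xs) - min(xs), max(ys) - min(ys))  # bounding-box diagonal
    distance = math.hypot(x1 - x0, y1 - y0)
    return {
        "time_gap": t1 - t0,
        "pen_up_distance": distance,
        # Distance relative to the earlier stroke's size, so it is scale-free.
        "relative_distance": distance / max(size, 1e-9),
    }


def gap_transition(p_same):
    """2x2 text/shape transition matrix for one gap.

    p_same is the (hypothetical) gap classifier's probability that the
    strokes on both sides of the gap have the same class.
    """
    return [[p_same, 1.0 - p_same],
            [1.0 - p_same, p_same]]
```

With one matrix per gap, the stroke HMM becomes inhomogeneous: a long pause or a large jump can lower `p_same` and make a text/shape switch cheaper at exactly that point.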
Discussion:
I understand the rationale for correcting the bias toward text, and they do so by adjusting their error function, but it seems that they are fitting to their training set by relying on the proportion of text and graphics within it. This bias would seem to change with each training set. Is this wrong?
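A sketch that makes the concern concrete, assuming the adjustment is a class-weighted cross-entropy (I am not certain this matches the paper's exact error function). The weights are computed from the label counts of whichever training set is used, so they do change from one training set to the next. If the text/shape mix at run time is known to differ, Bayes' rule lets the output be re-targeted to the new prior without retraining, which is what `reweight_posterior` does. All names here are my own.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights from training labels (1 = text, 0 = shape).

    Assumes both classes occur at least once in the labels.
    """
    n = len(labels)
    n_text = sum(labels)
    n_shape = n - n_text
    return {1: n / (2 * n_text), 0: n / (2 * n_shape)}


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy with each example scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        total -= weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def reweight_posterior(p_text, train_prior_text, new_prior_text):
    """Bayes' rule: move P(text | x) from the training class mix to a new one."""
    a = p_text * new_prior_text / train_prior_text
    b = (1 - p_text) * (1 - new_prior_text) / (1 - train_prior_text)
    return a / (a + b)
```

For example, an output of 0.5 from a network trained on 75% text strokes corresponds to 0.25 if text and shapes are equally likely at run time.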
Again, there is good support that strokes drawn close together in time will have the same class. I like that they created features on the gaps themselves, and I think more could be done here. I see this as crucial to determining whether a new stroke is shape or text.
Their large training set is impressive.
As they mention in their conclusion, I do think the approach should be expanded to include spatial information between strokes. For example, a sequence of strokes classified as text could be further verified if their centroids were roughly collinear and equally spaced. Perhaps features and an HMM could be generated on groups of strokes.
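A rough sketch of the spatial check suggested above, assuming strokes are lists of (x, y) points. The functions `centroid` and `looks_like_text_line` are hypothetical names of my own. It fits a TLS line through the stroke centroids, rejects the group if any centroid strays too far from that line, and then checks that the spacing between neighbouring centroids is fairly uniform. Both thresholds are hypothetical and would need tuning on real data; centroids are ordered along the fitted line rather than by drawing order.

```python
import math
import statistics


def centroid(stroke):
    """Mean (x, y) of a stroke's points."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def looks_like_text_line(strokes, max_line_dev=0.1, max_spacing_cv=0.5):
    """Check whether stroke centroids are roughly collinear and evenly spaced.

    max_line_dev:   largest allowed distance of a centroid from the TLS line,
                    relative to the length of the group (hypothetical value).
    max_spacing_cv: largest allowed coefficient of variation of the gaps
                    between neighbouring centroids (hypothetical value).
    """
    pts = [centroid(s) for s in strokes]
    if len(pts) < 3:
        return False
    mx = statistics.fmean([x for x, _ in pts])
    my = statistics.fmean([y for _, y in pts])
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the TLS line
    ux, uy = math.cos(theta), math.sin(theta)
    along = [(x - mx) * ux + (y - my) * uy for x, y in pts]        # position on line
    across = [abs((y - my) * ux - (x - mx) * uy) for x, y in pts]  # distance off line
    span = max(along) - min(along)
    if span == 0 or max(across) / span > max_line_dev:
        return False
    ordered = sorted(along)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cv = statistics.pstdev(gaps) / statistics.fmean(gaps)
    return cv <= max_spacing_cv
```

A check like this could be run on each run of strokes the HMM labels as text, lowering confidence in runs that fail it.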