Archive for October, 2008

Analysis of “Grouping Text Lines in Freeform Handwritten Notes”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:


Discussed other attempts that borrowed from document image analysis techniques, but didn’t do well for freeform ink unless using non-scalable heuristics.  Describe their technique to use a global cost function that should be optimized for the partitions of strokes found.  Use linear regression and “horizontal and vertical compactness” (largest gaps between strokes in x and y) to find the “goodness of a line” of possible text. Use temporal info to group first.
Also calculate a “configuration consistency” for a recognized line of text by using a neighborhood graph between all the recognized lines of text without a non-text element between them.  Longer lines have more weight.  Whether it is configured correctly is a function of the summation of the length-weighted neighbors.  “Model complexity” is the number of lines in the partition.  The cost function is a summation of all of these.
Group initially by temporal data, then generate alternative “hypotheses” by merging line segments and accounting for “high configuration energy” (i.e. the dot of an ‘i’ or cross of a ‘t’ late in the game).
Iteratively groups recognizes “lines” until the global function is optimized.


Discussion:
They admit to hand tuning their parameters for their testing set, but overall their process is very good (i.e. start with temporal data, group similar neighbors, and account for unique cases).  Hopefully we can borrow some of their intuition.

Analysis of “Perceptually-Supported Image Editing of Text and Graphics”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Discussion:

Seek to create an scanned document, image editing application, particularly for doodles and sketches.  Present new ways for segmenting and selecting elements.  One mechanism is a combination rectangle/lasso selection that is inferred from the user’s mouse movement.  The selection becomes a new image.  All images can be grouped on the parent image, but not in a hierarchy (allows multiple groupings).  Subsequent clicks on an image will cycle through the groups.

Also create algorithm for keying out the background automatically.  Discussing how confusing it was to “anchor” images when a user wanted to make a subselection.

Uses “perceptual organization” principles for fragmenting into strokes and blogs and then grouping together.  Strokes based on curvilinear smootness and closure.  Blogs based on spatial proximity and curvilinear alignment to look like words.  Uses this for automatic segmentation as invoked by the user.

Are working on an interface that just uses a pen next.

Summary:

There combination rectangle/lasso idea is a good one.

I have no problem with the checkboard pattern to signify transparency.  It gives a sense of depth, that you can see past the opaque parts.

I’m also interested in how to seamless switch between editing/drawing pen input.  With a explicit modal switching device, all you have is time, gesture, pressure, and angle. No one wants to perform “holds” or start a stroke with a certain segment in order to get to the correct mode. Perhaps by tapping an arrangement of dots?  But then why not give them a button?

Good paper, though, and I’m always a fan of cool UI techniques.

Analysis of “Sketch Recognition for Computer-Aided Design”

Comments Made Elsewhere:

  1. Yuixang’s Blog

Summary:

Seeks to determine user intent based as a function of sequence, speed, and pressure.  Corners recognized by slowing down and placed at intersection of the two lines.   Curves fitted to b-splines.   Do overtracing determination to remove excess lines.  Do latching to snap points and remove excess points, with a snapping radius determined by the speed/length/density. Could also use these to distinguish users. Discussion on the implications of the user providing feedback to the system and interacting with the system.

Discussion:

Good paper of pulling out user intent by speed and sequence.  No real comments.

Analysis of “Template-based Online Character Recognition”

Comments Made Elsewhere:

  1. Yuixang’s Blog

Summary:

Seek to allow for writer-independent character recognition by creating a “class” of templates for each writer and then training a classifier on these classes.  Do character recognition based on the templates after data reduction.  Discuss the two different classifers they used.

Other attempts: primitive decomposition (break into dots, arcs, loops, etc. and recognition characters using dictionary lookup or HMM), motor models (try to simulate movement of the hand?), elastic matching (”featureless” matching of points to points), stochastic models (extract features from points or sliding window of points and use HMM), time delay neural networks.

Preprocessing: Do resampling to make stroke equidistant and then apply Gaussian filter to each coordinates. End points and points of high curvature are preserved.  Scaled to be the same height but still have same aspect ratio.

Representation: Strokes listed as x,y, and theta of curvature.  Each stroke represented as sequence of events and then determine of sum of distances between two sequences of events.  When two strokes, or sequence of events, don’t have the same number, penalties are added to the distance calculated, ending up with four different equations for distance metric and going with the minimum of the four.

Data Reduction: (1) Cluster the “featureless” characters into some set K amount of clusters, each technically representing a writing style, and then take the medoid of the cluster. (2) Use a nearest neighbor calculation to select all examples on the edge of the training set.

Classification: (1) Used nearest neighbor again, or (2) constructed a decision tree based on a vector of the distances from each reference character.  Each of these “similarity features” (aka a comparison to a reference character) is then used a node on the decision tree?  They expand this to a “difference feature” by comparing to reference characters at each node to determine with one the input character is more like, and produce better classification.

Discussion:

I think their preprocessing steps would be quite useful in text-vs-shape distinction.

On page 13 second paragraph, when are they recognizing that multiple strokes belong to a character?  I assume this is for input strokes.

I’m not a fan of clustering algorithms that require a priori number of clusters to find.  I know this is a hard problem, but I know there are other algorithms out there that do a more bottom-up approach and segment the clusters themselves.

I don’t understand why they’d want to only select the templates on the edge of the training set?  Is it because there are the most extreme cases?

It’s interesting to see another paper using a decision tree for classification of text.  Maybe the feature space really is that small.

Analysis of “Distinguishing Text from Graphics in On-line Handwritten Ink”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Again, want to classify text versus shape so that our system can select the correct recognizer to pass to.  This work takes in stroke features and also temporal and contextual features.  Seeks not to provide a hard classification, but output probabilities using machine learning.

Used total least squares model (TLS) was fitted to the stroke.  Instead of robust corner finding, would segment the the stroke at local points of max curvature, the result being “fragments”.  Did TLS on the largest fragment, too.  Pulled out 11 features in all on the stroke.  If selected fragment is really large and has high TLS ratio, probably a shape (non-text).

With this feature vector, then put it into a multilayer perception model (MLP, a neural network). With probability distribution of 1 = text and 0 = shape, were able to determine a value inbetween.  Use ten-fold cross validation to avoid overfitting.

Next, created a HMM from training set on correlation between strokes drawn successively.  From this, use dynamic programming Viterbi algorithm to determine the states stroke.

Followed these same steps to create features and an HMM for the gaps between strokes.

Discussion:

I understand the rational to adjust for bias toward text, and they do so by adjusting their error function, but it seems that they are fitting to their training set by relying on the population of text and graphics within it.  Seems that this bias would change with each training set. Is this wrong?

Again, good support that strokes drawn temporally close together will have the same class. I like that they created features on the gaps themselves and thing more could be done here.  I really see this as crucial to helping determine whether a new stroke is a shape or a text.

There large training set is impressive.

As they mention in their conclusion, I do think the approach should be expanded to include spatial information between strokes.  For example, a sequence of strokes classified as text could be further verified if their centroids were roughly colinear and equally spaced.  Perhaps features and a HMM could be generated on groups of strokes.

Analysis of “Ink Features for Diagram Recognition”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Claims not much has been done on determining with features that are important, and on determining text versus shape.  Claims that recognition for text and shape must be handled different.

Created a list of 40+ features, generated and hand labeled a corpus of strokes, then did statistical analysis to create a binary classification tree of eight features.  Found inter-stroke gaps (time between each stoke) to be important feature.

Discussion:

This is still single stroke recognition, if I’m not mistaken (i.e. each stoke is either a shape or letter).

Simple to understand, yet inflexible as they mentioned (they also suggest a HMM).  Kind of half-gesture, half-geometric features.  Wonder how computer vision could be used here (not really wanting to just recognition, just be able to put the shape down the tube of the right recognizer).  Not sure it could since relying on how it was drawn instead of what was drawn avoids training and misclassification of shapes that accidentally look like letters.  Using time and size seeks to capture more of the user’s intent.

Not sure why they showed the results from running the dividers on the training sets (Figure 4).  I guess just to remphasize how bad the other two are.

One theory: text will have more cusps.

Analysis of “Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System”

Comments Made Elsewhere:

  1. Manoj

Summary:

Analysis of gamer habits when given portability and wifi capabilities. (1) Renegade gaming where they are able to game anywhere (stairwells, office, etc.).  (2) There are aspects of the portable unit and its games that work against this adhoc, pick-up game with random strangers.  (3) Players make “spheres” or become antisocial.

Studied gamers through semi-structured interviews and at gaming events. Broke results into four themes: (1) gaming contexts, (2) gaming goals, (3) multiplayer game coordination, (4) in-game experience.

(1) Portable gaming reduces the physical boundaries for multiplayer gaming, so it can occur almost anywhere, even in normal inappropriate places (e.g. work).  However, a gamer will consider if his gaming will disrupt others, is socially appropriate, and what his personal image is.

(2) Want to be able to trash talk and play someone of equal skill level.

(3) How to coordinate a pick-up game?  No one really just walks up to someone and asks to play a DS game apparently.  Others advertise that they’ll be at a location or what games they have to encourage random game play.  Can you leave or enter a game in mid-play?  How does this affect others?

(4) DS’s feel less social because the screen is small.  People can’t look over your shoulder. Naturally, people seem to arrange themselves in any which way with these portable devices.

Conclusion: have DS’s advertise themselves and allow external displays to be hooked up to provide a bird’s eye view

Discussion:

I want to do “research” with a Nintendo DS.  Sign me up!

Nice little paper on qualitative research done around social gaming.  Kind of just common sense written out formally in my opinion.

Analysis of “MathBrush: A Case Study for Pen-based Interactive Mathematics”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Describe MathBrush, a pen-math tool for recognizing hand-drawn math expressions, using representation in MathML, recognition for specific users, among other features. Sits on top of a CAS (Computer Algebra System).  Just there to recognize and manipulate, not calculate.  This paper describes the rationale of MathBrush in light of education.

Had many unique design challenges since they were first to integrate with a back-end CAS and support multiple and repeated operations on the input.  (1) Display of complex of expressions and outputs, with scrolling canvas to be like paper.  (2) Use of context-sensitive menus to access commands. (3) Editing of subexpressions (i.e. edit only part of the equation) in a separate window. (4) Graphical plotting. (5) Other features not tested in their evaluation…

Did think-aloud, semi-structured interviews.  Had to correct for people leaving extra ink on the equation.  Had a two-tier recognition phase (character, then structural) that confused people when the latter would fail (assumed everything was ok after the first passed).  Had trouble conveying what needed to be corrected to the user when an error occurred, resulting in the user making lots of unneeded corrections.

Noticed that users were changing their writing style to the system.  Questioned whether their overhead was better than using pen or paper.

Discussion:

It’s funny that a participant suggested adding a on-screen keyboard to the application.  People probably generally feel this is faster.

In this situation, I think it’s a good use of context menus as they visually free the interface and don’t require the user to learn a gesture for initiating commands.  However, I am a fan of the Microsoft Office “ribbon” and I would not be opposed to having something like this that changed accordingly for the user.

They might have borrowed directly a CAS too much, in that they displayed the input and output instead of just doing correction.  Mathematica and others work like MathBrush does, but they expressed issues with participants trying to edit the output instead of their gesture, and also people not knowing where to start when an error occurred.  Perhaps “beautification” as the user went along would be better, and then making the equation red until it passed structural analysis (when it could turn green or something).

No real complaints on the people adjusting their writing style to the system (though they did have to slow down).  There will always be some of this, but don’t exceed an extraneous amount.

Analysis of “Sketch-Based Educational Games”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Four types of learning: auditory, visual, tactile, and kinesthetic (student carries out a physical activity).  Sketch recognition can help with the latter two while still providing auditory and visual feedback.  APPLES teaches planetary motion, gravity, collision, and other basic physics. “Sketch!” is a simon-says memory game. Another on geography, shape learning (which drawing constrants), and sentence diagramming.

Discussion:

For the sentence diagramming tool, there are many available parts of taggers and sentence dependency graph classifers that would allow the tool to be standalone and not require the teacher.

Love the applications though after playing with them myself.

Agree that much more could be done with the geographical tool.  Mark an ‘x’ and do reverse geocoding to find out what was selected.  Pull of information and pictures about that spot for the child.  Learn spatial relationships by calculating distances drawn on a map using units such as miles and kilometers.

Analysis of “Recognizing Free-form Hand-sketched Constraint Network Diagrams by Combining Geometry and Context”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Seek to create an hand-sketch interface to constraint programming and optimization.  Constraints satisfaction has two parts: modeling (define the variables, their possible values, and possible relationships between them) and solving (systematic determine the values of the variables given the constraints).  Focuses on the former by creating a graphical vocab that can be read to define the problem.  Effective modeling is hard and takes expertise (hence this paper).  Sketching allows for more creative and interactive feedback.  Use a geometric recognizer instead of gesture so not dependent on drawing style.  Break into primitives and heuristically recognize.

Allows for some letter recognition.  Uses context to resolve ambiguous shapes (little ‘v’, arrowhead, or less than sign?). Discussion of how recognition of nodes, undirected/directed links between nodes, variables, and operators occurs.  Each stroke is counted as two (one in each direction) and these “partners” keep tract of each so as not to be counted twice.

Threshold determination is required for some recognition (e.g. ‘x’ and ‘y’ only differ by height).  Instead of using set threshold (i.e. 10 pixels), determined that a coincidental point must be closer to an endpoint than the midpoint of the line (and vic versa for intersections).  Still work to be done, but empirically works well.

Discussion:

Appears to be another great application of sketch recognition to an uncharted domain.  The authors seek to take up the challenge of providing a smart interface to constraint satisfaction logic.  Of note is the mixture of shapes and variables (i.e. letters) together.  Here they describe the single letters geometrically since traditional handwriting recognition requires the context of other letters and words.  They of course also will allow for implicit input via a modal switch and a keyboard.

Context is king for the success of this paper.

Next Page »