Archive for the 'CPSC 689 - Sketch Recognition' Category

Analysis of “Template-based Online Character Recognition”

Comments Made Elsewhere:

  1. Yuixang’s Blog

Summary:

Seek to allow for writer-independent character recognition by creating a “class” of templates for each writer and then training a classifier on these classes.  Do character recognition based on the templates after data reduction.  Discuss the two different classifers they used.

Other attempts: primitive decomposition (break into dots, arcs, loops, etc. and recognition characters using dictionary lookup or HMM), motor models (try to simulate movement of the hand?), elastic matching (”featureless” matching of points to points), stochastic models (extract features from points or sliding window of points and use HMM), time delay neural networks.

Preprocessing: Do resampling to make stroke equidistant and then apply Gaussian filter to each coordinates. End points and points of high curvature are preserved.  Scaled to be the same height but still have same aspect ratio.

Representation: Strokes listed as x,y, and theta of curvature.  Each stroke represented as sequence of events and then determine of sum of distances between two sequences of events.  When two strokes, or sequence of events, don’t have the same number, penalties are added to the distance calculated, ending up with four different equations for distance metric and going with the minimum of the four.

Data Reduction: (1) Cluster the “featureless” characters into some set K amount of clusters, each technically representing a writing style, and then take the medoid of the cluster. (2) Use a nearest neighbor calculation to select all examples on the edge of the training set.

Classification: (1) Used nearest neighbor again, or (2) constructed a decision tree based on a vector of the distances from each reference character.  Each of these “similarity features” (aka a comparison to a reference character) is then used a node on the decision tree?  They expand this to a “difference feature” by comparing to reference characters at each node to determine with one the input character is more like, and produce better classification.

Discussion:

I think their preprocessing steps would be quite useful in text-vs-shape distinction.

On page 13 second paragraph, when are they recognizing that multiple strokes belong to a character?  I assume this is for input strokes.

I’m not a fan of clustering algorithms that require a priori number of clusters to find.  I know this is a hard problem, but I know there are other algorithms out there that do a more bottom-up approach and segment the clusters themselves.

I don’t understand why they’d want to only select the templates on the edge of the training set?  Is it because there are the most extreme cases?

It’s interesting to see another paper using a decision tree for classification of text.  Maybe the feature space really is that small.

Analysis of “Distinguishing Text from Graphics in On-line Handwritten Ink”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Again, want to classify text versus shape so that our system can select the correct recognizer to pass to.  This work takes in stroke features and also temporal and contextual features.  Seeks not to provide a hard classification, but output probabilities using machine learning.

Used total least squares model (TLS) was fitted to the stroke.  Instead of robust corner finding, would segment the the stroke at local points of max curvature, the result being “fragments”.  Did TLS on the largest fragment, too.  Pulled out 11 features in all on the stroke.  If selected fragment is really large and has high TLS ratio, probably a shape (non-text).

With this feature vector, then put it into a multilayer perception model (MLP, a neural network). With probability distribution of 1 = text and 0 = shape, were able to determine a value inbetween.  Use ten-fold cross validation to avoid overfitting.

Next, created a HMM from training set on correlation between strokes drawn successively.  From this, use dynamic programming Viterbi algorithm to determine the states stroke.

Followed these same steps to create features and an HMM for the gaps between strokes.

Discussion:

I understand the rational to adjust for bias toward text, and they do so by adjusting their error function, but it seems that they are fitting to their training set by relying on the population of text and graphics within it.  Seems that this bias would change with each training set. Is this wrong?

Again, good support that strokes drawn temporally close together will have the same class. I like that they created features on the gaps themselves and thing more could be done here.  I really see this as crucial to helping determine whether a new stroke is a shape or a text.

There large training set is impressive.

As they mention in their conclusion, I do think the approach should be expanded to include spatial information between strokes.  For example, a sequence of strokes classified as text could be further verified if their centroids were roughly colinear and equally spaced.  Perhaps features and a HMM could be generated on groups of strokes.

Analysis of “Ink Features for Diagram Recognition”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Claims not much has been done on determining with features that are important, and on determining text versus shape.  Claims that recognition for text and shape must be handled different.

Created a list of 40+ features, generated and hand labeled a corpus of strokes, then did statistical analysis to create a binary classification tree of eight features.  Found inter-stroke gaps (time between each stoke) to be important feature.

Discussion:

This is still single stroke recognition, if I’m not mistaken (i.e. each stoke is either a shape or letter).

Simple to understand, yet inflexible as they mentioned (they also suggest a HMM).  Kind of half-gesture, half-geometric features.  Wonder how computer vision could be used here (not really wanting to just recognition, just be able to put the shape down the tube of the right recognizer).  Not sure it could since relying on how it was drawn instead of what was drawn avoids training and misclassification of shapes that accidentally look like letters.  Using time and size seeks to capture more of the user’s intent.

Not sure why they showed the results from running the dividers on the training sets (Figure 4).  I guess just to remphasize how bad the other two are.

One theory: text will have more cusps.

Analysis of “Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System”

Comments Made Elsewhere:

  1. Manoj

Summary:

Analysis of gamer habits when given portability and wifi capabilities. (1) Renegade gaming where they are able to game anywhere (stairwells, office, etc.).  (2) There are aspects of the portable unit and its games that work against this adhoc, pick-up game with random strangers.  (3) Players make “spheres” or become antisocial.

Studied gamers through semi-structured interviews and at gaming events. Broke results into four themes: (1) gaming contexts, (2) gaming goals, (3) multiplayer game coordination, (4) in-game experience.

(1) Portable gaming reduces the physical boundaries for multiplayer gaming, so it can occur almost anywhere, even in normal inappropriate places (e.g. work).  However, a gamer will consider if his gaming will disrupt others, is socially appropriate, and what his personal image is.

(2) Want to be able to trash talk and play someone of equal skill level.

(3) How to coordinate a pick-up game?  No one really just walks up to someone and asks to play a DS game apparently.  Others advertise that they’ll be at a location or what games they have to encourage random game play.  Can you leave or enter a game in mid-play?  How does this affect others?

(4) DS’s feel less social because the screen is small.  People can’t look over your shoulder. Naturally, people seem to arrange themselves in any which way with these portable devices.

Conclusion: have DS’s advertise themselves and allow external displays to be hooked up to provide a bird’s eye view

Discussion:

I want to do “research” with a Nintendo DS.  Sign me up!

Nice little paper on qualitative research done around social gaming.  Kind of just common sense written out formally in my opinion.

Analysis of “MathBrush: A Case Study for Pen-based Interactive Mathematics”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Describe MathBrush, a pen-math tool for recognizing hand-drawn math expressions, using representation in MathML, recognition for specific users, among other features. Sits on top of a CAS (Computer Algebra System).  Just there to recognize and manipulate, not calculate.  This paper describes the rationale of MathBrush in light of education.

Had many unique design challenges since they were first to integrate with a back-end CAS and support multiple and repeated operations on the input.  (1) Display of complex of expressions and outputs, with scrolling canvas to be like paper.  (2) Use of context-sensitive menus to access commands. (3) Editing of subexpressions (i.e. edit only part of the equation) in a separate window. (4) Graphical plotting. (5) Other features not tested in their evaluation…

Did think-aloud, semi-structured interviews.  Had to correct for people leaving extra ink on the equation.  Had a two-tier recognition phase (character, then structural) that confused people when the latter would fail (assumed everything was ok after the first passed).  Had trouble conveying what needed to be corrected to the user when an error occurred, resulting in the user making lots of unneeded corrections.

Noticed that users were changing their writing style to the system.  Questioned whether their overhead was better than using pen or paper.

Discussion:

It’s funny that a participant suggested adding a on-screen keyboard to the application.  People probably generally feel this is faster.

In this situation, I think it’s a good use of context menus as they visually free the interface and don’t require the user to learn a gesture for initiating commands.  However, I am a fan of the Microsoft Office “ribbon” and I would not be opposed to having something like this that changed accordingly for the user.

They might have borrowed directly a CAS too much, in that they displayed the input and output instead of just doing correction.  Mathematica and others work like MathBrush does, but they expressed issues with participants trying to edit the output instead of their gesture, and also people not knowing where to start when an error occurred.  Perhaps “beautification” as the user went along would be better, and then making the equation red until it passed structural analysis (when it could turn green or something).

No real complaints on the people adjusting their writing style to the system (though they did have to slow down).  There will always be some of this, but don’t exceed an extraneous amount.

Analysis of “Sketch-Based Educational Games”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Four types of learning: auditory, visual, tactile, and kinesthetic (student carries out a physical activity).  Sketch recognition can help with the latter two while still providing auditory and visual feedback.  APPLES teaches planetary motion, gravity, collision, and other basic physics. “Sketch!” is a simon-says memory game. Another on geography, shape learning (which drawing constrants), and sentence diagramming.

Discussion:

For the sentence diagramming tool, there are many available parts of taggers and sentence dependency graph classifers that would allow the tool to be standalone and not require the teacher.

Love the applications though after playing with them myself.

Agree that much more could be done with the geographical tool.  Mark an ‘x’ and do reverse geocoding to find out what was selected.  Pull of information and pictures about that spot for the child.  Learn spatial relationships by calculating distances drawn on a map using units such as miles and kilometers.

Analysis of “Recognizing Free-form Hand-sketched Constraint Network Diagrams by Combining Geometry and Context”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Seek to create an hand-sketch interface to constraint programming and optimization.  Constraints satisfaction has two parts: modeling (define the variables, their possible values, and possible relationships between them) and solving (systematic determine the values of the variables given the constraints).  Focuses on the former by creating a graphical vocab that can be read to define the problem.  Effective modeling is hard and takes expertise (hence this paper).  Sketching allows for more creative and interactive feedback.  Use a geometric recognizer instead of gesture so not dependent on drawing style.  Break into primitives and heuristically recognize.

Allows for some letter recognition.  Uses context to resolve ambiguous shapes (little ‘v’, arrowhead, or less than sign?). Discussion of how recognition of nodes, undirected/directed links between nodes, variables, and operators occurs.  Each stroke is counted as two (one in each direction) and these “partners” keep tract of each so as not to be counted twice.

Threshold determination is required for some recognition (e.g. ‘x’ and ‘y’ only differ by height).  Instead of using set threshold (i.e. 10 pixels), determined that a coincidental point must be closer to an endpoint than the midpoint of the line (and vic versa for intersections).  Still work to be done, but empirically works well.

Discussion:

Appears to be another great application of sketch recognition to an uncharted domain.  The authors seek to take up the challenge of providing a smart interface to constraint satisfaction logic.  Of note is the mixture of shapes and variables (i.e. letters) together.  Here they describe the single letters geometrically since traditional handwriting recognition requires the context of other letters and words.  They of course also will allow for implicit input via a modal switch and a keyboard.

Context is king for the success of this paper.

Analysis of “LADDER, a sketching language for user interface developers”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Created the LADDER language for describing a shape, the editing behaviors allowed, and how to display it when recognized.  Language/specification is parsed and converted to Java code for use by primitive recognizer to become a domain-specific tool.  Can specify with hard (must exist) and soft (not required) constraints.  User study showed that geometric descriptions were more important that feature-based (i.e Rubine).  Can’t use to describe objects with curves because they’re hard to define.  Also, shape must be described using primitives only (not lots of detail).

Shapes can be declared like classes in OOP (abstract shapes, domain groups = packages).  Predefined shapes exist (e.g. “shape”, “point”, “line”, etc.).  Each shape has basic properties/attributes (”bounding box”, etc.).  Rich syntax for constraints.  Editing behaviors are defined by a trigger and what to do when triggered. For display, can show original, cleaned up, ideal strokes (no signal noise), or replace with picture.  Can also define “vectors” or how two components of a shape are joined (e.g. two lines of a PolyLine).

For primitive recognition, a bottom-up approach that works to guarantee that the higher-level, domain shape recognizer only chooses one shape per primitive found.  For domain shape recognition, a shape must satisfy a Jess rule, leaving unrecognized shapes to be determined on each new stroke.  Editing gestures are triggered by double taps or click and holds. Discussion of the constraint solver that uses Mathematica functions for determining the ideal beautifed shape.  Discussion on the code generation that occurs from the LADDER defintions.

Discussion:

From the listings in their related works, this is definitely not the first attempt at this.  The beauty of LADDER appears to lie in its hierarchical structure, use of geometric features, and rich syntax. It really opens up the possibilities for anyone wanting to incorporate a sketch recognition system.

Can a designer specify what to do with an ambiguous shape/primitive found?  Turn it a color or something until the system can later determine what it might be?

Since it’s obviously already setup to go from definition to code, what’s to keep a GUI from being created that producing a LADDER definition for you based on your drawing?

Obviously to scale to more detailed shapes, the LADDER language/specification will have to allow for grouping of soft constraints for each way a detail shape might be drawn (e.g. a stick figure drawn with a circle head and line segments for a body, versus one drawn with all circles and ellipses).

Analysis of “Ambigous Intentions”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Want an interface that can capture a user’s ambiguity and convey it visually and interactively. Not doing so forces designers to wait until then end to put their precise and definite designs into the computers.  Must identify primitive shapes AND also context.  Must support abstraction (the instancing of complex drawings), ambiguity (no worries if shape can’t be determined right now, just keep the alternatives for when we have more context; or designer sets up a placeholder), and imprecision (allow approximations with exact calculations later). Created the “Electronic Cocktail Napkin”.

Their program recognizes a box, a circle, or a line. Doesn’t do cleanup. User can edit the drawing. User can specify certain configurations to recognize (e.g. chair-like boxes around a large box are a dining room table).  All contextual recognition is user-defined.

For imprecision, uses constraint-based interactive editing (e.g. can still move lines by the stay connected, chairs must remain certain size in comparison to table, etc.).  Infers constraints from context specified by the user (above, below, contains, etc.).  Will err on side of over-constraining so that user can move/delete as needed. Also want to support gradual refinement.

Describes a glyph as a pen path, aspect ratio, bounding box size, stroke and corner count.  Uses 3×3 grid.  Compares to templates which define the transformations that are allowed (i.e rotating, etc.).  Returns candidates and gives them a 1-5 score (1 being best).  Templates can be added on the fly.

Configurations are constantly searched for in the graph.  An ambiguous shape may be determined by configuration found around it. These are constructed by the user.

Both templates and configurations are part of a context or context-chain kept by the application.  A context has templates, configurations, spatial relations, and mappings to templates in other contexts.  Searches up the context chain when analyzing a glyph.  Selects the context in the chain based on what is recognized.

Discussion:

If figure 5b, how does it know when I’ve drawn over a prior shape and then remove the previous shape? Compare bounding boxes?

This a good methodology for recognizing context around a drawing, and it could definitely be enhanced by today’s better recognizers.  Everything kind of starts out with a base context, but then another is added to the chain as its glyphs are recognized.  It also seems possible that a stack of incorrect contexts could be recognized if the wrong of two similar contexts are determined first off and then used for further recognition.

It is unfortunate that it must rely so much on user-defined glyphs and configurations, but I suppose this is nature for any domain.  Definitely a problem that will be supplemented when text and shapes can be distinguished, then using text to help determine domain and context.

Analysis of “What!?! No Rubine Features?”

Comments Made Elsewhere:

  1. Akshay’s Blog

Summary:

Gesture-based: how is it drawn instead of what it looks like; mathematically sound classifiers, but user-dependent

Geometric-based: what it looks like; more user and style-independent, but have numerous thresholds and heuristics that can be hard to analyze and optimize; no classification, but calculates the error distance from an ideal shape

Want the advantages of both combined with normalized confidence values so the higher-level recognizer can decide which to use.  While in search of this, researchers found that gesture-based features are less significant in recognition of freely sketched data.

Used a classifier with gesture and geometric features (44 total).  Found that it didn’t perform as well as Paleosketch (a hueristics-based approach), until the correct features were found (came much closer).  Did a greedy feature selection algorithm to select those found more influencial.  Reduced number down to 15.  Only one gesture-based feature made the cut: total rotation.  Could actually get a 93% recognition rate with the top six features alone.

Discussion:

All of the things discussed in this paper are things we seen before, just applied in a new way.  There are obvious advantages to using a classifier over a heuristics approach (overhead moved to preprocessing, central method of calculate allows for normalized confidence values, can grow or shrink feature set as needed, etc.)

Naturally this has to have some good segmentation up front to work well. I see the complex fit feature now.

But it’s almost a heuristics approach since most of the features could almost be considered binary (i.e. “does it fit a curve? yes or no”).  But it does show are solid some of Paulson’s heauristics-turned-features are.

Also seems like this method would be less reliant on scaling and normalizing the sketch initially.

« Previous PageNext Page »