Archive for October, 2008

Analysis of “LADDER, a sketching language for user interface developers”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Created the LADDER language for describing a shape, the editing behaviors allowed, and how to display it when recognized.  Language/specification is parsed and converted to Java code for use by a primitive recognizer to become a domain-specific tool.  Can specify hard (must exist) and soft (not required) constraints.  User study showed that geometric descriptions were more important than feature-based ones (e.g. Rubine).  Can’t be used to describe objects with curves because they’re hard to define.  Also, a shape must be described using primitives only (not lots of detail).

Shapes can be declared like classes in OOP (abstract shapes, domain groups = packages).  Predefined shapes exist (e.g. “shape”, “point”, “line”, etc.).  Each shape has basic properties/attributes (“bounding box”, etc.).  Rich syntax for constraints.  Editing behaviors are defined by a trigger and what to do when triggered.  For display, can show original, cleaned up, or ideal strokes (no signal noise), or replace with a picture.  Can also define “vectors”, or how two components of a shape are joined (e.g. two lines of a PolyLine).
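To make the hard/soft distinction concrete, here is a minimal Python sketch.  It is not LADDER syntax and not the Java that LADDER generates; the names ShapeDef, is_match, soft_satisfied, and the arrow measurements are all hypothetical, picked only to show hard constraints gating recognition while soft ones stay optional.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation: a candidate is a dict of named measurements.
Candidate = Dict[str, float]
Constraint = Callable[[Candidate], bool]


@dataclass
class ShapeDef:
    """Illustrative stand-in for a shape description (not real LADDER)."""
    name: str
    hard: List[Constraint] = field(default_factory=list)  # must all hold
    soft: List[Constraint] = field(default_factory=list)  # not required

    def is_match(self, c: Candidate) -> bool:
        # Only hard constraints decide whether the shape is recognized.
        return all(rule(c) for rule in self.hard)

    def soft_satisfied(self, c: Candidate) -> int:
        # Soft constraints are counted but never block recognition.
        return sum(1 for rule in self.soft if rule(c))


arrow = ShapeDef(
    name="Arrow",
    hard=[
        lambda c: c["line_count"] == 3,   # shaft plus two head lines
        lambda c: c["head_gap"] < 5.0,    # head lines meet the shaft tip
    ],
    soft=[
        lambda c: abs(c["head_len_1"] - c["head_len_2"]) < 2.0,
    ],
)

sketch = {"line_count": 3, "head_gap": 1.5,
          "head_len_1": 10.0, "head_len_2": 14.0}
recognized = arrow.is_match(sketch)
soft_hits = arrow.soft_satisfied(sketch)
```

Here the sloppy arrow is still recognized even though its two head lines differ in length, which is the behavior a soft constraint is meant to allow.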

For primitive recognition, a bottom-up approach that works to guarantee that the higher-level, domain shape recognizer only chooses one shape per primitive found.  For domain shape recognition, a shape must satisfy a Jess rule, leaving unrecognized shapes to be determined on each new stroke.  Editing gestures are triggered by double taps or click and holds.  Discussion of the constraint solver that uses Mathematica functions for determining the ideal beautified shape.  Discussion of the code generation that occurs from the LADDER definitions.

Discussion:

From the listings in their related works, this is definitely not the first attempt at this.  The beauty of LADDER appears to lie in its hierarchical structure, use of geometric features, and rich syntax. It really opens up the possibilities for anyone wanting to incorporate a sketch recognition system.

Can a designer specify what to do with an ambiguous shape/primitive found?  Turn it a color or something until the system can later determine what it might be?

Since it’s obviously already set up to go from definition to code, what’s to keep a GUI from being created that produces a LADDER definition for you based on your drawing?

Obviously, to scale to more detailed shapes, the LADDER language/specification will have to allow for grouping of soft constraints for each way a detailed shape might be drawn (e.g. a stick figure drawn with a circle head and line segments for a body, versus one drawn with all circles and ellipses).

Analysis of “Ambiguous Intentions”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Want an interface that can capture a user’s ambiguity and convey it visually and interactively.  Not doing so forces designers to wait until the end to put their precise and definite designs into the computer.  Must identify primitive shapes AND also context.  Must support abstraction (the instancing of complex drawings), ambiguity (no worries if shape can’t be determined right now, just keep the alternatives for when we have more context; or designer sets up a placeholder), and imprecision (allow approximations with exact calculations later).  Created the “Electronic Cocktail Napkin”.

Their program recognizes a box, a circle, or a line. Doesn’t do cleanup. User can edit the drawing. User can specify certain configurations to recognize (e.g. chair-like boxes around a large box are a dining room table).  All contextual recognition is user-defined.

For imprecision, uses constraint-based interactive editing (e.g. can still move lines but they stay connected, chairs must remain a certain size in comparison to the table, etc.).  Infers constraints from context specified by the user (above, below, contains, etc.).  Will err on the side of over-constraining so that the user can move/delete as needed.  Also want to support gradual refinement.

Describes a glyph as a pen path, aspect ratio, bounding box size, and stroke and corner count.  Uses a 3×3 grid.  Compares to templates which define the transformations that are allowed (e.g. rotating).  Returns candidates and gives them a 1-5 score (1 being best).  Templates can be added on the fly.
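A rough Python sketch of a few of these glyph features follows.  The cell numbering (row-major, 0 to 8), the screen-style coordinates with y growing downward, and the function names are my assumptions; the paper’s exact encoding may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of the stroke points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def aspect_ratio(points: List[Point]) -> float:
    """Width divided by height of the bounding box."""
    x0, y0, x1, y1 = bounding_box(points)
    height = y1 - y0
    return (x1 - x0) / height if height > 0 else float("inf")


def grid_path(points: List[Point], n: int = 3) -> List[int]:
    """Cells of an n-by-n grid over the bounding box, in the order the
    pen visits them, with consecutive repeats collapsed."""
    x0, y0, x1, y1 = bounding_box(points)
    width = (x1 - x0) or 1.0   # avoid dividing by zero for flat strokes
    height = (y1 - y0) or 1.0
    path: List[int] = []
    for x, y in points:
        col = min(int((x - x0) / width * n), n - 1)
        row = min(int((y - y0) / height * n), n - 1)
        cell = row * n + col
        if not path or path[-1] != cell:
            path.append(cell)
    return path


# An "L" drawn top to bottom, then left to right.
l_stroke = [(0, 0), (0, 5), (0, 10), (5, 10), (10, 10)]
l_path = grid_path(l_stroke)
l_ratio = aspect_ratio(l_stroke)
```

For this stroke the path runs down the left column and across the bottom row, so two glyphs with the same path and similar aspect ratio would be candidates for the same template.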

Configurations are constantly searched for in the graph.  An ambiguous shape may be determined by configuration found around it. These are constructed by the user.

Both templates and configurations are part of a context or context-chain kept by the application.  A context has templates, configurations, spatial relations, and mappings to templates in other contexts.  Searches up the context chain when analyzing a glyph.  Selects the context in the chain based on what is recognized.
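The chain lookup might look something like the Python sketch below.  The Context class, the lookup function, and the choice to search from the most recently added context back toward the base are assumptions for illustration; the summary only says the search goes up the chain.

```python
from typing import Dict, List, Optional, Tuple


class Context:
    """Hypothetical context: a name plus glyph-key -> meaning templates."""

    def __init__(self, name: str, templates: Dict[str, str]) -> None:
        self.name = name
        self.templates = templates


def lookup(chain: List[Context], glyph_key: str) -> Optional[Tuple[str, str]]:
    """Search from the most specific context (end of list) up to the base
    context (start of list); return (context name, meaning) or None."""
    for ctx in reversed(chain):
        if glyph_key in ctx.templates:
            return ctx.name, ctx.templates[glyph_key]
    return None


chain = [
    Context("base", {"box": "rectangle", "circle": "circle"}),
    Context("floor-plan", {"box": "table"}),
]

box_meaning = lookup(chain, "box")        # found in the specific context
circle_meaning = lookup(chain, "circle")  # falls back to the base context
unknown = lookup(chain, "squiggle")       # stays ambiguous for now
```

Returning None for an unmatched glyph mirrors the idea of leaving a shape ambiguous until more context arrives.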

Discussion:

In figure 5b, how does it know when I’ve drawn over a prior shape and then remove the previous shape?  Compare bounding boxes?

This is a good methodology for recognizing context around a drawing, and it could definitely be enhanced by today’s better recognizers.  Everything kind of starts out with a base context, but then another is added to the chain as its glyphs are recognized.  It also seems possible that a stack of incorrect contexts could be recognized if the wrong one of two similar contexts is chosen first and then used for further recognition.

It is unfortunate that it must rely so much on user-defined glyphs and configurations, but I suppose this is natural for any domain.  Definitely a problem that will be eased when text and shapes can be distinguished, with the text then used to help determine domain and context.

Analysis of “What!?! No Rubine Features?”

Comments Made Elsewhere:

  1. Akshay’s Blog

Summary:

Gesture-based: how is it drawn instead of what it looks like; mathematically sound classifiers, but user-dependent

Geometric-based: what it looks like; more user and style-independent, but have numerous thresholds and heuristics that can be hard to analyze and optimize; no classification, but calculates the error distance from an ideal shape

Want the advantages of both combined with normalized confidence values so the higher-level recognizer can decide which to use.  While in search of this, researchers found that gesture-based features are less significant in recognition of freely sketched data.

Used a classifier with gesture and geometric features (44 total).  Found that it didn’t perform as well as PaleoSketch (a heuristics-based approach) until the correct features were found (then it came much closer).  Used a greedy feature selection algorithm to select the more influential features.  Reduced the number down to 15.  Only one gesture-based feature made the cut: total rotation.  Could actually get a 93% recognition rate with the top six features alone.
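As a sketch of what a greedy selection loop can look like, here is a forward-selection version in Python.  I am assuming forward selection (the summary only says “greedy”), and the scoring function and every feature name other than total rotation are made up; in practice score would train and evaluate the classifier on the chosen subset.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Repeatedly add the single feature that most improves the score."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy stand-in for "accuracy of a classifier trained on this subset".
toy_usefulness = {
    "total_rotation": 0.30,
    "line_fit_error": 0.50,
    "arc_fit_error": 0.40,
    "stroke_duration": 0.05,
}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(toy_usefulness[f] for f in subset)


top_two = greedy_forward_selection(list(toy_usefulness), toy_score, k=2)
top_three = greedy_forward_selection(list(toy_usefulness), toy_score, k=3)
```

With this toy scoring the two geometric error features are picked first and total rotation third, loosely echoing the paper’s result that only one gesture-based feature survived.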

Discussion:

All of the things discussed in this paper are things we have seen before, just applied in a new way.  There are obvious advantages to using a classifier over a heuristics approach (overhead moved to preprocessing, a central method of calculation allows for normalized confidence values, can grow or shrink the feature set as needed, etc.).

Naturally this has to have some good segmentation up front to work well. I see the complex fit feature now.

But it’s almost a heuristics approach, since most of the features could almost be considered binary (i.e. “does it fit a curve? yes or no”).  But it does show how solid some of Paulson’s heuristics-turned-features are.

Also seems like this method would be less reliant on scaling and normalizing the sketch initially.

Analysis of “Backpropagation Applied to Handwritten Zip Code Recognition”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Doing recognition on zip codes written on letters.  Recognizes the zip code and then does a linear transformation to put it in a normalized form.  Discusses extracting “local features” in order to make higher-order ones.  These local features can appear anywhere, so only care about their approximate position, not their precise one.  Detection is done through “weight sharing” (don’t understand it just yet).

The “network” architecture consists of three layers, which they call H1, H2, H3.  H1 and H2 each have 12 groups.  For H1, each group is made up of 64 (8×8) units, or “hidden units”, which together form one feature map.  Each unit considers the 5×5 space of its neighbors.  The image is undersampled because they just want to detect the presence of features, not the precise position?  Each unit in a map performs the same operation, sharing the same 25 weights, but the units have different biases (thresholds).  Each feature map has a different set of 25 weights.
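My current reading of weight sharing, as a small Python sketch: one 5×5 kernel is slid over the image, so all 64 units in a map use the same 25 weights while each keeps its own bias.  The 16×16 input size, the step of 2 pixels, the zero padding at the borders, and the tanh squashing are assumptions on my part and are not stated in the notes above.

```python
import math
from typing import List

Grid = List[List[float]]


def feature_map(image: Grid, weights: Grid, biases: Grid,
                stride: int = 2, pad: int = 2) -> Grid:
    """One feature map: every unit applies the SAME kernel to its own
    neighborhood, then adds its OWN bias before squashing."""
    k = len(weights)
    size = len(biases)
    out: Grid = []
    for r in range(size):
        row = []
        for c in range(size):
            total = biases[r][c]
            for i in range(k):
                for j in range(k):
                    y = r * stride + i - pad
                    x = c * stride + j - pad
                    # Pixels outside the image count as zero (assumed).
                    if 0 <= y < len(image) and 0 <= x < len(image[0]):
                        total += weights[i][j] * image[y][x]
            row.append(math.tanh(total))
        out.append(row)
    return out


image = [[0.0] * 16 for _ in range(16)]
image[8][8] = 1.0                       # a single "ink" pixel
weights = [[0.1] * 5 for _ in range(5)]  # 25 shared weights
biases = [[0.0] * 8 for _ in range(8)]   # 64 per-unit biases
h1_map = feature_map(image, weights, biases)
free_parameters = 5 * 5 + 8 * 8          # per feature map
```

Only the units whose 5×5 window covers the inked pixel respond, and they all respond identically because the kernel is shared; the map has 89 free parameters rather than 64 × 25 separate weights.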

For H2, each group is only 16 units (4×4).  Each unit takes in “connections” from 8 of the 12 feature maps from H1, so it has eight 5×5 neighborhoods.

H3 has 30 units and is “fully connected” to H2.  H3 also connects to the output layer.  In all, there are 9760 independent parameters.
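The 9760 figure can be checked with a little arithmetic, sketched here in Python.  Two things are assumed beyond the notes above: every unit has its own bias, and the output layer has 10 units (one per digit) fully connected to H3.  The helper names are mine.

```python
def shared_layer_params(maps: int, units_per_map: int,
                        kernel_weights: int, inputs_per_map: int) -> int:
    """Free parameters of a weight-sharing layer: one kernel per
    (map, input map) pair, plus one bias per unit."""
    weights = maps * inputs_per_map * kernel_weights
    biases = maps * units_per_map
    return weights + biases


def dense_params(n_in: int, n_out: int) -> int:
    """Free parameters of a fully connected layer with per-unit biases."""
    return n_in * n_out + n_out


h1 = shared_layer_params(maps=12, units_per_map=64,
                         kernel_weights=25, inputs_per_map=1)
h2 = shared_layer_params(maps=12, units_per_map=16,
                         kernel_weights=25, inputs_per_map=8)
h3 = dense_params(n_in=12 * 16, n_out=30)
output = dense_params(n_in=30, n_out=10)  # 10 digit classes (assumed)
total = h1 + h2 + h3 + output
```

Under these assumptions the layers contribute 1068, 2592, 5790, and 310 parameters, which sum to the 9760 quoted in the paper.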

Used the backpropagation simulator. All weights are randomly initialized, and I assume it tries to optimize the output (mean squared error) on each iteration?
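To test my guess about what gets optimized, here is a toy Python sketch of gradient descent on squared error for a single tanh unit.  It is not the paper’s simulator and not a full multi-layer backpropagation; the learning rate, epoch count, seed, and training pairs are arbitrary assumptions.  It only shows the per-unit update that backpropagation repeats layer by layer.

```python
import math
import random
from typing import List, Tuple


def train_unit(samples: List[Tuple[float, float]], epochs: int = 200,
               lr: float = 0.1, seed: int = 0):
    """Fit y = tanh(w*x + b) to (input, target) pairs by gradient descent."""
    rng = random.Random(seed)
    w = rng.uniform(-0.5, 0.5)   # random initial weight
    b = rng.uniform(-0.5, 0.5)   # random initial bias

    def mse() -> float:
        return sum((math.tanh(w * x + b) - t) ** 2
                   for x, t in samples) / len(samples)

    history = [mse()]
    for _ in range(epochs):
        for x, t in samples:
            y = math.tanh(w * x + b)
            # dE/d(net) for E = 0.5 * (y - t)^2, using tanh' = 1 - y^2.
            delta = (y - t) * (1.0 - y * y)
            w -= lr * delta * x
            b -= lr * delta
        history.append(mse())
    return w, b, history


samples = [(-1.0, -0.8), (1.0, 0.8)]
w, b, history = train_unit(samples)
```

The mean squared error recorded in history shrinks as training proceeds, which is consistent with the idea that each iteration nudges the weights downhill on the error.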

Found segmentation to be a major problem.  Also ambiguous patterns or writing styles not seen in the training set.  They successfully applied backpropagation learning to a “large, real-world task”.

Discussion:

I’m not fully grasping this paper and will have to discuss this in class to complete my understanding.
