Archive for August, 2008

Analysis of “Visual Similarity of Pen Gestures”

Authors:

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels

Summary:

Introduction:

People have trouble remembering gestures, especially if they are similar, and gestures are difficult to design. These researchers designed a tool that can advise gesture designers.  They ran experiments to measure gesture similarity, devised an algorithm to compute similarity, and then created a design tool to provide advice.

Related Works:

Noted gesture recognition used in Newton, Palm Pilots, word processors, note-taking, air traffic control, etc.  Researchers used single-stroke techniques.

Psychologists attempted to determine how people judged similarity of geometric primitives.  Example: for rectangles, people used width and height. However, different people use different similarity metrics, and the same person will even use different metrics for different stimuli (e.g., might judge triangles by tilt instead of height and width).  Apparently “the logarithm of the quantitative metrics…correlate with similarity”.

Discussion also covers multi-dimensional scaling (MDS), which reduces the number of input dimensions for viewing on a plot.  Concerns were comparing participants to one another, how many dimensions to use, how to measure distance (Euclidean distance chosen), and assigning meaning to the axes.

User Study 1:

Created a set of empirically determined, dissimilar gestures.  10 participants were shown groups of three gestures (aka a “triad”) and asked to pick the one most dissimilar from the other two.  Goal 1: to determine what geometric properties influenced perceived similarity, done through MDS. Goal 2: to produce a model that predicts the similarity between two given gestures, done with “regression analysis” (appears to be just feature vectors in a vector space).  Did regression analysis for each dimension and then compared two dimensions at a time for comprehension’s sake.  Derived equations for each dimension that compute where a new gesture falls along that dimension.  Their derived model predicts the reported gesture similarities with a correlation of 0.74.
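
To make the triad setup concrete, here is a minimal sketch of one way odd-one-out picks could be tallied into pairwise dissimilarity scores, the kind of input MDS expects. The scoring rule (fraction of shared triads in which a pair was split), the function name `triad_dissimilarity`, and the gesture labels are all assumptions for illustration; the summary above does not say how the authors actually scored their triads.

```python
from collections import defaultdict
from itertools import combinations


def triad_dissimilarity(judgments):
    """Turn odd-one-out picks into pairwise dissimilarity scores.

    judgments: iterable of (triad, odd_one_out), where triad holds three
    gesture labels and odd_one_out is the label the participant picked.
    Returns {(a, b): fraction of shared triads in which a and b were split}.
    """
    shown = defaultdict(int)  # times a pair appeared together in a triad
    split = defaultdict(int)  # times one member of the pair was picked as odd
    for triad, odd in judgments:
        if odd not in triad:
            raise ValueError(f"{odd!r} is not part of triad {triad!r}")
        for pair in combinations(sorted(triad), 2):
            shown[pair] += 1
            if odd in pair:
                split[pair] += 1
    return {pair: split[pair] / shown[pair] for pair in shown}


# Hypothetical picks, for illustration only (not data from the paper).
picks = [
    (("arc", "line", "zigzag"), "zigzag"),
    (("arc", "line", "zigzag"), "zigzag"),
    (("arc", "line", "spiral"), "arc"),
]
scores = triad_dissimilarity(picks)
```

Under this assumed rule, a pair that was split every time it was shown scores 1.0, and a pair that was never split scores 0.0.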

User Study 2:

Sought to determine how certain features played into similarity (angle of bounding box, length and area of bounding box, rotation). Same setup and goals as before, but participants didn’t see all combinations of gestures. Did independent analysis of each of the three gesture sets to determine how similarity judgements were affected. Used results of study 1 to predict study 2 as well. Found that the bounding box angle is an important feature, and that alignment/non-alignment with the normal coordinate axes is significant to similarity.  The other differences listed above were not significant enough.

Applied each study’s derived model to the other study’s data and got correlations of 0.56 for the first and 0.51 for the second.  Authors are quick to admit that judging human perception with a computer program is hard, but they are happy with their results.
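
The correlations quoted above compare a model's predicted similarities against what participants reported. A minimal sketch of that comparison using Pearson's r is below. Which correlation coefficient the authors used is not stated in this summary, so Pearson's r is an assumption, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of products of deviations, and each list's sum of squared deviations
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)
```

A value near 1 means the model ranks and spaces gesture pairs much like people do; a value near 0 means the predictions carry little linear information about the human judgments.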

Authors’ Discussion:

Said their models could still be used in a gesture design tool to advise on gesture creation.  They liked MDS and found it good for discovering “candidate predictors”, but it was limiting, so regression analysis was best for creating the model.

Discussion:

I’m still not confident that these researchers found a reliable model for predicting gesture similarities, or something that could be automated and reproduced.  The proof of this comes from the fact that the models worked great on the data they were finessed for, but then dropped to the flip of a coin when the data was switched.  The paper is older, so I’d be interested to see the tool they developed.

Compliments on their user studies, as they appeared to be executed well.  I might have argued to let the participants draw each shape of the triad and then decide which was the most dissimilar.  However, that would have made the study take longer, and then you can’t measure qualitative feedback like “I chose this shape as the most dissimilar because it felt the most different to draw.”

Comments made elsewhere:

  1.  Andrew’s Blog

Analysis of “Specifying Gestures by Example”

Author:

Dean Rubine

Summary:

1) The author begins with a discussion of the hand-coded gesture recognizers of 1991, labeling them as difficult to build.  He then claims the ability to do away with hand coding by using example gestures and his GRANDMA architecture.  Few tools exist for this, and the author says his architecture can build small, fast, accurate recognizers that are trained on a small number of examples per gesture.

2) An example program called GDP uses the framework. The author highlights a few use cases through screenshots and notes that GRANDMA has a two-phase operation: gesture collection and classification, then manipulation. GRANDMA also intentionally only has single-stroke commands to:

  1. avoid the problem of segmentation
  2. support the two-phase process
  3. contribute to a positive user experience

3) The user, or “gesture designer”, will select the command, or “class”, at run-time that he wishes to train and then provide around fifteen (empirically determined) examples of the gesture that issues the command.  GRANDMA is an MVC framework.  He then specifies three semantic components: “recog” to define the attributes, handler, and view when the gesture is recognized, “manip” for the manipulation phase, and then “done”.  When a gesture is made over multiple views (remember MVC), priority is given to the top-most view (for example, a gesture that goes over both an object view and the main window view).

4) Classification of an input gesture ‘g’ from the available classes ‘C’ is done through statistical analysis of a feature vector extracted from the input. Features were empirically chosen and included the cosine and sine of the starting angle, total length of the gesture, size of the bounding box, speed of drawing (so it’s not just a static image), etc.  Features need to be computable in constant time per input point, be meaningful, and be numerous enough to distinguish between gestures. Weights for each class are determined from its examples in the training process.  The linear classifier then classifies the gesture from the feature vector and can use probabilities and standard deviations to reject doubtful classifications (though the author says to forget rejection if you just have an “undo” feature).

5) GDP has “eager recognition”, attempting to determine the gesture at each new data point and continuing into the manipulation phase once ambiguity has been resolved and for as long as the mouse button is held.  Multi-touch is also supported by using single-stroke recognition and a decision tree of the determined gestures.
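
The feature extraction and linear classification described in point 4 can be sketched roughly as follows. This is a simplified illustration, not Rubine's implementation: it computes only five of the features named above, it measures the starting angle from the first to the third sample point, and the weights passed to `classify` are assumed to come from a training step that is not shown. The function names and the `(x, y, t)` point format are assumptions.

```python
import math


def extract_features(points):
    """points: list of (x, y, t) samples from one stroke, at least 3 long.

    Returns [cos(start angle), sin(start angle), path length,
             bounding-box diagonal, duration].
    """
    if len(points) < 3:
        raise ValueError("need at least three sample points")
    x0, y0, t0 = points[0]
    x2, y2, _ = points[2]
    # Guard against a zero-length start segment to avoid dividing by zero
    start_len = math.hypot(x2 - x0, y2 - y0) or 1.0
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    path_len = sum(
        math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])
    )
    bbox_diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    duration = points[-1][2] - t0
    return [(x2 - x0) / start_len, (y2 - y0) / start_len,
            path_len, bbox_diag, duration]


def classify(features, weights):
    """Pick the class whose linear score is highest.

    weights: {class_name: (bias, [w1, ..., wn])}; values are hypothetical
    here and would normally be learned from the training examples.
    """
    def score(item):
        bias, per_feature = item[1]
        return bias + sum(w * f for w, f in zip(per_feature, features))

    return max(weights.items(), key=score)[0]
```

Each class gets one linear score per gesture, so classification cost grows with the number of classes times the number of features, which fits the "small, fast" claim in point 1.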

Discussion:

I had commented on Nabeel’s post about using a time-out for determining the end of a gesture (here Rubine uses 0.2 seconds).  Interesting to me to see it was actually used.

If you have an ambiguous gesture, is it really absolutely horrible to ask the user what command he wanted? Or just like a live spell checker or the “Did you mean:” on Google search results, perform the most probable action but allow the user to select alternatives from an unobtrusive list? The “pack” gesture on page 2 looks like it could have been a rectangle, thus bringing this question to mind.

The problem definitely gets easier with only one stroke.  Could we just wait 0.2 seconds for another stroke before going into the manipulation phase? The user might think the application is slow, but then you could still keep the two-phase approach. If I’m going to draw an “X” or an arrow, I’ll make the accompanying marks pretty quickly.
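
A minimal sketch of that wait-for-another-stroke idea: strokes whose pen-up to pen-down gap is within the timeout are grouped into one gesture. This illustrates the suggestion in this discussion, not anything GRANDMA does; the function name and the `(start_time, end_time)` stroke format are assumptions.

```python
def group_strokes(strokes, timeout=0.2):
    """Group strokes into gestures by the pause between them.

    strokes: list of (start_time, end_time) tuples, in seconds.
    A stroke joins the current gesture if it starts within `timeout`
    seconds of the previous stroke ending; otherwise it starts a new one.
    """
    gestures = []
    for stroke in sorted(strokes, key=lambda s: s[0]):
        if gestures and stroke[0] - gestures[-1][-1][1] <= timeout:
            gestures[-1].append(stroke)
        else:
            gestures.append([stroke])
    return gestures
```

The cost is that every gesture, including single-stroke ones, waits out the full timeout before the manipulation phase can start.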

Big fan of the MVC paradigm.

I like the concept of eager recognition, but the system should also take into account that it could be getting it wrong.  The author suggests that the best solution is just to have an undo feature.  But what if I’m attempting the ‘pack’ command three times and eager recognition thinks I want to draw a rectangle each time?  Three times I’d execute the undo command, which should be a hint to the system to try something different.

Still reading on covariance matrices, but wouldn’t a plain vector space comparison between the candidate feature vector and the average feature vector for each class suffice too?

Honestly, this paper just makes me excited.
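
On the covariance question above, one possible answer is that features live on very different scales (a path length in pixels versus a sine between -1 and 1), so a plain distance to the class mean is dominated by the large-scale features. The sketch below contrasts plain Euclidean nearest-mean with a version that divides by per-feature variance. That per-feature scaling is a simplification standing in for a full covariance matrix, and all names and numbers are hypothetical.

```python
def nearest_mean(sample, means):
    """Class whose mean vector is closest in plain squared Euclidean distance."""
    return min(
        means,
        key=lambda c: sum((s - m) ** 2 for s, m in zip(sample, means[c])),
    )


def nearest_mean_scaled(sample, means, variances):
    """Same, but each squared difference is divided by that feature's variance.

    This is a diagonal-covariance simplification, assumed for illustration.
    """
    return min(
        means,
        key=lambda c: sum(
            (s - m) ** 2 / v for s, m, v in zip(sample, means[c], variances)
        ),
    )


# Hypothetical features: [path length in pixels, sine of start angle]
class_means = {"line_right": [100.0, 0.0], "line_up": [110.0, 1.0]}
feature_variances = [400.0, 0.01]  # length varies a lot, the angle barely
candidate = [108.0, 0.1]           # nearly horizontal, slightly long

plain = nearest_mean(candidate, class_means)
scaled = nearest_mean_scaled(candidate, class_means, feature_variances)
```

With these made-up numbers the plain comparison picks `line_up` because of an 8-pixel length difference, while the scaled comparison picks `line_right` because the starting angle is the more trustworthy feature.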

Comments made elsewhere:

  1. Aksha’s Blog
  2. Andrew’s Blog
  3. Manoj’s Blog

Analysis of “Introduction to Sketch Recognition”

Authors:

Tracy Hammond and Kenneth Mock

Summary:

The paper opens with a summary of itself.

There are two types of “digitizers”, or ways the tactile sketching device receives input: active and passive.

  • Active digitizers require a special stylus that is sensed through electromagnetic signals.  The Wacom tablet is the best example. This allows for hovering, pressure sensitivity, and additional functionality through buttons on the pen. However, these pens can be lost and require calibration.
  • Passive digitizers accept any form of touch and can feel more natural when using one’s finger. However, they suffer from “vectoring” when other parts of the body (e.g., the palm) touch the screen and cause the cursor to jump.

Tablet computers are convertible, switching between slate and traditional notebook form.  A standard computer can use a USB-attached tablet. Microsoft Windows XP Tablet PC Edition and Windows Vista support handwriting recognition, other drawings, and an on-screen QWERTY keyboard. Apple computers have Inkwell, but a tablet Mac is only available aftermarket. Linux has its normal issues: it needs drivers, but there are lots of open source solutions.

Digital drawing obviously has advantages over its normal counterpart: copying, moving, deleting, etc.  These abilities are applicable to text, imagery, handwriting, and drawn strokes, and items can be grouped for further functionality.  An example program is ScanScribe. Many shapes can be auto-recognized, allowing a cleaned-up shape and/or command to follow.

In a teaching situation, drawing surfaces can supplement the experience through projection from a tablet PC or use of a large area device such as a SMART board.  Large computer displays, such as Wacom’s Cintiq models, also have advantages in teaching, a lab setting, or a classroom setting.  Recording of a lecture allows students to review a missed class or material not understood during the first pass.  Drawings can then be supplemented by audio/video playback. Examples are OneNote, Captivate, and Camtasia. There is evidence that reviewing recorded lectures improves student comprehension and morale.  For a lecturer, this may require time to adapt to the medium and the creation of lecture templates that leave space for annotation.

For students, a tablet’s ability becomes useful for homework assignments, as flash cards, for digital books, and for curriculum-specific software. Equations, diagrams, and pictures co-exist with normal text for a completely electronic learning process.  Other domains are sketch recognition of sheet music, chemistry diagrams, mechanical engineering simulations, finite state machines, UML diagrams, etc.

Each of the domains listed is supported by the author’s LADDER software, which uses the FLUID framework and the GUILD system for shape recognition. This framework is extensible to other domains by writing a LADDER domain description that defines the domain’s shapes as if drawn perfectly and by providing an optional hook into a CAD or other program. Shapes can also be defined by drawing them. While drawing, shapes are identified by falling within certain thresholds of the perfect shape and being within the correct context (e.g., a pin joint can only exist if a body and moving part are recognized first).

Two case studies are discussed.  One involved a high school math teacher who, using a tablet PC and recorded lectures, noticed increased attention spans and better questions. At the time, the school intended to move to a one-to-one ratio of computers to children with a computing environment supportive of sharing notes and electronic submissions. A second involved a middle school teacher who performed polling and teaching with tablet technology and a projector.  He noticed excitement among the students because of the technology, better assignments, and more participation from parents when the material was posted online.  Both case study participants preferred a tablet PC to an interactive whiteboard, citing portability and ease of use.

Discussion:

In the future, supply page numbers.  This one was printed in an incorrect order.

Most of the material presented is summative, minus the discussion of the FLUID framework and the use cases.  Thus I will only discuss the latter.

I am excited about the FLUID framework because, as presented, its open-ended integration for any domain makes it quite robust.  I’d most like to see such technologies in the creation of animation, animation review, web conferencing, etc.

For the case studies, I would have liked to see more quantitative data, though the teachers were quite capable of recognizing the qualitative results.  Did grades improve or contrast greatly from a previous semester or another offering of the same course?

Comments Made Elsewhere:

  1. Andrew’s Blog

Analysis of “SketchPad: A Man-Machine Graphical Communication System”

Summary:

The author describes how to use SketchPad through a “light pen” and a pad of buttons to issue commands. The example given is an irregular hexagon that is made regular by snapping it to a circle and then removing the circle.  This hexagon is then stored, and a new “sheet of paper” (clearing of the screen) allows for instances of it to be used and manipulated.  The author discusses how these “subpictures”, “constraints”, and “definition copying” mirror the process a designer goes through to satisfy design conditions, listing a few domains of usefulness (repetitive drawings, circuit simulations, etc.).  The author describes the “ring” data structure that holds new points, lines, and other data in what appears to be a linked list.  Constraints and other drawing subroutines must fit a generic data structure.
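
Reading the “ring” as a circular doubly linked list, a minimal sketch might look like the following. The class names, the sentinel head node, and the stored values are assumptions for illustration; this is not a reconstruction of SketchPad's actual storage layout.

```python
class RingNode:
    """One element of the ring; a fresh node points at itself both ways."""

    def __init__(self, value=None):
        self.value = value
        self.prev = self
        self.next = self


class Ring:
    """Circular doubly linked list with a sentinel head node."""

    def __init__(self):
        self.head = RingNode()  # sentinel, holds no drawing data

    def insert(self, value):
        """Link a new node in just before the head, i.e. at the end."""
        node = RingNode(value)
        last = self.head.prev
        last.next = node
        node.prev = last
        node.next = self.head
        self.head.prev = node
        return node

    @staticmethod
    def remove(node):
        """Unlink a node in constant time, with no search through the ring."""
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = node

    def __iter__(self):
        node = self.head.next
        while node is not self.head:
            yield node.value
            node = node.next
```

Because every node knows both neighbours, deleting a point or line does not require walking the list, which matters for the recursive deletes described in the next paragraph.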

Cursor is either at the position of the light pen or snapped to a line or intersection if within its buffer area.  Basic functions are draw, move, and delete. The author discusses the implementation for display of drawings including the lines, text, handles for constraints (if desired), and the magnification of drawings that might go outside the viewport.  Discussion continues on the recursive nature of deleting and merging points (which might have lines, constraints, etc. dependent on them), and also on the rendering of an instance (must be translated, rotated, and scaled to its position relative to the original data) and “attachers” (constraints that must persist through each instance).  The behavior for copying a point, an attacher, or an entire instance is overviewed and is pretty standard.
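
The translate, rotate, and scale step for rendering an instance can be sketched as a single function applied to the master drawing's points. The function name, the parameter names, and the order of operations (scale and rotate about the origin, then translate) are assumptions; the summary above does not give SketchPad's own formulation.

```python
import math


def place_instance(master_points, dx, dy, angle, scale):
    """Map master-drawing points into an instance's position.

    master_points: list of (x, y) in the master drawing's coordinates.
    angle is in radians, counter-clockwise. Points are scaled and rotated
    about the origin first, then shifted by (dx, dy).
    """
    cos_a = math.cos(angle)
    sin_a = math.sin(angle)
    return [
        (scale * (x * cos_a - y * sin_a) + dx,
         scale * (x * sin_a + y * cos_a) + dy)
        for x, y in master_points
    ]
```

Storing only these four numbers per instance means that editing the master drawing changes every instance the next time it is rendered.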

When applying constraints to a drawing, two methods are proposed: one-pass (fast) and relaxation (slower, used when the former fails).  One-pass works by recognizing the variables in the structure (drawing) that satisfy their constraints and declaring them “free”?
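
For the relaxation method, the general idea is to nudge the variables repeatedly so that the total constraint error shrinks. The sketch below uses plain gradient descent on the sum of squared errors as a generic stand-in; it is not Sutherland's actual procedure, and the function name, step size, and constraint format (a function returning the signed error for a set of values) are all assumptions.

```python
def relax(values, constraints, step=0.1, iterations=200, eps=1e-6):
    """Iteratively adjust variables to reduce total squared constraint error.

    values: {name: starting number}
    constraints: list of functions taking the values dict and returning a
                 signed error (0 when the constraint is satisfied).
    """
    values = dict(values)

    def total_error(vals):
        return sum(c(vals) ** 2 for c in constraints)

    for _ in range(iterations):
        gradient = {}
        for name in values:
            # Central-difference estimate of how the error changes with `name`
            up = dict(values)
            up[name] += eps
            down = dict(values)
            down[name] -= eps
            gradient[name] = (total_error(up) - total_error(down)) / (2 * eps)
        for name in values:
            values[name] -= step * gradient[name]
    return values


# Two toy constraints: x equals y, and x plus y equals 10.
solved = relax(
    {"x": 0.0, "y": 0.0},
    [lambda v: v["x"] - v["y"], lambda v: v["x"] + v["y"] - 10.0],
)
```

An iterative scheme like this needs many passes over the constraints, which is consistent with relaxation being described as the slower fallback.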

The paper concludes with multiple sample uses (patterns, bridges, linkages, artistic drawings, etc.).

Discussion:

I like the idea of drawing shapes to make instances of them, just building your own library as you go.  If you like a shape, just drag it onto some shelf or area at the bottom and it’s ready for instancing by dragging copies of it out.  Unfortunately, SketchPad only allowed one “sheet” at a time (I assume some button combination was required to return to the base drawing).

Obvious HCI flaw in the system: its heavy use of buttons.  I’m not sure if it had capability to duplicate a drawing (not the same as copying an instance of it).

This is an interesting quote: “It is only worthwhile to make drawings on the computer if you get something more out of the drawings than just a drawing.”  Is this still true today? It may be a social achievement of HCI when a young student in class gets lost in doodling on his tablet PC instead of a piece of paper, just for the sake of doodling.  Perhaps this isn’t really possible till digital paper becomes commonplace (“paper” will always be the ultimate medium).

Edit: I add my comment here to this discussion.

Comments Made Elsewhere:

  1. Nabeel’s Blog