Archive for September, 2008

Analysis of “Algorithms for the Reduction of Number of Points…”

Comments Made Elsewhere:

  1. Manoj’s Blog

Discussion:

Digitizing pen input can be done through x-y values or polar coordinates.  Another is to divide the drawing region into cells and represent lines by cell/point-cloud density, though this is computationally expensive. Discussion continues on older ways to digitize input.

Claim that most digitizers are able of recording more data points than needed.  Have to remove duplicates and those in a straight line to improve offline processing and graphic display speeds.  Can do this mathematically (which may overgeneralize and smooth pointy features), by criteria (points to close together; drop a vector that does deviate by threshold angle in chain encoding), or by a cartographer (and then a system which could “learn” what the cartography generalizes, but impossible to do).

For line reduction, a simple approach is to delete all by every nth point of a determined line (still with flaws, though).  Another method is to select the points (which this paper focuses on, instead of deletion).  How do you select them?  Author suggests one of selecting the points that is the farthest perpendicular distance from the line segment between the two end points (most likely to characterize the line).

Method one: If all points are under a threshold, then the start (anchor) and end (floater) points are good to represent the line themselves.  When above threshold, then maximum value is the new floater and process starts again.  When complete, floater becomes anchor and process again, starts again.

Method two: same as method one, except all the floating points are saved in a vector so when the new anchor point is chosen, just select the one from the top.  Selects more points than method one, but only takes 5% of the computation time and is thought to produce better caricatures.

Discussed Lang’s implementation of the above methods that reduced the number of calculations needed to be made.  Discussed the speed comparisons of the different implementations.
Summary:

Even though an older article, this obviously still has merit today as we’d like to only process the data we actually need to.

I particularly found the chain encoding method interesting.  Checking angles and distance on each new vector are easy ways to reduce points needed.  Seems like it could be combined with the window feature of ShortStraw to determine a corner (was the collection angle by the last n points under some median threshold?).

But this opens to the door for lossy representation of an artist/sketcher’s original intention.  This may be our first taste into “beautification”, the amount of being domain dependent.

Analysis of “ShortStraw: A Simple and Effective Corner Finder for Polylines”

Comments Made Elsewhere:

  1. Nabeel’s blog

Summary:

The authors intend to present an algorithm that is simple (like the $1 recognizer) for finding corners (not heavy on math). Helps in recognition of primitives in a free-sketch recognition environment. Examples uses: mapping by drawing a line, curves in a 3D environment, and identifying letters in words on a virtual keyboard.

Steps:

1) Resample the stroke using a “interspacing distance” calculated from length of bounding box’s diagonal divided by the constant 40 (empirically determined).

2) Bottom-up testing: calculate euclidean length of “windows” along points. Set threshold equal to median of length times 0.95.  If below threshold, you’re a corner.

3) Does three top-down checks: (1) checks for a line between corners, (2) is not a line, must be more corners so relax the threshold until lines appears, and (3) check for lines between triplets of corners and remove the middle if so.

Results: Beat the two baseline implementations (Sezgin, Kim and Kim) hands down, using a the harder all-or-nothing accuracy approach.  Still suffered from completely recognizing all corners in heavily obtuse sketches.
Discussion:

I think the authors made a good point in the Discussion about not using speed as a metric for determining corners: their algorithm is much more applicable as it can be used on image-based sketch recognition.

However, if they information were available, perhaps clusters of non-resampled data points could be used to confirm/deny a potential corner.

The sample imagery in figure 4 all have visibly defined corners.  Show me what this algorithm did when the user drew an ellipse.
Over all, a well-done application of simple concepts to achieve a good solution to a touch problem.

Summary for Homework 1

For the curious, my project summary is available here (PDF).

Analysis of “User Sketches: A Quick, Inexpensive, and Effective way to Elicit More Reflective User Feedback”

Comments made elsewhere:

  1.  Nabeel’s Blog

Summary:
Introduce “user sketching” as a means for doing usability testing that involves actual end users but is not as expensive as video analysis, questionnaires, interviews, and other “think-aloud” methods.  Are other discount methods (heuristic evaluation), but are done by an expert only.  Sketching is an iterative process that helps the drawer gain new knowledge by visually seeing his results.

User Study 1:

48 people; 4 groups of 12;  3 groups shown on design; last group shown all three; went through think-aloud exercises but only gave reactive feedback instead of reflective (meaning unable to provide their own ideas on how to redesign it); before leaving, each participant was asked to quickly sketch how they’d draw the system

User Study 2:

Analyzed the sketch responses by dividing them into their original groups and then looking for themes. Group with three designs had more originality. If not shown in the sketches, a UI element was seen as unpopular, which reflect the results of the questionnaire.  If the original design doesn’t show up in the sketches, then it’s not popular obviously.  Found many original ideas unique only the sketches and not video analysis or questionnaire data.  Other ideas were refined by the sketches. Most participants only took a few minutes to sketch out their ideas (relatively quickly). Hardly any ideas were left out of the sketches (all reinforcing and supportive).  Another advantage: quicker and cheaper in comparison to other UT tests

Found one participant who said he had no ways to redesign the system, yet when force to sketch something, came up with a lot of ideas.  Presented two other participants who they were also able to squeeze more ideas out of my sketching.

Discussion:

I knew I was going to like this paper after reading the abstract. I like the definition for “language of design” too.

I’d like to know more about the participants of the study.  I agree that sketch design is good, but I find it hard to believe that any normal engineering major could not provide some type of decent verbal feedback on how to redesign what he/she saw.

The selections from the semi-structured interview crack me up, especially the “overly excited sketcher”.

The next question is how does one break these sketches down into numbers for comparison?  Sure we can do simple things like X number of people had the circular design in their sketch, but we’ll need higher level comparison.  Do another user study with the half participants using a tablet PC to annotation on the design they saw and then the other half doing paper sketches with free form.  We’ve already see people using sketch recognition to create UI and web forms, but how do we get more from this?

Analysis of “Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition”

Comments Made Elsewhere:

  1. Akshay’s Blog

Summary:

Mathematical symbols are hard to recognize because there’s a lot of them (1000+). This paper attempts to first “pre-classify” a candidate symbol so that it can be “pruned” to a specific group, then matched (instead of checking against all symbols). Like others, they also empirically select features to use for “prototype pruning”.

Steps:

  1. Gather points
  2. Pre-process (smoothing (average of three points) to remove errors from jitty surface; resampling of duplicate and too-distanced points; size normalization; chop of heads and tails)
  3. Feature extraction
  4. Passed to recognizer (elastic matching or HMM) or to preclassifier for groupin (then the recognizer)

Recognition is only done on the pruned sets using the elastic recognizer, not all the symbols.

Data Collection: x,y,time, and pressure from 50+ people

Features:

  1. Number of Loops (determined by finding minimum distance pairs)
  2. Number of Intersections (found using a modified sweepline algorithm)
  3. Number of Cusps
  4. Number of Strokes
  5. Point Density using three boxes over the shapes (40% top, 20% middle, 40% bottom)
  6. Initial direction, end direction, and direction from initial to end.
  7. Position of initial and end points (which quadrant they fall in)
  8. Width to height ratio (0 for slim, 1 for wide, and 2 for normal)

Recognition: Done through an elastic recognitzer, which means calculate the minimum distance between the unknown symbol and the set of models.

Conclusion: There method (not using the HMM or these features: loop and intersection), lowers the recognition rate from about 94% to 91% but the expense of computation is greatly reduced through use of prototype sets.
Discussion:

I like the author’s honest on the “ad-hoc” nature of selecting features.

The idea of grouping things together to reduce the number of comparisons is great, but this still comes down to just empirically determining the appropriate features for your domain.  However, the author admits that, with this paper, he’s only making progress in the area.

Thanks for the ideas on features, too…

Analysis of “Graphical Input Through Machine Recognition of Sketches”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Author:

Christopher Herot

Summary:

Theorizes that the machine can make determinations about the user’s attitude toward and uncertainties about his design, on top of the sketch’s meaning.  Performed three experiments to determine (1) if there are a set of rules (syntax) that could be processed independently of the “embedded semantics”, (2) what domain-independent inferences a computer could make about a sketch, and (3) made use of architectural knowledge to recognize a sketch.

Had a physical pen that sampled at set constant rate (16-200 points per second).  Eraser ability could also be saved by the machine.

First program, STRAIT, found corners through speed. Used speed an “intent-a”, where a more purposefully drawn sketch is drawn slower. Noticed that speed decreased at corners. Curves were fitted to B-splines, and another program named CURVIT.

For latching (straighten of points), originally just snapped points without a certain radius, but didn’t work well.  Rewrote STRAIT to STRAIN that used speed to determine latching radius, but still a z-index is needed for properly latching some shapes. Use number of bodies, drawing order, etc. for determing this (discussed later).

For overtracing, assume the user is emphasizing the line, but must distinguish from two carefully drawn parallel lines.

Did experiment drawing in 3D and with drawing a floor plan, divvy it up on a grid, then have a room finding algorithm (like a Roomba).  Didn’t work as well because context was needed.

So the sketch recognition algorithm needs context, which the user has to specify (to avoid have to recognize that, too).  Examples: “homes”, “parts”, etc.  Other items: how far along in the sketch, idiosyncracies of the pesron, his/her attitude toward the design.  Author describes the tree data structure for doing such, but then says it won’t work?

Have a bottom-up approach where the system recognizes and top-down approach were the user or higher-level programs makes corrections and manipulations.

Has two functions: speed and bentness (maximum distance between the chord made by the start and end point of an interval).  Latter used to find corners, and former used to verify this. Speed, line length, and drawing sequence can also be used for determining a certainty factor when latching.

Discussion:

The author has breathed new life in the feature of speed as it is useful for corner finding and less a feature of the gesture itself. The number of corners could then a feature of the gesture.

He mentions context such as the idiosyncrasies of the user and completion of the sketch, but how does one gather these?  I know some research can identity the user, but it take a timeout to generate a percent complete statistic during run-time.  This could be displayed to the user as the amount of “ink” left in the pen or something, and then it refills whenever they finish the stroke.

It must be nice to have the application sampling at 100-200 times a second.  Flash is sampling at more like 38-40.  May be switching to Java to get 55-60 points a second.

Analysis of “Marqs” Paper

Author:

Brandon Paulson, Tracy Hammond

Summary:

There are a lot of advantages to allowing a users to search pictures and media via a sketch instead of search terms (cognition, easier). Built a system that is domain and style independent and does not require initial training. Don’t want to require lots of training, so have a two-recognizers system:  one classifier that’s ready to go after a single stroke and another that trains on search queries. End user has to associate a sketch with the album as creation time.

Commented that curvature is an important feature in recognizing single strokes.

Both classifiers use the same global feature set that is independent of scale, orientation, and multiple strokes. The major axis (line between furthest points) is determine and rotated to be horizontal.  Features: bounding box aspect ratio, pixel density, average curvature, and number of perceived corners.  Goes from a single-stroke classifier to a linear stroke once two examples have been determined.

Comment that a grid-based approach could be added if orientation was needed for the sketches in a particular domain.  Admits to wanting to implement more features.

Discussion:

I was really excited about this paper until I read that the end-user has to assign a sketch to each album himself.  It’s like tagging, which I hate doing outside of del.icio.us.  This system will only be as good as the user wants it to be. I afraid I would suffer from not remembering a sketch I did a year ago, and would want to have another search feature to fall back on.

I think cornering finding is a very interesting metric and one that could be really useful.  I also wonder if point clustering could be another “global feature” useful for searching, though this would hold location with it.

I’m confused by their relevance feedback metrics.  What does it mean that the average search rank was 1.51? Tell me the precision and recall of the system, and other accepted terminology.  Also, how did they determine what  the system returned the top search result 70% of time?  Did the evaluation participants mark it as such?

This concept would do well to have a index of the sketches so you could use metrics like TF-IDF for result rankings, too (if the corpus were large enough).
Where this would really be cool is when you incorporate a social aspect to it. What other users draw like me? If asked, “What are you thinking right now?”, I draw it, and it returns other users thinking or who have thought the same within the last three hours, it’s Pictionary meets Twitter!  Sadly, you’d probably get lots of phallic imagery.

Comments made elsewhere:

  1. Ben’s blog

Analysis of “Gestures without Libraries, Toolkits or Training”

Comments made elsewhere:

  1. Andrew’s Blog

Author:

Wobbrock, Wilson, and Li

Summary:

Authors make claims that gesture recognition is hard to achieve for the common programmer and user interface designer who want to use any tool (Flash, Javascript, etc.). Claim to have a $1 recognizer that performs better than Rubine and Dynamic Time Warping. Their 100-LOC algorithm supports rotation, scale, and position invariance.  HMMs, neural network, and statistical classifiers don’t easily allow run-time gesture definitions and are hard to debug.  Libraries are not always availabe in the language needed.  Some platforms have APIs that are powerful, but only useful when available.

Goals: (1) resilient to variations in speed and sensing; (2) support rotation, scale, and position invariance; (3) no advanced mathmatical tecniques; (4) few lines of code; (5) interactive purposes; (6) teach it new gestures; (7) return N-best list; (8 ) competitive

Step 1: Points are re-sampled to be equidistant with a total number of points at either 32, 64, or 128.

Step 2: Find the centroid and the first point and rotate these to the zero degrees for comparison.

Step 3: Scale to some bounding box and then translate all points to the origin.

Step 4: Find the average distance between the points of the candidate gesture and those of the template or classes. Score based on these.

Though the candidate and template gestures are scaled and rotated to their indicative angle, this doesn’t mean it’s their optimal angle. However, for similar gestures, the optimal angle could be achieved by rotating as long as the path-distance between the candidate and template decreased.  This works quite well, but not so for dissimilar gestures. The authors found that the Golden Section Search just much better for dissimilar and only slightly worse for similar. It finds the minimum value in a range using the Golden Ratio.

Limitations: It can’t distinguish for specific orientations, aspect ratios, and locations (i.e. up arrow versus down arrow). It doesn’t use time, so speed can not help in determining. For handling template variations, users will have to define the proper alias to a gesture when it not recognized (ex: I drew an arrow, but it was recognized, so I’ll assign it to the class of “arrow”).

Evaluation: 10 participants asked to draw 16 types at 3 different speeds 10 times each.  Implement $1, Rubine, and Dynamic Time Warping (DTW), all using their rotation invariance scheme, scaled to a standard square size, and translated to the origin. Trained Rubine after the adjustments.
Found $1 and DTW to be very accurate (0.98%).  The number of gestures had significant effect on the recognition errors. Speed had significant effect on the errors, but all three recognizers were affected similarly.  Each returned an N-best list, of which $1 feel off the fastest.  DTW was noticably the slowest.  Also gathered some qualitative data on what gestures users liked the most. Rubine did the worst of the three.

As of writing, had not done empirical tests with actual novice programmers. Also discussed allowing the end-user to correct a failed recognition.
Discussion:

Maybe I missed it, but how is accuracy defined again? Do they get it right if they didn’t classify the gesture?

The idea of stripping down sketch recognition further and further is appealing, but the limitations are great.  This is basically just a shape recognizer,  so the system could recognize a command from the user, but the lose of direction, location, and (arguable) time will limit creative UI designers.  Corner-only and other region-specific gestures, or arrow gestures meant to define a speed or direction, are lost.

The concept is admirable and still quite useful however, especially as more and more rich internet applications (RIAs) become a norm and the line between desktop and web app is blurred.

« Previous Page