Archive for September, 2008

Analysis of Oltman’s Thesis

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Chapter 1:

Wants to make free-sketch drawing (no cleaned up versions) a part of any design process, not requiring the user to be forced to interact with buttons or other elements (no interruption).  Most recognizers can’t do this because they require the relative feedback in order to improve their recognition later.  Do this through a computer vision and machine learning based approach that segments the ink visually to see the “parts”.  Cleaned up sketch recognition doesn’t have to deal with corrective strokes (ex: user goes back to complete a rectangle), and will also cause the user to change drawing styles when they realize what the system is capable of (how it recognizes certain shapes).

Problems:

1) Signal noise - two shapes are never drawn the same, even if intended (lines don’t touch, pen flicks, etc.)

2) Conceptual variation - variations of the same symbol that a classifier must learn or account for

3) Overtracing - occurs because user wants to emphasize or tidy up; does not generally change the shape so computer vision technique is suited for this

4) Segmentation - how to divvy up the sketch and bring them back to relative groups; again claims of computer vision doing well here

Lead argument: the user only cares that the stroke looks like as drawn, so visual recognition is best suited for this.

Isolated Shape Classification: Segment the piece into “visual parts” (not semantic parts as in stoke-based recognition, like the line in a rectangle). Each part is a circular region of rings and wedges (making of regions which have histograms of the pixel of ink for that region).  All the regions form a vector of counts for the part.  A stroke is made up of these parts (usually 50 determined) and then a classifier is used.  A “codebook” describes holds a “standard vocabulary” for the parts.  The parts of sketch symbol are then compared to those in the codebook by comparing histograms of parts, creating a matrix of comparison.  Extract a “match vector” from this by getting the minimum’s for each row.  The match vector is used to train a classifier for difference between shape classes (thought it was the other way around??). Gets around signal noise because it’s using the histogram of regions, so direction will still generally be the same (though how much that line wiggles doesn’t matter).  Also, parts can match even though not identical because we find a minimum against the codebook.

Shape Localization: Scan and segment into canidates. Run the classifier, sort the results, and then group.

Chapter 2:

Each part is identified by a “bullseye”.  The radius of the rings do not increment equally so that inner rings are smaller and represent data more tightly while the outer rings are larger and give “context” to the inner ones.  Are equally distanced in log-space. A bullseye is rotated to the direction of the stroke, then another half-bin width to prevent the data from being on the edge of the histogram, and this direction (orientation) also serves as a third dimension for the histogram data in each bin (helps to distinguish bins with the same histograms).  Calculate direction using the tangents of a 19-point window and then an “orthogonal distance regression” test.  Strokes are also scaled and resampled for preprocessing.  The bins are normalized for calculating a distance metric between bulleyes using their vectors.  Distance metric is small for similar ones and large for dissimlar.

Chapter 3:

Need to create the cookbook.  Do so by calculating the bullseye parts for each shape in the training set, cluster them, and then use a representative centroid part in the codebook.  Create a match vector from an input sketch’s bullseye’s and compare it to the codebook (the degree to which each one of its parts appears in the codebook).  Classify on this.

Chapter 4:

Uses a Support Vector machines classifier that learns the “decision boundary” that will divide two (and apparently only two) classes in the vector space.  This is binary classifier.  Need one for every pair of classes.  Class with the most “votes” by the binary classifies for an input sketch is the winner.

To actually find shapes in the complete sketch:

  1. Select candidate region by sliding a window along the sketch and taking “snapshots”, increase the size, do it again.  Remove snapshots with too few points or that overlap too much.
  2. Run the classifier on all these candidate regions.
  3. Clusters those candidates that were classified similarly (splitting up clusters that are too large) and output a set of predictions.
  4. Remove predictions that still overlap and choose the final predictions from those over a threshold.

Chapter 5:

Yeah for evaluation.  Got 89.5%.

Chapter 6:

Go related work. You rock.

Discussion:

Despite the argument shared by the author, I’m still partial to a system that provides corrective feedback.  The intended goal of the design process is to be creating something that transfer smoothly to the next phrase of the process (i.e. CAD, UML, etc.), thus constructing this along the way for be more benefitial.    In all of the example domains the author states, the free-sketch user is already disposed to drawing for a system (just a system of symbol conventions instead of a recognition), so why not just go with a stroke recognizer that has higher accuracy rates?

Still a very interesting technique, though.  It doesn’t have to worry at all about noise in the stroke or overtracing.

How do they determine this “standard vocabulary of parts”?  A priori knowledge? The “vocabulary” is the distance measure between parts.  At first, I didn’t grasp the purpose of the codebook, but I see now that is more a key or middle man for getting the training data into a format that the classifier can work with.

I’m still confused as to how the bulleyes are determined initially.  The candidate regions are determined via a sliding window, but I’ve missed how the bullseyes are choosen.

Added later:

Something like this creates a heavy vector space like in document retrieval (lots of words in each document), so you could employ search techniques from this discipline (i.e. cosine similarity, k nearest neighbor, rocchio classification). For example, when the parts from the input shapes are clustered, could just determine the centroid or nearest neighbor of a new part.  However, this is an expensive operation (as the author notes on the bottom of page 45).

Analysis of “Constellation Models for Sketch Recognition”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Introduction:

Are using a probabilistic model (constellation model) trained on example sketches to label an input stroke.  Supposed can recognize stokes that vary a lot, but they still must be drawn the same way (i.e. a rectangle draw with one stroke != another drawn with four).  Can have optional parts, but the mandatory parts must be present.

Other approaches used search trees, hierarchical graphs, and image-based techniques to find relationships between strokes.  In a constellation model, local features have defined shape and global position (like an eye, nose) and the pairwise features define relations, like distance, of the features.  They apply this for specific features to sketch recognition.  Features defined relative to the bounding box.  Only searches on mandatory features to prevent O(n^2) runtime.  Will search optional for still unlabeled strokes. From these feature vectors, compute mean and covariance matrices.  Use ML or maximum likelihood search.

For the branch and bounding part, run through each of the strokes with the mandatory labels, keeping track of the cost for each and using the best as the bound to limit searching other branches (once all mandatory labels are found). Use “multipass thresholding” to “cut off” branches before searching by using a generous threshold and then increasing using a pessimistic one on subsequent passes.  Also used hard constraints (i.e. a nose should always be above a mouth) to limit searches.  Both of these techiques greatly reduced the amount of time required for searching.

Discussion:

An interesting first read into doing multi-stroke recognition. Had to hit Wikipedia to learn about “branch and bounding” algorithms first thing.  They did not discuss their low-level stroke recognizer, or does this guy actually just work on global position and relation to other objects? (Sure there may be a stroke in the position where the eye is suppose to be, but did you first recognize that is looks like an eye?)  They might not care about this so as to all for cartoon-like drawings (ex: a child draws stars for eyes).  I assume the mandatory features are also hand-labeled?  (The author did discuss have the system define these, but it wasn’t good at it).

Why did they not post any accuracy rates for their algorithm?

I get the feature vector and covariance matrix calculations, though I’d have to do re-read if I were to present this.

Analysis of “GLADDER: Combining Gesture and Geometric Sketch Recognition”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:

Combine geometric and gesture-based sketch recognition. Modified Rubine to classify using the Mahalanobis distance instead of a global covariance matrix (quadratic classifier instead of a linear one). Implemented LADDER as described.  Throw the shape in Rubine and, if below a certain threshold, keep it.  Back list from LADDER and then insert the Rubine answer into it.  Use this ranked list in the UI.

Discussion:

I like it. :)

Analysis of “A Domain-Independent System for Sketch Recognition”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:

Has a two stage approach: stroke approximation that is done during drawing time and a post-processing phase that occurs after drawing is done.   The post-processing phase attempts to recognize shapes through heuristic reasoning rules.  Has “hierarchical” output of (1) low-level raw data, (2) syntactic level of vertexes and primitive shapes, and (3) high-level semantics with relations between objects drawn.

Claim to combine vertex detection and primitive shape approximation by using curvature, direction, and feature area.  Avoid any empirical thresholds. First trying to fit a primitive shape to the stroke, breaking it into pieces and attempting again.

Line Segment Approximation: use least-squares best fit on the direction curve or raw points to determine if all under a threshold.  Then use feature area to verify.

Curve Approximation: has a distinct direction graph pattern (forms a line with the slope of 2*pi/number of points); continue to use direction graph after finding a curve to hypothesize circles, ellipses, arcs, helixes, etc. and eliminate based on feature area heuristics.

For self-intersection strokes: make two copies and divvy them up different ways (at it’s intersection points and point with maximum curvature) and compare the recognition results.

Post-processing: do simple relational retrieval (use time and parallel/orthogonal structures), cleanup the strokes (remove tails), and do basic object recognition (boundaries, triangles, rectangles, arrows, dashed lines, etc.).

Discussion:

Seems like a great approach. The use of feature area instead on the raw points and the direction curve looks like a novel solution to tightly recognizing shapes, deviating away from heavy corner finding until more complex shapes come into play.

We’ve been blogged about…

A few weeks ago, we had a teleconference call with Dr. Christopher Herot during my Sketch Recognition class.  He’s an MIT grad and now successful entrepreneur.  I throughly enjoyed the conference call as Dr. Herot was quite knowledgeable and a good rebounding board for ideas.

But, while doing some Googling on corner finder, I discovered that he’d written about his meeting with the class.  It’s mainly a summary of the sentiments he expressed during the call, but still fun none the less.

Read the blog post here.

Analysis of “Eliminating False Positives During Corner Finding by Merging Similar Segments”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Very short paper. Using speed and curvature, go through and find every possible corner (unless too close together, and also move the one with the smallest curvature).� Next, assuming that the corners of small stoke segments are false positives, attempt to merge them with their neighbors, only doing so if the error is less.

Discussion:

Considering my discussion in the last blog post, here’s a solution for corner finding that doesn’t appear to require resampling.� It looks to be a very fast solution, too.� Seems like they could have implemented Kim&Kim’s local convexity as another heurisic of the initial fit.

Blogs.tamu.edu did an upgrade…

And there are still lots of glitches, like all of my categories disappeared and some posts have odd characters in them now.

Analysis of “A curvature estimation for pen input segmentation in sketch-based modeling”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:

Paper summaries research on discovering the “region of support” (number of neighbors) to look at when determining curvature on the on-the-fly and only using angle space.

Things to note: resampling will undoubtedly cause smoothing of the input curve, so determining a good size for the k-neighbors is necessary.

As a first step, they initially define the curvature as the direction change at each point with k = 1 (only one neighbor). Next, they define “local convexity” where all neighboring points with the same direction (sign) as point(i) are used as the region of support.� Works well for some situations, but came make features indistinguishable (where to segment?).� Added “local monotonicity” where you only observe the neighboring subsets on the left and right as long as they are monotonically decreasing (meaning keep decreasing).� Next, find the local maximum or minimum (depending on if curve is positive or negative) to find the segmentation point.

Performed evaluation test with volunteers and then tested their algorithm in parts (direction only, with local convexity, only, etc.) and then against two other accepted algorithms.

Discussion:

Again, another paper that doesn’t make use of time so their implementation is independent of input medium (tablet PC or actual paper) and the direction of the scanning (can start from the front or back). They do use speed for calculating the threshold to use when determining the segmentation points.

I was actually thinking up a solution similar to theirs (but theirs is obviously better).� I did not realize the issue of determining the optimal number (k) of nearest numbers to use, but am supportive of using a sliding window at drawing time for calculating the segmentation points there and then.

I like that they do use their segmentation algorithm for going to the next step of defining features for shape recognition on page 5.� This algorithm appears to be capable of complex line segmentation.

How could this be done without resampling?� Is it possible in determining curve segmentation?� I want to develop a vector-based approach that incorporates speed and doesn’t require resampling.

Analysis of “PaleoSketch: Accurate Primitive Sketch Recognition and Beautification”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Another curve and polyline recognizer, but also able to ellispes, arcs, curves, circles, helixes and spirals.

Related Works:

Talk about how feature-based recognizers require extensive training for strokes that must be drawn in the same manner, placing lots of constraints on the user.  These authors use lots of Sezgin and Yu/Cai research in their work to develop a more robust geometric recognizer.  Yu/Cai used curvature to segment strokes.  Goal is to recognize lots of primitive shapes.

Implementation: Have a pre-recognition phrase, then a series of pass/fail tests that return a beautified shape, and then a hierarchical organizer.

Pre-recognition:

Remove duplicate points. Create a direction, speed, curvature, and corners graphs. Compute “normalized distance between direction extremes (NDDE)”, or distance between high and low points on the direction graph, and “direction change ratio (DCR)”, or maximum change in direction divided by average change in direction. Both used to help distinguish polylines from curves.  Remove tails.  Determine if overtraced by computing total rotation.  Determine if closed figure.

Line Test:

Compute the least squares test between all the points and the best fit line. If meets a threshold and is not overtraced (less than three corners), it’s a line. Return line between the two endpoints (not best fit line).

Polyline Test:

Find then strokes. Each much pass the line test.  The average least squares error must pass meet a threshold.  Must have high DCR value.

Ellipse Test, Circle Test, Arc Test, Curve Test, Spiral Test, Helix Test, Complex Test:

Tired of reading…no need to rewrite the constraints here…

Hierarchy:

Whether a polyline or complex shape, all the returned beautified shapes are ranked by their components (example: line - 1, helix - 5).  The lowest rank is the shape used,  though alternatives are kept for the user.
Discussion:

I’m glad that I read Sezgin first. I think it helped me compare and constrast with this paper better.
In the future work, they said they wanted to be able to handle multiple strokes.  One constraint could be to check the distance between endpoints of strokes and, if within a threshold and the time between the two do no vary much, throw the group of strokes through the recognizer. Comparing endpoints is like the user “picked up where he left off”. How these two thresholds are determined, I don’t know.  You’ve always got “undo” on your side.  Not a universal approach.

I like the “invisible corners” problem…could maybe approach it by shooting vectors off the tangents of points and marking where they regularly intersect.  Check the dot product to see if they’re close to being orthogonal.

Another paper I’m excited about.

Analysis of “Sketch Based Interfaces: Early Processing for Sketch Understanding”

Comments Made Elsewhere:

  1. Andrew’s Blog

Summary:

Goal is to design a system that designers can interact with from the beginning, having the computer understand the paper prototypes before computer aided drawing. Want the system to not have icons, menus, and buttons, just exploit direct manipulation without having to learn specific ways to draw objects (draw a rectangle however you want).  System will need to handle any number of basic primitives: lines, curves, polylines, etc.

Three phases: approximation (fits lines and curves to the points), beautification (modifies output and helps next phase), basic recognition (looks for shapes)…another phase for more high level recognition (domain-dependent symbols).

Vertex detection for lines:

Broke components of a sketch in direction, curvature, and speed.  Spikes correlated to possibly corners, but how to beat through the noise? Use “average based filtering”, meaning set a threshold for lower speeds and higher curves based on the mean of each data set.  Doesn’t free of all noise (look into scale space as a better alternative), and can’t use curvature and speed separately (they work best by complimenting each other).

To create these hybrid fits, will generate a certainty metric of [0,1] for each curvature (Fd) and speed candidate (Fs) vertex.  On different scales, but able to sort each set on these. Merges into a final hybrid set by finding the first intersection of the two, and then taking the highest scoring candidates from each set after calculating the least squares error.

Handling curves:

Find arcs by dividing the arc length by the distance of each pair of previously detected vertices. Approximated the curved regions using Bezier curves, which has two control points and two end points.  Gave equations for calculating the control points.  Again, used least squares of distance of candidate point from curve to determine the best fit, but had to create a piecewise linear curve to use for approximation since using a Bezier curve was too expensive.

Beautification:

For line segments, adjusts the slopes of the lines by rotating around the midpoint of the line.  Had some technique for clustering the slopes and using a histogram Doesn’t mention computing new control points based on the end points.

Basic Shape Recognition:

Used handmade templates.

Evaluation:

Did an informal evaluation; compared their’s to a standard tool; everyone liked it; had 96% recognition rate.

Discussion:

I like the goals that they lay out in the related works: (1) able to draw arbitrary shapes in a single stroke, (2) automatic vertex detection, (3) not require different drawing modes for different types of geometric objects, and (4) should feel natural.  All lofty and desirable goals to achieve.

They do not mention what they do in the beautification for the curves.  I understand that they’ve already got the calculated control points from recognition of the Bezier curve.  However, the end points of the curve will have changed by the beautification of the line segments.   Recognizing it as a curve and drawing it using the calculated end points and control points is beautification in itself, but I assume new control points do not need to be calculated.

I wonder if the curve could be simplified by not using a quadratic Bezier curve and not a cubic one.  Only one control point and might help in the least square errors determiniation.

Next Page »