Analysis of Oltman’s Thesis
Comments Made Elsewhere:
Summary:
Chapter 1:
Wants to make free-sketch drawing (no cleaned up versions) a part of any design process, not requiring the user to be forced to interact with buttons or other elements (no interruption). Most recognizers can’t do this because they require the relative feedback in order to improve their recognition later. Do this through a computer vision and machine learning based approach that segments the ink visually to see the “parts”. Cleaned up sketch recognition doesn’t have to deal with corrective strokes (ex: user goes back to complete a rectangle), and will also cause the user to change drawing styles when they realize what the system is capable of (how it recognizes certain shapes).
Problems:
1) Signal noise - two shapes are never drawn the same, even if intended (lines don’t touch, pen flicks, etc.)
2) Conceptual variation - variations of the same symbol that a classifier must learn or account for
3) Overtracing - occurs because user wants to emphasize or tidy up; does not generally change the shape so computer vision technique is suited for this
4) Segmentation - how to divvy up the sketch and bring them back to relative groups; again claims of computer vision doing well here
Lead argument: the user only cares that the stroke looks like as drawn, so visual recognition is best suited for this.
Isolated Shape Classification: Segment the piece into “visual parts” (not semantic parts as in stoke-based recognition, like the line in a rectangle). Each part is a circular region of rings and wedges (making of regions which have histograms of the pixel of ink for that region). All the regions form a vector of counts for the part. A stroke is made up of these parts (usually 50 determined) and then a classifier is used. A “codebook” describes holds a “standard vocabulary” for the parts. The parts of sketch symbol are then compared to those in the codebook by comparing histograms of parts, creating a matrix of comparison. Extract a “match vector” from this by getting the minimum’s for each row. The match vector is used to train a classifier for difference between shape classes (thought it was the other way around??). Gets around signal noise because it’s using the histogram of regions, so direction will still generally be the same (though how much that line wiggles doesn’t matter). Also, parts can match even though not identical because we find a minimum against the codebook.
Shape Localization: Scan and segment into canidates. Run the classifier, sort the results, and then group.
Chapter 2:
Each part is identified by a “bullseye”. The radius of the rings do not increment equally so that inner rings are smaller and represent data more tightly while the outer rings are larger and give “context” to the inner ones. Are equally distanced in log-space. A bullseye is rotated to the direction of the stroke, then another half-bin width to prevent the data from being on the edge of the histogram, and this direction (orientation) also serves as a third dimension for the histogram data in each bin (helps to distinguish bins with the same histograms). Calculate direction using the tangents of a 19-point window and then an “orthogonal distance regression” test. Strokes are also scaled and resampled for preprocessing. The bins are normalized for calculating a distance metric between bulleyes using their vectors. Distance metric is small for similar ones and large for dissimlar.
Chapter 3:
Need to create the cookbook. Do so by calculating the bullseye parts for each shape in the training set, cluster them, and then use a representative centroid part in the codebook. Create a match vector from an input sketch’s bullseye’s and compare it to the codebook (the degree to which each one of its parts appears in the codebook). Classify on this.
Chapter 4:
Uses a Support Vector machines classifier that learns the “decision boundary” that will divide two (and apparently only two) classes in the vector space. This is binary classifier. Need one for every pair of classes. Class with the most “votes” by the binary classifies for an input sketch is the winner.
To actually find shapes in the complete sketch:
- Select candidate region by sliding a window along the sketch and taking “snapshots”, increase the size, do it again. Remove snapshots with too few points or that overlap too much.
- Run the classifier on all these candidate regions.
- Clusters those candidates that were classified similarly (splitting up clusters that are too large) and output a set of predictions.
- Remove predictions that still overlap and choose the final predictions from those over a threshold.
Chapter 5:
Yeah for evaluation. Got 89.5%.
Chapter 6:
Go related work. You rock.
Discussion:
Despite the argument shared by the author, I’m still partial to a system that provides corrective feedback. The intended goal of the design process is to be creating something that transfer smoothly to the next phrase of the process (i.e. CAD, UML, etc.), thus constructing this along the way for be more benefitial. In all of the example domains the author states, the free-sketch user is already disposed to drawing for a system (just a system of symbol conventions instead of a recognition), so why not just go with a stroke recognizer that has higher accuracy rates?
Still a very interesting technique, though. It doesn’t have to worry at all about noise in the stroke or overtracing.
How do they determine this “standard vocabulary of parts”? A priori knowledge? The “vocabulary” is the distance measure between parts. At first, I didn’t grasp the purpose of the codebook, but I see now that is more a key or middle man for getting the training data into a format that the classifier can work with.
I’m still confused as to how the bulleyes are determined initially. The candidate regions are determined via a sliding window, but I’ve missed how the bullseyes are choosen.
Added later:
Something like this creates a heavy vector space like in document retrieval (lots of words in each document), so you could employ search techniques from this discipline (i.e. cosine similarity, k nearest neighbor, rocchio classification). For example, when the parts from the input shapes are clustered, could just determine the centroid or nearest neighbor of a new part. However, this is an expensive operation (as the author notes on the bottom of page 45).
Comments(1)