Archive for the 'Paper Analyses' Category

Analysis of “Multimodal Collaborative Handwriting Training for Visually-Impaired People”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:

Authors sought to create a collaborative teacher-student system for visually-impaired children to learn handwriting and signatures.  Uses haptic and auditory feedback.  Most people don’t realize that blind people can’t write.  With speech synthesizers, a lot less emphasis is put on it, too.  People with visual disabilities learn best through tactile and auditory feedback.

The system works by the teacher drawing the appropriate stroke, which is then felt by the student using a haptic device. Previous work had a device that would physically move the person’s hand for them to teach, but it was not received as well by non-sighted participants.  A second study did better when combining haptic feedback with auditory pitches based on location.  Combining this with real-time control of the haptic feedback by a teacher proved useful (as opposed to constant-rate, computer-generated haptic feedback).

Their system (McSig) allows a teacher to sit at a tablet PC and draw a shape, or create stencils for later, that the student draws.  A device then guides the visually-impaired child along the stroke, using Dutch paper (paper that raises where it is drawn on) and auditory pitch, all as multimodal feedback.

Used an open source library from previous work to limit the trajectory of the force feedback to safe levels.   Performed usability testing with visually impaired adults.  Performed system evaluation with visually impaired children in 20 minute sessions.  A student was pre-tested to see if he knew the letter, then guided through it.  All letters were single stroke, though multi-stroke was supported.  A longevity study is next.

Discussion:

A very interesting and worthwhile application of the sketch domain to teaching.  Not much sketch recognition is going on with the computer itself, as the authors commented that the Microsoft handwriting recognition seemed to fail regularly, so the auditory feedback was turned off.  But it’s more sketch recognition of a different type, that of an impaired human through the aid of a computer.  The problem is still the same (i.e. a human with sight can look at a sketch and understand it, but one without sight cannot).

The success of their system is repetition with appropriate feedback, either verbally from a teacher or through a computer-generated sound, force, or tactile shape.  We, as humans, are able to learn, we just need enough information.

This paper also seems interesting in light of my final project where I desire to teach people to draw.  Again, something that can be done with proper repetition, feedback, and the building of confidence (I believe). It’s my hypothesis that most people shy away from drawing because they believe they can’t do it, but given a system that helps them establish spatial relationships with visual feedback and “rewards”, a person could be taught to draw, say, a favorite Disney character.

Here they have a teacher provide real-time drawing feedback to the student, which they found to be important in light of prior research that had the computer provide this feedback in a constant-time after a teacher had input the shape.  This seems interesting to me also for my final project, and I’m considering it as a phase to complete for the user.

Analysis of “Fluid Sketches: Continuous Recognition and Morphing of Simple Hand-Drawn Shapes”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Authors seek to introduce a new “paradigm” for sketching in that recognition for simple shapes occurs while sketching and the sketch is actually fitted before the pen movement is even done.  Uses least squares fitting to simple shapes.

Created an equation that is a function of current trajectory of the point, the fit to an ideal curve of a particular shape class, and the current time and time of the point’s drawing.  This function is fitted to the limited domain of simple shapes that the author has (circles, rectangles, etc.).   Fitting to the different types of shapes is covered.  The author goes into great detail of the implementation for these functions.
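To make the least-squares idea concrete, here is a minimal sketch of fitting a circle to the points of a stroke. It is a plain algebraic least-squares circle fit and not the paper's function, which also accounts for time and pen trajectory; the names `solve3` and `fit_circle` and the sample points are assumptions made for illustration.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a*x = b with Gaussian elimination and partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate input (for example, collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        s = m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / m[i][i]
    return x

def fit_circle(points):
    """Fit x^2 + y^2 + D*x + E*y + F = 0 to the points in the least-squares sense.

    Returns (center_x, center_y, radius).
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # z = x^2 + y^2 is the term moved to the right-hand side of the normal equations
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    sz = sum(x * x + y * y for x, y in points)
    d, e, f = solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                     [-sxz, -syz, -sz])
    cx, cy = -d / 2, -e / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)

# Hypothetical stroke: samples along part of a circle centered at (2, -1) with radius 3
stroke = [(2 + 3 * math.cos(t), -1 + 3 * math.sin(t)) for t in (0.0, 0.7, 1.9, 3.1, 4.5, 5.6)]
print(fit_circle(stroke))
```

Because the fit only needs running sums over the points seen so far, it could in principle be refreshed as each new pen sample arrives, which is the spirit of recognizing while the user is still drawing.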

Discussion:

I like the idea of fluid sketching, that it would “snap” to its ideal shape as the user is in the process of drawing, but the author admits that it’s hard to apply to more complex shapes and sometimes context is needed.

I could see this being really useful as a stage in teaching drawing using sketch recognition.  However, I ended up reading this paper over like three hours with a nap and many interruptions in between…so didn’t get as much out of it as I should have.

Analysis of “Sketch Recognition User Interfaces: Guidelines for Design and Development”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:

Paper claims that prior research has either been on sketch UI without much recognition or more solely on the sketch recognition technologies.  Evaluates what makes for strong sketch recognition married with HCI, including how to handle situations, editing, errors, etc.  Also looks at what evaluation method is appropriate (paper prototypes, etc.).  Uses the domain of shapes for a PowerPoint presentation.

Application: Created a PowerPoint slide annotation application.  Allowed “online editing mode” where user could hold pen for X milliseconds to switch from sketch to editing/selecting.  Also allowed recognized and unrecognized shape creation (require UI element to switch). Shapes are cleaned up when put into PowerPoint.
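The hold-to-edit switch can be sketched roughly as follows. This is only an illustration of the idea, not the paper's implementation: the function names, the sample format, and the drift tolerance are all assumptions, and the hold time (the "X milliseconds" above) is left as a parameter because the value is not given here.

```python
import math

def starts_with_hold(samples, hold_ms, max_drift_px):
    """Return True if the pen stays near its pen-down position for at least hold_ms.

    samples: list of (t_ms, x, y) tuples in time order, starting at pen-down.
    max_drift_px: how far the pen may wander and still count as "holding" (assumed).
    """
    if not samples:
        return False
    t0, x0, y0 = samples[0]
    for t, x, y in samples:
        if math.hypot(x - x0, y - y0) > max_drift_px:
            return False  # moved away before the hold time elapsed: treat as sketching
        if t - t0 >= hold_ms:
            return True   # held still long enough: switch to editing/selecting
    return False          # pen lifted before the hold time elapsed

def mode_for_stroke(samples, hold_ms, max_drift_px):
    """Pick the input mode for a stroke based on how it begins."""
    return "edit" if starts_with_hold(samples, hold_ms, max_drift_px) else "sketch"

# Hypothetical pen traces; the 500 ms and 3 px values are made up for the example
held = [(0, 10, 10), (200, 11, 10), (600, 10, 11)]
drawn = [(0, 10, 10), (50, 30, 40), (600, 80, 90)]
print(mode_for_stroke(held, hold_ms=500, max_drift_px=3))
print(mode_for_stroke(drawn, hold_ms=500, max_drift_px=3))
```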

System evaluation involved three users asked to draw, draw and type text, and draw and edit/delete.

Observations: (1) Don’t give feedback until the user explicitly says they’re done sketching, (2) Have to provide an obvious indication of recognized/unrecognized mode switching, (3) Only recognize one domain at a time (multiple domains not feasible as of this paper), (4) Have pen-based editing for copy, paste, etc., (5) Support distinct editing and sketching gestures without a modal switch, (6) Use large buttons (which users prefer over keyboard shortcuts), (7) Always have real-time pen response.

Found that it was either paper prototyping or a full system.  SkRUIs are different than normal testing because (1) they aren’t command-line based, and (2) there’s lots of freedom relative to just button clicking.

Discussion:

None of these papers seem to mention if the person is using a USB-attached tablet or a screen tablet, which very much adds to the HCI.  And three people is not nearly enough in my opinion.

Their domain seems relatively simple (relationship diagrams), though they did mention circuit diagram recognition as well.  Having the user define the domain (i.e. LADDER) adds a whole other factor.  After repeated use of LADDER, functionality like delayed recognition would seem like a hindrance because I know the rules and how some shapes rely on others for recognition, so I would want to be sure the system knew those shapes were there.

I like the idea of online editing mode.  Thinking of other gesture-based selection initiations besides holding, what if the first few points of a stroke drawn over an existing shape would automatically start dragging that shape?  If the user pauses, he didn’t really mean to pick it up and it is released, turning the gesture from editing to sketching.  If he did mean to pick it up and continues with his motion for some threshold, the shape will move with the pen until pen-up.  Deleting could also occur in a similar way.

Incorporating the lasso/selection-box tool we read about previously would also be a good addition here.

Having immediate recognition on pen-up also seems like a solution to the recognized vs. unrecognized shape creation.  If the system has a low confidence in the shape, leave it unrecognized instead of correcting it.  The user can then explicitly request re-recognition or un-recognition of the shape (or is this giving too much unneeded functionality to the novice user?).

Analysis of “Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples”

Comments Made Elsewhere:

  1. Akshay’s Blog

Summary:

Authors seek to create a system that automatically generates “near-misses” or alternatives to a LADDER description so that the shape designer can provide relevance feedback.  Want to handle the “conceptual errors” during LADDER description creation that result in an under- or over-constrained system.

Starts with an example shape drawn by the user, who then defines the constraints himself or has the system generate constraints for him (the system having heuristics to limit this down to a manageable list).  The system then keeps running lists of good, bad, and possible constraints based on positive and negative feedback.

The first examples shown are scale and rotation variants of the example shape. For over-constrained descriptions, it goes through each constraint and shows the designer an example shape with that constraint’s negation to determine if it’s needed or not. For under-constrained descriptions, it adds any additional constraints that fit and then tests whether they should be there with example shapes made from their negation.
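The over-constrained check can be sketched as a simple loop. This is a simplified illustration and not the paper's algorithm: it skips the hard part (actually solving for and drawing a shape that violates one constraint) and replaces the designer with a callback. The function name, the callback signature, and the constraint strings are hypothetical.

```python
def prune_overconstrained(constraints, designer_accepts):
    """Test each constraint by negating it and asking the designer about the result.

    constraints: list of constraint descriptions (hypothetical strings here).
    designer_accepts: callback taking (negated, others); returns True if a shape that
        violates `negated` but satisfies `others` is still a valid example of the shape.
    Returns (kept, dropped).
    """
    kept, dropped = [], []
    for c in constraints:
        others = [k for k in constraints if k != c and k not in dropped]
        if designer_accepts(c, others):
            dropped.append(c)  # still a valid shape without it: the constraint was not needed
        else:
            kept.append(c)     # the near-miss is a negative example: the constraint is required
    return kept, dropped

# Hypothetical arrow description; "horizontal shaft" over-constrains it
description = [
    "coincident shaft.p1 head1.p1",
    "coincident shaft.p1 head2.p1",
    "equalLength head1 head2",
    "horizontal shaft",
]
required = set(description[:3])

def simulated_designer(negated, others):
    # Stand-in for a person looking at the generated near-miss example
    return negated not in required

kept, dropped = prune_overconstrained(description, simulated_designer)
print(kept)
print(dropped)
```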

Author also discusses the steps for solving and generating a shape based on given constraints.

Discussion:

In this paper, the author presents work on a way to get relevance feedback from a shape designer to properly constrain a shape description in the LADDER language.  I’m also familiar with relevance feedback in the domain of information storage and retrieval, where a user marks documents or search results as good or bad.  There, it is usually done in a vector space or using a weighted function.  Here, it’s similar to a vector space comparison; you’re just going through each of the vector’s components one-by-one instead of marking an entire document and its representative vector as good or bad.

For a complex shape with lots of primitives, I wonder if the calculation of negative or possible constraints might have an adverse effect if the designer fails to notice the slight variation from one negated constraint.  Does this system only work for simple shapes (i.e. arrows, squares, etc.)?   Unfortunately, I couldn’t get the feature to work in my current version of LADDER.

Analysis of “What Are Intelligence? And Why?”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Introduction:

Author desires to return to the simple motives behind intelligence and thinking, claiming that evolution has led our bodies and our cognitive abilities to be “overdetermined” and “unnecessarily complex”.   He also sets up that thinking is indeed part abstract, but mostly it is visual in that we replay and simulate our experiences.  The author also explores why we need intelligence and if humans truly have a monopoly on it.

The fundamental elements of intelligence are: (1) prediction of future actions (imagining), (2) response to change instead of inalterable instinct or conditioned reflexes, (3) intentional, goal-oriented action, (4) reasoning based on collection of data.

AI has borrowed from five fields to obtain views on intelligent reasoning: (1) mathematical logic (everything is a result of first-order logic and can be captured in a formal description), (2) psychology (more a function of natural science), (3) biology, (4) statistics, and (5) economics.  Author also discusses social collective intelligence…

The author admits, citing others as well, that there are no hard facts on the evolution of human cognition.  Lays claim to nature making nothing but blind searches on what works and what doesn’t.  Presents different theories on why our brains grew over millions of years, whether it be to hunt better or to just get along with other humans.  Notes that we did not really adopt languages until our brain growth leveled off.  Makes a distinction between animal and human intelligence in that we can imagine while an animal is more of a “here and now character.”

Author claims that intelligence is merely a “natural phenomenon” of evolution, unable to explain why we obtained it outside the notion that we somehow moved past hunting and gathering.

Discussion:

Having just read the introduction, I can already tell I may not like this article.  I, in no way, believe we are “overdetermined” and “unnecessarily complex” beings made by “blind searches” of nature without thought or design consideration.

Following up, I am not of the belief that we are part of an uncontrolled, unguided chain reaction, meant only to survive and exist.  Instead I believe humans are the result of purposeful and intelligent design, given a life to live with meaning and set apart from the rest of creation, or “nature”, for such.  How much weight is continually put on discrepancies between our eye and that of an octopus or other animal when I’d rather you explain to me how a “blind search” made an eye that was able to see?

Where I do agree with the author is that human intelligence achieves much through visual interpretation and the reproduction of such through imagination. The discussion on this I did find interesting, especially to the application of recreating “intelligence” artificially, just as was done with the parrot example and similarly can be done with a computer.  What a gift human intellect and reasoning are!

Analysis of “Magic Paper: Sketch-Understanding Research”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

Have implemented a physics demo similar to (or based on) Dr. Hammond’s work.  Claim that cognitive science research says that people make more alternatives using hand-drawn sketch tools than a diagramming tool.

Must have a domain and lexicon for sketch recognition as unrestricted understanding is currently not possible (just like speech recognition).  Unlike speech recognition, you can go back and change something you previously made, so sketch recognition is non-chronological.

Using image-based techniques is useful for overtracing and filled-in shapes.

Discussion:

This is a good overview of a lot of the work we’ve read.  I’m curious to read the near-miss paper next.

Analysis of “Grouping Text Lines in Freeform Handwritten Notes”

Comments Made Elsewhere:

  1. Manoj’s Blog

Summary:


Discussed other attempts that borrowed from document image analysis techniques, but didn’t do well for freeform ink unless using non-scalable heuristics.  Describe their technique to use a global cost function that should be optimized for the partitions of strokes found.  Use linear regression and “horizontal and vertical compactness” (largest gaps between strokes in x and y) to find the “goodness of a line” of possible text. Use temporal info to group first.

Also calculate a “configuration consistency” for a recognized line of text by using a neighborhood graph between all the recognized lines of text without a non-text element between them.  Longer lines have more weight.  Whether it is configured correctly is a function of the summation of the length-weighted neighbors.  “Model complexity” is the number of lines in the partition.  The cost function is a summation of all of these.

Group initially by temporal data, then generate alternative “hypotheses” by merging line segments and accounting for “high configuration energy” (i.e. the dot of an ‘i’ or cross of a ‘t’ late in the game).

Iteratively groups recognized “lines” until the global function is optimized.
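As a rough illustration of the “goodness of a line” ingredients, the sketch below fits a regression line through stroke centers and measures the largest horizontal and vertical gaps between strokes. It only computes the raw quantities; how the paper weights and combines them into its cost function is not reproduced here, and the function names and bounding-box representation are assumptions.

```python
import math

def largest_gap(intervals):
    """Largest empty gap between consecutive (start, end) intervals along one axis."""
    ordered = sorted(intervals)
    gap, reach = 0.0, ordered[0][1]
    for start, end in ordered[1:]:
        gap = max(gap, start - reach)  # negative when intervals overlap, so gap stays >= 0
        reach = max(reach, end)
    return gap

def line_features(boxes):
    """Describe a candidate text line made of stroke bounding boxes.

    boxes: list of (xmin, ymin, xmax, ymax), one per stroke.
    Returns the regression slope, the RMS fit error, and the largest x and y gaps.
    """
    xs = [(b[0] + b[2]) / 2 for b in boxes]
    ys = [(b[1] + b[3]) / 2 for b in boxes]
    n = len(boxes)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    rms = math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n)
    return {
        "slope": slope,
        "fit_error": rms,
        "h_gap": largest_gap([(b[0], b[2]) for b in boxes]),
        "v_gap": largest_gap([(b[1], b[3]) for b in boxes]),
    }

# Hypothetical strokes: two close together and one far to the right on the same baseline
print(line_features([(0, 0, 10, 10), (12, 1, 20, 11), (50, 0, 60, 10)]))
```

A large horizontal gap like the one in this example is the kind of signal that might suggest the far stroke belongs to a different line or column.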


Discussion:

They admit to hand tuning their parameters for their testing set, but overall their process is very good (i.e. start with temporal data, group similar neighbors, and account for unique cases).  Hopefully we can borrow some of their intuition.

Analysis of “Perceptually-Supported Image Editing of Text and Graphics”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Seek to create a scanned-document image editing application, particularly for doodles and sketches.  Present new ways for segmenting and selecting elements.  One mechanism is a combination rectangle/lasso selection that is inferred from the user’s mouse movement.  The selection becomes a new image.  All images can be grouped on the parent image, but not in a hierarchy (allows multiple groupings).  Subsequent clicks on an image will cycle through the groups.

Also create an algorithm for keying out the background automatically.  Discuss how confusing it was to “anchor” images when a user wanted to make a subselection.

Uses “perceptual organization” principles for fragmenting into strokes and blobs and then grouping together.  Strokes based on curvilinear smoothness and closure.  Blobs based on spatial proximity and curvilinear alignment to look like words.  Uses this for automatic segmentation as invoked by the user.

Are working on an interface that just uses a pen next.

Discussion:

Their combination rectangle/lasso idea is a good one.

I have no problem with the checkerboard pattern to signify transparency.  It gives a sense of depth, that you can see past the opaque parts.

I’m also interested in how to seamlessly switch between editing/drawing pen input.  Without an explicit modal switching device, all you have is time, gesture, pressure, and angle. No one wants to perform “holds” or start a stroke with a certain segment in order to get to the correct mode. Perhaps by tapping an arrangement of dots?  But then why not give them a button?

Good paper, though, and I’m always a fan of cool UI techniques.

Analysis of “Sketch Recognition for Computer-Aided Design”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Seeks to determine user intent as a function of sequence, speed, and pressure.  Corners are recognized by the pen slowing down and are placed at the intersection of the two lines.   Curves are fitted to b-splines.   Do overtracing determination to remove excess lines.  Do latching to snap points and remove excess points, with a snapping radius determined by the speed/length/density. Could also use these to distinguish users. Discussion on the implications of the user providing feedback to the system and interacting with the system.
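The “corners are where the pen slows down” idea can be sketched as below. This is a minimal illustration under my own assumptions, not the paper's method: the function name, the sample format, and the rule “a local speed minimum below some fraction of the mean speed” are all made up for the example.

```python
import math

def slow_segments(samples, ratio):
    """Find segments where the pen slows down noticeably.

    samples: list of (t, x, y) tuples in time order.
    ratio: a segment counts as slow if its speed is below ratio * mean speed (assumed rule).
    Returns indices i, where index i refers to the segment from sample i to sample i + 1.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    if not speeds:
        return []
    mean = sum(speeds) / len(speeds)
    found = []
    for i in range(1, len(speeds) - 1):
        is_local_min = speeds[i] <= speeds[i - 1] and speeds[i] <= speeds[i + 1]
        if is_local_min and speeds[i] < ratio * mean:
            found.append(i)
    return found

# Hypothetical L-shaped stroke: fast to the right, slow near the corner, fast upward
stroke = [(0, 0, 0), (1, 10, 0), (2, 20, 0), (3, 21, 0), (4, 21, 10), (5, 21, 20)]
print(slow_segments(stroke, ratio=0.5))
```

A fuller version would then intersect the lines fitted on either side of the slow segment to place the corner, as the summary describes.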

Discussion:

Good paper on pulling out user intent by speed and sequence.  No real comments.

Analysis of “Template-based Online Character Recognition”

Comments Made Elsewhere:

  1. Yuxiang’s Blog

Summary:

Seek to allow for writer-independent character recognition by creating a “class” of templates for each writer and then training a classifier on these classes.  Do character recognition based on the templates after data reduction.  Discuss the two different classifiers they used.

Other attempts: primitive decomposition (break into dots, arcs, loops, etc. and recognize characters using dictionary lookup or HMM), motor models (try to simulate movement of the hand?), elastic matching (“featureless” matching of points to points), stochastic models (extract features from points or sliding window of points and use HMM), time delay neural networks.

Preprocessing: Do resampling to make the stroke points equidistant and then apply a Gaussian filter to each coordinate. End points and points of high curvature are preserved.  Scaled to be the same height but still have the same aspect ratio.
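A minimal sketch of the first two preprocessing steps (equidistant resampling, then Gaussian smoothing of each coordinate) might look like this. The function names, the spacing, and the filter parameters are assumptions, and for brevity only the end points are held fixed; preserving high-curvature points, as the paper does, is not shown.

```python
import math

def resample(points, spacing):
    """Resample a polyline so consecutive output points are `spacing` apart along the path."""
    out = [points[0]]
    carried = 0.0  # path length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        d = spacing - carried  # distance into this segment of the next output point
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carried = seg - (d - spacing)
    return out

def gaussian_smooth(values, sigma, radius):
    """Smooth a 1-D sequence with a truncated, renormalized Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    out = []
    for i in range(len(values)):
        acc = wsum = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # ignore kernel taps that fall off the ends
                acc += w * values[j]
                wsum += w
        out.append(acc / wsum)
    return out

def smooth_stroke(points, sigma=1.0, radius=2):
    """Smooth x and y independently, keeping the two end points where they were."""
    xs = gaussian_smooth([p[0] for p in points], sigma, radius)
    ys = gaussian_smooth([p[1] for p in points], sigma, radius)
    smoothed = list(zip(xs, ys))
    smoothed[0], smoothed[-1] = points[0], points[-1]
    return smoothed

# Hypothetical jittery stroke
raw = [(0, 0), (3, 1), (4, -1), (8, 0.5), (10, 0)]
print(smooth_stroke(resample(raw, spacing=1.0)))
```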

Representation: Strokes listed as x, y, and theta of curvature.  Each stroke is represented as a sequence of events, and the distance between two strokes is the sum of distances between their two sequences of events.  When two strokes, or sequences of events, don’t have the same number, penalties are added to the distance calculated, ending up with four different equations for the distance metric and going with the minimum of the four.

Data Reduction: (1) Cluster the “featureless” characters into some set K amount of clusters, each technically representing a writing style, and then take the medoid of the cluster. (2) Use a nearest neighbor calculation to select all examples on the edge of the training set.

Classification: (1) Used nearest neighbor again, or (2) constructed a decision tree based on a vector of the distances from each reference character.  Each of these “similarity features” (aka a comparison to a reference character) is then used as a node on the decision tree?  They expand this to a “difference feature” by comparing to reference characters at each node to determine which one the input character is more like, and produce better classification.
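To make the template idea concrete, here is a small sketch that picks the medoid of a cluster as its template and then classifies a new stroke by nearest neighbor. The distance function is a deliberately simple stand-in (mean point-to-point distance between equal-length strokes), not the paper's four-equation elastic metric, and all names and sample strokes are hypothetical.

```python
import math

def stroke_distance(a, b):
    """Stand-in distance: mean point-to-point distance between equal-length strokes."""
    return sum(math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def medoid(cluster, dist):
    """The cluster member with the smallest total distance to all other members."""
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))

def nearest_label(sample, templates, dist):
    """templates: list of (label, stroke) pairs. Returns the label of the closest template."""
    return min(templates, key=lambda t: dist(sample, t[1]))[0]

# Hypothetical cluster of three vertical strokes for the letter 'l'
l_cluster = [
    [(0, 0), (0, 1), (0, 2)],
    [(1, 0), (1, 1), (1, 2)],
    [(5, 0), (5, 1), (5, 2)],
]
templates = [
    ("l", medoid(l_cluster, stroke_distance)),
    ("-", [(0, 1), (1, 1), (2, 1)]),
]
print(nearest_label([(0.9, 0), (1.1, 1), (1.0, 2)], templates, stroke_distance))
```

Keeping only one medoid per cluster is what makes the template set small; the decision-tree variant would instead feed the vector of distances to each template into a tree.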

Discussion:

I think their preprocessing steps would be quite useful in text-vs-shape distinction.

On page 13 second paragraph, when are they recognizing that multiple strokes belong to a character?  I assume this is for input strokes.

I’m not a fan of clustering algorithms that require an a priori number of clusters to find.  I know this is a hard problem, but I know there are other algorithms out there that take a more bottom-up approach and segment the clusters themselves.

I don’t understand why they’d want to only select the templates on the edge of the training set.  Is it because they are the most extreme cases?

It’s interesting to see another paper using a decision tree for classification of text.  Maybe the feature space really is that small.
