Analysis of “Backpropagation Applied to Handwritten Zip Code Recognition”

Comments Made Elsewhere:

  1. Nabeel’s Blog

Summary:

The task is recognizing zip codes written on letters.  The digits are first located and segmented, then a linear transformation puts each one into a normalized form.  The paper discusses extracting “local features” in order to build higher-order ones.  These local features can appear anywhere, so only their approximate position matters, not the precise one.  Detection is done through “weight sharing” (don’t understand it just yet).

The “network” architecture consists of three hidden layers, which they call H1, H2, H3.  H1 and H2 each have 12 groups.  For H1, each group is made up of 64 (8×8) units, or “hidden units”, and each group forms one feature map.  Each unit looks at a 5×5 neighborhood of the input.  The image is undersampled because they just want to detect the presence of features, not the precise position?  Within a feature map, every unit performs the same operation, sharing the same 25 weights, but each unit has its own bias (threshold).  Each feature map has a different set of 25 weights.
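To make the weight sharing concrete for myself, here is a minimal sketch of a single H1 feature map.  It is not the paper’s code.  Assumptions of mine that are not in the summary above: the normalized image is 16×16, the 5×5 window moves 2 pixels at a time (which is what undersampling down to 8×8 would need), pixels outside the image count as background (-1), and the unit nonlinearity is a plain tanh.  The name feature_map is hypothetical.

```python
import math
import random

def feature_map(image, kernel, biases, stride=2, background=-1.0):
    """Slide one shared kernel over the image; each output unit adds its own bias."""
    size = len(image)      # 16 in this sketch (assumed input size)
    k = len(kernel)        # 5, so 25 shared weights
    out = len(biases)      # 8, so 64 units in the map
    # Border handling is an assumption: centre the windows, treat outside as background.
    offset = ((out - 1) * stride + k - size) // 2

    def pixel(r, c):
        if 0 <= r < size and 0 <= c < size:
            return image[r][c]
        return background

    result = []
    for i in range(out):
        row = []
        for j in range(out):
            total = biases[i][j]                  # bias is per unit, not shared
            for di in range(k):
                for dj in range(k):
                    # the same kernel[di][dj] is reused at every (i, j)
                    total += kernel[di][dj] * pixel(i * stride + di - offset,
                                                    j * stride + dj - offset)
            row.append(math.tanh(total))
        result.append(row)
    return result

rng = random.Random(0)
image = [[rng.choice([-1.0, 1.0]) for _ in range(16)] for _ in range(16)]
kernel = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(5)]
biases = [[0.0] * 8 for _ in range(8)]
h1_map = feature_map(image, kernel, biases)
```

Under these assumptions one feature map has 25 shared weights plus 64 biases, so the same feature detector is applied at all 64 positions while only 89 numbers are learned.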

For H2, each group is only 16 units (4×4).  Each unit takes in “connections” from 8 of the 12 feature maps from H1, so it has eight 5×5 neighborhoods.

H3 has 30 units and is “fully connected” to H2.  H3 in turn connects to the output layer.  In all, there are 9760 independent parameters.
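As a sanity check on the 9760 figure, the count can be rebuilt from the layer sizes above.  This is my own arithmetic, not the paper’s, and it assumes things the summary does not state outright: H2 shares its 5×5 kernels within each group the same way H1 does, every unit in every layer has its own bias, and the output layer has 10 units (one per digit) fully connected to H3.

```python
# Layer sizes taken from the summary; variable names are mine.
maps = 12                 # groups in H1 and in H2
h1_units = 8 * 8          # units per H1 feature map
h2_units = 4 * 4          # units per H2 feature map
kernel_weights = 5 * 5    # weights in one 5x5 neighborhood
h2_inputs = 8             # H1 maps feeding each H2 map
h3_units = 30
output_units = 10         # assumption: one output per digit

# shared kernel weights + one bias per unit
h1_params = maps * kernel_weights + maps * h1_units
h2_params = maps * h2_inputs * kernel_weights + maps * h2_units
# fully connected layers: one weight per input + one bias per unit
h3_params = h3_units * (maps * h2_units) + h3_units
out_params = output_units * h3_units + output_units

total_params = h1_params + h2_params + h3_params + out_params
print(h1_params, h2_params, h3_params, out_params, total_params)
# prints: 1068 2592 5790 310 9760
```

The total matching the paper’s number suggests these assumptions are at least consistent with it.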

They trained the network with a backpropagation simulator.  All weights are randomly initialized, and I assume training tries to reduce the output error (mean squared error) on each iteration?
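To check my guess about what “optimizing mean squared error on each iteration” would look like, here is a minimal sketch of repeated gradient steps on a single tiny layer of tanh units.  It uses plain gradient descent as a stand-in, which may not be what their simulator actually does, and the layer size, learning rate, input, and target values are all made up for illustration.

```python
import math
import random

def forward(weights, biases, x):
    """Outputs of one layer of tanh units."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def mse(y, target):
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target)) / len(y)

def gradient_step(weights, biases, x, target, learning_rate=0.05):
    """Update weights and biases in place; return the error before the update."""
    y = forward(weights, biases, x)
    n = len(y)
    for i in range(n):
        # d(mse)/d(pre-activation of unit i), using tanh'(a) = 1 - tanh(a)^2
        delta = (2.0 / n) * (y[i] - target[i]) * (1.0 - y[i] ** 2)
        for j in range(len(x)):
            weights[i][j] -= learning_rate * delta * x[j]
        biases[i] -= learning_rate * delta
    return mse(y, target)

rng = random.Random(0)
x = [0.5, -1.0, 1.0, 0.25]        # made-up input
target = [1.0, -1.0]              # made-up desired outputs
weights = [[rng.uniform(-0.1, 0.1) for _ in x] for _ in target]  # random init
biases = [0.0, 0.0]

errors = [gradient_step(weights, biases, x, target) for _ in range(20)]
final_error = mse(forward(weights, biases, x), target)
```

Each call nudges the weights against the error gradient, so the error after 20 steps ends up below where it started.  In the full network the same delta would be passed back through H3, H2, and H1 by the chain rule.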

They found segmentation to be a major problem, along with ambiguous patterns and writing styles not seen in the training set.  Overall, they successfully applied backpropagation learning to a “large, real-world task”.

Discussion:

I’m not fully grasping this paper and will have to discuss this in class to complete my understanding.
