Episodic Memory of Weizmann Edge snippets

This page describes the performance of a 2-level (non-hierarchical) version of Sparsey on 70 of the 90 Weizmann snippets, which have been preprocessed using simple binary edge filtering (applied within a bounding box (BB) around the human actor).  The video below shows the binary edge input (produced from the mask files) for two of the 90 Weizmann snippets (daria_bend.avi and daria_jack.avi).  We also decimated the originals in time, from about 80 frames to about 30.  The same frame size, 84x120, is used for all 10 Weizmann classes.
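For concreteness, here is a minimal sketch of this kind of preprocessing. It is not the actual pipeline used here; the function name, the morphological edge operator, and the frame stride are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import ndimage

def preprocess_snippet(mask_frames, stride=3):
    """Toy version of the preprocessing described above (assumed details):
    binary edge filtering of the actor mask within its bounding box,
    plus temporal decimation (~80 frames -> ~30 via a fixed stride)."""
    out = []
    for mask in mask_frames[::stride]:          # temporal decimation
        mask = mask.astype(bool)
        if not mask.any():
            continue
        # Bounding box around the actor (rows/cols containing mask pixels).
        rows = np.where(mask.any(axis=1))[0]
        cols = np.where(mask.any(axis=0))[0]
        bb = mask[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
        # Simple binary edge: mask minus its erosion (silhouette boundary).
        edges = bb & ~ndimage.binary_erosion(bb)
        out.append(edges.astype(np.uint8))
    return out
```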

Below, we see the Sparsey model (with ~13 million binary wts) being reactivated during a test presentation of another of the edge videos, "daria_walk", which was presented once during learning (as were all the sequences, i.e., single-trial learning). There are many things to note in this video. At the most summary level, note that the model has assigned a sequence of 13 sparse codes, which are chained together via Hebbian learning (green "horizontal" (H) weights). Each sparse code occurs in one of the 20 macrocolumns (macs); a mac can be active multiple times, including consecutively. Each code (the set of colored cells in the mac [black: correct, red: incorrectly active, green: incorrectly inactive]) is chosen as a function of the simultaneous input arriving via the H wts and the bottom-up (U) information from the active input "pixels" (blue wts), i.e., as a function of a spatiotemporal similarity computation.  Also note that we show the afferent active U (blue) and H (green) wts for only one cell on each frame.  When the unfolding trace is highly accurate (as in this example, i.e., most of the cells are black), all eight winners in a mac will have highly correlated patterns of U and H input (as seen in the 3rd video on this page).
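As a rough illustration of that spatiotemporal similarity computation, the sketch below picks one winner per competitive module (CM) from the summed U and H evidence. This is deliberately simplified: a hard winner-take-all with an assumed mixing weight, not Sparsey's actual code selection algorithm (which also modulates randomness in the choice according to the overall familiarity of the input).

```python
import numpy as np

def choose_mac_code(u_input, h_input, n_cms, cells_per_cm, h_weight=1.0):
    """Simplified code selection for one mac (assumed form, not Sparsey's CSA).

    u_input, h_input: length (n_cms * cells_per_cm) vectors holding the summed
    bottom-up (U) and horizontal (H) evidence for each cell in the mac.
    Returns the indices of one winning cell per competitive module (CM)."""
    combined = u_input + h_weight * h_input           # spatiotemporal evidence
    combined = combined.reshape(n_cms, cells_per_cm)  # one row per CM
    winners_in_cm = combined.argmax(axis=1)           # hard WTA within each CM
    return winners_in_cm + np.arange(n_cms) * cells_per_cm
```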

The supervised learning is accomplished by increasing the weights from the last code active in the sequence onto a field of class nodes (not seen in this figure).
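A minimal sketch of that supervised step, and of how a test snippet would then be classified, follows; the class-node field, the binary weight increase, and the argmax readout are assumptions for illustration only.

```python
import numpy as np

N_CELLS, N_CLASSES = 1120, 10
class_wts = np.zeros((N_CLASSES, N_CELLS), dtype=np.uint8)  # binary class wts

def learn_class(last_code, label):
    """Increase (set) the weights from the sequence's final active code
    onto the node for its class (assumed form of the supervised step)."""
    class_wts[label, last_code] = 1

def classify(last_code):
    """Read out the class whose node receives the most input from the code."""
    return int(class_wts[:, last_code].sum(axis=1).argmax())
```

Here `last_code` would be the array of cell indices active in the final code of the trace.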

RESULT: We presented all 90 snippets once each (a total of 1,722 frames), resulting in 90 spatiotemporal sparse distributed code (SDC) memory traces like the one shown. Classification accuracy was 99% (89 out of 90).  Overall memory trace accuracy was 87%. Training time was 290 sec.  Note that while the model has ~12.5 million wts (U and H combined), only relatively small percentages of the wts afferent to most internal units (of which there are 20 x 7 x 8 = 1,120) are actually increased during learning, e.g., ~15-25% for H and ~5-10% for U. This suggests that this particular model could probably store many more such input sequences while still achieving high trace (and classification (train=test)) accuracy.
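As a sanity check on those sizes, the arithmetic below reproduces the cell count and approximately reproduces the weight count, under the assumption of full U connectivity from the 84x120 input and full H connectivity among the internal cells; the model's actual connectivity may differ.

```python
MACS, CMS_PER_MAC, CELLS_PER_CM = 20, 7, 8
FRAME_H, FRAME_W = 84, 120

cells = MACS * CMS_PER_MAC * CELLS_PER_CM   # 1,120 internal cells
u_wts = FRAME_H * FRAME_W * cells           # 11,289,600 U wts (full connectivity assumed)
h_wts = cells * cells                       # 1,254,400 H wts (full connectivity assumed)
print(cells, u_wts + h_wts)                 # 1120, 12,544,000  ->  ~12.5 million wts
```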

However, as discussed at the bottom of this page, this non-hierarchical model will not generalize well: its parameters have been optimized to maximize episodic memory capacity.  We are now moving on (back) to testing hierarchical models, to obtain a good "train != test" (i.e., standard classification test) result.

Again, there are numerous sources of possible speed-up, which we will explore. Notably, if you look at the input snippet above, you can see that each frame is extremely sparse. We suspect that these Weizmann classes could be learned from spatially much coarser inputs. So rather than 84x120 pixels, we are preparing 42x60 versions (using blurring, rescaling and re-binarization).  This alone would speed up the processing by at least 10x.
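A sketch of that coarsening step is shown below; the blur width, the resampling call, and the re-binarization threshold are assumptions, not the settings we will actually use.

```python
import numpy as np
from scipy import ndimage

def coarsen(frame_84x120, sigma=1.0, thresh=0.25):
    """Blur, rescale 84x120 -> 42x60, and re-binarize (assumed parameters)."""
    blurred = ndimage.gaussian_filter(frame_84x120.astype(float), sigma=sigma)
    small = ndimage.zoom(blurred, 0.5, order=1)   # 84x120 -> 42x60
    return (small > thresh).astype(np.uint8)
```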

The model that achieved the above results limited the fraction of a mac's U wts increased over the training set to 25%. We noticed that several of the macs' U matrices were hitting that saturation limit, so we tested a model which set the limit to 45%. That model also had fewer cells, Q=7 CMs per mac and K=6 cells per CM, for a total of 840 cells and ~9.5 million wts. It attained 97% class accuracy (87 out of 90) and 87% recognition trace accuracy, and training time was 303 sec (we are not sure why this smaller model (9.5 million vs. 12.5 million wts) took slightly longer to run).  Also, we know that we could have made several small parameter changes that would have brought class accuracy very close to 100%, and again, we expect speed increases of several orders of magnitude are easily possible with standard software optimizations and by minimizing the input size. Remember that Sparsey does not need to compute any gradients.
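To make the saturation limit concrete, here is one hypothetical way of tracking it; representing a mac's U wts as a binary matrix and gating further learning at the cap are assumptions for illustration, not Sparsey's actual mechanism.

```python
import numpy as np

def u_saturation(u_wts_mac):
    """Fraction of a mac's binary (0/1) U weights that have been increased."""
    return u_wts_mac.mean()

def may_learn_u(u_wts_mac, limit=0.25):
    """Hypothetical gate: allow further U-weight increases in this mac
    only while the increased fraction is below the limit (25% or 45%)."""
    return u_saturation(u_wts_mac) < limit
```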