Hierarchical Spatiotemporal Sparse Distributed
Memory Trace of a Sequence

  1. This animation shows the progression through time of a hierarchical temporal SDR memory trace representing the input sequence, “BALL”. NOTE: This figure is unfolded in time: the blue (horizontal) links are actually recurrent. A main point of this figure is to suggest that the three input sources to a level, bottom-up (U), horizontal (H), and top-down (D), combine, on each time step, to determine the code that becomes active at that level.  NOTE, in particular, that the L2 codes each last ("persist") for two time steps and the L3 code persists for all four time steps.
  2. This model has an input level, which uses a localist representation: a 4x4 grid of binary pixels. Each internal level consists of one macrocolumn ("mac"), which is composed of a set of WTA minicolumns.  The L1, L2, and L3 macs consist of 12, 9, and 6 minicolumns, respectively (a minimal configuration sketch follows this list).  The figure shows the temporal trace unfolded (unrolled) in time.  Thus this is a recurrent model, not a model in which time is explicitly spatialized, e.g., Waibel et al.'s Time Delay Neural Network.
  3. Blue arrows show the propagation of H signals via the recurrent H synaptic matrix in each internal level.  These signals are shown originating from a code active at t and arriving back at the cells of the same mac at t+1.  That is, the figure is unfolded in time.
  4. Magenta arrows show propagating D signals, originating from the superjacent code active on the same time step.
  5. This figure serves equally well as a picture of the initial formation (learning) of the trace in response to “BALL” and as a picture of the reinstatement (retrieval) of a previously learned trace of “BALL”.
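
The following minimal Python sketch summarizes the architecture described in item 2. It only records the sizes and persistences stated in the text; the units-per-minicolumn value and the L1 persistence are placeholder assumptions, and none of the names correspond to an actual implementation.

    from dataclasses import dataclass

    @dataclass
    class Mac:
        """One macrocolumn ("mac"): a set of winner-take-all (WTA) minicolumns.
        The mac's active code is one winning unit per minicolumn."""
        num_minicols: int        # number of WTA minicolumns in the mac
        units_per_minicol: int   # competing units per minicolumn (assumed value below)
        persistence: int         # time steps a chosen code remains active

    # Input level: localist 4x4 grid of binary pixels (no mac, no competition).
    INPUT_SHAPE = (4, 4)

    # Internal levels. Minicolumn counts and the L2/L3 persistences come from the
    # text; units_per_minicol and the L1 persistence are illustrative assumptions.
    L1 = Mac(num_minicols=12, units_per_minicol=8, persistence=1)
    L2 = Mac(num_minicols=9,  units_per_minicol=8, persistence=2)
    L3 = Mac(num_minicols=6,  units_per_minicol=8, persistence=4)
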
  • You can see the forward (upward) sweep of activation on each time step. In the real cortex, there may be 7-10 levels. We envision that each level does its processing in a gamma cycle and that the theta cycle corresponds to one sweep through all levels.
  • The processing that takes place at each level is the evaluation of the total input (U, D, and H signals), calculation of the familiarity, G, of that total (context-dependent) input, and the choice of a code to become active to represent that total input. See my 2010 and 2014 papers for descriptions of the processing; a simplified sketch of this per-step computation follows this list.
  • The pictorial convention is that at the beginning of each new time step, the whole network appears showing the codes that were active at the end of the prior time step. As the signals propagate and the code selection algorithm executes at each level, the code may be changed, in which case you will see the new code become active while the prior code fades out. On the other hand, if the age of the code at a given level is less than its persistence, it remains active. This is shown as a momentary pulsing (blinking) of the code. Again, the longer persistence of a code at level J+1 is what allows that code to become associatively linked with multiple (here, two) successive codes at level J.
  • Note that each L2 code remains active for two time steps and the L3 code remains active for all four time steps.
  • Note, in particular, the convergence of U, D, and H signals at each internal level (only U and H at the top) on each time step. Each unit in a mac normalizes these inputs separately and then multiplies (possibly nonlinearly transformed versions of) them to determine the unit's overall local degree of support.
  • As suggested by this animation, the formation (learning) of a memory trace is essentially single-trial (one-shot). When presented with a novel sequence, the model will detect low familiarity (G~0.0) on each time step at each level. In this case, codes will be selected at random and full-strength associative connections will be made between them. When presented with a familiar sequence, the model will detect high familiarity (G~1.0) on each time step and at each level, which minimizes the amount of randomness in the code selection process, allowing the deterministic influence of prior learning to dominate code selection and, with high probability, reinstate the previously learned codes.
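
The Python sketch below illustrates, in highly simplified form, the per-time-step processing described in the bullets above: normalize the U, H, and D inputs separately, multiply them to obtain each unit's local support, compute a familiarity value G, and let G control how much randomness enters the per-minicolumn WTA choice. It is not the algorithm of the 2010/2014 papers; the particular normalization, the form of G, and the gain constant are all assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def choose_code(u, h, d):
        """u, h, d: raw input summations to every unit of a mac,
        each of shape (num_minicols, units_per_minicol)."""
        # 1. Normalize each input source separately to [0, 1].
        def norm(x):
            m = x.max()
            return x / m if m > 0 else np.zeros_like(x)
        u, h, d = norm(u), norm(h), norm(d)

        # 2. Multiplicative combination: each unit's local degree of support.
        support = u * h * d

        # 3. Familiarity G (illustrative form): average over minicolumns of the
        #    maximum support in each minicolumn. G ~ 1 for a familiar input,
        #    G ~ 0 for a novel one.
        G = support.max(axis=1).mean()

        # 4. G sets the softness of the WTA choice: high G -> nearly deterministic
        #    (reinstates the learned code); low G -> nearly uniform (random code
        #    for a novel input). The gain of 20.0 is an assumed constant.
        beta = 20.0 * G
        probs = np.exp(beta * support)
        probs /= probs.sum(axis=1, keepdims=True)

        # 5. One winner per minicolumn: the mac's active SDR code this time step.
        winners = np.array([rng.choice(len(p), p=p) for p in probs])
        return winners, G

During learning, once the winners are chosen, full-strength associative connections would be made from the currently active U, H, and D sources onto the winning units, which is what makes trace formation single-trial.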