Actual vs. Restricted Views of Input Features (Pixels) for all Macs at all Levels

Each row of this montage shows the actual features experienced by the macs at all levels for a particular 8-frame 16x16 synthetic moving 2-segment arm. Each column shows the actual features experienced for a particular V1 mac activation criterion setting. In column 1, the criterion was that the number of active pixels in the V1 mac's aperture be between 3 and 4, inclusive (π-=3, π+=4). In column 2, π-=4, π+=4, and in column 3, π-=4, π+=5. For all other macs, π-=1, π+=4, though for these other macs, at V2 and V3, the units for those criteria are not pixels but rather active afferent subjacent macs.
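The activation criterion described above amounts to a simple inclusive bounds check on the count of active afferents. The sketch below is purely illustrative (the function name and signature are ours, not the actual implementation):

```python
def mac_is_active(num_active_afferents: int, pi_minus: int, pi_plus: int) -> bool:
    """Return True if the mac's activation criterion is met, i.e., the
    number of active afferents (pixels for V1 macs, active subjacent macs
    for V2/V3 macs) falls within [pi_minus, pi_plus], inclusive."""
    return pi_minus <= num_active_afferents <= pi_plus

# Column 1 setting for V1 macs: pi- = 3, pi+ = 4
print(mac_is_active(3, 3, 4))  # True: 3 active pixels satisfies the criterion
print(mac_is_active(5, 3, 4))  # False: too many active pixels
```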

The main point of this montage is to show how the information actually being coded by the different macs, particularly at higher levels, varies with the π parameters. (For clarity, we do not show the neurons in these macs and thus do not show the codes active in the macs. The second figure below shows the active codes that occurred in the trial of row 4, column 1 of the top montage.) In each row, compare the actual pixels shown in any given mac across the three columns; use the slider control to focus on any one particular frame. The difference is most visible for the V3 (top) mac. In general, more pixels are shown in V3 in column 1 than in columns 2 and 3. This means that the version of the input spatiotemporal concept (or the specific spatiotemporal pattern, if you prefer) actually encoded in the V3 mac is much richer in column 1 than in the other two columns. The same is generally true for the V2 macs. Consequently, more weights are being increased in the network of column 1.

The actual spatiotemporal patterns for which codes are assigned in a mac constitute that mac's basis for representing its (spatiotemporal) input space, i.e., its receptive field (RF). Ultimately, we want to understand how rich the stored patterns (basis elements) need to be for a mac to recognize all future inputs to its RF with a given level of accuracy. But the optimization process corresponding to this question plays out simultaneously and interdependently among all macs at all levels. Analysis is therefore very difficult, and at this stage we are forced to search large regions of parameter space.

The experiments producing these traces used 14 such snippets, with identical train and test sets. The recognition accuracy (RA) for the column 2 and column 3 experiments was in the mid-90% range; for the column 1 experiment, it was about 70%. There are numerous parameters involved here, so we cannot draw conclusions yet. For one thing, one general parameter tuning principle should be that if we widen the mac activation criteria at level J (which will cause level J macs to become active more often, code more events, and therefore use up their representational capacity faster), then we should tighten the criteria at level J+1, and vice versa. With respect to this montage, we would need to cross these three V1 conditions with variations in the mac activation criteria at levels V2 and V3. It is really a combinatorial space that we need to explore, and we are only just beginning to do so.
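To make the combinatorial nature of this search concrete, the sketch below enumerates joint (π-, π+) settings across the three levels. The V1 settings are the three columns above; the V2 and V3 ranges are hypothetical, chosen only to show how quickly the grid grows:

```python
from itertools import product

# The three V1 settings correspond to the three columns of the montage.
v1_settings = [(3, 4), (4, 4), (4, 5)]
# Hypothetical (pi-, pi+) variations for V2 and V3; actual ranges would differ.
v2_settings = [(1, 3), (1, 4), (2, 4)]
v3_settings = [(1, 3), (1, 4), (2, 4)]

# Every joint condition is one full train/test experiment.
grid = [cond for cond in product(v1_settings, v2_settings, v3_settings)
        if all(lo <= hi for lo, hi in cond)]  # keep only valid pi- <= pi+ pairs
print(len(grid))  # 27 experiments from just three choices per level
```

Even three values per parameter at three levels yields 27 experiments, and the real space includes many more parameters than the π bounds alone.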

The figure below shows the actual spatiotemporal, hierarchical memory traces that occurred in the test trial of row 4, column 1 above. Black cells are correct activations, red cells are incorrect activations, green cells are incorrect non-activations, and gray indicates that a cell was active on the prior frame. A gray/black split means the cell was active on the prior frame and is currently active.