Hexagonal Overlapped Receptive Fields and the Features They Recognize

Figure 1 below shows features represented in the receptive fields (RFs) of a subset of V1 macs across the frames of two 20-frame snippets from the KTH database, one of a person boxing (left) and one of a person jogging (right). The 11-level Sparsey network in this case was trained on the depicted boxing snippet and two other 20-frame boxing snippets; the jogging snippet is therefore a novel test snippet. Each RF is drawn as a cyan patch of pixels. When pixels in a patch turn black, the associated V1 mac (i.e., the mac whose RF is that patch) is actively representing the feature. The purpose of Figure 1 is simply to show the kinds of input patterns (features) that V1 macs typically learn. In each frame, one can also see, in light gray, the other pixels active on the frame that are not encoded by the V1 macs whose RFs are shown. Note, however, that many other V1 macs are active on most of these frames; we do not show their RFs because the image would become too cluttered with overlapping RFs. Thus, on each frame, almost all active pixels are represented by at least one V1 mac. The V1 macs in this figure were selected so that their RFs would not overlap much: the darker-shaded pixels show the few cases of overlap. More significant overlap is apparent in the following figures.
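The coverage and overlap described above can be made concrete with a small sketch. It assumes each RF is available as a boolean pixel mask and each frame as a boolean array of active pixels; the function name `rf_coverage` and the array layout are hypothetical and are not taken from the Sparsey code. The masks here are arbitrary shapes, not the hexagonal patches used in the figures.

```python
import numpy as np

def rf_coverage(rf_masks, active_pixels):
    """Summarize how a set of RFs covers one frame.

    rf_masks:      bool array, shape (n_macs, H, W), one mask per RF (assumed layout)
    active_pixels: bool array, shape (H, W), True where the frame has an active pixel
    Returns (counts, overlap, covered_fraction).
    """
    masks = np.asarray(rf_masks, dtype=bool)
    active = np.asarray(active_pixels, dtype=bool)

    # Number of RFs that include each pixel.
    counts = masks.sum(axis=0)

    # Pixels lying in two or more RFs (the darker-shaded pixels in the figure).
    overlap = counts >= 2

    # Fraction of active pixels that fall inside at least one RF.
    n_active = int(active.sum())
    covered = int((active & (counts >= 1)).sum())
    covered_fraction = covered / n_active if n_active else 0.0

    return counts, overlap, covered_fraction
```

Applied to the masks of all V1 macs rather than the displayed subset, `covered_fraction` would be the quantity behind the statement that almost all active pixels are represented by at least one V1 mac.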

Figure 2 shows features represented in the receptive fields (RFs) of a subset of V2 macs for the same two snippets. In this case, the RFs shown correspond to different sets of V2 macs in the two videos (possibly with one or two macs in common). Note the larger RFs of the V2 macs and the features, which are on average slightly more complex. The boxer's legs don't move much across these frames; hence many nearly straight vertical features appear. The same is true for macs at the other levels as well. If you hunt around in Figures 1 and 2, you can find instances in which a straight or low-curvature edge translates within a single RF over 2-3 successive frames. The code active in the relevant mac on the last such frame, t (codes not shown here), would represent the spatiotemporal feature leading up to t. You can use the video controls to stop and step through the frames more slowly (use ctrl- to step back/forward by one frame at a time).

Figure 3 shows features represented in the receptive fields (RFs) of a subset of V3 macs for the same two snippets. Again, the RFs shown correspond to different sets of V3 macs in the two videos. The RFs are larger still and the features increasingly complex. Although some of the features shown activating here look somewhat regular, i.e., reminiscent of the kinds of features that would be hand-engineered a priori, many look much more irregular and arbitrary. Moreover, only the features seen activating in the boxing video will have been learned by (stored in) the relevant macs, because the jogging snippet was not used to train the model used here. To see all the features stored in a mac whose RF is depicted, we would need to present all the snippets that were used to train the model. We are currently developing various functions for showing condensed visualizations of a mac's set of stored features (i.e., its basis). In general, we will see that the stored features (basis vectors), which are in general spatiotemporal, mostly look irregular and arbitrary. This is because Sparsey uses a single-trial, episodic learning protocol. That is, each input that a mac experiences and that activates it is assigned to a code in a single holistic operation (i.e., in a single trial). Once the fraction of a mac's afferent bottom-up (U) weights that have been increased passes a threshold, learning is frozen in that afferent U projection (cf. critical period). Given that the jogging snippet is novel, the codes activated as it is presented may in general not be identical to any of the mac's stored codes. But in at least some cases, the activated code should have high intersection with a stored code whose corresponding feature is similar to the current input.
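The two mechanisms mentioned above, freezing a U projection once enough of its weights have been increased and comparing an activated code with the stored codes by intersection, can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: weights are treated as binary (0 or 1), codes are treated as equal-sized sets of active unit indices, and all function names are hypothetical rather than part of the Sparsey implementation.

```python
import numpy as np

def fraction_increased(weights):
    """Fraction of afferent weights that are nonzero (assumes binary weights)."""
    w = np.asarray(weights)
    return np.count_nonzero(w) / w.size

def is_frozen(weights, threshold):
    """True once the fraction of increased weights has passed the threshold."""
    return fraction_increased(weights) > threshold

def code_overlap(code_a, code_b):
    """Intersection size divided by code size (assumes equal-sized codes)."""
    a, b = frozenset(code_a), frozenset(code_b)
    return len(a & b) / max(len(a), 1)

def closest_stored_code(active_code, stored_codes):
    """Return (index, overlap) of the stored code with the highest intersection."""
    overlaps = [code_overlap(active_code, s) for s in stored_codes]
    best = max(range(len(overlaps)), key=overlaps.__getitem__)
    return best, overlaps[best]

# Hypothetical mac with three stored codes of four units each.
stored = [{0, 5, 9, 14}, {1, 5, 10, 14}, {2, 6, 11, 15}]
# Code activated by a novel input: identical to no stored code,
# but sharing three of four units with the second one.
index, overlap = closest_stored_code({1, 5, 10, 13}, stored)
```

In this toy case the novel code matches no stored code exactly, yet its intersection with the second stored code is high, which is the situation described for the jogging snippet.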

Figure 4 shows features represented in the receptive fields (RFs) of a subset of V4 macs for the same two snippets. Again, the RFs shown correspond to different sets of V4 macs between the two videos. The RFs are still larger and the features increasingly complex.

Figure 5 shows features represented in the receptive fields (RFs) of a subset of V5 macs for the same two snippets. Again, the RFs shown correspond to different sets of V5 macs between the two videos. The RFs are still larger and the features increasingly complex.