An example of such a discretizing method that is often used in reinforcement learning is tile coding [Watkins, 1989,Lin and Kim, 1991,Sutton, 1996,Santamaria et al., 1997,Sutton and Barto, 1998], which is based on the Cerebellar Model Articulation Controller (CMAC) structure proposed by [Albus, 1971,Albus, 1975]. In tile coding, the state space is divided into a number of disjoint sets. These sets are commonly called tiles in this context. For instance, one could define hypercubes such that each hypercube is defined by a Cartesian product , where is the lower bound of hypercube in state dimension and is the corresponding upper bound. Then, a feature corresponding to is equal to one when and zero otherwise.
The idea behind tile coding is to use multiple non-overlapping tilings. If a single tiling contains tiles, one could use such tilings to obtain a feature vector of dimension . In each state, precisely of these features are then equal to one, while the others are equal to zero. An example with tilings and features is shown on the left in Figure 1. The tilings do not have to be homogeneous. The right picture in Figure 1 shows a non-homogeneous example with tilings and features.
When features are active for each state, up to different situations can theoretically be represented with features. This contrasts with the naive approach where only one feature is active for each state, which would only be able to represent different situations with the same number of features. In practice, the upper bound of will rarely be obtained, since many combinations of active features will not be possible. In both examples in Figure 1, the number of different possible feature vectors is indeed larger than the length of the feature vector and smaller than the theoretical upper bound: and .