Context-adaptive binary arithmetic coding is (very roughly) a lossless compression method that relies on knowlege of the current state of decoding to make probabalistic decisions about what value for a certain element is about to come next. For example, if the previous ten macroblocks were all interlaced then there's a good chance the next one will be too, and it will cost more bits to signal that the next macroblock isn't interlaced.
This mostly already worked for mbaff due to the way the code was designed. The most major part was understanding that the decoder might not have recieved field decoding flag by the time the code to determine whether a macroblock is skipped runs. The skip code relied on neighbour calculation using the macroblock's field/frame flag, but the decoder didn't yet know this. Thus x264 needed to store what the decoder thinks the field decoding flag is and update it only when it's actually written to the bitstream.