Saturday, 25 September 2010


Context-adaptive binary arithmetic coding is (very roughly) a lossless compression method that relies on knowlege of the current state of decoding to make probabalistic decisions about what value for a certain element is about to come next. For example, if the previous ten macroblocks were all interlaced then there's a good chance the next one will be too, and it will cost more bits to signal that the next macroblock isn't interlaced.

This mostly already worked for mbaff due to the way the code was designed. The most major part was understanding that the decoder might not have recieved field decoding flag by the time the code to determine whether a macroblock is skipped runs. The skip code relied on neighbour calculation using the macroblock's field/frame flag, but the decoder didn't yet know this. Thus x264 needed to store what the decoder thinks the field decoding flag is and update it only when it's actually written to the bitstream.

Motion vectors

This I don't understand. Michael Niedermayer (the writer of ffmpeg's h264 decoder) sums up the situation quite succinctly, quoting from libavcodec/h264.h:

    /* Wow, what a mess, why didn't they simplify the interlacing & intra
     * stuff, I can't imagine that these complex rules are worth it. */

First of all, motion vectors from fields to other fields and frames to other frames aren't changed. However, when predicting from a frame to a field, the fact that a field has half the vertical resolution of the frame means that the vertical component of the motion vector will need to be halved to reference the same spatial areas. Similarly the motion vector from a field to a frame must be doubled. This is easily achieved with a few bit shifts (good old integer arithmetic!) as so:

        if( h->sh.b_mbaff )
#define MAP_MVS\
                MAP_F2F(mv, ref, x264_scan8[0] - 1 - 1*8, h->mb.i_mb_topleft_xy)\
                MAP_F2F(mv, ref, x264_scan8[0] + 0 - 1*8, top)\
                MAP_F2F(mv, ref, x264_scan8[0] + 1 - 1*8, top)\
                MAP_F2F(mv, ref, x264_scan8[0] + 2 - 1*8, top)\
                MAP_F2F(mv, ref, x264_scan8[0] + 3 - 1*8, top)\
                MAP_F2F(mv, ref, x264_scan8[0] + 4 - 1*8, h->mb.i_mb_topright_xy)\
                MAP_F2F(mv, ref, x264_scan8[0] - 1 + 0*8, left[0])\
                MAP_F2F(mv, ref, x264_scan8[0] - 1 + 1*8, left[0])\
                MAP_F2F(mv, ref, x264_scan8[0] - 1 + 2*8, left[1])\
                MAP_F2F(mv, ref, x264_scan8[0] - 1 + 3*8, left[1])\
                MAP_F2F(topright_mv, topright_ref, 0, left[0])\
                MAP_F2F(topright_mv, topright_ref, 1, left[0])\
                MAP_F2F(topright_mv, topright_ref, 2, left[1])

            if( h->mb.b_interlaced )
#define MAP_F2F(varmv, varref, index, macroblock)\
                if( h->mb.cache.varref[l][index] >= 0 && !h->mb.field[macroblock] )\
                    h->mb.cache.varref[l][index] <<= 1;\
                    h->mb.cache.varmv[l][index][1] /= 2;\
                    h->mb.cache.mvd[l][index][1] >>= 1;\
#undef MAP_F2F
#define MAP_F2F(varmv, varref, index, macroblock)\
                if( h->mb.cache.varref[l][index] >= 0 && h->mb.field[macroblock] )\
                    h->mb.cache.varref[l][index] >>= 1;\
                    h->mb.cache.varmv[l][index][1] <<= 1;\
                    h->mb.cache.mvd[l][index][1] <<= 1;\
#undef MAP_F2F

Here you can see I use a macro to simplify the repeated code, this is actually inspired by ffmpeg so thanks for the good ideas guys!

Diagonal motion vectors are bizarre to say the least. Top left motion vectors, in some situations, are required to come from the middle of the macroblock as opposed to the usual bottom right location. This was the source of much frustration for a while. Top right motion vectors, for certain partitions, and when the top right is unavailable, are taken from strange places in the top left macroblock.

Intra coding

When decoding video it is common that the next macroblock to be decoded will look somewhat like the surrounding macroblocks. H.264 decodes macroblocks in so called ``scan'' order, i.e. do the first row, left to right, then move on to the next row and repeat. Imagine a scene with a large section of sky covering the top half of the frame. A good guess to the content of the next macroblock might be the shade of blue that is directly surrounding it (from the already decoded macroblocks above and to the left). So we guess that the macroblock we're encoding is blue, then code the difference between that and the actual colour values. If our macroblock is close to the prediction, which it usually is, then we don't need to code much at all. This is for the most part how intra coding works. The name comes from the fact that no data outside of the current picture is accessed, as opposed to inter coding that looks at the previous and future frames to help compression.

MBAFF changes which neighbours need to be referenced when predicting the next macroblock. This means that it's not usually just the top and left macroblocks that the pixels for intra prediction come from. The rules for which macroblock to reference are fairly convoluted, but do make sense from a compression point of view. I have several pages of diagrams trying to make sense of it!