# Journal of Simon Horlick

## Tuesday, 28 June 2011

### Setting up keys for GPG

## Tuesday, 16 November 2010

### How to set up git hosting on COMPSOC

If you already have a local git repository that you want to host on compsoc, then you can skip this first step.

Initialise a new git repository in the directory of the project you want to host (this will be the copy you have at home, for example)

[simon@simon ~]$ cd ~/project/foo/ [simon@simon foo (master #)]$ git init Initialized empty Git repository in /home/simon/project/foo/.git/

Log in to compsoc, and create a new directory where you want your repository stored and initialise a bare git repository there.

[simon@forgetful ~]$ mkdir foo.git [simon@forgetful ~]$ cd foo.git/ [simon@forgetful foo.git]$ git --bare init Initialized empty Git repository in /data/home/simon/foo.git/ [simon@forgetful foo.git (BARE:master)]$

On your home machine you will need to add compsoc as a remote, here I've referred to compsoc as "origin".

[simon@simon foo (master #)]$ git remote add origin simon@forgetful.compsoc.man.ac.uk:foo.git

Do some editing, commit some changes locally (i.e. at home).

[simon@simon foo (master #)]$ vim hello_world.c [simon@simon foo (master #)]$ git add hello_world.c [simon@simon foo (master #)]$ git commit -m "Added test file" [master (root-commit) 76e25d7] Added test file 1 files changed, 7 insertions(+), 0 deletions(-) create mode 100644 hello_world.c

Use git push to push your local changes to compsoc.

[simon@simon foo (master)]$ git push origin master Counting objects: 3, done. Delta compression using up to 4 threads. Compressing objects: 100% (2/2), done. Writing objects: 100% (3/3), 302 bytes, done. Total 3 (delta 0), reused 0 (delta 0) To simon@forgetful.compsoc.man.ac.uk:foo.git * [new branch] master -> master

If you want to put your git repository on the internet, either copy or symlink the foo.git directory to be in your public_html directory and run git update-server-info.

[simon@forgetful public_html]$ cd ~/public_html/ [simon@forgetful public_html]$ ln -s ../foo.git/ [simon@forgetful ~]$ cd foo.git/ [simon@forgetful foo.git (BARE:master)]$ git --bare update-server-info [simon@forgetful foo.git (BARE:master)]$ chmod a+x hooks/post-update

Now anyone can come along and clone your git repository.

[simon@simon ~]$ cd /tmp [simon@simon tmp]$ git clone http://compsoc.man.ac.uk/~simon/foo.git Cloning into foo... [simon@simon tmp]$ cd foo/ [simon@simon foo (master)]$ cat hello_world.c #includeint main() { printf("hello, world!\n"); return 0; }

Thanks for reading!

## Saturday, 25 September 2010

### CABAC

Context-adaptive binary arithmetic coding is (very roughly) a lossless compression method that relies on knowlege of the current state of decoding to make probabalistic decisions about what value for a certain element is about to come next. For example, if the previous ten macroblocks were all interlaced then there's a good chance the next one will be too, and it will cost more bits to signal that the next macroblock isn't interlaced.

This mostly already worked for mbaff due to the way the code was designed. The most major part was understanding that the decoder might not have recieved field decoding flag by the time the code to determine whether a macroblock is skipped runs. The skip code relied on neighbour calculation using the macroblock's field/frame flag, but the decoder didn't yet know this. Thus x264 needed to store what the decoder thinks the field decoding flag is and update it only when it's actually written to the bitstream.

### Motion vectors

This I don't understand. Michael Niedermayer (the writer of ffmpeg's h264 decoder) sums up the situation quite succinctly, quoting from *libavcodec/h264.h*:

/* Wow, what a mess, why didn't they simplify the interlacing & intra * stuff, I can't imagine that these complex rules are worth it. */

First of all, motion vectors from fields to other fields and frames to other frames aren't changed. However, when predicting from a frame to a field, the fact that a field has half the vertical resolution of the frame means that the vertical component of the motion vector will need to be halved to reference the same spatial areas. Similarly the motion vector from a field to a frame must be doubled. This is easily achieved with a few bit shifts (good old integer arithmetic!) as so:

if( h->sh.b_mbaff ) { #define MAP_MVS\ MAP_F2F(mv, ref, x264_scan8[0] - 1 - 1*8, h->mb.i_mb_topleft_xy)\ MAP_F2F(mv, ref, x264_scan8[0] + 0 - 1*8, top)\ MAP_F2F(mv, ref, x264_scan8[0] + 1 - 1*8, top)\ MAP_F2F(mv, ref, x264_scan8[0] + 2 - 1*8, top)\ MAP_F2F(mv, ref, x264_scan8[0] + 3 - 1*8, top)\ MAP_F2F(mv, ref, x264_scan8[0] + 4 - 1*8, h->mb.i_mb_topright_xy)\ MAP_F2F(mv, ref, x264_scan8[0] - 1 + 0*8, left[0])\ MAP_F2F(mv, ref, x264_scan8[0] - 1 + 1*8, left[0])\ MAP_F2F(mv, ref, x264_scan8[0] - 1 + 2*8, left[1])\ MAP_F2F(mv, ref, x264_scan8[0] - 1 + 3*8, left[1])\ MAP_F2F(topright_mv, topright_ref, 0, left[0])\ MAP_F2F(topright_mv, topright_ref, 1, left[0])\ MAP_F2F(topright_mv, topright_ref, 2, left[1]) if( h->mb.b_interlaced ) { #define MAP_F2F(varmv, varref, index, macroblock)\ if( h->mb.cache.varref[l][index] >= 0 && !h->mb.field[macroblock] )\ {\ h->mb.cache.varref[l][index] <<= 1;\ h->mb.cache.varmv[l][index][1] /= 2;\ h->mb.cache.mvd[l][index][1] >>= 1;\ } MAP_MVS #undef MAP_F2F } else { #define MAP_F2F(varmv, varref, index, macroblock)\ if( h->mb.cache.varref[l][index] >= 0 && h->mb.field[macroblock] )\ {\ h->mb.cache.varref[l][index] >>= 1;\ h->mb.cache.varmv[l][index][1] <<= 1;\ h->mb.cache.mvd[l][index][1] <<= 1;\ } MAP_MVS #undef MAP_F2F }

Here you can see I use a macro to simplify the repeated code, this is actually inspired by ffmpeg so thanks for the good ideas guys!

Diagonal motion vectors are bizarre to say the least. Top left motion vectors, in some situations, are required to come from the middle of the macroblock as opposed to the usual bottom right location. This was the source of much frustration for a while. Top right motion vectors, for certain partitions, and when the top right is unavailable, are taken from strange places in the top left macroblock.

### Intra coding

When decoding video it is common that the next macroblock to be decoded will look somewhat like the surrounding macroblocks. H.264 decodes macroblocks in so called ``scan'' order, i.e. do the first row, left to right, then move on to the next row and repeat. Imagine a scene with a large section of sky covering the top half of the frame. A good guess to the content of the next macroblock might be the shade of blue that is directly surrounding it (from the already decoded macroblocks above and to the left). So we guess that the macroblock we're encoding is blue, then code the difference between that and the actual colour values. If our macroblock is close to the prediction, which it usually is, then we don't need to code much at all. This is for the most part how intra coding works. The name comes from the fact that no data outside of the current picture is accessed, as opposed to inter coding that looks at the previous and future frames to help compression.

MBAFF changes which neighbours need to be referenced when predicting the next macroblock. This means that it's not usually just the top and left macroblocks that the pixels for intra prediction come from. The rules for which macroblock to reference are fairly convoluted, but do make sense from a compression point of view. I have several pages of diagrams trying to make sense of it!

## Monday, 26 April 2010

### Diary of an x264 (mbaff) developer

I've just found out officially that I've been accepted into the Summer of Code program! This means that following university exams I'll be working developing a new interlaced encoding method for my favourite video encoder: x264.

Interlacing is the process of taking two consecutive pictures (called fields) of video that are often captured slightly after one another and merging them together into a single frame. Many professional video cameras do this in their usual operation. The up-side of this is you can have the high frame rate that comes with presenting two pictures in the space of one, but the downside is that each field has half the vertical resolution of the equivalent frame.

Macroblock adaptive frame-field interlacing, or mbaff for short, is a method of encoding interlaced video that makes a decision on whether or not each individual macroblock should be interlaced. The traditional method, and that currently implemented in x264, is to keep each frame fully interlaced. However, it is preferable to encode a macroblock with low movement in it as progressive, as this is more efficient. MBAFF interlacing is one of the few major features that most H.264 video encoders currently support that's missing in x264.

I hope to see significant improvements in the efficiency of interlaced encoding when this project is completed. Until then, I'll post regular updates of my progress.

## Monday, 1 March 2010

### Google Code Jam 2009 1C: Bribe the Prisoners

**Problem**

The problem statement describes a prison with adjacent cells and a set of prisoners to be released. The problem is essentially to find the ordering of prisoners to be released such that the total number of occupied adjacent cells is minimal (and thus less coins need to be handed out).

**Solution**

Let be the set of cells in which a prisoner to be released resides. Assume is the optimal prisoner to be released first. The first thing we need to do is to give every prisoner left a gold coin as they all find out about leaving. We are now left with two sub-arrays and where we need to calculate the next best prisoner to be released in each array. For the sub-array we can iterate over the prisoners to be released residing in this range and find for each one the minimum number of coins if they were to leave and split the sub-array (making two more similar problems).

Note: The minimum number of coins for a sub-array [a,b] can be memoised as the solutions to the sub-problems overlap.

Here is the solution code provided by Google with a couple of additional comments:

int p[200]; // prisoners to be released. map<pair<int, int>, int> dp; // Finds the minimum amount of gold needed, // if we only consider the cells from a to b, inclusive. int Solve(int a, int b) { // First, look up the cache to see if the // result is computed before. pair<int, int> pr(a, b); if(mp.find(pr) != mp.end()) return mp[pr]; // Start the computation. int r = 0; // For each prisoner to be released in the range a<=p_i<=b for(int i=0; i<Q; i++) { if(p[i] >= a && p[i] <= b) { // b-a prisoners find out about p[i] leaving and we are left with two // sub-arrays that need to be solved in a similar fashion. int tmp = (b-a) + Solve(a, p[i]-1) + Solve(p[i]+1, b); // Is the number of coins for the release of p[i] minimal? if (!r || tmp<r) r=tmp; } } mp[pr]=r; return r; }

## Wednesday, 24 February 2010

### Dijkstra

#include <algorithm> #include <iostream> #include <limits> #include <list> #include <queue> #include <vector> using namespace std; // Implementation of Dijkstra's algorithm, using the example graph from p596 of // Introduction to Algorithms. As STL's priority_queue does not contain a // decrease-key function, the vertex is simply added multiple times and only // the lowest distance (first encountered on queue) is considered. Note that // the number of nodes in the graph is fixed to keep code simple. typedef pair<int,int> pii; const int max_nodes = 5; int distances[max_nodes]; bool visited[max_nodes]; void Dijkstra(list<pii> Adj[max_nodes], int s) { for(int i=0;i<max_nodes;++i) { distances[i] = numeric_limits<int>::max(); visited[i] = false; } distances[s] = 0; priority_queue<pii, vector<pii>, greater<pii> > Q;// pair(distance, vertex) Q.push(make_pair(distances[s],s)); while(!Q.empty()) { int u = Q.top().second; Q.pop(); // If the vertex has been visited (by a shorter path) while it was in // the queue ignore it. if(visited[u]) continue; visited[u]=true; // Relax each adjacent vertex that hasn't already been visisted. If it // has been visited it will have been by a shorter path. for(list<pii>::iterator v=Adj[u].begin(); v != Adj[u].end(); ++v) { // If d(s,u) + d(u,v) < d(s,v) update d(s,v). if(!visited[v->first] && distances[v->first] > distances[u] + v->second) { distances[v->first] = distances[u] + v->second; Q.push(make_pair(distances[v->first],v->first)); } } } } int main() { list<pii> Adj[max_nodes]; Adj[0].push_back(make_pair(1,10)); // (s,t) Adj[0].push_back(make_pair(3,5)); // (s,y) Adj[1].push_back(make_pair(2,1)); // (t,x) Adj[1].push_back(make_pair(3,2)); // (t,y) Adj[2].push_back(make_pair(4,4)); // (x,z) Adj[3].push_back(make_pair(1,3)); // (y,t) Adj[3].push_back(make_pair(2,9)); // (y,x) Adj[3].push_back(make_pair(4,2)); // (y,z) Adj[4].push_back(make_pair(0,7)); // (z,s) Adj[4].push_back(make_pair(2,6)); // (z,x) Dijkstra(Adj, 0); for(int i=0;i<max_nodes;++i) { cout << "distances["<<i<<"] = " << distances[i] << ".\n"; } }

## Tuesday, 29 December 2009

### Group theory

This is one of the cases where I've simply got bored of solving a problem the conventional way and decided it could be a much better use of my time writing a program rather than pressing multiply on my calculator repeatedly.

Here's a simple python program to find the order of an element of a group modulo some number (the binary operation being multiplication):

#!/usr/bin/env python import sys def element_order_mul(n,g): r=g o=1 while r%n != 1: print "not",r r=r*g o=o+1 return o # calculates the element order under multiplication mod n. # takes: n, element n=int(sys.argv[1]) g=int(sys.argv[2]) o=element_order_mul(n,g) print g,"^",o,"is congruent to 1 mod",n

[simon@simon ~]$ ./order.py 9 4 not 4 not 16 4 ^ 3 is congruent to 1 mod 9

## Saturday, 9 May 2009

### Trig identities

I always have trouble remembering these, here's a simple derivation using Euler's formula:

Note that the imaginary and real parts are separate so the line , which is real, corresponds to on the top line.

## Friday, 17 April 2009

### Terrain 2

The algorithm I will implement is based on a restrictive quad tree approach. All this means is that for each node in the tree, each of its directly adjacent neighbours will need to be only one LOD level different. The reason for this is that the index buffers can be computed in advance and simply re-used for the 16 (2^4) different cases. Each of north, east, south and west can either be the same level or one less (if a neighbour is of a higher level then we would be a lower level to it and hence we would join two seams of the same detail - see figure 1).

In regard to calculating the vertices, I'm thinking it might be a good idea to separate the load across cores (everyone uses multicore now right?). The general idea is to:-

- Traverse the tree splitting nodes as usual.
- Create child nodes, update neighbour pointers, etc. This may be recursive to keep the quadtree restricted to one LOD difference (note: pain, if not impossible, to parallelise).
- Add a pointer to each node that needs its vertices calculated to a list and defer actual calculation until after the tree has been processed.

- Simply dispatch calculation jobs to threads, these jobs are independent and nicely parallelisable.
- Vertex buffers also need to be created to please direct3d, shouldn't be too hard to squeeze them in somewhere.

Geomorphing is causing me to think, I can't find much on the internet to base my implementation from. I have never done anything to do with vertex interpolation before so this could be a horrible way of doing it. I'm thinking of sending the vertex shader: x, y, z, base_y where base_y is the value of y at the previous LOD level. This can be simply calculated by either taking the value as-is (for when the point coincides with the same point in the last LOD) or interpolating between two values from the last LOD (i.e. where there was an edge on a triangle). It's then just a case of interpolating between base_y and y for each vertex using a per frame shader constant.

Results soon.

## Thursday, 16 April 2009

### Terrain

In March last year I made a terrain renderer that used noise functions to generate a reasonable looking mountain range. Since then I've learned a lot and I would like to re-visit it but this time take into account:

- Normal maps
- Texturing
- Interpolation between LOD levels
- Decent lighting
- Post processing (HDR?)
- Lots of small optimisations

Here is the old terrain renderer on youtube:

## Saturday, 21 February 2009

### Quaternions

Quaternions, discovered in 1843 by Hamilton, are the extension of complex numbers to four dimensions. As it turns out, they are very useful in computer graphics and can represent any rotation in 3D space in a compact and computationally cheap form. Another advantage of quaternions is that they avoid gimbal lock as they can describe a rotation in one operation as opposed to Euler angles that combine yaw, pitch and roll in separate operations.

A quaternion can be written as \hat{\mathbf{q}}=iq_x+jq_y+kq_z+q_w=\mathbf{q}_v+q_w where i,j,k are all different square roots of -1 such that, \begin{align*}ij & = k, & \qquad ji & = -k, \\jk & = i, & kj & = -i, \\ki & = j, & ik & = -j.\end{align*} The vector \hat{\mathbf{q}}_v is closely related to the axis of rotation and the angle of rotation affects all parts of the quaternion as we shall see below.

A unit quaternion, \hat{\mathbf{q}}, can represent a rotation of 2\phi radians around an axis, \mathbf{u}, by \hat{\mathbf{q}}=(\sin \phi \mathbf{u}, \cos \phi). To rotate a point or vector, \mathbf{p}, by this quaternion is, \hat{\mathbf{q}}\mathbf{p}\hat{\mathbf{q}}^{-1}. It can be shown that the inverse, \hat{\mathbf{q}}^{-1}, for a unit quaternion, \hat{\mathbf{q}}, is actually the conjugate where this is defined as, \hat{\mathbf{q}}^{*} = (-\mathbf{q}_v, q_w).

The product of two quaternions is determined by the product of the basis elements and the distributive law, \begin{align*}\hat{\mathbf{q}}\hat{\mathbf{r}} & = (iq_x+jq_y+kq_z+q_w)(ir_x+jr_y+kr_z+r_w)\\& = i(q_y r_z-q_z r_y+r_w q_x+q_w r_x)\\& \quad + j(q_z r_x-q_x r_z+r_w q_y+q_w r_y)\\& \quad + k(q_x r_y-q_y r_x+r_w q_z+q_w r_z)\\& \quad + q_w r_w-q_x r_x-q_y r_y-q_z r_z\\& = (\mathbf{q}_v \times \mathbf{r}_v + r_w \mathbf{q}_v + q_w \mathbf{r}_v,q_w r_w - \mathbf{q}_v \cdot \mathbf{r}_v).\end{align*} The transformation expressed by a unit quaternion can also, and more usefully in computer graphics, be represented by the matrix, M =\begin{pmatrix} 1-2(q_y^2+q_z^2) & 2(q_xq_y-q_wq_z) & 2(q_xq_z+q_wq_y) & 0 \\ 2(q_xq_y+q_wq_z) & 1-2(q_x^2+q_z^2) & 2(q_yq_z-q_wq_x) & 0 \\ 2(q_xq_z-q_wq_y) & 2(q_yq_z+q_wq_x) & 1-2(q_x^2+q_y^2) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

## Monday, 9 February 2009

### Text rendering

I was looking for a good library for text rendering. Something that would support truetype faces, antialiasing and kerning and be relatively easy to use. I noticed freetype in the credits to Braid (which I really need to play when the PC version appears - the demo is awesome). So after a bit of research, I found it had everything I was looking for and decided to implement it in my renderer.

Most of the time I spent in implementing this was actually writing the parts of the renderer I needed. I used the approach of using freetype to render a string to a texture and then rendering a quad with this texture. However, I feel uneasy about this as it makes the assumption that the strings I'm rendering don't change much (how fast is freetype at rendering?) and that it will probably take up more texture memory than storing individual glyphs as textures. This is something I'd like to investigate more but at the moment don't have the time. Anyway here's something to look at: