So, I suggest you start by getting a simple orthographic projection up, loading some images, and drawing them onto textured quads with blending and alpha test enabled.
PNG is useful because it supports alpha.
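
To make the orthographic projection step a bit more concrete, here is a rough sketch of the matrix that the legacy glOrtho call builds, written in plain Python so it runs without a GL context. The function names `ortho` and `project` are my own, and the 800x600 window with a top-left origin is just an assumption for the example.

```python
def ortho(left, right, bottom, top, near=-1.0, far=1.0):
    """Build a 4x4 orthographic projection matrix (same layout as glOrtho).

    Returned as row-major nested lists; OpenGL expects column-major data,
    so transpose it (or ask for a transpose on upload) before sending it.
    """
    rl, tb, fn = right - left, top - bottom, far - near
    return [
        [2.0 / rl, 0.0, 0.0, -(right + left) / rl],
        [0.0, 2.0 / tb, 0.0, -(top + bottom) / tb],
        [0.0, 0.0, -2.0 / fn, -(far + near) / fn],
        [0.0, 0.0, 0.0, 1.0],
    ]


def project(m, x, y, z=0.0):
    """Transform a point by matrix m and return its x, y, z components."""
    v = (x, y, z, 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))


# Assumed setup: 800x600 window, pixel coordinates, y pointing down.
pixel_space = ortho(0.0, 800.0, 600.0, 0.0)
top_left = project(pixel_space, 0.0, 0.0)        # maps to about (-1, 1)
bottom_right = project(pixel_space, 800.0, 600.0)  # maps to about (1, -1)
```

Passing bottom=600 and top=0 flips the y axis, which tends to be convenient for 2D work because image rows also run top to bottom.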
Then add transformations to these quads, and possibly a hierarchy.
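
For the transformations and hierarchy, one possible shape is a small node class where each quad has a local translate/rotate/scale and an optional parent. This is only a sketch under my own assumptions: `Node`, `local_matrix` and `mat_mul` are hypothetical names, and it uses 3x3 2D affine matrices rather than anything OpenGL-specific.

```python
import math


def mat_mul(a, b):
    """Multiply two 3x3 matrices stored as row-major nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def local_matrix(x, y, angle_deg, scale):
    """2D affine matrix: uniform scale, then rotate, then translate."""
    c = math.cos(math.radians(angle_deg)) * scale
    s = math.sin(math.radians(angle_deg)) * scale
    return [[c, -s, x],
            [s, c, y],
            [0.0, 0.0, 1.0]]


class Node:
    """A quad (or group) whose transform is relative to its parent."""

    def __init__(self, x=0.0, y=0.0, angle_deg=0.0, scale=1.0, parent=None):
        self.x, self.y = x, y
        self.angle_deg = angle_deg
        self.scale = scale
        self.parent = parent

    def world_matrix(self):
        """Compose parent transforms down to this node."""
        m = local_matrix(self.x, self.y, self.angle_deg, self.scale)
        if self.parent is None:
            return m
        return mat_mul(self.parent.world_matrix(), m)

    def to_world(self, px, py):
        """Map a point from this node's local space into world space."""
        m = self.world_matrix()
        return (m[0][0] * px + m[0][1] * py + m[0][2],
                m[1][0] * px + m[1][1] * py + m[1][2])


# A parent rotated 90 degrees carries its child around with it.
body = Node(x=100.0, y=50.0, angle_deg=90.0)
arm = Node(x=10.0, y=0.0, parent=body)
arm_origin = arm.to_world(0.0, 0.0)  # roughly (100, 60)
```

Recomputing the parent chain on every call is wasteful for deep trees; caching the world matrix per frame would be the obvious next step, but this keeps the idea visible.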
Then a simple system to script these transformations over time.
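
The scripting system can start out as nothing more than keyframes with linear interpolation between them. Below is a minimal sketch of that idea; the `Track` class and its `value_at` method are names I made up for illustration, and linear interpolation is just the simplest choice, not the only one.

```python
from bisect import bisect_right


class Track:
    """Keyframed value over time, linearly interpolated between keys."""

    def __init__(self, keys):
        # keys: iterable of (time_in_seconds, value) pairs, in any order
        self.keys = sorted(keys)
        self.times = [t for t, _ in self.keys]

    def value_at(self, t):
        """Return the interpolated value at time t, clamped at both ends."""
        if t <= self.times[0]:
            return self.keys[0][1]
        if t >= self.times[-1]:
            return self.keys[-1][1]
        i = bisect_right(self.times, t)  # first key strictly after t
        t0, v0 = self.keys[i - 1]
        t1, v1 = self.keys[i]
        u = (t - t0) / (t1 - t0)
        return v0 + (v1 - v0) * u


# Slide a quad's x position out to 100 over two seconds, then back to 50.
x_track = Track([(0.0, 0.0), (2.0, 100.0), (3.0, 50.0)])
x_at_one_second = x_track.value_at(1.0)  # 50.0
```

Each frame you would sample one track per animated property (x, y, angle, alpha and so on) at the current time and write the results into the quad's transform before drawing.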
The OpenGL part is the ortho projection and the textured quad animation; that should be straightforward.