Sunday, 27 October 2013

GPU Accelerated Camera Processing On The Raspberry Pi


Over the past few days I've been hacking away at the camera module for the Raspberry Pi. I made a lot of headway creating a simple and nice API for the camera, which is detailed here:

However, I wanted to get some real performance out of it, and that means GPU TIME! Before I start explaining things, the code is here:

Here's a picture:

And a video of the whole thing (with description of what's going on!)

The API I designed could use MMAL for colour conversion and downsampling, but it was pretty slow and got in the way of OpenGL. However, I deliberately allowed the user to ask the API for the raw YUV camera data. This is provided as a single block of memory, but really contains 3 separate greyscale textures: one full-resolution texture containing the 'luminosity' (Y), and another 2 at half the width and half the height (U and V) that specify the colour of a pixel:

I make a few tweaks to my code to generate these 3 textures:

        //lock the chosen frame buffer, and copy it into textures
        const uint8_t* data = (const uint8_t*)frame_data;
        int ypitch = MAIN_TEXTURE_WIDTH;                //Y plane: one byte per pixel
        int ysize = ypitch*MAIN_TEXTURE_HEIGHT;
        int uvpitch = MAIN_TEXTURE_WIDTH/2;             //U and V planes: half width, half height
        int uvsize = uvpitch*MAIN_TEXTURE_HEIGHT/2;
        int upos = ysize;                               //U plane follows the Y plane
        int vpos = upos+uvsize;                         //V plane follows the U plane

And write a very simple shader to convert from yuv to rgb:

varying vec2 tcoord;
uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;
void main(void)
{
    float y = texture2D(tex0,tcoord).r;
    float u = texture2D(tex1,tcoord).r;
    float v = texture2D(tex2,tcoord).r;

    vec4 res;
    res.r = (y + (1.370705 * (v-0.5)));
    res.g = (y - (0.698001 * (v-0.5)) - (0.337633 * (u-0.5)));
    res.b = (y + (1.732446 * (u-0.5)));
    res.a = 1.0;

    gl_FragColor = clamp(res,vec4(0),vec4(1));
}
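
When debugging a shader like this it can help to have a CPU version of the same maths to compare against. The following is a minimal sketch, not code from the project: `Rgb`, `to_byte` and `yuv_to_rgb` are names made up for this example, and it assumes the same coefficients as the shader, with 8-bit inputs scaled to the 0..1 range that texture2D returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// CPU mirror of the fragment shader above, useful for spot-checking GPU output.
// Rgb, to_byte and yuv_to_rgb are illustrative names, not part of the project.
struct Rgb { uint8_t r, g, b; };

static uint8_t to_byte(float x)
{
    // clamp to [0,1] like the shader does, then round to the nearest 8-bit value
    x = std::min(1.0f, std::max(0.0f, x));
    return static_cast<uint8_t>(x * 255.0f + 0.5f);
}

Rgb yuv_to_rgb(uint8_t yb, uint8_t ub, uint8_t vb)
{
    // texture2D returns normalised values, so scale the bytes to [0,1]
    // and centre the two colour channels around zero
    const float y = yb / 255.0f;
    const float u = ub / 255.0f - 0.5f;
    const float v = vb / 255.0f - 0.5f;

    Rgb out;
    out.r = to_byte(y + 1.370705f * v);
    out.g = to_byte(y - 0.698001f * v - 0.337633f * u);
    out.b = to_byte(y + 1.732446f * u);
    return out;
}
```

Feeding it a mid-grey Y with neutral U and V should give an almost neutral grey back, while pushing V towards 255 drives the red channel up until it clamps.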

Now I simply run the shader to read in the 3 yuv textures, and write out an rgb one, ending up with this little number:

Good hat yes? Well, hat aside, the next thing to do is provide downsamples so we can run image processing algorithms at different levels. I don't even need a new shader for that, as I can just run the earlier shader, but aiming it at successively lower resolution textures. Here's the lowest one now:

The crucial thing is that in opengl you can create a texture, and then tell it to also double as a frame buffer using the following code:

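bool GfxTexture::GenerateFrameBuffer()
{
    //Create and bind a new frame buffer
    //(calls reconstructed from the comments; FramebufferId is an assumed member name)
    glGenFramebuffers(1,&FramebufferId);
    glBindFramebuffer(GL_FRAMEBUFFER,FramebufferId);

    //point it at the texture (the id passed in is the Id assigned when we created the open gl texture)
    glFramebufferTexture2D(GL_FRAMEBUFFER,GL_COLOR_ATTACHMENT0,GL_TEXTURE_2D,Id,0);

    return true;
}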

Once you have a texture as a frame buffer you can set it to be the target to render to (don't forget to set the viewport as well):

        glViewport ( 0, 0, render_target->GetWidth(), render_target->GetHeight() );

And also use the read pixels function to read the results back to cpu (which I do here to save to disk using the lodepng library):

void GfxTexture::Save(const char* fname)
{
    void* image = malloc(Width*Height*4);
    glReadPixels(0,0,Width,Height,IsRGBA ? GL_RGBA : GL_LUMINANCE, GL_UNSIGNED_BYTE, image);

    unsigned error = lodepng::encode(fname, (const unsigned char*)image, Width, Height, IsRGBA ? LCT_RGBA : LCT_GREY);
    if(error)
        printf("error: %d\n",error);

    free(image);
}


These features give us a massive range of capability. We can now chain together various shaders to apply multiple levels of filtering, and once the gpu is finished with them the data can be read back to the cpu and fed into image processing libraries such as opencv. This is really handy, as algorithms such as object detection often have to do costly filtering before they can operate. Using the gpu as above, the cpu no longer needs to do that work.

Thus far I've written the following filters:

  • Gaussian blur
  • Dilate
  • Erode
  • Median
  • Threshold
  • Sobel
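
For reference, this is roughly what a filter such as Sobel computes per pixel, written as plain C++ rather than as a shader. It is a sketch of the standard 3x3 Sobel operator, not the shader from the project: `sobel` is a hypothetical name, and the clamp to 0..255 and the untouched one-pixel border are choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sobel gradient magnitude of an 8-bit greyscale image, clamped to 0..255.
// Border pixels are left at 0. sobel() is an illustrative name.
std::vector<uint8_t> sobel(const std::vector<uint8_t>& src, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    std::vector<uint8_t> dst(src.size(), 0);
    auto at = [&](int x, int y) { return static_cast<int>(src[y * width + x]); };

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            // horizontal gradient: right column minus left column
            const int gx = -at(x - 1, y - 1) + at(x + 1, y - 1)
                         - 2 * at(x - 1, y) + 2 * at(x + 1, y)
                         - at(x - 1, y + 1) + at(x + 1, y + 1);
            // vertical gradient: bottom row minus top row
            const int gy = -at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1)
                         + at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1);
            const double mag = std::sqrt(static_cast<double>(gx * gx + gy * gy));
            dst[y * width + x] = static_cast<uint8_t>(mag > 255.0 ? 255.0 : mag);
        }
    }
    return dst;
}
```

A flat image gives all zeros, and a hard vertical edge saturates the pixels on either side of it, which makes a handy sanity check when comparing against the GPU result.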
Here's a few of them in action:


p.s. my only annoyance right now is that I still have to go through the cpu to get my data from mmal and into opengl. If anyone knows a way of getting from mmal straight to opengl that'd be super awesome!

pp.s. right at the end, here's a tiny shameless advert for my new venture - If you like my writing, check out the dev blog for regular updates on my first proper indie title!


  1. Good work. A direct MMAL->texture path is something we'd like to expose - I'll let you know when it's possible.

  2. Thanks Dom - I'll look forward to it. I'm thinking of writing one that goes direct to OMX, as that does have an egl_render component which does the job - it's just not exposed in mmal. One thing to bear in mind for the direct to texture path is how you'd handle the yuv format. You could either supply 3 textures, force an rgba conversion then just supply 1 texture, or just supply the raw yuv in a slightly funky texture and let the user figure it out in a shader :)

  3. Really great stuff! Thanks for providing the code, I'm trying to understand it now. Question for you: is there an easy way to generate a number in real-time that is proportional to how "in-focus" the scene is? Maybe compute an overall peak-peak magnitude after a high-pass or edge detecting filter? Could that be done all on GPU, or would it need the CPU to generate an RMS or peak-peak magnitude value?

  4. Hmmm - well you've got 2 problems there. First, is there an algorithm you can think of that, for a given pixel tells you how 'in focus' a small region around it is? If you can do that then you can calculate a per pixel value from 0 to 1 that indicates focus. Then you can downsample that area to average out the focus level across pixels and get down to a low enough texture size for the cpu to process.

  5. One of the better algorithms for calculating a focus value is to calculate the statistical variance of the image - essentially a contrast measurement.

    i.e. variance = sum((intensity(x,y) - mean_intensity)^2) / (height*width)

    Maximising this value gets close to the proper focus position.

  6. Hmmm - well it'd be tricky to do that exact algorithm on a gpu, but you could probably get close. You could calculate the mean of a quadrant of pixels and output it to a smaller texture. Then in a 2nd phase, take that sum for a quadrant of pixels, multiply it by 4 for each one, add them together and divide by 16. Then in a 3rd phase multiply by 16, sum, then divide by 64 etc. A similar process could probably be done for the whole equation. That'd give you an approximation that got less accurate as you did further downsamples, but once the image was of a small enough size (say down from 1024x1024 to 128x128) you could then copy it to cpu and do the remaining calculations in more detail. I'd imagine that would get you a solid 30hz for a hi res image. The main issue right now is that there's a big cpu overhead in getting the camera data to gpu, but it looks like we'll have a solution for that soon enough, making relying on the cpu to finish off the work more feasible.

  7. I'm not an expert on the Pi GPU but chris the algorithm you're describing to implement pelrun's variance calc is indeed the right one for all the GPU's I've ever used. you wouldn't need to send very much data back to the CPU - you repeatedly halve the resolution of the texture, entirely using the GPU, until you get to a tiny size (even, 1x1 pixels) that is read back by the CPU.
    this is known most generally as an image pyramid
    re-describing it in relation to pelrun's equation:
    you repeatedly run a shader that averages 2x2 pixel blocks (or 3x3 or 4x4), summing x and x^2 for each pixel. ideally you want to use floats or >8 bit integer values, but the resolutions are low so memory isn't the issue. (does the pi gpu do >8 bit integer textures?). the literature nearly always downsamples by a factor of 2 each time in the pyramid, but on some gpus it's a better balance of parallelism vs passes to average bigger blocks eg 4x4. Anyway, assuming 2x2, you only need log2(resolution) passes - eg input is 1024x1024, then you get 512x512, 256x256, 128x128,.... down to 1x1.
    Pelrun writes variance as var=E((X-mu)^2) where mu is the mean, but you can also write var as var=E(X^2)-E(X)^2
    (btw by E(f) I mean the expected value of f, ie the average of f over the whole image - so E(X) is the average of all pixels, E(X^2) is the average of the square of all pixels)
    with that version, you can use a pyramid to get you E(X) and E(X^2) together, the CPU reads back 2 floats, subtracts them, and voila, you have variance.
    I'm rambling :) sorry.
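
The pyramid approach described in this comment can be sketched on the CPU as follows. This is only an illustration of the idea under some assumptions: the image is square with a power-of-two size, pixel values are doubles in the range 0 to 1, and `Moments`, `halve` and `pyramid_variance` are names invented here. On the GPU each call to `halve` would be one shader pass into a smaller render target, and 8-bit targets would lose precision that the doubles here do not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One texel of a pyramid level: running averages of X and X^2.
// Moments, halve and pyramid_variance are illustrative names.
struct Moments { double mean; double mean_sq; };

// Halve a square image by averaging each 2x2 block of both moments.
std::vector<Moments> halve(const std::vector<Moments>& src, int size)
{
    assert(size >= 2 && size % 2 == 0);
    const int half = size / 2;
    std::vector<Moments> dst(static_cast<std::size_t>(half) * half);
    for (int y = 0; y < half; ++y) {
        for (int x = 0; x < half; ++x) {
            Moments m = {0.0, 0.0};
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const Moments& s = src[(2 * y + dy) * size + (2 * x + dx)];
                    m.mean += s.mean;
                    m.mean_sq += s.mean_sq;
                }
            }
            dst[y * half + x] = {m.mean / 4.0, m.mean_sq / 4.0};
        }
    }
    return dst;
}

// Variance of a size x size image, reduced level by level down to 1x1.
double pyramid_variance(const std::vector<double>& pixels, int size)
{
    assert(size >= 1 && (size & (size - 1)) == 0);   // power of two
    assert(pixels.size() == static_cast<std::size_t>(size) * size);

    // Level 0 stores X and X^2 for every pixel.
    std::vector<Moments> level(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        level[i] = {pixels[i], pixels[i] * pixels[i]};

    while (size > 1) {
        level = halve(level, size);
        size /= 2;
    }
    // var = E(X^2) - E(X)^2
    return level[0].mean_sq - level[0].mean * level[0].mean;
}
```

Because every block averaged at each level has the same number of pixels, the final 1x1 texel holds the exact image-wide averages, so the result matches the direct formula up to rounding.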

  8. Hey Alex

    I don't think the pi supports floating point render targets unfortunately, but I was thinking, given its technically grey scale data, I could treat each rgba value as a 32 bit piece of data by encoding a high resolution value as something akin to:
    val = (r+g*2+b*4+a*8)/15
    (assuming val is always between 0 and 1).
    Then to go back you'd multiply val by 15, then break it down into powers of 2. Something like that anyhoo :)
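
A common variant of this packing treats the four 8-bit channels as the base-256 digits of one 32-bit number. The sketch below shows that variant on the CPU. It is not the formula from the comment above, and `Rgba8`, `pack_unit` and `unpack_unit` are hypothetical names. In a shader the same arithmetic would be limited by whatever float precision the GPU provides, so treat the full 32 bits as an upper bound.

```cpp
#include <cassert>
#include <cstdint>

// Four 8-bit channels, as stored in an RGBA8 texel.
// Rgba8, pack_unit and unpack_unit are illustrative names.
struct Rgba8 { uint8_t r, g, b, a; };

// Pack a value in [0,1] into four channels, r holding the most significant byte.
Rgba8 pack_unit(double val)
{
    assert(val >= 0.0 && val <= 1.0);
    // quantise to a 32-bit integer, rounding to nearest
    const uint32_t q = static_cast<uint32_t>(val * 4294967295.0 + 0.5);
    return { static_cast<uint8_t>(q >> 24),
             static_cast<uint8_t>((q >> 16) & 0xFF),
             static_cast<uint8_t>((q >> 8) & 0xFF),
             static_cast<uint8_t>(q & 0xFF) };
}

// Rebuild the value in [0,1] from the four channels.
double unpack_unit(Rgba8 c)
{
    const uint32_t q = (static_cast<uint32_t>(c.r) << 24)
                     | (static_cast<uint32_t>(c.g) << 16)
                     | (static_cast<uint32_t>(c.b) << 8)
                     |  static_cast<uint32_t>(c.a);
    return q / 4294967295.0;
}
```

Packing and then unpacking a value returns it to within one part in roughly four billion, compared with one part in 255 for a single 8-bit channel.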

  9. Ooh, if only more rambles were that informative!

  10. As far as I can tell you're right about not having floating point textures :P Packing/unpacking floats into the RGBA8 ints seems to be possible, though - I found the following with some useful code:

  11. $ ./picam
    picam: /home/pi/picam/graphics.cpp:224: bool GfxShader::LoadVertexShader(const char*): Assertion `f' failed.

    not my area of expertise, and most likely I missed a step?


  12. fyi, for anyone else who hits this issue, Chad had accidentally built picam in a sub directory (in his case, in picam/build), so the executable file wasn't in the same folder as the shaders. The assertion is indicating the shader wasn't found.

  13. Direct MMAL->texture is now supported. See:

    1. Chris:
      I'm a big fan of your work. I think this is great and it is very useful for many people.
      Is there a chance for you to update this work with this new mmal->texture path?
      That would be what I'm trying to achieve.

    2. Hi - wow - long time replying to this - apologies - been very busy. That's certainly on my list but my time to work on it varies a lot and my job has been taking priority lately :( when I get back to it, direct to texture path will be my first job.

    3. Hi Chris, I've been reading a lot of your posts and comments about the raspi camera, really great stuff, thanks a lot!
      Did you ever implement the new mmal->texture path? And did you ever put your code on github?

      I'm going to be experimenting with your code on a raspi 3 with the new v2 noir camera over the next month or so and it would be awesome to have the latest version of your code as a starting point and maybe even be able to contribute back to github.

      Cheers, Lars

  14. Hi Chris,

    I'm really glad that you have made records of your experiments and that you are kind enough to share it with the rest of us!

    I am also looking to do some image processing onboard the Raspberry Pi, and given that you are much more familiar with the capabilities of MMAL than I am, I wonder if you could comment on whether it's realistic to retrieve a YUV or uncompressed luminance buffer at, say, 720p, run various image processing code over that (and use the results immediately for external purposes), then "render" some debug lines or shapes (basically draw some boxes over detected features that I'm interested in), and then send the drawn-over frame buffer to the H.264 encoder for output to a file or network stream?

    The way that I have been starting to investigate how to do this is reading RaspiVid.c, and I got to this point:

    status = connect_ports(camera_video_port, encoder_input_port, &state.encoder_connection);

    It definitely looks like this is where your API (which I am perusing next) comes in! I would just love to know if you know off the top of your head if I can leverage the capabilities of what you're doing here (using the GPU to e.g. get lower mip levels without having to go all the way to copying to an OpenGL frame buffer texture, although -- excitingly -- we now can do that efficiently as well, so it seems) while being able to pump a result into the H.264 encoder?

    Fantastic work!

    1. Hi. It'd definitely be possible to do all that, although off the top of my head I don't know exactly how you'd do it. You can use the mmal system to generate h264 videos, as shown by the RaspiVid demo. You could definitely take the output of my gpu demo and render stuff over the top of it, and I believe there's a path using mmal to take an opengl texture and copy it to video. If that's not there, the slow way would be to copy it to cpu, then pump the resulting data back into an mmal encoder. It wouldn't run at a great frame rate but it'd be possible. The tricky thing is performance. Memory bandwidth on the pi means this pushes the frame rate to the limit already, although using my gpu downsampling to get a lower res texture before reading it to cpu would certainly help.

    2. Yeah it definitely would push this hardware to the limit, and that makes me uneasy as well.

      One serviceable option is to simply do rendering as usual and skip the entire encoding bit, and I can take it from there out of the HDMI or Composite via external means, which may include analog radio transmission, as that has low latency.

  15. By the way... I don't know if you guys have tried this... but I've got my RasPi hooked up to a monitor via HDMI so I can bring the camera up close and feed it back in through the picam_gpu grid demo. This is SUPER trippy. I will even liken it to a portal to the netherworld, you can get some amazing visuals that evolve at whatever framerate the Pi can dish out, transcending the realm of that which is purely digital or analogue.

    This is tremendously powerful stuff.

  16. This comment has been removed by the author.

  17. Hi,
    is it possible to run a discrete wavelet transformation on the Raspi's GPU?

  18. Hi Chris
    I have sent you an email regarding OpenCV implementation with this API. Just to repeat the question here: Is there an easy way to convert the opengl textures used in the API into the OpenCV Mat format so that I can do image manipulation after filtering?
    Thanks for the great API though, it's very helpful

    1. is the closest I've found, but it seg faults when I use it (it does not crash, it just prints it in the console) and the image out is never the same as the texture that went in. I think the issue is that glReadPixels reads from the frame buffer and I don't understand how to be certain what is on the frame buffer. Could you post a snippet for your preferred method of converting from GfxTexture to Mat?
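
One likely source of a mismatch like this is row order. glReadPixels returns rows starting from the bottom of the frame buffer, whereas OpenCV expects the top row first. The sketch below flips a tightly packed pixel buffer in place. `flip_rows` is a hypothetical helper, not part of the API discussed here, and it assumes there is no padding between rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverse the row order of a tightly packed image in place.
// flip_rows is an illustrative name; rows are assumed to have no padding.
void flip_rows(std::vector<uint8_t>& pixels, int width, int height, int bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    const std::size_t stride = static_cast<std::size_t>(width) * bytes_per_pixel;
    assert(pixels.size() == stride * height);

    // swap the first row with the last, the second with the second-to-last, and so on
    for (int top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
        std::swap_ranges(pixels.begin() + top * stride,
                         pixels.begin() + (top + 1) * stride,
                         pixels.begin() + bottom * stride);
    }
}
```

After flipping, an RGBA buffer could be wrapped without copying using something like `cv::Mat(height, width, CV_8UC4, pixels.data())`. OpenCV conventionally works in BGR order, so a `cv::cvtColor` call with `cv::COLOR_RGBA2BGRA` may also be needed. The texture's frame buffer has to be the one bound when glReadPixels is called, otherwise the pixels come from whatever was rendered last.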

  19. Thanks, this is awesome!

    I had to make some modification to compile this. They're here in case anyone is interested:

  20. Hi, I am a newbie to opengl. May I ask how to hide the opengl window in your code?
    Thank you

  21. I'm wanting to do something similar, using a webcam for input and a 128x128 display on the SPI port.
    Ideally with edge-detection filtering in-between, if possible.
    Are the GPUs only able to work on pixels destined for the HDMI output, or can they send data to a secondary frame buffer?

  22. Hey,
    I'm implementing Harris corner detection on something similar instead of Sobel, and I was wondering if you have any code for that? I'm getting an error when I try to implement the algorithm in place of the Sobel algorithm, probably due to the size of the data I am sending across.