My Robot Blog: GPU Accelerated Camera Processing On The Raspberry Pi

Sunday 27 October 2013

GPU Accelerated Camera Processing On The Raspberry Pi

Hallo!

Over the past few days I've been hacking away at the camera module for the Raspberry Pi. I made a lot of headway creating a simple and nice API for the camera, which is detailed here:

http://robotblogging.blogspot.co.uk/2013/10/an-efficient-and-simple-c-api-for.html

However I wanted to get some real performance out of it and that means GPU TIME! Before I start explaining things, the code is here:

http://www.cheerfulprogrammer.com/downloads/picamgpu/picam_gpu.zip

Here's a picture:

And a video of the whole thing (with description of what's going on!)

The API I designed could use MMAL for colour conversion and downsampling, but that was pretty slow and got in the way of OpenGL. However, I deliberately allowed the user to ask the API for the raw YUV camera data. This is provided as a single block of memory, but really contains 3 separate grey scale textures: a full resolution one containing the brightness, or luma (Y), followed by another 2 (U and V), each half the width and half the height, that together specify the colour of a pixel.

I make a few tweaks to my code to generate these 3 textures:

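        //lock the chosen frame buffer, and copy it into textures
        {
            //the frame is planar YUV: a full resolution Y plane, followed by
            //U and V planes at half the width and half the height
            const uint8_t* data = (const uint8_t*)frame_data;
            int ypitch = MAIN_TEXTURE_WIDTH;            //bytes per row of Y
            int ysize = ypitch*MAIN_TEXTURE_HEIGHT;     //total bytes of Y
            int uvpitch = MAIN_TEXTURE_WIDTH/2;         //bytes per row of U (and of V)
            int uvsize = uvpitch*MAIN_TEXTURE_HEIGHT/2; //total bytes of U (and of V)
            int upos = ysize;                           //U starts straight after Y
            int vpos = upos+uvsize;                     //V starts straight after U
            ytexture.SetPixels(data);
            utexture.SetPixels(data+upos);
            vtexture.SetPixels(data+vpos);
            //finished with the frame data
            cam->EndReadFrame(0);
        }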
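If you want to sanity check the offsets for a different resolution, the same arithmetic can be pulled out into a small standalone helper. This is just a sketch: `YuvPlanes` and `ComputeYuvPlanes` are names I've made up for illustration (they aren't part of the camera API), and it assumes the width and height are both even.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets and sizes of the three planes in a planar YUV frame where
// U and V are stored at half the width and half the height of Y.
struct YuvPlanes {
    std::size_t y_offset; // where the Y plane starts (always 0)
    std::size_t y_size;   // bytes in the Y plane
    std::size_t u_offset; // where the U plane starts
    std::size_t v_offset; // where the V plane starts
    std::size_t uv_size;  // bytes in each of the U and V planes
    std::size_t total;    // bytes in the whole frame
};

// Hypothetical helper: mirrors the ysize/uvsize/upos/vpos arithmetic above.
inline YuvPlanes ComputeYuvPlanes(std::size_t width, std::size_t height)
{
    // Assumption: even dimensions, so the half-size planes divide cleanly.
    assert(width % 2 == 0 && height % 2 == 0);

    YuvPlanes p;
    p.y_offset = 0;
    p.y_size   = width * height;
    p.uv_size  = (width / 2) * (height / 2);
    p.u_offset = p.y_size;
    p.v_offset = p.u_offset + p.uv_size;
    p.total    = p.v_offset + p.uv_size;
    return p;
}
```

For a 640x480 frame this gives a Y plane of 307200 bytes and two chroma planes of 76800 bytes each, so the whole frame works out at 1.5 bytes per pixel.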

And write a very simple shader to convert from yuv to rgb:

varying vec2 tcoord;
uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;
void main(void) 
{
    float y = texture2D(tex0,tcoord).r;
    float u = texture2D(tex1,tcoord).r;
    float v = texture2D(tex2,tcoord).r;

    vec4 res;
    res.r = (y + (1.370705 * (v-0.5)));
    res.g = (y - (0.698001 * (v-0.5)) - (0.337633 * (u-0.5)));
    res.b = (y + (1.732446 * (u-0.5)));
    res.a = 1.0;

    gl_FragColor = clamp(res,vec4(0),vec4(1));
}
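When debugging a shader like this, it can help to have the same maths on the CPU so you can compare a few pixels by hand. The sketch below copies the coefficients from the shader above; the names `Rgb`, `Clamp01` and `YuvToRgb` are made up for illustration, and the inputs are assumed to be normalised to 0..1 in the same way texture2D returns them.

```cpp
#include <algorithm>
#include <cassert>

// Result of converting one pixel, each channel in the range 0..1.
struct Rgb { float r, g, b; };

// Same job as clamp(x, 0.0, 1.0) in GLSL.
inline float Clamp01(float x)
{
    return std::min(1.0f, std::max(0.0f, x));
}

// CPU reference for the fragment shader: y, u and v are the values sampled
// from the three textures, so each must already be in the range 0..1.
inline Rgb YuvToRgb(float y, float u, float v)
{
    assert(y >= 0.0f && y <= 1.0f);
    assert(u >= 0.0f && u <= 1.0f);
    assert(v >= 0.0f && v <= 1.0f);

    Rgb res;
    res.r = Clamp01(y + 1.370705f * (v - 0.5f));
    res.g = Clamp01(y - 0.698001f * (v - 0.5f) - 0.337633f * (u - 0.5f));
    res.b = Clamp01(y + 1.732446f * (u - 0.5f));
    return res;
}
```

A quick check: with u and v both at 0.5 the colour terms vanish, so the output is a grey with r, g and b all equal to y.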

Now I simply run the shader to read in the 3 yuv textures, and write out an rgb one, ending up with this little number:

Good hat yes? Well, hat aside, the next thing to do is provide downsamples so we can run image processing algorithms at different levels. I don't even need a new shader for that, as I can just run the earlier shader, but aiming it at successively lower resolution textures. Here's the lowest one now:

The crucial thing is that in OpenGL you can create a texture, and then tell it to also double as a frame buffer using the following code:

bool GfxTexture::GenerateFrameBuffer()
{
    //Create and bind a new frame buffer
    glGenFramebuffers(1,&FramebufferId);
    check();
    glBindFramebuffer(GL_FRAMEBUFFER,FramebufferId);
    check();

    //point it at the texture (the id passed in is the Id assigned when we created the open gl texture)
    glFramebufferTexture2D(GL_FRAMEBUFFER,GL_COLOR_ATTACHMENT0,GL_TEXTURE_2D,Id,0);
    check();

    //cleanup
    glBindFramebuffer(GL_FRAMEBUFFER,0);
    check();
    return true;
}

Once you have a texture as a frame buffer you can set it to be the target to render to (don't forget to set the viewport as well):

        glBindFramebuffer(GL_FRAMEBUFFER,render_target->GetFramebufferId());
        glViewport ( 0, 0, render_target->GetWidth(), render_target->GetHeight() );
        check();

And also use the read pixels function to read the results back to cpu (which I do here to save to disk using the lodepng library):

void GfxTexture::Save(const char* fname)
{
    void* image = malloc(Width*Height*4);
    glBindFramebuffer(GL_FRAMEBUFFER,FramebufferId);
    check();
    glReadPixels(0,0,Width,Height,IsRGBA ? GL_RGBA : GL_LUMINANCE, GL_UNSIGNED_BYTE, image);
    check();
    glBindFramebuffer(GL_FRAMEBUFFER,0);

    unsigned error = lodepng::encode(fname, (const unsigned char*)image, Width, Height, IsRGBA ? LCT_RGBA : LCT_GREY);
    if(error) 
        printf("error: %d\n",error);

    free(image);
}

These features give us a massive range of capability. We can now chain together various shaders to apply multiple levels of filtering, and once the GPU is finished with them the data can be read back to the CPU and fed into image processing libraries such as OpenCV. This is really handy, as algorithms such as object detection often have to do costly filtering before they can operate. Using the GPU as above, we can avoid the CPU needing to do that work.

Thus far I've written the following filters:

Gaussian blur
Dilate
Erode
Median
Threshold
Sobel

Here's a few of them in action:

Enjoy!

p.s. my only annoyance right now is that I still have to go through the cpu to get my data from mmal and into opengl. If anyone knows a way of getting from mmal straight to opengl that'd be super awesome!

pp.s. right at the end, here's a tiny shameless advert for my new venture - http://www.happyrobotgames.com/no-stick-shooter. If you like my writing, check out the dev blog for regular updates on my first proper indie title!

42 comments:

Anonymous, 28 October 2013 at 12:57
Good work. A direct MMAL->texture path is something we'd like to expose - I'll let you know when it's possible.
-Dom.

Chris Cummings, 28 October 2013 at 14:07
Thanks Dom - I'll look forward to it. I'm thinking of writing one that goes direct to OMX, as that does have an egl_render component which does the job - it's just not exposed in mmal. One thing to bear in mind for the direct to texture path is how you'd handle the yuv format. You could either supply 3 textures, force an rgba conversion then just supply 1 texture, or just supply the raw yuv in a slightly funky texture and let the user figure it out in a shader :)

jbeale, 28 October 2013 at 18:28
Really great stuff! Thanks for providing the code, I'm trying to understand it now. Question for you: is there an easy way to generate a number in real-time that is proportional to how "in-focus" the scene is? Maybe compute an overall peak-peak magnitude after a high-pass or edge detecting filter? Could that all be done on the GPU, or would it need the CPU to generate the RMS or peak-peak magnitude value?

Chris Cummings, 28 October 2013 at 21:06
Hmmm - well you've got 2 problems there. First, is there an algorithm you can think of that, for a given pixel tells you how 'in focus' a small region around it is? If you can do that then you can calculate a per pixel value from 0 to 1 that indicates focus. Then you can downsample that area to average out the focus level across pixels and get down to a low enough texture size for the cpu to process.

pelrun, 29 October 2013 at 10:35
One of the better algorithms for calculating a focus value is to calculate the statistical variance of the image - essentially a contrast measurement.

i.e. variance = sum((intensity(x,y) - mean_intensity)^2) / (height*width)

Maximising this value gets close to the proper focus position.

Chris Cummings, 29 October 2013 at 11:28
Hmmm - well it'd be tricky to do that exact algorithm on a gpu, but you could probably get close. You could calculate the mean of a quadrant of pixels and output it to a smaller texture. Then in a 2nd phase, take that sum for a quadrant of pixels, multiply it by 4 for each one, add them together and divide by 16. Then in a 3rd phase multiply by 16, sum, then divide by 64 etc. A similar process could probably be done for the whole equation. That'd give you an approximation that got less accurate as you did further downsamples, but once the image was of a small enough size (say down from 1024x1024 to 128x128) you could then copy it to cpu and do the remaining calculations in more detail. I'd imagine that would get you a solid 30hz for a hi res image. The main issue right now is that there's a big cpu overhead in getting the camera data to gpu, but it looks like we'll have a solution for that soon enough, making relying on the cpu to finish off the work more feasible.

alex/bluespoon, 29 October 2013 at 17:03
I'm not an expert on the Pi GPU but chris the algorithm you're describing to implement pelrun's variance calc is indeed the right one for all the GPUs I've ever used. you wouldn't need to send very much data back to the CPU - you repeatedly halve the resolution of the texture, entirely using the GPU, until you get to a tiny size (even 1x1 pixels) that is read back by the CPU.
this is known most generally as an image pyramid http://en.wikipedia.org/wiki/Pyramid_(image_processing)
re-describing it in relation to pelrun's equation:
you repeatedly run a shader that averages 2x2 pixel blocks (or 3x3 or 4x4), summing x and x^2 for each pixel. ideally you want to use floats or >8 bit integer values, but the resolutions are low so memory isn't the issue. (does the pi gpu do >8 bit integer textures?). the literature nearly always downsamples by a factor of 2 each time in the pyramid, but on some gpus it's a better balance of parallelism vs passes to average bigger blocks eg 4x4. Anyway, assuming 2x2, you only need log2(resolution) passes - eg input is 1024x1024, then you get 512x512, 256x256, 128x128,.... down to 1x1.
Pelrun writes variance as var=E((X-mu)^2) where mu is the mean, but you can also write var as var=E(X^2)-E(X)^2
(btw by E(f) I mean the expected value of f, ie the average of f over the whole image - so E(X) is the average of all pixels, E(X^2) is the average of the square of all pixels)
with that version, you can use a pyramid to get you E(X) and E(X^2) together, the CPU reads back 2 floats, squares E(X), subtracts it from E(X^2), and voila, you have variance.
I'm rambling :) sorry.
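
The pyramid reduction described in this comment can be sketched on the CPU as below. It is only an illustration in plain C++, not code from picam_gpu: `Level`, `Downsample` and `PyramidVariance` are made-up names, each call to `Downsample` stands in for one shader pass, and doubles are used throughout, whereas a GPU version would have to cope with whatever precision its render targets offer. It assumes a square image whose size is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pyramid level: a square image holding running averages of X and X^2.
struct Level {
    std::size_t size;            // width and height of this level
    std::vector<double> mean;    // E(X) over the block each texel covers
    std::vector<double> mean_sq; // E(X^2) over the block each texel covers
};

// Halve the resolution by averaging each 2x2 block (one "shader pass").
inline Level Downsample(const Level& in)
{
    Level out;
    out.size = in.size / 2;
    out.mean.resize(out.size * out.size);
    out.mean_sq.resize(out.size * out.size);
    for (std::size_t y = 0; y < out.size; ++y) {
        for (std::size_t x = 0; x < out.size; ++x) {
            double m = 0.0, m2 = 0.0;
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const std::size_t i = (2 * y + dy) * in.size + (2 * x + dx);
                    m  += in.mean[i];
                    m2 += in.mean_sq[i];
                }
            }
            out.mean[y * out.size + x]    = m / 4.0;
            out.mean_sq[y * out.size + x] = m2 / 4.0;
        }
    }
    return out;
}

// Variance of a square, power-of-two sized image, via var = E(X^2) - E(X)^2.
inline double PyramidVariance(const std::vector<double>& pixels, std::size_t size)
{
    assert(size > 0 && (size & (size - 1)) == 0);
    assert(pixels.size() == size * size);

    // Level 0 holds X and X^2 for every pixel.
    Level level;
    level.size = size;
    level.mean = pixels;
    level.mean_sq.resize(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        level.mean_sq[i] = pixels[i] * pixels[i];

    // Reduce down to 1x1, which takes log2(size) passes.
    while (level.size > 1)
        level = Downsample(level);

    return level.mean_sq[0] - level.mean[0] * level.mean[0];
}
```

Because every block at a given level covers the same number of pixels, averaging the averages gives the true E(X) and E(X^2), so with enough precision this matches the direct formula rather than approximating it.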

Chris Cummings, 29 October 2013 at 18:38
Hey Alex

I don't think the Pi supports floating point render targets unfortunately, but I was thinking, given it's technically grey scale data, I could treat each rgba value as a 32 bit piece of data by encoding a high resolution value as something akin to:
val = (r+g*2+b*4+a*8)/15
(assuming val is always between 0 and 1).
Then to go back you'd multiply val by 15, then break it down into powers of 2. Something like that anyhoo :)
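
The weights in the formula above are only meant as "something akin to" the idea. One common way of making it concrete is to treat the four 8 bit channels as base-256 digits of the value. The sketch below shows that variant on the CPU; it is an assumption about how the packing could work rather than anything taken from picam_gpu, and `Rgba8`, `PackUnit` and `UnpackUnit` are made-up names. It requires val to be at least 0 and strictly less than 1.

```cpp
#include <cassert>
#include <cstdint>

// Four 8 bit channels, as stored in an RGBA8 texel.
struct Rgba8 { std::uint8_t r, g, b, a; };

// Pack a value in [0,1) into 4 bytes, most significant byte in r.
inline Rgba8 PackUnit(double val)
{
    assert(val >= 0.0 && val < 1.0);

    // Scale up to a 32 bit integer, then split it into bytes.
    const std::uint32_t bits = static_cast<std::uint32_t>(val * 4294967296.0);
    Rgba8 out;
    out.r = static_cast<std::uint8_t>(bits >> 24);
    out.g = static_cast<std::uint8_t>((bits >> 16) & 0xFF);
    out.b = static_cast<std::uint8_t>((bits >> 8) & 0xFF);
    out.a = static_cast<std::uint8_t>(bits & 0xFF);
    return out;
}

// Reverse of PackUnit: rebuild the 32 bit integer and scale back to [0,1).
inline double UnpackUnit(const Rgba8& in)
{
    const std::uint32_t bits = (static_cast<std::uint32_t>(in.r) << 24) |
                               (static_cast<std::uint32_t>(in.g) << 16) |
                               (static_cast<std::uint32_t>(in.b) << 8) |
                                static_cast<std::uint32_t>(in.a);
    return bits / 4294967296.0;
}
```

Packing truncates, so a round trip loses less than 1/2^32 of the original value. A shader can't do the bit shifts this way in GLSL ES 1.0, so a GPU version would do the equivalent with multiplies, floor and fract.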

pelrun, 29 October 2013 at 18:55
Ooh, if only more rambles were that informative!

pelrun, 29 October 2013 at 20:12
As far as I can tell you're right about not having floating point textures :P Packing/unpacking floats into the RGBA8 ints seems to be possible, though - I found the following with some useful code:

http://smt565.blogspot.co.uk/2011/04/bit-packing-depth-and-normals.html
http://www.ozone3d.net/blogs/lab/20080604/glsl-float-to-rgba8-encoder/
https://web.archive.org/web/20130416194336/http://olivers.posterous.com/linear-depth-in-glsl-for-real

chad, 1 November 2013 at 09:00
$ ./picam
picam: /home/pi/picam/graphics.cpp:224: bool GfxShader::LoadVertexShader(const char*): Assertion `f' failed.
Aborted

not my area of expertise, and most likely I missed a step?

appreciate

Chris Cummings, 1 November 2013 at 13:53
fyi, for anyone else who hits this issue, Chad had accidentally built picam in a sub directory (in his case, in picam/build), so the executable file wasn't in the same folder as the shaders. The assertion is indicating the shader wasn't found.

Anonymous, 13 November 2013 at 19:52
Direct MMAL->texture is now supported. See:
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=43&t=59304
-Dom

Unknown, 10 April 2014 at 05:47
Hi Chris,

I'm really glad that you have made records of your experiments and that you are kind enough to share it with the rest of us!

I am also looking to do some image processing onboard the Raspberry Pi and given that you are much more familiar with the capabilities of MMAL than I am, I wonder if you could comment on whether it's realistic to be able to retrieve YUV for uncompressed luminance buffer, at, say, 720p, run various image processing code over that (and use the results immediately for external purposes), then "render" some debug lines or shapes (basically draw some boxes over features that were detected that I'm interested in), and then, using this frame buffer that's been drawn over, send to the H.264 encoder for output to a file or network stream?

The way that I have been starting to investigate how to do this is reading RaspiVid.c, and I got to this point:

status = connect_ports(camera_video_port, encoder_input_port, &state.encoder_connection);

It definitely looks like this is where your API (which I am perusing next) comes in! I would just love to know if you know off the top of your head if I can leverage the capabilities of what you're doing here (using the GPU to e.g. get lower mip levels without having to go all the way to copying to an OpenGL frame buffer texture, although -- excitingly -- we now can do that efficiently as well, so it seems) while being able to pump a result into the H.264 encoder?

Fantastic work!

Unknown, 11 April 2014 at 04:45
By the way... I don't know if you guys have tried this... but I've got my RasPi hooked up to a monitor via HDMI so I can bring the camera up close and feed it back in through the picam_gpu grid demo. This is SUPER trippy. I will even liken it to a portal to the netherworld, you can get some amazing visuals that evolve at whatever framerate the Pi can dish out, transcending the realm of that which is purely digital or analogue.

This is tremendously powerful stuff.

TeaPack, 17 May 2014 at 20:38
Hi,
is it possible to run a discrete wavelet transformation on the Raspi's GPU?

Unknown, 4 September 2014 at 12:52
Hi Chris
I have sent you an email regarding OpenCV implementation with this API. Just to repeat the question here: Is there an easy way to convert the OpenGL textures used in the API into the OpenCV Mat format so that I can do image manipulation after filtering?
Thanks for the great API though, it's very helpful

Leon Miller-Out, 5 October 2014 at 03:30
Thanks, this is awesome!

I had to make some modifications to compile this. They're here in case anyone is interested:
https://gist.github.com/sbleon/ccd52c5b7983a226f2d7

Unknown, 8 March 2015 at 18:36
Hi, I am a newbie at OpenGL. May I ask how to hide the OpenGL window in your code?
Thank you

Unknown, 18 October 2015 at 04:22
I'm wanting to do something similar, using a webcam for input and a 128x128 display on the SPI port.
Ideally with edge-detection filtering in-between, if possible.
Are the GPUs only able to work on pixels destined for the HDMI output, or can they send data to a secondary frame buffer?

Anonymous, 4 November 2015 at 21:55
Hey,
I'm implementing a Harris Corner detection on something similar instead of Sobel and I was wondering if you have any code for that? I'm getting an error when I try to implement the algorithm instead of the Sobel algorithm, probably due to the size of data I am sending across.

Gernot Ziegler, 5 April 2017 at 22:54
Hi!

Just got a Raspberry Pi C for my birthday - and, being an old GPGPU veteran, I am planning to explore image processing on its GPU as well. ;)

What I wanted to ask:
Does the MMAL layer allow for getting the camera data as an OpenGL ES texture nowadays? I know that Android (and thus: many cellphones) allow for that, using the OES_texture_external extension.

Thanks for your time and advice,
Gernot ( gz@geofront.eu )

PS: For anyone curious on GPGPU, here a link to my thesis: http://www.geofront.eu/thesis.pdf
You might like chapters 4, 7 and 8 in particular. :-)

Unknown, 30 November 2017 at 22:57
Hello! The result is amazing. I just don't understand the context, are you using the Raspberry Pi alone? What additional hardware did you use? I have no idea about GPUs, can you please tell me how I can learn all this stuff from the beginning??

Unknown, 30 September 2019 at 11:47
Hello, Chris!

The link for the code seems to be unavailable: http://www.cheerfulprogrammer.com/downloads/picamgpu/picam_gpu.zip

Could you please re-upload it to somewhere else?

Thanks!