Saturday, 26 October 2013

Pi Eyes Stage 5

I'm making real progress now on getting the camera module simpler and more efficient. My next goal is to rework the camera API into a more synchronous process (no more callbacks), where the user can simply call 'ReadFrame' to get the next frame.

A Simple Synchronous API

The first step turned out to be pretty simple thanks to the 'queue' structure in mmal. I simply create my own little queue called 'OutputQueue' and change the internal camera callback to be:

void CCamera::OnVideoBufferCallback(MMAL_PORT_T *port, MMAL_BUFFER_HEADER_T *buffer)
{
    //first, add the buffer to the output queue
    mmal_queue_put(OutputQueue,buffer);
}

That code used to lock the buffer, call a callback, then return it to the port for recycling. Now, however, it just pushes the buffer onto the output queue for processing by the user. Next up, I add a 'ReadFrame' function:

int CCamera::ReadFrame(void* dest, int dest_size)
{
    //default result is 0 - no data available
    int res = 0;

    //get buffer
    if(MMAL_BUFFER_HEADER_T *buffer = mmal_queue_get(OutputQueue))
    {
        //check if buffer has data in
        if(buffer->length)
        {
            //got data so check if it'll fit in the memory provided by the user
            if(buffer->length <= dest_size)
            {
                //it'll fit - yay! copy it in and set the result to be the size copied
                mmal_buffer_header_mem_lock(buffer);
                memcpy(dest,buffer->data,buffer->length);
                mmal_buffer_header_mem_unlock(buffer);
                res = buffer->length;
            }
            else
            {
                //won't fit so set result to -1 to indicate error
                res = -1;
            }
        }

        // release buffer back to the pool from whence it came
        mmal_buffer_header_release(buffer);

        // and send it back to the port (if still open)
        if (VideoCallbackPort->is_enabled)
        {
            MMAL_STATUS_T status;
            MMAL_BUFFER_HEADER_T *new_buffer;
            new_buffer = mmal_queue_get(BufferPool->queue);
            if (new_buffer)
                status = mmal_port_send_buffer(VideoCallbackPort, new_buffer);
            if (!new_buffer || status != MMAL_SUCCESS)
                printf("Unable to return a buffer to the video port\n");
        }    
    }

    return res;
}

This gets the next buffer in the output queue, copies it into memory provided by the user, and then returns it back to the port for reuse, just like the old video callback used to do.

It all worked fine first time, so my actual application code is now as simple as:

//this is the buffer my graphics code uses to update the main texture each frame
extern unsigned char GTextureBuffer[4*1280*720];

//entry point
int main(int argc, const char **argv)
{
    printf("PI Cam api tester\n");
    InitGraphics();
    printf("Starting camera\n");
    CCamera* cam = StartCamera(1280,720,15);

    printf("Running frame loop\n");
    for(int i = 0; i < 3000; i++)
    {
        BeginFrame();

        //read next frame into the texture buffer
        cam->ReadFrame(GTextureBuffer,sizeof(GTextureBuffer));

        //tell graphics code to draw the texture
        DrawMainTextureRect(-0.9f,-0.9f,0.9f,0.9f);

        EndFrame();
    }

    StopCamera();
}

As an added benefit, doing it synchronously means I don't accidentally write to the buffer while it's being copied to the texture, so no more screen tearing! Nice!

A bit more efficient

Now that I'm accessing the buffer synchronously there's the opportunity to make things more efficient and remove a frame of lag. Basically the current system goes:

  • BeginFrame (updates the main texture from GTextureBuffer - effectively a memcpy)
  • camera->ReadFrame (memcpy latest frame into GTextureBuffer)
  • DrawMainTextureRect (draws the main texture)
  • EndFrame (refreshes the screen)

There are two problems here. First up, our ReadFrame call updates GTextureBuffer after it's been copied into the opengl texture, so we're always seeing a frame behind - although that could be easily fixed by calling it before BeginFrame. Worse though, we're doing two memcpys: first from the camera to GTextureBuffer, and then from GTextureBuffer to the opengl texture. With a little reworking of the API, however, this can be fixed...

First, I add 'BeginReadFrame' and 'EndReadFrame' functions, which effectively do the same as the earlier ReadFrame (minus the memcpy), but split across two function calls:

bool CCamera::BeginReadFrame(const void* &out_buffer, int& out_buffer_size)
{
    //try and get buffer
    if(MMAL_BUFFER_HEADER_T *buffer = mmal_queue_get(OutputQueue))
    {
        //lock it
        mmal_buffer_header_mem_lock(buffer);

        //store it
        LockedBuffer = buffer;
        
        //fill out the output variables and return success
        out_buffer = buffer->data;
        out_buffer_size = buffer->length;
        return true;
    }
    //no buffer - return false
    return false;
}

void CCamera::EndReadFrame()
{
    if(LockedBuffer)
    {
        // unlock and then release buffer back to the pool from whence it came
        mmal_buffer_header_mem_unlock(LockedBuffer);
        mmal_buffer_header_release(LockedBuffer);
        LockedBuffer = NULL;

        // and send it back to the port (if still open)
        if (VideoCallbackPort->is_enabled)
        {
            MMAL_STATUS_T status;
            MMAL_BUFFER_HEADER_T *new_buffer;
            new_buffer = mmal_queue_get(BufferPool->queue);
            if (new_buffer)
                status = mmal_port_send_buffer(VideoCallbackPort, new_buffer);
            if (!new_buffer || status != MMAL_SUCCESS)
                printf("Unable to return a buffer to the video port\n");
        }    
    }
}

The key here is that instead of returning the buffer straight away, I simply store a pointer to it in BeginReadFrame and return the address and size of the data to the user. In EndReadFrame, I then proceed to unlock and release it as normal.

This means my ReadFrame function now changes to:

int CCamera::ReadFrame(void* dest, int dest_size)
{
    //default result is 0 - no data available
    int res = 0;

    //get buffer
    const void* buffer; int buffer_len;
    if(BeginReadFrame(buffer,buffer_len))
    {
        if(dest_size >= buffer_len)
        {
            //got space - copy it in and return size
            memcpy(dest,buffer,buffer_len);
            res = buffer_len;
        }
        else
        {
            //not enough space - return failure
            res = -1;
        }
        EndReadFrame();
    }

    return res;
}

In itself that's not much help. However, if I tweak the application so it can copy data straight into the opengl texture, and switch it over to BeginReadFrame and EndReadFrame, I can avoid one of the memcpys. In addition, by moving the camera read earlier in the frame I lose a frame of lag:

//entry point
int main(int argc, const char **argv)
{
    printf("PI Cam api tester\n");
    InitGraphics();
    printf("Starting camera\n");
    CCamera* cam = StartCamera(MAIN_TEXTURE_WIDTH, MAIN_TEXTURE_HEIGHT,15);

    printf("Running frame loop\n");
    for(int i = 0; i < 3000; i++)
    {
        //lock the current frame buffer, and copy it directly into the open gl texture
        const void* frame_data; int frame_sz;
        if(cam->BeginReadFrame(frame_data,frame_sz))
        {
            UpdateMainTextureFromMemory(frame_data);
            cam->EndReadFrame();
        }

        //begin frame, draw the texture then end frame
        BeginFrame();
        DrawMainTextureRect(-0.9f,-0.9f,0.9f,0.9f);
        EndFrame();
    }

    StopCamera();
}

Much better! Unfortunately I'm still only at 15Hz due to the weird interplay between opengl and the mmal resizing/converting components, but it is a totally solid 15Hz at 720p - about 10Hz at 1080p. I suspect I'm going to have to ditch the mmal resize/convert components eventually and rewrite them as opengl shaders, but not just yet.

Here's a video of the progress so far:



And as usual, here's the code:


Next up - downsampling!

2 comments:

  1. Great progress! Looking forward to future updates as well. Does it seem possible for the GPU to generate both a full-res H264 output and a downsampled frame for CPU processing at the same time? If you use OpenGL shaders, does that mean the output only goes to the screen display, not available to code on the CPU?

  2. Hey there. In answer to your questions:

    Could you have a high res H264 output and a downsampled frame?
    Yes definitely - I've just got the video splitter working (see next post) and am using it to generate different levels of detail outputs for image processing. However you could just as easily divert the top level to the H264 encoder component instead. Not sure exactly what the performance would be but I'd imagine you could find a way to run it at full frame rate.

    Does using shaders mean you can only output to the screen?
    No, shaders can render to a texture which can then be read by the cpu next frame - although they also provide a convenient way of getting stuff on screen as well.
