Color processing is often desirable when a person, rather than a machine, is viewing the images, such as in outdoor applications like traffic monitoring or surveillance. Color processing can be done in the frame grabber or directly in the camera. This recent blog post on chipsight.com, reposted below, provides some examples and weighs the pros and cons of raw images versus color-processed images. Adimec color cameras have Bayer color filter arrays and can provide raw or RGB data to meet the needs of the specific application.
http://chipsight.com/raw-pixel-diet/
RAW Pixel Diet
by Craig Sullender on May 16, 2011
What type of input does a vision application need? Can we use pixel data straight from the camera’s image sensor? Or does the application need pixel data processed by an image signal processor (ISP) with color transformations and denoising?
Computer vision: Extract information from an image to perform a task.
In other words, a computer vision application may not need to produce an image for people to look at. This gives us a range of choices about the type of image sensor data we use for the vision input.
The CMOS cameras under consideration (e.g. the OV7690) contain an ISP and output pixel color values in Bayer color filter array (CFA) pattern, RGB, or YUV format. The CFA pattern output is called “raw,” and it is a data format, not a file format like “*.raw.”
Let’s look at raw vs. RGB for the computer vision input.
The image sensor array produces only a red, green, or blue value at each pixel location. Green occurs on every horizontal line, while red and blue are on alternating horizontal lines.
The Bayer color filter array (or mosaic). Each sensor pixel location is covered by a red, green, or blue color filter. Image from Wikipedia.
Post-processing in the camera interpolates the missing color values with a demosaicing algorithm, so that each pixel location then has three values representing the red, green, and blue intensities estimated for that location.
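As an illustration of what that interpolation can look like, here is a minimal bilinear demosaicing sketch in C. It assumes an RGGB ordering of the Bayer mosaic and 8-bit samples; the actual algorithm and CFA phase inside the camera’s ISP may differ.

```c
/* Minimal bilinear demosaic sketch (not the camera's actual ISP algorithm).
 * Assumes an RGGB Bayer layout:
 *   row 0: R G R G ...
 *   row 1: G B G B ...
 * Raw input is 8 bits per pixel; output is interleaved 8-bit RGB. */
#include <stdint.h>

/* Which color filter covers the pixel at (x, y)? 0 = R, 1 = G, 2 = B. */
static int cfa_channel(int x, int y)
{
    if ((y & 1) == 0)                 /* even row: R G R G ... */
        return (x & 1) == 0 ? 0 : 1;
    else                              /* odd row:  G B G B ... */
        return (x & 1) == 0 ? 1 : 2;
}

/* Fill in the missing colors at each location by averaging the
 * same-color samples in the surrounding 3x3 window. */
void demosaic_bilinear(const uint8_t *raw, uint8_t *rgb, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int sum[3] = {0, 0, 0}, cnt[3] = {0, 0, 0};
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h)
                        continue;
                    int c = cfa_channel(nx, ny);
                    sum[c] += raw[ny * w + nx];
                    cnt[c]++;
                }
            }
            for (int c = 0; c < 3; c++)
                rgb[(y * w + x) * 3 + c] =
                    cnt[c] ? (uint8_t)(sum[c] / cnt[c]) : 0;
        }
    }
}
```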
The camera outputs the raw (CFA) or the interpolated RGB values according to user register settings. Our choices are RGB565 and Raw8.
RGB565 is packed into two 8-bit reads of the camera data bus: 5 bits red, 6 bits green, and 5 bits blue to make 16 bits for each pixel.
Raw8 is one byte per pixel, and represents either red, green, or blue according to the pixel’s location in the color filter array.
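For reference, here is a small sketch of how the two RGB565 bus bytes could be unpacked into full 8-bit channel values. The byte order (high byte first) is an assumption; on the actual camera it depends on the register configuration.

```c
/* Sketch of unpacking one RGB565 pixel from two 8-bit bus reads.
 * Byte order (high byte first) is an assumption for this example. */
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb888_t;

rgb888_t unpack_rgb565(uint8_t hi, uint8_t lo)
{
    uint16_t p = (uint16_t)((hi << 8) | lo);     /* 16 bits per pixel */
    rgb888_t out;
    out.r = (uint8_t)(((p >> 11) & 0x1F) << 3);  /* 5 bits red   -> 8 bits */
    out.g = (uint8_t)(((p >> 5)  & 0x3F) << 2);  /* 6 bits green -> 8 bits */
    out.b = (uint8_t)(( p        & 0x1F) << 3);  /* 5 bits blue  -> 8 bits */
    return out;
}

/* Raw8 needs no unpacking: each bus read is one full 8-bit sample, whose
 * color (R, G, or B) is determined by the pixel's CFA location. */
```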
RGB pros: complete red, green, and blue values for each pixel
RGB cons: only 5 bits of resolution (6 for green), twice the data rate, twice the data
Raw pros: full 8 bits of resolution, lower data rate
Raw cons: missing two color values at each pixel
What are the trade-offs between RGB and raw for a computer vision processing step like motion detection by background removal?
The background is an earlier image from the camera stored in a frame buffer (SRAM or DRAM) and updated according to rules we won’t go into here. The current image from the camera is subtracted pixel by pixel from the background, with the pixels from the two sources aligned so that they correspond spatially: pixel 1 of the current image is subtracted from pixel 1 of the background, then pixel 2 from pixel 2, and so on. If nothing has changed over time in the camera’s view, there are no moving objects, and the subtraction results in values near zero, apart from noise. If something moved, a group (region) of pixels will show absolute values greater than a chosen threshold.
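In code, the subtraction-and-threshold step might look like the sketch below. It operates on 8-bit samples, so it applies directly to Raw8 (for RGB you would repeat it per channel); the threshold value is only an example, not a figure from the post.

```c
/* Minimal background-subtraction sketch on 8-bit samples.
 * Works directly on Raw8; for RGB, run it once per color channel. */
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

#define MOTION_THRESHOLD 20   /* "chosen threshold"; this value is illustrative */

/* Mark each pixel as motion (1) or not (0) by comparing the current frame
 * against the stored background frame, pixel by pixel. */
void detect_motion(const uint8_t *current, const uint8_t *background,
                   uint8_t *motion, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        int diff = abs((int)current[i] - (int)background[i]);
        motion[i] = (diff > MOTION_THRESHOLD) ? 1 : 0;
    }
}
```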
Video example using successive frames (video images) as the background: moving objects are displaced only a few pixels from their location in the background frame, so the result is mostly edges. For our purposes the camera is in a fixed location and the background frame is maintained over time, so moving objects produce larger groups of motion pixels.
Grayscale images work fine for motion detection, but we are using color for greater sensitivity and for compatibility with other vision functions used in combination with the motion detection.
With RGB you are subtracting red from red, green from green, and blue from blue. With raw you are missing two colors at every pixel location, so how can it work? It works for detecting changes in the scene because the camera is in a fixed position, so the color filter array (CFA) is in a fixed alignment with the scene. Though the colors are spatially distinct, there is still detection happening at each pixel. Objects that move will be detected in the raw format much as they are in RGB.
But not exactly. For example, in a scene with very little red, objects will not be detected at the red CFA locations with as much sensitivity as at the other CFA locations. The detected region will have holes at every red location. To fill in the holes, you can filter the detection result by spreading motion pixels from neighboring locations: if the green to the left shows motion, mark the red location to its right as a motion pixel as well.
That works for filtering the results horizontally. What about vertically? If, for example, the blue CFA locations show motion but the red and green do not, you’ll get a horizontally striped result. Filtering vertically is unfortunately a bit more demanding on resources: you have to save the motion result for the current image line in a line buffer so you can access the result for the pixel directly above the one you are filtering. Fortunately you only need to save one bit per pixel to indicate that it changed due to motion.
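A sketch of that hole-filling filter, streaming one line at a time with a one-bit-per-pixel line buffer, might look like the following. The buffer size and function names are illustrative, not from the original post.

```c
/* Sketch of the hole-filling filter described above: a CFA location is kept
 * as a motion pixel if it, its left neighbor, or the pixel above it on the
 * previous line detected motion. One bit per pixel suffices for the line
 * buffer; a full byte is used here only for simplicity. */
#include <stdint.h>
#include <string.h>

#define MAX_WIDTH 640   /* assumed image width for the static line buffer */

/* Detection bits of the previous image line ("pixel above"). */
static uint8_t prev_line[MAX_WIDTH];

/* Reset the line buffer at the start of each frame so detections
 * do not leak across frames. */
void filter_motion_frame_start(void)
{
    memset(prev_line, 0, sizeof prev_line);
}

/* Filter one line of motion bits (0/1) in place, spreading detections
 * horizontally (from the left neighbor) and vertically (from the line above). */
void filter_motion_line(uint8_t *motion_line, int width)
{
    uint8_t left = 0;                     /* detection bit of the pixel to the left */
    for (int x = 0; x < width && x < MAX_WIDTH; x++) {
        uint8_t raw = motion_line[x];
        /* keep this pixel if it, its left neighbor, or the pixel above moved */
        motion_line[x] = raw | left | prev_line[x];
        prev_line[x] = raw;               /* becomes "pixel above" on the next line */
        left = raw;
    }
}
```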
RGB pros: full color detection at each pixel location
RGB cons: lower sensitivity*, three subtractions for each pixel
Raw pros: 8 bit sensitivity, one subtraction per pixel
Raw cons: partial color, requires filtering
* RGB565 sensitivity is arguably greater than 5 or 6 bits when you take into account the combined measurements of red, green, and blue.
Which format you choose depends on how much memory is available for the background frame buffer, what data rate the vision input can handle, and how much sensitivity the application needs, weighed against the added filtering required for raw.