Archive for the 'Computer Vision' Category

Reverse Engineering the Lytro .LFP File Format

Lytro Microlens Array

After getting my Lytro camera yesterday, I set about answering the questions about the light field capture format that I had left over from the last time around. Lytro may be focusing (pun absolutely intended) on the Facebook-using crowd with their camera and software, but their file format suggests they don’t mind nerds like us poking around. The file structure is the same as what they use for their compressed web display .lfp files, complete with a plain text table of contents, so I was able to re-use the lfpsplitter tool I wrote earlier with some minor modifications. The README with the tool describes in detail the format of the file and how to parse it.

The table of contents in the raw .lfp files gives away most of the camera’s secrets. It contains a bunch of useful metadata and calibration data like the focal length, sensor temperature, exposure length, and zoom length. It also gives away the fact that the camera contains a 3-axis accelerometer, storing the orientation of the camera with respect to gravity in each image. The physical sensor is 3280 by 3280 pixels, and the raw file just contains a BGGR Bayer array of it at 12 bits per pixel. Saving the array and converting it to TIFF using the raw2tiff command below shows that each microlens is about 10 pixels in diameter with some vignetting on the edges.

raw2tiff -w 3280 -l 3280 -d short IMG_0004_imageRef0.raw output.tif
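The -d short flag tells raw2tiff to read 16 bit samples, so data packed at 12 bits per pixel would have to be widened first. Here is a minimal sketch of that step. The packing order (two samples in every three bytes, most significant bits first) is an assumption for illustration, as are the function names; the README is the place to check the real layout.

```python
import struct


def unpack12(packed):
    """Widen packed 12-bit samples to a list of ints in 0..4095.

    Assumed layout (not confirmed by the post): every 3 bytes hold
    2 samples, most significant bits first.
    """
    if len(packed) % 3 != 0:
        raise ValueError("packed length must be a multiple of 3 bytes")
    samples = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        samples.append((b0 << 4) | (b1 >> 4))        # first sample
        samples.append(((b1 & 0x0F) << 8) | b2)      # second sample
    return samples


def to_le16(samples):
    """Serialize samples as little-endian unsigned 16-bit values."""
    return struct.pack("<%dH" % len(samples), *samples)


# Size a 3280 by 3280 frame would have if packed this way.
PACKED_FRAME_BYTES = 3280 * 3280 * 3 // 2
```

Under that assumption a full frame is 16,137,600 bytes packed and twice the pixel count (21,516,800 bytes) once widened, which is a quick sanity check on whichever file you are looking at.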

Syncing the camera to Lytro’s desktop software backs it up the first time.  Amazingly, the backup file uses the same structure as both .lfp file types.  The file contains a huge amount of factory calibration data like an array of hot or stuck pixels and color calibration under different lighting conditions.  Incredibly, it also lets loose that there is functioning Wi-Fi on board the camera with files named “C:\\CALIB\\WIFI_PING_RESULT.TXT” and “C:\\CALIB\\WIFI_MAC_ADDR.TXT”, which matches what the FCC teardowns show.  There is no mention of Bluetooth support though, despite support by the chipset.  In any case, it seems there is a lot of cool stuff coming via firmware updates.

Hopefully one of those updates enables a USB Mass Storage mode, as there does not appear to be any way to get files off of the camera in Linux. I had to borrow my roommate’s MacBook Air for this escapade. The camera shows up as a SCSI CD drive, but mounting /dev/sr0 only shows a placeholder message intended for Windows users.

Thank you for purchasing your Lytro camera.  Unfortunately, we do not have a
Windows version of our desktop application at this time.  Please check out
http://support.lytro.com for the latest info on Windows support.

It was pretty trivial to write the lfpsplitter to get the raw data shown above, but doing anything useful with it will take more effort. Normally simple stuff like demosaicing the Bayer array will likely be complicated by the need to avoid the gaps between microlenses and not distort the ray direction information. Getting high quality results will probably also require applying the calibration information from the camera backups. A first party light field editing library would be wonderful, but Lytro probably has other priorities.
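For reference, the naive starting point looks something like the sketch below: split the BGGR mosaic into half-resolution colour planes, one value per 2x2 cell. This is only a baseline under the assumption that the mosaic is already a list of rows of integers. It does nothing about microlens gaps or ray directions, which is exactly the part that needs more thought, and split_bggr is a made-up name.

```python
def split_bggr(rows):
    """Split a BGGR mosaic into half-resolution blue, green and red planes.

    Each 2x2 cell is laid out as:
        B G
        G R
    """
    height, width = len(rows), len(rows[0])
    if height % 2 or width % 2:
        raise ValueError("mosaic dimensions must be even")
    blue, green, red = [], [], []
    for y in range(0, height, 2):
        top, bottom = rows[y], rows[y + 1]
        blue.append([top[x] for x in range(0, width, 2)])
        # Average the two green samples in each 2x2 cell.
        green.append([(top[x + 1] + bottom[x]) / 2 for x in range(0, width, 2)])
        red.append([bottom[x + 1] for x in range(0, width, 2)])
    return blue, green, red
```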

You can grab my lfpsplitter tool from GitHub at git://github.com/nrpatel/lfptools.git and I uploaded an example .lfp you can use with it if you want to play with light field captures without the $400 hardware commitment.

Thoughts on the Lytro Light Field Camera

Lytro recently made its namesake light field camera available for preordering. The light field camera reaches closer to the plenoptic function than a standard camera in that instead of only summing the photons to arrive at chromaticity and luminosity at each pixel, it additionally determines directional information. It does so by placing an array of microlenses above the sensor, each of which represents a light field pixel and covers a region of sensor pixels. Each sensor pixel then captures a ray arriving from a specific direction at its parent microlens. Ren Ng’s thesis is full of fascinating uses for this, but it seems Lytro is primarily focusing on the ability to refocus the light field image.

There is very little information available about the format the camera is capturing the light field in, but I suspect that it will be possible to use the files for other purposes like viewing parallax and perspective changes on a single capture. So far, the information we have is that the 8 gigabyte model can store 350 images, the sensor can capture 11 megarays, and the examples in the online gallery have resolutions of 831×831 to 1080×1080. Since the sensor in a light field camera captures one ray per pixel, we can assume the physical sensor is 11 megapixels. Conveniently, 350 images of 11 megapixels at 2 bytes per pixel add up to roughly 8 gigabytes. This suggests the format may be either a raw 16 bit Bayer array off of the sensor or a processed and packed RGB array. As for the microlens array, I suspect that it is a roughly 831×831 grid of hexagonal lenses, each of which covers a roughly 4×4 (16 pixel) area, for a total sensor resolution of 3324×3324 pixels. We probably won’t know for sure until the cameras ship in early 2012.
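The back-of-the-envelope numbers above are easy to check. The sketch below just redoes the arithmetic; it treats the microlens grid as square with a 4×4 pixel footprint per lens, which is the simplification the estimate itself makes, and the function names are only for illustration.

```python
def storage_bytes(images, pixels_per_image, bytes_per_pixel):
    """Total size of uncompressed captures, ignoring metadata and overhead."""
    return images * pixels_per_image * bytes_per_pixel


def sensor_side(lenses_per_side, pixels_per_lens_side):
    """Sensor width in pixels for a square grid of square lens footprints."""
    return lenses_per_side * pixels_per_lens_side


total = storage_bytes(350, 11 * 10**6, 2)
side = sensor_side(831, 4)

print(total / 1e9)        # decimal gigabytes for 350 captures
print(side, side * side)  # pixels per side, total pixels
```

That works out to about 7.7 decimal gigabytes for the 350 captures, and 3324 pixels per side gives 11,048,976 pixels, which lines up with the 11 megaray figure.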

In the meantime, we do have some sample images to play with, but not in the format captured by the camera. The Lytro desktop app apparently exports compressed representations of the light field to reduce file sizes and rendering requirements for web display. The .lfp files are simply a set of JPEGs representing the unique visually interesting sections of the light field. That is, a set in which each image shows a different area in focus. It appears to do so dynamically, picking the minimum number of images necessary to show all focusable objects in narrow depths of field. These images are stored along with their estimated depths and a depth lookup table for the image. This allows for HTML5 and Flash applications like the one embedded above in which the user clicks on a region of the image, the value of that region is looked up, and the depth image closest to that value is displayed.
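The click-to-refocus behaviour can be sketched in a few lines. This assumes the depth lookup table is a flat, row-major grid that is coarser than the displayed image and that each JPEG has one estimated depth; the names and argument order are hypothetical, not the viewer's actual code.

```python
def pick_image(depth_lut, lut_w, lut_h, image_depths,
               click_x, click_y, view_w, view_h):
    """Return the index of the image whose depth is closest to the clicked spot.

    depth_lut    -- flat row-major list of lut_w * lut_h depth values
    image_depths -- estimated depth of each stored JPEG
    click_x/y    -- click position in a view of view_w x view_h pixels
    """
    # Scale the click position down to a cell in the lookup table.
    cell_x = min(click_x * lut_w // view_w, lut_w - 1)
    cell_y = min(click_y * lut_h // view_h, lut_h - 1)
    clicked_depth = depth_lut[cell_y * lut_w + cell_x]
    # Pick the stored image with the nearest estimated depth.
    return min(range(len(image_depths)),
               key=lambda i: abs(image_depths[i] - clicked_depth))
```

With a 2×2 table and three stored images, clicking each corner of the view selects whichever image was focused nearest to that corner's depth.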

To allow for viewing the files offline and to satisfy my curiosity, I wrote a tool called lfpsplitter that reads in an .lfp and writes out its component images as .jpg files and the depth lookup table and image metadata as plain text files. It is available on github, along with a README describing the file format in detail. Until we have Lytro cameras and .lfp files of our own to play with, you can find example files by examining the html source of Lytro’s gallery page.

Update: Given the animated parallax shift image of Walt Mossberg on the Lytro blog, it seems that each microlens covers an area 5 pixels across horizontally. Perhaps the sensor is 4096×4096 and 11 megarays describes the number of pixels getting useful photons, or the microlenses are arranged in a honeycomb pattern with a maximum width of 5px.

Maker Ant Farm: Minecraft Skin Generation with a Kinect

Since my seemingly fragile 3D printer had never left my desk before and even in prime condition could only print an object every 10 minutes or so, I decided that I needed a backup project for the Bay Area Maker Faire last month.  I conscripted Will to help me out on a purely software Kinect based project.  After downscoping our ideas several times as the Faire weekend approached, we eventually settled on generating Minecraft player skins of visitors.  The printer ended up working fine (and more reliably than the software only project), but the Minecraft “Maker Ant Farm” was more of a crowd pleaser.

A visitor would stand in front of the Kinect and enter the fieldgoal/psi calibration pose. We used OpenNI and NITE to find their pose and segment them out of the background for a preview display. Using OpenCV, we mapped body parts to the corresponding sections of the Minecraft skin texture. Since we could only see the fronts and parts of the sides of a person, we just made up what the back would look like based on the front. This was of course imprecise and resulted in heads that often looked like they had massive bald spots. Rather than trying to write some kind of intelligent texture fill algorithm on a short schedule, we just gave all of the skins yellow hard hats (not blonde hair, contrary to popular opinion). After generating the skin, we loaded it back onto ShnitzelKiller’s player rig in Panda3D. I had planned on writing full skeletal tracking for the rig, but ran out of time and settled on just having it follow the position and rotation of the user and perform an animated walk. After walking around a bit watching a low res version of him or herself, the user could enter a Twitter handle or email address to keep the skin. The blocky doppelgänger was then dropped onto a Minecraft server instance we had running as a bot that did simple things like walk around in circles or drown.
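To give a flavour of the texture mapping step, here is a stripped-down sketch that shrinks a cropped face into the head tile of a skin and fakes the back of the head by mirroring the front. It is not the code we ran: it uses plain lists instead of OpenCV, the region coordinates are the classic 64×32 skin layout as I understand it (treat them as assumptions to verify), and mirroring is just one plausible way to make up the back.

```python
# (x, y, width, height) in the classic 64x32 skin; assumed, please verify.
HEAD_FRONT = (8, 8, 8, 8)
HEAD_BACK = (24, 8, 8, 8)


def resample(pixels, out_w, out_h):
    """Nearest-neighbour resample of a 2D list of pixels to out_w x out_h."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [[pixels[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


def paste(skin, region, pixels):
    """Resample pixels to fit region and write them into the skin in place."""
    x0, y0, w, h = region
    tile = resample(pixels, w, h)
    for dy in range(h):
        skin[y0 + dy][x0:x0 + w] = tile[dy]


def mirror(pixels):
    """Flip a 2D list of pixels left to right."""
    return [row[::-1] for row in pixels]


def blank_skin(fill=0):
    """Create an empty 64x32 skin as a list of rows."""
    return [[fill] * 64 for _ in range(32)]
```

Usage would be along the lines of paste(skin, HEAD_FRONT, face) followed by paste(skin, HEAD_BACK, mirror(face)), where face is the segmented crop of the visitor's head.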

Minecraft Skin

Despite some crashiness in NITE and the extremely short timeframe we wrote the project in, it ended up working reasonably well.  Thanks to the low resolution style and implied insistence on imagination in Minecraft, the players avoid looking like the ghastly zombies in Kinect Me.  You can see examples of some of the generated skins on @MakerAntFarm.  I hate not releasing code, but I almost hate releasing this code more.  It is very likely to be the worst I have ever hacked together, and I can’t help but suspect it will be held against me at some point.  Nonetheless, for the greater good, it’s up on github.  There are vague instructions on how one might use it in the README.  Good luck, and I’m sorry.

Gestural Printing: Jumping the Shark on Kinect Hacks

We’ve seen a seemingly endless array of amazing Kinect hacks over the last few months, from superhero generators to obstacle avoiding quadcopters.  However, it was only a matter of time before someone came up with a hack so inane and irrelevant that it would bring shame to the entire hobby.  That time is now, and that someone is me.  I bring to you, gestural 3D printing!  Using the Kinect to track your hand, you can draw one layer at a time, with the printer following your every move.  Pushing forward extrudes plastic, while pulling your hand back will start a new layer.  Who needs difficult and confusing CAD software when you can just directly draw the object you want to print?

Really though, you can only get through 4 or 5 layers before your arm feels like it’s going to fall off, and the resulting object will look like a stringy blob of plastic vomit.  The source is in the FaceCube GitHub repository.  I don’t recommend actually using it, but if for some reason you want to, the dependencies are mindbogglingly complex.  You’ll need to install OpenNI and NITE to start with; this guide at Keyboardmods is helpful.  You’ll also need my branch of OSCeleton, which improves on hand tracking.  With the Kinect hooked up, you can run ./osceleton -n -f to start hand tracking in an Open Sound Control server.  You can then run the gestureprinter.py script, which requires pyOSC, pygame, and the RepRapArduinoSerialSender script from Skeinforge, which is also in the FaceCube repository.  Of course, you’ll also need both a Kinect and a 3D printer that is compatible with the Gcode that RepRap firmwares use.  The script is set up for my printer specifically, but it should be straightforward to tweak for others if you dare.
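The core idea fits in a small state machine. The sketch below is not gestureprinter.py; it is an illustration under made-up assumptions: hand samples arrive as (x, y, depth) in millimetres with depth measured from a neutral plane (negative means pushed forward), and the thresholds, layer height, extrusion rate, and feed rate are placeholder numbers. The only Gcode it emits is the G1 linear move that RepRap firmwares understand.

```python
import math


def gesture_to_gcode(samples, push=-100.0, pull=100.0, layer_height=0.4,
                     e_per_mm=0.05, feed=1200):
    """Turn hand samples into G1 moves.

    Pushing past `push` extrudes along the hand's path, pulling past
    `pull` raises the nozzle by one layer, anything else is a dry move.
    """
    z = layer_height
    e = 0.0
    last = None
    pulled = False
    lines = ["G1 Z%.2f F%d" % (z, feed)]
    for x, y, depth in samples:
        if depth >= pull:
            if not pulled:  # start only one new layer per pull gesture
                z += layer_height
                lines.append("G1 Z%.2f F%d" % (z, feed))
                pulled = True
            continue
        pulled = False
        if depth <= push and last is not None:
            # Extrude in proportion to the distance travelled.
            e += e_per_mm * math.hypot(x - last[0], y - last[1])
            lines.append("G1 X%.2f Y%.2f E%.3f F%d" % (x, y, e, feed))
        else:
            lines.append("G1 X%.2f Y%.2f F%d" % (x, y, feed))
        last = (x, y)
    return lines
```

Each returned line could then be handed to whatever serial sender you use. Holding the hand back only triggers one layer change, which saves you from accidentally printing in mid-air while your arm recovers.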

Gestural Print

Augmenting Reality with Reality

I combined voxel carving and augmented reality to insert 3d reconstructions of real life objects into real life scenes for a final project for 15-463, Computational Photography.  There is a more detailed writeup here.  It looks kind of bleh at the moment, and it involves a lot of hacked together libraries.  I really like the idea of it though, so this is something I’m planning on revisiting when I have more skill in the third dimension.

Face Morphing from Obama to McCain

This was for an assignment for Computational Photography, a course I’m taking this semester.  As such, I can’t in good faith upload the MATLAB source.  Some day, I might write a face morphing library in C or Python though.

There is no great social or political message here.  I just thought it would look cool.  Both of the images are from Wikipedia.

On the technical side of things, matching points are manually picked on both images. They are then formed into triangles using a Delaunay function. Each pair of matching triangles is then morphed with an affine transform, using the transformation matrix determined by the three pairs of matching vertices. A different amount of “morph” is applied to each of the 61 frames in this. It all comes together to look surprisingly smooth.
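Since the MATLAB source stays private, here is an independent Python sketch of the two pieces of math involved: solving for the affine transform that carries one triangle onto another, and blending triangle vertices for an in-between frame. The function names are invented for this example, and spacing the 61 frames evenly from 0 to 1 is an assumption.

```python
def affine_from_triangles(src, dst):
    """Return (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f.

    src and dst are lists of three (x, y) vertices in matching order.
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x0 - x2) * (y1 - y2) - (x1 - x2) * (y0 - y2)
    if det == 0:
        raise ValueError("source triangle is degenerate")

    def solve(u0, u1, u2):
        # Cramer's rule on the 2x2 system left after subtracting vertex 2.
        a = ((u0 - u2) * (y1 - y2) - (u1 - u2) * (y0 - y2)) / det
        b = ((x0 - x2) * (u1 - u2) - (x1 - x2) * (u0 - u2)) / det
        c = u2 - a * x2 - b * y2
        return a, b, c

    return solve(*[p[0] for p in dst]) + solve(*[p[1] for p in dst])


def blend_triangle(src, dst, t):
    """Vertices of the in-between triangle at morph fraction t in [0, 1]."""
    return [((1 - t) * sx + t * dx, (1 - t) * sy + t * dy)
            for (sx, sy), (dx, dy) in zip(src, dst)]


# Assumed: 61 evenly spaced morph fractions, 0.0 for the first frame
# through 1.0 for the last.
FRAME_FRACTIONS = [i / 60 for i in range(61)]
```

For each frame, every triangle in both source images gets warped onto the blended triangle with its own affine transform, and the two warped images are cross-dissolved by the same fraction.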