PhillyHistory Augmented Reality: Developer Journal 1

Erik and I are working on an exciting project this month. Our mission (having chosen to accept it) is to explore the current state of the art (and industry) of “augmented reality” in order to create prototype mobile phone applications that let you look through your cellphone’s camera and see historic photos blended into the landscape around you, in the place where the photographer first snapped the photo. We’re working with PhillyHistory’s historic archive of photos, which is incredibly rich: the Philadelphia Department of Records has made more than 90,000 photos available, and the collection is still growing. We’ll release a whitepaper describing our experiments and discoveries when we’re done, but we thought we’d share our thoughts and progress along the way in the form of a developer’s journal.


Erik is pretending to hold up a photo, floating in space. Inside, unfortunately, the photos are hard to keep still. You can see my beard in the lower left!

Augmented reality is still more of a dream than a technology. We’ve all seen science fiction movies where futuristic displays annotate what we see around us, or create the illusion of virtual 3D objects in the space around us. To date, though, most augmented reality applications are fun and interesting experiments that remain quite limited, with issues that keep them from being useful tools that we use day to day. But the promise is huge. Imagine walking down a city street, looking at a building near you, and then being able to see the same building as it looked 80 years ago by peering at it through your handheld device. Or beyond that, imagine a device that would let you look around in the forest and see virtual museum descriptions appear, identifying the genus and species of what you are looking at and telling you interesting facts. Or imagine theater performances in which virtual ghosts performed Macbeth in your living room, prancing around as if they were really there. Or imagine you’re in a country where you don’t speak the language and, as you gaze through your phone’s camera, subtitles appear in the air in front of the people you meet.

Artist's depiction (thanks Carissa Brittain) of PhillyHistory Augmented Reality

But the current reality is far more humble. Most of the applications I’ve used were primarily novelties: little floating icons that are occasionally sort of in the right direction of a nearby restaurant, or clever computer vision art, like replacing corporate logos with the faces of their CEOs. Despite that, we want to try to push past what we’ve seen by placing photos in 3D where they were originally taken. We began our research with a broad survey of the current tools (with a focus on open source tools that we could extend and tinker with) and the companies that currently provide augmented reality platforms that developers can build against.

As far as I can tell, there are two main categories of augmented reality applications and platforms at the moment: GPS based and computer vision based. (These are my own categories, so don’t try googling for them.)

GPS based

These applications use your phone’s GPS to determine where you are, and then use whatever other hardware your phone has (accelerometer, gyroscope, etc.) along with the GPS to guess at your current orientation: which way you’re pointed (your heading or “yaw”), how far up or down you are pointing your phone relative to the horizon (your “pitch”), and whether the phone is tilted sideways, like turning a steering wheel (your “roll”).
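To make that concrete, here is a minimal sketch of how a GPS-based application might decide where on the screen to draw a floating icon. It is only an illustration, not code from any of the platforms we looked at: the function names, the 60 degree field of view, and the 480 pixel screen width are all assumptions, and the screen mapping is a simple linear approximation.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north (0 <= bearing < 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def angle_diff_deg(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def screen_x(target_bearing, heading, fov_deg=60.0, screen_width=480):
    """Horizontal pixel position for a target, or None if it lies
    outside the camera's (assumed) horizontal field of view."""
    offset = angle_diff_deg(target_bearing, heading)
    if abs(offset) > fov_deg / 2:
        return None
    # Linear approximation: the screen centre is the phone's heading.
    return screen_width / 2 + offset / fov_deg * screen_width
```

An application would call `bearing_deg` with the phone's GPS fix and the photo's recorded location, then pass the result and the compass heading to `screen_x`. Every error in the GPS fix or the heading shows up directly as an icon drawn in the wrong place, which is why the limitations below matter so much.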

Usually these applications show little floating balls or symbols on the horizon in the direction you are looking. While that is pretty cool, there are some significant limitations:

  • The location data you get from a GPS is not very precise and is often very wrong. This gets worse in the city (the signal can bounce off of buildings) and GPS barely works at all if you are indoors.
  • Your phone has pretty bad information about which way you’re pointing. The GPS can only guess at your heading if you are moving (e.g. in a car). Newer phones have compasses and accelerometers, and some of the newest have gyroscopes, which can help. Still, the applications we’ve played with do a pretty poor job. We’ll see if we can do better.
  • The data is all very ‘jittery’ — like a compass that is jiggling roughly around the direction it should be pointing, the position data from your phone is constantly changing even when you’re standing still. This is no good for creating an illusion of a real, steady object.
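One common way to tame jittery readings is to smooth them over time. The sketch below is an assumption about how that could look, not something we have built or tested on a phone: `HeadingSmoother` and its `alpha` parameter are hypothetical names. It applies an exponential moving average to compass headings, averaging the sine and cosine of the angle rather than the angle itself so that readings on either side of north (say 350° and 10°) blend to 0° instead of 180°.

```python
import math

class HeadingSmoother:
    """Exponential moving average for compass headings in degrees."""

    def __init__(self, alpha=0.1):
        # alpha near 0: very steady but slow to follow real turns.
        # alpha near 1: responsive but barely smoothed.
        self.alpha = alpha
        self._x = None  # smoothed cosine component
        self._y = None  # smoothed sine component

    def update(self, heading_deg):
        """Feed one raw reading and return the smoothed heading (0-360)."""
        r = math.radians(heading_deg)
        cx, cy = math.cos(r), math.sin(r)
        if self._x is None:
            # The first reading initialises the filter.
            self._x, self._y = cx, cy
        else:
            self._x += self.alpha * (cx - self._x)
            self._y += self.alpha * (cy - self._y)
        return math.degrees(math.atan2(self._y, self._x)) % 360.0
```

The trade-off is lag: the steadier the output, the longer the overlay takes to catch up when you actually turn the phone.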

Computer vision based

These applications use powerful computer vision libraries to help the computer identify what it is seeing through a digital camera. Often they require some preparation. If you print up a symbol that is simple for the computer to identify (e.g. a piece of paper with a black square and a unique symbol), the computer can identify the symbol, work out how the camera must be oriented relative to the square from how the square appears in the image, and then add a 3D object to the image. Other examples of augmented reality through computer vision don’t require preparation. For example, certain applications can identify a surface (like a desk surface) and can place 3D objects on that surface. But there are limitations here, as well:

  • Just looking around in the world doesn’t tell you where you are in the world.
  • While very powerful from one perspective, it’s still very limited in what you can easily do — like many computer technologies, it’s amazing but still not very smart.
  • It requires a lot of processing power, often more than what your cellphone can offer.

The two most popular and influential libraries for open source AR development are OpenCV and ARToolkit (and its derivatives).

OpenCV

OpenCV is an extremely powerful computer vision library that is also open source and widely used. Many thanks to David Zwarg for first turning me on to OpenCV. It’s awesome. Intel originally built it as an initiative to push forward new applications that would need faster and faster computer processing, and my understanding is that much of the core work was done by Intel Russia. It can also use Intel’s proprietary IPP libraries. It now has a different corporate maintainer. It can do *all sorts* of magic. For example, I’ve been playing with OpenCV (via the open source ROS library for robotics) at home with my Kinect to process the Kinect’s RGB-D output (an image where the camera tells you the depth of each pixel) into a 3D model of the person it’s seeing. It’s also used in a wide range of applications, like Stanley, the Stanford autonomous car that won the DARPA Grand Challenge (with a $2 million prize), a race for cars that could drive themselves across the desert. In fact, the professor who ran the vision team for that project is the co-author of a great book about OpenCV from O’Reilly called Learning OpenCV. It’s appropriate for teaching an undergraduate class about computer vision, and I recommend it highly!

ARToolkit

ARToolkit is the other library that is very widely used and adapted. It’s more specific in scope: the library can look through a digital camera and identify those markers I wrote about earlier, printed pages with black squares and special icons. Once it finds a familiar marker, it can calculate where it thinks the marker is in the real world, and use that to show you what the camera is seeing with a 3D object added to the frame.

ARToolkit in action (photo from ARToolkit)

There are lots and lots of libraries porting the same basic concept to various languages, and you’ll see this type of application show up over and over again if you google for augmented reality applications. While this idea is very limited in some ways (you actually have to change the real world by putting up or creating these markers), what’s great about it is that the illusion is really good. The 3D objects are relatively steady and they appear in the right place. And the code is much simpler than OpenCV, which is vast and powerful but complicated.
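Part of why markers work so well is that a square of known size tells the camera a lot. As a rough intuition only (this is not ARToolkit’s actual algorithm, which recovers the marker’s full position and orientation from its four corners), a simple pinhole camera model already gives a distance estimate when the marker faces the camera squarely. The function names and the focal length in pixels are assumptions for illustration.

```python
def marker_distance_m(marker_side_m, side_px, focal_px):
    """Distance to a marker facing the camera, from similar triangles:
    side_px / focal_px == marker_side_m / distance."""
    if side_px <= 0:
        raise ValueError("marker must be visible (side_px > 0)")
    return focal_px * marker_side_m / side_px

def marker_offset_m(center_px, principal_px, distance_m, focal_px):
    """Sideways offset of the marker from the camera's optical axis,
    given the pixel position of its centre along one image axis."""
    return (center_px - principal_px) * distance_m / focal_px
```

For example, with an assumed focal length of 500 pixels, an 8 cm marker that appears 40 pixels wide is about 1 meter away. Because the marker is seen directly in every frame, the estimate is refreshed from the image itself rather than from noisy GPS and compass readings, which is a big part of why the 3D objects stay put.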

Next steps

Okay, we’re going to explore three major approaches.

  • What can we build using an existing proprietary framework from one of the leading companies in augmented reality? There are a number to choose from: Layar, Wikitude, Metaio. Which will support 3D objects? How well do they work? Can we use one of these platforms and still package our application as ‘PhillyHistory’ and make it easy and accessible for our users?
  • What can we build from custom code on the Android platform? What open source platforms exist, and what are their limitations? What are the strengths and weaknesses of the hardware on the Android phones?
  • The same questions as the previous item, but for the iPhone and iOS.

There are a number of interesting ideas that our initial research suggested that we’re going to have to put aside for now.

  • Using the ARToolkit model, it would be possible to build interesting indoor augmented reality applications if we placed markers inside a building and created a 3D model. How hard would it be to implement the ARToolkit functionality in OpenCV, use markers to orient our camera in space, and then use the phone’s other hardware to figure out where we are when we are no longer looking at the marker? But it’s not practical to put up posters throughout Philadelphia to anchor our historic photos, so we’ll have to box this up for now.
  • I’d love to figure out some way to justify experimenting with the Kinect (the new RGB-D camera for the Xbox 360). I want a futuristic Minority Report interface (gesturing in the air without any other controls) to organize and view historic photos, or even place them in 3D in a landscape. But I can’t really figure out how to justify this research, so I’ll have to just accept that I have a crush on the Kinect and put this aside.

So, next up we’ll try and see how far we can get with a commercial platform, like Layar. Check back for our results!
