
PhillyHistory Augmented Reality Journal 2: Building Data Services

As I talked about in a previous journal, we’re exploring two different approaches to putting together an augmented reality application: rolling our own client, and using an existing framework and client. But regardless of what kind of client you’re using, the data (and images) have to come from somewhere, and that’s where data services come into play. To support different client-side augmented reality viewers, we wanted to build an architecture that separated the data services from the client technology (the actual application that runs on a phone to provide the AR experience). This means that there can be a single source of digital asset information for any number of augmented reality clients.

Augmented Reality out my window (from our AR app!)

After reviewing the available technology and standards, we decided to build web services that conformed to the standards of Layar, a mobile augmented reality platform developed in the Netherlands. Launched in 2009, Layar has quickly become ubiquitous as a platform for augmented reality applications. To implement an augmented reality layer in Layar, one publishes the “augmentations,” the points of interest that are visible in the augmented reality application, by creating a web service that client applications can query for information about what’s around them. A web service is simply a term for a standard method of allowing two computer programs to request and communicate data in a structured way. For example, a request to this “augmentation” database might ask for all of the points of interest within 200 meters of a given latitude and longitude. While the Layar web service format has some limitations, it is relatively simple to implement both server- and client-side support for it. While it is not strictly what is often called a “RESTful” web service (a lightweight style of web service that in many ways is similar to loading a web page), the service can be implemented with a simple web application that can read in POST variables. As there is no independent standard for requesting and publishing augmented reality points of interest, the Layar service is as close as we could find. It had the additional advantage that we could directly test the result in the Layar clients available for both the Android and iPhone platforms.
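To make the “everything within 200 meters” idea concrete, here is a minimal sketch of the brute-force version of that query in plain Python. It is not our production code and it is not Layar’s request format; the dictionary keys (id, lat, lon) and the function names are made up for illustration. It uses the haversine formula, which treats the Earth as a sphere and is plenty accurate at these distances.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def points_within(points, lat, lon, radius_m):
    """Return the points inside radius_m of (lat, lon), nearest first.

    `points` is an iterable of dicts with hypothetical keys
    "id", "lat" and "lon".
    """
    hits = []
    for p in points:
        d = haversine_m(lat, lon, p["lat"], p["lon"])
        if d <= radius_m:
            hits.append((d, p))
    hits.sort(key=lambda pair: pair[0])  # sort on distance only
    return [p for _, p in hits]
```

Scanning every point like this is fine for a few hundred assets, but it touches the whole collection on every request, which is why an indexed database query becomes attractive as the collection grows.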

Layar API Architecture (courtesy of Layar)

 

The API documentation for Layar’s REST services is on their documentation wiki. Overall they have done a great job, but there are two big gotchas here. First of all, they are in heavy development and their platform doesn’t yet seem fully stable and mature. We often felt like we were developing against a moving target, as what was supported overall and from device to device kept changing. Secondly, their documentation wiki is somewhere between a really fantastic wiki and relatively chaotic and poorly organized documentation. The details are spread out across various pages with various comments (some of which are totally critical) thrown in the mix. We would have loved stable, versioned API documentation that was separate from the wiki and had all of the specifications in one place. But my overall feeling is a bit like the Churchill quote: “Democracy is the worst form of Government except for all those other forms that have been tried from time to time.” While there are problems, it’s a relatively complete API and it’s a de facto standard for augmented reality data services. That said, I’d love to see an open source standard for augmented reality data services, and some open source clients that support it!

Winston Churchill at age 7 (thanks wikipedia.org)

While there are many advantages to this architecture, it is important to keep in mind that it means that all imagery is being transmitted to the mobile device while the user is using it. This creates a number of issues. If the user has no or poor connectivity, the application will not be able to load photos. Even under good circumstances there will be a noticeable delay, and there are restrictions on how many photos can be sent in a short amount of time over a network. An alternate approach would be to package the asset images with the application, and install those images along with the application. But while this approach might work well for a custom built application with 100 photos, the storage requirements would make this impractical for a large collection of 100k photos.

So … data munging (transforming from one format to another) projects are almost always kind of sticky, but we hit some particular challenges in working with data for augmented reality. As discussed earlier, we used Google Street View as a tool to identify and select the desired angle in 3D space at which we wished to place each photograph. However, Google Street View and Layar specify this angle differently. Both Layar and Google Street View represent a viewer facing north with a value of 0 degrees (in Layar this is the “angle” parameter and in Google Street View it is called “yaw”). But in Google Street View rotations go clockwise (so 90 degrees is East and -90 degrees is West) whereas in Layar rotations go counter-clockwise (so 90 degrees is West and -90 degrees is East). Still, the effort required to make these transformations was worthwhile. By January 2011, the PhillyHistory.org database management team had “pinned” more than 10,850 images to their Google Street View coordinates, providing a large subset of materials with which to test the 3D space options.
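Since both systems put north at 0 and differ only in the direction of rotation, the conversion boils down to flipping the sign and normalizing the result. Here is a small sketch of that transformation; the function name and the `signed` switch are ours for illustration, and whether a given client prefers a signed range or a 0 to 360 range is an assumption you would need to check against its documentation.

```python
def street_view_yaw_to_layar_angle(yaw_degrees, signed=True):
    """Convert a clockwise-from-north yaw to a counter-clockwise angle.

    With signed=True the result is in (-180, 180]; otherwise in [0, 360).
    The choice of output range is an assumption, not something taken
    from either API's documentation.
    """
    angle = (-yaw_degrees) % 360.0  # flip direction, wrap into [0, 360)
    if signed and angle > 180.0:
        angle -= 360.0              # shift into (-180, 180]
    return angle
```

For example, a Street View yaw of 90 (facing East) comes out as -90 in the signed range, which matches the convention described above where -90 degrees is East.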

We chose to use a spatial database (a PostgreSQL database with the PostGIS spatial extensions) to store our assets. A spatial database is designed to store and reason about objects in space. For example, it is possible to ask a spatial database to find the assets that are within a specific distance from the viewer. Check out this newsletter article from Robert about PostGIS to learn more. It is possible to add a stored procedure to a non-spatial database to make the same query (for example, the Layar wiki documentation provides such a function) but we found that with large numbers of points, the optimizations found in a spatial database were necessary for reasonable performance. The creation of a “spatial index” allows the database to limit its searches very quickly to likely candidates found within the database, instead of needing to search through all of the assets in the database. That said, the overall performance of an AR application is limited by the network transmission time far more significantly than by the backend server performance. But with 90k points it sure doesn’t hurt.
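As a rough illustration of what such a query can look like, here is a sketch that builds a PostGIS radius search from Python. The table name (assets), its columns (id, title, and a geography column called geog with a spatial index on it) and the function name are all hypothetical and not our actual schema. ST_DWithin and ST_Distance are real PostGIS functions that work in meters on geography values, and the %(name)s placeholders follow the “pyformat” style accepted by drivers such as psycopg2.

```python
def build_nearby_query(lat, lon, radius_m, limit=50):
    """Return (sql, params) for a radius search against a hypothetical
    `assets` table with a geography column named `geog`."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range")
    if radius_m <= 0:
        raise ValueError("radius must be positive")

    # ST_MakePoint takes (x, y), which means (longitude, latitude).
    sql = """
        SELECT id, title,
               ST_Distance(
                   geog,
                   ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography
               ) AS meters
        FROM assets
        WHERE ST_DWithin(
                  geog,
                  ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
                  %(radius_m)s
              )
        ORDER BY meters
        LIMIT %(limit)s
    """
    params = {"lat": lat, "lon": lon, "radius_m": radius_m, "limit": limit}
    return sql, params
```

Passing the values as parameters rather than formatting them into the string keeps the query safe from injection, and filtering with ST_DWithin (rather than computing ST_Distance for every row and comparing) is what lets the database use the spatial index.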

The ImageMagick wizard

Some image processing also needs to occur before images can be displayed on the small screen of a mobile device. This is extremely important because we found that clients (such as Layar’s client) will silently drop images that do not fit their specification. Because a client can leave out a photo for any of a number of distinct reasons, and gives the developer no feedback explaining why, diagnosing missing photos can be tricky. For Layar, the file size of all images must be smaller than 75 kB, and there are specific resolution limitations (e.g. full images in Layar must be less than 640×480). Given that mobile device screens have significantly smaller resolution than 640×480, that resolution is probably much higher than necessary. Additionally, some clients (like Layar) do not support making images transparent. It is therefore necessary to set the alpha channel of the photos in a pre-processing step. For example, using the open source ImageMagick package, the following command line invocation could perform the necessary scaling and transparency conversion on an incoming image stream (on a Linux box): “cat input_image.jpg | convert - PNG32:- | convert - -scale 240x180 -channel Alpha -evaluate Multiply 0.9 output.png”.
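Because dropped images fail silently, it can help to check each processed file against the limits before publishing it. The sketch below does that for PNG output using only the Python standard library. The function names are ours, the limits are the ones quoted above, and two details are assumptions: that “75 kB” means 75,000 bytes, and that the limits are strict (“less than”), so adjust the defaults if a client turns out to be more lenient.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG stream")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def image_problems(data, max_bytes=75000, max_width=640, max_height=480):
    """Return a list of human-readable reasons the image might be dropped.

    The default limits are assumptions based on the figures in the text.
    """
    problems = []
    if len(data) >= max_bytes:
        problems.append("file is %d bytes, limit is %d" % (len(data), max_bytes))
    width, height = png_dimensions(data)
    if width >= max_width or height >= max_height:
        problems.append("image is %dx%d, limit is %dx%d"
                        % (width, height, max_width, max_height))
    return problems
```

Running something like image_problems(open("output.png", "rb").read()) over a batch and logging any non-empty result gives you the feedback that the client itself does not.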

 

There are already a number of open source platforms for publishing data services compatible with the Layar API, most notably PorPOISe (PHP), django-layar (Python) from our open data crush Sunlight Labs, and LayarDotNet (C#), with PorPOISe being the most fully featured of the platforms we reviewed. However, PorPOISe lacked some crucial 3D features at the time of our review. The beta release of an online service called Hoppala Augmentation does support 3D layers, but we were unable to get the 3D service to work and found the documentation and usability to be underdeveloped. It is certainly necessary to have a full understanding of the Layar protocol to use the Hoppala service (at this point), as the API allows developers to set a range of settings without explanation or checks on invalid or conflicting settings. Given these limitations and our desire to implement our own interactive capabilities and user settings, we decided to develop our own data web services in Python, which turned out to be a great choice for us because it let us prioritize, shape, and alter the results in a variety of ways. Our next journal (from Erik) will feature some of the ways that we needed to change the 3D placement of photos and how we went about it.

PhillyHistory Augmented Reality: Developer Journal 1

Erik (staff profile) and I (staff profile) are working on an exciting project this month: our mission (having chosen to accept it) is to explore the current state of the art (and industry) of “augmented reality” in order to create prototype mobile phone applications that let you look through your cellphone’s camera and see historic photos blended into the landscape around you, in the place where the photographer first snapped the photo. We’re working with PhillyHistory’s historic archive of photos, which is incredibly rich — the Philadelphia Department of Records has made more than 90,000 photos available (and still growing). We’ll release a whitepaper describing our experiments and discoveries when we’re done, but we thought we’d share our thoughts and progress along the way in the form of a developer’s journal — a diary of our ongoing thoughts and progress.

Erik pretending to hold up a photo

Erik is pretending to hold up a photo, floating in space. Inside, unfortunately, the photos are hard to keep still. You can see my beard in the lower left!

Augmented reality is still more of a dream than a technology. We’ve all seen science fiction movies where futuristic displays can annotate what we see around us, or can create the illusion of virtual 3D objects in the space around us. But to date most augmented reality applications are fun and interesting experiments that are still quite limited and have issues that keep them from being useful tools that we use day to day. The promise is huge, though: imagine walking down a city street, looking at a building near you, and then being able to see the same building as it looked 80 years ago by peering at it through your handheld device. Or beyond that, imagine a device that would allow you to look around in the forest and see virtual museum descriptions appear that identify the genus and species and tell you interesting facts. Or imagine theater performances in which virtual ghosts performed Macbeth in your living room, prancing around as if they were really there. Or imagine you’re in a country where you don’t speak the language, and, as you gaze through your phone’s camera, subtitles appear in the air in front of the people you meet.

Artist's depiction (thanks Carissa Brittain) of PhillyHistory Augmented Reality

But the current reality is far more humble. Most of the applications I’ve used were primarily novelties: little floating icons that are occasionally sort of in the right direction of a nearby restaurant, or clever computer vision art, like replacing corporate logos with the faces of their CEOs. But despite that, we want to try and push beyond what we’ve seen by placing photos in 3D where they were originally taken. We began our research with a broad survey of the current tools (with a focus on open source tools that we could extend and tinker with) and the companies that currently provide augmented reality platforms that developers can build against.

As far as I can tell, there are two main categories of augmented reality applications and platforms at the moment: GPS based and Computer vision based. (These are my own categories, don’t try googling for them.)

GPS based

These applications use your phone’s GPS to determine where you are, and then use whatever other hardware your phone has (accelerometer, gyroscope, etc.) along with the GPS to guess at your current orientation: which way you’re pointed (your heading or “yaw”), how far up or down you are pointing your phone relative to the horizon (your “pitch”), and whether the phone is twisted vertically (your “roll”).
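To give a feel for the arithmetic involved, here is a sketch of how an application of this kind might decide whether a point of interest should be drawn on screen: work out the compass bearing from the viewer to the point, then compare it with the phone’s heading. The function names and the 60 degree field of view are assumptions for illustration and do not come from any particular platform.

```python
import math


def bearing_degrees(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_angle(bearing, heading):
    """Signed angle from the heading to the bearing, in [-180, 180).

    Positive means the target is to the right of where the phone points.
    """
    return (bearing - heading + 180.0) % 360.0 - 180.0


def in_view(bearing, heading, fov_degrees=60.0):
    """True if the target falls inside an assumed horizontal field of view."""
    return abs(relative_angle(bearing, heading)) <= fov_degrees / 2.0
```

Every input here comes from a sensor, so errors in the GPS position or the compass heading translate directly into icons drawn in the wrong place.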

Usually these applications take the form of little floating balls or symbols on the horizon in the direction you are looking at. While pretty cool, there are some significant limitations:

  • The location data you get from a GPS is not very precise and is often very wrong. This gets worse in the city (the signal can bounce off of buildings) and GPS barely works at all if you are indoors.
  • Your phone has pretty bad information about which way you’re pointing. The GPS can only guess at your heading if you are moving (e.g. in a car). Newer phones have compasses and accelerometers and some newer ones have gyroscopes, which can help. But the applications we’ve played with still do a pretty poor job. But we’ll see if we can do better.
  • The data is all very ‘jittery’ — like a compass that is jiggling roughly around the direction it should be pointing, the position data from your phone is constantly changing even when you’re standing still. This is no good for creating an illusion of a real, steady object.
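One common way to take the edge off the jitter is to low-pass filter the sensor readings. The sketch below applies an exponential moving average to compass headings. It is a generic technique rather than something any of the applications we tried is known to use, and the class name and the smoothing factor of 0.2 are our own choices. Headings are averaged as unit vectors so that readings on either side of north (say 358 and 2 degrees) average to roughly 0 instead of 180.

```python
import math


class HeadingSmoother:
    """Exponential moving average for compass headings in degrees."""

    def __init__(self, alpha=0.2):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # higher alpha follows new readings faster
        self._x = None
        self._y = None

    def update(self, heading_degrees):
        """Feed one raw reading and return the smoothed heading in [0, 360)."""
        rad = math.radians(heading_degrees)
        x, y = math.cos(rad), math.sin(rad)
        if self._x is None:
            self._x, self._y = x, y  # first reading seeds the filter
        else:
            self._x += self.alpha * (x - self._x)
            self._y += self.alpha * (y - self._y)
        return math.degrees(math.atan2(self._y, self._x)) % 360.0
```

The trade-off is lag: a small alpha gives a steadier image but makes the overlay trail behind when the user actually turns, so the value needs tuning on real devices.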

Computer vision based

These applications use powerful computer vision libraries to help the computer identify what it is seeing through a digital camera. Often they require some preparation. If you print up a symbol that is simple for the computer to identify — e.g. a piece of paper with a black square and a unique symbol — the computer can identify the symbol, figure out how the camera must be oriented to the square by seeing how it looks, and then add a 3D object to the image. Other examples of augmented reality through computer vision don’t require preparation — for example, certain applications can identify a surface (like a desk surface) and can place 3D objects on that surface. But there are limitations here, as well:

  • Just looking around in the world doesn’t tell you where you are in the world.
  • While very powerful from one perspective, it’s still very limited in what you can easily do — like many computer technologies, it’s amazing but still not very smart.
  • It requires a lot of processing power, often more than what your cellphone can offer.

The two most popular and influential libraries for open source AR development are OpenCV and ARToolkit (and its derivatives).

OpenCV

OpenCV is an extremely powerful computer vision library that is also open source and widely used. Many thanks to David Zwarg (staff profile) for first turning me on to OpenCV. It’s awesome. Intel originally built it as an initiative to push forward new applications that would need faster and faster computer processing, and my understanding is that much of the core work was done by Intel Russia. It also can use Intel’s proprietary IPP libraries. Now it has another corporate maintainer, but it’s still widely used. It can do *all sorts* of magic. For example, I’ve been playing with OpenCV (via the open source ROS library for robotics) at home with my Kinect to process the Kinect’s RGB-D output (an image where the camera tells you the depth of each pixel) into a 3D model of the person it’s seeing. It’s also used in a wide range of applications, like Stanley, the Stanford autonomous car that won the DARPA Grand Challenge race (with a $2 million prize) for cars that could drive themselves across the desert. In fact, the professor who ran the vision team for that project is the co-author of a great book about OpenCV from O’Reilly called Learning OpenCV. It’s a great book, appropriate for teaching an undergraduate class about computer vision, and I recommend it highly!

ARToolkit

ARToolkit is the other library that is very widely used and adapted. It’s more specific in scope: the library can look through a digital camera and identify those markers I wrote about earlier — printed pages w/ black squares and with special icons.    Once it finds the familiar marker, it can calculate where it thinks the marker is in the real world — and use it to show you what the camera is seeing, but add in a 3D object to the visual frame.

ARToolkit in action (photo from ARToolkit)

There are lots and lots of libraries porting the same basic concept to various languages, and you’ll see this type of application show up over and over again if you google for augmented reality applications. While this idea is very limited in some ways (you actually have to change the real world by posting up or creating these markers), what’s great about it is that the illusion is really good. The 3D objects are relatively steady and they appear in the right place. And the code is much simpler than OpenCV, which is vast and powerful but complicated.

Next steps

Okay, we’re going to explore three major approaches.

  • What can we build using an existing proprietary framework from one of the leading companies in augmented reality? There are a number to choose from: Layar, Wikitude, Metaio. Which will support 3D objects? How well do they work? Can we use one of these platforms and still package our application as ‘PhillyHistory’ and make it easy and accessible for our users?
  • What can we build from custom code on the Android platform? What open source platforms exist, and what are their limitations? What are the strengths and weaknesses of the hardware on the Android phones?
  • The same questions as for Android, but with the iPhone and iOS.

There are a number of interesting ideas that our initial research suggested that we’re going to have to put aside for now.

  • Using the ARToolkit model, it would be possible to build interesting indoor augmented reality applications if we placed markers inside a building and created a 3D model. How hard would it be to implement the ARToolkit functionality in OpenCV, use markers to orient our camera in space, and then use the phone’s other hardware to figure out where we are when we are no longer looking at the marker? But it’s not practical to put up posters throughout Philadelphia to place our historic photos, so we’ll have to box this up for now.
  • I’d love to figure out some way to justify experimenting with the Kinect (the new RGB-D camera for the Xbox 360). I want a futuristic Minority Report interface (gesturing in the air without any other controls) to organize and view historic photos, or even place them in 3D in a landscape. But I can’t really figure out how to justify this research, so I’ll have to just accept that I have a crush on the Kinect and put this aside.

So, next up we’ll try and see how far we can get with a commercial platform, like Layar. Check back for our results!