
PhillyHistory Augmented Reality Journal 3: 2D billboards in Layar

In the previous article Josh described how we built data services to support augmented reality applications. This article will be a more detailed discussion of our experiences placing 2D photos in 3D space using position and angle information with Layar. We do this in the PhillyHistory Augmented Reality application to let users browse photographs of historic Philadelphia. I will also provide some of the Python code we ended up needing to get things working.

When dealing with 3D objects, Layar supports two kinds of rotation: relative and absolute. Relative rotation is pretty straightforward: the image will always be rotated relative to the viewer’s current viewing angle. You will always see the photo or model from the same angle no matter where you stand. This works pretty reliably but obviously breaks the illusion that the billboard is positioned in the world (unless you imagine a billboard rotating to face you no matter where you stand).

Absolute rotation is more interesting (and trickier to get right). Basically, the object or photo will face a certain direction in real space: you might need to move where you’re standing in order to see it clearly. This means you need to figure out what direction (e.g. North) an object is facing in addition to the latitude and longitude of its coordinates (you might also worry about altitude although we ended up not doing this).

We used absolute rotation when displaying 500 select photographs which had data for both the position (stored as latitude/longitude) and angle (stored as a Google Street View angle) and we used relative rotation when dealing with PhillyHistory’s full archive of 87,000 or so photographs (most of which obviously had no angle data).

Initial problems

We knew we wanted to use absolute rotation, but it took us a while to get it working well enough to include in our layer. Some of the obstacles we encountered: having to remember high school trigonometry, a lack of high-level documentation on how Layar does 2D/3D rendering, and somewhat flaky or confusing error behavior.

One problem is that there doesn’t seem to be a standard way to represent angles out there. Your high school math class has 0 pointing East, with values proceeding counter-clockwise around the unit circle (one rotation being 2π). Layar uses a similar counterclockwise scheme but starts with 0 at North instead of East, and uses degrees (one rotation being 360 degrees). Google Street View also starts with 0 at North and uses degrees, but goes clockwise instead of counterclockwise. Confusion!
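As a rough sketch of how these three conventions relate, the helpers below convert the unit-circle form and the Street View form into the Layar form. The function names are hypothetical (they are not part of any Layar or Google API), and results are wrapped into the range [-180, 180).

```python
import math

def normalize(degrees):
    # wrap any angle in degrees into the range [-180, 180).
    return (degrees + 180.0) % 360.0 - 180.0

def math_to_layar(theta):
    # theta: radians counterclockwise from East (the unit-circle form).
    # result: degrees counterclockwise from North (the Layar form).
    return normalize(math.degrees(theta) - 90.0)

def streetview_to_layar(yaw):
    # yaw: degrees clockwise from North (the Street View form).
    # both start at North, so only the direction of rotation flips.
    return normalize(-yaw)
```

For example, a Street View yaw of 90 (East) becomes a Layar angle of -90, and a unit-circle angle of π/2 (North) becomes a Layar angle of 0.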

Another confusing detail is what the rotation angle means. I had assumed that if a photograph’s angle is “South” then that means that the billboard faces South, and the viewer should look North to see it (and be on the South side of the photograph). In fact, Layar takes the opposite view. If a photograph’s angle is “South” it means that the viewer should look South to see the photograph (and be on the North side). We didn’t find any documentation about this so we had to learn it through trial and error.

Fig. 2: viewing at 60° angle

Once we got that worked out, we still noticed that a lot of our points weren’t rendering in the Camera view. They were showing up in the map view and list view but we weren’t seeing images, or icons, or anything. After a bunch of research we figured out that this had to do with their orientation: we were facing the wrong side of the image. If you imagine a billboard, we were seeing the scaffolding and back of it, not the advertisement. It took us a while to realize that these 2D billboards were “invisible” when seen from behind.

Fig. 5: viewing from behind and to the side

This behavior didn’t work very well for us. We want users to know when they are near a location with historic photographs available, even if the user is on the “wrong side” of the location. And we found it frustrating to have photos that failed to show up, or photos whose viewing angle was so sharp as to completely obscure the image. On the other hand, we liked the 3D effect of seeing photos angled when appropriate, so we didn’t want to just give up on absolute rotations.

Ultimately, we decided to “cheat” and transform the photographs when necessary. Given the general lack of GPS accuracy and the fact that our highest priority is making the photographs available it seemed like a good compromise.

Transformation in Python

What follows is a relatively in-depth description of the kind of processing we ended up needing in order to ensure images were visible when viewed with absolute rotation. Code very similar to this is included in the Layar API endpoint we built (using Python and PostgreSQL).

The first thing we have to do is compute the viewing angle to a point of interest. We can accomplish this by dusting off a little trigonometry. Imagine that the viewer (vx, vy) and the point of interest (px, py) form a right triangle. In this case we want to compute the angle at the viewer point, which we can do with atan2 given the lengths of the opposite and adjacent sides (which end up being py – vy and px – vx, respectively). The resulting angle will be 0 when facing East (when py and vy are the same and px is greater than vx), and proceed counter-clockwise (so π/2 is North, π/-π is West and -π/2 is South). We convert this into the form that Layar uses (where 0 is North, -90 is East, 180/-180 is South, and 90 is West).

Fig. 6: calculating the viewing angle

Here’s some Python code that accomplishes this. It’s worth noting that it transforms the angle from the trigonometric form (pi radians counterclockwise from East) to the Layar form (degrees counterclockwise from North).

import math
 
def get_angle(vx, vy, px, py):
    # find the angle in pi radians (-pi to pi)
    theta = math.atan2(py - vy, px - vx)
 
    # convert from pi radians to degrees (-180 to 180).
    degrees = (theta * 180.0) / math.pi
 
    # return the angle relative to the positive-Y axis
    return degrees - 90

We also use the standard Euclidean distance function to calculate how close points are to each other.

def get_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx ** 2 + dy ** 2)

When a photograph is visible at a particular angle, we need to determine how much difference there is between the user’s viewing angle and the ideal angle at which the photograph should be viewed. In this case, a difference of 0 would mean the user is viewing the photograph at the exact angle at which the photograph was taken, 180 would mean that the user is behind the photograph, and 90 would mean that the user is at a right angle to the photograph.

Since there is a point (180/-180) where angles wrap around, it’s important to make sure to handle this correctly. For instance, -160 and 179 are only 21 degrees apart. We can use modular arithmetic to normalize angles to 0-360. Here is an implementation:

def angle_diff(angle1, angle2):
    # calculate the difference between angle1 and angle2.
    # this value will range from 0-360.
    diff = abs(angle1 % 360 - angle2 % 360)
 
    # if the difference between angle1 and angle2 is more
    # than 180 degrees return the difference between angle2
    # and angle1 (which will be less than 180 degrees).
    if diff > 180:
        return 360 - diff
    else:
        return diff
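An alternative worth noting, as a sketch rather than what our layer uses: a signed difference folds the wraparound and the direction of rotation into one expression. `signed_angle_diff` is a hypothetical helper; under the counterclockwise convention, a positive result means angle2 lies counterclockwise from angle1.

```python
def signed_angle_diff(angle1, angle2):
    # shortest rotation in degrees that takes angle1 to angle2,
    # in the range [-180, 180).
    return (angle2 - angle1 + 180) % 360 - 180

# the example from the text: -160 and 179 are 21 degrees apart.
print(abs(signed_angle_diff(-160, 179)))  # prints 21
```

The absolute value of this result matches the unsigned difference, and its sign says which way to rotate, which is the same decision a nudging function has to make.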

When the difference is too close to 90 degrees the image is seen nearly edge-on and won’t be visible; in these cases we can soften the angle so the user can see the image a bit better. This function will nudge the start angle closer to the goal angle by a given number of degrees (amount):

def nudge_angle(start, goal, amount):
    # calculate the difference between start and goal.
    # this value will range from 0-360.
    diff = abs(start % 360 - goal % 360)

    # figure out whether we need to subtract or add amount.
    # if start is greater than goal then we subtract amount,
    # and otherwise we will be adding amount.
    subtract = start % 360 > goal % 360

    # if diff is greater than 180 we need to flip our
    # decision (going the other direction means the
    # difference will be less than 180).
    if diff > 180:
        subtract = not subtract
        diff = 360 - diff

    # don't nudge further than we need to reach the goal.
    if diff < amount:
        amount = diff

    # add or subtract the amount and return the new angle.
    if subtract:
        return start - amount
    else:
        return start + amount

We can put this all together to implement our strategy for dealing with oblique angles and image flipping. The code could be made more terse but it’s easy to get the math wrong so we try to do things in a well-commented procedural way.

# points are represented as (x, y) tuples in web mercator.
# angles are given in degrees counterclockwise from North.
def calc_angle(self, viewer_pt, point_pt, img_angle):
    vx, vy = viewer_pt
    px, py = point_pt
 
    # get the angle the viewer faces when seeing the point.
    angle = get_angle(vx, vy, px, py)
 
    # get the difference between the previous viewing angle
    # and the direction the photograph should be seen from.
    diff = angle_diff(angle, img_angle)
 
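    # if the view is behind the photo, flip it.
    # (the body of this branch is a reconstruction from the
    # comments: turn the photo 180 degrees and re-measure.)
    flipped = abs(diff) > 90
    if flipped:
        img_angle = (img_angle + 180) % 360
        diff = angle_diff(angle, img_angle)

    # if the view is too oblique, soften the angle. wiggle is
    # the nudge margin in degrees and is defined elsewhere.
    if abs(diff) >= 90 - wiggle:
        angle = nudge_angle(angle, img_angle, wiggle)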
 
    # return two things: the viewing angle,
    # and if the photo was flipped or not.
    return angle, flipped
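The comments on calc_angle say points are (x, y) tuples in web mercator, while the photographs are stored as latitude/longitude. For anyone who needs that conversion, here is a minimal sketch using the standard spherical Web Mercator formulas. The name `to_web_mercator` is hypothetical and this is not necessarily how our endpoint does the projection (a spatial database can do it for you).

```python
import math

EARTH_RADIUS = 6378137.0  # WGS84 equatorial radius in meters

def to_web_mercator(lat, lon):
    # convert latitude/longitude in degrees to (x, y) in meters.
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Mercator preserves angles, so bearings computed with atan2 on these coordinates are reasonable, but distances are stretched by roughly 1/cos(latitude), which is about 1.3 at Philadelphia’s latitude.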

Hope this helps! It’s the sort of thing that would have saved us a lot of time if we’d had it!

Layar documentation can be found here.

PhillyHistory Augmented Reality Journal 2: Building Data Services

As I talked about in a previous journal, we’re exploring two different approaches to putting together an augmented reality application: rolling our own client and using an existing framework and client. But regardless of what kind of client you’re using, the data (and images) have to come from somewhere, and that’s where data services come into play. To support different client-side augmented reality viewers, we wanted to build an architecture that separated the data services from the client technology (the actual application that runs on a phone to provide the AR experience). This means that there could be a single source of digital asset information for any number of augmented reality clients.

Augmented Reality out my window (from our AR app!)

After reviewing the available technology and standards, we decided to build web services that conformed to the standards of Layar, a mobile augmented reality platform developed in the Netherlands. Launched in 2009, Layar has quickly become ubiquitous as a platform for augmented reality applications. To implement an augmented reality layer in Layar, one publishes the “augmentations,” the points of interest that are visible in the augmented reality application, by creating a web service that client applications can query for information about what’s around them. A web service is simply a standard method of allowing two computer programs to request and communicate data in a structured way. For example, a request to this “augmentation” database might ask for all of the points of interest within 200 meters of a given latitude and longitude. While the Layar web service format has some limitations, it is relatively simple to implement both server- and client-side support for it. It is not strictly what is often called a “RESTful” web service (a lightweight style of web service that in many ways is similar to loading a web page), but the service can be implemented with a simple web application that can read in POST variables. As there is no independent standard for requesting and publishing augmented reality points of interest, the Layar service is as close as we could find. It had the additional advantage that we could directly test the result in the Layar clients available for both the Android and iPhone platforms.

Layar API Architecture (courtesy of Layar)

 

The API documentation for Layar’s REST services is on their documentation wiki. Overall they have done a great job, but there are two big gotchas here. First of all, they are in heavy development and their platform doesn’t yet seem fully stable and mature; we often felt like we were developing against a moving target, as what was supported overall and from device to device kept changing. Secondly, their documentation wiki is somewhere between a really fantastic wiki and relatively chaotic and poorly organized documentation. The details are spread out across various pages with various comments (some of which are totally critical) thrown in the mix. We would have loved stable, versioned API documentation that was separate from the wiki and had all of the specifications in one place. But my overall feeling is a bit like the Churchill quote: “Democracy is the worst form of Government except for all those other forms that have been tried from time to time.” While there are problems, it’s a relatively complete API and it’s a de facto standard for augmented reality data services. That said, I’d love to see an open source standard for augmented reality data services, and some open source clients that support it!

Winston Churchill at age 7 (thanks wikipedia.org)

While there are many advantages to this architecture, it is important to keep in mind that it means that all imagery is being transmitted to the mobile device while the user is using it. This creates a number of issues. If the user has no or poor connectivity, the application will not be able to load photos. Even under good circumstances there will be a noticeable delay, and there are restrictions on how many photos can be sent in a short amount of time over a network. An alternate approach would be to package the asset images with the application, and install those images along with the application. But while this approach might work well for a custom built application with 100 photos, the storage requirements would make this impractical for a large collection of 100k photos.

So … data munging (transforming from one format to another) projects are almost always kind of sticky, but we hit some particular challenges in working with data for augmented reality. As discussed earlier, we used Google Street View as a tool to identify and select the angle in 3D space at which we wished to place each photograph. However, Google Street View and Layar specify this angle differently. Both represent a viewer facing north with a value of 0 degrees (in Layar this is the “angle” parameter and in Google Street View it is called “yaw”). In Google Street View rotations go clockwise (so 90 degrees is East and -90 degrees is West) whereas in Layar rotations go counter-clockwise (so 90 degrees is West and -90 degrees is East). The effort required to make these transformations was worthwhile. By January 2011, the PhillyHistory.org database management team had “pinned” more than 10,850 images to their Google Street View coordinates, providing a large subset of materials with which to test the 3D space options.

We chose to use a spatial database (PostgreSQL with the PostGIS spatial extensions) to store our assets. A spatial database is designed to store and reason about objects in space; for example, it is possible to ask a spatial database to find the assets that are within a specific distance from the viewer. Check out this newsletter article from Robert about PostGIS to learn more. It is possible to add a stored procedure to a non-spatial database to make the same query (for example, the Layar wiki documentation provides such a function) but we found that with large numbers of points, the optimizations found in a spatial database were necessary for reasonable performance. The creation of a “spatial index” allows the database to limit its searches very quickly to likely candidates, instead of needing to search through all of the assets in the database. That said, the overall performance of an AR application is limited far more by network transmission time than by backend server performance, but with 90k points it sure doesn’t hurt.
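To make the “assets within a specific distance of the viewer” query concrete, here is a sketch of it done the slow way: plain Python with a haversine distance, checking every asset. This is the brute-force scan that a spatial index avoids. The function names and the (asset_id, lat, lon) tuple layout are hypothetical, not our actual schema, and the earth is assumed to be a sphere.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # great-circle distance in meters between two lat/lon points,
    # assuming a spherical earth with a mean radius of 6371 km.
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def assets_within(assets, lat, lon, radius_m):
    # assets: iterable of (asset_id, lat, lon) tuples.
    # checks every asset, so the cost grows linearly with the archive.
    return [a for a in assets
            if haversine_m(lat, lon, a[1], a[2]) <= radius_m]
```

With tens of thousands of points this loop runs on every request, which is the cost the spatial index is there to remove.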

The ImageMagick wizard

Some image processing also needs to occur before images can be displayed on the small screen of a mobile device. This is extremely important because we found that clients (such as Layar’s client) will silently drop images that do not fit their specification. Because the client can leave out a photo for a number of distinct reasons and gives the developer no feedback explaining why, diagnosing missing photos can be tricky. For Layar, the file size of all images must be smaller than 75 kB, and there are specific resolution limitations (e.g. full images in Layar must be less than 640×480). Given that mobile device screens have significantly smaller resolution than 640×480, that resolution is probably much higher than necessary. Additionally, some clients (like Layar) do not support making images transparent. It is therefore necessary to set the alpha channel of the photos in a pre-processing step. For example, using the open source ImageMagick package, the following command line invocation could perform the necessary scaling and transparency conversion on an incoming image stream (on a Linux box): “cat input_image.jpg | convert - PNG32:- | convert - -scale 240x180 -channel Alpha -evaluate Multiply 0.9 output.png”.
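Since the client gives no feedback, a small pre-flight check on the server side can at least say why an image is likely to be dropped. The sketch below only encodes the two limits quoted above. The function name is hypothetical, “75 kB” is assumed to mean 75 × 1024 bytes, and Layar’s actual limits may differ or change.

```python
MAX_BYTES = 75 * 1024  # assumption: "75 kB" read as 75 * 1024 bytes
MAX_WIDTH, MAX_HEIGHT = 640, 480

def layar_image_problems(size_bytes, width, height):
    # return a list of human-readable reasons the image may be dropped;
    # an empty list means it passes the checks encoded here.
    problems = []
    if size_bytes >= MAX_BYTES:
        problems.append("file is %d bytes; must be under %d"
                        % (size_bytes, MAX_BYTES))
    if width >= MAX_WIDTH or height >= MAX_HEIGHT:
        problems.append("image is %dx%d; must be under %dx%d"
                        % (width, height, MAX_WIDTH, MAX_HEIGHT))
    return problems
```

Logging the returned list for each photo during pre-processing turns a silent client-side drop into something you can actually search for.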

 

There are already a number of open source platforms for publishing data services compatible with the Layar API, most notably PorPOISe (PHP), django-layar (Python) from our open data crush Sunlight Labs, and LayarDotNet (C#), with PorPOISe being the most fully featured of the platforms we reviewed. However, PorPOISe lacked some crucial 3D features at the time of our review. The beta release of an online service called Hoppala Augmentation does support 3D layers, but we were unable to get the 3D service to work and found the documentation and usability to be underdeveloped. It is certainly necessary to have a full understanding of the Layar protocol to use the Hoppala service (at this point) as the API allows developers to set a range of settings without explanation or checks on invalid or conflicting settings. Given these limitations and our desire to implement our own interactive capabilities and user settings, we decided to develop our own data web services in Python, which turned out to be a great choice for us because it let us prioritize, shape, and alter the results in a variety of ways. Our next journal (from Erik) will feature some of the ways that we needed to change the 3D placement of photos and how we went about it.

PhillyHistory Augmented Reality: Developer Journal 1

Erik (staff profile) and I (staff profile) are working on an exciting project this month: our mission (having chosen to accept it) is to explore the current state of the art (and industry) of “augmented reality” in order to create prototype mobile phone applications that let you look through your cellphone’s camera and see historic photos blended into the landscape around you, in the place where the photographer first snapped the photo. We’re working with PhillyHistory’s historic archive of photos, which is incredibly rich — the Philadelphia Department of Records has made more than 90,000 photos available (and still growing). We’ll release a whitepaper describing our experiments and discoveries when we’re done, but we thought we’d share our thoughts and progress along the way in the form of a developer’s journal — a diary of our ongoing thoughts and progress.

Erik is pretending to hold up a photo, floating in space. Inside, unfortunately, the photos are hard to keep still. You can see my beard in the lower left!

Augmented reality is still more of a dream than a technology. We’ve all seen science fiction movies where futuristic displays can annotate what we see around us, or can create the illusion of virtual 3D objects in the space around us. But to date most augmented reality applications are fun and interesting experiments that are still quite limited and have issues that keep them from being useful tools that we use day to day. But the promise is huge: imagine walking down a city street, looking at a building near you, and then being able to see the same building as it looked 80 years ago by peering at it through your handheld device. Or beyond that, imagine a device that would allow you to look around in the forest, with virtual museum descriptions appearing that identify the genus and species and tell you interesting facts. Or imagine theater performances in which virtual ghosts performed Macbeth in your living room, prancing around as if they were really there. Or imagine you’re in a country where you don’t speak the language and, as you gaze through your phone’s camera, subtitles appear in the air in front of the people you meet.

Artist's depiction (thanks Carissa Brittain) of PhillyHistory Augmented Reality

But the current reality is far more humble. Most of the applications I’ve used were primarily novelties: little floating icons that are occasionally sort of in the right direction of a nearby restaurant, or clever computer vision art, like replacing corporate logos with the faces of their CEOs. But despite that, we want to try and push beyond what we’ve seen by placing photos in 3D where they were originally taken. We began our research with a broad survey of the current tools (with a focus on open source tools that we could extend and tinker with) and the companies that currently provide augmented reality platforms that developers can build against.

As far as I can tell, there are two main categories of augmented reality applications and platforms at the moment: GPS based and Computer vision based. (These are my own categories, don’t try googling for them.)

GPS based

These applications use your phone’s GPS to determine where you are, and then use whatever other hardware your phone has (accelerometer, gyroscope, etc.) along with the GPS to guess at your current orientation: which way you’re pointed (your heading or “yaw”), how far up or down you are pointing your phone relative to the horizon (your “pitch”), and whether the phone is twisted vertically (your “roll”).

Usually these applications take the form of little floating balls or symbols on the horizon in the direction you are looking at. While pretty cool, there are some significant limitations:

  • The location data you get from a GPS is not very precise and is often very wrong. This gets worse in the city (the signal can bounce off of buildings) and GPS barely works at all if you are indoors.
  • Your phone has pretty bad information about which way you’re pointing. The GPS can only guess at your heading if you are moving (e.g. in a car). Newer phones have compasses and accelerometers and some newer ones have gyroscopes, which can help. But the applications we’ve played with still do a pretty poor job. But we’ll see if we can do better.
  • The data is all very ‘jittery’ — like a compass that is jiggling roughly around the direction it should be pointing, the position data from your phone is constantly changing even when you’re standing still. This is no good for creating an illusion of a real, steady object.

Computer vision based

These applications use powerful computer vision libraries to help the computer identify what it is seeing through a digital camera. Often they require some preparation. If you print up a symbol that is simple for the computer to identify — e.g. a piece of paper with a black square and a unique symbol — the computer can identify the symbol, figure out how the camera must be oriented to the square by seeing how it looks, and then add a 3D object to the image. Other examples of augmented reality through computer vision don’t require preparation — for example, certain applications can identify a surface (like a desk surface) and can place 3D objects on that surface. But there are limitations here, as well:

  • Just looking around in the world doesn’t tell you where you are in the world.
  • While very powerful from one perspective, it’s still very limited in what you can easily do — like many computer technologies, it’s amazing but still not very smart.
  • It requires a lot of processing power, often more than what your cellphone can offer.

The two most popular and influential libraries for open source AR development are OpenCV and ARToolkit (and its derivatives).

OpenCV

OpenCV is an extremely powerful computer vision library that is also open source and widely used. Many thanks to David Zwarg (staff profile) for first turning me on to OpenCV. It’s awesome. Intel originally built it as an initiative to push forward new applications that would need faster and faster computer processing, and my understanding is that much of the core work was done by Intel Russia. It also can use Intel’s proprietary IPP libraries. Now it has another corporate maintainer, but it’s also widely used. It can do *all sorts* of magic. For example, I’ve been playing with OpenCV (via the open source ROS library for Robotics) at home with my Kinect to process the Kinect’s RGB-D output (an image where the camera tells you the depth of each pixel) into a 3D model of the person it’s seeing. It’s also used in a wide range of applications, like Stanley, the Stanford autonomous car that won the DARPA Grand Challenge race (with a $2 million prize) for cars that could drive themselves across the desert. In fact, the professor who ran the vision team for that project is the co-author of a great book about OpenCV from O’Reilly called Learning OpenCV. It’s a great book, appropriate for teaching an undergraduate class about computer vision, and I recommend it highly!

ARToolkit

ARToolkit is the other library that is very widely used and adapted. It’s more specific in scope: the library can look through a digital camera and identify those markers I wrote about earlier (printed pages with black squares and special icons). Once it finds the familiar marker, it can calculate where it thinks the marker is in the real world, and use that to show you what the camera is seeing with a 3D object added to the visual frame.

ARToolkit in action (photo from ARToolkit)

There are lots and lots of libraries porting the same basic concept to various languages, and you’ll see this type of application show up over and over again if you google for augmented reality applications. While this idea is very limited in some ways (you actually have to change the real world by posting up or creating these markers), what’s great about it is that the illusion is really good. The 3D objects are relatively steady and they appear in the right place. And the code is much simpler than OpenCV, which is vast and powerful but complicated.

Next steps

Okay, we’re going to explore three major approaches.

  • What can we build using an existing proprietary framework from one of the leading companies in augmented reality? There are a number to choose from: Layar, Wikitude, Metaio. Which will support 3D objects? How well do they work? Can we use one of these platforms and still package our application as ‘PhillyHistory’ and make it easy and accessible for our users?
  • What can we build from custom code on the Android platform? What open source platforms exist, and what are their limitations? What are the strengths and weaknesses of the hardware on the Android phones?
  • Same as #2, but with the iPhone and iOS.

There are a number of interesting ideas that our initial research suggested that we’re going to have to put aside for now.

  • Using the ARToolkit model, it would be possible to build interesting indoor augmented reality applications if we placed markers inside a building and created a 3D model. How hard would it be to implement the ARToolkit functionality in OpenCV, then use markers to orient our camera in space, and use the phone’s other hardware to figure out where we are when we are no longer looking at the marker? But it’s not practical to put up posters throughout Philadelphia to place our historic photos, so we’ll have to box this up for now.
  • I’d love to figure out some way to justify experimenting with the Kinect (the new RGB-D camera for the Xbox 360). I want a futuristic Minority Report (gesturing in the air without any other controls) interface to organize and view historic photos, or even place them in 3D in a landscape. But I can’t really figure out how to justify this research, so I’ll have to just accept that I have a crush on the Kinect and put this aside.

So, next up we’ll try and see how far we can get with a commercial platform, like Layar. Check back for our results!