5 Takeaways from Training a Data Labeling Team

Like many educators, I’m proud to be jaded to the unexpected question. I only needed to be asked “Are all French dogs boys?” once to learn that most valuable of lessons for teachers: “Assume nothing.” So, when I had the opportunity to expand my role at Azavea, in part to provide training for remote workers, I was confident that I’d heard just enough of it all to gauge how much (or how little) I knew when preparing documentation. Collaborating with a data labeling team of Cloud Workers in Kathmandu, however, showed me how many lessons I’ve yet to learn.

An illustration showing a teacher looking at three students. A thought bubble containing a red question mark is above the teacher's head. — “Gallic canines are well-known for asexual reproduction.”
License: Creative Commons, Uploaded by: Wikivisual

Azavea and machine learning

Some here have been involved in Artificial Intelligence (AI) since a degree in it was considered as useful as a degree in basket-weaving. As a company, however, Azavea first began exploring deep learning after receiving a Small Business Innovation Research (SBIR) grant in 2016. Our work using deep learning for semantic segmentation of aerial imagery led to the development of Raster Vision, an open-source framework for deep learning projects.

Our Research and Development (R&D) team continued development of Raster Vision the following year, experimenting with multi-label image classification for the Understanding the Amazon from Space Kaggle competition. In 2018, Azavea collaborated with the Inter-American Development Bank (IDB) on a machine learning pilot. Using OpenStreetMap (OSM) building footprints as labels, we built a Raster Vision model to predict the location of buildings in South American cities. Excited by the work we’ve done with and for our partners, as well as by the potential applications of neural networks, R&D took the logical next step: investing further in machine learning projects.
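For readers wondering what “footprints as labels” can look like in practice, here is a minimal sketch in plain Python. It is not Raster Vision’s API and not necessarily how our pipeline worked: it assumes footprints have already been converted to pixel coordinates, and the names `point_in_polygon`, `rasterize_footprints`, and `building` are made up for illustration. The idea is simply to turn each footprint polygon into a grid of 0s and 1s that a segmentation model could learn from.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting test for a point against a simple polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_footprints(
    footprints: Sequence[Sequence[Point]], width: int, height: int
) -> List[List[int]]:
    """Return a height x width mask: 1 where a pixel center lies in a footprint."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # pixel center
            if any(point_in_polygon(cx, cy, fp) for fp in footprints):
                mask[row][col] = 1
    return mask


# Hypothetical footprint in pixel coordinates: a 2x2 square "building"
building = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
mask = rasterize_footprints([building], width=4, height=4)
for line in mask:
    print(line)
```

A real project would also have to reproject geographic coordinates into image space and would use a faster rasterizer; the loop above only shows the concept.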

A photograph of the Philadelphia offices of Azavea at 990 Spring Garden Street. — Azavea’s Philadelphia office.

In early 2019, we partnered with CloudFactory to provide data labeling for machine learning projects. I supported the R&D team throughout the process. One of my jobs was creating instructional materials to train Cloud Workers. The task was two-fold: train them in the use of our in-house data labeling tool as well as in the parameters of particular use cases. With no publicly available examples to model them on, I created the materials using my best judgment. Some of the calls I made were correct, while others proved less accurate. As the materials evolved, I learned quite a bit. Here are five key takeaways.

Show, don’t tell

I wrote the first piece of documentation for our team with the help of a colleague. While thorough, the result, a Google Doc, was…verbose. My initial answer to the question “How do I explain fine detail when I’m not in the same room as the people I’m working with?” was: words, lots and lots of words. The problem? The result was visually overwhelming and didn’t succeed in communicating what I had hoped.

Screen capture of document with a large amount of text. — A picture is worth a thousand words.

The next iterations of the training document involved adding more and bigger screenshots. The screenshots made it more legible and provided a second method for learners to access the information it contained. While it’s always important to consider different learning types, images are particularly important when you and your team members have different first languages. In my experience, a clear, properly contextualized image communicated more, and more precisely, than an entire paragraph of text.

Screen capture of a document with text and screenshots. — Screenshots are the truth.

The real breakthrough, however, was upgrading from screenshots to screen captures. Our Nepalese team leads suggested using Loom. The use of video necessitated a switch to Google Slides, which made the material easier to digest. Our Cloud Workers supported this switch, reporting, “It’s clear and understandable as a form of slide rather than a doc…The walk through videos were also amazing.”

Screen captures also made it easier to give feedback. Reviewers captured video as they corrected errors, which allowed them to spread information quickly and clearly. Likewise, labelers captured confusing instances and flagged them for my review. Our Cloud Workers proactively provided us with visual documentation, which deepened our understanding of our data and their needs.

A short video showing a slide with an embedded screen capture of a data labeling tool. — Screen captures are the truth 24 times a second.

Speak often and honestly

Open lines of communication are fundamental in creating quality data for your machine learning project and providing appropriate support for your team. Timely and thorough conversation prevents obstacles in your workflow and provides opportunities for retraining and relearning.

Workflows developed by CloudFactory on this project proved indispensable. Their staff worked with us to schedule an appropriate number of check-ins with our Team Leads, and their messaging platform allowed us to chat easily with our team in Kathmandu. In addition, our Cloud Workers took the lead in developing and sharing other communication tools with us. Shared spreadsheets made it easy to track questions, and detailed notes on daily work enabled asynchronous conversation and alerted us to any issues.

Screenshot of a spreadsheet with questions and information about a data labeling project. — Cloud Workers provided us with tools that made conversation easy.

You should also leverage your team’s knowledge by making sure they feel comfortable offering feedback. Whether they are requesting additional documentation, noticing a bug, or experiencing a more mundane problem, they should feel free to come to you. When there are nearly ten hours and 8,000 miles between you and your colleagues, even the need to move a meeting can be the difference between dinner with your family and dinner at the office.

Animated gif of a white rabbit falling asleep while sitting in a miniature "office." — That late night meeting is dangerous!
Source: Giphy.com

When our team leads noticed conversation in weekly meetings was slowing down, they suggested we switch to meeting every other week. Worthwhile conversations were happening, but at nearly 10 p.m. for our team members in Nepal. Azavea believes that work-life balance improves “productivity, creativity, and happiness at work,” and aims to support employees holistically. When our Nepalese team members let us know that the timing and structure were not the best for them, we changed course.

Embrace the edge (cases)

No matter how well defined your classes are, you and your data labeling team will consistently encounter objects that manage to test their boundaries. While it may feel disheartening to have your carefully crafted tool riddled with questions almost as soon as you share it, those edge cases are critical to help clarify your project’s parameters and refine its class definitions.

In one use case, we asked Cloud Workers to identify, label, and classify crosswalks in images of New York. It quickly became clear that the definition of “crosswalk” needed far more thought. Questions we hadn’t considered included:

Does a crosswalk have to be striped?
Does a crosswalk have to be a certain color?
Is a path in a parking lot a crosswalk?
What about a path on the grounds of a school? A bike path?
Are the “islands” connecting crosswalks on wider streets considered a part of the crosswalk?
If, as in the above case, two or more portions are significantly offset from each other, do they count as one crosswalk or two?

A screenshot of a data labeling tool working with an object detection machine learning project. — Is this a crosswalk?

Edge cases force you to evaluate and reevaluate your priorities and goals. Again and again you return to the question, “What is important?” If well documented, edge cases can also improve your training materials. Cloud Workers on our team appreciated the additions to the documentation, which also proved valuable when we needed to onboard new workers.
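One way to keep edge cases well documented is to store them next to the class definition they test. The sketch below is an assumption about how that might look, not a description of our actual tooling: `EdgeCase`, `ClassDefinition`, and the description text are hypothetical, and the single ruling shown is a placeholder rather than a decision our team made.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EdgeCase:
    question: str
    ruling: Optional[str] = None  # None until the team has decided
    example_images: List[str] = field(default_factory=list)


@dataclass
class ClassDefinition:
    name: str
    description: str
    edge_cases: List[EdgeCase] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        """Questions that still need a ruling before labeling continues."""
        return [ec.question for ec in self.edge_cases if ec.ruling is None]


crosswalk = ClassDefinition(
    name="crosswalk",
    description="Placeholder wording for the working definition.",
    edge_cases=[
        EdgeCase("Does a crosswalk have to be striped?"),
        EdgeCase("Is a path in a parking lot a crosswalk?"),
        EdgeCase(
            "Are the islands connecting crosswalks part of the crosswalk?",
            ruling="PLACEHOLDER ruling, for illustration only",
        ),
    ],
)

# Anything still listed here belongs on the agenda for the next check-in.
for question in crosswalk.open_questions():
    print(question)
```

Keeping the question, the ruling, and example images together means a new labeler sees the definition and its known boundary cases in one place.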

Hire experts

Alegion, a data labeling platform provider, recently conducted a survey that revealed that 96% of companies engaged in machine learning run into problems due to low-quality labels. Without trained and accurate data analysts to annotate your imagery, implementation of your deep learning project can stall. The proven benefits of a managed workforce led us to choose CloudFactory, and the choice proved wise.

A photo of 8 CloudWorkers taken in July 2019. — The team of Cloud Workers assigned to Azavea in July 2019.
Photo by CloudFactory.

One of the most surprising ways our partnership with CloudFactory proved valuable was in their assistance with improving our in-house labeling tool. As the first outside users, they’ve helped us shape a more intuitive and user-friendly tool. Simple changes such as the ability to hide an annotation made it easier to create accurate labels.

Our team leads also advocated for a “dashboard” that would allow them to track productivity. The dashboard also features an insightful “Collaborators” section that tracks key metrics such as label speed. CloudFactory’s expertise has enhanced our tool so much that we may decide to repackage it for public use at some later date.
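To make “label speed” concrete, here is a rough sketch of how such a metric could be computed. This is not the code behind our dashboard: the session records, the collaborator names, and the `labels_per_hour` function are all hypothetical, and a real tool would likely derive working time from activity logs rather than hand-entered minutes.

```python
from collections import defaultdict

# Hypothetical session records: (collaborator, labels created, minutes worked)
sessions = [
    ("worker_a", 120, 60),
    ("worker_a", 90, 30),
    ("worker_b", 80, 60),
]


def labels_per_hour(sessions):
    """Aggregate sessions per collaborator and return labels per hour."""
    totals = defaultdict(lambda: [0, 0])  # collaborator -> [labels, minutes]
    for worker, labels, minutes in sessions:
        totals[worker][0] += labels
        totals[worker][1] += minutes
    return {
        worker: labels / (minutes / 60)
        for worker, (labels, minutes) in totals.items()
        if minutes > 0  # skip collaborators with no recorded time
    }


speeds = labels_per_hour(sessions)
for worker, speed in speeds.items():
    print(f"{worker}: {speed:.1f} labels/hour")
```

Speed alone says nothing about accuracy, so a number like this is best read alongside review results.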

An animated screen capture of the dashboard in Azavea's in-house data-labeling tool. — Cloud Workers advocated for a project dashboard.

Challenge your cultural assumptions

As machine learning becomes a more prominent segment of the AI field, many are working to ensure that the ethical implications of such work are interrogated, and that the well-being of our planet and fellow humans is a consideration in the process from beginning to end. A plus of working with CloudFactory is their engagement in this area, including their commitment to impact sourcing and to acknowledging the worker as a whole.

What’s data labeling got to do with it?

While training documents might seem an odd place for cultural exchange, it’s surprising how many cultural assumptions are implicit in even the most granular of documents. Consider the truck. One challenge was teaching our Cloud Workers to classify vehicles into three categories, including “Passenger Vehicle” and “Truck.” While the difference between a pick-up truck and, you know, a “truck truck” was clear to those of us in the room when we selected those terms, it most certainly was not to those we were asking to label our imagery. And why should it be? Even a quick glimpse at the Wikipedia entry for truck reveals how much the term can vary by culture.

A screenshot of the introduction to the Wikipedia entry for the term "Truck." — Definitely a “*truck* truck”.

The cultural gap need not be as great as that between the U.S. and Nepal to cause issues, either. While it might seem strange for a former Southern Californian, I’m not a driver. In fact, I failed the only driving test I’ve ever taken within three blocks (I still maintain that I was tricked!). In any case, this was not something I expected to come up in my machine learning work until I needed to distinguish crosswalks from, say, speed bumps or gore points. I’ll be honest: I didn’t even know what a gore point was. Understanding your own cultural viewpoints is an important step in creating useful documentation that both respects your teammates and ensures you achieve the results you desire.

A screenshot of an aerial image of a faded speed bump in a residential New York neighborhood. — This is a faded speed bump, so I’m told.

Conclusion

In creating documentation to train computer vision workers, I’m sure that I learned as much as I taught. Working carefully and thoughtfully, with judicious revision, is vital in ensuring that you are feeding accurate labels into your machine learning model. Most importantly, you need to trust and respect your labeling team, and your training materials should reflect that. Whether you’re sizing imagery, determining the best mode of presentation, or deciding what exactly is a crosswalk, a user focus and a collaborative spirit are two of your best tools.
