Reflections on GEOINT 2017: From manual to automatic—leveraging crowdsourcing for machine learning training

In 2005, Jeff Howe coined a term to describe “outsourcing to a crowd” in an article in Wired magazine, titled “The Rise of Crowdsourcing.” At the time, the growing need for globally distributed workforces and the hyper-connectivity enabled by the internet gave rise to solutions that could use this model. Now, more than a decade later, crowdsourcing was a popular topic at the USGIF’s GEOINT 2017 Symposium. What began as a way to distribute work over large groups of people to gain efficiencies in scale, speed, accuracy and cost for mostly manual tasks is now also a means to enable machine learning. Crowdsourcing enables large communities of users to sift through large volumes of data and identify the key pieces that require expert analysis, effectively acting as a force multiplier.

The volume of data created from devices and sensors, ranging from the Internet of Things (IoT) to satellite imagery, is growing exponentially. To gain meaningful insights from expansive datasets, automation enabled by machine learning is needed to quickly identify objects of interest as the data is collected. This is where curated or labeled datasets generated from crowdsourcing come in. Such datasets serve as “ground truth” to train algorithms.
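As a rough illustration of how crowd input can become “ground truth,” the sketch below combines several contributors’ labels for the same image tile by majority vote and sets aside tiles where the crowd disagrees. It is a minimal example under stated assumptions, not a description of how any DigitalGlobe system works: the function name, the tile IDs, the labels and the 60 percent agreement threshold are all hypothetical.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Majority-vote each tile's crowd labels (hypothetical helper).

    Returns {tile_id: (label, agreement)}. The label is None when the
    share of votes for the top label falls below min_agreement.
    """
    consensus = {}
    for tile_id, labels in votes.items():
        if not labels:
            continue  # no votes for this tile, nothing to aggregate
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        accepted = label if agreement >= min_agreement else None
        consensus[tile_id] = (accepted, agreement)
    return consensus


# Hypothetical votes: each tile was labeled by several contributors.
votes = {
    "tile_001": ["building", "building", "building"],
    "tile_002": ["road", "building", "road", "road"],
    "tile_003": ["road", "building"],
}

consensus = aggregate_labels(votes)

# Tiles with enough agreement become labeled training examples.
training_set = {t: lab for t, (lab, _) in consensus.items() if lab is not None}

# Tiles without consensus are routed to an expert for review.
needs_expert_review = [t for t, (lab, _) in consensus.items() if lab is None]
```

The threshold trades dataset size against label quality: a higher value yields fewer but cleaner training examples and sends more tiles to expert review.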

Government successfully applying commercial innovation

As crowdsourcing became more prevalent for scaled, distributed work, DigitalGlobe’s Tomnod applied these crowdsourcing techniques to identify small objects appearing in large areas of new satellite imagery. While Tomnod is heavily used for commercial and humanitarian purposes, we have adapted the underlying technology to benefit our government customers. In his keynote address at GEOINT 2017, NGA Director Robert Cardillo talked about the NSG Open Mapping Enclave (NOME), a crowdsourced mapping platform created in 2016 that allows NGA customers on multiple security domains to extract roads, buildings and points of interest from satellite imagery and share data in the cloud. DigitalGlobe developers and analysts have had the opportunity to support NOME.

At GEOINT Foreword, Dr. Peter Highnam, Director of NGA Research, talked about how artificial intelligence and machine learning are showing great promise in providing automation, which can reduce the “drudge” for analysts by helping sort through the large volume of data to find the important information and patterns. This will allow analysts to focus on applying their deep, tacit knowledge to the mission. Dr. Highnam acknowledged that there is “no free lunch” with AI and machine learning when you have to apply it to real applications beyond games. He referenced SpaceNet as an example of a project focused on addressing the limited amount of labeled overhead imagery training data. The NGA contributed data to SpaceNet.

DigitalGlobe, a leader in crowdsourcing, applies this expertise and data to enable machine learning

Since 2010, DigitalGlobe has been a leader in using crowdsourcing to rapidly label satellite imagery for geospatial intelligence uses. Today, DigitalGlobe engages public and private crowds for a variety of uses, such as rapidly identifying features in imagery for emergency workers to use while responding to crises around the globe, generating training data for machine learning, and promoting the development of machine learning algorithms.

The following are examples of how we have applied crowdsourcing:

  • Tomnod – A volunteer, public crowdsourcing community that gained popularity during the 2014 search for Malaysia Airlines Flight 370 and the aftermath of the 2015 Nepal earthquake. This innovative startup applied crowdsourcing to high-resolution satellite imagery, which ultimately led to it becoming part of DigitalGlobe in 2010.
  • GeoHive – A paid, private crowdsourcing community created by DigitalGlobe where members use a web interface to digitally create and validate geospatial features in imagery, providing critical geospatial data on-demand and at scale. This community is currently capable of assessing over 100,000 km² of high-resolution imagery in a single day.
  • SpaceNet – A collaboration by CosmiQ Works, DigitalGlobe and NVIDIA, which released an online repository on AWS of publicly accessible satellite imagery and co-registered data layers for training algorithms. Prize challenges are designed to engage developers and data scientists through gamification and financial rewards. These challenges have the broader goal of collaborative innovation in geospatial uses of machine learning. Check out CosmiQ Works and DigitalGlobe blog posts for more information.

Reflecting on GEOINT 2017, our team is excited by the growing interest in crowdsourcing as an enabler for machine learning. DigitalGlobe | Radiant is committed to applying proven techniques and emerging technologies along with our geospatial expertise to enable decision advantage that has mission impact for our customers.