DigitalGlobe has been collecting imagery of Earth from space since our QuickBird satellite launched in 2001. As we’ve continued to launch new satellites to join our world-class constellation, our collection capacity and resolution have increased. In the past 17 years, we’ve collected more than 7 billion square kilometers of imagery. These are big, heavy images from a processing standpoint and add up to a lot of storage; an image from a satellite like WorldView-3 can be 30 GB. Our archive now consumes 100 petabytes of storage and increases by 10 PB per year.
One of the cool things about our work at DigitalGlobe is also one of our challenges: we never know when an image in our archive will be needed by a customer or partner. We do experience a relatively predictable pattern of access as an image becomes less current and is used less frequently over 90 days. But we never know on any given day when something interesting will happen on our changing planet that will trigger the retrieval of an image (or a series of images) of a particular location. In the example below, our QuickBird satellite imaged Fiery Cross Reef in the South China Sea in 2005. We may not have been able to predict then that the islands in the South China Sea would become a hotly contested political issue in 2017, but we’re able to look back in our archive and provide visible evidence of change.
Given a relatively predictable usage curve and the expense of maintaining a 100 PB imagery archive, we made a decision long ago to become excellent at tape management. Our main tape library has 12,000 tape bays feeding 60 LTO-5 tape drives, and it is constantly in motion. This tape-centric model has worked great for us for a decade. We can recall and deliver any image in our archive to a customer within four hours; last year, we did this 4 million times. But just as a car with lots of miles breaks down more often, our library is showing its age. Our tape heads have more than 10,000 miles of movement within an enclosed 60-foot space. Keeping the library calibrated is like keeping an old sports car in tune: it has to be done constantly.
During the last two years, our customers and partners have become increasingly public cloud-based, enabling DigitalGlobe to deliver directly to Amazon Web Services’ S3 storage service rather than via FTP or by shipping FireWire drives. Our most frequent use cases are increasingly focused on deriving information from large quantities of imagery. Today, more than 350 developers are building new applications and machine learning algorithms on our GBDX platform. In addition, our recently expanded Services business is adapting large-scale analysis and algorithms to mission applications. The more imagery we provide to the GBDX platform, the higher the quality of the results its algorithms can produce. We had to get off tape AND move our data to where our customers are working. Where? Where else: Amazon Web Services (AWS).
But how do you load 8,700 tapes into Amazon S3? Not by usual methods. Our network is consistently highly utilized already. A year ago, DigitalGlobe became a beta user of AWS Snowball, a great solution but, at 80 TB each, not suited to this scale. During our AWS Snowball experience, we developed a strong working relationship with AWS. When they approached us to be the inaugural user of the AWS Snowmobile, we climbed on board. AWS Snowmobile is an exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100 PB per Snowmobile, a 45-foot-long ruggedized shipping container pulled by a semi-trailer truck.
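To put that scale in numbers, here is a rough back-of-the-envelope sketch. The archive size and the 80 TB Snowball capacity come from this post; the link speed and the share of it available for a migration are purely hypothetical figures chosen for illustration, not our actual network.

```python
PB = 10**15  # decimal petabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes

ARCHIVE_BYTES = 100 * PB   # archive size quoted in this post
SNOWBALL_BYTES = 80 * TB   # per-device capacity quoted in this post


def snowballs_needed(total_bytes: int, device_bytes: int) -> int:
    """Number of devices needed to hold total_bytes, rounded up."""
    return (total_bytes + device_bytes - 1) // device_bytes


def transfer_days(total_bytes: int, link_gbps: float, utilization: float) -> float:
    """Days needed to push total_bytes over a link.

    link_gbps is the nominal link speed in gigabits per second and
    utilization is the fraction of it free for the migration (0 to 1).
    """
    usable_bits_per_second = link_gbps * 1e9 * utilization
    seconds = total_bytes * 8 / usable_bits_per_second
    return seconds / 86_400


if __name__ == "__main__":
    print(snowballs_needed(ARCHIVE_BYTES, SNOWBALL_BYTES))
    # Hypothetical: a 10 Gbit/s link with half of it free for the migration.
    print(round(transfer_days(ARCHIVE_BYTES, link_gbps=10, utilization=0.5)))
```

Under those assumptions the archive would fill 1,250 Snowballs, or take roughly 1,850 days (about five years) over the wire. Different link assumptions change the second figure considerably, but not the conclusion.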
DigitalGlobe and AWS have learned a lot from this experience. First of all, not every company has a loading dock in a convenient location. For a variety of reasons, we chose to park the AWS Snowmobile truck (basically a data center on wheels) at our corporate headquarters in Westminster, Colorado. We put down steel plates so it wouldn’t sink into the ground during our ever-changing Colorado weather, hooked it up with some of the largest power, cooling and network cables you’ve seen, and powered it up. To our network, it looked like a big disk array sitting outside of our firewall. Because this was the first time an AWS Snowmobile was being operated at a customer site, there was a fair amount of performance tuning and a lot of collaboration between the AWS and DigitalGlobe technical teams to ensure we were loading at an optimal rate and not impacting our production operations. (Helpful hint: If your rented chiller is sitting outside when you get a foot of snow, the chiller may determine that nothing needs cooling.)
This wasn’t simply a file transfer. While we are transitioning our operations to a service-oriented, object-based system, our legacy systems are heavily dependent on NFS file systems. To get our archive onto AWS Snowmobile, we had to bring every tape online, mount it, move the files (not just the images themselves, but all of the ancillary data that describes them), convert the files to Amazon S3 objects, and encrypt them. During Q1 2017, we repeated that operation for each tape, carefully balancing the Snowmobile load while carrying on with our normal, heavy production volume on the aging and super-busy tape library I previously described. Ultimately, we transferred (and converted into S3 objects) 54 million files during our load-in. We’re happy to announce we have turned the trailer over to AWS for validation, load-out and ingest into the cloud.
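The file-to-object step can be pictured with a minimal sketch. This is not our production tooling: the mount layout, the key prefix, and the names `ObjectRecord`, `to_object_record` and `plan_tape` are all hypothetical, and encryption and the actual upload are left out. It only shows the idea of walking a mounted tape, mapping each file (image or ancillary) to an object key, and recording a checksum that could be used for validation later.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path, PurePosixPath


@dataclass(frozen=True)
class ObjectRecord:
    key: str      # object key derived from the file's path on the tape
    size: int     # size in bytes
    sha256: str   # checksum for validating the copy later


def to_object_record(mount_root: Path, file_path: Path, prefix: str) -> ObjectRecord:
    """Map one file under mount_root to a planned object (hypothetical scheme)."""
    relative = file_path.relative_to(mount_root)
    # Object keys use forward slashes regardless of the local platform.
    key = str(PurePosixPath(prefix, *relative.parts))
    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large images never sit fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return ObjectRecord(key=key, size=file_path.stat().st_size,
                        sha256=digest.hexdigest())


def plan_tape(mount_root: Path, prefix: str) -> list[ObjectRecord]:
    """Plan objects for every file on a mounted tape, images and metadata alike."""
    files = sorted(p for p in mount_root.rglob("*") if p.is_file())
    return [to_object_record(mount_root, p, prefix) for p in files]
```

Keeping the relative path inside the key is one simple way to keep an image and its ancillary files side by side under a shared prefix; a real migration would likely need a more deliberate key design.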
When data ingest is complete, every single image taken by any satellite in the history of DigitalGlobe will be online in AWS. This would have happened eventually without AWS Snowmobile, but it might literally have taken years. With AWS Snowmobile, we drastically accelerated availability of the world’s largest commercial imagery archive for incorporation into large-area mosaics to be consumed for analytic purposes and for anything else our product teams, partners and customers dream up. Our “daily take” of 80 to 100 TB of imagery now goes straight to S3: no more tape, and a big step closer to our goal of closing our commercial data centers. The next big step for my team will be mastery of data lifecycle management. This entails using tipping and cueing to move data between Amazon Glacier, S3 Standard-Infrequent Access (Standard-IA) and S3 to provide rapid access to relevant imagery while optimizing the cost per petabyte of the growing archive as we continue to add not only imagery from our constellation, but also solution sets from GBDX and imagery collected by other constellations.
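As an illustration of what such a lifecycle rule might look like, here is a minimal sketch. The storage class names are the ones the S3 API uses; the 90-day window echoes the usage curve described earlier in this post, while the one-year cut-off and the `cued` flag (standing in for whatever tip or cue signals renewed interest in a location) are assumptions made for the example, not our actual policy.

```python
from datetime import date

# Storage class names as used by the S3 API.
STANDARD = "STANDARD"
STANDARD_IA = "STANDARD_IA"
GLACIER = "GLACIER"

HOT_DAYS = 90     # usage tapers off over about 90 days, per this post
WARM_DAYS = 365   # hypothetical cut-off before archiving


def choose_storage_class(collected: date, today: date, cued: bool = False) -> str:
    """Pick a storage tier for an image from its age and any active cue."""
    if cued:
        # Renewed interest overrides age: keep the image quickly accessible.
        return STANDARD
    age_days = (today - collected).days
    if age_days <= HOT_DAYS:
        return STANDARD
    if age_days <= WARM_DAYS:
        return STANDARD_IA
    return GLACIER
```

With these thresholds, a 2005 image of a quiet location would sit in Glacier, and the same image would be promoted back to S3 once the location is cued.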
Not many companies are so customer-centric that they would dream up a solution like AWS Snowmobile to solve a big data migration problem. Even fewer would have the technical capability and audacity to pull it off. Amazon rocks. We’re looking forward to future collaborations like this one as DigitalGlobe continues our AWS migration.