Serious Brainpower Tackled SkyTruth Challenge at AWS re:Invent Hackathon for Good

SkyTruth’s goal to stop oil pollution at sea from bilge dumping is off to a strong start.

The call came two weeks in advance: SkyTruth was chosen to be one of four nonprofits featured at the AWS re:Invent Hackathon for Good held December 2, 2019 in Las Vegas, Nevada. What followed was a frenzy of activity in the SkyTruth offices. Assembling databases for the hackathon teams to work from. Generating FAQs and documentation. Developing materials to share the SkyTruth story. Crafting just the right pitch to lure the best and brightest from a roomful of 150 computer scientists and engineers to work on our challenge — namely, automating the detection of bilge dumping at sea by vessels violating international law and polluting the ocean. 

Finally, the big day arrived. Early in the morning, SkyTruthers Ry Covington, Jona Raphael, and John Amos staffed a table at Vegas’ MGM Grand, offering SkyTruth swag to entice hackers to our cause. 


But cool T-shirts and stickers are one thing, and a compelling challenge is another. Here’s SkyTruth President John Amos’ pitch to the crowd: Help us stop oil pollution at sea.



The competition was tough. Three other worthy nonprofits were vying for the same brilliant brainpower that we were. After a convincing presentation and a little Q & A, SkyTruth attracted seven separate teams with a total of 35 computer scientists and engineers to work on different components of our goal: an automated system that detects bilge dumping every day around the world, identifies the perpetrators, and alerts law enforcement and the public in near real-time.



Time to roll up the sleeves and work.



And work.



And work.  Eight straight hours on laptops, at flip charts, and in discussion. Lots of Red Bull to stay alert and free massages to stay limber after hours hunched over a keyboard. 



Finally, at 6 p.m. it was time to present the results to the judges.



And here’s just a sample of what our teams came up with.



But that’s not the end; it’s just the beginning. We’re still evaluating all of the new material our teams generated and we’re excited about the possibilities. And the week-long AWS re:Invent conference followed the Hackathon, with lots of opportunities to make valuable contacts.



Have a little fun.



And, perhaps most importantly, win an AWS Imagine Grant to support continued work to stop illegal bilge dumping at sea. Here’s Vice President of AWS-Worldwide Public Sector, Teresa Carlson, announcing the seven Imagine Grant winners – including SkyTruth.



With the valuable contacts we made at the AWS re:Invent Hackathon and conference, the volunteers who promised to continue helping us with this project, and support from the AWS Imagine Grant and others, SkyTruth will find a way to stop illegal oil pollution at sea. 


Photos by John Amos and Jona Raphael.

Systematic GPS Manipulation Occurring at Chinese Oil Terminals and Government Installations

Analysis reveals precise location and timing of GPS interference but purpose remains unclear.

Last month, an article in MIT Technology Review described strange GPS anomalies in Shanghai. I began investigating, and have now found evidence of a novel form of GPS manipulation occurring at at least 20 sites on the Chinese coast during the past year. The majority of these sites are oil terminals, but government installations in Shanghai and Qingdao show the same striking pattern of interference in GPS positioning. We don’t know the reason for this interference. It may simply be a general security or anti-surveillance measure, but it is also possible that it is intended to avoid scrutiny of imports of Iranian crude, which have recently come under U.S. sanctions. Whatever the intention, we are able to demonstrate here, through analysis of vessel tracking data, that this GPS interference can be pinpointed very precisely in both time and location.

According to the MIT Technology Review article, this phenomenon was first documented by the U.S.-flagged container ship Manukai when the vessel entered the port of Shanghai in July. The captain noticed that the vessel’s AIS (Automatic Identification System) appeared to malfunction — vessels on the navigation screen appeared and disappeared without explanation and appeared to move when they were in fact stationary. AIS, originally designed for collision avoidance, transmits vessels’ GPS locations, courses, and speeds every few seconds via VHF (very high frequency) radio. These signals are not only picked up by nearby vessels and terrestrial antennas, but some private companies have also launched satellites able to receive these signals. For this analysis we were able to use data made available by two of these companies, Spire and Orbcomm, through our research partnership with Global Fishing Watch.

An investigation by the non-profit C4ADS (Center for Advanced Defense Studies) showed that AIS vessel locations from hundreds of ships navigating Shanghai’s Huangpu River were appearing at false locations. Strangely, vessels on the river would have their GPS location jump to a ring of positions on land. And this was not just affecting ships; looking at the tracking map of the cycling and running app Strava, C4ADS also confirmed that this strange pattern of interference was affecting all GPS receivers.

To further investigate the GPS manipulation documented in Shanghai, I examined AIS position broadcasts from ships in the area. A distinct pattern emerged. Upon approaching the area of interference, a vessel’s broadcast position jumps from its true location to a point on land, where false AIS broadcasts occur in a ring approximately 200 meters in diameter. Many of the positions within the ring had speeds of precisely 31 or 21 knots (much faster than vessels would be moving near dock) and a course that varied with position within the ring. The GPS anomaly appears to affect vessels once they are within a few kilometers of the center of the ring. Once affected, vessels begin broadcasting seemingly random positions within the ring or other high-speed positions scattered around it.
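Because the ring signature is so distinctive, it lends itself to simple automated detection. The sketch below is illustrative only: the flat (x, y, speed) point layout, the thresholds, and the `find_ring_center` helper are assumptions made for this example, not the actual query run against the AIS archive.

```python
import math

# Hypothetical AIS fixes as (x_m, y_m, speed_knots) in a local meter
# grid. False "ring" positions broadcast speeds near 21 or 31 knots.
def find_ring_center(points, ring_speeds=(21.0, 31.0), tol=1.0):
    """Estimate the center and radius of a ring of false AIS fixes;
    returns (cx, cy, mean_radius_m) or None if too few candidates."""
    ring = [(x, y) for x, y, s in points
            if any(abs(s - rs) <= tol for rs in ring_speeds)]
    if len(ring) < 10:
        return None
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    mean_r = sum(math.hypot(x - cx, y - cy) for x, y in ring) / len(ring)
    return cx, cy, mean_r

# Synthetic example: 24 false fixes on a 100 m radius circle around
# (500, 500), plus one genuine slow vessel at dock that is ignored.
pts = [(500 + 100 * math.cos(i * math.pi / 12),
        500 + 100 * math.sin(i * math.pi / 12), 31.0) for i in range(24)]
pts.append((0.0, 0.0, 0.3))
cx, cy, r = find_ring_center(pts)
print(round(cx), round(cy), round(r))  # → 500 500 100
```

Applied to a real archive, the same idea would scan a grid of coastal cells for clusters of high-speed fixes and fit a circle to each candidate cluster.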

Image 1. The Chinese cargo ship Huai Hia Ji 1 Hao (yellow) transits southeast on the Huangpu River. Upon nearing the center of the GPS interference area, the track jumps to the ring on land and to other random positions nearby. Positions from other affected vessels are shown in red. AIS data courtesy Global Fishing Watch / Orbcomm / Spire.

Image 2. GPS interference can be pinpointed based on this ring of false AIS positions. Approximately 200 meters in diameter, many of the positions in the ring had reported speeds near 31 knots (much faster than a normal vessel speed) and a course going counterclockwise around the circle. AIS data courtesy Global Fishing Watch / Orbcomm / Spire.

Because the ring of false AIS broadcasts follows this very specific pattern, I was able to query AIS tracking data to check if there are other locations where these rings are also occurring. The results are striking. This GPS manipulation is occurring not only in Shanghai but has occurred in at least 20 locations in six Chinese cities within the past year. The focus of these apparent GPS manipulation devices is clearly oil terminals (where 16 of the 20 detected locations were observed). But three prominent office buildings in Shanghai and Qingdao are also affected: the Industrial and Commercial Bank of China in Shanghai, the Qingdao tax administration office, and the Qingdao headquarters of the Qingjian industrial group.

Image 3. A ring of false AIS positions marks an apparent GPS interference device deployed in an office building identified as the Qingdao tax administration office. AIS data courtesy Global Fishing Watch / Orbcomm / Spire.

Image 4. Locations of detected GPS manipulation occurring in six Chinese cities in 2019. Interference following this pattern was not found beyond the Chinese coast.

It seems likely that the centers of these rings of false AIS positions actually mark the physical location of some sort of GPS disrupting device. A device having precisely this effect on GPS receivers, including shipborne AIS systems, has not been previously documented, though there have been other cases of GPS blocking and manipulation. Earlier this year C4ADS published a report with details on GPS manipulation clearly being carried out by the Russian government. These Russian systems appeared to have the effect of making all receiving devices within range show some particular location, such as a nearby airport, rather than the true location of the device. This was seen in one striking example of vessels approaching Putin’s alleged palace on the Black Sea coast.

This Chinese system is clearly being deployed both at central government offices and at the much more remote locations of oil terminals. In the case of the government office buildings it seems likely that these GPS disrupting devices were activated as a security measure. Some are only active for a few days, perhaps to coincide with the visit of an important official. However, the AIS manipulation occurring at oil terminals particularly interests us at SkyTruth: One possible motive for deploying GPS manipulation devices at oil terminals could be recent U.S. sanctions on Chinese companies importing Iranian crude. And the intentional disruption of a navigation safety system, in close proximity to crude oil storage, is a serious concern.

Almost half of the specific locations where these presumed GPS disrupting devices have been deployed are at oil terminals near Dalian in northeast China. In an August analysis, The New York Times matched Planet satellite imagery from June and July with AIS tracking data to show Iranian tankers delivering oil to China in violation of U.S. sanctions. The Financial Times also documented Chinese-flagged tankers importing Iranian crude after ship-to-ship transfers with Iranian tankers.

I took a closer look at exactly how this GPS disruption is affecting vessel tracking in one oil terminal east of Dalian. Here I identified four locations where GPS disrupting devices appear to have been deployed in 2019. I compared AIS vessel position data from March 1, 2019 and September 5, 2019. The differences were dramatic.

These two days showed similar numbers of AIS positions in the area. But on September 5 approximately two-thirds of the vessel positions at dock disappeared and appeared to be replaced by positions orbiting the GPS disrupting devices or scattered randomly in the region. At the same time, it does appear that some normal AIS broadcasts are coming through and that the GPS disruption does not entirely mask all vessel movements in the area.

Image 5. On March 1, 2019 AIS vessel position data around an oil terminal east of Dalian China shows accurate vessel positions and speeds. On that date, none of the four locations of GPS interference were active. Consequently no vessel positions appear on land and stationary vessels are accurately shown with near 0 speeds (green). AIS data courtesy Global Fishing Watch / Orbcomm / Spire.

Image 6. On September 5, 2019 two GPS interference locations were active and this had a dramatic effect on scrambling vessel positions in the area. Many positions now appear orbiting the presumed GPS interference devices and others appear scattered on land. On the water many positions are appearing with very high speeds (over 25 knots, red) and it’s not possible to distinguish true and false locations. However some slow speed positions (green) are appearing at dock where they would be expected, so some AIS broadcasts appear to be unaffected. AIS data courtesy Global Fishing Watch / Orbcomm / Spire.

Image 7. The distribution of AIS speeds in the area is significantly altered by the activation of the GPS interference devices. Above AIS speed distributions are compared between March 1 (left, no GPS interference) and September 5 (right, active GPS interference). On Sept 5 the total number of slow speed positions from docked vessels is greatly reduced and spikes now appear at 21 and 31 knots from positions orbiting the presumed GPS interference devices.
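The shift in the speed distribution can be summarized in a few lines of code. This is a toy sketch with made-up numbers, not the actual analysis; the `summarize_speeds` helper and its thresholds are assumptions for illustration.

```python
def summarize_speeds(speeds_knots, spike_tol=1.0):
    """Share of slow (docked) fixes vs. fixes at the 21/31-knot
    interference spikes, for one day of AIS positions."""
    n = len(speeds_knots)
    slow = sum(1 for s in speeds_knots if s < 1.0)
    spikes = sum(1 for s in speeds_knots
                 if abs(s - 21.0) <= spike_tol or abs(s - 31.0) <= spike_tol)
    return {"slow_frac": slow / n, "spike_frac": spikes / n}

# Made-up day without interference: mostly docked vessels.
quiet = [0.1] * 90 + [12.0] * 10
# Made-up day with interference: docked fixes replaced by spikes.
jammed = [0.1] * 30 + [21.0] * 35 + [31.0] * 35
print(summarize_speeds(quiet))   # → {'slow_frac': 0.9, 'spike_frac': 0.0}
print(summarize_speeds(jammed))  # → {'slow_frac': 0.3, 'spike_frac': 0.7}
```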

I also examined one individual vessel track to see how it was affected by GPS interference. This is the Chinese-flagged tanker Jin Niu Zuo, which entered the Dalian oil terminal on September 5. Initially a normal track is seen as the vessel approaches the terminal from the southeast. With closer proximity to the presumed interference device, scrambled positions — often with very high speeds — start to appear. Eventually almost all of the vessel’s AIS positions appear in the ring orbiting the interference device.

Image 8. The tanker Jin Niu Zuo approaches an oil terminal east of Dalian on September 5. Initially, positions with normal transit speeds appear (yellow). With closer proximity, scattered high speed positions begin to emerge (red) and eventually most positions appear in the ring surrounding the presumed AIS interference device. AIS data courtesy Global Fishing Watch / Orbcomm / Spire.

The timing of GPS interference at different sites on the Chinese coast can be inferred based on the appearance of AIS positions on land with 21 and 31 knot speeds. Of the 20 locations identified, interference appears earliest at office buildings in Qingdao, but only over a couple of days (April 17 – 18, 2019). The first GPS interference at oil terminals appears in June and has continued until recently, but timing varies by location. Activation of interference at different terminals is intermittent and may be in response to specific events. For instance, at an oil terminal near Quanzhou, GPS interference appears to have been activated only between September 25 and 27, 2019.
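Inferring activation dates from those 21- and 31-knot on-land fixes amounts to a simple grouping by date. The record layout and the `min_hits` threshold below are hypothetical; this is a sketch of the idea, not the query actually used.

```python
from collections import defaultdict

def active_dates(records, min_hits=20):
    """records: (date, speed_knots, on_land) tuples. A site counts as
    'active' on dates with enough on-land fixes at the 21/31-knot spikes."""
    hits = defaultdict(int)
    for date, speed, on_land in records:
        if on_land and (abs(speed - 21.0) <= 1.0 or abs(speed - 31.0) <= 1.0):
            hits[date] += 1
    return sorted(d for d, n in hits.items() if n >= min_hits)

recs = ([("2019-09-25", 31.0, True)] * 40     # ring fixes on land
        + [("2019-09-26", 21.0, True)] * 25
        + [("2019-09-28", 31.0, False)] * 50  # fast, but at sea
        + [("2019-10-01", 0.2, True)] * 30)   # slow, not a spike
print(active_dates(recs))  # → ['2019-09-25', '2019-09-26']
```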

At the Dalian oil terminals GPS interference appears to have begun in late June 2019. It is possible that this was a reaction to increased scrutiny of crude imports after the U.S. ended exemptions for purchase of Iranian oil on May 2nd. In fact, Dalian is the headquarters of two subsidiaries of Cosco shipping which were sanctioned on September 25 for importing Iranian crude. Based on what can be seen with vessel activity in Dalian, it is clear that GPS interference is not able to entirely mask vessels approaching the terminal. However, it likely would make it impossible to reliably link a vessel’s AIS track with satellite imagery of a vessel discharging crude at dock. While it is not at all clear that GPS interference was intended to obscure shipping activity, we do see that it had a significant impact on AIS tracking and that the interference was specifically concentrated at oil terminals.

In the November article first documenting the strange GPS anomaly in Shanghai, the question was posed whether this was the work of the Chinese state or some other actor like a mafia engaged in smuggling river sand. Based on the very specific characteristics of the GPS manipulation observed and its deployment at high level installations, it seems very likely that the Chinese state is responsible. It remains to be seen whether this is simply a security measure or if GPS manipulation is also being deployed specifically to prevent monitoring of oil imports.

Training Computers to See What We See

To analyze satellite data for environmental impacts, computers need to be trained to recognize objects.

The vast quantities of satellite image data available these days provide tremendous opportunities for identifying environmental impacts from space. But for mere humans, there’s simply too much — there are only so many hours in the day. So at SkyTruth, we’re teaching computers to analyze many of these images for us, a process called machine learning. The potential for advancing conservation with machine learning is tremendous. Once taught, computers potentially can detect features such as roads infiltrating protected areas, logging decimating wildlife habitat, mining operations growing beyond permit boundaries, and other landscape changes that reveal threats to biodiversity and human health. Interestingly, the techniques we use to train computers rely on the same techniques used by people to identify objects.

Common Strategies for Detecting Objects

When people look at a photograph, they find it quite easy to identify shapes, features, and objects based on a combination of previous experience and context clues in the image itself. When a computer program is asked to describe a picture, it relies on the same two strategies. In the image above, both humans and computers attempting to extract meaning and identify object boundaries would use similar visual cues:

  • Colors (the bedrock is red)
  • Shapes (the balloon is oval)
  • Lines (the concrete has seams)
  • Textures (the balloon is smooth)
  • Sizes (the feet are smaller than the balloon)
  • Locations (the ground is at the bottom)
  • Adjacency (the feet are attached to legs)
  • Gradients (the bedrock has shadows)

While none of the observations in parentheses capture universal truths, they are useful heuristics: if you have enough of them, you can have some confidence that you’ve interpreted a given image correctly.

Pixel Mask

If our objective is to make a computer program that can find the balloon in the picture above as well as a human can, then we first need to create a way to compare the performances of computers and humans. One solution is to task both a person and a computer to identify, or “segment,” all the pixels that are part of the balloon. If results from the computer agree with those from the person, then it is fair to say that the computer has found the balloon. The results are captured in an image called a “mask,” in which every pixel is either black (not balloon) or white (balloon), like the following image.

However, unlike humans, most computers don’t wander around and collect experiences on their own. Computers require datasets of explicitly annotated examples, called “training data,” to learn to identify and distinguish specific objects within data. The black and white mask above is one such example. After seeing enough examples of an object, a computer will have embedded some knowledge about what differentiates balloons from their surroundings.
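One common way to score the agreement between a human mask and a computer mask is intersection-over-union (IoU) on the object pixels. Here is a minimal sketch; the tiny 3×3 masks are made up for illustration.

```python
def mask_agreement(human, computer):
    """Intersection-over-union of the object ('balloon') pixels in two
    binary masks, each given as a list of 0/1 rows."""
    inter = union = 0
    for hrow, crow in zip(human, computer):
        for h, c in zip(hrow, crow):
            inter += h & c   # pixel is balloon in BOTH masks
            union += h | c   # pixel is balloon in EITHER mask
    return inter / union if union else 1.0

human =    [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
computer = [[0, 1, 1],
            [0, 1, 0],      # computer missed one balloon pixel
            [0, 0, 0]]
print(mask_agreement(human, computer))  # → 0.75
```

A score of 1.0 means the computer segmented the balloon exactly as the human did.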

Well Pad Discovery

At SkyTruth, we are starting our machine learning process with oil and gas well pads. Well pads are the base of operations for most active oil and gas drilling sites in the United States, and we are identifying them as a way to quantify the impact of these extractive industries on the natural environment and neighboring communities. Well pads vary greatly in how they appear. Just take a look at how different these three are from each other.

Given this diversity, we need to provide the computer many examples, so that the machine learning model we are creating can distinguish between important features that characterize well pads (e.g. having an access road) and unimportant ones that are allowed to vary (e.g. the shape of the well pad, or the color of its surroundings). Our team generates masks (the black and white pixel labels) for these images by hand, and inputs them as “training data” into the computer. We provide both the image and its mask separately to the machine learning model, but for the rest of this post we will superimpose the mask in blue.

Finally, our machine learning model looks at each image (about 120 of them), learns a little bit from the mask provided with it, and then moves on to the next image. After looking at each picture once, it has already reached 92% accuracy. But we can then tell it to go back and look at each one again (about 30 times), and add a little more detail to its learning, until it reaches almost 98% accuracy.
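Our actual model is a neural network, but the pass-and-repeat structure described above can be illustrated with a toy per-pixel classifier. Everything below — the single brightness feature, the logistic model, the learning rate — is a stand-in for illustration, not our real pipeline.

```python
import math
import random

random.seed(0)
# Toy "pixels" with a single brightness feature each: well-pad pixels
# are bright (0.6-1.0, label 1), background is dark (0.0-0.4, label 0).
data = [(random.uniform(0.6, 1.0), 1) for _ in range(60)]
data += [(random.uniform(0.0, 0.4), 0) for _ in range(60)]
random.shuffle(data)

# One "epoch" = one pass over every training example; as in the post,
# we repeat about 30 times, nudging the weights a little each pass.
w, b = 0.0, 0.0
for epoch in range(30):
    for x, y in data:
        p = 1 / (1 + math.exp(-(w * x + b)))  # predicted probability
        w += 0.5 * (y - p) * x                # stochastic gradient step
        b += 0.5 * (y - p)

# Accuracy on the training set; the classes are separable, so this
# should end up close to 1.0 after the repeated passes.
acc = sum((1 / (1 + math.exp(-(w * x + b))) > 0.5) == (y == 1)
          for x, y in data) / len(data)
print(round(acc, 2))
```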

After the model is trained, we can feed it raw satellite images and ask it to create a mask that identifies all the pixels belonging to any well pads in the picture. Here are some actual outputs from our trained machine learning model:

The top three images show well pads that were correctly identified, and fairly well masked — note the blue mask overlaying the well pads. The bottom three images do not contain well pads, and you can see that our model ignores forests, fields, and houses very well in the first two images, but is a little confused by parking lots — it has masked the parking lot in the third image in blue (incorrectly), as if it were a well pad. This is reasonable, as parking lots share many features with well pads — they are usually rectangular, gray, contain vehicles, and have an access road. This is not the end of the machine learning process; rather, it is a first pass that tells us we need to capture more images of parking lots and further train the model on those negative examples.

When working on image segmentation, there are a number of challenges that we need to mitigate. 

Biased Training Data

Predictions that the computer makes are based solely on training data, so it is possible for idiosyncrasies in the training data set to be encoded (unintentionally) as meaningful. For instance, imagine a model that detects a person’s happiness from a picture of their face. If it is only shown open-mouth smiles in the training data, then it is possible that when presented with real world images, it classifies closed-mouth smiles as unhappy.

This challenge often affects a model in unanticipated ways because those biases can be introduced unknowingly by the data scientist. We try to mitigate this by making sure that our training dataset comes from the same set of images as those that we need to be automatically classified. Two examples of how biased data might creep into our work are: training a machine learning model on well pads in Pennsylvania and then asking it to identify pads from California (bias in the data source), or training a model on well pads centered in the picture, and then asking it to identify them when halfway out of the image (bias in the data preprocessing).

Garbage In, Garbage Out

The predictions that the computer makes can only be as good as the samples that we provide in the training data. For instance, if the person responsible for training accidentally includes the string of a balloon in half of the images created for the training dataset and excludes it in the other half, then the model will be confused about whether or not to mask the string in its predictions. We try to mitigate this by adhering to strict explicit guidelines about what constitutes the boundary of a well pad.

Measuring Success

In most other machine learning systems, it is useful to measure success as a combination of two factors. First, was the guess right or wrong? And second, how confident was the guess? However, in image segmentation, that is not a great metric, because the model can be overwhelmed by an imbalance between the number of pixels in each class. For instance, imagine the task is to find a single ant on a large white wall. Out of 1000 white pixels, only 1 is gray. If your model searches long and hard and guesses that one pixel correctly, then it gets 100% accuracy. However, a much simpler model would say there is no ant, that every pixel is white wall, and get rewarded with 99.9% accuracy. This second model is practically unusable, but is very easy for a training algorithm to achieve.

We mitigate this issue by using a metric known as the F-beta score, which for our purposes keeps very small objects from being ignored in favor of very large ones. If you’re hungry for a more technical explanation of this metric, check out the Wikipedia page.
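The ant-on-a-wall example makes the contrast concrete. Below is a minimal sketch comparing pixel-wise accuracy with an F-beta score computed from precision and recall using the standard formula; the masks are made up for illustration.

```python
def f_beta(true_mask, pred_mask, beta=2.0):
    """F-beta on the object pixels: beta > 1 weights recall over
    precision, so a tiny object can't be ignored for free."""
    tp = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    fp = sum(1 for t, p in zip(true_mask, pred_mask) if not t and p)
    fn = sum(1 for t, p in zip(true_mask, pred_mask) if t and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# One ant pixel among 1000 wall pixels.
truth = [1] + [0] * 999
lazy = [0] * 1000         # "no ant anywhere"
sharp = [1] + [0] * 999   # finds the ant

accuracy = sum(t == p for t, p in zip(truth, lazy)) / 1000
print(accuracy)              # → 0.999 (the lazy model looks great)
print(f_beta(truth, lazy))   # → 0.0 (F-beta exposes it)
print(f_beta(truth, sharp))  # → 1.0
```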

Next Steps

In the coming weeks we will be creating an online environment in which our machine learning model can be installed and fed images with minimal human guidance. Our objective is to create two pipelines: the first allows training data to flow into the model, so it can learn. The second allows new images from satellites to flow into the model, so it can perform image segmentation and tell us the acreage dedicated to these well pads.

We’ll keep you posted as our machine learning technology progresses.

Update 2019-12-13:

In a major step forward, we set up Google SQL and Google Storage environments to house a larger database of training data, containing over 2,000 uniquely generated polygons that cover multiple states in the Colorado River Basin. The GeoJSON is publicly available for download. These data were used as fodder for a deep learning neural network, which was trained in this IPython notebook. We reached Dice accuracies up to 86.3%. The trained models were then used to run inference on sites that were permitted for drilling, to identify the extent of the well pads, in this second IPython notebook.