Using machine learning to map the footprint of fracking in central Appalachia

Fossil fuel production has left a lasting imprint on the landscapes and communities of central and northern Appalachia.  Mountaintop mining operations, pipeline rights-of-way, oil and gas well pads, and hydraulic fracturing wastewater retention ponds dot the landscapes of West Virginia and Pennsylvania.  And although advocacy groups have made progress pressuring regulated industries and state agencies for greater transparency, many communities in central and northern Appalachia are unaware of, or unclear about, the extent of human health risks that they face from exposure to these facilities.

A key challenge is the discrepancy that often exists between what is on paper and what is on the landscape.  It takes time, money, and staff (three rarities for state agencies always under pressure to do more with less) to map energy infrastructure, and to keep those records updated and accessible for the public.  But with advancements in deep learning, and with the increasing amount of satellite imagery available from governments and commercial providers, it might be possible to track the expansion of energy infrastructure—as well as the public health risks that accompany it—in near real-time.

Figure 1.  Oil and gas well pad locations, 2005 – 2015.

Mapping the footprint of oil and gas drilling, especially unconventional drilling or “fracking,” is a critical piece of SkyTruth’s work.  Since 2013, we’ve conducted collaborative image analysis projects called “FrackFinder” to fill the gaps in publicly available information about the location of fracking operations in the Marcellus and Utica Shale.  In the past, we relied on several hundred volunteers to identify and map oil and gas well pads throughout Ohio, Pennsylvania, and West Virginia.  But we’ve been working on a new approach: automating the detection of oil and gas well pads with machine learning.  Rather than train several hundred volunteers to identify well pads in satellite imagery, we developed a machine learning model that can be deployed across thousands of computers simultaneously.  Machine learning underpins many of the consumer technologies we use every day. It’s the technology that enables Netflix to recommend new shows that you might like, or that allows digital assistants like Google Assistant, Siri, or Alexa to understand requests like, “Hey Google, text Mom I’ll be there in 20 minutes.”

Examples are at the core of machine learning.  Rather than try to “hard code” all of the characteristics that define a modern well pad (they are generally square, generally gravel, and generally littered with industrial equipment), we teach computers what they look like by using examples.  Lots of examples. Like, thousands or even millions of them, if we can find them. It’s just like with humans: the more examples of something that you see, the easier it is to recognize that thing later. So, where did we get a few thousand images of well pads in Pennsylvania?  

We started with SkyTruth’s Pennsylvania oil and gas well pad dataset. The dataset contains well pad locations identified in National Agriculture Imagery Program (NAIP) aerial imagery from 2005, 2008, 2010, 2013, and 2015 (Figure 1).  We uploaded this dataset to Google Earth Engine, and used it to create a collection of 10,000 aerial images in two classes: “well pad” and “non-well pad.” We created the training images by buffering each well pad by 100 meters, clipping the NAIP imagery to the bounding box of that buffer, and exporting each image.
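To make the buffering step concrete, here is a minimal sketch of how a 100-meter buffer turns into a bounding box. It is not our Earth Engine workflow: it is plain Python, it assumes each well pad is stored as a single latitude/longitude point, and it treats the Earth as a sphere. The function name `buffered_bbox` and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def buffered_bbox(lat, lon, buffer_m=100.0):
    """Return (west, south, east, north) in degrees for a box that extends
    buffer_m meters from the point in each cardinal direction."""
    # Meters -> degrees of latitude (constant everywhere on a sphere).
    dlat = math.degrees(buffer_m / EARTH_RADIUS_M)
    # Meters -> degrees of longitude (shrinks with the cosine of latitude).
    dlon = math.degrees(buffer_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


# Hypothetical well pad location in north-central Pennsylvania.
west, south, east, north = buffered_bbox(41.24, -77.00)
print(west, south, east, north)
```

The box is about 200 meters on a side, which is the footprint each clipped training image would cover under these assumptions.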

The images above show three training examples from our “well pad” class. The images below show three training examples taken from our “non-well pad” class.

We divided the dataset into three subsets: a training set with 4,000 images of each class, a validation set with 500 images of each class, and a test set with 500 images of each class.  We combined this work in Google Earth Engine with Google’s powerful TensorFlow deep learning library.  We used our 8,000 training images (4,000 from each class, remember) and TensorFlow’s high-level Keras API to train our machine learning model.  So what, exactly, does that mean? Well, basically, it means that we showed the model thousands and thousands of examples of what well pads are (i.e., images from our “well pad” class) and what well pads aren’t (i.e., images from our “non-well pad” class).  We trained the model for twenty epochs, meaning that we showed the model the entire training set (8,000 images, remember) twenty times.  So, basically, the model saw 160,000 examples, and over time, it “learned” what well pads look like.
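The split and the epoch arithmetic above can be sketched in a few lines of Python. This is an illustration only: the file names are placeholders, and the helper `split_class`, the shuffling, and the seed are assumptions rather than a description of how we actually partitioned the images.

```python
import random


def split_class(paths, n_train=4000, n_val=500, n_test=500, seed=42):
    """Shuffle one class's image paths and cut them into three disjoint subsets."""
    if len(paths) < n_train + n_val + n_test:
        raise ValueError("not enough images for the requested split")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


# Placeholder file names standing in for the 5,000 exported images per class.
well_pads = [f"well_pad_{i:04d}.png" for i in range(5000)]
non_well_pads = [f"non_well_pad_{i:04d}.png" for i in range(5000)]

pad_train, pad_val, pad_test = split_class(well_pads)
non_train, non_val, non_test = split_class(non_well_pads)

train_size = len(pad_train) + len(non_train)  # 4,000 + 4,000 training images
epochs = 20
images_shown = train_size * epochs  # one pass over the training set per epoch
print(train_size, images_shown)
```

Note that the model sees the same 8,000 images in every epoch, so the 160,000 figure counts presentations, not unique images.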

Our best model run returned an accuracy of 84%, precision and recall measures of 87% and 81%, respectively, and a false positive rate and false negative rate of 11.6% and 19.3%, respectively.  We’ve been pleased with our initial model runs, but there is plenty of room for improvement. We started with the VGG16 model architecture that comes prepackaged with Keras (Simonyan and Zisserman 2014, Chollet 2018).  The VGG16 model architecture is no longer state-of-the-art, but it is easy to understand, and it was a great place to begin.
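For readers unfamiliar with these measures, the sketch below shows how each one is derived from the four cells of a confusion matrix. The counts are hypothetical: they assume a 1,000-image test set (500 per class) and were chosen only to land close to the figures reported above. They are not our actual confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,            # share of all images labeled correctly
        "precision": tp / (tp + fp),              # share of predicted well pads that are real
        "recall": tp / (tp + fn),                 # share of real well pads that were found
        "false_positive_rate": fp / (fp + tn),    # share of non-well pads flagged as well pads
        "false_negative_rate": fn / (fn + tp),    # share of real well pads that were missed
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=404, fp=58, tn=442, fn=96)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall and the false negative rate always sum to one, which is why a recall of roughly 81% goes hand in hand with a false negative rate of roughly 19%.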

After training, we ran the model on a few NAIP images to compare its performance against well pads collected by SkyTruth volunteers for our 2015 Pennsylvania FrackFinder project.  Figures 4 and 6 depict the model’s performance on two NAIP images near Williamsport, PA. White bounding boxes indicate landscape features that the model predicted to be well pads.  Figures 5 and 7 depict those same images with well pads (shown in red) delineated by SkyTruth volunteers.

Figure 4.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 5.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.
Figure 6.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 7.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.

One of the first things that stood out to us was that our model is overly sensitive to strong linear features.  In nearly every training example, there is a clearly defined access road that connects to the well pad. As a result, the model regularly classified large patches of cleared land or isolated developments (e.g., warehouses) at the end of a linear feature as a well pad.  Another major weakness is that our model is also overly sensitive to active well pads.  Active well pads tend to be large, gravel squares with clearly defined edges. Although these well pads may be the biggest concern, there are many “reclaimed” and abandoned well pads that lack such clearly defined edges.  Regrettably, our model is overfit to highly visible active well pads, and it performs poorly on lower-visibility drilling sites that have lost their square shape or that have been revegetated by grasses.

Nevertheless, we think this is a good start.  Despite a number of false detections, our model was able to detect all of the well pads previously identified by volunteers in Figures 5 and 7 above.  In several instances, false detections consisted of energy infrastructure that, although not active well pads, remains of high interest to environmental and public health advocates as well as state regulators: abandoned well pads, wastewater impoundments, and recent land clearings.  NAIP imagery is only collected every two or three years, depending on funding. So, tracking the expansion of oil and gas drilling activities in near real-time will require access to a high-resolution, near real-time imagery stream (like Planet, for instance).  For now, we’re experimenting with more current model architectures and with reconfiguring the model for semantic segmentation: extracting polygons that delineate the boundaries of well pads, which can be analyzed in mapping software by researchers and our partners working on the ground.

Keep checking back for updates.  We’ll be posting the training data that we created, along with our initial models, as soon as we can.

Monitoring the tailings dam failure of the Córrego do Feijão mine

On Friday, January 25th, the tailings dam at the Córrego do Feijão mine burst near Brumadinho, State of Minas Gerais, Brazil (the moment of failure was captured on video). The mine is operated by Brazilian mining company Vale S.A., and the incident recalls the 2015 collapse at Vale’s Samarco Mine, which unleashed 62 million cubic meters of toxic sludge downstream. As of Monday, the death toll had reached 120; however, the full extent of the damage is unknown. To monitor the impact, here are Sentinel-2 scenes of Córrego do Feijão from eighteen days before and seven days after the dam’s failure. As of February 2nd, sludge covers approximately 2.85 km² of the surrounding area.

Sentinel-2 scene showing the extent of flooding caused by the tailings dam failure, which spilled 3 billion gallons of mining waste.

The slider below shows the area near the town of Brumadinho before and after the dam failure, with the inundation highlighted in yellow. It can be accessed here.

Taylor Energy Oil Spill: This Is How Change Happens

Recently a front-page article ran in The Washington Post, describing the ongoing, 14-year-long leak of crude oil from hurricane-damaged wells at the former location of an oil platform in the Gulf of Mexico, operated by a company called Taylor Energy.  The article stated that, based on the latest scientific estimates of the leak rate, the Taylor spill was about to surpass BP’s disastrous 2010 blowout in the Gulf, becoming the world’s worst oil spill.  News outlets around the world pounced on this headline, shining a global spotlight on this egregious chronic leak. Within weeks the US Coast Guard announced they had finally ordered Taylor Energy to fix the leak or face a daily $40,000 fine.  The team at SkyTruth was thrilled when we heard the news: when Taylor finally fixes the leak, this will be a great result for the environment in the Gulf and will send a strong message to the offshore oil industry that we won’t let them walk away from their messes.  And this is the vindication of eight years of persistent, dogged work by SkyTruth and our partners.

Source: The Washington Post, October 21, 2018

How did we achieve this significant victory for the environment and the people of the Gulf Coast?  We….  

  • Built partnerships.  We teamed up with SouthWings and Waterkeeper Alliance to form the Gulf Monitoring Consortium.  Gulf-area citizens’ groups, notably the Louisiana Environmental Action Network, Louisiana Bucket Brigade, and Gulf Restoration Network, soon joined, giving us the ability to monitor, investigate, and systematically document the Taylor spill from space, from small aircraft, and on the water.  Alerted by our work, researchers from Florida State University conducted their own independent sampling and measurements, bringing a higher level of scientific expertise to the growing public scrutiny of this continuous pollution event.  
  • Worked with journalists to help them understand the significance of this unchecked spill.  Our methodical, transparent, and conservative analysis helped us build a reputation as a trustworthy source of information.  We developed long-running relationships with journalists, particularly Mike Kunzelman at The Associated Press.  Reporters reached out for our comments and expert insights whenever new information or developments in the Taylor saga came to light.  These relationships resulted in dozens of articles in major media markets over the years, helping to maintain public attention and interest, and a steady drumbeat of public criticism.

And finally, an hour-long interview with Washington Post reporter Darryl Fears resulted in an article that triggered Coast Guard action.  Now, of course, we will continue to monitor the Taylor Energy leak to ensure that effective action is taken.  And we’ll let the world know what we see.

This is what it takes to make positive change happen for the environment.  We’d like to thank the foundations and individuals who have donated to SkyTruth, making it possible for us to dedicate the time and resources to sustaining this watchdog effort over so many years.  We couldn’t have done it without you.

Please help us keep it going.  Donate to SkyTruth today!

Big Changes Coming to SkyTruth Alerts

For the last year or so we’ve been working to revamp SkyTruth Alerts, an app we built for ourselves in 2011, and then opened up to the public a year later. Alerts lets you see environmental incidents and notifications on a map as they are reported, and lets subscribers sign up to receive email notifications about reported environmental incidents in areas they care about (aka “Areas of Interest” or AOIs). Technology has made a few leaps since then, so it was time for an overhaul. We’ve added some new features too. We’re excited about the changes, and we hope you will be too.

What’s Changed?

In a lot of ways, the new SkyTruth Alerts app works the way it always has: anyone can view the map and see the latest reported alerts for a particular area. These notifications come from federal and state websites that we have “scraped” to obtain the reports. The largest source of data is the nationwide oil and hazardous materials spill reports collected by the National Response Center (i.e., NRC Reports). Anyone can sign up to receive email notifications about incidents in their AOIs.

New and Restored Sources of Alerts

As part of the Alerts revamp, we’ve restored and/or added the following sources of alerts:

  • West Virginia oil & gas drilling permits – restored
  • Colorado oil & gas drilling permits – new
  • Florida Pollution Reports – new

Account Management

We’ve also added account management so you can update your AOIs or change your email address more easily. Signing up for an account will also let you take advantage of new tools we’ll be rolling out in the coming months (we’ll keep you posted after the launch).

New Look!

This is what SkyTruth Alerts looks like right now:

This is what you’ll see when you first open the revamped SkyTruth Alerts (subject to some possible changes over the next few weeks, as we respond to feedback and suggestions from our alpha testers):

New Features!

Here are a few of the new features we’ve added:

New Ways to Create AOIs

Right now, SkyTruth Alerts lets you create AOIs that are a square or rectangle, but that’s not always the ideal shape. In the new version, we’ve added some tools that give you a little more control: you can draw a polygon, take a “snapshot” of your current map view, or select a state or county boundary from a list of pre-defined AOIs:

After you create an AOI, you can edit your AOI (or delete it and start again) before giving it a name and saving it to your My Areas list.

Filter Out the Noise

We’ve added several ways to filter what you see in Alerts, so that you can focus on what’s important to you. SkyTruth Alerts shows you the 100 most-recent incidents in your map view (double what’s shown in the current version), so filtering has the added benefit of showing you more of the types of alerts you want to see. You can filter alerts by:

Date Range

Type of Alert

Base Layers

Select from a couple of different map backdrops (“base layers”) so you can focus on what’s important to you. Below is a screenshot of SkyTruth Alerts using the “Minimal” base layer.

Alerts Within AOI Only

This can be useful if there are a lot of alerts in the surrounding area that are more recent than the alerts within your AOI.  

In the image below, only a few alerts are shown in the West Virginia AOI:

However, once alerts are limited to within the AOI, the picture changes:
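Why the AOI filter changes the picture can be sketched in a few lines of Python. This is not the Alerts code: the alert records, the rectangular AOI, and the helper names `most_recent` and `in_aoi` are all made up for illustration. The only detail taken from the app is the cap of 100 most-recent incidents per map view.

```python
from datetime import date, timedelta

MAX_ALERTS = 100  # the map shows at most the 100 most-recent incidents


def most_recent(alerts, limit=MAX_ALERTS):
    """Return the newest alerts, up to the display limit."""
    return sorted(alerts, key=lambda a: a["date"], reverse=True)[:limit]


def in_aoi(alert, aoi):
    """True if the alert falls inside a (west, south, east, north) rectangle."""
    west, south, east, north = aoi
    return west <= alert["lon"] <= east and south <= alert["lat"] <= north


# Made-up data: 150 recent alerts outside the AOI, 40 older alerts inside it.
aoi = (-81.0, 38.0, -80.0, 39.0)
today = date(2018, 11, 1)
outside = [{"lat": 40.0, "lon": -79.0, "date": today - timedelta(days=d)}
           for d in range(150)]
inside = [{"lat": 38.5, "lon": -80.5, "date": today - timedelta(days=200 + d)}
          for d in range(40)]
alerts = outside + inside

# Without the filter, newer alerts outside the AOI use up the whole limit.
unfiltered = most_recent(alerts)
aoi_alerts_without_filter = sum(in_aoi(a, aoi) for a in unfiltered)

# With the filter, the limit applies only to alerts inside the AOI.
filtered = most_recent([a for a in alerts if in_aoi(a, aoi)])

print(aoi_alerts_without_filter, len(filtered))  # prints "0 40"
```

In this toy case none of the AOI's alerts make the cut until the filter is applied, at which point all 40 appear.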

New Ways to View Alerts Markers

In some places, there are many, many alerts in the same location. This can make it hard to move around the map because you end up clicking or tapping an alert marker when you wanted to move around in the map view. It can also be hard to “grab” a particular alert marker from a stack of them.

In the first screenshot, clustering is turned on, making it easier to move around the map. Click on a cluster marker to zoom in on that area and see more alerts.

Once you’ve zeroed in on your area of interest, it can still be hard to see the forest for the trees in some locations. In the next image, clustering is turned off. You can see that there are a lot of alerts in this area, but how many exactly?

The image below shows the same area as the one above. If you click on one of the alert markers, it will “explode,” showing you all of the alerts in that location. Click on any of the exploded markers to view the report.

So when’s the Launch?

We’re getting ready to begin alpha testing in mid-November. Lots of our current subscribers have volunteered to be testers and will be helping us put the finishing touches on the app. A big thank you to all of you who are helping us with that! Testing will last four weeks, and during that time we’ll be making continuous updates based on feedback we receive. We expect to go live with the new version by the end of the year.

SkyTruth Founder John Amos to Speak in Shepherdstown

Did you see the recent front page article in the Washington Post that featured SkyTruth’s work tracking a 14-year oil spill in the Gulf of Mexico? Why is Shepherdstown’s own SkyTruth featured so often in the international press? Come find out!

Founder John Amos and key staff members Christian Thomas and Ry Covington will talk about our work tracking pollution, mapping flaring and fracking, revealing the true scope of devastation from mountaintop mining and illuminating commercial overfishing. There’ll be a Q&A session followed by some light refreshments. For those of you who know us already, please join us with a friend you think might be interested to learn more about this West Virginia-based nonprofit that has global impacts.

SkyTruth: sharing the view from space to inspire people to protect the environment. If you can see it, you can change it!

Event details:

Tuesday, November 13th, 2018, 5:30pm-7pm
Shepherd University
Robert C. Byrd Center for Congressional History and Education
213 N King Street
Shepherdstown, West Virginia 25443