Using machine learning to map the footprint of fracking in central Appalachia

Fossil fuel production has left a lasting imprint on the landscapes and communities of central and northern Appalachia.  Mountaintop mining operations, pipeline rights-of-way, oil and gas well pads, and hydraulic fracturing wastewater retention ponds dot the landscapes of West Virginia and Pennsylvania.  And although advocacy groups have made progress pressuring regulated industries and state agencies for greater transparency, many communities in central and northern Appalachia are unaware of, or unclear about, the extent of the human health risks they face from exposure to these facilities.

A key challenge is the discrepancy that often exists between what is on paper and what is on the landscape.  It takes time, money, and staff (three rarities for state agencies always under pressure to do more with less) to map energy infrastructure, and to keep those records updated and accessible for the public.  But with advancements in deep learning, and with the increasing amount of satellite imagery available from governments and commercial providers, it might be possible to track the expansion of energy infrastructure—as well as the public health risks that accompany it—in near real-time.

Figure 1.  Oil and gas well pad locations, 2005 – 2015.

Mapping the footprint of oil and gas drilling, especially unconventional drilling or “fracking,” is a critical piece of SkyTruth’s work.  Since 2013, we’ve conducted collaborative image analysis projects called “FrackFinder” to fill the gaps in publicly available information about the location of fracking operations in the Marcellus and Utica Shale.  In the past, we relied on several hundred volunteers to identify and map oil and gas well pads throughout Ohio, Pennsylvania, and West Virginia.  But we’ve been working on a new approach: automating the detection of oil and gas well pads with machine learning.  Rather than train several hundred volunteers to identify well pads in satellite imagery, we developed a machine learning model that could be deployed across thousands of computers simultaneously.  Machine learning already powers many of the technologies people use every day.  It’s the technology that enables Netflix to recommend new shows that you might like, or that allows digital assistants like Google, Siri, or Alexa to understand requests like, “Hey Google, text Mom I’ll be there in 20 minutes.”

Examples are at the core of machine learning.  Rather than try to “hard code” all of the characteristics that define a modern well pad (they are generally square, generally gravel, and generally littered with industrial equipment), we teach computers what they look like by using examples.  Lots of examples. Like, thousands or even millions of them, if we can find them. It’s just like with humans: the more examples of something that you see, the easier it is to recognize that thing later. So, where did we get a few thousand images of well pads in Pennsylvania?  

We started with SkyTruth’s Pennsylvania oil and gas well pad dataset. The dataset contains well pad locations identified in National Agriculture Imagery Program (NAIP) aerial imagery from 2005, 2008, 2010, 2013, and 2015 (Figure 1).  We uploaded this dataset to Google Earth Engine and used it to create a collection of 10,000 aerial images in two classes: “well pad” and “non-well pad.” We created the training images by buffering each well pad by 100 meters, clipping the NAIP imagery to the buffer’s bounding box, and exporting each image.
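The chip-extraction step is simple geometry: a 100-meter buffer around a point has a square bounding box 200 meters on a side. In Earth Engine this would typically be done with geometry methods such as `buffer()` and `bounds()`; the sketch below reproduces only the arithmetic in plain Python so it runs without an Earth Engine account. It assumes well pad coordinates are in a projected coordinate system measured in meters (such as UTM) and a north-up raster with 1-meter pixels; the function names are hypothetical, not SkyTruth’s code.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def chip_bbox(x: float, y: float, buffer_m: float = 100.0) -> BBox:
    """Bounding box of a circular buffer of radius buffer_m around a well pad.

    Assumes (x, y) are in a projected CRS measured in meters (e.g. UTM),
    so the resulting square is 2 * buffer_m on a side.
    """
    if buffer_m <= 0:
        raise ValueError("buffer_m must be positive")
    return (x - buffer_m, y - buffer_m, x + buffer_m, y + buffer_m)


def bbox_to_pixel_window(bbox: BBox, origin_x: float, origin_y: float,
                         pixel_m: float = 1.0) -> Tuple[int, int, int, int]:
    """Turn a bbox into (col_offset, row_offset, width, height) for a
    north-up raster whose upper-left corner is (origin_x, origin_y)."""
    min_x, min_y, max_x, max_y = bbox
    col_off = round((min_x - origin_x) / pixel_m)
    row_off = round((origin_y - max_y) / pixel_m)  # rows count downward
    width = round((max_x - min_x) / pixel_m)
    height = round((max_y - min_y) / pixel_m)
    return col_off, row_off, width, height
```

With 1-meter pixels every chip comes out to 200 × 200 pixels; if the imagery has a different pixel size, pass it as `pixel_m`.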

The images above show three training examples from our “well pad” class. The images below show three training examples taken from our “non-well pad” class.

We divided the dataset into three subsets: a training set with 4,000 images of each class, a validation set with 500 images of each class, and a test set with 500 images of each class.  We combined this work in Google Earth Engine with Google’s powerful TensorFlow deep learning library.  We used our 8,000 training images (4,000 from each class, remember) and TensorFlow’s high-level Keras API to train our machine learning model.  So what, exactly, does that mean? Well, basically, it means that we showed the model thousands and thousands of examples of what well pads are (i.e., images from our “well pad” class) and what well pads aren’t (i.e., images from our “non-well pad” class).  We trained the model for twenty epochs, meaning that we showed the model the entire training set (8,000 images, remember) twenty times.  So, basically, the model saw 160,000 examples, and over time, it “learned” what well pads look like.
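The split and the epoch arithmetic can be sketched in a few lines of Python. This is a minimal, hypothetical example rather than SkyTruth’s pipeline: it assumes each class is a list of image IDs with at least 5,000 entries and uses a fixed random seed so the split is reproducible.

```python
import random
from typing import List, Sequence, Tuple


def split_class(items: Sequence[str], n_train: int = 4000, n_val: int = 500,
                n_test: int = 500, seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle one class's image IDs and cut them into train/val/test."""
    needed = n_train + n_val + n_test
    if len(items) < needed:
        raise ValueError(f"need {needed} items, got {len(items)}")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:needed]
    return train, val, test


def images_seen(n_train_total: int, epochs: int) -> int:
    """Total training examples shown to the model across all epochs."""
    return n_train_total * epochs
```

Splitting each class separately keeps every subset balanced (half well pads, half non-well pads), and 8,000 training images × 20 epochs gives the 160,000 examples mentioned above.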

Our best model run returned an accuracy of 84%, precision and recall measures of 87% and 81%, respectively, and a false positive rate and false negative rate of 0.116 and 0.193, respectively.  We’ve been pleased with our initial model runs, but there is plenty of room for improvement. We started with the VGG16 model architecture that comes prepackaged with Keras (Simonyan and Zisserman 2014, Chollet 2018).  The VGG16 model architecture is no longer state-of-the-art, but it is easy to understand, and it was a great place to begin.  
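All five numbers come from the same 2 × 2 confusion matrix, which is why they move together: recall and the false negative rate always sum to 1 (81% recall and a 0.193 false negative rate, allowing for rounding). The sketch below computes each metric from the four counts. The counts in the usage note are hypothetical, picked for a balanced test set of 500 images per class so that they land near the reported figures; they are not the model’s actual confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # well pads correctly flagged as well pads
    fp: int  # non-well pads wrongly flagged as well pads
    tn: int  # non-well pads correctly rejected
    fn: int  # well pads the model missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        # Of everything flagged as a well pad, how much really was one?
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of all real well pads, how many were found?
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)  # equals 1 - recall
```

For example, `Confusion(tp=404, fp=58, tn=442, fn=96)` gives an accuracy of 0.846, a precision of about 0.874, a recall of 0.808, a false positive rate of 0.116, and a false negative rate of 0.192.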

After training, we ran the model on a few NAIP images to compare its performance against well pads collected by SkyTruth volunteers for our 2015 Pennsylvania FrackFinder project.  Figures 4 and 6 depict the model’s performance on two NAIP images near Williamsport, PA. White bounding boxes indicate landscape features that the model predicted to be well pads.  Figures 5 and 7 depict those same images with well pads (shown in red) delineated by SkyTruth volunteers.

Figure 4.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 5.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.
Figure 6.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 7.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.

One of the first things that stood out to us was that our model is overly sensitive to strong linear features.  In nearly every training example, a clearly defined access road connects to the well pad. As a result, the model regularly classified large patches of cleared land or isolated developments (e.g., warehouses) at the end of a linear feature as well pads.  Another major weakness is that our model is overly sensitive to active well pads.  Active well pads tend to be large, gravel squares with clearly defined edges. Although these well pads may be the biggest concern, there are many “reclaimed” and abandoned well pads that lack such clearly defined edges.  Regrettably, our model is overfit to highly visible active well pads, and it performs poorly on lower-visibility drilling sites that have lost their square shape or that have been revegetated by grasses.

Nevertheless, we think this is a good start.  Despite a number of false detections, our model detected all of the well pads previously identified by volunteers in Figures 5 and 7 above.  In several instances, the false detections were features that, although not active well pads, remain of high interest to environmental and public health advocates as well as state regulators: abandoned well pads, wastewater impoundments, and recent land clearings.  NAIP imagery is collected only every two or three years, depending on funding, so tracking the expansion of oil and gas drilling in near real-time will require access to a high-resolution, near real-time imagery stream (such as Planet’s).  For now, we’re experimenting with more current model architectures and with reconfiguring the model for semantic segmentation: extracting polygons that delineate the boundaries of well pads, which researchers and our partners working on the ground can analyze in mapping software.

Keep checking back for updates.  We’ll be posting the training data that we created, along with our initial models, as soon as we can.

Mapping potential “drill out” scenarios in Allegheny County, Pennsylvania

SkyTruth has just launched its first Google Earth Engine app, detailing potential natural gas drilling scenarios in Allegheny County, Pennsylvania.  If you’re interested, you can view the app here.

Hydraulic fracturing (fracking) has unlocked natural gas resources from formations like the Utica Shale and Marcellus Shale, resulting in an explosion of gas-drilling activity across the Mid-Atlantic states. One of the states sitting above this hot commodity is Pennsylvania; the state boasts a massive reserve of nearly 89.5 trillion cubic feet of dry natural gas, according to the US Energy Information Administration.  In the thick of it all, Allegheny County, in the southwestern portion of the state, is one of the few counties where drilling activity has been relatively light. The county’s main defense against well drilling has been zoning regulations that require a “setback” between unconventional natural gas drilling sites and “occupied buildings.”  At present, the minimum distance required between a well pad and a building is 500 feet (unless the building’s owner has given consent). However, this distance may not adequately protect human health, especially in communities surrounded by drilling. Municipal officials might want to consider alternative setbacks based on the latest scientific research on the impacts of drilling on the health of nearby residents.  This analysis evaluates a range of setback scenarios and illustrates the likely drilling density and distribution of drilling sites across the county for each scenario.

To better understand the potential impact of drilling in Allegheny County, I analyzed several different “drill out” scenarios (Figure 1).  I developed our first Google Earth Engine app to give users a glimpse of how different setback distances and different well spacing intervals might impact the number of homes at risk from drilling impacts in the future.  Check out the analysis here.

Figure 1. A screenshot of the app when first launched.

To begin this analysis, I downloaded building footprint data for Allegheny County from the Pennsylvania Geospatial Data Clearinghouse.  Next, I downloaded shapefiles representing the centerlines of major rivers passing through the county, other hydrological features in Allegheny County, and county-owned roads from the Allegheny County GIS Open Data Portal. I also downloaded a TIGER shapefile representing Allegheny County’s major highways as of 2014, courtesy of the US Census Bureau. Setback distances of 500 feet, 1,000 feet, 1,500 feet, and 2,500 feet were used to buffer the center points of “occupied buildings” in the county. I selected the minimum and maximum setback distances based on the current Pennsylvania setback law (500 ft.) and a recently proposed and defeated setback distance from Colorado (2,500 ft.). The Colorado measure, had it passed, would have been the most restrictive regulation on fracking of any state.  The 1,000- and 1,500-foot setbacks serve as intermediate steps between these two extremes of zoning regulation. I also created buffers around rivers, streams, and roads: a 300-foot buffer around the centerlines of all rivers and streams in the county (based on current regulations), a 328-foot buffer around all major highways, and a 40-foot buffer around all county roads. These three buffer zones remained constant throughout the project.

After applying these buffer distances to rivers, roads, and buildings, I calculated how many acres of Allegheny County were potentially open to drilling.  Using the currently required distance of 500 feet, approximately 53,000 acres are potentially available for drilling in Allegheny County, PA (see Figure 2).
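The buffering step can be approximated with a raster: divide the county into square cells and keep only the cells whose centers fall outside every exclusion zone. The analysis itself was done in Google Earth Engine; the plain-Python sketch below is a hypothetical stand-in for illustration only. It assumes all coordinates are in a projected system measured in feet, treats each building as its center point (as the post does), and represents rivers and roads as line segments paired with their buffer widths (300 ft for rivers and streams, 328 ft for major highways, 40 ft for county roads). Clipping to the county boundary is omitted for brevity.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
SQ_FT_PER_ACRE = 43_560  # square feet in one acre


def dist_point_segment(p: Point, seg: Segment) -> float:
    """Shortest distance from point p to a line segment (same units as input)."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: treat it as a point
        return math.dist(p, (x1, y1))
    # Project p onto the segment, clamping to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))


def available_cells(extent: Tuple[float, float, float, float], cell_ft: float,
                    buildings: Iterable[Point], setback_ft: float,
                    buffered_lines: Iterable[Tuple[Segment, float]]) -> List[Point]:
    """Return centers of grid cells lying outside every exclusion buffer.

    buildings: building center points, each excluded within setback_ft.
    buffered_lines: (segment, buffer_ft) pairs for rivers, highways, roads.
    """
    bldgs = list(buildings)
    lines = list(buffered_lines)
    min_x, min_y, max_x, max_y = extent
    n_cols = int((max_x - min_x) // cell_ft)
    n_rows = int((max_y - min_y) // cell_ft)
    open_cells: List[Point] = []
    for r in range(n_rows):
        for c in range(n_cols):
            center = (min_x + (c + 0.5) * cell_ft, min_y + (r + 0.5) * cell_ft)
            if any(math.dist(center, b) < setback_ft for b in bldgs):
                continue  # inside a building setback
            if any(dist_point_segment(center, seg) < buf for seg, buf in lines):
                continue  # inside a river or road buffer
            open_cells.append(center)
    return open_cells


def cells_to_acres(n_cells: int, cell_ft: float) -> float:
    """Convert a count of square cells into acres."""
    return n_cells * cell_ft * cell_ft / SQ_FT_PER_ACRE
```

A brute-force check like this is fine for a toy grid; for hundreds of thousands of buildings across a whole county, a spatial index or a GIS buffer-and-erase operation would be the practical choice.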

Figure 2: Screenshot from the app showing the available drilling area in Allegheny County (shown in grey) when considering the 500-foot setback from occupied structures.  Current well pad locations are denoted by red points on the map.

Using the setback distances identified above (500, 1,000, 1,500, and 2,500 feet), I wanted to visualize what different potential “drill out” scenarios might look like.  To do that, I had to decide how much space to leave between potential well sites. I spaced the potential drilling sites according to three different intervals: 40 acres per well, 80 acres per well, and 640 acres per well.  Combining the different setback distances with the different spacing intervals allowed me to investigate the range of possible “drill out” in Allegheny County.  I calculated the number of new drilling sites that each “drill out” scenario could potentially support. I’ve summarized the results below:

Setback | 40-acre well spacing | 80-acre well spacing | 640-acre well spacing
500 ft. | 928 | 465 | 52
1,000 ft. | 257 | 156 | 14
1,500 ft. | 84 | 48 | 8
2,500 ft. | 18 | 10 | 3
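Each spacing interval implies a minimum distance between sites. If each well is allotted a square of the given area, the square’s side is the square root of that area: 40 acres (1,742,400 sq ft) gives 1,320 ft, 80 acres about 1,867 ft, and 640 acres (one square mile) 5,280 ft. The post does not say exactly how candidate sites were placed, so the sketch below uses one simple, hypothetical approach: greedily accept candidate points (for example, the centers of open cells left after buffering) that are at least one side length away from every site already chosen.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
SQ_FT_PER_ACRE = 43_560


def spacing_side_ft(acres_per_well: float) -> float:
    """Side length (ft) of a square that holds acres_per_well acres."""
    return math.sqrt(acres_per_well * SQ_FT_PER_ACRE)


def place_sites(candidates: Sequence[Point], acres_per_well: float) -> List[Point]:
    """Greedily keep candidates that are at least one spacing side apart.

    Candidates are assumed to be in feet and already inside the area left
    open after the setback and river/road buffers were removed.
    """
    min_sep = spacing_side_ft(acres_per_well)
    sites: List[Point] = []
    for p in candidates:
        if all(math.dist(p, s) >= min_sep for s in sites):
            sites.append(p)
    return sites
```

Because a greedy pass depends on the order of the candidates, different orderings can yield slightly different counts; laying out a regular lattice at the same pitch is a deterministic alternative.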

So, for example, a setback distance of 500 feet coupled with a spacing of 40 acres between well pads would allow for 928 new potential drilling locations.  Since developing a well pad takes roughly 3 to 5 acres, this suggests that about 2,800 to 4,600 acres of land in Allegheny County (928 × 3 = 2,784 acres to 928 × 5 = 4,640 acres) could be subjected to surface well development.
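The footprint estimate is a simple multiplication, so it extends easily to every scenario in the table. The sketch below assumes, as the post does, that each potential site becomes one well pad of 3 to 5 acres; the site counts are copied from the table above, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Potential new drilling sites, copied from the table above:
# (setback in feet, acres per well) -> number of sites.
SITES: Dict[Tuple[int, int], int] = {
    (500, 40): 928, (500, 80): 465, (500, 640): 52,
    (1000, 40): 257, (1000, 80): 156, (1000, 640): 14,
    (1500, 40): 84, (1500, 80): 48, (1500, 640): 8,
    (2500, 40): 18, (2500, 80): 10, (2500, 640): 3,
}


def pad_footprint_acres(n_sites: int, low_acres: float = 3.0,
                        high_acres: float = 5.0) -> Tuple[float, float]:
    """Low/high estimate of land cleared if each site gets one 3-5 acre pad."""
    return n_sites * low_acres, n_sites * high_acres


if __name__ == "__main__":
    for (setback, spacing), n in SITES.items():
        low, high = pad_footprint_acres(n)
        print(f"{setback:>5} ft setback, {spacing:>3}-acre spacing: "
              f"{n:>4} sites, {low:,.0f} to {high:,.0f} acres")
```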

For each “drill out” scenario, I mapped the potentially supported well sites and placed a two-mile buffer around each point to simulate the potential zone of adverse health impacts (see Figure 3).  I used the buffered points to calculate the number of “occupied structures” that would be at risk of exposure if those drilling sites were built. The number of occupied structures at risk under each scenario is summarized in the table below:

Setback | 40-acre well spacing | 80-acre well spacing | 640-acre well spacing
500 ft. | 446,901 | 380,284 | 194,053
1,000 ft. | 222,481 | 215,415 | 43,256
1,500 ft. | 90,046 | 60,919 | 26,722
2,500 ft. | 4,816 | 4,524 | 3,626

Figure 3. Screenshot from the app showing potential drill-out locations (shown in yellow), considering a 500-ft setback from occupied structures and a separation between potential drilling operations of 40 acres. Notice the area of the county potentially subjected to adverse health consequences considering a two-mile buffer zone (shown in black) around each of these locations.
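Because two-mile buffers around neighboring sites overlap heavily, a structure near several sites should presumably be counted once, not once per site. The post does not spell out how overlaps were handled, so the sketch below rests on that assumption: it counts each structure lying within two miles (10,560 ft) of at least one site, with all coordinates in feet. The function name is hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]
FT_PER_MILE = 5_280


def structures_at_risk(structures: Iterable[Point], sites: Sequence[Point],
                       radius_ft: float = 2 * FT_PER_MILE) -> int:
    """Count structures within radius_ft of at least one drilling site.

    Each structure is counted once, even when it falls inside several
    overlapping buffers.
    """
    count = 0
    for s in structures:
        # any() stops at the first site close enough, so no double counting
        if any(math.dist(s, site) <= radius_ft for site in sites):
            count += 1
    return count
```

For Allegheny County’s several hundred thousand structures, the same logic would normally run against a spatial index or a dissolved buffer polygon rather than a pairwise loop.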

Setback distances can be an important tool for municipal governments looking to rein in drilling to protect the health, safety, and quality of life of local residents.  My analysis demonstrates how setback distances can help protect the public from the adverse impacts of oil and gas drilling in Allegheny County, Pennsylvania. Please be sure to check out the app here.

Benzene Contamination Caused by Fracking. Or Something Else?

In Erie, Colorado, a local mom is understandably alarmed by the level of benzene, a known human carcinogen, in her 6-year-old son’s blood. There is plenty of drilling and fracking happening around Erie, including a well pad 1,300′ to the west of the Erie elementary school that was built in 2012 and now hosts at least 8 producing wells. Prevailing winds in Erie typically blow from the west, putting the elementary school and the neighboring middle school directly downwind from this large drilling site. That makes the drilling operations an obvious suspect for the cause of this contamination. Slam dunk, right?

Map showing locations of Erie, Colorado elementary and middle schools, and nearby features of interest noted in the text.

But the situation may not be that simple, as illustrated in the map above. The schools have a much closer neighbor — a gasoline station that’s right across the street, 250′ north of the elementary school, that has been there since at least 1993. When I worked for the Environmental Protection Agency in the 1990s, the problem of fuel oozing out of leaking underground storage tanks (yes, we called them LUSTs) at homes, gas stations, on farms, and other sites around the country was just beginning to get nationwide attention and prompted a suite of new rules from the EPA. Gas stations around the country were required to replace their old tanks. Many sites had plumes of gasoline floating on the local water table, sometimes migrating off the gas station property and into surrounding neighborhoods, sending fumes into basements and chemicals into water-supply wells. Gasoline contains benzene. Could kids at these schools be exposed to old gasoline contamination from this nearby filling station? Or to gasoline vapors being released today, as customers fill up their vehicles?
 

Looking south-southwest at gasoline filling station across the street from Erie Elementary School (just beyond the treeline).

There’s also a lumber mill 900′ south of the elementary school property line, and it too has been there since at least 1993. The mill probably operates diesel-powered equipment, and may even have its own diesel fuel storage tank onsite. Diesel fuel and fumes, and exhaust from diesel engines, all contain benzene. This site is not upwind from the school, so I would consider it a less-likely source of exposure for the kids there.
 
And I don’t know where this boy lives; maybe he’s grown up with a filling station or some other benzene-spewing industrial site nearby.  He may not even go to this school.  
 
None of this speculation — and it is pure speculation on our part — is intended to deflect attention from the increasingly well-documented health impacts that result from living near modern drilling and fracking operations. Everybody’s situation is different, and we just want to be sure we’re pointing our fingers at the right culprit so that A) we’ll be taken seriously, and B) the problem will be fixed. Sometimes that culprit may be oil and gas drilling. At other times it may be something that we’re overlooking.  
Aerial survey photos from the 2013 National Agriculture Imagery Program (NAIP) show how drilling and fracking have altered the West Virginia landscape.

SkyTruth data supports Maryland’s ban on fracking

In April 2017, Maryland Governor Larry Hogan signed a bill permanently banning fracking in the state. The Maryland General Assembly imposed a temporary moratorium on hydraulic fracturing for natural gas in 2013, and, following similar bans in Vermont in 2012 and New York in 2015, the 2017 bill makes Maryland the third state in the country to ban fracking.

SkyTruth’s crowd-assisted FrackFinder work mapping oil and gas well pads played an important role in this environmental and public health victory. Lawmakers evaluated recent research led by Dr. Brian Schwartz at Johns Hopkins that found higher premature birth rates for mothers in Pennsylvania who live near fracking sites. In a related study, Johns Hopkins researcher Sara Rasmussen found that Pennsylvania residents with asthma living near fracking sites are up to four times more likely to suffer asthma attacks.

The research conducted by Johns Hopkins relied on oil and gas infrastructure data produced by SkyTruth. That means our work was among the things that Maryland legislators considered when they chose to extend the state’s ban on fracking. It’s incredibly exciting to see our work play such a direct role in policy-making, and it highlights the importance of continuing to update our oil and gas footprint data sets and sharing them for free with researchers and the public. We’re continuing to map the footprint of oil and gas development in Appalachia, so keep checking in for updates.  Way to go Maryland!

PA FrackFinder Screenshot

Pennsylvania FrackFinder Data Update

We’re excited to announce the 2015 update to our Pennsylvania FrackFinder data set! Using the USDA’s most recent high-resolution aerial imagery for Pennsylvania, we’ve updated our maps of the state’s drilling sites and wastewater impoundments. Our revised maps show Pennsylvania’s drilling sites and wastewater impoundments as of Fall 2015.

Our previous Pennsylvania FrackFinder project identified the location of active well pads in imagery from 2005, 2008, 2010 and 2013. We are pleased to add the 2015 update to this already rich data set.

The goal of our FrackFinder projects has always been to fill the gaps in publicly available information related to where fracking operations in the Marcellus and Utica Shale were taking place. Regrettably, there are often discrepancies between what’s on paper and what’s on the landscape. Permits for individual oil and gas wells are relatively accessible, but the permits are just approvals to drill: they don’t say if a site is active, when drilling and fracking began or ended, or if development of the drill site ever happened at all.

We compared permit locations against high-resolution aerial imagery from the USDA’s 2015 National Agriculture Imagery Program (NAIP) to determine whether drilling permits issued since the close of our last Pennsylvania FrackFinder project in 2013 were active. More than 4,500 drilling permits were issued in Pennsylvania during our study period (May 1, 2012, to September 30, 2015), many of them located quite close together. Ultimately, we ended up with roughly 2,000 unique ‘clusters’ of drilling permits to investigate and map.
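Grouping permits that sit close together (often several wells on one pad) means each site only has to be inspected once. The post does not describe how the clusters were formed; one hypothetical approach is single-linkage clustering, where any two permits within a chosen distance end up in the same cluster. The sketch below implements that with a union-find structure, assuming coordinates in meters and a distance threshold you would need to choose yourself.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def cluster_permits(permits: Sequence[Point], max_gap_m: float) -> List[List[int]]:
    """Group permit indices by single linkage: two permits within
    max_gap_m of each other always end up in the same cluster."""
    parent = list(range(len(permits)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise check; fine for a few thousand permits.
    for i in range(len(permits)):
        for j in range(i + 1, len(permits)):
            if math.dist(permits[i], permits[j]) <= max_gap_m:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters: Dict[int, List[int]] = {}
    for i in range(len(permits)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Single linkage can chain a long line of permits into one cluster, so the threshold is worth checking against the imagery before relying on the cluster count.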

We look forward to seeing how the public will use these revised data sets. We hope researchers, NGOs and community advocates can use these unique data sets to gain a better understanding of the impact of fracking on Pennsylvania’s environment and public health.