Using machine learning to map the footprint of fracking in central Appalachia

Fossil fuel production has left a lasting imprint on the landscapes and communities of central and northern Appalachia. Mountaintop mining operations, pipeline rights-of-way, oil and gas well pads, and hydraulic fracturing wastewater retention ponds dot the landscapes of West Virginia and Pennsylvania. And although advocacy groups have made progress pressuring regulated industries and state agencies for greater transparency, many communities in central and northern Appalachia are unaware of, or unclear about, the extent of the human health risks that they face from exposure to these facilities.

A key challenge is the discrepancy that often exists between what is on paper and what is on the landscape.  It takes time, money, and staff (three rarities for state agencies always under pressure to do more with less) to map energy infrastructure, and to keep those records updated and accessible for the public.  But with advancements in deep learning, and with the increasing amount of satellite imagery available from governments and commercial providers, it might be possible to track the expansion of energy infrastructure—as well as the public health risks that accompany it—in near real-time.

Figure 1.  Oil and gas well pad locations, 2005 – 2015.

Mapping the footprint of oil and gas drilling, especially unconventional drilling or “fracking,” is a critical piece of SkyTruth’s work. Since 2013, we’ve conducted collaborative image analysis projects called “FrackFinder” to fill the gaps in publicly available information about the location of fracking operations in the Marcellus and Utica Shale. In the past, we relied on several hundred volunteers to identify and map oil and gas well pads throughout Ohio, Pennsylvania, and West Virginia. But we’ve been working on a new approach: automating the detection of oil and gas well pads with machine learning. Rather than train several hundred volunteers to identify well pads in satellite imagery, we developed a machine learning model that can be deployed across thousands of computers simultaneously. Machine learning is at the heart of many of today’s technology companies. It’s the technology that enables Netflix to recommend new shows that you might like, or that allows digital assistants like Google, Siri, or Alexa to understand requests like, “Hey Google, text Mom I’ll be there in 20 minutes.”

Examples are at the core of machine learning.  Rather than try to “hard code” all of the characteristics that define a modern well pad (they are generally square, generally gravel, and generally littered with industrial equipment), we teach computers what they look like by using examples.  Lots of examples. Like, thousands or even millions of them, if we can find them. It’s just like with humans: the more examples of something that you see, the easier it is to recognize that thing later. So, where did we get a few thousand images of well pads in Pennsylvania?  

We started with SkyTruth’s Pennsylvania oil and gas well pad dataset. The dataset contains well pad locations identified in National Agriculture Imagery Program (NAIP) aerial imagery from 2005, 2008, 2010, 2013, and 2015 (Figure 1).  We uploaded this dataset to Google Earth Engine, and used it to create a collection of 10,000 aerial images in two classes: “well pad” and “non-well pad.” We created the training images by buffering each well pad by 100 meters, clipping the NAIP imagery to the bounding box, and exporting each image.
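The chip-extraction step described above can be sketched in plain Python. This is only an illustration, not the Earth Engine code we ran: it assumes each well pad is reduced to a single point in a projected coordinate system measured in meters, and the function name `chip_bounds` and the sample coordinates are hypothetical.

```python
def chip_bounds(x, y, buffer_m=100.0):
    """Return (min_x, min_y, max_x, max_y) of the box enclosing a circular buffer.

    x, y are assumed to be projected coordinates in meters (for example UTM).
    """
    return (x - buffer_m, y - buffer_m, x + buffer_m, y + buffer_m)


# Hypothetical well pad location in projected coordinates (meters).
bounds = chip_bounds(500000.0, 4600000.0)

# A 100 m buffer yields a square chip 200 m on a side.
width_m = bounds[2] - bounds[0]
height_m = bounds[3] - bounds[1]
print(bounds, width_m, height_m)
```

Under these assumptions, every training image covers the same ground area, which keeps the two classes comparable in scale.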

The images above show three training examples from our “well pad” class. The images below show three training examples taken from our “non-well pad” class.

We divided the dataset into three subsets: a training set with 4,000 images of each class, a validation set with 500 images of each class, and a test set with 500 images of each class.  We combined this work in Google Earth Engine with Google’s powerful TensorFlow deep learning library.  We used our 8,000 training images (4,000 from each class, remember) and TensorFlow’s high-level Keras API to train our machine learning model.  So what, exactly, does that mean? Well, basically, it means that we showed the model thousands and thousands of examples of what well pads are (i.e., images from our “well pad” class) and what well pads aren’t (i.e., images from our “non-well pad” class).  We trained the model for twenty epochs, meaning that we showed the model the entire training set (8,000 images, remember) twenty times.  So, basically, the model saw 160,000 examples, and over time, it “learned” what well pads look like.
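For readers who want to see the bookkeeping, here is a minimal sketch of the split and the epoch arithmetic described above. It does not reproduce our TensorFlow training code; the image identifiers, the random seed, and the helper `split_class` are made up for illustration.

```python
import random


def split_class(items, n_train=4000, n_val=500, n_test=500, seed=0):
    """Shuffle one class and cut it into training, validation, and test subsets."""
    if len(items) < n_train + n_val + n_test:
        raise ValueError("not enough items for the requested split")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


# Hypothetical image identifiers: 5,000 per class, 10,000 in total.
well_pad_ids = [f"well_pad_{i}" for i in range(5000)]
non_well_pad_ids = [f"non_well_pad_{i}" for i in range(5000)]

train_wp, val_wp, test_wp = split_class(well_pad_ids)
train_nwp, val_nwp, test_nwp = split_class(non_well_pad_ids)

images_per_epoch = len(train_wp) + len(train_nwp)  # 8,000 training images
epochs = 20
total_examples_seen = images_per_epoch * epochs    # 160,000 examples
print(images_per_epoch, total_examples_seen)
```

Splitting each class separately keeps all three subsets balanced between “well pad” and “non-well pad” images.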

Our best model run returned an accuracy of 84%, precision and recall measures of 87% and 81%, respectively, and a false positive rate and false negative rate of 0.116 and 0.193, respectively. We’ve been pleased with our initial model runs, but there is plenty of room for improvement. We started with the VGG16 model architecture that comes prepackaged with Keras (Simonyan and Zisserman 2014, Chollet 2018). The VGG16 model architecture is no longer state-of-the-art, but it is easy to understand, and it was a great place to begin.
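The reported measures are linked to one another, and the sketch below shows how. It assumes the figures were computed on the balanced test set of 500 images per class, which the text above does not state explicitly; `metrics_from_rates` is a hypothetical helper, not part of our pipeline.

```python
def metrics_from_rates(fpr, fnr, n_pos, n_neg):
    """Derive precision, recall, and accuracy from error rates and class sizes."""
    tp = (1 - fnr) * n_pos  # well pads correctly detected
    fn = fnr * n_pos        # well pads missed
    fp = fpr * n_neg        # non-well pads flagged as well pads
    tn = (1 - fpr) * n_neg  # non-well pads correctly rejected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # equals 1 - fnr
    accuracy = (tp + tn) / (n_pos + n_neg)
    return precision, recall, accuracy


# Assumption: rates were measured on 500 images of each class.
precision, recall, accuracy = metrics_from_rates(0.116, 0.193, 500, 500)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```

Under that assumption, precision comes out near 0.87 and recall near 0.81, matching the reported values. Accuracy comes out near 0.85, close to the reported 84%; the small gap may be due to rounding or to a different evaluation set.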

After training, we ran the model on a few NAIP images to compare its performance against well pads collected by SkyTruth volunteers for our 2015 Pennsylvania FrackFinder project.  Figures 4 and 6 depict the model’s performance on two NAIP images near Williamsport, PA. White bounding boxes indicate landscape features that the model predicted to be well pads.  Figures 5 and 7 depict those same images with well pads (shown in red) delineated by SkyTruth volunteers.

Figure 4.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 5.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.
Figure 6.  Well pads detected by our machine learning algorithm in NAIP imagery from 2015.
Figure 7.  Well pads detected by SkyTruth volunteers in NAIP imagery from 2015.

One of the first things that stood out to us was that our model is overly sensitive to strong linear features. In nearly every training example, there is a clearly defined access road that connects to the well pad. As a result, the model regularly classified large patches of cleared land or isolated developments (e.g., warehouses) at the end of a linear feature as a well pad. Another major weakness is that our model is also overly sensitive to active well pads. Active well pads tend to be large, gravel squares with clearly defined edges. Although these well pads may be the biggest concern, there are many “reclaimed” and abandoned well pads that lack such clearly defined edges. Regrettably, our model is overfit to highly visible active well pads, and it performs poorly on lower-visibility drilling sites that have lost their square shape or that have been revegetated by grasses.

Nevertheless, we think this is a good start. Despite a number of false detections, our model was able to detect all of the well pads previously identified by volunteers in Figures 5 and 7 above. In several instances, false detections consisted of energy infrastructure that, although not active well pads, remains of high interest to environmental and public health advocates as well as state regulators: abandoned well pads, wastewater impoundments, and recent land clearings. NAIP imagery is only collected every two or three years, depending on funding. So, tracking the expansion of oil and gas drilling activities in near real-time will require access to a high-resolution, near real-time imagery stream (like Planet, for instance). For now, we’re experimenting with more current model architectures and with reconfiguring the model for semantic segmentation, that is, extracting polygons that delineate the boundaries of well pads and that can be analyzed in mapping software by researchers and our partners working on the ground.

Keep checking back for updates.  We’ll be posting the training data that we created, along with our initial models, as soon as we can.

Mapping potential “drill out” scenarios in Allegheny County, Pennsylvania

SkyTruth has just launched its first Google Earth Engine app, detailing potential natural gas drilling scenarios in Allegheny County, Pennsylvania.  If you’re interested, you can view the app here.

Hydraulic fracturing, or fracking, has unlocked natural gas resources from formations like the Utica Shale and Marcellus Shale, resulting in an explosion of gas-drilling activity across the Mid-Atlantic states. One of the states sitting above this hot commodity is Pennsylvania; the state boasts a massive reserve of nearly 89.5 trillion cubic feet of dry natural gas, according to the US Energy Information Administration. In the thick of it all, Allegheny County, in the southwestern portion of the state, is one of the few counties where drilling activity has been relatively light. The county’s main defense against well drilling has been zoning regulations that require a “setback” between unconventional natural gas drilling sites and “occupied buildings.” At present, the minimum distance required between a well pad and a building is 500 feet (unless consent has been received from the building’s owner). However, this distance may not adequately protect human health, especially in communities surrounded by drilling. Municipal officials might want to consider alternative setbacks, based on the latest scientific research on the impacts of drilling on the health of nearby residents. This analysis evaluates a range of setback scenarios, and illustrates the likely drilling density and distribution of drilling sites across the county for each scenario.

To better understand the potential impact of drilling in Allegheny County, I analyzed several different “drill out” scenarios (Figure 1).  I developed our first Google Earth Engine app to give users a glimpse of how different setback distances and different well spacing intervals might impact the number of homes at risk from drilling impacts in the future.  Check out the analysis here.

Figure 1. A screenshot of the app when first launched.

To begin this analysis, I downloaded building footprint data for Allegheny County from the Pennsylvania Geospatial Data Clearinghouse.  Next, I downloaded shapefiles representing the centerlines of major rivers passing through the county, other hydrological features in Allegheny County, and county-owned roads from the Allegheny County GIS Open Data Portal. I also downloaded a TIGER shapefile representing Allegheny County’s Major Highways as of 2014, courtesy of the US Census Bureau. Setback distances of 500 feet, 1,000 feet, 1,500 feet, and 2,500 feet were used to buffer the center points of “occupied buildings” in the county. I selected the minimum and maximum setback distances based upon the current Pennsylvania setback laws (500 ft.) and a recently proposed and defeated setback distance from Colorado (2,500 ft.). The latter regulation, if passed, would have been the most restrictive regulation on fracking of any state.  The 1,000 and 1,500 foot setbacks are meant to serve as intervals between these two demonstrated extremes of zoning regulation. I also created buffers around rivers and streams as well as roads. I applied a 300 foot buffer to the centerlines of all rivers and streams in the county (based upon the current regulations). I also applied a 328 foot buffer to all major highways and a 40 foot buffer to all county roads. These three buffer zones remained constant throughout the project.  
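The logic of the building setback can be sketched without any GIS software. The example below is a simplification, not the Earth Engine workflow: it treats buildings and candidate sites as points in a projected coordinate system measured in feet, ignores the river and road buffers, and uses made-up coordinates. The helper `is_outside_setback` is hypothetical.

```python
import math


def is_outside_setback(site, buildings, setback_ft):
    """True if the site is at least setback_ft from every building center point."""
    sx, sy = site
    return all(math.hypot(sx - bx, sy - by) >= setback_ft for bx, by in buildings)


# Made-up building center points and candidate drilling sites (feet).
buildings = [(0.0, 0.0), (4000.0, 0.0)]
candidates = [(400.0, 0.0), (2000.0, 0.0), (2000.0, 2500.0)]

# Evaluate the four setback distances used in the analysis.
allowed_by_setback = {
    setback: [c for c in candidates if is_outside_setback(c, buildings, setback)]
    for setback in (500, 1000, 1500, 2500)
}
for setback, sites in allowed_by_setback.items():
    print(setback, sites)
```

As the setback grows, fewer candidate sites survive, which is the same effect the buffers have on the acreage available for drilling.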

After applying these buffer distances to rivers, roads, and buildings, I calculated how many acres of Allegheny County were potentially open to drilling.  Using the currently required distance of 500 feet, there are approximately 53,000 acres potentially available for drilling in Allegheny County, PA (See Figure 2).  

Figure 2: Screenshot from the app showing the available drilling area in Allegheny County (shown in grey) when considering the 500-foot setback from occupied structures.  Current well pad locations are denoted by red points on the map.

Using the setback distances that we identified (e.g., 500 feet, 1,000 feet, 1,500 feet, 2,500 feet), I wanted to visualize what different potential “drill out” scenarios might look like.  To do that, I had to decide how much space to leave between potential well sites. I chose to space out the potential drilling sites according to three different intervals: 40 acres per well, 80 acres per well, and 640 acres per well.  Calculating different setback distances and different spacing intervals allowed me to investigate the range of possible “drill out” in Allegheny County.  I calculated the number of new drilling sites that each “drill out” scenario could potentially support. I’ve summarized the results below:

Setback distance  | 40 acre well spacing | 80 acre well spacing | 640 acre well spacing
500 ft. setback   | 928                  | 465                  | 52
1,000 ft. setback | 257                  | 156                  | 14
1,500 ft. setback | 84                   | 48                   | 8
2,500 ft. setback | 18                   | 10                   | 3

So, for example, a setback distance of 500 feet coupled with a spacing between well pads of 40 acres would allow for 928 new potential drilling locations. Taking into consideration the approximately 3-5 acres required to develop a well pad, this suggests that roughly 2,800-4,600 acres of land in Allegheny County could be subjected to surface well development.
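Two pieces of arithmetic sit behind these numbers, and both are easy to check. The sketch below assumes that wells are laid out on a square grid, which is one plausible way to interpret “acres per well” but is an assumption on top of what is described here; the function names are hypothetical.

```python
import math

SQ_FT_PER_ACRE = 43_560


def grid_spacing_ft(acres_per_well):
    """Side length, in feet, of a square cell covering the given number of acres."""
    return math.sqrt(acres_per_well * SQ_FT_PER_ACRE)


def pad_footprint_acres(n_sites, min_pad_acres=3, max_pad_acres=5):
    """Low and high estimates of the land occupied by n_sites well pads."""
    return n_sites * min_pad_acres, n_sites * max_pad_acres


# Distance between neighboring wells for each spacing interval.
spacing = {acres: grid_spacing_ft(acres) for acres in (40, 80, 640)}

# Footprint for the 500 ft. setback, 40 acre spacing scenario (928 sites).
low, high = pad_footprint_acres(928)
print(spacing, low, high)
```

On a square grid, 40 acres per well corresponds to wells 1,320 feet apart and 640 acres per well to wells one mile (5,280 feet) apart. The 928-site scenario works out to between 2,784 and 4,640 acres of well pads.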

For each “drill out” scenario, I mapped the number of potentially supported wells, and I put a two-mile buffer around each point to simulate the potential zone of adverse health impacts (See Figure 3).  I used the buffered points to calculate the number of “occupied structures” that would be at risk of exposure if a drilling site was built. The number of occupied structures at risk when considering each of the different scenarios is summarized in the table below:

Setback distance  | 40 acre well spacing | 80 acre well spacing | 640 acre well spacing
500 ft. setback   | 446,901              | 380,284              | 194,053
1,000 ft. setback | 222,481              | 215,415              | 43,256
1,500 ft. setback | 90,046               | 60,919               | 26,722
2,500 ft. setback | 4,816                | 4,524                | 3,626

Figure 3. Screenshot from the app showing potential drill-out locations (shown in yellow), considering a 500-ft setback from occupied structures and a separation between potential drilling operations of 40 acres. Notice the area of the county potentially subjected to adverse health consequences considering a two-mile buffer zone (shown in black) around each of these locations.
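The count of structures at risk can be illustrated the same way. This sketch assumes structures and drilling sites are points in a projected coordinate system measured in feet, and it counts each structure once even if it falls within two miles of several sites. The coordinates are made up and `count_structures_at_risk` is a hypothetical helper.

```python
import math

TWO_MILES_FT = 2 * 5280


def count_structures_at_risk(structures, drill_sites, radius_ft=TWO_MILES_FT):
    """Count structures that lie within radius_ft of at least one drilling site."""
    return sum(
        1
        for sx, sy in structures
        if any(math.hypot(sx - dx, sy - dy) <= radius_ft for dx, dy in drill_sites)
    )


# Made-up potential drilling sites and occupied structures (feet).
drill_sites = [(0.0, 0.0), (30000.0, 0.0)]
structures = [(5000.0, 0.0), (10000.0, 0.0), (15000.0, 0.0), (25000.0, 3000.0)]

at_risk = count_structures_at_risk(structures, drill_sites)
print(at_risk)
```

Because the two-mile zones around neighboring sites overlap heavily, the number of structures at risk grows far more slowly than the number of drilling sites, which is consistent with the pattern in the table above.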

Setback distances can be an important tool for municipal governments looking to rein in drilling to protect the health, safety, and quality of life of local residents. My analysis demonstrates how setback distances can help protect the public from the adverse impacts of oil and gas drilling in Allegheny County, Pennsylvania. Please be sure to check out the app here.

A look back at 20 years of oil and gas permitting in Wyoming

A shift in priorities of the EPA under the current administration has raised awareness of an increase in oil and gas permitting across the USA. However, the increase began before the current administration. Although the federal government controls most regulations and laws that affect permitting, other factors such as global oil and gas prices, advances in drilling and production technology, and state governments’ willingness to accommodate investors have an effect on permitting and investment by energy companies. It should be pointed out that permitting does not necessarily indicate drilling as companies can request permits but then hold on to the permits until either eventually drilling, requesting a new permit, or selling the permit to another company. This can tie up land for decades and is covered in more detail by The Wilderness Society’s report: “Land Hoarders: How Stockpiling Leases is Costing Taxpayers”.

Wyoming has an economy that is built on coal and oil, but in the 1980s and early 1990s it was suffering from an oil glut that caused prices to drop. As prices began to recover throughout the 1990s and 2000s and eventually boom (Fig. 1), some companies sought to diversify into natural gas (read more in James Hamilton’s paper “Causes and Consequences of the Oil Shock of 2007-08”). Many began to drill for gas in the coal fields of Wyoming, and to apply the relatively new technology of hydraulic fracturing (“fracking”) to extract natural gas from previously uneconomic, low-permeability sandstone and shale reservoirs found throughout the Rocky Mountain West.

Figure 1. Oil and gas prices since 1985.

The oil and gas boom ended abruptly in 2008 when the effect of the global financial crisis reached the oil and gas markets and prices plummeted.

To better understand the effect these events had on Wyoming, I analyzed permits for new oil and gas wells issued by the state over the past 20 years. This data is freely available from the Wyoming Oil and Gas Conservation Commission website. First, I should point out that this data has inconsistencies and holes, due to apparent data entry errors such as missing or incorrect dates, missing latitude or longitude, and typos. Unfortunately, this meant nearly 24% of the total permits had to be left out of my analysis. Some errors still remain, as seen in this map of permit applications received by the state (Fig. 2). Each county is colored differently, and some permits appear to have either the wrong county listed or incorrect map coordinates.

Figure 2. Distribution of oil and gas drilling permit applications, color coded by county.
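The kind of filtering described above can be sketched as follows. This is not the script used for the analysis: the field names (`applied`, `lat`, `lon`), the date format, and the sample records are assumptions, and the bounding box is only an approximate outline of Wyoming.

```python
from datetime import datetime

# Approximate bounding box for Wyoming (assumption: generous rounding).
WY_LAT = (41.0, 45.0)
WY_LON = (-111.1, -104.0)


def parse_date(text):
    """Return a date for MM/DD/YYYY text, or None if missing or malformed."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except (ValueError, AttributeError):
        return None


def parse_float(text):
    """Return a float, or None if the value is missing or not a number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def is_usable(record):
    """Keep records with a valid application date and coordinates inside Wyoming."""
    applied = parse_date(record.get("applied"))
    lat = parse_float(record.get("lat"))
    lon = parse_float(record.get("lon"))
    if applied is None or lat is None or lon is None:
        return False
    return WY_LAT[0] <= lat <= WY_LAT[1] and WY_LON[0] <= lon <= WY_LON[1]


# Made-up records illustrating the error types mentioned in the text.
records = [
    {"applied": "03/15/2001", "lat": "44.29", "lon": "-105.50"},  # usable
    {"applied": "", "lat": "44.29", "lon": "-105.50"},            # missing date
    {"applied": "07/01/2005", "lat": "", "lon": "-105.50"},       # missing latitude
    {"applied": "07/01/2005", "lat": "4.429", "lon": "-105.50"},  # misplaced decimal
]
clean = [r for r in records if is_usable(r)]
share_dropped = 1 - len(clean) / len(records)
print(len(clean), share_dropped)
```

A check like this catches missing values and coordinates that fall outside the state, but it cannot catch a permit that is plotted inside Wyoming under the wrong county, which is why some errors remain visible in the map.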

What immediately stands out is the dense cluster of permits in Campbell County, in the northeast of the state. When I looked closer at this county over time, I saw that most of the permit applications were submitted at the beginning of the 1998-2008 boom. This was quickly followed by a sharp drop around 2000, when hydraulic fracturing made drilling in other parts of the state (and country) more profitable. The original method of coalbed methane drilling was considered uneconomical compared to this new fracking method. At that time, permit applications rose across other counties (Fig. 3), but the rise was far more subdued than the earlier rush, possibly because fracking made deposits across the country viable and so the increase was spread more widely within and outside Wyoming. This is just a theory, though; these trends could just as easily reflect the business strategies of companies “capturing” land before their competitors.

Figure 3. Applications for oil and gas drilling permits received over time by county.

The rate of permit applications slowed for all counties as the boom ended around 2008, with a short-lived rise leading up to 2016. The boom and bust periods can be seen more clearly in the overall number of permit applications across Wyoming (Fig. 4).

Figure 4. Total number of oil and gas drilling permits applied for in Wyoming.
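The tallies behind charts like Figures 3 and 4 amount to counting permit applications by year, with or without a county breakdown. Here is a minimal sketch using made-up records; the field names `county` and `applied` are assumptions, not the column names in the state database.

```python
from collections import Counter
from datetime import date


def permits_per_year(permits):
    """Statewide number of permit applications per calendar year."""
    return Counter(p["applied"].year for p in permits)


def permits_per_county_year(permits):
    """Number of permit applications per (county, year) pair."""
    return Counter((p["county"], p["applied"].year) for p in permits)


# Made-up permit applications for illustration only.
permits = [
    {"county": "Campbell", "applied": date(1999, 5, 3)},
    {"county": "Campbell", "applied": date(1999, 9, 17)},
    {"county": "Sublette", "applied": date(2006, 2, 8)},
    {"county": "Campbell", "applied": date(2006, 11, 30)},
]

yearly_totals = permits_per_year(permits)
county_totals = permits_per_county_year(permits)
print(sorted(yearly_totals.items()))
print(sorted(county_totals.items()))
```

Plotting the yearly totals gives the statewide curve, and plotting one series per county gives the breakdown by county.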

The initial rush of the boom was followed by a dip and a second climb as fracking technology took off. This was followed by the bust of 2008. There is a slight rise again around 2016, but it drops off by 2017. The effect of this activity is closely reflected in unemployment figures for the state (Fig. 5). Because I am looking at permitting and not drilling, however, this correlation should be seen as a reflection of oil and gas companies’ business activities in a holistic sense.

Figure 5. Unemployment rate for Wyoming over the past 20 years.

Initially, there’s an overall steady decline in unemployment as the boom sweeps up employees, but unemployment rockets up once the bust comes along. Interestingly, between 2012 and 2016 there is a steady rise in permit applications, which is mirrored by a steady drop in unemployment, but this is interrupted by a bump in unemployment around 2016. The recovery in unemployment after 2016, however, is not mirrored in permit applications, which appear to drop off.

Although there are booms and busts, the overall number of well permits is constantly increasing (simply because the number of new permits applied for always outweighs the number of permits expiring). The animated image below (Img. 1) shows the growth of oil and gas permit applications as companies move across the state.

Image 1. Permits applied for over the past 20 years.

Graphs and maps give us a good idea of the trends but sometimes it is even more helpful to see the physical reality of these numbers.  This is an area in the most heavily permitted county, Campbell (Img. 2).

Image 2. Comparison of an area of Campbell County from July 1999 to July 2018.

As well as the dramatic increase in well pads (i.e., drilling sites), these images show the addition of access roads threading across the landscape.

What this data doesn’t show is the large number of orphaned wells that were left behind after the price of oil and natural gas dropped in 2008. This has left a legacy of about 3,600 abandoned wells (scroll to bottom for total number of orphaned wells currently tracked by Wyoming Oil and Gas Conservation Commission). Often the state, and therefore the taxpayers, are left to handle this burden because the responsible companies are either unknown, unable to cover the cleanup costs, or have declared bankruptcy and disappeared. Understandably, the state would prefer to see the wells operate once more rather than pay considerable amounts of money to seal them up and restore the land. But these aging, unsecured wells pose a threat to the environment and to public health.

Many of the coalbed methane wells built at the beginning of the boom were approved with permission to dump untreated “flowback water” on the surface. The companies convinced the state that this fluid, coming straight from the coal seams targeted by the drilling, would be beneficial for the parched land even though most of the untreated fluid was highly saline. Also, flooding the land with large volumes of water was extremely unnatural for the existing ecosystem. Many areas that were normally good for grazing became unusable because they were flooded with this salty water. Land that was adapted to little rainfall and snowmelt was suddenly exposed to a constant flow of brine. The companies pushed the idea of plenty of water for agriculture and for wildlife to drink while downplaying the issue of the quality of the water. The state also toed this line while court challenges to the “beneficial use” permits, led by landowners and conservation groups, were upheld in court. Eventually, they implemented a water-to-gas ratio cap on surface discharges, since many of the wells were producing plenty of salty water but little or even no gas at all.

One other trend that I discovered while scrutinizing the permit database was the time it took to process these permits (Fig. 6 & 7). Plotting permit approval times at first appears to show a distribution that follows the general trends that I’ve seen so far, tracking the boom and bust periods. For comparison, I plotted these for both the year of permit application (Fig. 6) and year of approval (Fig. 7).

Figure 6. Permit approval time arranged by year of application.


Figure 7. Permit approval time arranged by year of approval.

The red lines track the annual average wait time and give a clearer picture of the trend. The spread of wait times fluctuates far more than the average wait time itself. Although the average does not appear to fluctuate much, the scale is a little deceptive: the average wait time ranges from 15 days in 1998 to 40 days in 2000. The average wait time appears to rise initially with the start of each drilling boom but evens out fairly quickly. This changes later, when the average wait time climbs sharply around 2013. By 2017, the average wait time has increased considerably, to 130 days.
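The annual averages can be computed with a few lines of Python. This sketch assumes each permit record carries an application date and an approval date under the hypothetical field names `applied` and `approved`; the sample permits are made up, and records with a negative wait are treated as data entry errors and skipped.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_wait_by_year(permits, key="applied"):
    """Average days from application to approval, grouped by year of `key`."""
    waits = defaultdict(list)
    for p in permits:
        wait_days = (p["approved"] - p["applied"]).days
        if wait_days < 0:
            continue  # approval before application: likely a data entry error
        waits[p[key].year].append(wait_days)
    return {year: mean(days) for year, days in sorted(waits.items())}


# Made-up permits for illustration only.
permits = [
    {"applied": date(1998, 1, 1), "approved": date(1998, 1, 11)},   # 10 days
    {"applied": date(1998, 6, 1), "approved": date(1998, 6, 21)},   # 20 days
    {"applied": date(2016, 12, 1), "approved": date(2017, 4, 10)},  # 130 days
    {"applied": date(2005, 3, 1), "approved": date(2005, 2, 1)},    # skipped
]

by_application_year = mean_wait_by_year(permits, key="applied")
by_approval_year = mean_wait_by_year(permits, key="approved")
print(by_application_year)
print(by_approval_year)
```

Grouping by application year and by approval year can place the same permit in different years, as the third record shows, which is why the two plots are not identical.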

These trends offer insight into the recent history of oil and gas permitting activity in Wyoming. It should be noted that although there was a lot of ‘noise’ in the data that I had to correct or discard, the remaining data helps give me a clearer sense of how oil and gas development is driving change on Wyoming’s landscape. My analysis has been based purely on the history of permitting in Wyoming, not actual drilling. For an analysis on drilling, please look at the Fracktracker Alliance’s page on oil and gas activity in Wyoming. I hope you’ve enjoyed this breakdown of permit data for Wyoming. I hope to take a similar look at other states’ drilling permits, so stay tuned!

Benzene Contamination Caused by Fracking. Or Something Else?

In Erie, Colorado, a local mom is understandably alarmed by the level of benzene, a known human carcinogen, in her 6-year-old son’s blood. There is plenty of drilling and fracking happening around Erie, including a well pad 1,300′ to the west of the Erie elementary school that was built in 2012 and now hosts at least 8 producing wells. Prevailing winds in Erie typically blow from the west, which puts the elementary school and the neighboring middle school directly downwind from this large drilling site and makes the drilling operations an obvious suspect for the cause of this contamination. Slam dunk, right?

Map showing locations of Erie, Colorado elementary and middle schools, and nearby features of interest noted in the text.

But the situation may not be that simple, as illustrated in the map above. The schools have a much closer neighbor — a gasoline station that’s right across the street, 250′ north of the elementary school, that has been there since at least 1993. When I worked for the Environmental Protection Agency in the 1990s, the problem of fuel oozing out of leaking underground storage tanks (yes, we called them LUSTs) at homes, gas stations, on farms, and other sites around the country was just beginning to get nationwide attention and prompted a suite of new rules from the EPA. Gas stations around the country were required to replace their old tanks. Many sites had plumes of gasoline floating on the local water table, sometimes migrating off the gas station property and into surrounding neighborhoods, sending fumes into basements and chemicals into water-supply wells. Gasoline contains benzene. Could kids at these schools be exposed to old gasoline contamination from this nearby filling station? Or to gasoline vapors being released today, as customers fill up their vehicles?

Looking south-southwest at gasoline filling station across the street from Erie Elementary School (just beyond the treeline).

There’s also a lumber mill 900′ south of the elementary school property line, and it too has been there since at least 1993. The mill probably operates diesel-powered equipment, and may even have its own diesel fuel storage tank onsite. Diesel fuel and fumes, and exhaust from diesel engines, all contain benzene. This site is not upwind from the school, so I would consider it a less likely source of exposure for the kids there.

And I don’t know where this boy lives; maybe he’s grown up with a filling station or some other benzene-spewing industrial site nearby. He may not even go to this school.

None of this speculation (and it is pure speculation on our part) is intended to deflect attention from the increasingly well-documented health impacts that result from living near modern drilling and fracking operations. Everybody’s situation is different, and we just want to be sure we’re pointing our fingers at the right culprit so that A) we’ll be taken seriously, and B) the problem will be fixed. Sometimes that culprit may be oil and gas drilling. At other times it may be something that we’re overlooking.

Aerial survey photos from the 2013 National Agriculture Imagery Program (NAIP) show how drilling and fracking have altered the West Virginia landscape.

SkyTruth data supports Maryland’s ban on fracking

In April 2017, Maryland Governor Larry Hogan signed a bill reinstating a fracking ban in the state. The Maryland General Assembly imposed a temporary moratorium on hydraulic fracturing for natural gas in 2013, and — following similar bans in Vermont in 2012 and New York in 2015 — the 2017 bill makes Maryland the third state in the country to ban fracking. 

SkyTruth’s crowd-assisted FrackFinder work mapping oil and gas well pads played an important role in this environmental and public health victory. Lawmakers evaluated recent research led by Dr. Brian Schwartz at Johns Hopkins that found higher premature birth rates for mothers in Pennsylvania who live near fracking sites. In a related study, Johns Hopkins researcher Sara Rasmussen found that Pennsylvania residents with asthma living near fracking sites are up to four times more likely to suffer asthma attacks.

The research conducted by Johns Hopkins relied on oil and gas infrastructure data produced by SkyTruth. That means our work was among the things that Maryland legislators considered when they chose to extend the state’s ban on fracking. It’s incredibly exciting to see our work play such a direct role in policy-making, and it highlights the importance of continuing to update our oil and gas footprint data sets and sharing them for free with researchers and the public. We’re continuing to map the footprint of oil and gas development in Appalachia, so keep checking in for updates.  Way to go Maryland!