Drilling Detection with Machine Learning Part 2: Segmentation Starter Kit

Geospatial Analyst Brendan Jarrell explains, step-by-step, how to develop a machine learning model to detect oil and gas well pads from satellite imagery.

[This is the second post in a 3-part blog series describing SkyTruth’s effort to automate the detection of oil and gas well pads around the world using machine learning. This tool will allow local communities, conservationists, researchers, policymakers and journalists to see for themselves the growth of drilling in the areas they care about. This is a central part of SkyTruth’s work: to share our expertise with others so that anyone can help protect the planet, their communities, and the places they care about. You can read the first post in the series here. All of the code that will be covered in this post can be found here. Our training dataset is also available here.]

SkyTruth Intern Sasha Bylsma explained in our first post in this series how we create training data for a machine learning workflow that will be used to detect oil and gas well pads around the world. In this post, I’m going to explain how we apply a machine learning model to satellite imagery, covering all the tools we use and the steps we take to make this happen, so that anyone can create similar models on their own.

Once we have created a robust set of training data, we want to feed a satellite image into the machine learning model and have the model scan the image in search of well pads. We then look to the model to tell us where the well pads are located and give us the predicted boundary of each of the well pads. This is known as segmentation, as shown in Figure 1. 

Figure 1: An example of our current work on well pad segmentation. The original image is seen on the left; what the ML model predicts as a well pad can be seen on the right. Notice that the algorithm is not only returning the drilling site’s location, but also its predicted boundaries.

We want the model to identify well pad locations because of the crucial context that location data provides. For example, location can tell us if there is a high density of drilling in an area, helping nearby communities track increasing threats to their health. Location data can also be used to calculate the total area of disturbed land in an area of interest, helping researchers, advocates and others determine how severely wildlife habitat or other land characteristics have been diminished.

In the past, SkyTruth did this work manually, with an analyst or volunteer viewing individual images to search for well pads and laboriously drawing their boundaries. Projects like FrackFinder, for example, may have taken staff and volunteers weeks to complete. Now, with the help of a machine learning model, we can come in on a Monday morning, let the model do its thing, and have that same dataset compiled and placed on a map in an hour or two. The benefits of leveraging this capability are obvious: we can scan thousands of images quickly and consistently, increasing the likelihood of finding well pads and areas with high levels of drilling.

Formatting the data

So how do we do this? The first thing we need to do is get our data into a format that will be acceptable for the machine learning model. We decided that we would use the TensorFlow API as our framework for approaching this task. TensorFlow is an open-source (i.e. “free-to-use”) software package that was developed by Google to give users access to a powerful math library specifically designed for machine learning. We exported data from Google Earth Engine in the TFRecord format; TFRecords are convenient packages for exporting information from Earth Engine for later use in TensorFlow. In our code under the section labeled “Get Training, Validation Data ready for UNET,” we see that there are a few steps we must follow to unpack the TFRecords from their compressed archives into a usable format (see Figure 2).

import tensorflow as tf

# Bands included in our input FeatureCollection and Sentinel-2 imagery.
bands = ['R', 'G', 'B']
label = 'Label'
featureNames = bands + [label]

# Describe each band as a fixed-length 256 x 256 float feature.
cols = [
    tf.io.FixedLenFeature(shape=[256, 256], dtype=tf.float32)
    for name in featureNames
]

# Pass these feature descriptions into a dictionary keyed by band name,
# used to describe pieces of the input dataset when parsing.
featsDict = dict(zip(featureNames, cols))

Figure 2:  Preprocessing code

Second, we create TensorFlow representations of the information we want to draw out of each of our examples from the Google Earth Engine workflow (see the first post in this series for more explanation on how we made these samples). Each sample has a Red, Green, and Blue channel associated with it, as well as a mask band, called “Label” in our code. We create a TensorFlow feature description for each of these channels. Think of these descriptions as sorting bins; when a TFRecord is unpacked, the corresponding channel values from the record are placed into the bin that represents them. After loading in all of our TFRecords, we push them into a TFRecordDataset, a dataset populated by several TFRecords, and then apply a few functions that make the records interpretable by the model later on.
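For readers following along in code, here is a minimal sketch of that parsing step, assuming the bands, label, and featsDict variables from Figure 2. The file list name is a placeholder, and details such as compression may differ from our notebook.

def parse_example(example_proto):
    """Unpack one serialized TFRecord example into an (image, mask) pair."""
    parsed = tf.io.parse_single_example(example_proto, featsDict)
    # Stack the R, G, B bands into a single 256 x 256 x 3 image tensor.
    image = tf.stack([parsed[b] for b in bands], axis=-1)
    # Give the mask a trailing channel so it is 256 x 256 x 1.
    mask = tf.expand_dims(parsed[label], axis=-1)
    return image, mask

# 'tfrecord_paths' stands in for the exported TFRecord file paths; add
# compression_type='GZIP' if the files were exported gzip-compressed.
data = tf.data.TFRecordDataset(tfrecord_paths)
data = data.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)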

Validation dataset

Once the dataset is loaded in, we split the dataset into two. This is an important part of machine learning, where we set aside a small amount of the whole dataset. When the model is being trained on the larger portion of the dataset, known as the training data, it will not see this smaller subset, which we call the validation set. As its name suggests, the model uses this smaller fraction of information to perform a sanity check of sorts. It’s asking itself, “Okay, I think that a well pad looks like this. Am I close to the mark, or am I way off?” All of this is put in place to help the model learn the minute details and intricacies of the data we’ve provided it. Typically, we will reserve 15-30% of our total dataset for the validation set. The code necessary for splitting the dataset is shown in Figure 3 below.

# Get the full size of the dataset.
full_size = len(list(data))
print(f'Full size of the dataset: {full_size}','\n')

# Define a split for the dataset.
train_pct = 0.8
batch_size = 16
split = int(full_size * train_pct)

# Split it up.
training = data.take(split)
evaluation = data.skip(split)

# Get the data ready for training.
training = training.shuffle(split).batch(batch_size).repeat()
evaluation = evaluation.batch(batch_size)

# Define the steps taken per epoch for both training and evaluation.
TRAIN_STEPS = math.ceil(split / batch_size)
EVAL_STEPS = math.ceil((full_size - split)  / batch_size)

print(f'Number of training steps: {TRAIN_STEPS}')
print(f'Number of evaluation steps: {EVAL_STEPS}')

Figure 3: Validation split code snippet

Implementation in U-Net

Now it’s time for the fun stuff! We’re finally ready to begin setting up the model that we will be using for our segmentation task. We will be leveraging a model called a U-Net for our learning. Our implementation of the U-Net in TensorFlow follows a very similar structure to the one seen in the example here. In a nutshell, here is what’s happening in our U-Net code:

1.) The machine learning model is expecting a 256 pixel by 256 pixel by 3 band input. This is the reason why we exported our image samples in this manner from Earth Engine. Also, by chopping up the images into patches, we reduce the amount of information that needs to be stored in temporary memory at any given point. This allows our code to run without crashing.

2.) The computer scans the input through a set of encoders. An encoder’s job is to learn every little detail of the thing we’re instructing it to learn. So in our case, we want it to learn all of the intricacies that define a well pad in satellite imagery. We want it to learn that well pads are typically squares or rectangles, have well defined edges, and may or may not be in close proximity to other well pads. As the number of encoders increases further down the “U” shape of the model, it is learning and retaining more of these features that make well pads unique.

3.) As the computer creates these pixel-by-pixel classifications sliding down the “U,” it sacrifices the spatial information that the input once held. That is to say, the image no longer appears as a bunch of well pads scattered across a landscape; it looks more like a big stack of cards. All of the pixels in the original image are now classified with their newly minted predictions (i.e. “I am a well pad” or “I am not a well pad”), but they don’t have any clue where in the world they belong. The task of the upper slope of the “U” is to stitch the spatial information back onto the classified predictions generated by our model. In this light, the upward slope of the “U” is made up of filters known as decoders. The cool thing about the U-Net is that as we go further up the “U,” it grabs the spatial information saved at the corresponding level on the downward slope (these links are often called skip connections). In short, the model gives its best shot at taking these classified predictions and making them back into an image. To see a visual representation of the U-Net model, refer to Figure 4 below.

Figure 4: A graphic representing the U-Net architecture, courtesy of Ronneberger, et al.
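For readers who prefer to see the shape of this in code, here is a simplified sketch of a U-Net-style encoder-decoder written with the Keras API. The number of levels and filters are illustrative choices, not our exact architecture, which follows the example linked above.

from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU activations."""
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def build_unet(input_shape=(256, 256, 3)):
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: each block learns features, then downsamples.
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, 128)
    p3 = layers.MaxPooling2D(2)(c3)

    # Bottleneck at the bottom of the "U".
    b = conv_block(p3, 256)

    # Decoder: upsample and concatenate with the matching encoder output
    # (the skip connections that restore spatial detail).
    u3 = layers.Conv2DTranspose(128, 2, strides=2, padding='same')(b)
    c4 = conv_block(layers.concatenate([u3, c3]), 128)
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(c4)
    c5 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c5)
    c6 = conv_block(layers.concatenate([u1, c1]), 32)

    # Single sigmoid output channel: per-pixel "well pad" confidence in [0, 1].
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c6)
    return tf.keras.Model(inputs, outputs, name='UNet')

UNet = build_unet()

The concatenate calls are the skip connections that carry spatial detail from the downward slope back to the upward slope of the “U.”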

At the end of the trip through the model, we are left with an output image. This image is the model’s best guess at whether or not what we’ve fed it shows well pads. Of course, the model’s best guess will not be absolute for each and every pixel in the image. Given what it has learned about well pads (how they’re shaped, what color palette usually describes a well pad, etc.), the model returns values on a spectrum from 0 to 1. Where a value lands between these two numbers can be read as the model’s confidence in its prediction. So, for example, forested areas in the image would ideally show a confidence value near zero; conversely, drilling sites picked up in the image would have confidence values close to one. Ambiguous features in the image, like parking lots or agricultural fields, might have a value somewhere between zero and one. Depending on how well the model’s output compares to the mask associated with the three-band input, it is penalized for the mistakes it makes using what’s known as a loss function. To read more about loss functions and how they can be used, be sure to check out this helpful blog. Now that we have the model set up, we are ready to gear up for training!
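Before training, the model has to be compiled with an optimizer and a loss function. The configuration below is one reasonable choice for a binary segmentation task with a sigmoid output, not necessarily the exact settings we used.

# One plausible compile configuration; the learning rate and metrics here
# are illustrative rather than the values from our notebook.
UNet.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])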

Data augmentation

Before we start to train, we define a function that tweaks the inputs slightly every time they are seen by the model. This process is known as data augmentation. We make these small changes because we don’t have a large dataset. If we give the model a small dataset without these tweaks, it will essentially memorize the images rather than learn the characteristics of a well pad. It’s a pretty neat trick, because we can make a small dataset seem far larger than it actually is simply by mirroring an image on the y-axis or rotating it 90 degrees, for example. Our augmentation workflow is shown in Figure 5.

import numpy as np
import tensorflow as tf

# Augmentation function to pass to the Callback class below.
def augment(image, mask):
    """Randomly flip or rotate an image patch and its mask together."""
    rand = np.random.randint(100)
    if rand < 25:
        # Mirror the patch left-to-right.
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    elif rand < 50:
        # Rotate the patch 90 degrees.
        image = tf.image.rot90(image)
        mask = tf.image.rot90(mask)
    elif rand < 75:
        # Mirror the patch top-to-bottom.
        image = tf.image.flip_up_down(image)
        mask = tf.image.flip_up_down(mask)
    # Otherwise, leave the patch unchanged.
    return (image, mask)

# Callback for data augmentation: re-applies the augmentation and reshuffles
# at the start of each training batch. (The Keras hook is on_train_batch_begin.)
class aug(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        batch.map(augment, num_parallel_calls=5)
        batch.shuffle(10)

Figure 5: Augmentation function and checkpoints cell

Fitting the model to the dataset

Now it’s time to put this model to the test! We do this in a TensorFlow call known as .fit(). As the name suggests, it is going to “fit” the model to our input dataset. Let’s go ahead and take a look at the code from Figure 6, shown below. 

history = UNet.fit(
    x=training,
    epochs=model_epochs,
    steps_per_epoch=TRAIN_STEPS,
    validation_data=evaluation,
    validation_steps=EVAL_STEPS,
    callbacks=[aug(), cp, csv])

Figure 6: Fitting the model to the input dataset

It’s important to conceptually understand what each of the values passed into this function call represents. We start with the variable “x”: this expects us to pass in our training dataset, which was created earlier. The next argument is called epochs. Epochs describe how many times the model will see the entire dataset during the fitting process. This is somewhat of an arbitrary number, as some models can learn the desired information more quickly, thus requiring less training. Conversely, training a model for too long can become redundant or potentially lead to overfitting. Overfitting is when a model learns to memorize the images it’s trained on, but it doesn’t learn to generalize. Think of overfitting like memorizing a review sheet the night before a test; you memorize what is covered in the review, but any minor changes in the way questions are asked on the actual test could trip you up. For this reason, it is generally up to the user to determine how many epochs are deemed necessary based on the application. 

The next arguments, steps_per_epoch and validation_steps, describe how many batches of data are taken from the training and validation sets, respectively, during each epoch. Batches are small chunks of the dataset; dividing the dataset into batches makes the training process more computationally efficient. One would typically want to go through the whole dataset every epoch, so it’s best to set the steps accordingly. Validation_data is where we specify the data we set aside earlier to validate the model’s predictions. Remember, that data will not be seen by the model during its training cycle. The last argument is called callbacks. This is where we pass in the augmentation function; our callback instructs it to run at the beginning of each new batch, constantly changing the data during training. We also optionally pass in other callbacks that are useful for later reference to our training session, such as one that exports the loss and metrics to our Google Drive in comma-separated values format, or one that saves checkpoints throughout training, keeping track of which epoch produces the lowest loss. There are many other pre-packaged callbacks; a full list can be found here. Now that we have all of that covered, it’s time to start learning! By running this code, we begin the training process, which continues until the model has finished running through all of the epochs we specified.
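For completeness, the cp and csv callbacks passed to .fit() in Figure 6 can be defined with standard Keras callbacks along these lines; the file paths are placeholders.

# Save the weights from the epoch with the lowest validation loss.
cp = tf.keras.callbacks.ModelCheckpoint(
    filepath='unet_checkpoint.h5',
    monitor='val_loss',
    save_best_only=True)

# Append the loss and metrics for every epoch to a CSV file.
csv = tf.keras.callbacks.CSVLogger('training_log.csv')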

Once that has finished, we save the model and plot its metrics and its loss, as shown in Figure 7. Based upon how these plots look, we can tell how well we did during our training.

Figure 7: An example chart, showing plotted metrics (top) and loss (bottom). Metrics are used to evaluate the performance of our model, while loss is directly used during training to optimize the model. As such, a good model will have a greatly reduced loss by the time we reach the end of training.
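Saving the model and plotting curves like those in Figure 7 takes only a few lines; the file name below is a placeholder, and history is the object returned by .fit() in Figure 6.

import matplotlib.pyplot as plt

# Save the trained model for later prediction work.
UNet.save('well_pad_unet.h5')

# Plot the training and validation loss recorded during fitting.
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()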

And voila! You have made it through the second installment in our series. The next entry will cover post-processing steps of our machine learning workflow. Questions we will answer include:

– How do we make predictions on an image we’ve never seen before?

– How do we take a large image and chop it into smaller, more manageable pieces? 

– How do we take some new predictions and make them into polygons?

Stay tuned for our next entry, brought to you by Dr. Ry Covington, SkyTruth’s Technical Program Director. In case you missed it, be sure to check out the first post in this series. Happy skytruthing!

Drilling Detection with Machine Learning Part 1: Getting to Know The Training Data

Intern Sasha Bylsma explains the first steps in teaching computers how to detect oil and gas well pads from satellite imagery.

[This blog post is the first in a three-part series describing SkyTruth’s effort to automate the detection of oil and gas well pads around the world using machine learning. This tool will allow local communities, conservationists, researchers, policymakers, and journalists to see for themselves the growth of drilling in the areas they care about. By the end of the series, we will have demystified the basic principles of machine learning applied to satellite imagery, and will have a technical step-by-step guide for others to follow. At SkyTruth, we aim to inspire people to protect the environment as well as educate those who want to learn from our work to develop applications themselves that protect people and the planet.]

Detecting environmental disturbances and land use change requires two things: the right set of technological tools to observe the Earth and a dedicated team to discover, analyze, and report these changes. SkyTruth has both. It is this process of discovery, analysis, and publication — this form of indisputable transparency that SkyTruth offers by bringing to light the when, the where, and hopefully the who of environmental wrongdoings — that appealed to me most about this organization, and what ultimately led me to apply for their internship program this summer.

In my first weeks as an intern, I was tasked with analyzing dozens of ocean satellite images, searching for oil slicks on the sea surface left behind by vessels, which show up as long black streaks. As a student with an emerging passion for Geographic Information System science (GIS), I was eager to find a more efficient way to scan the oceans for pollution. I wished I could simply press a “Next” button and have tiles of imagery presented to me, so I could search for patterns of oil quickly. It was a relief to find out that my coworkers were developing such a solution. Instead of relying on me and others to scan imagery and recognize the patterns of oil, they were training a computer to do it for us, a project called Cerulean. They were using machine learning and computer vision to teach a model to learn the visual characteristics of oil spills in images, pixel by pixel, after giving it many examples. I was really interested in getting involved in this work, so I asked if I could join a different project using machine learning: detecting new oil and gas drilling sites being built in important habitat areas in Argentina. 

Creating the training data

One of my first tasks was to organize a handful of existing polygons that we would use to create training data, which is what we call the information that is used to teach the model. Using Google Earth Engine, I placed the polygons over imagery collected from the European Space Agency’s Sentinel-2 satellites. The imagery is pretty coarse – these satellites collect images with a 10 meter spatial resolution. This means that every pixel in the image covers 100 square meters on the ground. The imagery can also be pretty cloudy, so one of the first things that I had to do was remove the clouds by creating cloud-free composite images. Basically, this combines several images of the same place, but at different times, and only uses the pixels that aren’t cloudy. This allowed me to create a single, cloud-free image of each of our sample areas. Once we’d done that, we were ready to make examples for the model to take in. 
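For readers curious what that compositing step can look like in practice, here is a rough sketch using the Earth Engine Python API. The area of interest, date range, and cloud threshold are placeholder values, and the QA60-based cloud mask shown here is the standard Sentinel-2 approach rather than necessarily the exact recipe we used.

import ee
ee.Initialize()

# Hypothetical area of interest; replace with a real geometry.
aoi = ee.Geometry.Rectangle([-104.9, 39.9, -104.5, 40.2])

def mask_s2_clouds(image):
    # The QA60 band flags opaque clouds (bit 10) and cirrus (bit 11).
    qa = image.select('QA60')
    clouds = qa.bitwiseAnd(1 << 10).eq(0)
    cirrus = qa.bitwiseAnd(1 << 11).eq(0)
    return image.updateMask(clouds.And(cirrus))

# Combine many scenes of the same place into one cloud-free image by
# masking cloudy pixels and taking the per-pixel median.
composite = (ee.ImageCollection('COPERNICUS/S2')
             .filterBounds(aoi)
             .filterDate('2019-01-01', '2019-12-31')
             .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
             .map(mask_s2_clouds)
             .median())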

Figure 1 shows a visual representation of the process that my colleagues and I developed to create the training data. On the left, we have a view of two well pads in Colorado from the default satellite base map in Google Earth Engine. This is the same imagery that you would see in the “Satellite” view of Google Maps; it’s very high resolution commercial satellite imagery, so it’s easy to see objects like these drilling sites in great detail. In the middle is a Sentinel-2 image of the same well pads. Sentinel-2 imagery is publicly available for free, and it is the imagery source that we use for our model. On the right, we have the Sentinel-2 image overlain with the well pad polygons that I’ve manually drawn.

Figure 1: Overlaying well pad polygons onto Sentinel-2 images

From here, we want to be able to select an area that captures each well pad and its surroundings. To accomplish this, we take the center of each blue well pad polygon, create a buffer of 200 meters, and then select a random spot within that circular buffer zone to drop a point, which appears below as a red dot. Figure 2 illustrates this step.

Figure 2: Area capturing well pads
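A sketch of that buffering step in the Earth Engine Python API might look like the following, where well_pads is a placeholder ee.FeatureCollection of the hand-drawn polygons.

# For each polygon, buffer its centroid by 200 meters and drop one random
# point inside that buffer; these points later become patch centers.
def random_point_near(feature):
    buffer_zone = feature.geometry().centroid().buffer(200)
    point = ee.FeatureCollection.randomPoints(buffer_zone, 1).first()
    return ee.Feature(point)

sample_points = well_pads.map(random_point_near)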

These red dots are then used as the center of a 256 pixel by 256 pixel square – what we call a patch – that will house the well pad and its surroundings. I’ve illustrated what this box would look like in Figure 3, just using the left well pad for simplicity.

Figure 3: A “patch” housing a well pad and its surroundings

Next, we need to classify the image into “well pad” and “no well pad” labels. We create a binary mask with white representing well pad areas and black representing no well pad areas. This mask covers the entire image, and Figure 4 is a closeup of the two well pads we’ve been looking at with the boundary box visible as well.

Figure 4: A mask with white representing well pad, and black representing everything else
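In Earth Engine, a binary mask like this can be produced by “painting” the polygons onto a blank image, roughly as sketched below (again, well_pads is a placeholder for our polygon collection).

# Start from an all-zero (black) image and burn a value of 1 (white) into
# every pixel covered by a well pad polygon.
label_image = (ee.Image(0)
               .byte()
               .paint(well_pads, 1)
               .rename('Label'))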

Finally, let’s zoom into the extent of the white boundary and put it all together.

This pair of small pictures in Figure 5 – the image patch on the left and the image’s label on the right – is what the model sees. Well, almost. Let’s break it down. Every image is made up of pixels that have numerical values for the amount of red, green, and blue in that pixel — three colors. If you can imagine each pixel as being three numbers deep, you can then imagine that the colored image on the left is a matrix with the dimensions 256 x 256 x 3. The right image is a matrix with the dimensions 256 x 256 x 1, since it only has one channel storing the label: 0’s for black pixels and 1’s for white pixels. One of these pairs – an image and its label – constitutes a single example that will go into the model.   

Thousands of examples

In order for the model to learn, we needed to create thousands of examples. So, we mapped well pads in Colorado, New Mexico, Utah, Nevada, Texas, Pennsylvania, West Virginia, and Argentina to use for training examples. My team and I tried to keep a couple of additional things in mind. First, we created a dataset of “look-alike” well pads, meaning that we found areas that the model could easily mistake for a well pad (such as square parking lots, plots of farmland, housing developments, etc.) and made labels for them. This indicates to the model that although these examples share similar features, they are not well pads, and it strengthens the model by refining its definition of what a well pad is, showing it what a well pad isn’t.

Second, we made sure to capture some variation in the appearance of well pads. While some are very bright in contrast to their landscape, others are darker than their surroundings, and some, especially in the American West, are essentially a dirt patch on desert land. By collecting training data of both the obvious well pads and the harder-to-distinguish ones, we added variance and complexity to the training data. Since in the real world some well pads are old, some have been overgrown by vegetation, and some are covered with equipment, it’s important to include several examples of these special cases in the training data so that the model can recognize well pads regardless of the condition they might be in.

Prepping the training data for the model

To complete the process of preparing the training data, I packaged up the examples as TFRecords, a data format that is ideal for working with TensorFlow, a popular machine learning platform. These TFRecords will be fed into the model so that it can learn the visual characteristics of well pads well enough to be able to detect drilling sites in previously unseen imagery. 
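The export itself can be sketched roughly as follows. This follows a common Earth Engine pattern of stacking the image and label bands, converting each sample point’s neighborhood into a 256 x 256 patch, and exporting the resulting table as TFRecords; the band names, objects (composite, label_image, sample_points), and export options are placeholders rather than our exact script.

# Stack the three image bands and the label band into one image,
# renaming the Sentinel-2 bands to R, G, B to match the model's inputs.
feature_stack = ee.Image.cat([
    composite.select(['B4', 'B3', 'B2'], ['R', 'G', 'B']),
    label_image
]).float()

# A 256 x 256 kernel of ones turns each sample point into a full patch.
patch_weights = ee.List.repeat(ee.List.repeat(1, 256), 256)
patch_kernel = ee.Kernel.fixed(256, 256, patch_weights)
patches = feature_stack.neighborhoodToArray(patch_kernel)

# Sample one patch at each point, then export the table as TFRecords.
samples = patches.sampleRegions(collection=sample_points, scale=10, tileScale=8)
task = ee.batch.Export.table.toDrive(
    collection=samples,
    description='well_pad_training_patches',
    fileFormat='TFRecord')
task.start()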

Now that we’ve discussed how to develop the training data, Geospatial Analyst Brendan Jarrell will explain how we developed our model in the second post in this series. 

 

 

Correcting Recent Reporting on Offshore Flaring in Guyana

Recent reporting misrepresented SkyTruth data.

We’re always glad to have conservation-minded groups and individuals use our flaring maps, but we would like to correct some errors in how our data was interpreted in two recent articles in the Stabroek News concerning natural gas flaring from an ExxonMobil-owned vessel, the Liza Destiny, anchored off the coast of Guyana. 

In early June, 2020, the Guyana Marine Conservation Society (GMCS) contacted SkyTruth to see if we could help monitor natural-gas flaring from the Liza Destiny. The Liza Destiny had mechanical issues that required it to continuously flare, and GMCS wanted to be able to verify the flaring that ExxonMobil was reporting.

This isn’t a request that SkyTruth can normally help with, but the unique circumstances surrounding the Liza Destiny allowed us to provide GMCS with some meaningful data. Our global flaring map is a visualization of flaring events detected around the world, every day, using satellite data. The source of our data is the Earth Observation Group, which identifies flaring based on measurements of brightness and temperature captured by National Oceanic and Atmospheric Administration satellites. Due to the low level of detail of these images (each pixel represents a spot on the ground about 750 meters across), we usually can’t pinpoint flaring to a specific source such as an individual oil or gas well. However, since there were no other flaring vessels near the Liza Destiny, we could confidently assign all flaring events within the satellite’s accuracy to this vessel. 

In mid-July, GMCS asked for an update containing the most recent data, which we provided by way of this document. The ensuing article in Stabroek News on July 25, 2020, erroneously claimed that our data showed the Liza Destiny was flaring from June 27 through July 7, a period when ExxonMobil reported to the Guyana EPA that there was no flaring because the vessel was undergoing maintenance.

Contrary to what the article suggests, the data SkyTruth provided did not contradict ExxonMobil. Our data did not show flaring on these dates, with the exception of June 28. It’s important to note that the lack of flaring in our data for that time period doesn’t conclusively prove there was no flaring, because clouds can block the satellites’ ability to “see” flares. 

And none of this is to imply that there are not legitimate concerns about the persistent, long-term flaring at this vessel documented in the data we shared with GMCS. 

SkyTruth Visualization and App of Drilling Near Chaco Canyon Available to Activists and Others

The Bureau of Land Management has permitted intensive oil and gas drilling around Chaco Culture National Historical Park, threatening a landscape that supports one of the most important cultural sites in the world.

[This discussion of the threats to Chaco Culture National Historical Park, and SkyTruth tools highlighting that threat, was written as a collaborative effort between SkyTruth team members Matthew Ibarra and Amy Mathews.]

Reminders of an ancient civilization dominate the desert landscape in northwestern New Mexico. Ruins of massive stone “Great Houses,” once several stories high with hundreds of rooms, remain at Chaco Culture National Historical Park. Their complexity and numbers reveal that a sophisticated culture thrived in this place a thousand years ago. Descendants of those native peoples — today’s Pueblo tribes and several Navajo clans — say that Chaco was a central gathering place where people shared ceremonies, traditions, and knowledge. Yet much about Chaco remains a mystery. During the late 1200s, construction of buildings and monuments slowed and the Chacoan people moved from the area. However, Chaco is still considered to be a spiritual and sacred place by many Native Americans. 

Parts of Chaco were first designated as a national monument by President Theodore Roosevelt in 1907. Eighty years later the United Nations recognized the monument as a World Heritage Site because of its unique cultural significance. Despite these protections, the area surrounding the park is now threatened. 

Over the past two decades, the federal Bureau of Land Management (BLM) has allowed oil and gas companies to drill hundreds of wells within 15 miles of the park using the technique known as hydraulic fracturing, or fracking. Fracking typically creates air and noise pollution, threatens water supplies, increases truck traffic on local roads, and harms communities with toxic chemicals. SkyTruth’s data on fracking in Pennsylvania has been used by scientists at Johns Hopkins University to demonstrate some of the harmful health effects associated with fracking. 

Many tribal groups have voiced concerns about the spiritual, cultural, physical and health impacts from drilling in the area. In September of 2019, the U.S. House of Representatives approved the Chaco Cultural Heritage Area Protection Act that would create a 10-mile buffer zone on federal lands around the park to prevent any future leasing for oil and natural gas drilling. Although the entire New Mexico congressional delegation supports this legislation, the Senate has not taken action on this bill. Reportedly, the bill does not have Republican support in the Senate, which substantially reduces its chances of becoming law under the current majority.

To illustrate the extent of drilling in recent years, SkyTruth created an animation of wells surrounding Chaco Culture National Historical Park, drawing on data from New Mexico’s Oil Conservation Division and using the most current imagery from the U.S. Department of Agriculture’s National Agriculture Imagery Program (NAIP) as a backdrop. (See “About the Data” below to learn more about how we did this.) The visualization shows the growth of wells throughout the region surrounding the park, with distances from the park boundary delineated. New wells have emerged throughout the region in this time period, from the park boundary to 15 miles and beyond. The region within 15 miles of the park now contains 33% more oil and gas wells than it did in 2000 — an increase of 367 wells.

The growth of oil and gas wells within a 15-mile radius of Chaco Culture National Historical Park from 2000 – 2018

Despite local opposition and congressional action, the BLM currently is proposing additional leasing for drilling around the park. The public comment period for input on this leasing plan has been extended to September 25th, 2020. (Click here for information on how to submit comments.) 

In addition to the animation of drilling build out, SkyTruth has also created still images showing the changes around Chaco Canyon from 2000-2018. Each image highlights change in drilling activity for the year and features the most recent NAIP imagery from 2018 as a backdrop.

Still images for each year in the animation

SkyTruth also has developed an interactive app that allows users to view a map of Chaco Culture National Historical Park and all of the oil and gas wells in its surrounding area. Users can click on a well pad to see more information, such as the well pad identification number or the status of the well (for example, whether it is being plugged or is still fully operational). (See the “About the App” section below.) This app can be viewed here.

Chaco Culture National Historical Park is home to the largest and best preserved ancient architectural structures in all of North America. It was home to communities throughout the 1000s and remains important to Native Americans and others. Today, this magnificent region is becoming an industrialized area cluttered with oil and gas wells, threatening the people who honor this place of heritage. SkyTruth hopes the visualizations and tools we’ve created will help arm activists, draw attention to the leasing process, and support congressional action to protect a remarkable place.

About the Data

The data used to identify wells comes from New Mexico’s Oil Conservation Division. The link for this dataset can be found here, labeled Public FTP Site. This large dataset was analyzed to create buffer zones based on the distance to Chaco Culture National Historical Park. The dataset was used in QGIS — a geographic information system tool — alongside NAIP imagery exported from Google Earth Engine to create an accurate map of the data and wells. We used TimeManager, a plugin for QGIS, to create this visualization. TimeManager allows users to easily add data to the working map based on time. Wells were added to the working map by month starting from January of 2000 through September of 2019, creating over 200 still images. TimeManager also allows users to export these still images as frames to create an MP4 file. We then used Final Cut Pro to add an overlay over this MP4 and create a visualization with a legend, scale bar, and other necessary features. 

About the App  

The Chaco Canyon Well Inspector app allows users to pan and zoom around an interactive map and inspect each individual well around the Chaco Canyon area. Upon clicking an individual well point, data such as the well identification number (API number) and status becomes visible to the user. Users will be able to inspect the area surrounding Chaco Culture National Historical Park to see how the growing number of wells has impacted the surrounding area and gain a better understanding of the status of oil and gas wells in the area.

Update 9/9/20: The animation, photos, and Chaco Canyon Well Inspector app were updated to reflect the spud date at each site; that is, the date when wells officially broke ground for drilling.