helloworlddata/usgs-earthquakes

U.S. Geological Survey Earthquake Events Archive

The U.S. Geological Survey (USGS) Earthquake Hazards Program has a searchable catalog of earthquake data, including geospatial coordinates, date/time to the millisecond, and magnitude. This repository packages the data in easy-to-download CSV format for analysis, including packages small enough to fit into memory or a spreadsheet.
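As a rough illustration of working with one of these packages in memory, here is a minimal sketch that keeps only events of at least a given magnitude. It assumes the packaged CSVs keep the column names used in the USGS CSV format (`mag`, `place`, and so on); the sample rows are made up for the example and are not real USGS records.

```python
import csv
import io


def filter_by_magnitude(csv_text, min_mag=3.0):
    """Return the rows whose magnitude is at least min_mag."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("mag") or "").strip()
        if not raw:
            continue  # skip events with no recorded magnitude
        if float(raw) >= min_mag:
            rows.append(row)
    return rows


# Made-up sample in the assumed column layout (not real USGS data).
SAMPLE = """time,latitude,longitude,mag,place
2016-01-01T00:00:00.000Z,36.0,-97.0,3.4,Example A
2016-01-02T00:00:00.000Z,35.5,-97.5,2.1,Example B
2016-01-03T00:00:00.000Z,36.2,-96.9,,Example C
"""

strong = filter_by_magnitude(SAMPLE)
print([row["place"] for row in strong])  # ['Example A']
```

For a real package, read the file's text (or pass an open file to `csv.DictReader` directly) instead of using `SAMPLE`.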

Here's an example of a multi-layered data visualization (URL TK) created using ggplot, showing the massive increase of M3.0+ earthquakes in Oklahoma in the last decade:

img

As an animated GIF:

Earthquakes of at least magnitude 3.0 in the lower contiguous United States

The data packages

Here is the USGS data broken down into several easy-to-download packages:

The stories

In the past few years, the most prominent use of the USGS data has been to examine the sudden surge of significant earthquakes in Oklahoma, purportedly due to fracking.

Here's a vignette (URL TK) using ggplot and ggmap showing the earthquakes by year within Oklahoma:

ok

On September 3, 2016, the USGS recorded an M5.6 earthquake near Pawnee, Oklahoma, the state's largest earthquake in recorded history.

NPR's StateImpact project has been covering the Oklahoma earthquakes and fracking for the past few years.

Drilling data is spotty, which makes it difficult to understand how drilling activity has changed Oklahoma's underground structure. From the New Yorker longform story, Weather Underground (April 2015):

“We know more about the East African Rift than we know about the faults in the basement in Oklahoma.” In seismically quiet places, such as the Midwest, which are distant from the well-known fault lines between tectonic plates, most faults are, instead, cracks within a plate, which are only discovered after an earthquake is triggered. The O.G.S.’s Austin Holland has long had plans to put together two updated fault maps, one using the available published literature on Oklahoma’s faults and another relying on data that, it was hoped, the industry would volunteer; but, to date, no updated maps have been released to the public.

Recently, researchers and scientists have attempted to show a correlation between drilling activity and Oklahoma's earthquakes. Here's a paper funded by the Stanford Center for Induced and Triggered Seismicity:

"Oklahoma’s recent earthquakes and saltwater disposal" by F. Rall Walsh III and Mark D. Zoback, Science Advances, June 18, 2015 (abstract, PDF, Stanford news article)

Fig. 1 Earthquakes and injection wells in Oklahoma.

Data Sources

Development notes (i.e. about this data extraction/packaging process)

This is just me hacking around until I find proper conventions for a hand-operated extract-transform-load system that is built entirely around hosting flat files on GitHub (and the limitations thereof).

The Rakefile contains all the tasks needed to fetch, process, and package the data. The final products are in data/.

For example, to generate the file data/usgs-earthquakes-decade-1970.csv:

# create the subdirectories, including wrangle/corral
$ rake setup

# run it with --build-all to force it to rebuild dependencies
$ rake data/usgs-earthquakes-decade-1970.csv

This will run a series of Python 3 scripts in wrangle/scripts/, each of which is a one-off task that writes its output to stdout.

The file data/usgs-earthquakes-decade-1970.csv depends on 120 separate data files (one per month of the decade) in an untracked directory named wrangle/corral/fetched. Running the rake task will build that directory and fetch the required data from the USGS archive:

└── wrangle
    └── corral
        └── fetched
            ├── 1970-01.csv
            ├── 1970-02.csv
            ├── 1970-03.csv
            ...
            ├── 1979-10.csv
            ├── 1979-11.csv
            ├── 1979-12.csv

Each file represents a month's worth of data, e.g. 1970-03.csv represents the earthquake data in the USGS archive for March 1970.
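To make the month-to-decade relationship concrete, here is a minimal sketch of listing the monthly filenames for a decade and concatenating monthly CSV texts into a single stream with one header row. The function names `month_filenames` and `concat_monthly` are hypothetical; this is not the code in wrangle/scripts/, just one way the packaging step could look.

```python
import csv
import io


def month_filenames(decade_start):
    """List the monthly filenames for a decade, e.g. 1970-01.csv .. 1979-12.csv."""
    return [
        f"{year}-{month:02d}.csv"
        for year in range(decade_start, decade_start + 10)
        for month in range(1, 13)
    ]


def concat_monthly(csv_texts, out):
    """Write several CSV texts to `out`, keeping only the first header row."""
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:
            continue  # empty month file: nothing to copy
        if not header_written:
            writer.writerow(header)
            header_written = True
        for row in reader:
            if row:  # ignore blank lines
                writer.writerow(row)


names = month_filenames(1970)
print(len(names), names[0], names[-1])  # 120 1970-01.csv 1979-12.csv
```

In the spirit of the scripts described above, `out` could be `sys.stdout`, with the Rakefile redirecting the output into data/.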

This fetching is done by wrangle/scripts/fetch_month_from_archive.py, which is just a thin wrapper around this call:

http://earthquake.usgs.gov/fdsnws/event/1/query.csv?starttime=1970-03-01%2000:00:00&endtime=1970-04-01%2000:00:00&orderby=time-asc
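For illustration, a minimal sketch of how a one-month query URL like the one above could be built. The function name `month_query_url` is hypothetical and the real script may construct the URL differently; the base URL and parameters are taken from the example call.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://earthquake.usgs.gov/fdsnws/event/1/query.csv"


def month_query_url(year, month):
    """Build the archive query URL covering one calendar month."""
    # The end of the window is the first instant of the following month.
    if month == 12:
        next_year, next_month = year + 1, 1
    else:
        next_year, next_month = year, month + 1
    params = {
        "starttime": f"{year:04d}-{month:02d}-01 00:00:00",
        "endtime": f"{next_year:04d}-{next_month:02d}-01 00:00:00",
        "orderby": "time-asc",
    }
    # quote (not quote_plus) encodes spaces as %20; safe=":" leaves colons as-is.
    return BASE_URL + "?" + urlencode(params, safe=":", quote_via=quote)


print(month_query_url(1970, 3))
```

Using the first day of the next month as `endtime` avoids having to know how many days each month has.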

Why does the fetching script only pull one month at a time? Because the USGS Archive won't return more than 20,000 hits per query, and fetching by month bypasses the need to write pagination logic in the fetching script.
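Since the month-at-a-time approach only works while every month stays under the cap, a sanity check on each fetched file can be useful. The sketch below counts the data rows in a month's CSV text and flags months that are getting close to 20,000. The helper names and the 90% threshold are assumptions for illustration, and I have not confirmed how the archive responds when a query exceeds the cap (it may reject the request rather than truncate the results).

```python
import csv
import io

QUERY_LIMIT = 20_000  # per-query cap mentioned above


def count_events(csv_text):
    """Count data rows in a CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header, if there is one
    return sum(1 for row in reader if row)


def near_limit(event_count, limit=QUERY_LIMIT, fraction=0.9):
    """Return True when a month's event count is within `fraction` of the cap."""
    return event_count >= limit * fraction


sample = "time,mag\n1970-03-01T00:00:00.000Z,4.1\n1970-03-02T00:00:00.000Z,5.0\n"
print(count_events(sample), near_limit(count_events(sample)))  # 2 False
```

A month that trips this check would need a narrower window (for example, half-months) or real pagination.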

Why does the data/ directory contain packages of arbitrary time periods, e.g. data/usgs-earthquakes-2010-through-2014.csv and data/usgs-earthquakes-decade-1980.csv? Because GitHub has a file size limit of 100MB, so the time periods are chosen to keep each package under that limit.
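A quick way to check packages against that limit before committing is sketched below. The function name `files_over_limit` is hypothetical, and the sketch assumes the limit means 100 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: treat the 100MB limit as 100 * 1024 * 1024 bytes.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024


def files_over_limit(directory, limit_bytes=GITHUB_FILE_LIMIT):
    """Return the names of CSV files in `directory` larger than limit_bytes."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.csv")
        if path.stat().st_size > limit_bytes
    )
```

Calling `files_over_limit("data")` from the repository root would list any package that needs to be split into a shorter time period.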

There is also a soft limit on the total size of a repo. For that reason, the wrangle/corral folder, where fetched files are stored, is not tracked.

This repo is not meant to be a direct mirror of the USGS archive; it contains easy-to-access packages of data for educational and experimental purposes.

http://www.nytimes.com/2016/09/04/us/earthquake-ties-record-for-strongest-in-oklahoma-history.html

About

Data fetched from the USGS Earthquake Archives
