U.S. Geological Survey Earthquake Events Archive

The U.S. Geological Survey (USGS) Earthquake Hazards Program maintains a searchable catalog of earthquake data, including geospatial coordinates, date/time to the millisecond, and magnitude. This repository packages the data as easy-to-download CSV files for analysis, including packages small enough to fit into memory or a spreadsheet.

Here's an example of a multi-layered data visualization (URL TK) created using ggplot, showing the massive increase of M3.0+ earthquakes in Oklahoma in the last decade:

As an animated GIF:

The data packages

Here is the USGS data broken down into several easy-to-download packages:

The stories

In the past few years, the most prominent use of the USGS data has been to examine the sudden surge of significant earthquakes in Oklahoma, purportedly due to fracking.

Here's a vignette (URL TK) using ggplot and ggmap showing the earthquakes by year within Oklahoma:

On September 3, 2016, the USGS recorded a M5.6 earthquake near Pawnee, Oklahoma, the state's biggest earthquake in recorded history.

NPR's StateImpact project has been covering the Oklahoma earthquakes and fracking for the past few years.

Drilling data is spotty, which makes it difficult to understand how activity has changed Oklahoma's underground structure. From the New Yorker longform story, Weather Underground (April 2015):

“We know more about the East African Rift than we know about the faults in the basement in Oklahoma.” In seismically quiet places, such as the Midwest, which are distant from the well-known fault lines between tectonic plates, most faults are, instead, cracks within a plate, which are only discovered after an earthquake is triggered. The O.G.S.’s Austin Holland has long had plans to put together two updated fault maps, one using the available published literature on Oklahoma’s faults and another relying on data that, it was hoped, the industry would volunteer; but, to date, no updated maps have been released to the public.

Recently, researchers have attempted to show a correlation between drilling activity and Oklahoma's earthquakes. Here's a paper funded by the Stanford Center for Induced and Triggered Seismicity:

Oklahoma’s recent earthquakes and saltwater disposal, by F. Rall Walsh III and Mark D. Zoback, Science Advances, June 18, 2015 (abstract, PDF, Stanford news article)

Data Sources

Archive search form: http://earthquake.usgs.gov/earthquakes/search/
USGS CSV specification: http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php
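
As a rough illustration of working with files in this format, here is a minimal Python sketch that filters rows by magnitude and a bounding box. The sample rows are made up for illustration and are not real events; the column names are a subset of those in the USGS CSV specification, and both the bounding box for Oklahoma and the helper name filter_quakes are assumptions of this example, not part of the repo.

```python
import csv
import io

# Made-up sample rows for illustration only (not real events). The column
# names are a subset of those in the USGS CSV specification.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2011-01-01T00:00:00.000Z,35.5,-97.2,5.0,3.2,Example place A
2011-01-02T00:00:00.000Z,36.1,-120.4,8.0,4.1,Example place B
2011-01-03T00:00:00.000Z,36.4,-96.9,4.0,2.1,Example place C
"""

# Approximate bounding box around Oklahoma (an assumption of this sketch).
OK_BOX = {"min_lat": 33.6, "max_lat": 37.0, "min_lon": -103.0, "max_lon": -94.4}


def filter_quakes(rows, min_mag, box):
    """Yield rows at or above min_mag whose epicenter falls inside box."""
    for row in rows:
        if not row["mag"]:
            continue  # skip rows with an empty magnitude field, if any
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        mag = float(row["mag"])
        in_box = (box["min_lat"] <= lat <= box["max_lat"]
                  and box["min_lon"] <= lon <= box["max_lon"])
        if mag >= min_mag and in_box:
            yield row


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
hits = list(filter_quakes(reader, 3.0, OK_BOX))
for row in hits:
    print(row["time"], row["mag"], row["place"])
```

To run this against one of the packaged files, replace the io.StringIO(...) source with an open file handle. A bounding box is only a crude stand-in for a state boundary; a real analysis would likely use a proper polygon.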

Development notes (i.e. about this data extraction/packaging process)

This is just me hacking around until I find proper conventions for a hand-operated extract-transform-load system built entirely around hosting flat files on GitHub (and the limitations thereof).

The Rakefile contains all the tasks needed to fetch, process, and package the data. The final products are in data/.

For example, to generate the file data/usgs-earthquakes-decade-1970.csv:

# create the subdirectories, including wrangle/corral
$ rake setup

# run it with --build-all to force it to rebuild dependencies
$ rake data/usgs-earthquakes-decade-1970.csv

This will run a series of Python 3 scripts in wrangle/scripts/, which are all one-off tasks that spit out to stdout.
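
To give a feel for what such a one-off script might look like, here is a hypothetical sketch (not one of the actual scripts in wrangle/scripts/) that concatenates several CSV files and writes the result to stdout, emitting the header row only once. The function name concat_csv is invented for this example.

```python
import csv
import sys


def concat_csv(paths, out=sys.stdout):
    """Write the rows of every CSV in paths to out, with a single header."""
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:
                continue  # empty file: nothing to copy
            if not header_written:
                writer.writerow(header)
                header_written = True
            # Remaining rows are data; the header of later files is dropped.
            writer.writerows(reader)


if __name__ == "__main__":
    # Usage (hypothetical): python concat.py 1970-01.csv 1970-02.csv > out.csv
    concat_csv(sys.argv[1:])
```

Writing to stdout keeps the script ignorant of output paths, so a build tool such as Rake can decide where the result goes by redirecting the output.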

The file data/usgs-earthquakes-decade-1970.csv is dependent on 120 separate data files in an untracked directory named wrangle/corral/fetched. Running the rake task will build that directory and fetch the required data from the USGS archive:

└── wrangle
    └── corral
        └── fetched
            ├── 1970-01.csv
            ├── 1970-02.csv
            ├── 1970-03.csv
            ...
            ├── 1979-10.csv
            ├── 1979-11.csv
            ├── 1979-12.csv

Each file represents a month's worth of data, e.g. 1970-03.csv represents the earthquake data in the USGS archive for March 1970.
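
The count of 120 follows from 10 years times 12 months. A small sketch of how the monthly file names for a decade could be enumerated is shown here; the helper name decade_month_files is an assumption of this example, and the Rakefile may compute its dependencies differently.

```python
from pathlib import PurePosixPath

# Directory named in this README; untracked in the repo.
FETCHED_DIR = PurePosixPath("wrangle/corral/fetched")


def decade_month_files(decade_start):
    """Return the YYYY-MM.csv paths for the ten years from decade_start."""
    return [
        FETCHED_DIR / f"{year}-{month:02d}.csv"
        for year in range(decade_start, decade_start + 10)
        for month in range(1, 13)
    ]


files = decade_month_files(1970)
print(len(files))            # 120
print(files[0], files[-1])   # first and last monthly file of the decade
```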

This fetching is done by wrangle/scripts/fetch_month_from_archive.py, which is just a thin wrapper around this call:

http://earthquake.usgs.gov/fdsnws/event/1/query.csv?starttime=1970-03-01%2000:00:00&endtime=1970-04-01%2000:00:00&orderby=time-asc

Why does the fetching script only pull one month at a time? Because the USGS Archive won't return more than 20,000 hits per query, and fetching by month bypasses the need to write pagination logic in the fetching script.
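
The per-month query shown above can be rebuilt with the Python standard library. This is a sketch and not the code in fetch_month_from_archive.py; the function name month_query_url is an assumption. The end time is the first day of the following month, so December rolls over into January of the next year.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "http://earthquake.usgs.gov/fdsnws/event/1/query.csv"


def month_query_url(year, month):
    """Build the archive query URL covering one calendar month."""
    start = date(year, month, 1)
    # First day of the next month; December wraps to January.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    params = {
        "starttime": f"{start.isoformat()} 00:00:00",
        "endtime": f"{end.isoformat()} 00:00:00",
        "orderby": "time-asc",
    }
    # quote_via=quote encodes spaces as %20; safe=":" leaves colons as-is,
    # matching the URL shown in this README.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":")


print(month_query_url(1970, 3))
```

A caller could pass the result to urllib.request.urlopen and save the response body as 1970-03.csv.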

Why does the data/ directory contain packages of arbitrary time periods, e.g. data/usgs-earthquakes-2010-through-2014.csv and data/usgs-earthquakes-decade-1980.csv? Because GitHub has a file size limit of 100 MB.

There is also a soft limit on the total size of a repo. For that reason, the wrangle/corral folder, which is where fetched files are stored, is not tracked.

This repo is not meant to be a direct mirror of the USGS archive, but to contain easy-to-access packages of data for educational/experimental purposes.

New York Times coverage of the September 2016 Oklahoma earthquake: http://www.nytimes.com/2016/09/04/us/earthquake-ties-record-for-strongest-in-oklahoma-history.html
