Saturday, February 27, 2016

Scraping crowd-sourced shake reports to produce a cumulative shake map for Oklahoma earthquakes

Last year, Oklahoma had more earthquakes than ever before. In 2015, the Oklahoma Geological Survey (OGS) counted 5,691 earthquakes[1] centered in the state. That’s 270 more quakes than what Oklahoma experienced in 2014.

Along with more reports of earthquakes, came more reports of earthquake damage[2]. In one of the worst earthquake swarms in 2015, a chimney was torn from a house in Edmond[3], and an exterior wall of bricks came tumbling down from an apartment complex in northeast Oklahoma City[4]Much has been learned since Oklahoma's earthquake surge began in 2009. 

Scientists now link these earthquakes to the injection of waste water into deep disposal wells[5]. Water exists naturally in the earth along with oil and gas deposits, and when the oil and gas is drawn from the earth, the water comes with it. This water is separated from the oil and gas, and is disposed of in deep wells. Because these quakes are caused by human activities, they are known as “induced earthquakes.”[6]

Many questions remain, however. Namely, what are the long-term effects of having so many small earthquakes so frequently? And how is it possible to compare the impact of these quakes across Oklahoma? The United States Geological Survey produces damage estimates automatically after significant earthquakes, but it does not produce damage estimates for swarms of smaller earthquakes, which may last for months or years.

The difficult task of measuring a quake

An earthquake swarm in Edmond, OK, caused significant property damage for some homes in December, 2015. Photo by OKC station News9.

Instead of damage assessments, the public usually is told that an earthquake is a “Magnitude 4.0,” or “Magnitude 2.5,” or some other number. Magnitude is a way to measure the size of an earthquake and compare it to historic earthquakes. But what does magnitude really mean?

The answer has to do with the instruments that scientists use to record quakes. Seismometers are instruments that measure ground motion, and the charts they produce are called seismograms. Even though the public may not know how to interpret a seismogram, most people associate the tell-tale “squiggly” line of a seismogram with an earthquake.

Magnitude is a measurement of earthquake strength based on the maximum amplitude recorded by seismometers (the maximum “height” of the “squiggle”) and the duration of the quake (length of the “squiggle”)[7]. Ultimately, this magnitude is based on the energy released during an earthquake.

Interpretation of magnitude from an electronic seismogram. The amplitude and period of the waveform are used to calculate earthquake magnitude, along with the distance between the recording instrument and the earthquake epicenter. Image by L. Braile of Purdue University, of an electronic recording of a magnitude 7.8 earthquake in Colima, Mexico on January 22, 2003.

Scientists have said before that magnitude is difficult for people to comprehend[8], and some of the confusion lies in the fact that magnitude is logarithmically scaled. In other words, a magnitude 4.0 earthquake actually is 32 times larger than a magnitude 3.0 quake.

For example, the magnitude 5.6 quake that struck Prague, Oklahoma in 2011, the largest ever recorded in the state, was 31,623 times larger than the average Oklahoma earthquake in 2015 (a magnitude 2.6, according to USGS data)[9]. The Prague quake destroyed six homes, and 20 homes sustained damage averaging $80,000 each[10]. Approximately 20 miles from the epicenter, in Shawnee, a brick turret atop Benedictine Hall at St. Gregory’s University toppled. From reports by citizens, we know a single magnitude 2.6 quake might only be felt by those within 10 miles from the earthquake (more on that later).

If damage estimates aren’t available, how would it be possible to compare areas that are experiencing different numbers of earthquakes, earthquakes of different magnitudes, and earthquakes of varying distances? If a town is near four magnitude 3.0 earthquakes, is it better or worse off than a town that has a single 4.0 quake that is farther away?

Measuring the total amount of ground movement can provide some insight into the local impact of quakes over time[11]. Earthquakes generate a series of waves of energy that travel through rock and soil, and the energy from each wave can be recorded and summed over time. Engineers look at this information, called Cumulative Absolute Velocity (CAV), in determining whether ground motion could be damaging to buildings. The measurement could even form the basis for early warning and response systems[12].

Unlike magnitude, CAV is not a measurement of the strength of an earthquake. Rather, it is the measurement of the effects of an earthquake at a particular location, such as an office building or a nuclear power plant. Usually a critical structure, such as a nuclear power plant, is equipped with instruments to record the ground motion[13]. While there is a strong correlation between the magnitude of an earthquake and the cumulative absolute velocity of a structure[14], as of yet there is no model for estimating the CAV for structures in Oklahoma affected by induced earthquakes.

Short of installing more instruments and conducting a new study, a different approach is required to determine cumulative impact of these induced earthquakes. One such alternative uses people as the “instruments” or “sensors” in a seismic network.

Another way to measure earthquake impact

A potential solution for communicating cumulative ground motion was proposed during a November 2014 workshop in Midwest City, Oklahoma, organized by the USGS and OGS. In its report on this workshop[15], USGS and OGS researchers noted that “A cumulative shake map could be produced for areas that have reported significant levels of induced seismicity, such as Oklahoma. This map would display the maximum intensity observed at any location over a given past time-interval based on Did You Feel It? public responses.”

Intensity is a concept different than both magnitude and CAV. Like CAV, it does not measure the strength of the earthquake, but rather measures the effect of the earthquake on a particular area. Unlike CAV, it is a qualitative measure of the ground shaking at a particular site. The USGS measures intensity on the Modified Mercalli Intensity scale, or MMI. This intensity scale was developed, in part, because of a need to correlate instrument data with the destructive effects of an earthquake. “Though the importance of the factor of acceleration is recognized, we have as yet no satisfactory definition of intensity, no formula expressing earthquake violence in ground movement,” Harry Wood and Frank Neumann wrote in their 1931 paper on the index[16].

To help differentiate magnitude from intensity, the MMI scale makes use of Roman numerals. “I” on the MMI scale indicates the earthquake was only felt by very few people. Earthquakes become more noticeable at an intensity of “III,” but people may not immediately recognize the movement as an earthquake and might mistake the vibration for a passing truck. At “V”, many people are awakened if they are sleeping, and windows and dishes might break. Ordinary structures may experience slight damage at “VII.” The scale tops out at “XII,” which means utter devastation.

Did You Feel It?, otherwise known as DYFI, is a USGS program to collect intensity data from citizens, by surveying the public on what was experienced during the quake. For each earthquake in the United States, the USGS creates a web page with the magnitude, location, depth of the quake, and other information. If an earthquake is of a certain size, the page might also show monetary damage estimates that are calculated by the Prompt Assessment of Global Earthquakes for Response (PAGER) system[17].

A USGS PAGER report for a magnitude 5.1 earthquake in Oklahoma, showing estimated fatalities, economic losses, and population exposure.

The page also contains a link to a questionnaire where people are asked whether they felt the earthquake, whether it made objects in the house move, what kind of damage buildings experienced, and other questions. This is the DYFI survey.

The DYFI survey is based on a questionnaire that was developed to study the intensity of a 1994 quake in Northridge, California[18]. That study also established an equation that the USGS uses to interpret intensity, called “Community Decimal Intensity,” or CDI, from questionnaires. The USGS aggregates all questionnaires by ZIP code, and calculates the average of the reported intensities for a ZIP code.

The resulting intensity is known as “Community Internet Intensity,” or CII, where the community is the ZIP code. CII is similar to MMI, but USGS warns that “because there are major differences in the data and procedures used to assign the two types of intensities, the Community Internet Intensities cannot be considered to be identical to the USGS Modified Mercalli Intensities.”

A USGS-produced "Community Internet Intensity Map" created from crowd-sourced surveys on earthquake intensity. Geometries in the map are of ZIP codes. A scale is included to translate color into intensity and reported shaking and/or damage.

USGS noted that different locations within a ZIP code may experience different levels of intensity due to type of structure, construction, and other factors[19]. While the USGS makes attempts to weed out unusual observations, the DYFI data product may still contain errors. But despite the potential “messiness” of using internet questionnaires, studies have shown that the DYFI data is not only useful for rapid processing of post-earthquake information, but also can be used in quantitative research. “The key to the usefulness of the data is simply this,” wrote Gail Atkison and David Wald in their 2007 study on DYFI data[20], “They make up in quantity what they lack in quality.” In their research, Atkison and Wald found a strong relationship between DYFI data and ground motion data.

To summarize, citizens send reports of shaking, the USGS finds the average reported intensity for a ZIP code, and the resulting data tracks very well with ground motion data. But how to get to a cumulative shake map from this data?

I propose the following: intensity is a function of earthquake magnitude, such that it is possible to estimate with some certainty the intensity very close to an earthquake, near the earthquake hypocenter. Also, an earthquake of a given magnitude will release a certain amount of energy. Given these two relationships, it is possible to arrive at an energy value by treating each DYFI response as if the response was given at the earthquake hypocenter.

Neither intensity nor magnitude can be summed in any meaningful way, but energy can be cumulative. Thus, a cumulative shake map can be generated by translating intensity from DYFI data into energy in this manner.


USGS's earthquake database was queried for three regions in Oklahoma, comprising 91.6% of Oklahoma's total land area. The earthquake database cannot be queried for earthquakes in a given state, only geographic coordinates are permitted. The query generated a list of 5,778 earthquakes from 2000-2015.

It is important to note that this is a much smaller sample than the 18,021 earthquakes OGS recorded for the same time period. This large difference in the number of earthquakes recorded could be due to USGS pulling its historical data from the Advanced National Seismic System (ANSS) Comprehensive Catalog (ComCat), based at the Northern California Data Center, which may have not yet integrated OGS's regional seismic catalog. Oklahoma’s seismic network is not listed among the ANSS catalog's contributing networks[21].

This map, created in ArcGIS Earth, shows the queried regions of Oklahoma with respect to the earthquakes recorded by OGS from 2000-2015.

A Google Apps Script was utilized to scrape the web pages of each of those earthquakes and search for links to DYFI data. When DYFI data was discovered for a given earthquake, the script appended the DYFI data to a Google Sheet. When all earthquake pages were scraped, the DYFI data was downloaded locally from the Google Sheet.

USGS recorded 188,707 individual shake reports related to Oklahoma earthquakes between 2000 and 2015, from 3,374 unique ZIP codes across 33 states. Hereafter, only the DYFI data from Oklahoma ZIP codes were used, narrowing the dataset to 127,779 individual reports across 561 ZIP codes. Of the Oklahoma earthquakes in the USGS database, 2,433 resulted in any shake reports within the state (42%).

A scatter plot showing CII data from inside Oklahoma, from 2002-2015. Dots representing reports from larger earthquakes are shown as darker points, while dots from smaller earthquakes are lighter. Larger earthquakes were reported for more than 100 miles, while smaller earthquakes may only be reported nearby.

In order to translate the DYFI intensity data into energy, it was necessary to develop an equation that best fit the existing DYFI data. Any DYFI entry where a ZIP code submitted less than five individual shake reports for an earthquake was discarded. DYFI data for earthquakes between M2.5 and M5.6 (the strongest quake recorded in Oklahoma for the period) was binned according to magnitude in 0.1 increments (e.g., M2.5, M2.6, M2.7, and so on). Each bin was processed in R using bagplot (bivariate boxplot) from the Another Plot PACkage[22] to identify outliers. A total of 44 outliers were removed from the data based on bagplots, leaving 2,234 data points.

A graph generated in R, showing the bagplot for the magnitude 3.3 bin. One outlier appeared as a red dot outside of the bagplot's "fence"; a reported 4.7 intensity at 9 miles from the earthquake hypocenter.

Multiple linear regression was used in R to find the relationship between intensity, earthquake magnitude, and distance from earthquake hypocenter for the Oklahoma earthquakes. With this dataset and method, the CDI data could be described in a linear relationship with magnitude and distance with an average error of 0.4591 CDI units (this is only slightly higher than the Atkison and Wald study). Additionally, the best-fit line has an R squared of 0.48, meaning 48% percent of the intensity could be explained by magnitude and distance alone.

Given an earthquake of moment magnitude M, at a location D miles from the hypocenter, the CDI intensity I can be described from the following equation:

I = 0.93M - 1.15logD + 1.14

Using the new equation, the intensity near the hypocenter (1 mile) would approximately be:

I = 0.93M + 1.14

In finding the cumulative energy displacement experienced by a ZIP code, the following relationship between energy and magnitude was used[23], where E is energy in Joules:

log E = 1.5M  + 4.8

Solving for E yields the equation:

E = 10^(1.5M  + 4.8)

Substituting intensity for magnitude yields the relationship between energy and intensity:

E = 10^(1.62I + 2.96)

Using data from 2015 earthquakes exclusively, for each quake in which a ZIP code submitted  DYFI surveys, the CDI was entered in the energy-intensity equation to arrive at an energy value for the ZIP code. Those energy values were summed for each earthquake in 2015.

This information was plotted in CartoDB along with information about the frequency of shake reports from the ZIP code in 2015, the average intensity reported to USGS in 2015, and the maximum reported intensity in 2015.


The Oklahoma equation is similar to the equation used by the USGS. The USGS has also developed equation to estimate CDI on the Eastern and Western United States, based on DYFI reports.

The Western equation is as follows:

I = 1.01M - 0.00054D - 1.72logD + 1.15

USGS equation for the Eastern US is as follows:

I = 1.29M - 0.00051*D - 2.16logD + 1.60

A comparison of intensity over distance for a magnitude 2.6 earthquake, as predicted from USGS West and East equations, and the equations developed here for Oklahoma.

One visible characteristic of the graph of the Oklahoma equation is how its slope becomes very shallow as the CDI drops below 2. This may reflect how unlikely DYFI data is to capture many reports of shaking below an intensity of 2 (see scatter plot). The equation developed from Oklahoma’s DYFI data yields a lower CDI at a given distance than is predicted by both USGS equation.

Data from the analysis was overlaid onto a CartoDB map, including information on frequency of reports, average intensity, and maximum intensity.

The following table consists of the 10 Oklahoma ZIP codes that experienced the most shaking energy, based on the equation. Energy here is represented in mega joules (MJ). 

The frequency listed is how often, on average, a ZIP code submitted shaking reports due to an earthquake during 2015. Population totals and population in poverty figures were collected from the United States Census.

ZIP code
avg intensity
max intensity
poverty rate
8D 21H
28D 1H
33D 4H
10D 17H
33D 4H
28D 1H
4D 19H
73D 0H
1D 20H
73D 0H

The ZIP code that experienced the most cumulative shaking energy (Medford, ZIP code 73759) was not the ZIP code that most often submitted shake reports. The ZIP code that most often submitted shake reports was 74023. That area includes the town of Cushing, which held 12.5 percent of United States’ commercial oil stockpiles in 2015[24], which has prompted federal officials to consider the earthquakes a national security risk[25].

Even though Cushing reported, on average, shaking from an earthquake every 36 hours, its cumulative energy was 13% of that experienced by Medford.

The next three areas with the highest reporting frequency were Stillwater's 74074 ZIP code (every 38 hours), followed by Edmond's 73013 ZIP code (41 hours), and Oklahoma City's 73126 ZIP code (43 hours).

Questions for future research

As was mentioned, DYFI data is not perfect. The USGS warns that people may report different intensities depending on the quality of the structure they were in at the time of the earthquake. Furthermore, reports are aggregated by ZIP code, but ZIP codes are not uniform geographic shapes.

Distance in this dataset is determined by the distance between the center of the largest town within the ZIP code and the earthquake hypocenter, so all reports in a ZIP code are treated as having originated from this town. Oklahoma is not a geographic area with uniform geology and soil composition, and rock and soil of differing compositions will transmit seismic energy differently[26].

Blind spots may exist in this data. Whether a person submits a shake report depends on whether the person has a reliable internet connection, and there still exists a “digital divide” between poorer and wealthier homes. Wireless infrastructure is helping bring internet connections to poorer communities, however, and DYFI reports can be submitted from a smartphone[27]Yet, while rural areas have more access to wireless internet than in previous years, those communities consistently have greater poverty than urban areas in the Unites States[28]

Meanwhile, a larger population in general means more “sensors” for that community, which means it may be more “sensitive” to shaking and more likely to make DYFI reports. Therefore, larger and/or wealthier communities may appear to have more cumulative shaking than poor, rural communities which may in fact be harder-hit by earthquakes.

The ZIP codes in the top 10 list are rural ZIP codes, but they are not necessarily the poorest ZIP codes in the state. Among the top 10 most-shaken ZIP codes in this analysis, there is a 15.6% poverty rate, which is about 1.3% lower than the state rate, and about 0.2% greater than the national rate[29]. The most poverty-stricken communities are in the south and eastern parts of the state, which have not experienced the majority of the induced earthquakes[30].

All of the factors mentioned contribute to variability in the dataset and inaccuracy in the equation. Yet they also are excellent questions to pursue in further research. In particular, an exploration of the potential link between poverty and internet access, and the likelihood of being accounted for in the DYFI dataset, could provide important policy-guiding information. Accounting for these additional factors could lead to a better cumulative shaking map, and potentially even a better impact prediction and estimation system for Oklahoma’s earthquakes than PAGER.

Better data, specifically cumulative absolute velocity measurements generated from a network of home-based devices (an internet of things for earthquakes), would yield improved accuracy. But for now, the cumulative shake map can serve as a guide for where to look for impact and damage from induced seismic activity, and where to allocate resources for future research.

[1] Oklahoma Geological Survey. (2016). Earthquake catalog. Retrieved Jan. 26, 2016, from
[2] Damrill, K., Schammert, B. (2015, Dec. 29). Four earthquakes, including 4.3 magnitude, rock Edmond area Tuesday. Retrieved from
[3] Santos, P. (2015, Dec. 29). Chimney torn from Edmond home after earthquake. Retrieved from
[4] Santos, P. (2015, Dec. 30). Walls of northeast OKC apartment complex collapse after quake. Retrieved from
[5] Andrews, R., Holland, A. (2015). Statement on Oklahoma Seismicity. Norman, OK: Oklahoma Geological Survey. Retrieved from
[6] United States Geological Survey. (2015). Induced Earthquakes. Retrieved Jan. 26, 2016, from
[7] Earthquakes and Volcanoes Volume 21 Number 1 1989). Schnabel, D. (1989). Earthquakes & Volcanoes, Volume 21, Number 1. Retrieved from
[8] LaFrance, A. (2014) Is There a Better Way to Measure Earthquakes? The Atlantic. Retrieved from
[9] Baer, E. (2007). How big was that quake? Retrieved from
[10] Branstetter, Z. (2015) Attorney sues energy companies seeking class-action status for earthquake victims in nine counties. Tulsa World. Retrieved from
[11] Campbell, K.W., Bozorgnia, Y. (2012) Use of Cumulative Absolute Velocity (CAV) in Damage Assessment. Proceedings of the 15th World Conference on Earthquake Engineering, Lisbon, September 2012, Paper number 2965.
[12] Fahjan, Y. M., Alcik, H., Sari, A. (2011). Applications of cumulative absolute velocity to urban earthquake early warning systems. Journal of Seismology, 15(2), 355-373. doi: 10.1007/s10950-011-9229-8
[13] United States Nuclear Regulatory Commission. (1997) Pre-earthquake planning and immediate nuclear power plant operator postearthquake actions. (Regulatory guide 1.166). Washington, DC: Author. Retrieved from
[14] Kostov, M. K. (2005). Site specific estimation of cumulative absolute velocity. Proceedings of the 18th International Conference on Structural Mechanics in Reactor Technology, Beijing, August 2012.
[15] (Incorporating Induced Seismicity in the 2014 United States National Seismic Hazard Model) Petersen, M.D., Mueller, C. S., Moschetti, M. P., Hoover, S. M., Rubinstein, J. L., Llenos, A. L., Michael, A.  J., Ellsworth, W. L., McGarr, A. F., Holland, A. A., & Anderson, J. G. (2015). Incorporating Induced Seismicity in the 2014 United States National Seismic Hazard Model—Results of 2014 Workshop and Sensitivity Studies. Reston, VA: United States Geological Survey. Retrieved from
[16] Wood, H., & Neumann, F. (1931) Modified Mercalli Intensity Scale of 1931. Bulletin of the Seismological Society of America, Vol. 21, 277-283.
[17] United States Geological Survey. (2011). PAGER – Background. Retrieved from
[18] Dengler, L. A., & J. W. Dewey (1998). An Intensity Survey of Households Affected by the Northridge, California, Earthquake of 17 January, 1994. Bulletin of the Seismological Society of America, Vol. 88, 441-462.
[19] United States Geological Survey. (2014). DYFI Disclaimer. Retrieved from
[20] Atkinson, G. M., & D. J. Wald. (2007). "Did You Feel It?" intensity data: A surprisingly good measure of earthquake ground motion. Seismological Research Letters, 78, 362-368.
[21] Northern California Earthquake Data Center. (2015). Earthquake Catalog Details. Retrieved from
[22] Wolf, H.P. (2014). aplpack: Another Plot PACKage. Retrieved from
[23] Bormann, P., & D. Di Giacomo. The moment magnitude Mw and the energy magnitude Me: common roots and differences. Journal of Seismology, v.15, 2011, 411. doi:10.1007/s10950-010-9219-2
[24] Wilmoth, A. (2015, Aug. 2). GUSHING INTO CUSHING: Oil fills major storage hub in small Oklahoma town. The Oklahoman. Retrieved from
[25] Wertz, J. (2015, Nov. 30). Confidence In Oil Hub Security Shaken By Oklahoma Earthquakes. State Impact Oklahoma. Retrieved from
[26] Siddiqi, J. (2000). Horizontal-to-vertical component ratios for earthquake ground motions recorded on hard rock sites in Canada (Master’s thesis). Ottawa, Canada: Carleton University.
[27] Horrigan, J. & Duggan, M. (2015, Dec. 21) Home Broadband 2015. Washington, D.C.: Pew Research Center. Retrieved from
[28] Shinn, P. (2009, Aug. 17). Surprising causes of rural poverty. Retrieved from
[29] United States Census Bureau. (2015). State & County QuickFacts. Retrieved from
[30] Wertz, J. (2011). Mapped: An Overview of Poverty in Oklahoma. State Impact Oklahoma. Retrieved from