Creating Volcano Maps with Pandas and the Matplotlib Basemap Toolkit
Transfer from http://ramiro.org/notebook/mapping-volcanoes/
Introduction
This notebook walks through the process of creating maps of volcanoes with Python. The main steps involve getting, cleaning and finally mapping the data.
All Python 3rd party packages used, except the Matplotlib Basemap Toolkit, are included with the Anaconda distribution and installed when you create an anaconda environment. To add Basemap simply run the command conda install basemap in your activated anaconda environment. To follow the code you should be familiar with Python, Pandas and Matplotlib.
Get into work
First load all Python libraries required in this notebook.
%load_ext signature %matplotlib inline import json from lxml import html from mpl_toolkits.basemap import Basemap import numpy as np import pandas as pd import matplotlib.pyplot as plt chartinfo = 'Author: Ramiro Gomez - ramiro.org | Data: Volcano World - Volcano.oregonstate.edu'
Get the volcano data
The data is downloaded and parsed using lxml. The Volcano World source page lists the volcano data in several HTML tables, which are each read into individual Pandas data frames that are appended to a list of data frames. Since the page also uses tables for layout the first four tables are omitted.
url ='http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html'
xpath='//table'
tree = html.parse(url)
tables = tree.xpath(xpath)
table_dfs = []
for idx in range(4, len(tables)):
df = pd.read_html(html.tostring(tables[idx]), header=0)[0]
table_dfs.append(df)
The next step is to create a single data frame from the ones in the list using Pandas’ concat method. To create a new index with consecutive numbers the index_ignore parameter is set to True.
df_volc = pd.concat(table_dfs, ignore_index=True)
Let’s take a look at the data contained in the newly created data frame.
print(len(df_volc)) df_volc.head(10)
1560
| Name | Location | Type | Latitude | Longitude | Elevation (m) | |
|---|---|---|---|---|---|---|
| 0 | Abu | Honshu-Japan | Shield volcanoes | 34.50 | 131.60 | 641 |
| 1 | Acamarachi | Chile-N | Stratovolcano | -23.30 | -67.62 | 6046 |
| 2 | Acatenango | Guatemala | Stratovolcano | 14.50 | -90.88 | 3976 |
| 3 | Acigöl-Nevsehir | Turkey | Caldera | 38.57 | 34.52 | 1689 |
| 4 | Adams | US-Washington | Stratovolcano | 46.21 | -121.49 | 3742 |
| 5 | Adams Seamount | Pacific-C | Submarine volcano | -25.37 | -129.27 | -39 |
| 6 | Adatara | Honshu-Japan | Stratovolcanoes | 37.64 | 140.29 | 1718 |
| 7 | Adwa | Ethiopia | Stratovolcano | 10.07 | 40.84 | 1733 |
| 8 | Afderà | Ethiopia | Stratovolcano | 13.08 | 40.85 | 1295 |
| 9 Agrigan | Mariana Is-C Pacific | Stratovolcano | 18.77 | 145.67 | 965 |
df_volc['Type'].value_counts()
Stratovolcano 601
Shield volcano 121
Stratovolcanoes 109
Submarine volcano 95
Volcanic field 82
Caldera 78
Cinder cones 62
Complex volcano 49
Shield volcanoes 31
Pyroclastic cones 30
Lava domes 26
Submarine volcano ? 19
Volcanic field 16
Fissure vents 16
Shield volcano 16
Submarine volcano 16
Maars 12
Compound volcano 10
Cinder cones 10
Lava dome 9
Cinder cone 9
Calderas 9
Pyroclastic cone 8
Tuff cones 7
Scoria cones 7
Maar 7
Unknown 6
Complex volcano 6
Somma volcano 5
Lava domes 5
...
Subglacial volcano 2
Lava cone 2
Explosion craters 2
Fissure vent 2
Fissure vents 2
Volcanic Landform 1
Tuff cone 1
Lava cone 1
Compound volcano 1
Scoria cones 1
Lava Cone 1
Lava cones 1
Lava Field 1
Somma volcanoes 1
Submarine volcano ? 1
Cinder Cone 1
Flood Basalt 1
Flood Basalts 1
Cone 1
Lava dome 1
Shield volcanoe 1
Pyroclastic cone 1
Stratovolcano ? 1
Complex volcanoes 1
Cones 1
Island Arc 1
Tuff rings 1
Crater rows 1
Hydrothermal field 1
Pumice cone 1
dtype: int64
Looking at the output we see that a single type may be represented by diffent tokens, for example Stratvolcano and Stratvolcanoes refer to the same type. Sometimes entries contain question marks, indicating that the classification may not be correct.
Cleaning the data
The next step is to clean the data. I decided to take the classification for granted and simply remove question marks. Also, use one token for each type and change the alternative spellings accordingly. Finally remove superfluous whitespace and start all words with capital letter.
def cleanup_type(s):
if not isinstance(s, str):
return s
s = s.replace('?', '').replace(' ', ' ')
s = s.replace('volcanoes', 'volcano')
s = s.replace('volcanoe', 'Volcano')
s = s.replace('cones', 'cone')
s = s.replace('Calderas', 'Caldera')
return s.strip().title()
df_volc['Type'] = df_volc['Type'].map(cleanup_type)
df_volc['Type'].value_counts()
Stratovolcano 713 Shield Volcano 173 Submarine Volcano 137 Volcanic Field 98 Caldera 87 Cinder Cone 84 Complex Volcano 56 Pyroclastic Cone 43 Lava Domes 31 Fissure Vents 18 Tuff Cone 14 Maars 12 Compound Volcano 11 Lava Dome 10 Scoria Cone 8 Pyroclastic Shield 8 Maar 7 Unknown 6 Somma Volcano 6 Subglacial Volcano 6 Crater Rows 5 Lava Cone 5 Fumarole Field 3 Pumice Cone 3 Explosion Craters 2 Volcanic Complex 2 Fissure Vent 2 Flood Basalt 1 Cones 1 Hydrothermal Field 1 Island Arc 1 Cone 1 Volcanic Landform 1 Lava Field 1 Tuff Rings 1 Flood Basalts 1 dtype: int64
Now let’s get rid of incomplete records.
df_volc.dropna(inplace=True) len(df_volc)
1513
Creating the maps
Volcanoes will be plotted as red triangles, whose sizes depend on the elevation values, that’s why I’ll only consider positive elevations, i. e. remove submarine volcanoes from the data frame.
df_volc = df_volc[df_volc['Elevation (m)'] >= 0] len(df_volc)
1406
Next I define a function that will plot a volcano map for the given parameters. Lists of longitudes, latitudes and elevations, that all need to have the same lengths, are mandatory, the other parameters have defaults set.
As mentioned above, sizes correspond to elevations, i. e. a higher volcano will be plotted as a larger triangle. To achieve this the 1st line in the plot_map function creates an array of bins and the 2nd line maps the individual elevation values to these bins.
Next a Basemap object is created, coastlines and boundaries will be drawn and continents filled in the given color. Then the volcanoes are plotted. The 3rd parameter of the plot method is set to ^r, the circumflex stands for triangle and r for red. For more details, see the documentation for plot.
The Basemap object will be returned so it can be manipulated after the function finishes and before the map is plotted, you’ll see why in a later example.
def plot_map(lons, lats, elevations, projection='mill', llcrnrlat=-80, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180, resolution='i', min_marker_size=2):
bins = np.linspace(0, elevations.max(), 10)
marker_sizes = np.digitize(elevations, bins) + min_marker_size
m = Basemap(projection=projection, llcrnrlat=llcrnrlat, urcrnrlat=urcrnrlat, llcrnrlon=llcrnrlon, urcrnrlon=urcrnrlon, resolution=resolution)
m.drawcoastlines()
m.drawmapboundary()
m.fillcontinents(color = '#333333')
for lon, lat, msize in zip(lons, lats, marker_sizes):
x, y = m(lon, lat)
m.plot(x, y, '^r', markersize=msize, alpha=.7)
return m
Map of Stratovolcanos
The 1st map shows the locations of Stratovolcanoes on a world map, so the data frame is filtered on the type column beforehand.
plt.figure(figsize=(16, 8))
df = df_volc[df_volc['Type'] == 'Stratovolcano']
plot_map(df['Longitude'], df['Latitude'], df['Elevation (m)'])
plt.annotate('Stratovolcanoes of the world | ' + chartinfo, xy=(0, -1.04), xycoords='axes fraction')

We can clearly see that most Stratovolcanoes are located, where tectonic plates meet. Let’s look at all volcanoes of some of those regions now.
Volcanoes of North America
The next map shows all North American volcanoes in the data frame. To display only that part of the map the parameters that determine the bounding box are set accordingly, i. e. the latitudes and longitudes of the lower left and upper right corners of the bounding box.
plt.figure(figsize=(12, 10))
plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'],
llcrnrlat=5.5, urcrnrlat=83.2, llcrnrlon=-180, urcrnrlon=-52.3, min_marker_size=4)
plt.annotate('Volcanoes of North America | ' + chartinfo, xy=(0, -1.03), xycoords='axes fraction')

Volcanoes of Indonesia
Another region with many volcanoes is the Indonesian archipelago. Some of them like the Krakatoa and Mount Tambora have undergone catastrophic eruptions with tens of thousands of victims in the past 200 years.
plt.figure(figsize=(18, 8))
plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'],
llcrnrlat=-11.1, urcrnrlat=6.1, llcrnrlon=95, urcrnrlon=141.1, min_marker_size=4)
plt.annotate('Volcanoes of Indonesia | ' + chartinfo, xy=(0, -1.04), xycoords='axes fraction'

Volcanoes of the world
The final map shows all volcanoes in the data frame and the whole map using a background image obtained from the NASA Web site. To be able to add this image to the map by calling the warpimage method, is why the plot_map function returns the Basemap object. Moreover, a title and an annotation are added with the code below.
plt.figure(figsize=(20, 12))
m = plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'], min_marker_size=2)
m.warpimage(image='img/raw-bathymetry.jpg', scale=1)
plt.title('Volcanoes of the World', color='#000000', fontsize=40)
plt.annotate(chartinfo + ' | Image: NASA - nasa.gov',
(0, 0), color='#bbbbbb', fontsize=11)plt.show()
plt.show()

Bonus: volcano globe
In addition to these static maps I created this volcano globe. It is built with the WebGL globe project, that expects the following data structure [ latitude, longitude, magnitude, latitude, longitude, magnitude, … ].
To achieve this structure, I create a data frame that contains only latitude, longitude, and elevation, call the as_matrix method which creates an array with a list of lists containing the column values, flatten this into a 1-dimensional array, turn it to a list and save the new data structure as a JSON file.
df_globe_values = df_volc[['Latitude', 'Longitude', 'Elevation (m)']]
globe_values = df_globe_values.as_matrix().flatten().tolist()
with open('json/globe_volcanoes.json', 'w') as f:
json.dump(globe_values, f)
Summary
This notebook shows how you can plot maps with locations using Pandas and Basemap. This was a rather quick walk-through, which did not go into too much detail explaining the code. If you like to learn more about the topic, I highly recommend the Mapping Global Earthquake Activity tutorial, which is more in-depth and contains various nice examples. Have fun mapping data with Python!
%signature