Digitizing Ecological History
Joseph Grinnell was good at a great many things, but sitting still was not one of them. The famed field biologist journeyed to Alaska to collect birds as a 19-year-old, conducted zoology surveys at hundreds of sites throughout California, published an authoritative guide to western birds, and, in 1908, was chosen as the first director of Berkeley’s Museum of Vertebrate Zoology (MVZ).
Grinnell—like any modern field scientist—would probably tell you that the best part of his job was, well, the field. Long days in the forest, moonrise over the desert, chasing butterflies with a net. “We must confess that we have gotten more complete satisfaction, in other words happiness, out of one vacation trip into the mountains after rare birds and eggs than out of our two years of university work in embryology!” he once wrote in an editorial on the ethics of egg collection. Thanks in part to Grinnell, the UC system now manages nearly 60 field stations like Sagehen, Hastings, and Blodgett.
The problem, of course, is what to do with all those eggs (or fossils, or eagle feathers, or little baggies filled with vole feces) when you get back to campus. After you’ve used the samples for your own research, what then? If you’ve ever walked through the back rooms of the MVZ or the Essig Museum of Entomology, you’ve probably gotten the feeling that you’re in the presence of a whole lot of history. In fact, across all UC Berkeley’s natural history museums—MVZ, Essig, UC Paleontology Museum, and the University and Jepson Herbaria—over 16 million specimens are sitting in limbo, many of them untouched and all but forgotten since Joseph Grinnell and his colleagues put them there.
Of course, an earthworm in amber isn’t just an earthworm. It’s data. Every sample tells a story about what our world was like at a specific time and place. Some bee specimens even include the pollen they carried, preserving a detailed narrative about their environment.
“The world is changing. It always has, it always will,” Rosemary Gillespie, professor of environmental science, policy, and management and Essig director, said at the launch event for the Berkeley Data Science Institute. “But the current rate of change is unprecedented.” Because of this, knowledge about the past is more vital than ever as we try to understand what the future holds. But all of that critical historical data is almost impossible to access—you can’t exactly Google 10,000 dried Depression-era acorns.
And yet that’s exactly what an ambitious new initiative is attempting to make possible. The Berkeley Ecoinformatics Engine—Holos for short, from the Greek word for “whole”—will eventually allow researchers to search, sort, and analyze all of the invaluable data collected by industrious UC scientists over the past 100-plus years. The idea, if not the execution, is simple: Digitize the data from every single item in the massive collection and then make the data available to researchers.
“We need to look at the broadest context of how organisms have changed in the past,” says Gillespie, who is the principal investigator for Holos. “If we use information about how change has happened in the past, we may then be able to build a trajectory to predict what will happen in the future.” While climate change is the most commonly discussed shift, the data from UC’s collections will be valuable for assessing the effects of all manner of transformative forces, such as agriculture, development, invasive species colonization, and genetic shifts.
Take the connection between climate and elevation, for example. When an area warms, it’s intuitive to hypothesize that plants and animals will move uphill to find cooler temperatures that better suit them. “But through all these studies that use historical surveys, it’s becoming apparent that it’s actually not as clear as that,” says Giovanni Rapacciuolo, a postdoc with the Berkeley Initiative in Global Change Biology who is helping to guide the early days of Holos.
“Many plants end up moving downhill to find wetter climates rather than cooler ones,” he says. “As the plants move downhill, so do the birds that depend on them, and the mammals that live within them, and so on.” So making predictions about future plant and animal migration requires a good analysis of the past. And it’s the past that Holos hopes to make available with the click of a button.
Cleaning Up Dirty Data
That’s the theory, at any rate. The practical realities of digitizing so much data are daunting. The available information is staggeringly varied in form, content, and quality. And just what does it mean to digitize a dragonfly? Historically, when scientists placed a specimen in storage, they would fill out a paper card to go along with it. The vast majority of specimens were cataloged before computers were readily available, so the card itself is often the only record.
And despite Grinnell’s attempts to standardize protocols for field observations, people have persisted in doing things their own way. “Back in the day, people didn’t have GPS readers,” says Rapacciuolo. “So a lot of these labels might only have a county or just say something like ‘Found 2 miles south of West Sacramento.’”
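To make the cleanup problem concrete, here is a minimal sketch of how a label like the one Rapacciuolo describes could be turned into an approximate coordinate. It is not the method Holos uses; the names `parse_locality` and `apply_offset` are hypothetical, the reference coordinates must come from a gazetteer lookup that is assumed and not shown, and the flat-earth arithmetic is only reasonable for offsets of a few miles.

```python
import math
import re
from typing import NamedTuple, Optional, Tuple

# Compass bearings in degrees clockwise from north (cardinal directions only).
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}

KM_PER_MILE = 1.609344
KM_PER_DEGREE_LAT = 111.32  # rough average; an assumption good enough for a sketch


class Offset(NamedTuple):
    distance_km: float
    bearing_deg: float
    place: str


def parse_locality(label: str) -> Optional[Offset]:
    """Parse labels shaped like 'Found 2 miles south of West Sacramento'."""
    match = re.search(
        r"(\d+(?:\.\d+)?)\s*(miles?|mi|km)\s+(north|south|east|west)\s+of\s+(.+)",
        label,
        flags=re.IGNORECASE,
    )
    if match is None:
        return None  # e.g. a label that names only a county
    value, unit, direction, place = match.groups()
    factor = 1.0 if unit.lower() == "km" else KM_PER_MILE
    return Offset(float(value) * factor, BEARINGS[direction.lower()], place.strip(" ."))


def apply_offset(lat: float, lon: float, offset: Offset) -> Tuple[float, float]:
    """Shift a reference point by the offset using a flat-earth approximation."""
    theta = math.radians(offset.bearing_deg)
    dlat = offset.distance_km * math.cos(theta) / KM_PER_DEGREE_LAT
    dlon = offset.distance_km * math.sin(theta) / (
        KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon
```

A label that carries only a county returns `None` here, which is the honest answer: such records need a human, or at least a much wider uncertainty radius, before they can be placed on a map.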
Of the four major collection types at Berkeley (insects, fossils, vertebrates, and plants), only the vertebrate collection at the MVZ is completely digitized. But it is also the smallest, at a mere 677,000 items, whereas the Museum of Paleontology has 6.5 million samples, over 95 percent of which still need digitization.
And it’s not just the museums. Historical records include archival soils, data from climate sensors, fossil pollen extracted from lakebeds, and even old photographs showing vegetation types in specific locations. Researchers will be able to overlay this unique data on top of publicly available base-layer records such as topo maps, fire records, and maps of political boundaries. Eventually, says Gillespie, scientists will be able to easily determine that “we found beetle X at point Y on date Z. And then with 100 years of data about location and abundance, you’ll really be able to paint a picture of how everything has changed.”
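The "beetle X at point Y on date Z" idea maps naturally onto a simple record type. The sketch below is an illustration only, not the Holos data model or API: the `Occurrence` class, its field names, and the `counts_by_decade` helper are all hypothetical. It shows how, once records are digitized, a century of specimens could be tallied by decade.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class Occurrence:
    """One digitized specimen record: what was found, where, and when."""
    taxon: str        # "beetle X"
    latitude: float   # "point Y"
    longitude: float
    collected: date   # "date Z"


def counts_by_decade(records: Iterable[Occurrence], taxon: str) -> Dict[int, int]:
    """Count records of one taxon per decade, keyed by the decade's first year."""
    tally = Counter(
        rec.collected.year // 10 * 10 for rec in records if rec.taxon == taxon
    )
    return dict(sorted(tally.items()))
```

Raw counts like these reflect collecting effort as much as true abundance, so any real analysis would need to correct for how often and where people went looking.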
Crowdsourcing and Collaboration
Even working quickly, there’s not enough time, money, or bleary-eyed undergrad research assistants to enter all the data in-house. So Holos has turned to the Internet, crowdsourcing the project at notesfromnature.org, with users earning badges based on how much data entry they do. It’s not exactly as addictive as Tetris, but the number of people willing to donate a few minutes a day to science is surprisingly large.
UC data is idiosyncratic to wherever UC researchers have gone, especially the Sierras and the Richard B. Gump South Pacific Research Station in French Polynesia. Fortunately, other universities and governments have embarked on their own massive digitization efforts in their own corners of the globe. Holos exists not only to digitize UC’s data but also, potentially, to serve as a computerized framework that integrates data from institutions far and wide.
Holos is already live, with more data and capabilities being added all the time. Access comes not a moment too soon. “We know we’re causing change very rapidly right now,” says Gillespie. “But we don’t know when we’ll reach the tipping point that prevents communities from responding at a rate that will allow them to continue to exist.” Understanding the specifics of how ecosystems have responded to change in the past might help us avoid ecological calamity in the very near future, she says.
Joseph Grinnell might not have had GPS on his wristwatch or an Excel spreadsheet to catalog his specimens, but he seems to have had something a lot like Holos in mind. “The greatest value of our museum,” he wrote in 1910, “will not . . . be realized until the lapse of many years, possibly a century, assuming that our material is safely preserved [so] the student of the future will have access to the original record of faunal conditions in California and the West, wherever we now work.”
Holos tools and information are live at globalchange.berkeley.edu/ecoinformatics-engine