Saving Climate Data (Part 6)

23 February, 2017

Scott Pruitt, who filed legal challenges against Environmental Protection Agency rules fourteen times, working hand in hand with oil and gas companies, is now head of that agency. What does that mean about the safety of climate data on the EPA’s websites? Here is an inside report:

• Dawn Reeves, EPA preserves Obama-Era website but climate change data doubts remain, InsideEPA.com, 21 February 2017.

For those of us who are backing up climate data, the really important stuff is in red near the bottom.

The EPA has posted a link to an archived version of its website from Jan. 19, the day before President Donald Trump was inaugurated and the agency began removing climate change-related information from its official site, saying the move comes in response to concerns that it would permanently scrub such data.

However, the archived version notes that links to climate and other environmental databases will go to current versions of them—continuing the fears that the Trump EPA will remove or destroy crucial greenhouse gas and other data.

The archived version was put in place and linked to the main page in response to “numerous [Freedom of Information Act (FOIA)] requests regarding historic versions of the EPA website,” says an email to agency staff shared by the press office. “The Agency is making its best reasonable effort to 1) preserve agency records that are the subject of a request; 2) produce requested agency records in the format requested; and 3) post frequently requested agency records in electronic format for public inspection. To meet these goals, EPA has re-posted a snapshot of the EPA website as it existed on January 19, 2017.”

The email adds that the action is similar to the snapshot taken of the Obama White House website.

The archived version of EPA’s website includes a “more information” link that offers more explanation.

For example, it says the page is “not the current EPA website” and that the archive includes “static content, such as webpages and reports in Portable Document Format (PDF), as that content appeared on EPA’s website as of January 19, 2017.”

It cites technical limits for the database exclusions. “For example, many of the links contained on EPA’s website are to databases that are updated with the new information on a regular basis. These databases are not part of the static content that comprises the Web Snapshot.” Searches of the databases from the archive “will take you to the current version of the database,” the agency says.

“In addition, links may have been broken in the website as it appeared” on Jan. 19 and those will remain broken on the snapshot. Links that are no longer active will also appear as broken in the snapshot.

“Finally, certain extremely large collections of content… were not included in the Snapshot due to their size” such as AirNow images, radiation network graphs, historic air technology transfer network information, and EPA’s searchable news releases.

‘Smart’ Move

One source urging the preservation of the data says the snapshot appears to be a “smart” move on EPA’s behalf, given the FOIA requests it has received, and notes that even though other groups like NextGen Climate and scientists have been working to capture EPA’s online information, having it on EPA’s site makes it official.

But it could also be a signal that big changes are coming to the official Trump EPA site, and it is unclear how long the agency will maintain the archived version.

The source says while it is disappointing that the archive may signal the imminent removal of EPA’s climate site, “at least they are trying to accommodate public concerns” to preserve the information.

A second source adds that while it is good that EPA is seeking “to address the widespread concern” that the information will be removed by an administration that does not believe in human-caused climate change, “on the other hand, it doesn’t address the primary concern of the data. It is snapshots of the web text.” Also, information “not included,” such as climate databases, is what is difficult to capture by outside groups and is what really must be preserved.

“If they take [information] down” that groups have been trying to preserve, then the underlying concern about access to data remains. “Web crawlers and programs can do things that are easy,” such as taking snapshots of text, “but getting the data inside the database is much more challenging,” the source says.

The first source notes that EPA’s searchable databases, such as those maintained by its Clean Air Markets Division, are used by the public “all the time.”

The agency’s Office of General Counsel (OGC) Jan. 25 began a review of the implications of taking down the climate page—a planned wholesale removal that was temporarily suspended to allow for the OGC review.

But EPA did remove some specific climate information, including links to the Clean Power Plan and references to President Barack Obama’s Climate Action Plan. Inside EPA captured this screenshot of the “What EPA Is Doing” page regarding climate change. Those links are missing on the Trump EPA site. The archive includes the same version of the page as captured by our screenshot.

Inside EPA first reported the plans to take down the climate information on Jan. 17.

After the OGC investigation began, a source close to the Trump administration said Jan. 31 that climate “propaganda” would be taken down from the EPA site, but that the agency is not expected to remove databases on GHG emissions or climate science. “Eventually… the propaganda will get removed…. Most of what is there is not data. Most of what is there is interpretation.”

The Sierra Club and Environmental Defense Fund both filed FOIA requests asking the agency to preserve its climate data, while attorneys representing youth plaintiffs in a federal climate change lawsuit against the government have also asked the Department of Justice to ensure the data related to its claims is preserved.

The Azimuth Climate Data Backup Project and other groups are making copies of actual databases, not just the visible portions of websites.


Azimuth Backup Project (Part 4)

18 February, 2017

The Azimuth Climate Data Backup Project is going well! Our Kickstarter campaign ended on January 31st and the money has recently reached us. Our original goal was $5000. We got $20,427 of donations, and after Kickstarter took its cut we received $18,590.96.

Next time I’ll tell you what our project has actually been doing. This time I just want to give a huge “thank you!” to all 627 people who contributed money on Kickstarter!

I sent out thank you notes to everyone, updating them on our progress and asking if they wanted their names listed. The blanks in the following list represent people who either didn’t reply, didn’t want their names listed, or backed out and decided not to give money. I’ll list people in chronological order: first contributors first.

Only 12 people backed out; the vast majority of blanks on this list are people who haven’t replied to my email. I noticed some interesting but obvious patterns. For example, people who contributed later are less likely to have answered my email yet—I’ll update this list later. People who contributed more money were more likely to answer my email.

The magnitude of contributions ranged from $2000 to $1. A few people offered to help in other ways. The response was international—this was really heartwarming! People from the US were more likely than others to ask not to be listed.

But instead of continuing to list statistical patterns, let me just thank everyone who contributed.


Daniel Estrada
Ahmed Amer
Saeed Masroor
Jodi Kaplan
John Wehrle
Bob Calder
Andrea Borgia
L Gardner

Uche Eke
Keith Warner
Dean Kalahan
James Benson
Dianne Hackborn

Walter Hahn
Thomas Savarino
Noah Friedman
Eric Willisson
Jeffrey Gilmore
John Bennett
Glenn McDavid

Brian Turner

Peter Bagaric

Martin Dahl Nielsen
Broc Stenman

Gabriel Scherer
Roice Nelson
Felipe Pait
Kenneth Hertz

Luis Bruno


Andrew Lottmann
Alex Morse

Mads Bach Villadsen
Noam Zeilberger

Buffy Lyon

Josh Wilcox

Danny Borg

Krishna Bhogaonker
Harald Tveit Alvestrand


Tarek A. Hijaz, MD
Jouni Pohjola
Chavdar Petkov
Markus Jöbstl
Bjørn Borud


Sarah G

William Straub

Frank Harper
Carsten Führmann
Rick Angel
Drew Armstrong

Jesimpson

Valeria de Paiva
Ron Prater
David Tanzer

Rafael Laguna
Miguel Esteves dos Santos 
Sophie Dennison-Gibby




Randy Drexler
Peter Haggstrom


Jerzy Michał Pawlak
Santini Basra
Jenny Meyer


John Iskra

Bruce Jones
Māris Ozols
Everett Rubel



Mike D
Manik Uppal
Todd Trimble

Federer Fanatic

Forrest Samuel, Harmos Consulting








Annie Wynn
Norman and Marcia Dresner



Daniel Mattingly
James W. Crosby








Jennifer Booth
Greg Randolph





Dave and Karen Deeter

Sarah Truebe









Tieg Zaharia
Jeffrey Salfen
Birian Abelson

Logan McDonald

Brian Truebe
Jon Leland


Nicole



Sarah Lim







James Turnbull




John Huerta
Katie Mandel Bruce
Bethany Summer




Heather Tilert

Anna C. Gladstone



Naom Hart
Aaron Riley

Giampiero Campa

Julie A. Sylvia


Pace Willisson









Bangskij










Peter Herschberg

Alaistair Farrugia


Conor Hennessy




Stephanie Mohr




Torinthiel


Lincoln Muri 
Anet Ferwerda 


Hanna





Michelle Lee Guiney

Ben Doherty
Trace Hagemann







Ryan Mannion


Penni and Terry O'Hearn



Brian Bassham
Caitlin Murphy
John Verran






Susan


Alexander Hawson
Fabrizio Mafessoni
Anita Phagan
Nicolas Acuña
Niklas Brunberg

Adam Luptak
V. Lazaro Zamora






Branford Werner
Niklas Starck Westerberg
Luca Zenti and Marta Veneziano 


Ilja Preuß
Christopher Flint

George Read 
Courtney Leigh

Katharina Spoerri


Daniel Risse



Hanna
Charles-Etienne Jamme
rhackman41



Jeff Leggett

RKBookman


Aaron Paul
Mike Metzler


Patrick Leiser

Melinda

Ryan Vaughn
Kent Crispin

Michael Teague

Ben



Fabian Bach
Steven Canning


Betsy McCall

John Rees

Mary Peters

Shane Claridge
Thomas Negovan
Tom Grace
Justin Jones


Jason Mitchell




Josh Weber
Rebecca Lynne Hanginger
Kirby


Dawn Conniff


Michael T. Astolfi



Kristeva

Erik
Keith Uber

Elaine Mazerolle
Matthieu Walraet

Linda Penfold




Lujia Liu



Keith



Samar Tareem


Henrik Almén
Michael Deakin 
Rutger Ockhorst

Erin Bassett
James Crook



Junior Eluhu
Dan Laufer
Carl
Robert Solovay






Silica Magazine







Leonard Saers
Alfredo Arroyo García



Larry Yu













John Behemonth


Eric Humphrey


Svein Halvor Halvorsen



Karim Issa

Øystein Risan Borgersen
David Anderson Bell III











Ole-Morten Duesend







Adam North and Gabrielle Falquero

Robert Biegler 


Qu Wenhao






Steffen Dittmar




Shanna Germain






Adam Blinkinsop







John WS Marvin (Dread Unicorn Games)


Bill Carter
Darth Chronis 



Lawrence Stewart

Gareth Hodges

Colin Backhurst
Christopher Metzger

Rachel Gumper


Mariah Thompson

Falk Alexander Glade
Johnathan Salter




Maggie Unkefer
Shawna Maryanovich






Wilhelm Fitzpatrick
Dylan “ExoByte” Mayo
Lynda Lee




Scott Carpenter



Charles D, Payet
Vince Rostkowski


Tim Brown
Raven Daegmorgan
Zak Brueckner


Christian Page

Adi Shavit


Steven Greenberg
Chuck Lunney



Adriel Bustamente

Natasha Anicich



Bram De Bie
Edward L






Gray Detrick
Robert


Sarah Russell

Sam Leavin

Abilash Pulicken

Isabel Olondriz
James Pierce
James Morrison


April Daniels



José Tremblay Champagne


Chris Edmonds

Hans & Maria Cummings
Bart Gasiewiski


Andy Chamard



Andrew Jackson

Christopher Wright

Crystal Collins

ichimonji10


Alan Stern
Alison W


Dag Henrik Bråtane





Martin Nilsson


William Schrade


Saving Climate Data (Part 5)

6 February, 2017


There’s a lot going on! Here’s a news roundup. I will separately talk about what the Azimuth Climate Data Backup Project is doing.

I’ll start with the bad news, and then go on to some good news.

Tweaking the EPA website

Scientists are keeping track of how the Trump administration is changing the Environmental Protection Agency website, with before-and-after photos and analysis:

• Brian Kahn, Behold the “tweaks” Trump has made to the EPA website (so far), Natural Resources Defense Council blog, 3 February 2017.

There’s more about “adaptation” to climate change, and less about how it’s caused by carbon emissions.

All of this would be nothing compared to the new bill to eliminate the EPA, or Myron Ebell’s plan to fire most of the people working there:

• Joe Davidson, Trump transition leader’s goal is two-thirds cut in EPA employees, Washington Post, 30 January 2017.

If you want to keep track of this battle, I recommend getting a 30-day free subscription to this online magazine:

InsideEPA.com.

Taking animal welfare data offline

The Trump team is taking animal-welfare data offline. The US Department of Agriculture will no longer make lab inspection results and violations publicly available, citing privacy concerns:

• Sara Reardon, US government takes animal-welfare data offline, Nature Breaking News, 3 February 2017.

Restricting access to geospatial data

A new bill would prevent the US government from providing access to geospatial data if it helps people understand housing discrimination. It goes like this:

Notwithstanding any other provision of law, no Federal funds may be used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

For more on this bill, and the important ways in which such data has been used, see:

• Abraham Gutman, Scott Burris, and the Temple University Center for Public Health Law Research, Where will data take the Trump administration on housing?, Philly.com, 1 February 2017.

The EDGI fights back

The Environmental Data and Governance Initiative or EDGI is working to archive public environmental data. They’re helping coordinate data rescue events. You can attend one and have fun eating pizza with cool people while saving data:

• 3 February 2017, Portland
• 4 February 2017, New York City
• 10-11 February 2017, Austin Texas
• 11 February 2017, U. C. Berkeley, California
• 18 February 2017, MIT, Cambridge Massachusetts
• 18 February 2017, Haverford Connecticut
• 18-19 February 2017, Washington DC
• 26 February 2017, Twin Cities, Minnesota

Or, work with EDGI to organize your own data rescue event! They provide some online tools to help download data.

I know there will also be another event at UCLA, so the above list is not complete, and it will probably change and grow over time. Keep up-to-date at their site:

Environmental Data and Governance Initiative.

Scientists fight back

The pushback is so big it’s hard to list it all! For now I’ll just quote some of this article:

• Tabitha Powledge, The gag reflex: Trump info shutdowns at US science agencies, especially EPA, 27 January 2017.

THE PUSHBACK FROM SCIENCE HAS BEGUN

Predictably, counter-tweets claiming to come from rebellious employees at the EPA, the Forest Service, the USDA, and NASA sprang up immediately. At The Verge, Rich McCormick says there’s reason to believe these claims may be genuine, although none has yet been verified. A lovely head on this post: “On the internet, nobody knows if you’re a National Park.”

At Hit&Run, Ronald Bailey provides handles for several of these alt tweet streams, which he calls “the revolt of the permanent government.” (That’s a compliment.)

Bailey argues, “with exception perhaps of some minor amount of national security intelligence, there is no good reason that any information, data, studies, and reports that federal agencies produce should be kept from the public and press. In any case, I will be following the Alt_Bureaucracy feeds for a while.”

NeuroDojo Zen Faulkes posted on how to demand that scientific societies show some backbone. “Ask yourself: “Have my professional societies done anything more political than say, ‘Please don’t cut funding?’” Will they fight?,” he asked.

Scientists associated with the group 500 Women Scientists donned lab coats and marched in DC as part of the Women’s March on Washington the day after Trump’s Inauguration, Robinson Meyer reported at the Atlantic. A wildlife ecologist from North Carolina told Meyer, “I just can’t believe we’re having to yell, ‘Science is real.’”

Taking a cue from how the Women’s March did its social media organizing, other scientists who want to set up a Washington march of their own have put together a closed Facebook group that claims more than 600,000 members, Kate Sheridan writes at STAT.

The #ScienceMarch Twitter feed says a date for the march will be posted in a few days. [The march will be on 22 April 2017.] The group also plans to release tools to help people interested in local marches coordinate their efforts and avoid duplication.

At The Atlantic, Ed Yong describes the political action committee 314Action. (314=the first three digits of pi.)

Among other political activities, it is holding a webinar on Pi Day—March 14—to explain to scientists how to run for office. Yong calls 314Action the science version of Emily’s List, which helps pro-choice candidates run for office. 314Action says it is ready to connect potential candidate scientists with mentors—and donors.

Other groups may be willing to step in when government agencies wimp out. A few days before the Inauguration, the Centers for Disease Control and Prevention abruptly and with no explanation cancelled a 3-day meeting on the health effects of climate change scheduled for February. Scientists told Ars Technica’s Beth Mole that CDC has a history of running away from politicized issues.

One of the conference organizers from the American Public Health Association was quoted as saying nobody told the organizers to cancel.

I believe it. Just one more example of the chilling effect on global warming. In politics, once the Dear Leader’s wishes are known, some hirelings will rush to gratify them without being asked.

The APHA guy said they simply wanted to head off a potential last-minute cancellation. Yeah, I guess an anticipatory pre-cancellation would do that.

But then—Al Gore to the rescue! He is joining with a number of health groups—including the American Public Health Association—to hold a one-day meeting on the topic Feb 16 at the Carter Center in Atlanta, CDC’s home base. Vox’s Julia Belluz reports that it is not clear whether CDC officials will be part of the Gore rescue event.

The Sierra Club fights back

The Sierra Club, of which I’m a proud member, is using the Freedom of Information Act or FOIA to battle or at least slow the deletion of government databases. They wisely started even before Trump took power:

• Jennifer A Dlouhy, Fearing Trump data purge, environmentalists push to get records, BloombergMarkets, 13 January 2017.

Here’s how the strategy works:

U.S. government scientists frantically copying climate data they fear will disappear under the Trump administration may get extra time to safeguard the information, courtesy of a novel legal bid by the Sierra Club.

The environmental group is turning to open records requests to protect the resources and keep them from being deleted or made inaccessible, beginning with information housed at the Environmental Protection Agency and the Department of Energy. On Thursday [January 12th], the organization filed Freedom of Information Act requests asking those agencies to turn over a slew of records, including data on greenhouse gas emissions, traditional air pollution and power plants.

The rationale is simple: Federal laws and regulations generally block government agencies from destroying files that are being considered for release. Even if the Sierra Club’s FOIA requests are later rejected, the record-seeking alone could prevent files from being zapped quickly. And if the records are released, they could be stored independently on non-government computer servers, accessible even if other versions go offline.


Azimuth Backup Project (Part 3)

22 January, 2017



Along with the bad news there is some good news:

• Over 380 people have pledged over $14,000 to the Azimuth Backup Project on Kickstarter, greatly surpassing our conservative initial goal of $5,000.

• Given our budget, we currently aim at backing up 40 terabytes of data, and we are well on our way to this goal. You can see what we’ve done at Our Progress, and what we’re still doing at the Issue Tracker.

• I have gotten a commitment from Danna Gianforte, the head of Computing and Communications at U. C. Riverside, that eventually the university will maintain a copy of our data. (This commitment is based on my earlier estimate that we’d have 20 terabytes of data, so I need to see if 40 is okay.)

• I have gotten two offers from other people, saying they too can hold our data.

I’m hoping that the data at U. C. Riverside will be made publicly available through a server. The other offers may involve it being held ‘secretly’ until such time as it becomes needed; that has its own complementary advantages.

However, the interesting problem that confronts us now is: how to spend our money?

You can see how we’re currently spending it on our Budget and Spending page. Basically, we’re paying a firm called Hetzner for servers and storage boxes.

We could simply continue to do this until our money runs out. I hope that long before then, U. C. Riverside will have taken over some responsibilities. If so, there would be a long period where our money would largely pay for a redundant backup. Redundancy is good, but perhaps there is something better.

Two members of our team, Sakari Maaranen and Greg Kochanski, have thoughts on this matter which I’d like to share. Sakari posted his thoughts on Google+, while Greg posted his in an email which he’s letting me share here.

Please read these and offer us your thoughts! Maybe you can help us decide on the best strategy!

Sakari Maaranen

For the record, my views on our strategy of using the budget that the Azimuth Climate Data Backup Project now has.

People have contributed it to this effort specifically.

Some non-government entities have offered “free hosting”. Of course the project should take any and all free offers to host our data. Those would not be spending our budget however. And they are still paying for it, even if they offered it to us “for free”.

As far as it comes to spending, I think we should think in terms of 1) terabytemonths, and 2) sufficient redundancy, and do that as cost-efficiently as possible. We should not just dump the money to any takers, but think of the best bang for the buck. We owe that to the people who have contributed now.

For example, if we burn the cash quick to expensive storage, I would consider that a failure. Instead, we must plan for the best use of the budget towards our mission.

What we have promised to the people is that we back up and serve these data sets, by the money they have given to us. Let’s do exactly that.

We are currently serving the mission at approximately €0.006 per gigabytemonth at least for as long as we have volunteers to work for free. The cost could be slightly higher if we paid for professional maintenance, which should be a reasonable assumption if we plan for long term service. Volunteer work cannot be guaranteed forever, even if it works temporarily.

This is one view and the question is open to public discussion.
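To make Sakari's figure concrete, here is a rough sketch of the arithmetic in Python. The rate of about €0.006 per gigabyte-month and the 40-terabyte target come from this post; the decimal conversion of 1000 gigabytes per terabyte and the euro budget used in the example are assumptions for illustration only, not project figures.

```python
# Rough storage-cost sketch. The rate and data size come from the post;
# the euro budget used below is a hypothetical placeholder.

RATE_EUR_PER_GB_MONTH = 0.006   # Sakari's approximate figure
DATA_TERABYTES = 40             # the project's current target
GB_PER_TB = 1000                # assumption: decimal terabytes


def monthly_cost_eur(terabytes, rate=RATE_EUR_PER_GB_MONTH):
    """Cost in euros of storing `terabytes` of data for one month."""
    return terabytes * GB_PER_TB * rate


def months_of_storage(budget_eur, terabytes, copies=1):
    """How many months a budget lasts when keeping `copies` redundant copies."""
    return budget_eur / (monthly_cost_eur(terabytes) * copies)


if __name__ == "__main__":
    per_month = monthly_cost_eur(DATA_TERABYTES)
    print(f"{DATA_TERABYTES} TB costs about EUR {per_month:.2f} per month")

    hypothetical_budget = 10_000  # illustrative only, not the real budget
    for copies in (1, 2):
        months = months_of_storage(hypothetical_budget, DATA_TERABYTES, copies)
        print(f"{copies} copy/copies: about {months:.0f} months")
```

At this rate 40 terabytes comes to roughly €240 per month for a single copy, so each extra redundant copy halves, thirds, and so on the time a fixed budget lasts. That is the trade-off between terabyte-months and redundancy that Sakari describes.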

Greg Kochanski

Some misc thoughts.

1) As I see it, we have made some promise of serving the data (“create a better interface for getting it”) which can be an expensive thing.

UI coding isn’t all that easy, and takes some time.

Beyond that, we’ve promised to back up the data, and once you say “backup”, you’ve also made an implicit promise to make the data available.

2) I agree that if we have a backup, it is a logical extension to take continuous backups, but I wouldn’t say it’s necessary.

Perhaps the way to think about it is to ask the question, “what do our donors likely want”?

3) Clearly they want to preserve the data, in case it disappears from the Federal sites. So, that’s job 1. And, if it does disappear, we need to make it available.

3a) Making it available will require some serving CPU, disk, and network. We may need to worry about DDOS attacks, though perhaps we could get free coverage from Akamai or Google Project Shield.

3b) Making it available may imply paying some students to write Javascript and HTML to put up a front-end to allow people to access the data we are collecting.

Not all the data we’re collecting is in strictly servable form. Some of the databases, for example aren’t usefully servable in the form we collect, and we know some links will be broken because of missing pages, or because of wget’s design flaw.*

[* Wget stores http://a/b/c as a file, a/b/c, where a/b is a directory. Wget stores http://a/b as a file a/b, where a/b is a file.

Therefore, both cannot exist simultaneously on disk. If they do, wget drops one.]
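The conflict Greg describes can be checked for in a list of URLs before or after a crawl. The following Python sketch is not part of wget or of the project's tooling; the function names are made up for illustration. It assumes the mirror layout described above, plus the assumption that a URL ending in a slash is saved as an index.html file inside a directory, and it ignores query strings.

```python
# Sketch: find URLs that cannot both be mirrored to disk, because one URL's
# path would have to be a file and a directory at the same time.
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a (host, segment, ...) tuple, roughly as a mirror lays it out.

    Assumption: a path ending in "/" (or an empty path) is stored as
    index.html inside that directory. Query strings are ignored.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    segments = [s for s in path.split("/") if s]
    if path.endswith("/"):
        segments.append("index.html")
    return (parts.netloc, *segments)


def file_directory_conflicts(urls):
    """Return sorted (file_url, deeper_url) pairs where file_url's local path
    is a proper prefix of deeper_url's local path."""
    paths = {local_path(u): u for u in urls}
    conflicts = []
    for path, url in paths.items():
        # Every proper prefix of this path has to be a directory on disk,
        # so it clashes with any URL stored as a file at that same prefix.
        for end in range(1, len(path)):
            prefix = path[:end]
            if prefix in paths:
                conflicts.append((paths[prefix], url))
    return sorted(conflicts)


if __name__ == "__main__":
    urls = ["http://a/b", "http://a/b/c", "http://a/d/", "http://a/d/e"]
    for file_url, deeper_url in file_directory_conflicts(urls):
        print(f"{file_url} clashes with {deeper_url}")
```

Here http://a/b and http://a/b/c clash, while http://a/d/ and http://a/d/e do not, since under the stated assumption the first of those is stored as a/d/index.html. A report like this would show which pages are likely to be missing from a mirror.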

Points 3 & 3a imply that we need to keep some money in the bank until either the websites are taken down, or we decide that the threat has abated. So, we need to figure out how much money to keep as a serving reserve. It doesn’t sound like UCR has committed to serve the data, though you could perhaps ask.

Beyond the serving reserve, I think we are free to do better backups (i.e. more than one data collection), and change detection.


Saving Climate Data (Part 4)

21 January, 2017

At noon today in Washington DC, while Trump was being inaugurated, all mentions of “climate change” and “global warming” were eliminated from the White House website.

Well, not all. The word “climate” still shows up here:

President Trump is committed to eliminating harmful and unnecessary policies such as the Climate Action Plan….

There are also reports that all mentions of climate change will be scrubbed from the website of the Environmental Protection Agency, or EPA.

From Motherboard

Let me quote from this article:

• Jason Koebler, All references to climate change have been deleted from the White House website, Motherboard, 20 January 2017.

Scientists and professors around the country had been rushing to download and rehost as much government science as was possible before the transition, based on a fear that Trump’s administration would neglect or outright delete government information, databases, and web applications about science. Last week, the Radio Motherboard podcast recorded an episode about these efforts, which you can listen to below, or anywhere you listen to podcasts.

The Internet Archive, too, has been keeping a close watch on the White House website; President Obama’s climate change page had been archived every single day in January.

So far, nothing on the Environmental Protection Agency’s website has changed under Trump, but a report earlier this week from Inside EPA, a newsletter and website that reports on the agency, suggested that pages about climate are destined to be cut within the first few weeks of his presidency.

Scientists I’ve spoken to who are archiving websites say they expect scientific data on the NASA, NOAA, Department of Energy, and EPA websites to be neglected or deleted eventually. They say they don’t expect agency sites to be updated immediately, but expect it to play out over the course of months. This sort of low-key data destruction might not be the type of censorship people typically think about, but scientists are treating it as such.

From Technology Review

Greg Egan pointed out another good article, in MIT’s magazine:

• James Temple, Climate data preservation efforts mount as Trump takes office, Technology Review, 20 January 2017.

Quoting from that:

Dozens of computer science students at the University of California, Los Angeles, will mark Inauguration Day by downloading federal climate databases they fear could vanish under the Trump Administration.

Friday’s hackathon follows a series of grassroots data preservation efforts in recent weeks, amid increasing concerns the new administration is filling agencies with climate deniers likely eager to cut off access to scientific data that undermine their policy views. Those worries only grew earlier this week, when Inside EPA reported that the Environmental Protection Agency transition team plans to scrub climate data from the agency’s website, citing a source familiar with the team.

Earlier federal data hackathons include the “Guerrilla Archiving” event at the University of Toronto last month, the Internet Archive’s Gov Data Hackathon in San Francisco at the beginning of January, and the DataRescue Philly event at the University of Pennsylvania last week.

Much of the collected data is being stored in the servers of the End of Term Web Archive, a collaborative effort to preserve government websites at the conclusion of presidential terms. The University of Pennsylvania’s Penn Program in Environmental Humanities launched the separate DataRefuge project, in part to back up environmental data sets that standard Web crawling tools can’t collect.

Many of the groups are working off a master list of crucial data sets from NASA, the National Oceanic and Atmospheric Administration, the U.S. Geological Survey, and other agencies. Meteorologist and climate journalist Eric Holthaus helped prompt the creation of that crowdsourced list with a tweet early last month.

Other key developments driving the archival initiatives included reports that the transition team had asked Energy Department officials for a list of staff who attended climate change meetings in recent years, and public statements from senior campaign policy advisors arguing that NASA should get out of the business of “politically correct environmental monitoring.”

“The transition team has given us no reason to believe that they will respect scientific data, particularly when it’s inconvenient,” says Gretchen Goldman, research director in the Center for Science and Democracy at the Union of Concerned Scientists. These historical databases are crucial to ongoing climate change research in the United States and abroad, she says.

To be clear, the Trump camp hasn’t publicly declared plans to erase or eliminate access to the databases. But there is certainly precedent for state and federal governments editing, removing, or downplaying scientific information that doesn’t conform to their political views.

Late last year, it emerged that text on Wisconsin’s Department of Natural Resources website was substantially rewritten to remove references to climate change. In addition, an extensive Congressional investigation concluded in a 2007 report that the Bush Administration “engaged in a systematic effort to manipulate climate change science and mislead policymakers and the public about the dangers of global warming.”

In fact these Bush Administration efforts were masterminded by Myron Ebell, who Trump chose to lead his EPA transition team!

Continuing:

In fact, there are wide-ranging changes to federal websites with every change in administration for a variety of reasons. The Internet Archive, which collaborated on the End of Term project in 2008 and 2012 as well, notes that more than 80 percent of PDFs on .gov sites disappeared during that four-year period.

The organization has seen a surge of interest in backing up sites and data this year across all government agencies, but particularly for climate information. In the end, they expect to collect well more than 100 terabytes of data, close to triple the amount in previous years, says Jefferson Bailey, director of Web archiving.

In fact the Azimuth Backup Project alone may gather about 40 terabytes!

From Inside EPA

And then there’s this view from inside the Environmental Protection Agency:

• Dawn Reeves, Trump transition preparing to scrub some climate data from EPA Website, Inside EPA, 17 January 2017.

The incoming Trump administration’s EPA transition team intends to remove non-regulatory climate data from the agency’s website, including references to President Barack Obama’s June 2013 Climate Action Plan, the strategies for 2014 and 2015 to cut methane and other data, according to a source familiar with the transition team.

Additionally, Obama’s 2013 memo ordering EPA to establish its power sector carbon pollution standards “will not survive the first day,” the source says, a step that rule opponents say is integral to the incoming administration’s pledge to roll back the Clean Power Plan and new source power plant rules.

The Climate Action Plan has been the Obama administration’s government-wide blueprint for addressing climate change and includes information on cutting domestic greenhouse gas (GHG) emissions, including both regulatory and voluntary approaches; information on preparing for the impacts of climate change; and information on leading international efforts.

The removal of such information from EPA’s website — as well as likely removal of references to such programs that link to the White House and other agency websites — is being prepped now.

The transition team’s preparations fortify concerns from agency staff, environmentalists and many scientists that the Trump administration is going to destroy reams of EPA and other agencies’ climate data. Scientists have been preparing for this possibility for months, with many working to preserve key data on private websites.

Environmentalists are also stepping up their efforts to preserve the data. The Sierra Club Jan. 13 filed a Freedom of Information Act request seeking reams of climate-related data from EPA and the Department of Energy (DOE), including power plant GHG data. Even if the request is denied, the group said it should buy them some time.

“We’re interested in trying to download and preserve the information, but it’s going to take some time,” Andrea Issod, a senior attorney with the Sierra Club, told Bloomberg. “We hope our request will be a counterweight to the coming assault on this critical pollution and climate data.”

While Trump has pledged to take a host of steps to roll back Obama EPA climate and other high-profile actions on his first day in office, transition and other officials say the date may slip.

“In truth, it might not [happen] on the first day, it might be a week,” the source close to the transition says of the removal of climate information from EPA’s website. The source adds that in addition to EPA, the transition team is also looking at such information on the websites of DOE and the Interior Department.

Additionally, incoming Trump press secretary Sean Spicer told reporters Jan. 17 that not much may happen on Inauguration Day itself, but to expect major developments the following Monday, Jan. 23. “I think on [Jan. 23] you’re going to see a big flurry of activity” that is expected to include the disappearance of at least some EPA climate references.

Until Trump is inaugurated on Jan. 20, the transition team cannot tell agency staff what to do, and the source familiar with the transition team’s work is unaware of any communications requiring language removal or beta testing of websites happening now, though it appears that some of this work is occurring.

“We can only ask for information at this point until we are in charge. On [Jan. 20] at about 2 o’clock, then they can ask [staff] to” take actions, the source adds.

Scope & Breadth

The scope and breadth of the information to be removed is unclear. While it is likely to include executive actions on climate, it does not appear that the reams of climate science information, including models, tools and databases on the EPA Office of Research & Development’s (ORD) website will be impacted, at least not immediately.

ORD also has published climate, air and energy strategic research action plans, including one for 2016-2019 that includes research to assess impacts; prevent and reduce emissions; and prepare for and respond to changes in climate and air quality.

But other EPA information maintained on its websites, including its climate change page and its “What is EPA doing about climate change” page that references the Climate Action Plan, the 2014 methane strategy and a 2015 oil and gas methane reduction strategy, is expected to be targeted.

Another possible target is new information EPA just compiled—and hosted a Jan. 17 webinar to discuss—on climate change impacts to vulnerable communities.

One former EPA official who has experience with transitions says it is unlikely that any top Obama EPA official is on board with this. “I would think they would be violently against this. . . I would think that the last thing [EPA Administrator] Gina McCarthy would want to do would be to be complicit in Trump’s effort to purge the website” of climate-related work, and that if she knew she would “go ballistic.”

But the former official, the source close to the transition team and others note that EPA career staff is fearful and may be undertaking such prep work “as a defensive maneuver to avoid getting targeted,” the official says, adding that any directive would likely be coming from mid-level managers rather than political appointees or senior level officials.

But while the former official was surprised that such work might be happening now, the fact that it is only said to be targeting voluntary efforts “has a certain ring of truth to it. Someone who is knowledgeable would draw that distinction.”

Additionally, one science advocate says, “The people who are running the EPA transition have a long history of sowing misunderstanding about climate change and they tend to believe in a vast conspiracy in the scientific community to lie to the public. If they think the information is truly fraudulent, it would make sense they would try to scrub it. . . . But the role of the agency is to inform the public . . . [and not to satisfy] the musings of a band of conspiracy theorists.”

The source was referring to EPA transition team leader Myron Ebell, a long-time climate skeptic at the Competitive Enterprise Institute, along with David Schnare, another opponent of climate action, who is at the Energy & Environment Legal Institute.

And while “a new administration has the right to change information about policy, what they don’t have the right to do is change the scientific information about policies they wish to put forward and that includes removing resources on science that serve the public.”

The advocate adds that many state and local governments rely on EPA climate information.

EPA Concern

But there has been plenty of concern that such a move would take place, especially after transition team officials last month sought the names of DOE employees who worked on climate change, raising alarms and cries of a “political witch hunt” along with a Dec. 13 letter from Sen. Maria Cantwell (D-WA) that prompted the transition team to disavow the memo.

Since then, scientists have been scrambling to preserve government data.

On Jan. 10, High Country News reported that on a Saturday last month, 150 technology specialists, hackers, scholars and activists assembled in Toronto for the “Guerrilla Archiving Event: Saving Environmental Data from Trump” where the group combed the internet for key climate and environmental data from EPA’s website.

“A giant computer program would then copy the information onto an independent server, where it will remain publicly accessible—and safe from potential government interference.”

The organizer of the event, Henry Warwick, said, “Say Trump firewalls the EPA,” pulling reams of information from public access. “No one will have access to the data in these papers” unless the archiving took place.

Additionally, the Union of Concerned Scientists released a Jan. 17 report, “Preserving Scientific Integrity in Federal Policy Making,” urging the Trump administration to retain scientific integrity. It wrote in a related blog post, “So how will government science fare under Trump? Scientists are not just going to wait and see. More than 5,500 scientists have now signed onto a letter asking the president-elect to uphold scientific integrity in his administration. . . . We know what’s at stake. We’ve come too far with scientific integrity to see it unraveled by an anti-science president. It’s worth fighting for.”


Give the Earth a Present: Help Us Save Climate Data

28 December, 2016

We’ve been busy backing up climate data before Trump becomes President. Now you can help too, with some money to pay for servers and storage space. Please give what you can at our Kickstarter campaign here:

Azimuth Climate Data Backup Project.

If we get $5000 by the end of January, we can save this data until we convince bigger organizations to take over. If we don’t get that much, we get nothing. That’s how Kickstarter works. Also, if you donate now, you won’t be billed until January 31st.

So, please help! It’s urgent.

I will make public how we spend this money. And if we get more than $5000, I’ll make sure it’s put to good use. There’s a lot of work we could do to make sure the data is authenticated, made easily accessible, and so on.

The idea

The safety of US government climate data is at risk. Trump plans to have climate change deniers running every agency concerned with climate change. So, scientists are rushing to back up the many climate databases held by US government agencies before he takes office.

We hope he won’t be rash enough to delete these precious records. But: better safe than sorry!

The Azimuth Climate Data Backup Project is part of this effort. So far our volunteers have backed up nearly 1 terabyte of climate data from NASA and other agencies. We’ll do a lot more! We just need some funds to pay for storage space and a server until larger institutions take over this task.

The team

• Jan Galkowski is a statistician with a strong interest in climate science. He works at Akamai Technologies, a company responsible for serving at least 15% of all web traffic. He began downloading climate data on the 11th of December.

• Shortly thereafter John Baez, a mathematician and science blogger at U. C. Riverside, joined in to publicize the project. He’d already founded an organization called the Azimuth Project, which helps scientists and engineers cooperate on environmental issues.

• When Jan started running out of storage space, Scott Maxwell jumped in. He used to work for NASA—driving a Mars rover among other things—and now he works for Google. He set up a 10-terabyte account on Google Drive and started backing up data himself.

• A couple of days later Sakari Maaranen joined the team. He’s a systems architect at Ubisecure, a Finnish firm, with access to a high-bandwidth connection. He set up a server, he’s downloading lots of data, he showed us how to authenticate it with SHA-256 hashes, and he’s managing many other technical aspects of this project.

There are other people involved too. You can watch the nitty-gritty details of our progress here:

Azimuth Backup Project – Issue Tracker.

and you can learn more here:

Azimuth Climate Data Backup Project.


Saving Climate Data (Part 3)

23 December, 2016

You can back up climate data, but how can anyone be sure your backups are accurate? Let’s suppose the databases you’ve backed up have been deleted, so that there’s no way to directly compare your backup with the original. And to make things really tough, let’s suppose that faked databases are being promoted as competitors with the real ones! What can you do?

One idea is ‘safety in numbers’. If a bunch of backups all match, and they were made independently, it’s less likely that they all suffer from the same errors.

Another is ‘safety in reputation’. If one bunch of backups of climate data is held by academic institutes of climate science, and another is held by climate change denying organizations (conveniently listed here), you probably know which one you trust more. (And this is true even if you’re a climate change denier, though your answer may be different than mine.)

But a third idea is to use a cryptographic hash function. In very simplified terms, this is a method of taking a database and computing a fairly short string from it, called a ‘digest’.

[Figure: a cryptographic hash function]

A good hash function makes it hard to change the database and get a new one with the same digest. So, if the person owning a database computes and publishes the digest, anyone can check that your backup is correct by computing its digest and comparing it to the published one.

It’s not foolproof, but it works well enough to be helpful.

Of course, it only works if we have some trustworthy record of the original digest. But the digest is much smaller than the original database: for example, in the popular method called SHA-256, the digest is 256 bits long. So it’s much easier to make copies of the digest than to back up the original database. These copies should be stored in trustworthy places, for example at the Internet Archive.
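
To make this concrete, here is a minimal sketch of computing a SHA-256 digest using Python's standard hashlib module. The function name sha256_of_file, the chunk size, and the sample bytes are my own choices for illustration; none of them comes from the backup project.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file as a hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that huge files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Made-up sample bytes, just to show the behavior of the digest.
small = hashlib.sha256(b"temperature,14.2\n").hexdigest()
changed = hashlib.sha256(b"temperature,14.3\n").hexdigest()

print(len(small) * 4)    # digest length in bits: prints 256
print(small != changed)  # a one-character change alters the digest: prints True
```

The digest is 64 hexadecimal characters no matter how big the file is, which is why it is so much cheaper to copy and store than the database itself.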

When Sakari Maaranen made a backup of the University of Idaho Gridded Surface Meteorological Data, he asked the custodians of that data to publish a digest, or ‘hash file’. One of them responded:

Sakari and others,

I have made the checksums for the UofI METDATA/gridMET files (1979-2015) as both md5sums and sha256sums.

You can find these hash files here:

https://www.northwestknowledge.net/metdata/data/hash.md5

https://www.northwestknowledge.net/metdata/data/hash.sha256

After you download the files, you can check the sums with:

md5sum -c hash.md5

sha256sum -c hash.sha256

Please let me know if something is not ideal and we’ll fix it!

Thanks for suggesting we do this!
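
If you don't have the sha256sum tool handy, the same check can be sketched in Python. This is only a rough equivalent of sha256sum -c, under some assumptions of mine: each line of the hash file looks like a hex digest, a space, a space or asterisk, and then a file name; file names are resolved relative to the directory holding the hash file; and escaped file names are not handled. The function name verify_sha256_manifest is hypothetical.

```python
import hashlib
import os

def verify_sha256_manifest(manifest_path):
    """Check every file listed in a sha256sum-style hash file.

    Returns a dict mapping each file name to True (digest matches)
    or False (digest differs, or the file could not be read).
    """
    # Assumption: listed files sit next to the hash file.
    base_dir = os.path.dirname(os.path.abspath(manifest_path))
    results = {}
    with open(manifest_path, "r", encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            expected, _, name = line.partition(" ")
            # After the digest and one space, sha256sum writes ' name'
            # (text mode) or '*name' (binary mode); drop that marker.
            if name[:1] in (" ", "*"):
                name = name[1:]
            path = os.path.join(base_dir, name)
            try:
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                results[name] = (h.hexdigest() == expected.lower())
            except OSError:
                results[name] = False  # missing or unreadable file
    return results
```

Calling verify_sha256_manifest("hash.sha256") in a directory of downloaded files would then tell you which of them still match the published digests.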

Sakari replied:

Thank you so much! This means everything to public mirroring efforts. If you’d like to help further promote this Best Practice, consider getting it recognized as a standard when you do online publishing of key public information.

1. Publishing those hashes is already a major improvement on its own.

2. Publishing them on a secure website offers people further guarantees that there has not been any man-in-the-middle.

3. Digitally signing the checksum files offers the best easily achievable guarantees of data integrity by the person(s) who sign the checksum files.

Please consider having these three steps included in your science organisation’s online publishing training and standard Best Practices.

Feel free to forward this message to whom it may concern. Feel free to rephrase as necessary.

As a separate item, public mirroring instructions for how to best download your data and/or public websites would further guarantee permanence of all your uniquely valuable science data and public contributions.

Right now we should get this message viral through the government funded science publishing people. Please approach the key people directly – avoiding the delay of using official channels. We need to have all the uniquely valuable public data mirrored before possible changes in funding.

Again, thank you for your quick response!

There are probably lots of things to be careful about. Here’s one. Maybe you can think of more, and ways to deal with them.

What if the data keeps changing with time? This is especially true of climate records, where new temperatures and so on are added to a database every day, or month, or year. Then I think we need to ‘time-stamp’ everything. The owners of the original database need to keep a list of digests, with the time each one was made. And when you make a copy, you need to record the time it was made.
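
Here is one way such a time-stamped list of digests might look, as a small Python sketch. The record fields (name, sha256, made_at), the function names, and the sample bytes are all made up for illustration. This is not a format anyone has agreed on.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_digest_record(name, data, when=None):
    """Build one time-stamped digest record for a snapshot of a dataset."""
    when = when or datetime.now(timezone.utc)
    return {
        "name": name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "made_at": when.isoformat(),
    }

def matching_records(log, data):
    """Return the records in the log whose digest matches this copy."""
    digest = hashlib.sha256(data).hexdigest()
    return [r for r in log if r["sha256"] == digest]

# Made-up example: the database grows by one line between two snapshots,
# and its owner logs a digest each time.
log = [
    make_digest_record("temps.csv", b"2016-11,14.1\n",
                       datetime(2016, 12, 1, tzinfo=timezone.utc)),
    make_digest_record("temps.csv", b"2016-11,14.1\n2016-12,13.8\n",
                       datetime(2017, 1, 1, tzinfo=timezone.utc)),
]

# A backup made in December matches only the December snapshot.
backup = b"2016-11,14.1\n"
for record in matching_records(log, backup):
    print(json.dumps(record))
```

With a log like this, someone holding a backup can say which published snapshot it corresponds to, and a copy that matches no record at all is a warning sign.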