
Introduction and scenario

The CCODe project was carried out for the Open Access and Digital Ethics exam of the Digital Humanities and Digital Knowledge course at the University of Bologna. The aim of the project is to analyze and re-use open access datasets in order to reach new knowledge through a mashup of the original data. The scenario of our study is climate change. In particular, we wanted to explore in more depth:

The main factors of climate change;

The impact of individual countries on climate change and the commitments they have undertaken to fight it;

The opinions of the European Union citizens on climate change.

In line with our purpose, we selected 15 datasets and analyzed them from several points of view; we then extracted the data we were interested in and, finally, created three new datasets, one for each of the areas listed above.

The last step was the visualization of our datasets, to make them easier for the end user to access and understand.

Original datasets

To pursue our goals, we chose 15 datasets related to our scenario. They were selected for the variety of their provenances, typologies, formats, metadata and licenses. When in doubt, we took frequent citation in academic sources as evidence of a dataset's reliability. We selected those that, at least at first glance, seemed free from cognitive biases, fair, legally valid, consistent and accurate.

To depict climate change over time, we selected eight datasets concerning the main factors used to measure climate change (temperature and precipitation anomalies and sea ice extent), together with significant events caused by it (droughts, floods, hurricanes, wildfires and threatened species). The majority (6/8) come from US-based institutions (e.g. NOAA), while the other two are provided by supranational organizations, namely OECD and UNEP. Five out of eight report data collected on a country basis, while Sea Ice Extent, Precipitation and Temperature anomalies are available only at a global level. Overall, the datasets cover roughly the period from 1980 to 2019, with some exceptions: the Wildfires data start in 2003, the Droughts data end in 2001, and Threatened Species refers to 2019 only, while the Temperature and Precipitation series go back to 1880 and 1901 respectively.

To understand the impact of individual countries on climate change and their commitment against it, we selected four datasets, concerning GHG emissions, ecological footprint, submissions under the Paris Agreement and global climate funds. They come from international organizations and institutes, such as OECD and WRI, and are all modeled on a country basis. Their start dates differ (the earliest goes back to the 1960s), but they all end in recent years (2016 at the earliest). The Paris Agreement dataset is limited to the period of the submissions (2015-2016).
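
Because these datasets are all country-based but cover different periods, mashing them up comes down to normalizing country names and keeping only the country-year pairs they share. The Python sketch below illustrates that idea under stated assumptions; it is not the procedure we actually followed. The CSV layout (`country` and `year` columns plus one value column), the `COUNTRY_ALIASES` table and all helper names are hypothetical, and the accent repair assumes the damage comes from UTF-8 text decoded as Latin-1, the kind of problem reported for Côte d'Ivoire and Réunion in the quality analysis. The numbers in the example are toy values.

```python
import csv
import io

# Hypothetical alias table: spelling variants -> one canonical country name.
COUNTRY_ALIASES = {
    "UK": "United Kingdom",
    "U.K.": "United Kingdom",
}


def fix_mojibake(name: str) -> str:
    """Repair UTF-8 text that was decoded as Latin-1 (e.g. 'CÃ´te' -> 'Côte').

    Assumption: this is the only kind of encoding damage in the data.
    """
    try:
        return name.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not repairable this way (or already correct): keep it unchanged.
        return name


def normalize_country(name: str) -> str:
    name = fix_mojibake(name.strip())
    return COUNTRY_ALIASES.get(name, name)


def load(text: str, value_col: str) -> dict:
    """Map (country, year) -> value for a CSV with 'country', 'year' and value_col."""
    return {
        (normalize_country(row["country"]), int(row["year"])): float(row[value_col])
        for row in csv.DictReader(io.StringIO(text))
    }


def mashup(left: dict, right: dict) -> list:
    """Inner join on (country, year): keep only pairs present in both sources."""
    shared = sorted(left.keys() & right.keys())
    return [(c, y, left[(c, y)], right[(c, y)]) for c, y in shared]


# Toy inputs, not real figures.
ghg = "country,year,ghg\nUK,2015,1.0\nCÃ´te d'Ivoire,2015,2.0\n"
footprint = ("country,year,footprint\nUnited Kingdom,2015,3.0\n"
             "Côte d'Ivoire,2015,4.0\nUnited Kingdom,2016,5.0\n")
rows = mashup(load(ghg, "ghg"), load(footprint, "footprint"))
print(rows)
```

An inner join like this one automatically restricts the result to the overlapping years; an outer join would keep every year and leave the gaps to be handled explicitly.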

To include the human perception of the problem, we selected three datasets built from the 2009, 2013 and 2019 Eurobarometer surveys, which report the opinions of European citizens on climate change. The data were collected country by country across the EU member states and are provided directly by the European Commission.

In some cases, we retrieved the datasets from re-user websites, as specified in the table below.

The following table reports the preliminary analysis we performed on each dataset.

General analysis

Subject	Name	Owner	Owner URL	Re-user	Re-user URL	Data type	Available formats	Metadata	License	Domain	Spatial coverage	Time range	Upload date	Last update	Update frequency	Description
Droughts	Droughts events 1980-2001	United Nations Environment Programme (UNEP)	https://preview.grid.unep.ch/index.php?preview=data&events=droughts&evcat=1&lang=eng	Humanitarian Data Exchange (HDX)	https://data.humdata.org/dataset/global-droughts-events-1980-2001	Quantitative	dbf, shp, shx (UNEP), CSV (HDX)	Yes: ISO 19115:2003/19139	Available free of charge for non-commercial purposes, as explained at https://preview.grid.unep.ch/index.php?preview=about&cat=2&lang=eng	Environment	Global	Jan 01, 1980 - Dec 31, 2001	Not stated	November 17, 2018	Never	This dataset includes an estimate of the annual global distribution of droughts, based on the Standardized Precipitation Index.
Floods	Global Active Archive of Large Flood Events	Dartmouth Flood Observatory, University of Colorado	http://floodobservatory.colorado.edu/Archives/index.html	Humanitarian Data Exchange (HDX)	https://data.humdata.org/dataset/global-active-archive-of-large-flood-events	Direct Observational Data/Anecdotal Data	XLSX, XML, MapInfo TAB, shapefiles	Only on HDX	Creative Commons Attribution 4.0 International license (CC BY 4.0)	Environment	Global	1985 - present	Sep 02, 2019 (HDX)	Last entry 01/2020 (dataset) / October 11, 2019 (HDX)	Live	This dataset contains an active archive of flood event records.
Hurricanes	International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4	NOAA National Centers for Environmental Information	https://www.ncdc.noaa.gov/ibtracs/index.php?name=ib-v4-access			Quantitative	netCDF, CSV, shapefiles	Yes: ISO 19115-2: https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C01552	World Data Center for Meteorology policy and World Meteorological Organization's Resolution 40 policy: https://www.ncdc.noaa.gov/ibtracs/index.php?name=terms	Environment	Global	1980 - present	2019-02-15 (NOAA) / March 2019 (IBTrACS Project)	Not stated	Twice weekly - Weekly (IBTrACS Project) / Daily (NOAA)	This dataset contains a complete set of historical tropical cyclones, obtained by combining information from numerous tropical cyclone datasets.
Wildfires	GFEDv4 (Global Fire Emissions Database, Version 4)	Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC)	http://www.globalfiredata.org/analysis.html			Quantitative	CSV, HDF	Yes: https://daac.ornl.gov/VEGETATION/guides/fire_emissions_v4.html	Data hosted by the ORNL DAAC is openly shared, without restriction, in accordance with NASA's Earth Science program Data and Information Policy	Environment	Global	2003-present	September 2015	2017-09-29	Not stated - inferred monthly	This dataset contains global estimates of annual fire counts for different countries, based on burned-area information for different fire types.
Temperature	Climate at a Glance - Time Series Graphs of Temperature Anomalies	NOAA National Centers for Environmental Information	https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool			Land-based station, Marine / Ocean	XMS, CSV, XML, JSON	Yes: https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool	FOIA (5 USC 552)	Environment	Global	1880-present	February 2020 (it changes every month)	Not stated	Not stated - inferred monthly	This dataset compares the average temperature of land, ocean, or land and ocean combined, for any month or multi-month period, with the 20th-century average for the same period, showing whether conditions are warmer or cooler than in the past.
Threatened species	Threatened species	OECD (Organisation for Economic Co-operation and Development)	https://stats.oecd.org/Index.aspx?DataSetCode=WILD_LIFE			Quantitative	XLS, CSV, SDMX(XML)	Yes: https://stats.oecd.org/OECDStat_Metadata/ShowMetadata.ashx?Dataset=WILD_LIFE&Lang=en	http://www.oecd.org/termsandconditions/ Except where additional restrictions apply as stated above, You can extract from, download, copy, adapt, print, distribute, share and embed Data for any purpose, even for commercial use. You must give appropriate credit to the OECD (...)	Environment	Global	2018-2019	Not stated	March 2019	Not stated - inferred monthly	This dataset contains data on the state of threatened species, built on country replies to the Annual Quality Assurance (AQA) of OECD environmental reference series.
Sea ice	Sea Ice and Snow Cover Extent	NSIDC National Snow and Ice Data Center	https://nsidc.org/	NOAA National Centers for Environmental Information	https://www.ncdc.noaa.gov/snow-and-ice/extent/	Satellite	CSV, XML, JSON	Yes: https://www.climate.gov/maps-data/dataset/snow-or-ice-extent-graphing-tool	FOIA (5 USC 552)	Environment	Global	1979-2020	Not stated	Not stated	Not stated	This dataset shows how sea ice extent has changed from 1979 to 2020. The available data cover North America + Greenland, the Northern Hemisphere, Eurasia and North America.
Precipitation	Climate Change Indicators: U.S. and Global Precipitation	NOAA National Centers for Environmental Information	https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-monthly-version-2	EPA - US Environmental Protection Agency	https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation	Quantitative	XLS	Yes: EPA: https://www.epa.gov/climate-indicators/downloads-indicators-technical-documentation, NOAA: ISO 19115-2 Metadata https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00835#	Not stated, FOIA in NOAA	Environment	USA and Global	1901-2015	April 2010	August 2016	Not stated	This dataset shows how the total annual amount of precipitation over land worldwide has changed since 1901.
GHG emissions by country	Greenhouse Gas Emissions	OECD	https://stats.oecd.org/Index.aspx?DataSetCode=AIR_GHG#			Quantitative	XLS, CSV, PX, SDMX (XML)	Yes	http://www.oecd.org/termsandconditions/	Environment	Global	1990-2017	Not stated	August 2019	Not stated	This dataset presents trends in man-made emissions of major greenhouse gases and emissions by gas.
Footprint by country	National Footprint and Biocapacity Accounts 2019 Public Data Package	Global Footprint Network	https://www.footprintnetwork.org/licenses/public-data-package-free/			Quantitative	CSV	No	Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0)	Environment	Global	1961-2016	Not stated	Not stated	Not stated	This dataset contains data related to the ecological footprint of the countries.
Adherence to the Paris Agreement	CAIT Paris Contributions Data	WRI - World Resources Institute	https://www.wri.org/resources/data-sets/cait-paris-contributions-data			Descriptive	XLSX	Yes	Creative Commons Attribution 4.0 International License	Environment, Politics	Global	2015-2016	March 2015	February 19, 2016	Not stated	This dataset collects information about all the countries that submitted their contributions under the Paris Agreement, in particular the date of submission and a summary of the commitments undertaken.
Investments for climate change	Cumulative data on the contributors of climate finance	Climate Funds Update	https://climatefundsupdate.org/data-dashboard/#1541245664327-538690dc-b9a8			Quantitative	XLSX	No	Not stated	Economy, Environment	Global	2003-2019	Not stated	February 2019	Not stated	This dataset collects information about the funds invested by countries at a global level to fight climate change.
Opinions on climate change EU 2009	Special Eurobarometer 313: Europeans’ attitudes towards climate change	Directorate-General for Communication of the European Commission	https://data.europa.eu/euodp/it/data/dataset/S942_71_1_EBS313			Qualitative and quantitative	XLSX	Yes: https://europarl.europa.eu/at-your-service/files/be-heard/eurobarometer/2009/climate-change/report/it-report-climate-change-200907.pdf	https://data.europa.eu/euodp/it/copyright	Government and public sector	Slovakia, Slovenia, Sweden, Netherlands, Poland, Portugal, Romania, Belgium, Austria, Cyprus, Bulgaria, Germany, Czechia, Spain, Denmark, Finland, Estonia, United Kingdom, France, Hungary, Greece, Italy, Ireland, Luxembourg, Lithuania, Malta, Latvia	January-February 2009	2014-12-09			This dataset contains data on the public opinion of European citizens on the issue of climate change.
Opinions on climate change EU 2013	Special Eurobarometer 409: Climate change	Directorate-General for Communication of the European Commission	https://data.europa.eu/euodp/it/data/dataset/S1084_80_2_409			Qualitative and quantitative	XLS	Yes: https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/ResultDoc/download/DocumentKy/57629	https://data.europa.eu/euodp/it/copyright	Government and public sector	Romania, Slovakia, Slovenia, Sweden, Malta, Netherlands, Poland, Portugal, Belgium, Austria, Cyprus, Bulgaria, Germany, Czechia, Spain, Denmark, Finland, Estonia, United Kingdom, France, Croatia, Greece, Ireland, Hungary, Lithuania, Italy, Latvia, Luxembourg	November-December 2013	2014-12-03			This dataset contains data on the public opinion of European citizens on the issue of climate change.
Opinions on climate change EU 2019	Special Eurobarometer 490: Climate change	Directorate-General for Communication of the European Commission	https://data.europa.eu/euodp/en/data/dataset/S2212_91_3_490_ENG			Qualitative and quantitative	XLS	Yes: https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/ResultDoc/download/DocumentKy/87642	https://data.europa.eu/euodp/it/copyright	Government and public sector	Romania, Slovakia, Slovenia, Sweden, Malta, Netherlands, Poland, Portugal, Belgium, Austria, Cyprus, Bulgaria, Germany, Czechia, Spain, Denmark, Finland, Estonia, United Kingdom, France, Croatia, Greece, Ireland, Hungary, Lithuania, Italy, Latvia, Luxembourg	From 2019-04-09 to 2019-04-26	2019-09-11			This dataset contains data on the public opinion of European citizens on the issue of climate change.

Quality analysis

We started the analysis of the original datasets by inspecting their quality and accuracy. As a reference, we used the Open Data Goldbook for Data Managers and Data Holders, published by the European Data Portal as a practical guidebook for organizations wanting to publish Open Data. The questions it poses to examine the quality of a dataset mainly concern completeness, cleanness, accuracy, timeliness and consistency. The following table reports the outcome of the analysis.
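
Some of the cleanness questions (empty fields, dummy or default values, double entries) can be checked mechanically before being answered by hand. The following is a minimal Python sketch using only the standard library; the `SENTINELS` set is an assumption based on markers we came across (-9999 in the Sea Ice data, NULL in the Footprint data), the function name is hypothetical, and the sample is a toy input. A value such as 0 is deliberately not flagged, because it may be a genuine measurement.

```python
import csv
import io
from collections import Counter

# Assumed "missing data" markers; extend per dataset.
SENTINELS = {"-9999", "NULL"}


def quality_report(text: str) -> dict:
    """Count empty cells, sentinel values, duplicated rows and ragged rows in a CSV."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Skip completely blank lines, which csv.reader returns as [].
    rows = [tuple(cell.strip() for cell in row) for row in reader if row]
    counts = Counter(rows)
    return {
        "columns": len(header),
        "rows": len(rows),
        "empty_fields": sum(cell == "" for row in rows for cell in row),
        "sentinel_values": sum(cell in SENTINELS for row in rows for cell in row),
        # Every extra copy of an identical row counts as one double entry.
        "double_entries": sum(n - 1 for n in counts.values() if n > 1),
        # Rows whose length differs from the header hint at broken parsing.
        "ragged_rows": sum(len(row) != len(header) for row in rows),
    }


# Toy input, not real figures.
sample = "year,extent\n1979,-9999\n1980,\n1981,1.0\n1981,1.0\n"
print(quality_report(sample))
```

Counts like these only point at candidate problems; deciding whether a 0 or a repeated row is really wrong still requires reading the documentation of each dataset.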

	Droughts events 1980-2001	Global Active Archive of Large Flood Events	International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4	GFEDv4 (Global Fire Emissions Database, Version 4)	Climate at a Glance - Time Series Graphs of Temperature Anomalies	Threatened species	Sea Ice and Snow Cover Extent	Climate Change Indicators: U.S. and Global Precipitation	Greenhouse Gas Emissions	National Footprint and Biocapacity Accounts 2019 Public Data Package	CAIT Paris Contributions Data	Cumulative data on the contributors of climate finance	Special Eurobarometer 313: Europeans’ attitudes towards climate change	Special Eurobarometer 409: Climate change	Special Eurobarometer 490: Climate change
Issues	We wrote an email to the contributor of the dataset to ask whether a legend exists for decoding the headers of the dataset, but we never received an answer.			The API platform from which the data was downloaded could not handle combined queries retrieving data for the whole world at once, so we had to download the data in batches and then merge them into a single CSV file.					Even if some conditions are set for the query, e.g. retrieving data including LULUCF, the output also contains data where it is excluded.		To process the dataset, we had to edit the text of one single cell (containing date information) directly in the sheet, because its date format could not be processed: it contained both xldate and string information. We also cleaned the cells of the "Summary" column, because in addition to the actual text they also contained HTML tags and entities.
Content quality
Is the dataset complete?
"Contain a header row with a single description of what is shown. This means that once a dataset structure is in place, it should not change when sources are added. In the metadata, the header should be described"	Yes, but the key (legend) for reading the columns is not provided	Yes, but the header entries described on the website (http://www.dartmouth.edu/~floods/Archives/ArchiveNotes.html) don't correspond to the actual entries of the dataset.	Yes, it is explained in a specific PDF document.	Yes, but the header is not described in the metadata.	Yes, but the header is not described in the metadata.	Yes, the header is described in the data characteristics section of the platform https://stats.oecd.org/Index.aspx?DataSetCode=WILD_LIFE	Yes	Yes, but the header is not described in the metadata.	Yes, but the header has not been described anywhere.	Yes, the header row is present and further explained in the PDF document about the work.	Yes (description in natural language)	Yes (easily understandable header row)	Yes (in dataset)	Yes, the questions contained in the dataset and the header are described and explained in the metadata PDF of the survey.	Yes, the questions contained in the dataset are explained in a PDF document about the survey.
"Be labelled with a version number. Once an update is done the dataset should get a new version number in order for the audience to keep track of changes"	No	No	Yes, Version 4	Yes, Version 4 (GFEDv4)	No	No	No	Yes, Version 2	No	No, there is no version number, but the title contains the year of the account.	No	No	Yes, v1.00	Yes, v1.00	Yes, v1.00
"Contain information about its origin. What is the data about, where does it come from and for what purpose has it been published?"	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Be given a status: Draft, validated, final	Inferred: Final	Validated ("Active")	Version numbers are updated when processing changes affect storms from previous years (for example, adjustments to the merge routines). Inferred: Validated	This data version is updated monthly, with no stated reason. The previous versions could not be accessed, because they are superseded by new versions and are only available at the ORNL. Inferred: Validated, because the dataset is updated internally every month	Not stated. Inferred: Validated, since inspecting the dataset suggests that it is updated yearly.	Not stated and not inferable, since the data contains no information on years; the website only states that the data refers to the latest year available.	Validated (updated every year since 1979)	Not stated, but inferred: Final, since the time range of the dataset ends in 2015.	Not stated and not inferable, since the data contains no information on the years concerned and the website only states that it refers to the latest year available.	A new dataset is published every year. Inferred: Final	Final (data from 2015 and 2016)	Cumulative since 2003; up to date as of February 2019 (https://climatefundsupdate.org/about-us/notes-and-methodology/). Inferred: Validated	Final	Final	Final
Is the data clean?
Empty fields	Yes	Yes	Yes	No	No	No	No	No	Yes	No	No (missing data are marked as "Not specified")	Yes (345, 346)	No	No	No
Dummy data and default values: are they correct?	Yes (e.g. 0)	Yes: 0, default value in case of an uncertain number of deaths or displaced people (http://www.dartmouth.edu/~floods/Archives/ArchiveNotes.html)	No	No	No	No	Yes (e.g. -9999, probably missing data)	No	No	Yes (0, NULL)	No	No dummy values; "Not applicable" is used as default value	No	No	No
Wrong values	No	The same countries are occasionally given different names, e.g. United Kingdom and UK. Many country names are misspelled.	No	No	No	No	No	No	No	Yes: Côte d'Ivoire and Réunion contain special characters instead of accented letters, probably because of encoding issues.	No	No	No	No	No
Double entries	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No
Privacy sensitive information	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No
Is the data accurate?
"Is the data accurate enough for its potential purpose?"	No, because only the country and the year are indicated. We do not know the duration of the event, its severity, its exact location within the country, etc.	No, since the work of other archives is not taken into account and the data is based mainly on news reports, so many events may have been left out.	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
"Does its accuracy affect its reliability? (only if the answer to the previous question is "No")"	No	No
"Are the choices concerning interval described?"	No	No	No	No	No	No, since there is no information concerning years	No, but the interval is easily understandable (i.e. every year since 1979); it is not explained why the dataset starts in 1979.	No	No	No	Under the U.N. Framework Convention on Climate Change (UNFCCC), countries committed to create a new international climate agreement by the conclusion of the Paris climate summit in December 2015. The dataset was created after the above-mentioned agreement.	Yes: "CFU data is cumulative since 2003. This is the first year in which one of the dedicated climate funds that we monitor approved finance for a project. The start date for each fund individually is available on the relevant fund page through ‘The Funds’" (https://climatefundsupdate.org/the-funds/)	No, only the context of the survey is explained in the introduction of the explanatory PDF.	No, only the context of the survey is explained in the introduction of the explanatory PDF.	No, only the context of the survey is explained in the introduction of the explanatory PDF.
"Does the data need aggregation or disaggregation?"	No	No	Data would probably need aggregation, because the resulting dataset is too big and much information could probably be condensed. For instance the same hurricane is registered more than once even in the same country because all the steps of the passage are traced.	No	No	No	No	No	No	No	No	No	No	No	No
Timeliness
"Data changes over time. Historical data will remain stable, but recent data will be updated over time. Therefore, it is important to check data with regard to its timeliness regularly. For consistency purposes, it is wise to create an update process that keeps the data up-to-date. Be sure that the data contains a notion of its timeliness. This topic is closely related to the maintenance of datasets."	No (not updated since 2001). The data contains a notion of its timeliness.	The dataset is updated, but the frequency of the procedure is not stated. The data contains a notion of its timeliness.	There is timeliness in data. The dataset is clearly updated, but the frequency is unclear: in the "Status" section it is said to be annual, in the "Data access" section twice weekly. The update frequency of the individual sources is also reported on the website.	Yes, there is timeliness in the data: the available version of the dataset is inferred to be updated every month, so new information is added as it becomes available.	Yes, there is timeliness in the data. The data has no stated update mechanism, but since it has versions, it is inferred to be updated yearly.	The website states that the data refers to the latest year available, and it is inferred to be updated every month, since the values constantly change.	Yes (inferred: every year)	No timeliness, since the data has not been updated with information on new years; it can therefore be considered historical.	Update frequency isn't stated, but there is timeliness in data.	Annually updated; timeliness is present in data.	Not updated, because it contains information about an agreement that took place between 2015 and 2016.	The dataset is cumulative since 2003 and the last update was in February 2019. It is probably updated every year with new data, while keeping the old ones. No notion of timeliness in data.	Inferred: not needed. No notion of timeliness in the data and no update process, because it refers to a single year.	Inferred: not needed. No notion of timeliness in the data and no update process, because it refers to a single year.	Inferred: not needed. No notion of timeliness in the data and no update process, because it refers to a single year.
Consistency
"Reading through the quality aspects of data, the consistency of the presentation of your data is of major importance. Imagine re-users correlating data from various sources, but all datasets differ in accuracy, use of terms and timeframe. As an example, if you change the field names of the data collected for managing waste each year, the data cannot be compiled from one year to the next. This makes it difficult to use datasets: it will require a large effort of manipulation. Therefore, make sure you use the standards and be consistent in publishing datasets of equal quality."	Not stated	There seems to be no consistency with respect to the previous tables of 2007 and 2008, available on the website: both the standard and the field names are different.	Consistency is stated among the fundamental principles of the project on two occasions on the website: https://www.ncdc.noaa.gov/ibtracs/index.php?name=status and https://www.ncdc.noaa.gov/ibtracs/index.php?name=principle	No: according to the metadata, each version is different and contains new information, but this version of the dataset is internally consistent, since it is updated each month with the same fields.	Not stated but inferable. Probably the same sheet is updated every year.	Not stated, but inferable, since the data is updated often and the fields remain the same.	Not stated but inferable. Probably the same sheet is updated every year.	Not stated. The dataset has not been updated for a long time, but consistency would presumably be maintained in future updates.	Neither stated nor inferable.	Yes: the accounting methodology is described in the explanatory PDF, including for the previous versions.	Yes (only one version, data not updated)	Not stated but inferable. Probably the same sheet is updated every year.	No: with respect to the other datasets of the Eurobarometer series, the questions change without explanation.	No: with respect to the other datasets of the Eurobarometer series, the questions change without explanation.	No: with respect to the other datasets of the Eurobarometer series, the questions have changed over time, but the decision and the differences are not explained.
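
Some of the issues listed at the top of the table were fixed by hand: the date cell mixing an Excel serial number (xldate) with text, the HTML tags and entities in the "Summary" cells, and the merging of data downloaded in several batches. A scripted alternative could look like the Python sketch below. It is an illustration, not our actual workflow: the accepted date formats and all function names are assumptions, each value is treated as either a serial number or a string (a cell mixing both would first need splitting), and the conversion relies on Excel's 1900 date system, whose serial numbers map correctly to calendar dates from March 1900 onwards.

```python
import csv
import html
import io
import re
from datetime import date, datetime, timedelta

TAG_RE = re.compile(r"<[^>]+>")
# Day zero of Excel's 1900 date system (valid for serials from March 1900 on).
EXCEL_EPOCH = datetime(1899, 12, 30)
# Assumed string formats; adjust to what the sheet actually contains.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def clean_summary(text: str) -> str:
    """Strip HTML tags, decode entities such as &amp; and collapse whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    return " ".join(html.unescape(without_tags).split())


def parse_mixed_date(value) -> date:
    """Accept an Excel serial number (xldate) or a date string."""
    if isinstance(value, (int, float)):
        return (EXCEL_EPOCH + timedelta(days=value)).date()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def concat_csv(parts: list) -> str:
    """Merge CSV downloads that share the same header into one CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for part in parts:
        rows = list(csv.reader(io.StringIO(part)))
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError("batches have different headers")
        writer.writerows(rows[1:])
    return out.getvalue()


print(clean_summary("<p>Cut&nbsp;emissions &amp; adapt</p>"))  # Cut emissions & adapt
print(parse_mixed_date(42370), parse_mixed_date("February 19, 2016"))
print(concat_csv(["country,fires\nA,1\n", "country,fires\nB,2\n"]), end="")
```

Checking that every batch carries the same header before concatenating is a cheap guard against silently merging files exported with different query settings.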

Legal analysis

After the quality analysis of our original datasets, we moved on to the legal analysis. Its main purpose was to check the legal correctness of each dataset in terms of privacy issues, IPR of the dataset, licenses, limitations on public access, economic conditions and temporal aspects, according to the “Check list for Public Administration for the Open Data release”.

The outcome of this analysis is shown below.

We used a Yes/No answer format but, when necessary, we also provided further information or links to clarify the answer.

Legal Basis	Droughts events 1980-2001	Global Active Archive of Large Flood Events	International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4	GFEDv4 (Global Fire Emissions Database, Version 4)	Climate at a Glance - Time Series Graphs of Temperature Anomalies	Threatened species	Sea Ice and Snow Cover Extent	Climate Change Indicators: U.S. and Global Precipitation	Greenhouse Gas Emissions	National Footprint and Biocapacity Accounts 2019 Public Data Package	CAIT Paris Contributions Data	Cumulative data on the contributors of climate finance	Special Eurobarometer 313: Europeans’ attitudes towards climate change	Special Eurobarometer 409: Climate change	Special Eurobarometer 490: Climate change
Privacy issues
1.1 Is the dataset free of any personal data as defined in the Regulation (EU) 2016/679? https://eur-lex.europa.eu/legal-content/IT/TXT/PDF/?uri=CELEX:32016R0679&from=IT	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
1.2 Is the dataset free of any indirect personal data that could be used for identifying the natural person? If so, is there a law that authorize the PA to release them? Or any other legal basis? Identify the legal basis.	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
"1.3 Is the dataset free of any particular personal data (art. 9 GDPR)? If so is there a law that authorize the PA to release them ?"	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
1.4 Is the dataset free of any information that combined with common data available in the web, could identify the person? If so, is there a law that authorize the PA to release them?	Yes	No, for each event, the location and the date are stated, so tracing back the news source could lead to individuals' name.	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
1.5 Is the dataset free of any information related to human rights (e.g. refugees, witness protection, etc.)?	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
1.6 Is a tool used for calculating the range of the risk of de-anonymization? Is the dataset anonymized? With which technique? Is it compliant with the three mandatory parameters: singling out, linking out, inference out?		Not stated
1.7 Are you using geolocalization capabilities ? Do you check that the geolocalization process can’t identify single individuals in some circumstances?	No	Yes, identifiable	Yes, not identifiable	No	No	No	No	No	No	No	No	No
1.8 Does the open data platform respect all the privacy regulations (registration of the end-user, profiling, cookies, analytics, etc.)? https://www.varonis.com/blog/us-privacy-laws/	HDX: terms for cookies and for mailing service: https://data.humdata.org/about/terms. UNEP: No	No privacy policy on the original website. In HDX, there are terms for cookies and for mailing service: https://data.humdata.org/about/terms.	Yes: https://www.noaa.gov/protecting-your-privacy	No	Yes: https://www.noaa.gov/protecting-your-privacy	Yes: http://www.oecd.org/privacy/	Yes: https://nsidc.org/about/privacy	Yes: https://www.epa.gov/privacy/privacy-and-security-notice#rights	Yes	Yes	Yes: https://www.wri.org/about/privacy-policy	Yes: https://climatefundsupdate.org/privacy-policy/	Yes: https://data.europa.eu/euodp/en/privacystatement	Yes: https://data.europa.eu/euodp/en/privacystatement	Yes: https://data.europa.eu/euodp/en/privacystatement
1.9 Do you know who are in your open data platform the Controller and Processor of the privacy data of the system? https://advisera.com/eugdpracademy/knowledgebase/eu-gdpr-controller-vs-processor-what-are-the-differences/ https://www.altalex.com/documents/news/2018/04/12/articolo-4-gdpr-definizioni	No	OCHA, the system administrator of the HDX platform (inferred: OCHA is the Controller; Google Analytics and Mixpanel are the Processors).	No. Inferred: NOAA is the Controller	No	No. Inferred: NOAA is the Controller and Google Analytics the Processor	Not stated. Inferred: OECD is the Controller	No. Inferred: NOAA is the Controller	No. Inferred: EPA is the Controller	Not stated. Inferred: OECD is the Controller	Not stated. Inferred: Global Footprint Network is the Controller	Not stated. Inferred: WRI is the Controller	Yes: the Controller is Heinrich-Böll-Stiftung Washington, DC	Unit C.4 "EU Open Data and CORDIS" of the Publications Office is the Controller; the European Union Open Data Portal (EU ODP) is the Processor	Unit C.4 "EU Open Data and CORDIS" of the Publications Office is the Controller; the European Union Open Data Portal (EU ODP) is the Processor	Unit C.4 "EU Open Data and CORDIS" of the Publications Office is the Controller; the European Union Open Data Portal (EU ODP) is the Processor
1.10 Where the datasets are physically stored (country and jurisdiction)? Do you have a cloud computing platform? Do you have checked the privacy regulation of the country where the dataset are physically stored? (territoriality)	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online. However, previous versions are stored at the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC).	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.	Not stated whether they are physically stored or only online.
1.11 Do you have non-personal data? Are you sure that are not “mixed data”?	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.	Yes. Yes.
2. IPR of the dataset
2.1 Do you have created and generated the dataset?	Yes, (UNEP)	Yes, Dartmouth Flood Observatory.	Yes, NOAA NCEI	Yes, Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC)	Yes, NOAA NCEI - NCDC	Yes, OECD	Yes, (NSIDC)	Yes, NOAA NCEI	Yes, OECD	Yes, Global Footprint Network	Yes, (WRI)	Yes, (Climate Funds Update)	Yes, (Directorate-General for Communication of the European Commission)	Yes, (Directorate-General for Communication of the European Commission)	Yes, (Directorate-General for Communication of the European Commission)
2.2 Are you the owner of the dataset? Who is the owner?	Yes, (UNEP)	Yes, Dartmouth Flood Observatory.	Yes, NOAA NCEI	Yes, Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC)	Yes, NOAA NCEI - NCDC	Yes, OECD	Yes, (NSIDC)	Yes, NOAA NCEI	Yes, OECD	Yes, Global Footprint Network	Yes, (WRI)	Yes, (Climate Funds Update)	Yes, (Directorate-General for Communication of the European Commission)	Yes, (Directorate-General for Communication of the European Commission)	Yes, (Directorate-General for Communication of the European Commission)
2.3 Are you using third party data with the proper authorization and license? Is the dataset free from third party licenses or patents?	Third party data are used. No licences provided. https://preview.grid.unep.ch/index.php?preview=about&cat=3&lang=eng	Third party data are used. No licences provided.	Yes: https://www.ncdc.noaa.gov/ibtracs/index.php?name=terms	No third party data	Third party data are used. No licences provided.	No third party data	NOAA uses NSIDC data. No licence provided.	No third party data	Third party data are used. No licences provided.	Third party data are used. No licences provided.	Yes. The third party data used are those of the countries that provided them.	Third party data are used. No licences provided.	No	No	No
2.4 Are there some limitations in the national legal system of the dataset for releasing some kind of datasets with open license?	No. Geneva (Switzerland): none or very limited activities are performed to monitor the reuse of open data in the country ("Beginner", see p. 71 of https://www.europeandataportal.eu/sites/default/files/open_data_maturity_report_2019.pdf)	No	No	No	No		No	No				No
3. Licences
3.1 Is the dataset released with an open data license? If CC0 is used, do they have all the rights necessary for this particular kind of license (e.g., jurisdiction)?	Available free of charge for non-commercial purposes (https://preview.grid.unep.ch/index.php?preview=about&cat=2&lang=eng#datause)	Creative Commons Attribution 4.0 International license - CC BY 4.0 (HDX)	World Data Center for Meteorology policy and World Meteorological Organization's Resolution 40 policy https://www.ncdc.noaa.gov/ibtracs/index.php?name=terms	Data hosted by the ORNL DAAC is openly shared, without restriction, in accordance with NASA's Earth Science program Data and Information Policy.	Yes FOIA	Except where additional restrictions apply as stated above, You can extract from, download, copy, adapt, print, distribute, share and embed Data for any purpose, even for commercial use. You must give appropriate credit to the OECD	Yes FOIA	Yes FOIA	Except where additional restrictions apply as stated in the website, you can extract from, download, copy, adapt, print, distribute, share and embed data for any purpose, even for commercial use. You must give appropriate credit to the OECD.	Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0)	Creative Commons Attribution 4.0 International License (CC BY 4.0)	Not stated	Yes: reuse of data published on this website for commercial or non-commercial purposes is authorised provided the source is acknowledged.	Yes: reuse of data published on this website for commercial or non-commercial purposes is authorised provided the source is acknowledged.	Yes: reuse of data published on this website for commercial or non-commercial purposes is authorised provided the source is acknowledged.
3.2 Is the clause included: "In any case the dataset can’t be used for re-identifying the person" ?	No	No	No	No	No	No	No	No	No		No	No	No	No	No
3.3 Is the API (in case there is) released with an open source license ?	Yes API, no licence	No API	No API	Yes API, No licence stated but inferred platform licence which is based on NASA FOIA	Yes API, No licence stated but inferred FOIA	Yes API, No open source licence stated but inferred that of the platform http://www.oecd.org/termsandconditions/	Yes API, no licence	No API	Yes API, no open source licence	Yes API, no licence	Yes API, no licence	Yes API, no licence	No API	No API	No API
3.4 Is the open data/API platform license regime compliant with your IPR policy? Do they have all the licences related to the open data platform/API software?	No license for the data platform	No license for the data platform	No license for the data platform	Yes Data platform license compliant to IPR policy, Yes license for open data platform but no licence for the API platform thus inferred it has the open data platform's license, yes.	Data platform license compliant to IPR policy but no licence for the API platform thus inferred data platform license, Yes for the platform and not for the API.	Data platform/API license compliant to IPR policy , yes	No, no	Data platform license compliant to IPR policy and has no API, yes	(API) Yes, yes	No license for the data platform	No license for the data platform	No license for the data platform	(data platform) Yes, yes: https://data.europa.eu/euodp/en/copyright	(data platform) Yes, yes: https://data.europa.eu/euodp/en/copyright	(data platform) Yes, yes: https://data.europa.eu/euodp/en/copyright
4. Limitations on public access
4.1 Does the dataset concern your institutional competences, scope and finality? Does the dataset concern other public administration competences?	Yes, no	Yes, no	Yes, no	Yes, no	Yes, no	Yes, no	Yes, no	Yes, no	Yes, yes: UNFCCC	Yes, yes: UN	Yes (https://www.wri.org/about/values) (https://www.wri.org/about/mission-goals)	Yes Yes (Overhead refers to expenditures from the Fund that are not directed to projects (such as administration fees)).	Yes, no	Yes, no	Yes, no
4.2 Does the dataset respect the limitations for the publication stated by your national legislation or by the EU directives ? https://project-open-data.cio.gov/policy-memo/ for USA	Yes	No open license on Dartmouth Observatory website	Yes	Yes	Yes		Yes	Yes				Yes	Yes	Yes	Yes
4.3 Are there some limitations connected to the international relations, public security or national defence ?	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No
4.4 Are there some limitations concerning the public interest ?	No	No	No	No	No	No	No	No	No	No	No	No	No	No	No
4.5 Does the dataset respect the international law limitations? https://opendatacharter.net/principles/ (?)	Yes	Yes	Yes	Yes but Open data platform not linked to metadata	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
4.6 Does the dataset respect the INSPIRE law limitations for the spatial data? https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32007L0002	No	Not EU dataset	Not EU dataset	Not EU dataset	Not EU dataset	Not EU dataset
5. Economical Conditions
5.1 Could the dataset be released for free?	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
5.2 Are there any agreements with other partners to release the dataset at a reasonable price?
5.3 Do the open data platform's terms of service include a “non-liability agreement” clause regarding the dataset and API provided?	Yes	No	No, just for links	Yes for dataset and links but not stated for the API https://science.nasa.gov/earth-science/earth-science-data/data-information-policy/data-rights-related-issues	No, just for links	Yes for the dataset and API	No, just for links: "NOAA.gov does not control or guarantee the accuracy, relevance, timeliness or completeness of information contained in a linked site."	Not stated by EPA; NOAA: just for links	Yes API, yes data	No	Yes (https://www.wri.org/about/open-data-commitment)	No	No	Yes	Yes
5.4 If you decide to release the dataset at a reasonable price, are the limitations imposed by the new Directive (EU) 2019/1024 respected? Are you able to calculate the “marginal cost”? Are you able to justify the “reasonable return on investment”, limited to covering the costs of collection, production, reproduction, dissemination, preservation and rights clearance? Is there a national law that allows your public administration to apply the “reasonable return on investment”?
5.5 If you decide to release the dataset at a reasonable price, have you checked the e-Commerce Directive and regulation?
6. Temporary aspects
6.1 Do you have a temporary policy for updating the dataset ?	Never (HDX)	"active" = current events are added immediately	Twice weekly - Weekly (IBTrACS Project) / Daily (NOAA)	Periodically	No	No	No	No	No	Annually	No	No	No	No	No
6.2 Do you have some mechanism for informing the end-user that the dataset is updated at a given time to avoid mis-usage and so potential risk of damage ?	No The United Nations periodically adds, changes, improves or updates the Materials on this Site without notice	No	Yes, forum	No, just an email for data access.	No	No	No	No	No	No	No	No	No	No	No
6.3 Did you check whether the dataset, for some reason, cannot be indexed by search engines (e.g. Google, Yahoo, etc.)?	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed	Indexed
6.4 In case of personal data, do you have a reasonable technical mechanism for collecting request of deletion (e.g. right to be forgotten)?	No	Yes, email (HDX)	No	No	No	Yes, email	No	No	Yes, email	No	Yes (https://www.wri.org/about/privacy-policy > choices)	No	No	No	No

Ethical analysis

In order to carry out the ethical analysis of the original datasets, we relied on the Data Ethics Framework. We organized our analysis of each dataset according to several points of view: transparency, accountability, discrimination, cognitive bias and prejudice.

In particular, for each of these aspects we organized our analysis into four areas:

The purpose: what is the goal the dataset wants to achieve?

The process: how were the data collected?

The output: how was the dataset released?

The conclusion: what is our overall assessment?

Droughts events 1980-2001

Purpose: this dataset is part of the wider “Global risk data platform”, which also includes data about other natural hazards. The purpose of the platform is to allow the visualisation of data on natural hazards.

Process: the data have been collected by merging data from different sources, which are cited on the website. In this case the sources were a global monthly gridded precipitation dataset from the Climatic Research Unit (University of East Anglia) and a GIS model of the global Standardized Precipitation Index based on the methodology of Brad Lyon (IRI, Columbia University). The resulting dataset therefore draws on two different points of view. The platform says nothing about prejudice or bias in the data collected, but it states that the hazard modeling methodologies were reviewed by a team of 24 independent experts selected by the World Meteorological Organization (WMO) and the United Nations Educational, Scientific and Cultural Organization (UNESCO).

Output: the resulting dataset is not easy to understand because there is no legend explaining the column headers (lack of guidelines). No version is indicated, so consistency cannot be ascertained. The platform provides legal information about the license and about how users may use the datasets (e.g. no commercial purposes). Beyond this, however, nothing is said about discrimination or bias.

Conclusion: the dataset can be considered good from an ethical point of view, but not in terms of transparency, because the legend is missing.

Global Active Archive of Large Flood Events

Purpose: the target audience is not declared, so we can only infer that it is researchers. The stated benefit is the creation of a unique source for large flood events. However, since the archive does not integrate other archives, it could instead fragment the scenario. The purpose shows no trace of discrimination, prejudice or cognitive bias, and its scope is global.

Process: there is no transparency or accountability in the processing: even though the sources are stated, the actual data each of them provided cannot be identified, and the governmental sources they claim to have used are not distinguishable. No caveats or documentation of the processing steps are provided.

Output: In the final dataset:

All countries have been recognised and no political discrimination has been made (e.g. Israel and Palestine).
There are no personal data, but the number of deaths, when small, combined with other information such as the location, could lead to the identification of individuals. Deaths and displaced people are reported to show the severity of the flood; however, since another index is also used for this purpose, this information could have been omitted.
Since the dataset is mainly based on news, as the authors state, it mostly contains data about major events and “first world” countries (http://floodobservatory.colorado.edu/Archives/ArchiveNotes.html). Citing just “news” as a source, without naming it, makes it impossible to check the validity of each datum. On the other hand, making the news source easy to retrieve could in some cases facilitate the identification of the people involved.
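One common way to reduce the re-identification risk described above is small-cell suppression: non-zero counts below a threshold are masked before publication. The sketch below is a hypothetical illustration, not something the Flood Observatory is known to apply; the record layout, the field names and the threshold of 5 are all our assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold (our assumption): non-zero counts below it are masked.
SUPPRESSION_THRESHOLD = 5


@dataclass
class FloodRecord:
    """Hypothetical subset of a flood-archive row."""
    country: str
    location: str
    dead: int
    displaced: int


def mask_count(value: int, threshold: int = SUPPRESSION_THRESHOLD) -> str:
    """Replace small non-zero counts with '<threshold'; keep the rest as text."""
    if 0 < value < threshold:
        return f"<{threshold}"
    return str(value)


def publishable(record: FloodRecord) -> dict:
    """Return a copy of the record in which small casualty counts are masked."""
    return {
        "country": record.country,
        "location": record.location,
        "dead": mask_count(record.dead),
        "displaced": mask_count(record.displaced),
    }


# Placeholder names and numbers, not real events.
print(publishable(FloodRecord("Country A", "Town B", dead=2, displaced=1200)))
# {'country': 'Country A', 'location': 'Town B', 'dead': '<5', 'displaced': '1200'}
```

Zero is kept as-is because it reveals nothing about specific individuals; large counts are kept because they cannot easily be traced back to a person.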

Conclusion: there is no prejudice or cognitive bias, but there is discrimination, since the data focus on ‘first world’ countries and on limited sources (mainly news). Moreover, reporting deaths and displaced people raises possible ethical problems. Nonetheless, the greatest ethical problems of the dataset are its limited openness (especially regarding procedures) and the difficulty of holding it accountable. If it were not so widely cited in the academic world, it would not seem reliable enough.

International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4

The dataset can be considered exemplary from an ethical point of view: it contains no prejudice, cognitive bias or discrimination, and everything is well documented, including purpose and user need, data provenance, caveats and usage information (available in the technical documentation), explanations of the field names, and ways to provide feedback.

GFEDv4 (Global Fire Emissions Database, Version 4)

Purpose: the data has a clear user need which is to provide global estimates of monthly burned area, monthly emissions and fractional contributions of different fire types, daily/3-hourly fields to scale the monthly emissions to higher temporal resolutions, and data for monthly biosphere fluxes which could be used for large-scale modeling studies.

Process: from a legal point of view the data are sound. As can be inferred from the website (https://daac.ornl.gov/VEGETATION/guides/fire_emissions_v4.html), the data are collected without any discrimination, cognitive bias or prejudice, combining data available from other sources with the authors' own (satellite information) to create a global view of the situation. The only caveat is that no licences are mentioned for the use of data from other sources.

Output: the data serve exactly the user need they were meant to satisfy and are restricted to the purpose for which they were created. The final dataset is available in an open format and free for reuse on an API platform. According to the documentation, the dataset cannot harm any individual, community, country or the public interest, even as new events are recorded. The composition of the database is clearly described, again free from any discrimination, cognitive bias or prejudice.

Conclusion: in summary, the dataset is largely clean from an ethical point of view.

Climate at a Glance - Time Series Graphs of Temperature Anomalies

Purpose: the purpose of the dataset and the user need it responds to are not clearly stated, but we infer that the aim is to make known to everyone the state of temperature anomalies in the world over the years.

Process: the final dataset combines data from two sources, the Global Historical Climatology Network-Monthly (GHCN-M) dataset and the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), both known for applying quality controls to their data as good practice. This choice of sources is consistent with the purpose and strictly answers the identified user need. There is therefore no discrimination, prejudice or cognitive bias.

Output: the resulting dataset is available to everyone in an open format and contains all the information it was meant to deliver, without any ethical issue. The basis of the results in the dataset is well explained.

Conclusion: the dataset can therefore be considered ethically correct.

Threatened species

Purpose: the purpose of the dataset is clearly stated: to show the number of known (or assessed) species and of threatened species, in order to indicate the state of mammals, birds, freshwater fish, reptiles, amphibians, vascular plants, mosses, lichens and invertebrates. This purpose raises no issue of discrimination, cognitive bias or prejudice, especially because it covers worldwide information and also takes into account information from the various national Delegates.

Process: the data are collected and analysed by updating and revising certain information based on comments from national Delegates. The criteria for these revisions are not clearly stated on the website, so some cognitive bias in the decision making cannot be ruled out.

Output: the dataset is released through an API platform free to all, but the website states that interpretation should take into account that the values may not be exact. It also mentions possibly biased results, due to the overestimation of some incompletely evaluated groups of species likely to be threatened in certain countries.

Conclusion: the ethical correctness of this dataset is not fully satisfactory, because some of its values may be wrong as a result of choices made during its creation.

Sea Ice and Snow Cover Extent

Purpose: the purpose of providing a tool to see the sea ice extent over years is achieved: users can generate and examine graphs and statistics on ice and snow, or download the data to populate spreadsheets for further analysis.

Output: the result is a tool for browsing the sea ice extent from 1979 to 2020 for the Northern Hemisphere, the Southern Hemisphere and the globe, with monthly or annual views. Documentation is very poor, with no information about usage restrictions, bias or discrimination.

Conclusion: the dataset seems to be free from cognitive bias, but very little documentation is provided.

Climate Change Indicators: U.S. and Global Precipitation

Purpose: the purpose of the dataset is clear and free of ethical distortion: it answers the precise user need of showing all the precipitation anomalies over the selected period.

Process: the dataset draws on all available resources to build a well-informed database on the subject. A positive aspect is the use of automated bias correction software, which helps identify and remove biases. In addition, manual review by staff and scientists and data quality tests are carried out to exclude any ethical compromise.

Output: the datasets released are well documented and are available without charge through NCEI's anonymous FTP service. The information it contains is of good quality and satisfies the user's need and purpose of creation.

Conclusion: this dataset can consequently be considered ethically correct.

Greenhouse Gas Emissions

Purpose: the target and purpose can be inferred but are not explicitly stated. It is not clear whether the data refer only to OECD countries; if so, this could introduce cognitive bias.

Process: The provenance of the single datum is not stated so there are no ways to compare the dataset with the original sources and detect possible errors. No caveats or technical documentation to make the procedure reproducible have been made public.

Output: although you appear to download only the result of your specific query, the dataset may include unrequested data; for example, downloading data including LULUCF still yields a dataset that begins with data excluding LULUCF. Moreover, internal choices are not explained: in formats such as CSV, values appear to be repeated in two columns; the codes for pollutants, variables, units and power codes are not explained; and the reference and flag columns, although present, are almost entirely unused. Finally, the fact that the countries are those of the OECD can only be inferred and is never explicitly stated.

Conclusion: there is no discrimination or cognitive bias as such, but the view is definitely partial because the set of countries is limited, and in general the procedure and the output are not sufficiently transparent and accountable.
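Both problems can be checked mechanically after download. The sketch below runs on a toy CSV whose column names (`COU`, `VAR`, `YEA`, `Year`, `Value`) and variable codes (`TOTAL`, `TOTAL_LULUCF`) are hypothetical stand-ins, not the real OECD export layout: it finds columns whose values repeat each other in every row and keeps only the rows for the variable actually requested.

```python
import csv
import io


def duplicated_columns(rows: list[dict]) -> list[tuple[str, str]]:
    """Return pairs of column names whose values are identical in every row."""
    if not rows:
        return []
    names = list(rows[0].keys())
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if all(row[first] == row[second] for row in rows):
                pairs.append((first, second))
    return pairs


def keep_requested(rows: list[dict], column: str, wanted: set) -> list[dict]:
    """Drop rows whose value in `column` was not part of the query."""
    return [row for row in rows if row[column] in wanted]


# Toy export: hypothetical column names, placeholder values.
SAMPLE = """COU,VAR,YEA,Year,Value
AAA,TOTAL,2000,2000,1.0
AAA,TOTAL_LULUCF,2000,2000,0.8
BBB,TOTAL,2000,2000,2.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicated_columns(rows))                             # [('YEA', 'Year')]
print(len(keep_requested(rows, "VAR", {"TOTAL_LULUCF"})))   # 1
```

Comparing every pair of columns is quadratic in the number of columns, which is negligible for exports of this size.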

National Footprint and Biocapacity Accounts 2019 Public Data Package

Purpose: The target is as broad as possible, with the purpose of making available the data to the public. No discrimination, prejudice or cognitive bias can be detected at this stage.

Process: The methodology is accountable and transparent.

Output: everything is explained in the related paper, and the papers of previous versions are also available for spotting the differences. However, the selection of countries could be considered politically discriminatory (e.g. Israel is present, while Palestine is absent).

Conclusion: the distinctive feature of the dataset is the aim of making it available to everyone. Everything is accountable and transparent. There is no discrimination, prejudice or cognitive bias in any phase, except for the choice of countries, which seems to take a political stand.

CAIT Paris Contributions Data

Purpose: the purpose of the dataset is to provide data about the countries that submitted their contributions under the Paris Agreement in 2015-2016 and their commitments in the field of climate change. The purpose is achieved: the structured data from the CAIT Paris Contributions Map let users explore, compare and assess the greenhouse gas mitigation plans in each country's Intended Nationally Determined Contribution (INDC).

Process: in the context of the Paris Agreement, countries released public outlines of the actions they intended to take to achieve its goal. The data are structured according to a framework based on several protocols and standards listed on the website. The list of participating countries and the license are provided on the first sheet of the dataset.

Output: the output of the process is an interactive map accessible through the platform. By clicking on any country except Libya, for which no document was submitted, the user can see the information about that country's commitments (the same information provided in the dataset). Since the data concern each country's commitments against climate change, we can infer that they contain no prejudice, discrimination or bias; Libya is the only case that could introduce a bias. A second issue is that the downloadable version of the dataset is updated to 2016, while the online platform shows data updated to 2019.

Conclusion: the dataset is almost complete from an ethical point of view, except for information about Libya. The API is easily accessible and transparent, but there is a discrepancy between the downloadable version and the online one.

Cumulative data on the contributors of climate finance

Purpose: the purpose of the platform is to present cumulative data on the contributors of climate finance from the multilateral climate change funds monitored by the platform itself. The purpose is achieved.

Process: we do not know who owns the platform, and it is not clear what is meant by “the data are presented for each multilateral climate change funds it tracks”. The platform collects data as follows: it gathers information from different sources and then contacts fund managers to verify it. However, it states that it receives verification for almost all funds, without indicating which authorities verify them. All this raises reliability concerns. On the positive side, the platform tracks governed funds focused on climate change and bases its dataset on those funds (reliability and transparency). From the dataset we can infer that the countries analysed are mainly from Europe and Central Asia; the choice of countries is not explained (perhaps they are those that devoted a large share of their funds to climate change). The platform provides no further information about any kind of prejudice or discrimination.

Output: the resulting output is user friendly and easily accessible.

Conclusion: it is not clear who verifies the funds or how accurate the data are, so reliability is limited. Most of the data come from Europe and Central Asia. The resulting API is easily accessible.

Special Eurobarometer 313: Europeans’ attitudes towards climate change

Special Eurobarometer 409: Climate change

Special Eurobarometer 490: Climate change

Purpose: the purpose of the dataset is to understand what European citizens think about the climate change situation and what are their expectations for the future.

Process: to accomplish this aim, citizens' opinions were collected through surveys whose results were later analysed. The survey method and questions are described and documented, and there is no apparent ethical distortion.

Output: the released dataset covers all EU countries, and all questions and answers are reported without any change. The survey method and questions are documented and further described in a specific paper. The data collected were used strictly for the purpose of the dataset, and respondents could decline to answer certain questions, so prejudice can be excluded; since everybody could take part in the survey, there is also no discrimination. There is no cognitive bias in the results, since they are simply published without interpretation. However, the purpose of questions about the economic status or the level of education of the respondent is unclear in this context, so these questions do not seem entirely free of discriminatory aspects.

Conclusion: everything is accountable and transparent. There is no discrimination, prejudice or cognitive bias, apart from the unclear purpose of some personal questions (e.g. economic status) that seem unrelated to the context.

Technical analysis

At this stage we analyzed our datasets from a technical point of view, examining the available formats, the presence of metadata, the URIs and the provenance. The results are summarized below:

	Droughts events 1980-2001	Global Active Archive of Large Flood Events	International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4	GFEDv4 (Global Fire Emissions Database, Version 4)	Climate at a Glance - Time Series Graphs of Temperature Anomalies	Threatened species	Sea Ice and Snow Cover Extent	Climate Change Indicators: U.S. and Global Precipitation	Greenhouse Gas Emissions	National Footprint and Biocapacity Accounts 2019 Public Data Package	CAIT Paris Contributions Data	Cumulative data on the contributors of climate finance	Special Eurobarometer 313: Europeans’ attitudes towards climate change	Special Eurobarometer 409: Climate change	Special Eurobarometer 490: Climate change
Format	dbf, shp, shx (UNEP); CSV (HDX)	XLSX, XML, MapInfo TAB, shapefiles	netCDF, CSV, shapefiles	CSV, HDF	XML, CSV, JSON	XLS, CSV, SDMX (XML)	CSV, XML, JSON	XLS	XLS, CSV, PX, SDMX (XML)	CSV	XLSX	XLSX	XLSX	XLS	XLS
Metadata	Metadata format: ISO19115:2003/19139 https://preview.grid.unep.ch/index.php?preview=data&events=droughts&evcat=1&lang=eng	No	ISO 19115-2/C01552: https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C01552;view=iso	https://daac.ornl.gov/VEGETATION/guides/fire_emissions_v4.html	https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool	https://stats.oecd.org/OECDStat_Metadata/ShowMetadata.ashx?Dataset=WILD_LIFE&Lang=en	https://www.climate.gov/maps-data/dataset/snow-or-ice-extent-graphing-tool	EPA: https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation, NOAA: ISO 19115-2 Metadata https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00835#	https://stats.oecd.org/OECDStat_Metadata/ShowMetadata.ashx?Dataset=AIR_GHG&Lang=en	No	https://www.wri.org/resources/data-sets/cait-paris-contributions-data	No	https://data.europa.eu/euodp/it/data/dataset/S942_71_1_EBS313	https://data.europa.eu/euodp/it/data/dataset/S1084_80_2_409	https://data.europa.eu/euodp/en/data/dataset/S2212_91_3_490_ENG
URI	https://data.humdata.org/dataset/f5e8b21e-bb71-40e3-8129-5378ebc42e33/resource/52263859-fdfa-4622-bfb0-34ba82cc6729/download/dr-events-20150505221917-shapefile.zip (download)	http://floodobservatory.colorado.edu/Version3/FloodArchive.xlsx (download)	https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.since1980.list.v04r00.csv (download)	No	https://www.ncdc.noaa.gov/cag/global/time-series/globe/land_ocean/1/2/1880-2020/data.json	https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/WILD_LIFE/TOT_KNOWN+TOT_KNOWN_IND+CRITICAL+CRITICAL_IND+ENDANGERED+ENDANGERED_IND+VULNERABLE+VULNERABLE_IND+THREATENED+THREATENED_IND+THREAT_PERCENT+IND_PERCENT.MAMMAL+BIRD+REPTILE+AMPHIBIAN+FISH_TOT+MARINE_F+FRESHW_F+VASCULAR_PLANT+MOSS+LICHEN+INVERTEB.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+NMEC+BRA+COL+CRI+RUS/all?	https://www.ncdc.noaa.gov/snow-and-ice/extent/sea-ice/N/2.xml	https://www.epa.gov/sites/production/files/2016-08/precipitation_fig-2.csv	No	No, sent by email	No, mail requested	https://climatefundsupdate.org/wp-content/uploads/2019/04/CFU-Website-Master-27-Feb-2019.xlsx (download)	http://data.europa.eu/88u/dataset/S942_71_1_EBS313 (permalink)	http://data.europa.eu/88u/dataset/S1084_80_2_409 (permalink)	http://data.europa.eu/88u/dataset/S2212_91_3_490_ENG (permalink)
Provenance	https://preview.grid.unep.ch/index.php?preview=data&events=droughts&evcat=1&lang=eng	http://floodobservatory.colorado.edu/Archives/index.html	https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/	http://www.globalfiredata.org/analysis.html	https://www.ncdc.noaa.gov/cag/global/time-series	https://stats.oecd.org/Index.aspx?DataSetCode=WILD_LIFE	NOAA: https://www.climate.gov/maps-data/dataset/snow-or-ice-extent-graphing-tool (although the owner of the dataset is NSIDC, we could not find the original dataset on its website; it is only available on the NOAA website)	https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation	https://stats.oecd.org/Index.aspx?DataSetCode=AIR_GHG	https://www.footprintnetwork.org/licenses/public-data-package-free/	https://www.wri.org/resources/data-sets/cait-paris-contributions-data	https://climatefundsupdate.org/data-dashboard/#1541245664327-538690dc-b9a8	https://data.europa.eu/euodp/it/data/dataset/S942_71_1_EBS313	https://data.europa.eu/euodp/it/data/dataset/S1084_80_2_409	https://data.europa.eu/euodp/en/data/dataset/S2212_91_3_490_ENG
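The OECD URI for Threatened species in the table follows a regular pattern: after the dataset code, the codes within each dimension are joined with `+`, and the dimensions are separated by `.`. The sketch below rebuilds such a URL from lists of codes. It only assembles the string and does not contact the server (the legacy OECD.Stat endpoint may no longer respond); the trailing `all` segment is copied from the URI in the table, and the empty `?` query string is omitted.

```python
BASE_URL = "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData"


def sdmx_query_url(dataset: str, dimensions: list[list[str]], suffix: str = "all") -> str:
    """Join codes with '+' inside a dimension and '.' between dimensions."""
    key = ".".join("+".join(codes) for codes in dimensions)
    return f"{BASE_URL}/{dataset}/{key}/{suffix}"


url = sdmx_query_url(
    "WILD_LIFE",
    [
        ["THREATENED", "THREATENED_IND"],  # variable codes (first dimension)
        ["MAMMAL", "BIRD"],                # species groups (second dimension)
        ["AUS", "AUT"],                    # country codes (third dimension)
    ],
)
print(url)
```

Building the key from lists keeps long queries such as the one in the table readable and easy to modify.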

Mashup and output datasets

After performing all the necessary analyses and checks on the original datasets, we drew up an overview of the information to retrieve from them, based on the questions we wanted to answer given our purpose and scenario. We then extracted the data using Python. The code used in this process is in the “code” folder, which contains three subfolders:

1-data-extraction: contains base-file.py, a condensed version of the 15 functions we used to extract data from the original datasets, and countries.py, which contains the Python function used to extract the countries and their ISO codes from the original datasets, so that misspellings and exceptions could be handled.
2-py-to-xml: contains the three functions we used to create the output XML datasets.
3-xml-to-json: contains the Python files we used to create, from the XML datasets, the JSON files that feed the visualizations.
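The code of countries.py is not reproduced here. As a minimal sketch of one way to reconcile country names with ISO codes, the snippet below normalizes spellings before a lookup. The tables `ISO3_BY_NAME` and `ALIASES` and the functions `normalize_name` and `to_iso3` are hypothetical, and the few entries shown are only examples, not the project's actual lists.

```python
import unicodedata
from typing import Optional

# Hypothetical lookup table; a real one would be built from the original datasets.
ISO3_BY_NAME = {
    "united states": "USA",
    "united kingdom": "GBR",
    "russian federation": "RUS",
    "cote d'ivoire": "CIV",
}

# Hypothetical aliases for spellings that differ between sources.
ALIASES = {
    "usa": "united states",
    "u.s.a.": "united states",
    "uk": "united kingdom",
    "russia": "russian federation",
}


def normalize_name(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace (non-breaking spaces included)."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(ascii_only.lower().split())


def to_iso3(name: str) -> Optional[str]:
    """Return the ISO 3166-1 alpha-3 code for a country name, or None if it is unknown."""
    key = normalize_name(name)
    key = ALIASES.get(key, key)
    return ISO3_BY_NAME.get(key)
```

Names that come back as None are the "exceptions": they can be logged and added to the alias table by hand.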

Through the extraction and creation processes we produced three new datasets in XML format, which also contain their metadata. These datasets, used in the later stages of the project, are collected in the “xml” folder:

natural_events.xml
impact_and_commitments.xml
eu_opinions.xml

The table below lists the output datasets and the original datasets used to create each of them.

Output Dataset	Origin Datasets
natural_events.xml	Droughts events 1980-2001, Global Active Archive of Large Flood Events, International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4, GFEDv4 (Global Fire Emissions Database, Version 4), Climate at a Glance - Time Series Graphs of Temperature Anomalies, Climate Change Indicators: U.S. and Global Precipitation, Sea Ice and Snow Cover Extent.
impact_and_commitments.xml	Greenhouse Gas Emissions, National Footprint and Biocapacity Accounts 2019 Public Data Package, CAIT Paris Contributions Data, Cumulative data on the contributors of climate finance.
eu_opinions.xml	Special Eurobarometer 313: Europeans’ attitudes towards climate change, Special Eurobarometer 409: Climate change, Special Eurobarometer 490: Climate change.
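Datasets shaped like these, with a metadata header followed by the data, can be assembled with the standard library's ElementTree. This is a minimal sketch, not the project's 2-py-to-xml code: the element names (`dataset`, `metadata`, `data`, `record`) and the helpers `build_dataset` and `to_xml_string` are hypothetical, and the real schema of the files in the “xml” folder may differ.

```python
import xml.etree.ElementTree as ET


def build_dataset(title, license_url, records):
    """Build an XML tree with a metadata header first and one <record> per row after it.

    Element names are hypothetical; `records` is a list of dicts of plain values.
    """
    root = ET.Element("dataset")

    # Metadata header placed before the data.
    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "title").text = title
    ET.SubElement(metadata, "license").text = license_url

    # One <record> element per row; XML attribute values must be strings.
    data = ET.SubElement(root, "data")
    for rec in records:
        item = ET.SubElement(data, "record")
        for key, value in rec.items():
            item.set(key, str(value))
    return root


def to_xml_string(root):
    """Serialize the tree to a Python string (no XML declaration)."""
    return ET.tostring(root, encoding="unicode")
```

To write a file instead, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` adds the XML declaration as well.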

The datasets have been created according to the “FAIR Principles”, so we believe they are free of major quality, legal, technical and ethical problems.

We decided to release our datasets under the license CC-BY-SA 4.0.

Processing Issues

During the extraction of data from our original datasets we ran into some difficulties, which we describe below for each dataset concerned.

Global Active Archive of Large Flood Events: when we started the extraction, we found that some cells contain the character \xa0, the non-breaking space (code 0xA0 in Latin-1, ISO 8859-1), which we had to replace before proceeding. Moreover, the date cells are not human-readable, so we used Python's datetime module to convert them.
International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4: the dataset was too large to process in full. Since hurricane locations are given as coordinates, we had to reverse-geocode them to find the country, using a third-party geolocator (OpenStreetMap), and its usage limits prevented us from processing the whole dataset. We therefore selected data every five years and worked on that subset, saved as a separate CSV file.
GFEDv4 (Global Fire Emissions Database, Version 4): the platform advised against downloading the global dataset in a single request because of its size, so we downloaded it in batches of three TXT files and merged them into a CSV file. For this reason we have no URI for the dataset: the link we used is no longer available on the server.
CAIT Paris Contributions Data: to process this dataset we had to edit one cell (containing date information) directly in the spreadsheet, because its date format mixed xldate and string values and could not be processed. We also cleaned the cells of the “Summary” columns, which contained HTML tags and entities in addition to the actual text.
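The clean-up steps listed above can be sketched with the standard library alone. This is a hedged illustration, not the project's code: the helper names are hypothetical, `excel_serial_to_date` assumes the workbook uses Excel's 1900 date system, and `every_fifth_year` assumes the IBTrACS year column is called `SEASON` and that sampling starts in 1980.

```python
import html
import re
from datetime import datetime, timedelta

# Day zero for Excel's 1900 date system. Excel wrongly counts 1900 as a leap year, so
# 1899-12-30 gives correct dates for serial numbers from 61 (1 March 1900) onwards.
EXCEL_EPOCH = datetime(1899, 12, 30)

# Matches anything that looks like an HTML tag, e.g. <p> or </strong>.
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(value: str) -> str:
    """Replace non-breaking spaces (U+00A0) with ordinary spaces and trim the result."""
    return value.replace("\xa0", " ").strip()


def excel_serial_to_date(serial: float) -> datetime:
    """Convert an Excel serial date (1900 system) to a datetime; fractions are times of day."""
    return EXCEL_EPOCH + timedelta(days=serial)


def strip_html(value: str) -> str:
    """Remove HTML tags, decode entities such as &amp; or &nbsp;, then normalize spaces."""
    without_tags = TAG_RE.sub("", value)
    return clean_text(html.unescape(without_tags))


def every_fifth_year(rows, year_key="SEASON", start=1980):
    """Keep rows whose year is start, start + 5, start + 10, ...

    Rows with a non-numeric year (such as a units row under the header) are skipped.
    """
    kept = []
    for row in rows:
        year = str(row.get(year_key, "")).strip()
        if year.isdigit() and int(year) >= start and (int(year) - start) % 5 == 0:
            kept.append(row)
    return kept
```

`every_fifth_year` works on any iterable of dicts, for example the rows produced by `csv.DictReader`, so the reduced subset can be written back out as a separate CSV before geocoding.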

Sustainability of the datasets over time

The CCODe project is the outcome of an academic project at the University of Bologna, so it will not be maintained. Nonetheless, the resulting datasets are based on datasets originally collected by larger organizations, many of which are actively updated. We provided their links in the “General analysis” section, so that anyone can compare our datasets with the originals. All the analyses were performed between February and March 2020.

Please let us know if you find errors or ways to improve our work; the contact email is given in the metadata of the datasets.

To make our datasets easily reusable, we have also completed them with their metadata following DCAT_AP (v 2.0.0).

Moreover, we provide the Python code we used to extract the data of interest and to produce the final XML and JSON files. It is freely available for reuse, as long as the license is respected.

Everything is released under the CC-BY-SA 4.0 license, which allows many uses of the work, provided that the creators are credited and derivative works keep the same license. See the Creative Commons website for the full terms. Please cite us as “Del Bene R., Hamvegam M. L. S., Pizzicori A. (2020) CCODe”.

If the project were funded, a natural next step would be to maintain the current datasets and extend them with data for the missing years. It would also be worthwhile to cross-check our datasets in new ways, so that unexpected knowledge can emerge.

Visualizations

At the beginning of our work, we formulated some hypotheses from a set of questions, with the aim of deciding how to intersect the data.

We tried to reproduce this mind map in the visualization section, which is organized in three categories, one for each dataset, to guide users in exploring our data.

Since part of the data was collected on a global scale and part on a country-based scale, we split our charts in the same way.

We created the charts with the JavaScript library Highcharts, which required JSON files formatted specifically for it.
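Highcharts accepts a series as an object with a `name` and a `data` list, where each data point can be an `[x, y]` pair. The sketch below shows one hedged way of turning an XML dataset into such a series by counting events per year; it is not the code in 3-xml-to-json, and the `record` element, its `year` attribute and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter


def yearly_series(xml_text, name, record_tag="record"):
    """Count records per year and return one Highcharts series object.

    Assumes (hypothetically) that each record element has a numeric `year` attribute;
    records without it are ignored.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(
        int(el.get("year")) for el in root.iter(record_tag) if el.get("year")
    )
    # Data points as [x, y] pairs, sorted by year so a line chart draws left to right.
    return {"name": name, "data": [[year, counts[year]] for year in sorted(counts)]}


def write_series(series_list, path):
    """Write a list of series as JSON, ready to be passed to a chart's `series` option."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(series_list, f, ensure_ascii=False)
```

On the JavaScript side, the loaded list would then be passed unchanged as the `series` option of the chart.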

Only the map was built with a different library, DataMaps, which lets us create a choropleth map to explore how the events evolve over time.
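For a world choropleth, DataMaps expects its `data` option keyed by ISO 3166-1 alpha-3 country codes, each entry optionally naming a `fillKey` defined in the map's `fills` option. The following is a minimal sketch of preparing one such object per year from event counts; the thresholds, fill-key names and function names are hypothetical, not those of the project's map.

```python
# Hypothetical thresholds: (minimum number of events, fillKey), checked from highest to lowest.
DEFAULT_THRESHOLDS = ((10, "HIGH"), (3, "MEDIUM"), (1, "LOW"))


def datamaps_data(counts_by_iso3, thresholds=DEFAULT_THRESHOLDS):
    """Turn {ISO3 code: event count} into a DataMaps `data` object.

    Each country gets the fillKey of the first threshold its count reaches; countries
    with no events get no entry, so DataMaps paints them with `defaultFill`.
    """
    data = {}
    for iso3, count in counts_by_iso3.items():
        for minimum, fill_key in thresholds:
            if count >= minimum:
                # Extra fields such as "events" travel with the entry, e.g. for a popup.
                data[iso3] = {"fillKey": fill_key, "events": count}
                break
    return data


def datamaps_by_year(counts_by_year):
    """Build one DataMaps `data` object per year, keyed by the year as a string."""
    return {str(year): datamaps_data(counts) for year, counts in sorted(counts_by_year.items())}
```

Serialized with `json.dump`, the per-year objects can be swapped into the map (for instance with a year slider) to show the evolution over time. The ISO3 keys are also why consistent country codes matter upstream.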

Metadata and RDF assertion

In order to make our data reusable and interoperable, we provided them with their metadata, following the DCAT_AP (v 2.0.0) documentation.

The metadata were added both at the beginning of the XML documents (our final datasets) and to tables in the metadata section of the project website.

We provided metadata for the whole catalogue (which includes the three datasets) and for each dataset individually. Moreover, we released the metadata as RDF assertions in Turtle serialization, which are also accessible from the website.
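For readers unfamiliar with Turtle, here is a minimal sketch of how a DCAT description of one dataset might be generated with plain string formatting. It is not the project's released Turtle file: the IRIs passed in are hypothetical placeholders, only a handful of DCAT and Dublin Core terms are used, and the license is attached to the distribution, where DCAT-AP places it.

```python
def turtle_literal(text, lang="en"):
    """Quote a string as a language-tagged Turtle literal, escaping \\, " and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def dcat_dataset_turtle(iri, title, description, keywords, access_url, license_iri):
    """Return a Turtle description of one dcat:Dataset with a single dcat:Distribution."""
    keyword_line = ""
    if keywords:
        # Several objects for the same predicate are separated by commas in Turtle.
        keyword_line = "    dcat:keyword " + ", ".join(turtle_literal(k) for k in keywords) + " ;\n"
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "\n"
        f"<{iri}> a dcat:Dataset ;\n"
        f"    dct:title {turtle_literal(title)} ;\n"
        f"    dct:description {turtle_literal(description)} ;\n"
        + keyword_line
        + "    dcat:distribution [\n"
        "        a dcat:Distribution ;\n"
        f"        dcat:accessURL <{access_url}> ;\n"
        f"        dct:license <{license_iri}>\n"
        "    ] .\n"
    )
```

A full DCAT-AP record has more properties (publisher, contact point, temporal and spatial coverage, and so on); the same pattern extends to them, although an RDF library would be a safer choice for anything larger.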

Conclusion

While brainstorming ideas for the project, we found that we were all concerned about climate change and hopeful that data could help represent it. Therefore, our initial question was: how evident is the problem of climate change?

This question led us to ask: how do countries behave in terms of emissions, one of the main causes of the phenomenon, and what commitments do they make against it? How do citizens perceive the problem?

Of course, the different time spans of the datasets and, in the case of the opinions, the lack of global spatial coverage affected the output, which is therefore less precise. Moreover, we discovered that external factors can condition the data: with respect to emissions, for example, a country can balance its accounts by investing in the fight against climate change.

Nonetheless, some phenomena are evident:

Extreme natural events such as droughts and wildfires have been increasing, although not dramatically, given the limited time span;
Over a longer range of years the trend is much more evident, as with temperature and sea ice extent anomalies, which clearly move in opposite directions;
In recent years, footprint and emissions have been decreasing overall and biocapacity has been growing, although still on a small scale. Could this be read as a sign of growing awareness?
There is a marked distinction between first-world and third-world countries, evident in the magnitude of their emissions and in the amount of funds they have deposited;
The perception of and reaction to climate change among European citizens declined between 2009 and 2013, while in 2019 awareness is more widespread, probably as a result of the attention the problem has received in recent years.

Some of these outcomes confirmed our hypotheses; others were unexpected, for example the high number of natural events of each type, China’s officially “low” emissions and the negative change in citizens’ perception in 2013. Overall, we expected data to bring out knowledge that is still too often ignored, and in this respect we were proved right.

learreDHDK/ccode
