The CCODe project was carried out for the Open Access and Digital Ethics exam of the Digital Humanities and Digital Knowledge course at the University of Bologna. The aim of the project is the analysis and re-use of open access datasets, in order to find new knowledge reachable through the mashup of the original data. The scenario we study is climate change. We were interested in three areas: how climate change has unfolded over time, the impact of single countries on it together with their commitment against it, and how people perceive the problem.
In line with our purpose, we selected 15 datasets and analysed them from several points of view. We then extracted the data we were interested in and finally created three new datasets based on the above-mentioned areas.
The last step was the visualization of our datasets, to make them easier to access and understand for the final user.
To meet our goals, we chose 15 datasets related to our scenario. They were selected to cover a variety of provenances, typologies, formats, metadata and licenses. Whenever in doubt, we took frequent citation by academic sources as evidence of a dataset's reliability. We selected the ones that, at least at first glance, seemed free from cognitive biases, fair, legally valid, consistent and accurate.
To depict climate change over time, we selected eight datasets concerning the main factors used to measure climate change (temperature and precipitation anomalies and sea ice extent) together with significant events caused by it (droughts, floods, hurricanes, wildfires and the threatening of species). The majority (6/8) come from American-based institutions (e.g. NOAA), while the others are provided by supranational organizations such as OECD and UNEP. Five out of eight report data collected on a country-based scale; Sea Ice Extent, Precipitation and Temperature anomalies are available only globally. Overall the datasets span from 1980 to 2019, with some exceptions: Wildfires starts in 2003, Droughts ends in 2001 and Threatened Species refers to 2019 only, while Precipitation and Temperature start at the end of the 19th century.
To understand the impact of single countries on climate change and their commitment against it, we selected four datasets regarding GHG emissions, ecological footprints, submissions to the Paris Agreement and global funds concerning climate change. All datasets come from supranational institutions such as OECD and WRI, and all are modeled on a country basis. They have different starting dates, the earliest in the 1960s, but they all end in recent years (2016 at the earliest). The Paris Agreement dataset is restricted to the year of the ratification (2015).
Aiming to include the human perception of the problem, we found three datasets built from the Eurobarometer surveys of 2009, 2013 and 2019, reporting the opinions of European citizens on climate change. Data were collected country by country in the EU and are provided directly by the EU itself.
In some cases, we found the datasets on re-user websites, as specified below.
The following table reports the preliminary analysis we performed on each dataset.
Subject | Name | Owner | Owner URL | Re-user | Re-user URL | Data type | Available formats | Metadata | License | Domain | Spatial coverage | Time range | Upload date | Last update | Update frequency | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Droughts | Droughts events 1980-2001 | United Nations Environment Programme UNEP | https://preview.grid.unep.ch/index.php?preview=data&events=droughts&evcat=1&lang=eng | Humanitarian Data Exchange HDX | https://data.humdata.org/dataset/global-droughts-events-1980-2001 | Quantitative | dbf, shp, shx (UNEP), CSV (HDX) | Yes: ISO 19115:2003/19139 | Available for free for non commercial purpose, as explained at https://preview.grid.unep.ch/index.php?preview=about&cat=2&lang=eng | Environment | Global | Jan 01, 1980 - Dec 31, 2001 | Not stated | November 17, 2018 | Never | This dataset includes an estimate of global drought annual repartition based on Standardized Precipitation Index. |
Floods | Global Active Archive of Large Flood Events | Dartmouth Flood Observatory, University of Colorado | http://floodobservatory.colorado.edu/Archives/index.html | Humanitarian Data Exchange HDX | https://data.humdata.org/dataset/global-active-archive-of-large-flood-events | Direct Observational Data/Anecdotal Data | XLSX, XML, MapInfo TAB, shapefiles | Just HDX | Creative Commons Attribution 4.0 International license (CC BY 4.0) | Environment | Global | 1985 - present | Sep 02, 2019 (HDX) | Last entry 01/2020 (dataset) / October 11, 2019 (HDX) | Live | This dataset contains an active archive of flood event records. |
Hurricanes | International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4 | NOAA National Centers for Environmental Information | https://www.ncdc.noaa.gov/ibtracs/index.php?name=ib-v4-access | | | Quantitative | netCDF, CSV, shapefiles | Yes ISO 19115-2: https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C01552 | World Data Center for Meteorology policy and World Meteorological Organization's Resolution 40 policy: https://www.ncdc.noaa.gov/ibtracs/index.php?name=terms | Environment | Global | 1980 - present | 2019-02-15 (NOAA) / March 2019 (IBTrACS Project) | Not stated | Twice weekly - Weekly (IBTrACS Project) / Daily (NOAA) | This dataset contains a complete set of historical tropical cyclones, obtained from the combination of information from numerous tropical cyclone datasets. |
Wildfires | GFEDv4 (Global Fire Emissions Database, Version 4) | Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) | http://www.globalfiredata.org/analysis.html | | | Quantitative | CSV, HDF | Yes: https://daac.ornl.gov/VEGETATION/guides/fire_emissions_v4.html | Data hosted by the ORNL DAAC is openly shared, without restriction, in accordance with NASA's Earth Science program Data and Information Policy | Environment | Global | 2003-present | September 2015 | 2017-09-29 | Not stated - inferred monthly | This dataset contains global estimates of annual fire counts by country, based on burned area information from different fire types. |
Temperature | Climate at a Glance - Time Series Graphs of Temperature Anomalies | NOAA National Centers for Environmental Information | https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool | | | Land-based station, Marine / Ocean | XMS, CSV, XML, JSON | Yes: https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool | FOIA (5 USC 552) | Environment | Global | 1880-present | February 2020 (it changes every month) | Not stated | Not stated - inferred monthly | This dataset is the result of comparing the average temperature of land, ocean, or land and ocean combined for any month or multi-month period to the average temperature for the same period over the 20th century, showing whether conditions are warmer or cooler than in the past. |
Threatened species | Threatened species | OECD (Organisation for Economic Co-operation and Development) | https://stats.oecd.org/Index.aspx?DataSetCode=WILD_LIFE | | | Quantitative | XLS, CSV, SDMX(XML) | Yes: https://stats.oecd.org/OECDStat_Metadata/ShowMetadata.ashx?Dataset=WILD_LIFE&Lang=en | http://www.oecd.org/termsandconditions/ Except where additional restrictions apply as stated above, You can extract from, download, copy, adapt, print, distribute, share and embed Data for any purpose, even for commercial use. You must give appropriate credit to the OECD (...) | Environment | Global | 2018-2019 | Not stated | March 2019 | Not stated - inferred monthly | This dataset reports the state of threatened species, built on country replies to the Annual Quality Assurance (AQA) of OECD environmental reference series. |
Sea ice | Sea Ice and Snow Cover Extent | NSIDC National Snow and Ice Data Center | https://nsidc.org/ | NOAA National Centers for Environmental Information | https://www.ncdc.noaa.gov/snow-and-ice/extent/ | Satellite | CSV, XML, JSON | Yes: https://www.climate.gov/maps-data/dataset/snow-or-ice-extent-graphing-tool | FOIA (5 USC 552) | Environment | Global | 1979-2020 | Not stated | Not stated | Not stated | This dataset shows how the sea ice extent has changed from 1979 to 2020. The available data cover North America + Greenland, the Northern Hemisphere, Eurasia, and North America. |
Precipitations | Climate Change Indicators: U.S. and Global Precipitation | NOAA National Centers for Environmental Information | https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-monthly-version-2?fbclid=IwAR20WeoOz2fCxr0hPl_KgqkAIJKu2CY0eNTlPYu5CtH3osaDUSbFlQR26kM | EPA - US Environmental Protection Agency | https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation | Quantitative | XLS | Yes: EPA: https://www.epa.gov/climate-indicators/downloads-indicators-technical-documentation, NOAA: ISO 19115-2 Metadata https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00835# | Not stated, FOIA in NOAA | Environment | USA and Global | 1901-2015 | April 2010 | August 2016 | Not stated | This dataset shows how the total annual amount of precipitation over land worldwide has changed since 1901. |
GHG emissions by country | Greenhouse Gas Emissions | OECD | https://stats.oecd.org/Index.aspx?DataSetCode=AIR_GHG# | | | Quantitative | XLS, CSV, PX, SDMX (XML) | Yes | http://www.oecd.org/termsandconditions/ | Environment | Global | 1990-2017 | Not stated | August 2019 | Not stated | This dataset presents trends in man-made emissions of major greenhouse gases and emissions by gas. |
Footprint by country | National Footprint and Biocapacity Accounts 2019 Public Data Package | Global Footprint Network | https://www.footprintnetwork.org/licenses/public-data-package-free/ | | | Quantitative | CSV | No | Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0) | Environment | Global | 1961-2016 | Not stated | Not stated | Not stated | This dataset contains data related to the ecological footprint of the countries. |
Adhesion to Paris agreement | CAIT Paris Contributions Data | WRI - World Resources Institute | https://www.wri.org/resources/data-sets/cait-paris-contributions-data | | | Descriptive | XLSX | Yes | Creative Commons Attribution 4.0 International License | Environment, Politics | Global | 2015-2016 | March 2015 | February 19, 2016 | Not stated | This dataset collects information about all the countries which made a submission to the Paris Agreement, in particular the date of submission and a summary of the commitments undertaken. |
Investments for climate change | Cumulative data on the contributors of climate finance | Climate Funds Update | https://climatefundsupdate.org/data-dashboard/#1541245664327-538690dc-b9a8 | | | Quantitative | XLSX | No | Not stated | Economy, Environment | Global | 2003-2019 | Not stated | February 2019 | Not stated | This dataset collects information about the funds invested by countries at a global level in order to fight climate change. |
Opinions on climate change EU 2009 | Special Eurobarometer 313: Europeans’ attitudes towards climate change | Directorate-General for Communication of the European Commission | https://data.europa.eu/euodp/it/data/dataset/S942_71_1_EBS313 | | | Qualitative and quantitative | XLSX | Yes: https://europarl.europa.eu/at-your-service/files/be-heard/eurobarometer/2009/climate-change/report/it-report-climate-change-200907.pdf | https://data.europa.eu/euodp/it/copyright | Government and public sector | Slovakia, Slovenia, Sweden, Netherlands, Poland, Portugal, Romania, Belgium, Austria, Cyprus, Bulgaria, Germany, Czechia, Spain, Denmark, Finland, Estonia, United Kingdom, France, Hungary, Greece, Italy, Ireland, Luxembourg, Lithuania, Malta, Latvia | January-February 2009 | 2014-12-09 | | | This dataset reports the public opinion of European citizens on the issue of climate change. |
Opinions on climate change EU 2013 | Special Eurobarometer 409: Climate change | Directorate-General for Communication of the European Commission | https://data.europa.eu/euodp/it/data/dataset/S1084_80_2_409 | | | Qualitative and quantitative | XLS | Yes: https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/ResultDoc/download/DocumentKy/57629 | https://data.europa.eu/euodp/it/copyright | Government and public sector | Romania, Slovakia, Slovenia, Sweden, Malta, Netherlands, Poland, Portugal, Belgium, Austria, Cyprus, Bulgaria, Germany, Czechia, Spain, Denmark, Finland, Estonia, United Kingdom, France, Croatia, Greece, Ireland, Hungary, Lithuania, Italy, Latvia, Luxembourg | November-December 2013 | 2014-12-03 | | | This dataset reports the public opinion of European citizens on the issue of climate change. |
Opinions on climate change EU 2019 | Special Eurobarometer 490: Climate change | Directorate-General for Communication of the European Commission | https://data.europa.eu/euodp/en/data/dataset/S2212_91_3_490_ENG | | | Qualitative and quantitative | XLS | Yes: https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm/ResultDoc/download/DocumentKy/87642 | https://data.europa.eu/euodp/it/copyright | Government and public sector | Romania, Slovakia, Slovenia, Sweden, Malta, Netherlands, Poland, Portugal, Belgium, Austria, Cyprus, Bulgaria, Germany, Czechia, Spain, Denmark, Finland, Estonia, United Kingdom, France, Croatia, Greece, Ireland, Hungary, Lithuania, Italy, Latvia, Luxembourg | From 2019-04-09 to 2019-04-26 | 2019-09-11 | | | This dataset reports the public opinion of European citizens on the issue of climate change. |
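Since the datasets cover different periods, it can help to check which years are shared before combining them. The following is a minimal sketch, not the procedure used in the project: the years are copied from the "Time range" column for the eight climate-over-time datasets, "present" is capped at 2019 as an assumption, and the names `TIME_RANGES` and `common_period` are hypothetical.

```python
# Time ranges copied from the "Time range" column of the table above.
# Assumption: "present" is replaced by 2019, the last year mentioned in the text.
TIME_RANGES = {
    "Droughts": (1980, 2001),
    "Floods": (1985, 2019),
    "Hurricanes": (1980, 2019),
    "Wildfires": (2003, 2019),
    "Temperature": (1880, 2019),
    "Threatened species": (2018, 2019),
    "Sea ice": (1979, 2020),
    "Precipitations": (1901, 2015),
}


def common_period(ranges, exclude=()):
    """Return the (start, end) years shared by all datasets, or None if there are none."""
    kept = [span for name, span in ranges.items() if name not in exclude]
    start = max(first for first, _ in kept)  # latest starting year
    end = min(last for _, last in kept)      # earliest ending year
    return (start, end) if start <= end else None


# All eight together share no year: Droughts ends before Wildfires starts.
print(common_period(TIME_RANGES))
# Leaving out the short series gives a usable window.
print(common_period(TIME_RANGES, exclude=("Droughts", "Wildfires", "Threatened species")))
```

A result of `None` signals that at least two datasets do not overlap, so a mashup over time has to either drop one of them or work on separate windows.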
We started the analysis of the original datasets by inspecting their quality and accuracy. As a reference, we used the Open Data Goldbook for Data Managers and Data Holders, provided by European Data Portal, which is meant to be a practical guidebook for organizations wanting to publish Open Data. The questions posed to examine the quality of the dataset mainly concern completeness, cleanness, accuracy, timeliness and consistency. In the following table we report the output of the analysis.
Criterion | Droughts events 1980-2001 | Global Active Archive of Large Flood Events | International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4 | GFEDv4 (Global Fire Emissions Database, Version 4) | Climate at a Glance - Time Series Graphs of Temperature Anomalies | Threatened species | Sea Ice and Snow Cover Extent | Climate Change Indicators: U.S. and Global Precipitation | Greenhouse Gas Emissions | National Footprint and Biocapacity Accounts 2019 Public Data Package | CAIT Paris Contributions Data | Cumulative data on the contributors of climate finance | Special Eurobarometer 313: Europeans’ attitudes towards climate change | Special Eurobarometer 409: Climate change | Special Eurobarometer 490: Climate change |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Issues | We wrote an email to the contributor of the dataset to ask whether a legend for decoding the headers of the dataset exists, but we never received an answer. | The API platform from which the data was downloaded could not support mixed queries to retrieve the data for the whole world at once, so we had to download the data in series and then merge it into a single CSV file for the dataset. | Even if you set some conditions for your query, e.g. retrieving data including LULUCF, the output also contains data where it is excluded. | In order to process our dataset we needed to modify, directly on the sheet, the text of one single cell (which contains some date information). We did this because the format of the date could not be processed, since it contained both xldate and string information. Another change was made in the cells of the columns containing the "Summary". We decided to clean these cells because, in addition to the proper text, they also contained HTML tags and entities. | |||||||||||
Content quality | |||||||||||||||
Is the dataset complete? | |||||||||||||||
"Contain a header row with a single description of what is shown. This means that once a dataset structure is in place, it should not change when sources are added. In the metadata, the header should be described" | Yes, but the key (legend) for reading the columns is not provided | Yes, but the header entries described on the website (http://www.dartmouth.edu/~floods/Archives/ArchiveNotes.html) don't correspond to the actual entries of the dataset. | Yes, it is explained in a specific PDF document. | Yes, but the header is not described in the metadata. | Yes, but the header is not described in the metadata. | Yes, the header is described in the data characteristics section of the platform https://stats.oecd.org/Index.aspx?DataSetCode=WILD_LIFE | Yes | Yes, but the header is not described in the metadata. | Yes, but the header has not been described anywhere. | Yes, the header row is present and further explained in the PDF document about the work. | Yes (description in natural language) | Yes (easily understandable header row) | Yes (in dataset) | Yes, the questions contained in the dataset and the header are described and explained in the metadata PDF of the survey. | Yes, the questions contained in the dataset are explained in a PDF document about the survey. |
"Be labelled with a version number. Once an update is done the dataset should get a new version number in order for the audience to keep track of changes" | No | No | Yes, Version 4 | Yes Version 4 (GFEDv4) | No | No | No | Version 2 | No | No, there is no version number, but the title contains the year of the account. | No | No | Yes, v1.00 | Yes, v1.00 | Yes, v1.00 |
"Contain information about its origin. What is the data about, where does it come from and for what purpose has it been published?" | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Be given a status: Draft, validated, final | Inferred: Final | Validated ("Active") | Version numbers are updated when processing changes cause changes to storms from previous years (for example, by adjustments to the merge routines). Inferred: Validated | This data version is updated monthly, with no stated reason. We could not access the previous versions because they are superseded by new versions and are only accessible at the ORNL. Inferred: Validated, because internally the dataset is updated monthly | Inferred: Validated, because a view of the dataset suggests the data is updated yearly, although this is not stated. | Not stated and not inferable, since the data contains no information on years and the website only states that the data refers to the latest year available | Validated (updated every year since 1979) | No, but inferred Final since the time range of the dataset ends in 2015. | Not stated and not inferable, since data contains no information on the years concerned and it is just stated on the website that it is referred to the latest year available. | A new dataset is published every year. Inferred: Final | Final (data from 2015 and 2016) | Cumulative since 2003; up to date as of February 2019 (https://climatefundsupdate.org/about-us/notes-and-methodology/). Inferred: Validated | Final | Final | Final |
Is the data clean? | |||||||||||||||
Empty fields | Yes | Yes | Yes | No | No | No | No | No | Yes | No | No (if missing data "Not specified") | Yes (345, 346) | No | No | No |
Dummy data and default values: are they correct? | Yes (e.g. 0) | Yes: 0, default values in case of uncertain number of deaths or displaced (http://www.dartmouth.edu/~floods/Archives/ArchiveNotes.html) | No | No | No | No | Yes (e.g. -9999 probably missing data) | No | No | Yes (0, NULL) | No | No dummy "Not applicable" as default value | No | No | No |
Wrong values | No | Some countries have occasionally been indicated with different names, e.g. United Kingdom and UK. Many countries' names have been misspelled | No | No | No | No | No | No | No | Yes: Côte d'Ivoire and Réunion have special characters instead of accented letters, probably for encoding issues. | No | No | No | No | No |
Double entries | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Privacy sensitive information | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Is the data accurate? | |||||||||||||||
"Is the data accurate enough for its potential purpose?" | No, because only the country and the year are indicated. We don't know the duration of the event, the severity, the exact place in the country, ... | No, since the work of other archives is not taken into account and it is based mainly on news, so many events could have been left out. | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
"Does its accuracy affect its reliability? (only if the answer to the previous question is "No")" | No | No | |||||||||||||
"Are the choices concerning interval described?" | No | No | No | No | No | No, since there is no information concerning years | No: it is not explained why the dataset starts from 1979, but the interval is easily understandable (i.e. every year since 1979) | No | No | No | Under the U.N. Framework Convention on Climate Change (UNFCCC), countries committed to create a new international climate agreement by the conclusion of the Paris climate summit in December 2015. The dataset was created after the above-mentioned agreement. | Yes: "CFU data is cumulative since 2003. This is the first year in which one of the dedicated climate funds that we monitor approved finance for a project. The start date for each fund individually is available on the relevant fund page through ‘The Funds’" (https://climatefundsupdate.org/the-funds/) | No, just the context of the survey is explained in the introduction of the explanatory PDF. | No, just the context of the survey is explained in the introduction of the explanatory PDF. | No, just the context of the survey is explained in the introduction of the explanatory PDF. |
"Does the data need aggregation or disaggregation?" | No | No | Data would probably need aggregation, because the resulting dataset is too big and much information could probably be condensed. For instance the same hurricane is registered more than once even in the same country because all the steps of the passage are traced. | No | No | No | No | No | No | No | No | No | No | No | No |
Timeliness | |||||||||||||||
"Data changes over time. Historical data will remain stable, but recent data will be updated over time. Therefore, it is important to check data with regard to its timeliness regularly. For consistency purposes, it is wise to create an update process that keeps the data up-to-date. Be sure that the data contains a notion of its timeliness. This topic is closely related to the maintenance of datasets." | No (not updated since 2001). Timeliness in data. | The dataset is updated, but the frequency of the procedure is not stated. Data contains a notion of its timeliness. | There is timeliness in data. It is clear that the dataset is updated, but the frequency is unclear: in the "Status" section, it is said to be annual; in the "Data access" section, twice weekly. The update frequency of the single sources is also reported on the website. | Yes, there is timeliness in the data, because the available version of the dataset is inferred to be updated every month, so new information is entered when it becomes available. | Yes, there is timeliness in the data. The data has no update mechanism but, since it has versions, it is inferred that it is updated yearly. | It is stated on the website that the data is from the latest year available, and it is inferred that it is updated every month, since the values constantly change | Yes (inferred: every year) | No timeliness: the data has never been updated with information on new years, so it can be considered historical. | Update frequency isn't stated, but there is timeliness in data. | Annually updated and timeliness is present in data. | Not updated because it contains information about an agreement which took place between 2015 and 2016 | The dataset is cumulative since 2003 and the last update was in February 2019. Probably every year the dataset is updated with new data, while maintaining the old ones. No notion of timeliness in data | Inferred: not needed. No notion of timeliness in the data and no update process because it is referred to a single year. | Inferred: not needed. No notion of timeliness in the data and no update process because it is referred to a single year. | Inferred: not needed. No notion of timeliness in the data and no update process because it is referred to a single year. |
Consistency | |||||||||||||||
"Reading through the quality aspects of data, the consistency of the presentation of your data is of major importance. Imagine re-users correlating data from various sources, but all datasets differ in accuracy, use of terms and timeframe. As an example, if you change the field names of the data collected for managing waste each year, the data cannot be compiled from one year to the next. This makes it difficult to use datasets: it will require a large effort of manipulation. Therefore, make sure you use the standards and be consistent in publishing datasets of equal quality." | Not stated | There seems to be no consistency w.r.t. the previous tables of 2007 and 2008, available on the website: the standard is different, as well as the field names. | Consistency is stated among the fundamental principles of the project on two occasions on the website: https://www.ncdc.noaa.gov/ibtracs/index.php?name=status and https://www.ncdc.noaa.gov/ibtracs/index.php?name=principle | No: according to the metadata, each version is different and contains new information. However, this dataset version is internally consistent, since it is updated each month and the same fields are present and understood. | Not stated but inferable. Probably every year the same sheet is updated. | Not stated, but it is inferred since the data is updated often and the fields remain the same. | Not stated but inferable. Probably every year the same sheet is updated. | Though the dataset hasn't been updated for a long period now, if need be the consistency of the data will be maintained. | Not stated nor inferable. | Yes: the methodology for the account of data is described in the explanatory PDF, also for what concerns the previous versions. | Yes (only one version, data not updated) | Not stated but inferable. Probably every year the same sheet is updated. | No: w.r.t. the other datasets of the Eurobarometer series, questions change without explanation | No: w.r.t. the other datasets of the Eurobarometer series, questions change without explanation | No: w.r.t. the other datasets of the Eurobarometer series, questions have changed over time, but the decision and the differences are not explained. |
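Some of the cleanness questions above (empty fields, dummy values, double entries) and the clean-up of HTML tags and entities in text cells can be partly automated. The sketch below uses only the Python standard library and is an illustration under stated assumptions, not the procedure we followed: the sample CSV, the column names, the `DUMMY_VALUES` set and the function names are all hypothetical.

```python
import csv
import html
import io
import re
from collections import Counter

# Hypothetical placeholder values treated as dummy data; adjust per dataset.
DUMMY_VALUES = {"-9999", "NULL", "Not specified", "Not applicable"}

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove HTML tags, then decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub("", text)).strip()


def quality_report(csv_text):
    """Count empty fields, dummy values and duplicate rows in CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    empty, dummy = Counter(), Counter()
    for row in rows:
        for column, value in row.items():
            value = (value or "").strip()
            if value == "":
                empty[column] += 1
            elif value in DUMMY_VALUES:
                dummy[column] += 1
    # A row repeated n times counts as n - 1 double entries.
    seen = Counter(tuple(row.items()) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "rows": len(rows),
        "empty": dict(empty),
        "dummy": dict(dummy),
        "duplicates": duplicates,
    }


# Invented sample data, only meant to exercise the checks.
SAMPLE = """country,year,value,summary
Italy,2015,1.5,<p>Emission target</p>
UK,2015,,<b>Target</b> &amp; plan
UK,2015,,<b>Target</b> &amp; plan
France,2016,-9999,Not specified
"""

report = quality_report(SAMPLE)
print(report)
print(strip_html("<b>Target</b> &amp; plan"))
```

Checks such as misspelled or inconsistent country names (e.g. United Kingdom and UK) still need a reference list of names and manual review.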
Having already performed a quality analysis on our original datasets, we continued with the legal analysis. Its purpose was mostly to check the legal correctness of the various datasets in terms of Privacy Issues, IPR of the dataset, Licenses, Limitations on Public Access, Economic conditions and Temporal aspects, according to the “Check list for Public Administration for the Open Data release”.
The table below reports the outcome of this analysis.
We used a Yes/No answering format but, when necessary, we also provided further information or links to clarify the choice of the answer.
Legal Basis | Droughts events 1980-2001 | Global Active Archive of Large Flood Events | International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4 | GFEDv4 (Global Fire Emissions Database, Version 4) | Climate at a Glance - Time Series Graphs of Temperature Anomalies | Threatened species | Sea Ice and Snow Cover Extent | Climate Change Indicators: U.S. and Global Precipitation | Greenhouse Gas Emissions | National Footprint and Biocapacity Accounts 2019 Public Data Package | CAIT Paris Contributions Data | Cumulative data on the contributors of climate finance | Special Eurobarometer 313: Europeans’ attitudes towards climate change | Special Eurobarometer 409: Climate change | Special Eurobarometer 490: Climate change |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Privacy issues | |||||||||||||||
1.1 Is the dataset free of any personal data as defined in the Regulation (EU) 2016/679? https://eur-lex.europa.eu/legal-content/IT/TXT/PDF/?uri=CELEX:32016R0679&from=IT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
1.2 Is the dataset free of any indirect personal data that could be used for identifying the natural person? If so, is there a law that authorize the PA to release them? Or any other legal basis? Identify the legal basis. | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
"1.3 Is the dataset free of any particular personal data (art. 9 GDPR)? If so is there a law that authorize the PA to release them ?" | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
1.4 Is the dataset free of any information that combined with common data available in the web, could identify the person? If so, is there a law that authorize the PA to release them? | Yes | No: for each event, the location and the date are stated, so tracing back the news source could lead to individuals' names. | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
1.5 Is the dataset free of any information related to human rights (e.g. refugees, witness protection, etc.)? | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
1.6 Is a tool used for calculating the range of the risk of de-anonymization? Is the dataset anonymized? With which technique? Is it compliant with the three mandatory parameters: singling out, linking out, inference out? | Not stated | ||||||||||||||
1.7 Are you using geolocalization capabilities ? Do you check that the geolocalization process can’t identify single individuals in some circumstances? | No | Yes, identifiable | Yes, not identifiable | No | No | No | No | No | No | No | No | No | |||
1.8 Does the open data platform respect all the privacy regulations (registration of the end-user, profiling, cookies, analytics, etc.)? https://www.varonis.com/blog/us-privacy-laws/ | HDX: terms for cookies and for mailing service: https://data.humdata.org/about/terms. UNEP: No | No privacy policy on the original website. In HDX, there are terms for cookies and for mailing service: https://data.humdata.org/about/terms. | Yes: https://www.noaa.gov/protecting-your-privacy | No | Yes: https://www.noaa.gov/protecting-your-privacy | Yes http://www.oecd.org/privacy/ | Yes (https://nsidc.org/about/privacy) | Yes https://www.epa.gov/privacy/privacy-and-security-notice#rights | Yes | Yes | Yes (https://www.wri.org/about/privacy-policy) | Yes (https://climatefundsupdate.org/privacy-policy/) | Yes: https://data.europa.eu/euodp/en/privacystatement | Yes: https://data.europa.eu/euodp/en/privacystatement | Yes: https://data.europa.eu/euodp/en/privacystatement |
1.9 Do you know who are in your open data platform the Controller and Processor of the privacy data of the system? https://advisera.com/eugdpracademy/knowledgebase/eu-gdpr-controller-vs-processor-what-are-the-differences/ https://www.altalex.com/documents/news/2018/04/12/articolo-4-gdpr-definizioni | No | OCHA, the system administrator of the HDX platform (inferred: it is the Controller, Google Analytics and Mixpanel are the Processors). | No. Inferred: NOAA is the Controller | No, no | No. Inferred: NOAA is the Controller and Google Analytics is the Processor | Not stated. Inferred: OECD is the Controller | No. Inferred: NOAA is the Controller | No. Inferred: EPA is the Controller | Not stated. Inferred: OECD is the Controller | Not stated. Inferred: Global Footprint Network is the Controller | Not stated. Inferred: WRI is the Controller | Yes: the Controller is Heinrich-Böll-Stiftung Washington, DC | Unit C.4, "EU Open Data and CORDIS" of the Publications Office is the Controller; European Union Open Data Portal (EU ODP) is the Processor | Unit C.4, "EU Open Data and CORDIS" of the Publications Office is the Controller; European Union Open Data Portal (EU ODP) is the Processor | Unit C.4, "EU Open Data and CORDIS" of the Publications Office is the Controller; European Union Open Data Portal (EU ODP) is the Processor |
1.10 Where the datasets are physically stored (country and jurisdiction)? Do you have a cloud computing platform? Do you have checked the privacy regulation of the country where the dataset are physically stored? (territoriality) | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online.But previous versions are at the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. | Not stated if they are physically stored or just online. |
1.11 Do you have non-personal data? Are you sure that are not “mixed data”? | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. | Yes. Yes. |
2. IPR of the dataset | |||||||||||||||
2.1 Do you have created and generated the dataset? | Yes, (UNEP) | Yes, Dartmouth Flood Observatory. | Yes, NOAA NCEI | Yes, Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) | Yes, NOAA NCEI - NCDC | Yes, OECD | Yes, (NSIDC) | Yes, NOAA NCEI | Yes, OECD | Yes, Global Footprint Network | Yes, (WRI) | Yes, (Climate Funds Update) | Yes, (Directorate-General for Communication of the European Commission) | Yes, (Directorate-General for Communication of the European Commission) | Yes, (Directorate-General for Communication of the European Commission) |
2.2 Are you the owner of the dataset? Who is the owner? | Yes, (UNEP) | Yes, Dartmouth Flood Observatory. | Yes, NOAA NCEI | Yes, Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) | Yes, NOAA NCEI - NCDC | Yes, OECD | Yes, (NSIDC) | Yes, NOAA NCEI | Yes, OECD | Yes, Global Footprint Network | Yes, (WRI) | Yes, (Climate Funds Update) | Yes, (Directorate-General for Communication of the European Commission) | Yes, (Directorate-General for Communication of the European Commission) | Yes, (Directorate-General for Communication of the European Commission) |
2.3 Are you using third party data with the proper authorization and license? Are the dataset free from third party licenses or patents? | Third party data are used. No licences provided. https://preview.grid.unep.ch/index.php?preview=about&cat=3&lang=eng | Third party data are used. No licences provided. | Yes: https://www.ncdc.noaa.gov/ibtracs/index.php?name=terms | No third party data | Third party data are used. No licences provided. | No third party data | NOAA use NSIDC data. No licence provided. | No third party data | Third party data are used. No licences provided. | Third party data are used. No licences provided. | Yes The third party data used are those of the countries which provided their data. | Third party data are used. No licences provided. | No | No | No |
2.4 Are there some limitations in the national legal system of the dataset for releasing some kind of datasets with open license? | No. Geneva (Switzerland): none or very limited activities are performed to monitor the reuse of open data in the country (https://www.europeandataportal.eu/sites/default/files/open_data_maturity_report_2019.pdf, see p. 71, "Beginner") | No | No | No | No | No | No | No | |||||||
3. Licences | |||||||||||||||
3.1 Is the dataset released with an open data license ? In case of the use of CC0 have they all the right necessary for this particular kind of license (e.g., jurisdiction)? | Available for free for non commercial purpose (https://preview.grid.unep.ch/index.php?preview=about&cat=2&lang=eng&fbclid=IwAR2swMOTGMxCFZKVptR1wGa7yY2HNz0mfYZMur_aGG3TZAfdg4IEz_qcjDs#datause) | Creative Commons Attribution 4.0 International license - CC BY 4.0 (HDX) | World Data Center for Meteorology policy and World Meteorological Organization's Resolution 40 policy https://www.ncdc.noaa.gov/ibtracs/index.php?name=terms | Data hosted by the ORNL DAAC is openly shared, without restriction, in accordance with NASA's Earth Science program Data and Information Policy. | Yes FOIA | Except where additional restrictions apply as stated above, You can extract from, download, copy, adapt, print, distribute, share and embed Data for any purpose, even for commercial use. You must give appropriate credit to the OECD | Yes FOIA | Yes FOIA | Except where additional restrictions apply as stated in the website, you can extract from, download, copy, adapt, print, distribute, share and embed data for any purpose, even for commercial use. You must give appropriate credit to the OECD. | Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0) | Creative Commons Attribution 4.0 International License (CC BY 4.0) | Not stated | Yes: reuse of data published on this website for commercial or non-commercial purposes is authorised provided the source is acknowledged. | Yes: reuse of data published on this website for commercial or non-commercial purposes is authorised provided the source is acknowledged. | Yes: reuse of data published on this website for commercial or non-commercial purposes is authorised provided the source is acknowledged. |
3.2 Is the clause included: "In any case the dataset can’t be used for re-identifying the person" ? | No | No | No | No | No | No | No | No | No | No | No | No | No | No | |
3.3 Is the API (in case there is) released with an open source license ? | Yes API, no licence | No API | No API | Yes API, No licence stated but inferred platform licence which is based on NASA FOIA | Yes API, No licence stated but inferred FOIA | Yes API, No open source licence stated but inferred that of the platform http://www.oecd.org/termsandconditions/ | Yes API, no licence | No API | Yes API, no open source licence | Yes API, no licence | Yes API, no licence | Yes API, no licence | No API | No API | No API |
3.4 Is the open data/API platform license regime compliant with your IPR policy? Do they have all the licences related to the open data platform/API software? | No license for the data platform | No license for the data platform | No license for the data platform | Yes Data platform license compliant to IPR policy, Yes license for open data platform but no licence for the API platform thus inferred it has the open data platform's license, yes. | Data platform license compliant to IPR policy but no licence for the API platform thus inferred data platform license, Yes for the platform and not for the API. | Data platform/API license compliant to IPR policy , yes | No, no | Data platform license compliant to IPR policy and has no API, yes | (API) Yes, yes | No license for the data platform | No license for the data platform | No license for the data platform | (data platform) Yes, yes: https://data.europa.eu/euodp/en/copyright | (data platform) Yes, yes: https://data.europa.eu/euodp/en/copyright | (data platform) Yes, yes: https://data.europa.eu/euodp/en/copyright |
4. Limitations on public access | |||||||||||||||
4.1 Does the dataset concern your institutional competences, scope and finality? Does the dataset concern other public administration competences? | Yes, no | Yes, no | Yes, no | Yes, no | Yes, no | Yes, no | Yes, no | Yes, no | Yes, yes: UNFCCC | Yes, yes: UN | Yes (https://www.wri.org/about/values) (https://www.wri.org/about/mission-goals) | Yes Yes (Overhead refers to expenditures from the Fund that are not directed to projects (such as administration fees)). | Yes, no | Yes, no | Yes, no |
4.2 Does the dataset respect the limitations for the publication stated by your national legislation or by the EU directives ? https://project-open-data.cio.gov/policy-memo/ for USA | Yes | No open license on Dartmouth Observatory website | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | ||||
4.3 Are there some limitations connected to the international relations, public security or national defence ? | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
4.4 Are there some limitations concerning the public interest ? | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
4.5 Does the dataset respect the international law limitations? https://opendatacharter.net/principles/ (?) | Yes | Yes | Yes | Yes but Open data platform not linked to metadata | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
4.6 Does the dataset respect the INSPIRE law limitations for the spatial data? https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32007L0002 | No | Not EU dataset | Not EU dataset | Not EU dataset | Not EU dataset | Not EU dataset | |||||||||
5. Economical Conditions | |||||||||||||||
5.1 Could the dataset be released for free? | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
5.2 Are there some agreements with some other partners in order to release the dataset with a reasonable price? | |||||||||||||||
5.3 Does the open data platform terms of service include a clause of “non liability agreement” regarding the dataset and API provided ? | Yes | No | No, just for links | Yes for dataset and links but not stated for the API https://science.nasa.gov/earth-science/earth-science-data/data-information-policy/data-rights-related-issues | No, just for links | Yes for the dataset and API | No, just for links NOAA.gov does not control or guarantee the accuracy, relevance, timeliness or completeness of information contained in a linked site. | Not stated in EPA NOAA just for links | Yes API, yes data | No | Yes (https://www.wri.org/about/open-data-commitment) | No | No | Yes | Yes |
5.4 In case you decide to release the dataset to a reasonable price are the limitation imposed by the new directive 2019/1024/EU respected ? Are you able to calculate the “marginal cost”? Are you able to justify the “reasonable return on investment” limited to cover the costs of collection, production, reproduction, dissemination, preservation and rights clearance? There is a national law that justify your public administration to apply the “reasonable return of investment”? | |||||||||||||||
5.5 In case you decide to release the dataset to a reasonable price do you check the e-Commerce directive1 and regulation? | |||||||||||||||
6. Temporary aspects | |||||||||||||||
6.1 Do you have a temporary policy for updating the dataset ? | Never (HDX) | "active" = current events are added immediately | Twice weekly - Weekly (IBTrACS Project) / Daily (NOAA) | Periodically | No | No | No | No | No | Annually | No | No | No | No | No |
6.2 Do you have some mechanism for informing the end-user that the dataset is updated at a given time to avoid mis-usage and so potential risk of damage ? | No The United Nations periodically adds, changes, improves or updates the Materials on this Site without notice | No | Yes, forum | No, just an email for data access. | No | No | No | No | No | No | No | No | No | No | No |
6.3 Did you check if the dataset for some reason can’t be indexed by the research engines (e.g. Google, Yahoo, etc.) ? | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed | Indexed |
6.4 In case of personal data, do you have a reasonable technical mechanism for collecting request of deletion (e.g. right to be forgotten)? | No | Yes, email (HDX) | No | No | No | Yes, email | No | No | Yes, email | No | Yes (https://www.wri.org/about/privacy-policy > choices) | No | No | No | No |
In order to carry out the ethical analysis of the original datasets, we relied on the Data Ethics Framework. We organized our analysis of each dataset according to several points of view: transparency, accountability, discrimination, cognitive bias and prejudice.
In particular, we tried to analyze each of these aspects organizing our analysis in four areas:
Purpose: this dataset is part of the wider “Global risk data platform” which also included data about other natural hazards. The purpose of the platform is to allow the visualisation of data on natural hazards.
Process: the data were collected by merging data from different sources, which are cited on the website. In this case the sources were a global monthly gridded precipitation dataset obtained from the Climatic Research Unit (University of East Anglia) and a GIS modeling of the global Standardized Precipitation Index based on the methodology of Brad Lyon (IRI, Columbia University). The resulting dataset is therefore based on two different points of view. The platform says nothing about prejudice or bias with respect to the data collected, but we know that the methodologies for hazard modeling were reviewed by a team of 24 independent experts selected by the World Meteorological Organization (WMO) and the United Nations Educational, Scientific and Cultural Organization (UNESCO).
Output: the resulting dataset is not easily understandable because there is no legend to interpret the column headers (lack of guidelines). No version is indicated, so consistency cannot be ascertained. The platform provides legal information about the license and the way the datasets can be used (e.g. no commercial purpose). Beyond this, however, discrimination and bias are not addressed.
Conclusion: the dataset can be considered good from an ethical point of view but we cannot say the same about its transparency, because of the lack of the legend.
Purpose: The target is not declared, hence we could just infer that it is addressed to researchers. The benefit is explicitly said to be creating a unique source for large flood events. Nonetheless, since it doesn’t involve other archives, it could instead fragment the scenario. In the purpose, there is no trace of discrimination, prejudice or cognitive bias. It has a global basis.
Process: No transparency and accountability in the processing: even if sources are stated, the actual data they provided are not identifiable. Governmental sources they claim to have used are not distinguishable. No caveats nor documentation on what they have done have been provided.
Output: In the final dataset:
- All countries have been recognised and no political discriminations have been made (e.g. Israel and Palestine).
- There are no personal data, but the number of deaths, when small, combined with other information such as the location could lead to individuals’ names. The purpose of reporting deaths and displaced people is to show the gravity of the flood. However, another index is also used, so this information could have been avoided.
- Since it is mainly based on news, as they state, the dataset contains mainly data about major events and “first world” countries (http://floodobservatory.colorado.edu/Archives/ArchiveNotes.html). Stating just “news” as the source without naming it makes it impossible to check the validity of the reported datum. On the other hand, making the news source easy to retrieve could in some cases facilitate the identification of the people involved.
Conclusion: there is no prejudice or cognitive bias, but there is discrimination, since the data focus on ‘first world’ countries and on limited sources (mainly news). Moreover, possible ethical problems arise about the reporting of deaths and displaced people. Nonetheless, the greatest ethical problems of the dataset are its limited openness (especially w.r.t. procedures) and its accountability difficulties. If it were not so widely cited in the academic world, it would not seem reliable enough.
The dataset can be considered as perfect from an ethical point of view because it doesn’t contain prejudice, cognitive bias or discrimination and everything is well documented: purpose and user need, data provenance, caveats and usage information (available in the technical documentation), field names explanation, ways to provide feedback.
Purpose: the data has a clear user need which is to provide global estimates of monthly burned area, monthly emissions and fractional contributions of different fire types, daily/3-hourly fields to scale the monthly emissions to higher temporal resolutions, and data for monthly biosphere fluxes which could be used for large-scale modeling studies.
Process: from a legal point of view the data is on point. The collection of the data used is done without any discrimination, cognitive bias or prejudice as inferred on the website (https://daac.ornl.gov/VEGETATION/guides/fire_emissions_v4.html) making use of the available data from other sources and theirs (Satellite information) to create a global view of the situation. The only note could be that they make no mention of licences for use of data from others.
Output: the data used serve exactly the user need they are meant to satisfy and are restricted to the purpose for which they were created. The final dataset is available in an open format and free for reuse on an API platform. According to the documentation, the dataset will in no way harm any individual person, community, country or public interest, even when new events are registered. There is a clear description of the composition of the database, again free from any discrimination, cognitive bias or prejudice.
Conclusion: we can in summary say that the dataset from an ethical point of view is clean to an extent.
Purpose: they don’t state clearly the purpose of the dataset or the user need to which they are responding, but it can be inferred that they want to make known to everyone the situation of temperature anomalies in the world over the years.
Process: for the creation of the final dataset, they combine data from two resources (the Global Historical Climatology Network-Monthly (GHCN-M) data set and the International Comprehensive Ocean-Atmosphere Data Set (ICOADS)), both known for carrying out quality controls on their data as good practice. Their choice of sources is understandable: it is consistent with the purpose and helps to answer strictly the user need identified. Therefore there is no discrimination, prejudice or cognitive bias.
Output: the dataset released from this combination is made available to all in an open format having all the information it had planned to deliver without any wrong ethical aspect. Good explanation of the basis of the results found in the dataset.
Conclusion: it can be therefore considered that the dataset is ethically correct.
Purpose: the purpose of the dataset is clearly stated: to show the numbers of known (or assessed) species and of threatened species, with the aim of indicating the state of mammals, birds, freshwater fish, reptiles, amphibians, vascular plants, mosses, lichens and invertebrates. This purpose raises no issue of discrimination, cognitive bias or prejudice, especially because it covers worldwide information and also considers information from the various national Delegates.
Process: the process of collection and analysis of the data to create the dataset is done by updating and revising certain information from the comments of national Delegates. The basis of this act is not well stated on the website. So, it could be inferred that there may be some cognitive bias in the decision making.
Output: the dataset is released through an API platform that is free to all, but the website states that any interpretation should take into consideration the possible inexactness of the values. They also mention the possibility of biased results due to the overestimation of some incompletely evaluated groups of species likely to be threatened in certain countries.
Conclusion: the ethical correctness of this dataset is not fully satisfactory, because in the end some of its values may be wrong due to certain actions taken during its creation.
Purpose: the purpose of providing a tool to see the sea ice extent over years is achieved: users can generate and examine graphs and statistics on ice and snow, or download the data to populate spreadsheets for further analysis.
Process: the purpose of providing a tool to see the sea ice extent over years is achieved: users can generate and examine graphs and statistics on ice and snow, or download the data to populate spreadsheets for further analysis.
Output: the result is a tool for browsing the sea ice extent from 1979 to 2020 for the Northern Hemisphere, Southern Hemisphere, and the Globe. Data can be observed monthly or annually. The documentation is very poor: there is no information about restrictions of use, bias or discrimination.
Conclusion: the dataset seems to be free from cognitive bias; however, very little documentation is provided.
Purpose: the purpose of creation of the dataset is clear and has no ethical distortion for the precise user need which was to point out all the precipitation anomalies over the given period selected.
Process: during the creation of the dataset they make use of all possible resources to create a well-informed database on the subject matter. A good aspect is that they use automated bias correction software, which helps to identify and eliminate biases. In addition, the personal intervention of staff and scientists and the data quality tests are carried out with a view to excluding any ethical compromise.
Output: the datasets released are well documented and are available without charge through NCEI's anonymous FTP service. The information it contains is of good quality and satisfies the user's need and purpose of creation.
Conclusion: this dataset can be consequently considered ethically correct.
Purpose: Target and purpose are inferable but not explicitly stated. It is not clear whether the data refer only to OECD countries. If so, this could cause cognitive bias.
Process: The provenance of the single datum is not stated so there are no ways to compare the dataset with the original sources and detect possible errors. No caveats or technical documentation to make the procedure reproducible have been made public.
Output: Even though you are apparently downloading the result of your specific query, the dataset could include unrequested data, e.g. downloading data including LULUCF still leads to a dataset that begins with data excluding LULUCF. Moreover, internal choices have not been clarified: in formats such as CSV, values appear to be repeated in two columns; the codes for pollutants, variables, units and powercodes aren’t explained; reference and flags, despite having their own columns, are overall unused. Finally, the fact that the countries are those of the OECD can only be inferred and has not been explicitly stated.
Conclusion: Strictly speaking there is no discrimination or cognitive bias, but the view is definitely partial because the set of countries is limited, and in general the procedure and the output are not sufficiently transparent and accountable.
Purpose: The target is as broad as possible, with the purpose of making available the data to the public. No discrimination, prejudice or cognitive bias can be detected at this stage.
Process: The methodology is accountable and transparent.
Output: Everything is explained in the related paper. You can also access the papers of the previous versions to spot the differences. Their selection of countries could be considered politically discriminatory (e.g. Israel is present, while Palestine is absent).
Conclusion: The peculiarity of the dataset is the purpose of making it available to everyone. Everything is accountable and transparent. There is no discrimination, prejudice or cognitive bias in any phase, except for the choice of the countries, which seems to take a political stand.
Purpose: the purpose of the dataset is to provide a collection of data about the countries which submitted the Paris Agreement in 2015-2016 and their commitments in the field of climate change. The purpose is achieved because the structured data from the CAIT Paris Contributions Map enables users to explore, compare, and assess the greenhouse gas mitigation plans in each country's Intended Nationally Determined Contribution (INDC).
Process: after the submission of the Paris Agreement, countries decided to release public outlines of the actions they intended to take in order to achieve the goal. The data are structured according to a framework based on several protocols and standards listed on the website. The list of the adhering countries and the license are provided on the first sheet of the dataset.
Output: the output of the process is an interactive map accessible through the platform. By clicking on almost every country (except for Libya for which we don’t have any document submitted), the user can see the information about the agreement for each country separately from the others (the same information provided in the dataset). The data are about the commitments of each country against climate change, so we can infer that they do not contain prejudices, discriminations and biases. What about Libya? This is the only case that can create a bias. A second issue is that the downloadable version of the dataset is updated to 2016, while the user can find on the online platform the data updated to 2019.
Conclusion: the dataset is almost complete from an ethical point of view, except for information about Libya. The API is easily accessible and transparent, but there is a discrepancy between the downloadable version and the online one.
Purpose: the purpose of the platform is to present cumulative data on the contributors of climate finance from the multilateral climate change funds monitored by the platform itself. The purpose is achieved.
Process: We don’t know who owns the platform, and it is not clear what is meant by “the data are presented for each multilateral climate change funds it tracks”. The platform collects the data as follows: it seeks information from different sources and then corresponds with fund managers in order to verify the collected information. However, it is stated that the platform receives verification for almost all funds, not all of them, and the authorities that verify them are not indicated. All this raises an issue of reliability. A positive point is that the platform tracks governed funds focused on climate change and bases its dataset on those funds (reliability and transparency). From the dataset we can infer that the analysed countries are mainly from Europe and Central Asia; no explanation is given for the choice of countries (perhaps those that devoted a good part of their funds to climate change were chosen). The platform provides no further information about any kind of prejudice or discrimination.
Output: the resulting output is user friendly and easily accessible.
Conclusion: it is not very clear who verifies the funds and how accurate the data are (not very reliable). We can notice that most of the data come from Europe and Central Asia. The resulting API is easily accessible.
Purpose: the purpose of the dataset is to understand what European citizens think about the climate change situation and what are their expectations for the future.
Process: to accomplish their aim, the opinions of the citizens were collected through surveys, whose results were later analysed. The survey method and questions are described and documented, and there appears to be no ethical distortion.
Output: the released dataset covers all the countries of the EU, and all the questions and answers are reported without any change. The survey method and questions are documented and further described in a specific paper. The data collected were used strictly for the purpose of the dataset, and respondents could decline to answer certain questions. So, the possibility of prejudice is excluded, and since everybody could take part in the survey we can say there is no discrimination. There is no cognitive bias in the results, since the dataset is published without any interpretation. However, the purpose of the questions about the economic status or the level of education of the individual is unclear in such a context; hence, they don’t seem totally free of discriminatory aspects.
Conclusion: everything is accountable and transparent. There is no discrimination, prejudice or cognitive bias, apart from the unclear purpose of some personal questions (e.g. economic status) apparently unrelated to the context.
At this stage we analyzed our datasets from a technical point of view. We examined the available formats, the presence of metadata, the URIs and the provenance. The results are given below:
After performing all the analyses and verifications needed on the original datasets, we drew up an overview of the information to retrieve from them, based on the questions we wished to answer in line with our purpose and scenario. We then extracted the data from the datasets using Python. The code used in this process can be found in the “code” folder, which contains three subfolders:
- 1-data-extraction: in this folder you will find the Python file `base-file.py`, which condenses the 15 functions we used to extract data from our original datasets. There is also the file `countries.py`, which contains the Python function used to extract the countries and their ISO codes present in our original datasets, in order to manage mistakes and exceptions.
- 2-py-to-xml: in this folder you will find the three functions we used to create our output XML datasets.
- 3-xml-to-json: in this folder you will find the Python files we used to create the JSON files for the visualizations, starting from our XML datasets.
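To illustrate the kind of country-name handling mentioned for the extraction step, here is a minimal sketch. It is not the project's actual code: the lookup tables, the function names `to_iso3` and `split_known_unknown`, and the sample names are assumptions made for this example only.

```python
# Hypothetical sketch: map country names found in heterogeneous sources
# to ISO 3166-1 alpha-3 codes, handling spelling variants explicitly.

# Small illustrative lookup; a real table would cover every country.
ISO3_BY_NAME = {
    "italy": "ITA",
    "france": "FRA",
    "united states of america": "USA",
    "russian federation": "RUS",
}

# Variants mapped onto the canonical names above (assumed examples).
ALIASES = {
    "usa": "united states of america",
    "united states": "united states of america",
    "russia": "russian federation",
}


def to_iso3(raw_name):
    """Return the ISO alpha-3 code for raw_name, or None if unknown."""
    # Normalize non-breaking spaces, repeated whitespace and letter case.
    key = " ".join(raw_name.replace("\xa0", " ").split()).lower()
    key = ALIASES.get(key, key)
    return ISO3_BY_NAME.get(key)


def split_known_unknown(names):
    """Separate names that resolve to a code from those needing manual review."""
    known, unknown = {}, []
    for name in names:
        code = to_iso3(name)
        if code is None:
            unknown.append(name)
        else:
            known[name] = code
    return known, unknown
```

Collecting the unresolved names in a separate list, rather than raising an error, makes it possible to review the exceptions by hand after a full pass over a dataset.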
Through the extraction and creation processes we produced three new datasets in XML format, which also contain their metadata. These datasets were used in the subsequent steps of our project and are collected in the “xml” folder. These datasets are:
- natural_events.xml
- impact_and_commitments.xml
- eu_opinions.xml
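The two conversion steps (Python to XML with embedded metadata, then XML to JSON for the visualizations) can be sketched as follows. This is only an illustration under stated assumptions: the element names (`dataset`, `metadata`, `data`, `event`), the attribute names and the sample record are invented for the example and do not reproduce the actual schema of the project's XML files.

```python
import json
import xml.etree.ElementTree as ET


def build_dataset_xml(title, license_name, records):
    """Build an XML tree holding a metadata block and one element per record."""
    root = ET.Element("dataset")
    # Metadata travels inside the same file as the data (assumed layout).
    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "title").text = title
    ET.SubElement(metadata, "license").text = license_name
    data = ET.SubElement(root, "data")
    for record in records:
        event = ET.SubElement(
            data, "event", country=record["country"], year=str(record["year"])
        )
        event.text = str(record["value"])
    return root


def xml_to_json(root):
    """Flatten the <event> elements of the tree into a JSON string."""
    rows = [
        {
            "country": element.get("country"),
            "year": int(element.get("year")),
            "value": float(element.text),
        }
        for element in root.iter("event")
    ]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    # Made-up sample record, used only to show the round trip.
    sample = [{"country": "ITA", "year": 2003, "value": 1.5}]
    tree_root = build_dataset_xml("natural_events", "CC-BY-SA 4.0", sample)
    print(ET.tostring(tree_root, encoding="unicode"))
    print(xml_to_json(tree_root))
```

Keeping the metadata in a dedicated element lets the JSON conversion skip it by iterating only over the data elements.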
Here below you have a table of the output datasets and the original datasets used to create each of them.
Output Dataset | Origin Datasets |
---|---|
natural_events.xml | Droughts events 1980-2001, Global Active Archive of Large Flood Events, International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4, GFEDv4 (Global Fire Emissions Database, Version 4), Climate at a Glance - Time Series Graphs of Temperature Anomalies, Climate Change Indicators: U.S. and Global Precipitation, Sea Ice and Snow Cover Extent. |
impact_and_commitments.xml | Greenhouse Gas Emissions, National Footprint and Biocapacity Accounts 2019 Public Data Package, CAIT Paris Contributions Data, Cumulative data on the contributors of climate finance. |
eu_opinions.xml | Special Eurobarometer 313: Europeans’ attitudes towards climate change, Special Eurobarometer 409: Climate change, Special Eurobarometer 490: Climate change. |
The datasets have been created according to the “FAIR Principles”, so we would say that they are free of any quality, legal, technical and ethical problems.
We decided to release our datasets under the license CC-BY-SA 4.0.
During the extraction of data from our original datasets, we encountered some difficulties, which we report below for each affected dataset.
- Global Active Archive of Large Flood Events: when we started the extraction process on the original dataset, we found that some cells contain the character `\xa0`, which is a non-breaking space in Latin-1 (ISO 8859-1). We had to replace it before proceeding. Moreover, the date cells aren’t human-readable, so we used the `datetime` library to convert them.
- International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4: the dataset was too big to be processed in full. Since the location of each hurricane was reported in coordinates, we had to reverse geocode them to find the country using a third-party geolocator, OpenStreetMap, whose usage limits prevented us from working on the whole dataset. Thus, we selected data every five years and created a parallel CSV to work on.
- GFEDv4 (Global Fire Emissions Database, Version 4): we could not download the global dataset at once, since the API platform advised against it because of its size, so we downloaded the dataset in series of three, in TXT format, and combined them to create the CSV file. For this reason we have no URI for the dataset: the link we used is no longer accessible on the server.
- CAIT Paris Contributions Data: in order to process this dataset we had to modify, directly on the sheet, the text of one single cell containing date information. We did this because the date could not be processed as it was, since it mixed xldate and string information. We also changed the cells of the “Summary” columns: we cleaned them because, in addition to the proper text, they contained HTML tags and entities.
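The cleaning steps described in the list above can be sketched with the Python standard library. This is an illustrative sketch, not the project's code: the function names are hypothetical, the date helper assumes the unreadable dates are Excel serial numbers in the 1900 date system, and the start year and field name used for the five-year sampling are also assumptions. Reverse geocoding is left out because it depends on an external service.

```python
import html
import re
from datetime import datetime, timedelta

TAG_RE = re.compile(r"<[^>]+>")


def clean_cell(text):
    # Replace non-breaking spaces (\xa0) with plain spaces and trim the cell.
    return text.replace("\xa0", " ").strip()


def excel_serial_to_date(serial):
    # Assumption: the cell holds an Excel serial date (1900 date system).
    # Counting from 1899-12-30 matches Excel for dates after February 1900.
    return (datetime(1899, 12, 30) + timedelta(days=serial)).date()


def strip_html(text):
    # Drop tags first, then decode entities such as &amp; and collapse spaces.
    without_tags = TAG_RE.sub(" ", text)
    return " ".join(html.unescape(without_tags).split())


def every_five_years(rows, year_key="year", start=1980):
    # Keep only the rows whose year falls on a five-year step from `start`.
    return [row for row in rows if (int(row[year_key]) - start) % 5 == 0]
```

For instance, `every_five_years` applied to rows for 1980 through 1990 keeps only 1980, 1985 and 1990, which is the kind of reduced sample that can then be written to a separate CSV.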
The CCODe project is the outcome of an academic project at the University of Bologna. Thus, it won’t be maintained. Nonetheless, the resulting datasets are based on other datasets that were originally collected by larger organizations, and many of them are actively updated. We provided the links in the “General analysis” section, so that anyone can compare our datasets with the original ones. All the analyses were performed between February and March 2020.
We invite you to notify us in case you find errors or ways to improve our work; we provided the email contact in the metadata of the datasets.
In order to make our datasets easily reusable, we have indeed completed them with their metadata following DCAT_AP (v 2.0.0).
Moreover, we provided the Python code that we used to extract the data of interest and to produce the final XML and JSON files. It is freely available for further reuse, as long as the license is respected.
Everything is protected by the CC-BY-SA 4.0 license, which allows many uses of the work, provided that the creator is cited and the same license is maintained for derivative works. See the license terms on the Creative Commons website. Please cite us as “Del Bene R., Hamvegam M. L. S., Pizzicori A. (2020) CCODe”.
If the project were financed, a further implementation could maintain the current datasets and extend them with data for the missing years. It would also be desirable to cross our datasets in new ways, so that unexpected knowledge can emerge.
At the beginning of our work, we formulated some hypotheses, starting from various questions, with the final aim of deciding how to intersect the data.
We tried to reproduce this mind map in the final visualization section, organizing it in three categories, one for each dataset. Our purpose was to guide the user in the exploration of our data.
Since part of the data was collected on a global scale and part on a country-based scale, we diversified our charts following the same approach.
Highcharts is the JavaScript library we used to create the charts. This required the creation of JSON files specifically formatted for the purpose.
Only the visualization of the map was created using another library, DataMaps, which allows us to create a choropleth map, to explore the evolution of the events over time.
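To give an idea of what the conversion from XML to chart-ready JSON can look like, here is a minimal sketch that turns a small XML fragment into a list of series, each with a `name` and a `data` array of `[x, y]` pairs, which is a shape Highcharts accepts for its `series` option. The XML structure, its element and attribute names and the sample figures are invented for the example and do not reflect the actual schema or content of our datasets.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: element names, attributes and figures are
# made up for illustration only.
SAMPLE_XML = """
<events>
  <event type="flood" year="2001" count="15"/>
  <event type="flood" year="2000" count="12"/>
  <event type="drought" year="2000" count="3"/>
</events>
"""


def xml_to_series(xml_text):
    """Group events by type and return one series per type."""
    root = ET.fromstring(xml_text.strip())
    grouped = {}
    for event in root.iter("event"):
        point = [int(event.get("year")), int(event.get("count"))]
        grouped.setdefault(event.get("type"), []).append(point)
    # Sort the points by year and the series by name for a stable output.
    return [
        {"name": name, "data": sorted(points)}
        for name, points in sorted(grouped.items())
    ]


series_json = json.dumps(xml_to_series(SAMPLE_XML), indent=2)
```

The resulting string can be saved to a `.json` file and loaded by the page that builds the chart.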
In order to make our data reusable and interoperable, we provided them with their metadata, following the DCAT_AP (v 2.0.0) documentation.
The metadata were added at the beginning of the XML documents (our final datasets) and are also presented in tables in the metadata section of the project website.
We provided metadata for the whole catalogue (including the three datasets) and for each dataset individually. Moreover, we released the RDF statements for the metadata in the Turtle serialization. They are accessible from the website as well.
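As a rough illustration of metadata expressed in Turtle, the sketch below builds a minimal description of one dataset using the DCAT and Dublin Core Terms vocabularies. It is not our released metadata: the dataset URI is a placeholder, the helper names are hypothetical, and only a title and a license are included, far fewer properties than DCAT_AP provides.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)

CC_BY_SA = "https://creativecommons.org/licenses/by-sa/4.0/"


def escape_literal(text):
    # Escape backslashes and double quotes so the title is a valid literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dataset_to_turtle(uri, title, license_uri=CC_BY_SA):
    """Return a minimal Turtle description of one dataset."""
    return (
        f"<{uri}> a dcat:Dataset ;\n"
        f'    dct:title "{escape_literal(title)}"@en ;\n'
        f"    dct:license <{license_uri}> .\n"
    )


# Placeholder URI: the real identifiers are those in the released metadata.
turtle_doc = PREFIXES + "\n" + dataset_to_turtle(
    "https://example.org/ccode/natural_events", "natural_events.xml"
)
```

For anything beyond a sketch, a dedicated RDF library is a safer choice than string templates, since it handles escaping and serialization details.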
Brainstorming ideas for the project, we all found ourselves concerned about climate change and hopeful that data could be an answer in representing it. Therefore, our initial question was: how evident is the problem of climate change?
This initial doubt led us to ask ourselves: how do countries behave in terms of emissions, one of the main causes of the phenomenon, and how do they commit to fighting it? What is the perception of the problem from the citizens’ side?
Of course, the different time spans of the datasets and, in the case of the opinions, the lack of global spatial coverage influenced the output, which for this reason lacks precision. Moreover, we discovered that external factors can condition the data: with respect to emissions, for example, a country can balance its accounts by investing in the fight against climate change.
Nonetheless, some phenomena are evident:
- Extreme natural events such as droughts and wildfires have been increasing, even though not impressively, given the limited time span;
- With a larger range of years, the trend is highly evident, as happens with temperature and sea ice extent anomalies. Clearly, the two trends move in opposite directions;
- During the latest years, footprint and emissions are overall decreasing and biocapacity is growing, even though still on a small scale. Could it be interpreted as a sign of the current awareness?
- There is a marked distinction between first-world and third-world countries, which is evident by the magnitude of their emissions and by the amount of deposited funds;
- The perception and the reaction of European citizens to climate change saw a negative trend between 2009 and 2013, while in 2019 awareness is more widespread, probably as a result of the attention given to the problem in recent years.
Some of these outcomes confirmed our hypotheses; others were unexpected, such as the high number of each natural event, China’s official “low” account of emissions and the negative change in citizens’ perception in 2013. Overall, we thought that data could bring out knowledge that is still too often ignored, and in this sense we were proved right.