Skip to content

gpw13/ghost

Repository files navigation

ghost

R-CMD-check

Overview

ghost is an R package designed to provide a simple interface for extracting data from the World Health Organization’s Global Health Observatory (GHO) database using the Open Data Protocol API. The package allows for exploration of indicators and dimensions available in the GHO and extract of these into R data frames.

  • gho_indicators() provides a data frame of all available indicators in the GHO.
  • gho_dimensions() provides a data frame of all available dimensions in the GHO.
  • gho_dimension_values() provides a data frame of all available values for a single dimension in the GHO.
  • gho_data() extracts data from a selection of indicators in the GHO and outputs all results in a single data frame.

The interface is designed to be as simple as possible, only requiring input of the code of an indicator to extract it. However, it is also compatible with using more complex queries in line with the OData protocol. Details on its implementation available in the OData documentation.

ghost can be installed using remotes::install_github("gpw13/ghost")

Usage

To begin, we can use gho_indicators() to begin to explore all data available in the GHO.

library(ghost)

gho_indicators()
#> # A tibble: 2,359 x 3
#>   IndicatorCode IndicatorName                                           Language
#>   <chr>         <chr>                                                   <chr>   
#> 1 AIR_1         Ambient air pollution attributable deaths               EN      
#> 2 AIR_10        Ambient air pollution  attributable DALYs per 100'000 … EN      
#> 3 AIR_11        Household air pollution attributable deaths             EN      
#> 4 AIR_12        Household air pollution attributable deaths in childre… EN      
#> 5 AIR_13        Household air pollution attributable deaths per 100'00… EN      
#> # … with 2,354 more rows

If we want the data for AIR_1, we could now just quickly access the data frame using gho_data().

gho_data("AIR_1")
#> # A tibble: 372 x 23
#>      Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>   <int> <chr>         <chr>          <chr>      <chr>         <int> <lgl>   
#> 1  4882 AIR_1         COUNTRY        AFG        YEAR           2004 NA      
#> 2  4883 AIR_1         COUNTRY        ALB        YEAR           2004 NA      
#> 3  4884 AIR_1         COUNTRY        DZA        YEAR           2004 NA      
#> 4  4885 AIR_1         COUNTRY        AND        YEAR           2004 NA      
#> 5  4886 AIR_1         COUNTRY        AGO        YEAR           2004 NA      
#> # … with 367 more rows, and 16 more variables: Dim1 <lgl>, Dim2Type <lgl>,
#> #   Dim2 <lgl>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> #   High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

From here, standard methods of data manipulation (e.g. base R, the tidyverse) could be used to select variables, filter rows, and explore the data. However, we could also provide OData queries as desired, filtering on different dimensions of the data. Let’s first have a quick look at available dimensions.

gho_dimensions()
#> # A tibble: 91 x 2
#>   Code             Title                            
#>   <chr>            <chr>                            
#> 1 ADVERTISINGTYPE  SUBSTANCE_ABUSE_ADVERTISING_TYPES
#> 2 AGEGROUP         Age Group                        
#> 3 ALCOHOLTYPE      Beverage Types                   
#> 4 AMRGLASSCATEGORY AMR GLASS Category               
#> 5 ARCHIVE          Archive date                     
#> # … with 86 more rows

Let’s say we want to filter by COUNTRY, then we can explore explore the possible values the SpatialDim COUNTRY dimension can take.

gho_dimension_values("COUNTRY")
#> # A tibble: 245 x 6
#>   Code  Title        Dimension ParentDimension ParentCode ParentTitle
#>   <chr> <chr>        <chr>     <chr>           <chr>      <chr>      
#> 1 AGO   Angola       COUNTRY   REGION          AFR        Africa     
#> 2 BDI   Burundi      COUNTRY   REGION          AFR        Africa     
#> 3 BEN   Benin        COUNTRY   REGION          AFR        Africa     
#> 4 BFA   Burkina Faso COUNTRY   REGION          AFR        Africa     
#> 5 BWA   Botswana     COUNTRY   REGION          AFR        Africa     
#> # … with 240 more rows

If we wanted to only extract AIR_1 data on Burundi from the GHO, then we can now implement an OData query using the code we’ve identified above. While ghost doesn’t implement complex checks on your OData queries due to their complexity, it allows you to type them with spaces and checks that each query begins with the required "$filter=...".

gho_data("AIR_1", "$filter=SpatialDim eq 'BDI'")
#> # A tibble: 2 x 23
#>      Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>   <int> <chr>         <chr>          <chr>      <chr>         <int> <lgl>   
#> 1  4909 AIR_1         COUNTRY        BDI        YEAR           2004 NA      
#> 2 21904 AIR_1         COUNTRY        BDI        YEAR           2008 NA      
#> # … with 16 more variables: Dim1 <lgl>, Dim2Type <lgl>, Dim2 <lgl>,
#> #   Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>, DataSourceDim <lgl>,
#> #   Value <chr>, NumericValue <dbl>, Low <lgl>, High <lgl>, Comments <lgl>,
#> #   Date <chr>, TimeDimensionValue <chr>, TimeDimensionBegin <dbl>,
#> #   TimeDimensionEnd <dbl>

And we can get data from the GHO on multiple indicators in one call, with the output data frames already merged together.

gho_data(c("AIR_1", "AIR_10", "AIR_11"), "$filter=SpatialDim eq 'BDI'")
#> # A tibble: 21 x 23
#>         Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>      <int> <chr>         <chr>          <chr>      <chr>         <int> <chr>   
#> 1     4909 AIR_1         COUNTRY        BDI        YEAR           2004 <NA>    
#> 2    21904 AIR_1         COUNTRY        BDI        YEAR           2008 <NA>    
#> 3     6479 AIR_10        COUNTRY        BDI        YEAR           2004 <NA>    
#> 4 19580064 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#> 5 19580065 AIR_11        COUNTRY        BDI        YEAR           2016 SEX     
#> # … with 16 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <chr>,
#> #   Dim2 <chr>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> #   High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

We can even provide different filters for each indicator separately, such as Burundi for AIR_1, Uganda for AIR_10, and South Africa for AIR_11.

gho_data(c("AIR_1", "AIR_10", "AIR_11"), 
         c("$filter=SpatialDim eq 'BDI'", "$filter=SpatialDim eq 'UGA'", "$filter=SpatialDim eq 'ZAF'"))
#> # A tibble: 21 x 23
#>         Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type
#>      <int> <chr>         <chr>          <chr>      <chr>         <int> <chr>   
#> 1     4909 AIR_1         COUNTRY        BDI        YEAR           2004 <NA>    
#> 2    21904 AIR_1         COUNTRY        BDI        YEAR           2008 <NA>    
#> 3     6612 AIR_10        COUNTRY        UGA        YEAR           2004 <NA>    
#> 4 19583070 AIR_11        COUNTRY        ZAF        YEAR           2016 SEX     
#> 5 19583071 AIR_11        COUNTRY        ZAF        YEAR           2016 SEX     
#> # … with 16 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <chr>,
#> #   Dim2 <chr>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <lgl>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <dbl>,
#> #   High <dbl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

Of course, the reality is that it’s likely easier for us to work outside the OData filtering framework and directly in R, so here’s a final more complex example using dplyr and stringr alongside ghost to automatically download all indicators with the word “drug” in the indicator name (case insensitive).

library(dplyr)
library(stringr)

gho_indicators() %>%
  filter(str_detect(str_to_lower(IndicatorName), "drug")) %>%
  pull(IndicatorCode) %>%
  gho_data()
#> # A tibble: 25,302 x 23
#>       Id IndicatorCode SpatialDimType SpatialDim TimeDimType TimeDim Dim1Type   
#>    <int> <chr>         <chr>          <chr>      <chr>         <int> <chr>      
#> 1 273692 MALARIA_30539 COUNTRY        MWI        YEAR           2004 RESIDENCEA…
#> 2 273693 MALARIA_30539 COUNTRY        MWI        YEAR           2004 RESIDENCEA…
#> 3 273694 MALARIA_30539 COUNTRY        MWI        YEAR           2004 <NA>       
#> 4 273695 MALARIA_30539 COUNTRY        TZA        YEAR           2004 RESIDENCEA…
#> 5 273714 MALARIA_30539 COUNTRY        BDI        YEAR           2005 RESIDENCEA…
#> # … with 25,297 more rows, and 16 more variables: Dim1 <chr>, Dim2Type <lgl>,
#> #   Dim2 <lgl>, Dim3Type <lgl>, Dim3 <lgl>, DataSourceDimType <chr>,
#> #   DataSourceDim <lgl>, Value <chr>, NumericValue <dbl>, Low <lgl>,
#> #   High <lgl>, Comments <lgl>, Date <chr>, TimeDimensionValue <chr>,
#> #   TimeDimensionBegin <dbl>, TimeDimensionEnd <dbl>

And once we have that data, we can then filter, explore, and analyze the data with our standard R workflow, or even export the downloaded data to Excel or other analytical tools for further use.

Releases

No releases published

Packages

No packages published

Languages