adding the bib
sammo3182 committed Aug 20, 2023
1 parent 3630616 commit 3bbbcdd
Showing 2 changed files with 166 additions and 71 deletions.
171 changes: 100 additions & 71 deletions vignettes/regioncode-vignette.Rmd
Expand Up @@ -11,6 +11,8 @@ vignette: >
%\VignetteIndexEntry{regioncode: Convert Region Names and Division Codes of China Over Years}
%\VignetteEngine{knitr::rmarkdown}
bibliography: s_regioncode.bib

editor_options:
markdown:
wrap: sentence
Expand All @@ -26,15 +28,15 @@ library(tidyverse)

"City" is a complex concept in China.
It may refer to a county-level, prefectural, or provincial administrative unit.
Scholars of China often struggle to convert these names or the corresponding geocodes, especially when working with data that span multiple years, because the central government modifies or cancels some units' names from time to time [@GuoJiaTongJiJu2022].

Inspired by Vincent Arel-Bundock's [`countrycode`](https://joss.theoj.org/papers/10.21105/joss.00848) package, we created `regioncode`, a package to achieve similar functions but specifically for region name/code conversions within China.
`regioncode` aims to enable seamless conversion among regions' formal names, commonly used names, and administrative division codes in modern China (1986--2019 in the current version).

# Why `regioncode`?

The Chinese government assigns a unique geocode to each county, city (prefecture), and provincial-level administrative unit.
These "administrative division codes" are continually [adjusted and updated](http://www.mca.gov.cn/article/sj/xzqh/1980/) to match national and regional development plans [@MinZhengBu2022].
These adjustments, however, can cause trouble for researchers who conduct studies over time or merge geo-based data from different years.
In particular, when researchers render statistical data on a map of China, mismatched geocodes between the map data and the statistical data can produce garbled output.
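
The merging problem can be seen in a minimal base-R sketch. Everything in it is made up for illustration: the codes are placeholders rather than real division codes, and `stats_old` and `map_new` are hypothetical objects, not part of `regioncode`.

```r
# Hypothetical toy data: the same three units in two sources, but one
# unit's code changed between them. All codes are invented placeholders.
stats_old <- data.frame(code = c("A01", "A02", "A03"),
                        gdp  = c(10, 20, 30))
map_new   <- data.frame(code = c("A01", "A02", "B07"),  # "A03" was recoded to "B07"
                        name = c("unit 1", "unit 2", "unit 3"))

# An inner join silently drops the unit whose code changed
merged <- merge(stats_old, map_new, by = "code")
nrow(merged)

# Codes in the statistics that no longer match the map
unmatched <- setdiff(stats_old$code, map_new$code)
unmatched
```

Harmonizing both sources to one reference year before merging avoids this kind of silent loss.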

Expand All @@ -60,7 +62,6 @@ The current version includes three basic types of output (together with three ty

1. Geocodes (`code`)
1. Names of the given cities/provinces (`name`)
1. Area to which the given cities/provinces belong (`area`, such as 华北, 东北, 华南, etc.).

In the following example, we convert the 2019 geocodes in the toy data to their 1989 version.
Users need to set the `year_from` argument correctly so that it points to the proper reference year.
Expand Down Expand Up @@ -187,6 +188,8 @@ To convert this type of data, `regioncode` sets a specific argument `zhixiashi`.
The default value of the argument is `FALSE`, under which the municipalities are treated as provinces.
When it is set to `TRUE`, the municipalities are treated as prefectures, and their provincial codes are used as the geocodes.

In the following example, we illustrate the municipality identifier with a mixed vector of names of municipalities, their districts, and a prefecture:

```{r municipality}
names_municipality <- c("北京", # Beijing, a municipality
"海淀区", # A district of Beijing
Expand All @@ -211,63 +214,10 @@ regioncode(data_input = names_municipality,
# zhixiashi = TRUE)
```
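
The idea of using provincial codes for municipalities can be illustrated with a small base-R sketch. It assumes the standard six-digit layout in which the first two digits identify the provincial-level unit. `to_provincial_code()` is a hypothetical helper written for this illustration; it is not a `regioncode` function and does not claim to mirror the package's internals.

```r
# Hypothetical helper: keep the two-digit provincial prefix of a
# six-digit division code and pad it with zeros.
to_provincial_code <- function(code) {
  code <- as.character(code)
  stopifnot(all(nchar(code) == 6))
  paste0(substr(code, 1, 2), "0000")
}

# "110108" (Haidian District) falls under Beijing's provincial code "110000"
prov_codes <- to_provincial_code(c("110108", "110000"))
prov_codes
```

Codes are passed as character strings here so that no numeric formatting can alter them.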

## City ranking

The *Statistical Yearbook of Urban and Rural Construction* divides Chinese cities into levels ranging from small cities to super cities, largely according to their populations [@GuoJiaTongJiJu2022a].
From 1989 to 2014, there were four levels of cities; the system was extended to a seven-level scale after 2014, as shown in the following table:

| criteria           | population            | rank             |
|:-------------------|-----------------------|------------------|
Expand All @@ -284,16 +234,26 @@ Therefore, the old criteria is used for cities before 2014, while the new criter
| | 200,000 ~ 500,000 | I型小城市 |
| | <200,000 | II型小城市 |

`regioncode` provides a function that returns the rank of each city according to its population in the given year.
The population data were collected from official statistics.
If the population cannot be traced, the rank is marked as `NA`.
Users only need to set `convert_to = "rank"` to run the conversion.
For region-years in and before 1989, the old ranking system is applied.
For all other region-years, the function returns the new ranks.
In the following example, we compare the ranks of the same input in different years.

```{r rank}
# Compare the ranks of the same prefectures under the 1989 and 2014 references
tibble(
  city = corruption$prefecture,
  rank1989 = regioncode(data_input = corruption$prefecture,
                        year_from = 2019,
                        year_to = 1989,
                        convert_to = "rank"),
  rank2014 = regioncode(data_input = corruption$prefecture,
                        year_from = 2019,
                        year_to = 2014,
                        convert_to = "rank")
)
```
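
The population thresholds behind the post-2014 scale can be sketched with base R's `cut()`. This is an illustration only: `classify_city_2014()` is a hypothetical helper, the break points assume the seven-level standard with intervals closed on the lower bound, and the sketch does not claim to mirror how `regioncode` stores or retrieves ranks.

```r
# Hypothetical helper: map a population count to the seven-level scale.
classify_city_2014 <- function(population) {
  cut(population,
      breaks = c(0, 2e5, 5e5, 1e6, 3e6, 5e6, 1e7, Inf),
      labels = c("II型小城市", "I型小城市", "中等城市",
                 "II型大城市", "I型大城市", "特大城市", "超大城市"),
      right = FALSE)  # intervals are [lower, upper)
}

# A missing population yields NA, matching the behaviour described above
ranks <- classify_city_2014(c(150000, 250000, 12000000, NA))
as.character(ranks)
```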

## Pinyin^[Thanks to Liu Xueyan for contributing this function.]
Expand Down Expand Up @@ -329,20 +289,89 @@ regioncode(data_input = corruption$prefecture,

## Provinces

`regioncode` enables conversions at not only the prefectural but also the provincial level.
By setting the argument `province = TRUE`, users can convert all the geocodes and names at the provincial level.
Chinese provinces also have fixed abbreviations (e.g., "宁" for "宁夏回族自治区").
When the input data contain only abbreviations, users can set the `convert_to` argument to `abbreTocode`, `abbreToname`, or `abbreToarea` to obtain the data type they want.
To get abbreviations as output, set `convert_to = "abbre"`.

In the following example, we convert a vector of province geocodes to their official names and abbreviations.

```{r provinces}
# Convert provincial geocodes to official names and to abbreviations
tibble(
  province = corruption$province_id,
  prov_name = regioncode(data_input = corruption$province_id,
                         convert_to = "name",
                         year_from = 2019,
                         year_to = 1989,
                         province = TRUE),
  prov_abbre = regioncode(data_input = corruption$province_id,
                          convert_to = "codeToabbre",
                          year_from = 2019,
                          year_to = 1989,
                          province = TRUE)
)
```
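
Conceptually, an abbreviation conversion is a two-way lookup. The base-R sketch below shows the idea with four provincial-level units; `abbre_lookup` and `name_lookup` are hypothetical objects created for this illustration and are not part of `regioncode`.

```r
# A few provincial-level names and their one-character abbreviations
abbre_lookup <- c("北京市" = "京", "上海市" = "沪",
                  "广东省" = "粤", "宁夏回族自治区" = "宁")

# name -> abbreviation
abbres <- unname(abbre_lookup[c("广东省", "宁夏回族自治区")])
abbres

# abbreviation -> name: invert the lookup
name_lookup <- setNames(names(abbre_lookup), abbre_lookup)
full_name <- unname(name_lookup["沪"])
full_name
```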

## Geographic Units Beyond Provinces

The current version of `regioncode` includes two types of region conversion beyond the provincial level: administrative area and linguistic zones.

### Administrative Area

For social, political, and military reasons, Chinese regions are divided into seven areas [@SunPing2020]:

| region | provincial-level administrative unit |
|:-------|----------------------------------------------------------------|
| 华北 | 北京市, 天津市, 山西省, 河北省, 内蒙古自治区 |
| 东北 | 黑龙江省, 吉林省, 辽宁省 |
| 华东 | 上海市, 江苏省, 浙江省, 安徽省, 福建省, 台湾省, 江西省, 山东省 |
| 华中 | 河南省, 湖北省, 湖南省 |
| 华南 | 广东省, 海南省, 广西壮族自治区, 香港特别行政区, 澳门特别行政区 |
| 西南 | 重庆市, 四川省, 贵州省, 云南省, 西藏自治区 |
| 西北 | 陕西省, 甘肃省, 青海省, 宁夏回族自治区, 新疆维吾尔自治区 |


In some cases, users may want to know which area a prefecture or province belongs to.
`regioncode` offers a function to convert the codes and names of regions (both prefectures and provinces) into areas by setting the output format to "area":

```{r 2area}
regioncode(data_input = corruption$prefecture,
           year_from = 2019,
           year_to = 1989,
           convert_to = "area")
```
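
The province-to-area mapping in the table above can also be sketched as a plain lookup in base R. Only three of the seven rows are reproduced here, and `area_table` and `area_lookup` are hypothetical names used for this illustration rather than objects exported by `regioncode`.

```r
# Part of the table above, stored as area -> provincial-level units
area_table <- list(
  "东北" = c("黑龙江省", "吉林省", "辽宁省"),
  "华中" = c("河南省", "湖北省", "湖南省"),
  "西南" = c("重庆市", "四川省", "贵州省", "云南省", "西藏自治区")
)

# Flatten into a named vector: province name -> area
area_lookup <- setNames(rep(names(area_table), lengths(area_table)),
                        unlist(area_table, use.names = FALSE))

# Units missing from the partial table (here 北京市) come back as NA
areas <- unname(area_lookup[c("吉林省", "四川省", "北京市")])
areas
```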

### Linguistic Zone^[Thanks to ZHU Meng for contributing this function.]

China is a multilingual country with a variety of dialects.
A dialect may be spoken in several prefectures of a province or across an entire province.
Prefectures from different provinces may also share the same dialect.

For the convenience of political and sociolinguistic studies, `regioncode` includes a function to return the approximate linguistic zones of the given geocodes or prefectural names.
In the current version, `regioncode` offers two levels of linguistic zone identification, i.e., dialect groups (`dia_group`, "方言大类") and dialect sub-groups (`dia_sub_group`, "分区片"), according to the 1987 language atlas of China [@LiEtAl1987].^[Adding the 2012 edition is on the to-do list [@LanguageInstitutionEtAl2012].]
(When `province = TRUE`, the linguistic conversion is available only at the dialect group level.)

In the following example, we convert the toy data to dialect groups and sub-groups:

```{r language_zone}
tibble(
  city = corruption$prefecture,
  dialectGroup = regioncode(data_input = corruption$prefecture,
                            year_from = 2019,
                            year_to = 1989,
                            to_dialect = "dia_group"),
  dialectSubGroup = regioncode(data_input = corruption$prefecture,
                               year_from = 2019,
                               year_to = 1989,
                               to_dialect = "dia_sub_group")
)
```

Note that the linguistic distribution in China is too complex to gauge precisely at the prefectural level, not to mention that it changes continually with population dynamics.
The linguistic zone output from `regioncode` is thus meant for reference at most, not for rigorous linguistic research.

## Conclusion

`regioncode` provides a convenient way to convert among Chinese administrative division codes, official names, sociopolitical and linguistic areas, abbreviations, and so on.
66 changes: 66 additions & 0 deletions vignettes/s_regioncode.bib
@@ -0,0 +1,66 @@
@misc{GuoJiaTongJiJu2022,
title = {关于更新全国统计用区划代码和城乡划分代码的公告},
author = {{国家统计局}},
date = {2022},
url = {http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2022/index.html},
urldate = {2023-08-20},
langid = {chinese},
organization = {{中华人民共和国国家统计局}},
file = {D\:\\zotero_system\\storage\\3YF3PPKK\\index.html}
}

@book{GuoJiaTongJiJu2022a,
title = {中国统计年鉴2022(附光盘) 中国城乡建设统计年鉴2021(2022年新书)},
editor = {{国家统计局}},
date = {2022-09-30},
series = {中国统计年鉴},
publisher = {{中国统计出版社}},
url = {https://item.jd.com/10038568378953.html},
urldate = {2023-08-20},
isbn = {978-7-5037-9625-8},
langid = {chinese},
pagetotal = {945}
}

@book{LanguageInstitutionEtAl2012,
title = {The Language Atlas of China},
author = {{Language Institution, Chinese Academy of Social Sciences} and {Ethnicity and Anthropology Institution, Chinese Academy of Social Sciences} and {Research Center of Language, City University of Hong Kong}},
date = {2012},
edition = {2},
publisher = {{The Commercial Press}},
location = {{Beijing}},
owner = {Yue Hu},
timestamp = {2019-10-29T01:14:50Z}
}

@book{LiEtAl1987,
title = {The Language Atlas of China},
author = {Li, Rong and Xiong, Zhenghui and Zhang, Zhenxing},
date = {1987},
publisher = {{London: Longman}},
owner = {Yue Hu},
timestamp = {2019-10-29T01:14:50Z}
}

@misc{MinZhengBu2022,
title = {2021年中华人民共和国行政区划代码},
author = {{民政部}},
date = {2022},
url = {https://www.mca.gov.cn/n156/n186/c110745/content.html},
urldate = {2023-08-20},
langid = {chinese},
organization = {{中华人民共和国民政部}},
file = {D\:\\zotero_system\\storage\\3ALAWCLB\\content.html}
}

@article{SunPing2020,
entrysubtype = {newspaper},
title = {把握新时代行政区划优化设置的着力点 - 中华人民共和国民政部},
author = {{孙平}},
date = {2020-12-14},
journaltitle = {中国社会报},
url = {https://www.mca.gov.cn/n152/n166/c41451/content.html},
urldate = {2023-08-20},
langid = {chinese},
file = {D\:\\zotero_system\\storage\\7Y2BQQQB\\content.html}
}
