-
-
Notifications
You must be signed in to change notification settings - Fork 39
/
50.1-Appendix.Rmd
211 lines (134 loc) · 7.25 KB
/
50.1-Appendix.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
# Appendix
## Git
[Cheat Sheet](https://training.github.com/downloads/github-git-cheat-sheet.pdf)
[Cheat Sheet in different languages](https://training.github.com/)
[Learn Git](http://try.github.io/)
[Interactive Cheat Sheet](http://ndpsoftware.com/git-cheatsheet.html#loc=remote_repo;)
[Ultimate Guide of Git and GitHub for R user](https://happygitwithr.com/)
- Setting up Git: `git config` with `--global` option to configure user name, email, editor, etc.
- Creating a repository: `git init` to initialize a repo. Git stores all of its repo data in the `.git` directory.
- Tracking changes:
- `git status` shows the status of the repo
- File are stored in the project's working directory (which users see)
- The staging area (where the next commit is being built)
- local repo is where commits are permanently recorded
- `git add` put files in the staging area
- `git commit` saves the staged content as a new commit in the local repo.
- `git commit -m "your own message"` to give a messages for the purpose of your commit.
- History
- `git diff` shows differences between commits
- `git checkout` recovers old version of fields
- `git checkout HEAD` to go to the last commit
- `git checkout <unique ID of your commit>` to go to such commit
- Ignoring
- `.gitignore` file tells Git what files to ignore
- `cat . gitignore *.dat results/` ignore files ending with "dat" and folder "results".
- Remotes in GitHub
- A local git repo can be connected to one or more remote repos.
- Use the HTTPS protocol to connect to remote repos
- `git push` copies changes from a local repo to a remote repo
- `git pull` copies changes from a remote repo to a local repo
- Collaborating
- `git clone` copies remote repo to create a local repo with a remote called `origin` automatically set up
- Branching
- `git check - b <new-branch-name`
- `git checkout master` to switch to master branch.
- Conflicts
- occur when 2 or more people change the same lines of the same file
- the version control system does not allow to overwrite each other's changes blindly, but highlights conflicts so that they can be resolved.
- Licensing
- People who incorporate General Public License (GPL'd) software into their won software must make their software also open under the GPL license; most other open licenses do not require this.
- The Creative Commons family of licenses allow people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing and commercialization.
- Citation:
- Add a CITATION file to a repo to explain how you want others to cite your work.
- Hosting
- Rules regarding intellectual property and storage of sensitive info apply no matter where code and data are hosted.
## Short-cut
These are shortcuts that you probably you remember when working with R. Even though it might take a bit of time to learn and use them as your second nature, but they will save you a lot of time.\
Just like learning another language, the more you speak and practice it, the more comfortable you are speaking it.\
| function | short-cut |
|--------------------------------------------------|---------------------------|
| navigate folders in console | `" " + tab` |
| pull up short-cut cheat sheet | `ctrl + shift + k` |
| go to file/function (everything in your project) | `ctrl + .` |
| search everything | `cmd + shift + f` |
| navigate between tabs | `Crtl + shift + .` |
| type function faster | `snip + shift + tab` |
| type faster | `use tab for fuzzy match` |
| `cmd + up` | |
| `ctrl + .` | |
Sometimes you can't stage a folder because it's too large. In such case, use `Terminal` pane in Rstudio then type `git add -A` to stage all changes then commit and push like usual.
## Function short-cut
apply one function to your data to create a new variable: `mutate(mod=map(data,function))`\
instead of using `i in 1:length(object)`: `for (i in seq_along(object))`\
apply multiple function: `map_dbl`\
apply multiple function to multiple variables:`map2`\
`autoplot(data)` plot times series data\
`mod_tidy = linear(reg) %>% set_engine('lm') %>% fit(price ~ ., data=data)` fit lm model. It could also fit other models (stan, spark, glmnet, keras)
- Sometimes, data-masking will not be able to recognize whether you're calling from environment or data variables. To bypass this, we use `.data$variable` or `.env$variable`. For example `data %>% mutate(x=.env$variable/.data$variable`\
- Problems with data-masking:\
+ Unexpected masking by data-var: Use `.data` and `.env` to disambiguate\
+ Data-var cant get through:\
+ Tunnel data-var with {{}} + Subset `.data` with [[]]
- Passing Data-variables through arguments
```{r eval=FALSE}
library("dplyr")
mean_by <- function(data,by,var){
data %>%
group_by({{{by}}}) %>%
summarise("{{var}}":=mean({{var}})) # new name for each var will be created by tunnel data-var inside strings
}
mean_by <- function(data,by,var){
data %>%
group_by({{{by}}}) %>%
summarise("{var}":=mean({{var}})) # use single {} to glue the string, but hard to reuse code in functions
}
```
- Trouble with selection:\
```{r eval=FALSE}
library("purrr")
name <- c("mass","height")
starwars %>% select(name) # Data-var. Here you are referring to variable named "name"
starwars %>% select(all_of((name))) # use all_of() to disambiguate when
averages <- function(data,vars){ # take character vectors with all_of()
data %>%
select(all_of(vars)) %>%
map_dbl(mean,na.rm=TRUE)
}
x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)
# Another way
averages <- function(data,vars){ # Tunnel selectiosn with {{}}
data %>%
select({{vars}}) %>%
map_dbl(mean,na.rm=TRUE)
}
x = c("Sepal.Length","Petal.Length")
iris %>% averages(x)
```
## Citation
include a citation by `[@Farjam_2015]`
cite packages used in this session
`package=ls(sessionInfo()$loadedOnly) for (i in package){print(toBibtex(citation(i)))}`
```{r, eval=FALSE}
package=ls(sessionInfo()$loadedOnly)
for (i in package){
print(toBibtex(citation(i)))
}
```
## Install all necessary packages/libaries on your local machine
Get a list of packages you need to install from this book (or your local device)
```{r}
installed <- as.data.frame(installed.packages())
head(installed)
write.csv(installed, file.path(getwd(),'installed.csv'))
```
After having the `installed.csv` file on your new or local machine, you can just install the list of packages
```{r, eval = FALSE}
# import the list of packages
installed <- read.csv('installed.csv')
# get the list of packages that you have on your device
baseR <- as.data.frame(installed.packages())
# install only those that you don't have
install.packages(setdiff(installed, baseR))
```