---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# bigrquery
[![Build Status](https://travis-ci.org/r-dbi/bigrquery.svg?branch=master)](https://travis-ci.org/r-dbi/bigrquery)
[![CRAN Status](https://www.r-pkg.org/badges/version/bigrquery)](https://cran.r-project.org/package=bigrquery)
[![Coverage status](https://codecov.io/gh/r-dbi/bigrquery/branch/master/graph/badge.svg)](https://codecov.io/github/r-dbi/bigrquery?branch=master)

The bigrquery package makes it easy to work with data stored in
[Google BigQuery](https://developers.google.com/bigquery/) by allowing you to query BigQuery tables and retrieve metadata about your projects, datasets, tables, and jobs. The bigrquery package provides three levels of abstraction on top of BigQuery:
* The low-level API provides thin wrappers over the underlying REST API. All
  the low-level functions start with `bq_` and mostly have the form
  `bq_noun_verb()`. This level of abstraction is most appropriate if you're
  familiar with the REST API and you want to do something not supported in the
  higher-level APIs.
* The [DBI interface](http://www.r-dbi.org) wraps the low-level API and
  makes working with BigQuery like working with any other database system.
  This is the most convenient layer if you want to execute SQL queries in
  BigQuery or upload smaller amounts (i.e. <100 MB) of data.
* The [dplyr interface](http://dbplyr.tidyverse.org/) lets you treat BigQuery
  tables as if they are in-memory data frames. This is the most convenient
  layer if you don't want to write SQL, but instead want dbplyr to write it
  for you.
## Installation
The current bigrquery release can be installed from CRAN:
```R
install.packages("bigrquery")
```
The development version can be installed from GitHub:
```R
# install.packages('devtools')
devtools::install_github("r-dbi/bigrquery")
```
## Usage
### Low-level API
```{r}
library(bigrquery)
billing <- bq_test_project() # replace this with your project ID
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"
tb <- bq_project_query(billing, sql)
bq_table_download(tb, max_results = 10)
```
### DBI
```{r, warning = FALSE}
library(DBI)
con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)
con
dbListTables(con)
dbGetQuery(con, sql, n = 10)
```
### dplyr
```{r, message = FALSE}
library(dplyr)
natality <- tbl(con, "natality")
natality %>%
  select(year, month, day, weight_pounds) %>%
  head(10) %>%
  collect()
```
## Important details
### Authentication
When using bigrquery interactively, you'll be prompted to [authorize bigrquery](https://developers.google.com/bigquery/authorization) in the browser. Your credentials will be cached across sessions in `.httr-oauth`. For non-interactive usage, you'll need to download a service account token (a JSON file) and pass it to `set_service_token()`.
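
A minimal sketch of non-interactive use, assuming you have already downloaded the key file. The helper `connect_with_service_token()`, the key path, and `"my-project-id"` are hypothetical placeholders, not part of bigrquery:

```R
# Hypothetical helper: authenticate with a service account key instead of
# the interactive browser flow, then open a DBI connection.
# `key_path` and "my-project-id" are placeholders, not real values.
connect_with_service_token <- function(key_path, project = "my-project-id") {
  if (!file.exists(key_path)) {
    stop("Service token file not found: ", key_path, call. = FALSE)
  }
  # Register the downloaded JSON key for subsequent BigQuery requests
  bigrquery::set_service_token(key_path)
  DBI::dbConnect(bigrquery::bigquery(), project = project)
}
```

A script would then call something like `con <- connect_with_service_token("path/to/key.json")` before running any queries. Checking for the file first is a design choice so that a wrong path fails fast with a readable message.
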

Note that `bigrquery` requests permission to modify your data, but it will never do so unless you explicitly request it (e.g. by calling `bq_table_delete()` or `bq_table_upload()`).
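
For example, writing to a table you own might look like the sketch below, assuming the dataset already exists in your own project. The helper `copy_and_remove()` and the names `"my-project-id"`, `"my_dataset"` and `"mtcars_copy"` are hypothetical placeholders:

```R
# Hypothetical sketch: the only calls here that change data in BigQuery are
# bq_table_upload() and bq_table_delete(). The dataset is assumed to exist.
copy_and_remove <- function(values = mtcars,
                            project = "my-project-id",
                            dataset = "my_dataset",
                            table = "mtcars_copy") {
  # bq_table() only builds a local reference; it makes no API request
  tb <- bigrquery::bq_table(project, dataset, table)
  # Explicit write: upload the data frame (creating the table if needed)
  bigrquery::bq_table_upload(tb, values)
  # Explicit delete: remove the table again
  bigrquery::bq_table_delete(tb)
  invisible(tb)
}
```
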
### Billing project
If you just want to play around with the BigQuery API, it's easiest to start with Google's free [sample data](https://developers.google.com/bigquery/docs/sample-tables). You'll still need to create a project, but if you're just playing around, it's unlikely that you'll go over the free limits (1 TB of queries and 10 GB of storage per month).

To create a project:
1. Open https://console.cloud.google.com/ and create a project.
   Make a note of the "Project ID" in the "Project info" box.
1. Click on "APIs & Services", then "Dashboard" in the left menu.
1. Click on "Enable APIs and Services" at the top of the page,
   then search for "BigQuery API" and "Cloud Storage" and enable them.

Use your project ID as the `billing` project whenever you work with free sample data, and as the `project` when you work with your own data.
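
Expressed with the DBI interface from the Usage section, the distinction might look like this sketch. `"my-project-id"` and the two helpers are hypothetical placeholders, and the sketch assumes that `billing` defaults to `project` in `dbConnect()`, so it is only set explicitly for the sample data:

```R
my_project <- "my-project-id"  # placeholder: your "Project ID"

# Free sample data: the tables live in the "publicdata" project,
# while your project is billed for the queries.
connect_to_samples <- function() {
  DBI::dbConnect(
    bigrquery::bigquery(),
    project = "publicdata",
    dataset = "samples",
    billing = my_project
  )
}

# Your own data: your project holds the tables; `billing` is left at its
# default (assumed here to be the same project).
connect_to_own_data <- function(dataset) {
  DBI::dbConnect(bigrquery::bigquery(), project = my_project, dataset = dataset)
}
```

With the low-level API, the billing project is instead the first argument of `bq_project_query()`, as in the Low-level API example above.
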
## Useful links
* [SQL reference](https://developers.google.com/bigquery/query-reference)
* [API reference](https://developers.google.com/bigquery/docs/reference/v2/)
* [Query/job console](https://bigquery.cloud.google.com/)
* [Billing console](https://console.cloud.google.com/)