Reading .rds files from R in Python

Matias Samuel Miranda edited this page Oct 20, 2019 · 1 revision

R is one of the most widely used languages for data analytics and statistics, which is why many analytical datasets are distributed in rds format (R's native dataset format). When we want to migrate this data to Python we run into a problem: the rds format is not native to Python. The third-party package rpy2 bridges this gap.

"The high-level interface in rpy2 is designed to facilitate the use of R by Python programmers. R objects are exposed as instances of Python-implemented classes, with R functions as bound methods to those objects in a number of cases."

Get Started

Official Site

Before installing the rpy2 package, we need to install the dependencies that rpy2 requires, including a working R installation.

Installation Guide
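Before going further, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only the Python standard library; the function name check_rpy2_prerequisites is hypothetical, and it assumes R is discoverable through the PATH (rpy2 can also locate R through the R_HOME environment variable, which this check does not cover).

```python
import importlib.util
import shutil


def check_rpy2_prerequisites():
    """Return a dict describing whether R and rpy2 appear to be available."""
    return {
        # True if an executable named "R" is found on the PATH.
        "R executable on PATH": shutil.which("R") is not None,
        # True if Python can locate the rpy2 package (without importing it).
        "rpy2 importable": importlib.util.find_spec("rpy2") is not None,
    }


for check, ok in check_rpy2_prerequisites().items():
    print(f"{check}: {'yes' if ok else 'no'}")
```

If either line reports "no", follow the installation guide above before running the rest of this page.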

Importing package

import rpy2
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr
import pandas as pd

Get the readRDS function

readRDS = robjects.r['readRDS']

Using Bioconductor Library

Sometimes a dataset depends on additional R packages before it can be used. In this case it needs limma from Bioconductor.

utils = importr('utils')
base = importr('base')
utils.install_packages('BiocManager')

biocmanager = importr('BiocManager')
biocmanager.install('limma')
limma = importr('limma')
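The calls above reinstall BiocManager and limma on every run. A possible variation, sketched below, installs them only when they are missing. The helper names missing_packages and ensure_limma are hypothetical; the sketch assumes rpy2 3.x, where rpy2.robjects.packages provides isinstalled, and it picks the first CRAN mirror so that the install does not stop to ask for one.

```python
def missing_packages(required, is_installed):
    """Return the names in `required` for which `is_installed(name)` is false."""
    return [name for name in required if not is_installed(name)]


def ensure_limma():
    """Install BiocManager and limma only if absent, then return the limma package."""
    # Imported here so the helper above can be used without R installed.
    from rpy2.robjects.packages import importr, isinstalled

    utils = importr('utils')
    if missing_packages(['BiocManager'], isinstalled):
        # Assumption: the first CRAN mirror in R's list is acceptable.
        utils.chooseCRANmirror(ind=1)
        utils.install_packages('BiocManager')

    if missing_packages(['limma'], isinstalled):
        importr('BiocManager').install('limma')

    return importr('limma')
```

With R and rpy2 available, limma = ensure_limma() replaces the unconditional install calls.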

Reading .rds dataset

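readRDS returns the dataset as an R object. Assuming the file holds an R data frame, the localconverter block then converts it into a pandas DataFrame; rpy2 takes care of transforming the R data structure into a Python data structure while keeping its contents consistent.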

df = readRDS('./data/celllines.rds')

with localconverter(robjects.default_converter + pandas2ri.converter):
    df = robjects.conversion.rpy2py(df)
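The read and convert steps can be wrapped in one function. This is a minimal sketch, not a definitive implementation: the name read_rds_as_dataframe is hypothetical, it assumes the .rds file contains an R data frame, and it reuses the same conversion call as above, which may differ in other rpy2 releases. The path check is an addition so that a missing file raises a Python error before R is involved.

```python
from pathlib import Path


def read_rds_as_dataframe(path):
    """Read an .rds file holding an R data frame and return a pandas DataFrame."""
    rds_path = Path(path)
    if not rds_path.is_file():
        # Fail early with a Python error instead of an R-level one.
        raise FileNotFoundError(f"No .rds file at {rds_path}")

    # Imported lazily so the path check above works even without R installed.
    import rpy2.robjects as robjects
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    read_rds = robjects.r['readRDS']
    r_object = read_rds(str(rds_path))

    # Same conversion step as on this page, applied to the loaded R object.
    with localconverter(robjects.default_converter + pandas2ri.converter):
        return robjects.conversion.rpy2py(r_object)
```

Usage would then be df = read_rds_as_dataframe('./data/celllines.rds'), after which the usual pandas methods such as df.head() apply.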