Initial arxiv submission.
icmccorm committed Apr 2, 2024
0 parents commit 5977e1b
Showing 94 changed files with 14,391 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
.DS_Store
__pycache__
archives
sample.*
choices.*
.vscode
output.csv
sandbox.r
build
renv
.Rprofile
irr/*/*.tex
irr/*/alpha.csv
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "against-the-void-tex"]
path = against-the-void-tex
url = https://github.com/icmccorm/against-the-void-tex.git
11 changes: 11 additions & 0 deletions Dockerfile
@@ -0,0 +1,11 @@
FROM rocker/verse:4.3.1 AS base
WORKDIR /usr/src/void
COPY . .
FROM base AS setup
RUN apt-get update && apt-get upgrade -y
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
FROM setup AS renv
ENV RENV_PATHS_LIBRARY=renv/library
RUN R -e "renv::restore()"
FROM renv AS build
RUN make
65 changes: 65 additions & 0 deletions README.md
@@ -0,0 +1,65 @@
# “Against the Void” - Replication Package

This repository is a replication package for the paper '“Against the Void”: An Interview and Survey Study of How Rust Developers Use Unsafe Code.' All direct (e.g., names, institutions) and indirect (e.g., positions, projects) identifying information about participants has been redacted following procedures approved by our institution's IRB.

This repository contains the following:
```
- data
  |- interviews // raw interview transcripts
  |- community_survey // survey responses
  |- screening_survey // survey responses
  |- irr // codes and coding decisions
- scripts // R and Python scripts used to calculate IRR and to compile survey data
```

The appendix of our paper is provided in `appendix.pdf` and contains three sections. The first section includes the questions on our screening survey and our interview protocol. The second section contains the results of 7 rounds of interrater reliability (IRR) assessment, including the intermediate codebook used for each round; we used IRR as a mechanism for refining our codebook. The third section contains the full text of our community survey and the response counts for each question. Recruitment materials are provided in `recruitment.pdf`, which contains all emails and social media posts used throughout each stage of the investigation.

## Building
Executing `make build` will compile our raw data into the tables and figures presented in our paper and its appendix. The build requires R version 4.3.1 and Python 3. Alternatively, if you are running Docker on an x86 system, you can create an image containing a build of our project by executing `docker build .` from the root directory. This image includes all necessary dependencies and builds the project automatically as its final step.
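Before running `make build` outside Docker, it may help to confirm that both prerequisites are present. The sketch below is a hypothetical preflight check and is not part of this repository; the helper names `parse_r_version` and `check_toolchain` are illustrative, and it assumes `R` is on `PATH` and that `R --version` reports a line of the form `R version X.Y.Z`.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import sys

# Version stated in this README (assumption: an exact match is required).
REQUIRED_R = "4.3.1"


def parse_r_version(version_text: str) -> str | None:
    """Return 'X.Y.Z' from `R --version` output, or None if no version is found."""
    match = re.search(r"R version (\d+\.\d+\.\d+)", version_text)
    return match.group(1) if match else None


def check_toolchain() -> list[str]:
    """Return a list of problems; an empty list means the prerequisites look satisfied."""
    problems = []
    if sys.version_info.major != 3:
        problems.append("Python 3 is required")
    r_path = shutil.which("R")
    if r_path is None:
        problems.append("R was not found on PATH")
        return problems
    # Ask the installed R for its version and compare it with the pinned release.
    result = subprocess.run([r_path, "--version"], capture_output=True, text=True)
    found = parse_r_version(result.stdout)
    if found != REQUIRED_R:
        problems.append(f"expected R {REQUIRED_R}, found {found}")
    return problems
```

Calling `check_toolchain()` returns an empty list when both tools are found at the expected versions; otherwise each entry names something to install or change before running `make build`.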

## Data
Raw data is located within the `data` directory. This contains all anonymized interview transcripts, responses to our screening and community surveys, and IRR scores.

### Interviews
```
- data
  |- interviews
     |- coding // codebooks exported from ATLAS.ti
        |- codebook.csv // individual codes and their definitions
        |- decisions.csv // coding decisions
     |- transcripts // Markdown and PDF interview transcripts exported from ATLAS.ti
```
We provide a full export of our ATLAS.ti project as well as unencoded copies of the raw interview transcripts and codebooks.
Each participant is identified by a unique ID ranging from 1 to 19.

### Surveys
```
- data
  |- community_survey
     |- coding
        |- unsafe_api.csv // coded open-ended responses for WUQ1
        |- unsafe.csv // coded open-ended responses for EUAQ1
     |- data.csv // individual survey responses
     |- questions.csv // mapping from question IDs to question text
     |- sections // mapping from section IDs to section names
  |- screening_survey
     |- data.csv // individual survey responses
     |- questions.csv // mapping from question IDs to question text
```
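In the coded open-ended responses (`coding/unsafe.csv` and `coding/unsafe_api.csv`), a single `Code` cell can hold several comma-separated codes, for example `"Feature, No Other Choice"`. A minimal sketch for tallying how often each code appears, assuming multi-code cells are always comma-separated as in the committed files; the function name `tally_codes` is hypothetical, and the sample rows are copied from `unsafe.csv`:

```python
import csv
import io
from collections import Counter


def tally_codes(csv_text: str, code_column: str = "Code") -> Counter:
    """Count how often each code is applied across all responses.

    A cell such as "Feature, No Other Choice" is split on commas so that
    each code it contains is counted once.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for code in row[code_column].split(","):
            code = code.strip()
            if code:
                counts[code] += 1
    return counts


# Three rows copied from data/community_survey/coding/unsafe.csv.
SAMPLE = '''Code,Reason
No Other Choice,It was the only option as I was doing FFI
Feature,FFI
"Feature, No Other Choice","C FFI, especially syscalls - impossible without unsafe"
'''

print(tally_codes(SAMPLE).most_common())
```

To tally a full file, pass its text, e.g. `pathlib.Path("data/community_survey/coding/unsafe.csv").read_text(encoding="utf-8")`; UTF-8 is an assumption based on the curly apostrophes in the committed responses. The default `Code` column matches both files, since only the second column differs (`Reason` vs. `Response`).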
Each question in the screening and community surveys is identified by a unique ID. Questions in the community survey were grouped into sections. Each question ID contains a prefix identifying its section. For example, "WUQ1" is the first question in section
"WUQ", which contains questions about participants' motivations for using unsafe code.

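Because each community-survey question ID is a section prefix followed by a question number, IDs can be split mechanically to group questions by section. The sketch below assumes every ID is uppercase letters followed by digits (as in `WUQ1` and `EUAQ1`); IDs in the screening survey, or IDs with suffixes, may need a different pattern. `split_question_id` and `group_by_section` are hypothetical helpers, not part of the repository.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumption: an uppercase section prefix followed by a question number, e.g. "WUQ1".
QUESTION_ID = re.compile(r"^([A-Z]+)(\d+)$")


def split_question_id(question_id: str) -> tuple[str, int]:
    """Split an ID such as 'WUQ1' into its section ('WUQ') and question number (1)."""
    match = QUESTION_ID.match(question_id.strip())
    if match is None:
        raise ValueError(f"unrecognized question ID: {question_id!r}")
    return match.group(1), int(match.group(2))


def group_by_section(question_ids: list[str]) -> dict[str, list[int]]:
    """Map each section prefix to the sorted question numbers it contains."""
    sections: dict[str, list[int]] = defaultdict(list)
    for question_id in question_ids:
        section, number = split_question_id(question_id)
        sections[section].append(number)
    return {section: sorted(numbers) for section, numbers in sections.items()}


print(group_by_section(["WUQ2", "WUQ1", "EUAQ1"]))
```

Raising `ValueError` on an unexpected ID, rather than silently skipping it, makes it obvious when the assumed pattern does not match the IDs in `questions.csv`.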
### IRR
```
- data
  |- irr
     |- 1
        |- data.csv // codes selected for each quote by each author
        |- output.md // individual quotes coded in this sample
        |- survey.csv // codes presented for each coder to select
     |- 2
        |- ...
     |- code_mapping.csv
     |- theme_mapping.csv
```
To refine our codebook, we conducted 7 rounds in which we coded random samples of quotes, and we calculated interrater reliability after each round. Coding decisions were made using an online form. Data for each round are contained in the `irr` directory, in subdirectories numbered 1 through 7. Each subdirectory contains 3 files: `output.md` lists the quotes coded in that round, `survey.csv` contains the codes the form presented for each coder to select, and `data.csv` contains the selections each coder made for each quote in `output.md`.
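The README does not name the reliability coefficient, though the ignored `irr/*/alpha.csv` outputs suggest Krippendorff's alpha. As a hedged illustration only, the sketch below computes nominal Krippendorff's alpha from scratch and applies it to each code separately, treating every coder's choice as a binary selected/not-selected decision per quote. That per-code binarization, the in-memory layout (`selections[coder][quote]` as a set of codes), and the function names are all assumptions; this is not the repository's script, and it does not parse `data.csv`, whose columns are not documented here.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units: list[list[object]]) -> float:
    """Nominal Krippendorff's alpha; each unit lists the values its coders assigned.

    Units with fewer than two values cannot be paired and are skipped.
    """
    coincidences: Counter = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue
        # Each ordered pair of values from two different coders adds 1 / (m - 1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    totals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when all pairable values are identical")
    # alpha = 1 - D_o / D_e, simplified for the nominal metric.
    return 1 - (n - 1) * observed / expected


def alpha_per_code(selections: dict[str, dict[str, set[str]]], codes: list[str]) -> dict[str, float]:
    """selections[coder][quote] is the set of codes that coder chose for that quote."""
    coders = sorted(selections)
    quotes = sorted(set().union(*(selections[c].keys() for c in coders)))
    results = {}
    for code in codes:
        # One binary value per coder who coded the quote: was this code selected?
        units = [[code in selections[c][q] for c in coders if q in selections[c]] for q in quotes]
        try:
            results[code] = krippendorff_alpha_nominal(units)
        except ValueError:
            results[code] = math.nan
    return results


# Made-up selections for illustration; not taken from the study data.
example_selections = {
    "coder_a": {"q1": {"FFI"}, "q2": {"Performance"}, "q3": set()},
    "coder_b": {"q1": {"FFI"}, "q2": {"FFI", "Performance"}, "q3": set()},
}
print(alpha_per_code(example_selections, ["FFI", "Performance"]))
```

A code that no coder ever selected yields `nan`, because alpha is undefined when there is no variation to agree or disagree on; reporting it separately avoids mistaking that case for perfect agreement.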
1 change: 1 addition & 0 deletions against-the-void-tex
Submodule against-the-void-tex added at fafb9a
Binary file added appendix.pdf
Binary file not shown.
21 changes: 21 additions & 0 deletions data/community_survey/coding/unsafe.csv
@@ -0,0 +1,21 @@
Code,Reason
No Other Choice,"Typically, I try to only ever use unsafe for calling into FFI. That has no safe alternative at all."
No Other Choice,It was the only option as I was doing FFI
No Other Choice,An hardware-related implementation that couldn’t be safely exposed
Feature,FFI
No Other Choice,"Certain low level things are not possible without unsafe, regardless of ease-of-use or performance."
Documentation,"I have marked functions as unsafe for reasons that don't traditionally require unsafe, if they can cause undesired behaviour"
"Ergonomics, Experimentation",Unsafe more ergonomic and intuitive but also to practice unsafe
"Feature, No Other Choice","C FFI, especially syscalls - impossible without unsafe"
"Ergonomics, No Other Choice",It was either related to unsafe apis like window creation and once I used it to get around a lifetime qurik in a crate I used
Feature,C FFI
"Ergonomics, Experimentation","When porting code I am not completely familiar with from C to Rust, I find it can be helpful to do a 1:1 unsafe conversion first. During this process I gain a better understanding of how data is passed through the program and I can convert parts to safe Rust as I go."
Feature,CUDA and FFI
"Documentation, Ergonomics, No Other Choice","I sometimes use unsafe as a marker, voluntarily, when the caller/implementer needs to avoid unusual invalid situations that can't really be checked. I could omit the unsafe marker... but it would be less safe :) Similarly, in Rust std::fs::File is safe, yet you can open /proc/self/mem and do unsafe things with it. unsafe is a courtesy marker, when it makes sense"
No Other Choice,Needing to maintain an existing C ABI
No Other Choice,"I was wrapping FFI functions, e.g. on bare metal (Commander X16) or creating a static library that was linked into a C++ application"
Feature,In one word: FFI
Performance,Duplicating safe functions with less checks wrt data encodings for performance
Uncertain,Sometimes there are crates that help with e.g. JNI but they request to buy into their artitectural choises.
Performance,unsafe is faster _and_ the proram is an optimization game entry
"Feature, Performance","FFI reasons, and uninit memory for performance reasons requires `unsafe`"
6 changes: 6 additions & 0 deletions data/community_survey/coding/unsafe_api.csv
@@ -0,0 +1,6 @@
Code,Response
Ergonomics,"Some APIs are fundamentally unsafe, eg FFI."
Ergonomics,"To provide flexibility for niche use cases, expecting most people won't use it. Strong warnings given and invariants/requirements documented."
Feature,"To indicate that there are preconditions for calling the API that would, if unfulfilled, result in unexpected behaviour"
No Other Choice,Interfacing with CUDA requires unsafe function calls and caller device pointers
No Other Choice,"Maintaining an existing C ABI, and in general, providing FFI APIs"