eda is a Stata program that automates the generation of exploratory data analysis reports. The program classifies each variable as categorical or continuous and uses this classification to decide which types of graphs and tables to produce.
To install eda, use the following command from Stata:
net inst eda, from(http://wbuchanan.github.io/eda)
Exploratory data analysis can take a substantial amount of time on top of the time needed to clean and prepare data. eda is therefore intended to be called at the end of the workday or overnight to produce permutations of univariate and bivariate visualizations and tables. Instead of spending time coding the myriad possible combinations of variables to examine, a researcher can browse a PDF generated through LaTeX, leaving the work of compiling the results to the computer.
You need a dataset open in memory to use eda. There are two required options: output, which names the output files, and root, which tells eda where to store the output.
sysuse auto // load the data
eda, o("eda-report") root("./") // use current working directory
Alternatively, you can restrict the variables that eda uses with a varlist:
clear
sysuse auto
eda price mpg weight, o("eda-report-small") root("./")
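The varlist does not have to be typed by hand. As a minimal sketch, assuming you only want the numeric variables in the report, Stata's built-in ds command can collect them into a local macro that is then passed to eda. The output name "eda-report-numeric" is only an illustrative choice.

```stata
sysuse auto, clear

* ds leaves the names of all numeric variables in r(varlist)
ds, has(type numeric)
local numvars `r(varlist)'

* "eda-report-numeric" is an illustrative output name, not a required one
eda `numvars', o("eda-report-numeric") root("./")
```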
This program requires a few other user-written programs to execute:
tuples
spineplot
estout
brewscheme
You can install these dependencies using:
ssc install tuples
ssc install spineplot
ssc install estout
net install brewscheme, from("https://wbuchanan.github.io/brewscheme")
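If some of these packages may already be installed, a do-file can check for each one before installing it. The following is a sketch rather than part of eda itself. It relies on Stata's which command returning a nonzero return code when it cannot find a command, and assumes each package provides a command with the same name as the package.

```stata
* Install each SSC dependency only if Stata cannot already find it
foreach pkg in tuples spineplot estout {
    capture which `pkg'
    if _rc {
        ssc install `pkg'
    }
}

* brewscheme is installed from its own site, as in the commands above
capture which brewscheme
if _rc {
    net install brewscheme, from("https://wbuchanan.github.io/brewscheme")
}
```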
You can find information about these packages using:
ssc d tuples
ssc d spineplot
ssc d estout
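The commands above describe the packages installed from SSC. For brewscheme, which the install instructions fetch from its own site, net describe should show the corresponding package description. This assumes the same URL used for installation:

```stata
* Describe brewscheme from the same location used by net install
net describe brewscheme, from("https://wbuchanan.github.io/brewscheme")
```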