The vtable
package automatically generates variable documentation that can be easily viewed while continuing to work with data.
vtable
contains three functions: vtable()
, labeltable()
, and dftoHTML()
. vtable()
takes a dataset and outputs a formatted variable documentation file. This serves several purposes.
First, it makes it easy to generate a variable documentation file, without requiring that one has already been created and made accessible through help(data)
, or dealing with creating and finding R help documentation files.
Second, it produces a list of variables (and, if provided, their labels) that can be easily viewed while working with the data, preventing repeated calls to head()
, and making it much easier to work with confusingly-named variables.
Third, the variable documentation file can be opened in a browser (with option out='browser'
, saving to file and opening directly, or by opening in the RStudio Viewer pane and clicking ‘Show in New Window’) where it can be easily searched with standard Find-in-Page functions like Ctrl/Cmd-F, allowing you to search for the variable or variable label you want.
labeltable()
is designed to take a single variable and show the values it is associated with. This can also be used to generate data documentation if desired, or can just be an easy way to look at label values, or learn more about the data you’re working with.
If that variable has value labels from the sjlabelled
or haven
packages, it will show how the values in the data correspond to the value labels.
Alternately, you can include other variables as well, and labeltable()
will show, for each value of the variable you’re interested in, the values that those other variables take. This can be handy, for example, if you used some variables to create a numeric ID and want to remember what original values correspond to each ID. It can also act as sort of a cross-tabulation.
dftoHTML()
is a helper function used by vtable()
and labeltable()
. It takes any data frame or matrix with column names and outputs HTML table code for that data.
The vtable
package can be installed from CRAN using
install.packages('vtable')
Or you can download the most recent version from GitHub using the devtools
package:
library(devtools)
install_github('NickCH-K/vtable')
library(vtable)
A video introduction to vtable
can be found here.
vtable() function
vtable()
syntax follows this outline:
vtable(data,
out=NA,
file=NA,
labels=NA,
class=TRUE,
values=TRUE,
missing=FALSE,
index=FALSE,
factor.limit=5,
char.values=FALSE,
slow.ok=FALSE,
data.title=NA,
desc=NA,
col.width=NA,
summ=NA)
The goal of vtable()
is to take a data set data
and output an HTML file with documentation concerning each of the variables in data. There are several options as to what will be included in the documentation file, and each of these options is explained below, with code examples throughout.
data
The data
argument can take any data frame, data table, tibble, or matrix, as long as it has a valid set of variable names stored in the colnames()
attribute. The goal of vtable()
is to produce documentation of each of the variables in this data set and display that documentation, one variable per row on the output vtable
.
If data
has embedded variable or value labels, as the data set efc
does below, vtable()
will extract and use them automatically.
library(vtable)
#Example 1, using base data LifeCycleSavings
data(LifeCycleSavings)
vtable(LifeCycleSavings)
#Example 2, using constructed data frame
df <- data.frame(var1 = 1:4, var2 = c('A','B','C','D'))
vtable(df)
#Example 3, using matrix with column names
mat <- as.matrix(df)
vtable(mat)
#Example 4, using efc data with embedded variable labels
library(sjlabelled)
data(efc)
vtable(efc)
out
The out
option determines what will be done with the resulting variable documentation file. There are several options for out
:
Option | Result |
---|---|
browser | Loads variable documentation in web browser. |
viewer | Loads variable documentation in Viewer pane (RStudio only). |
htmlreturn | Returns HTML code for variable documentation file. |
return | Returns variable documentation table in data frame format. |
By default, vtable
will select ‘viewer’ if running in RStudio, and ‘browser’ otherwise.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,out='browser')
vtable(LifeCycleSavings,out='viewer')
htmlcode <- vtable(LifeCycleSavings,out='htmlreturn')
vartable <- vtable(LifeCycleSavings,out='return')
file
The file
argument will write the variable documentation file to an HTML file and save it. It automatically appends the ‘html’ filetype if the filename does not include a period.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings,file='lifecycle_variabledocumentation')
labels
The labels
argument will attach variable labels to the variables in data
. If variable labels are embedded in data
and those labels are what you want, the labels
argument is unnecessary. Set labels='omit'
if there are embedded labels but you do not want them in the table.
labels
can be used in any one of three formats.
labels as a vector
labels
can be set to a vector of equal length to the number of variables in data
, and in the same order. NA
values can be used for padding if some variables do not have labels.
library(vtable)
#Note that LifeCycleSavings has five variables
data(LifeCycleSavings)
#These variable labels are taken from help(LifeCycleSavings)
labs <- c('numeric aggregate personal savings',
    'numeric % of population under 15',
    'numeric % of population over 75',
    'numeric real per-capita disposable income',
    'numeric % growth rate of dpi')
vtable(LifeCycleSavings,labels=labs)
labs <- c('numeric aggregate personal savings',NA,NA,NA,NA)
vtable(LifeCycleSavings,labels=labs)
labels as a two-column data set
labels
can be set to a two-column data set (any type will do) where the first column has the variable names, and the second column has the labels. The column names don’t matter.
This approach does not require that every variable name in data
has a matching label.
library(vtable)
#Note that LifeCycleSavings has five variables
#with names 'sr', 'pop15', 'pop75', 'dpi', and 'ddpi'
data(LifeCycleSavings)
#These variable labels are taken from help(LifeCycleSavings)
labs <- data.frame(nonsensename1 = c('sr', 'pop15', 'pop75'),
    nonsensename2 = c('numeric aggregate personal savings',
        'numeric % of population under 15',
        'numeric % of population over 75'))
vtable(LifeCycleSavings,labels=labs)
labs <- as.matrix(labs)
vtable(LifeCycleSavings,labels=labs)
labels as a one-row data set
labels
can be set to a one-row data set in which the column names are the variable names in data
and the first row is the variable labels. The labels
argument can take any data type including data frame, data table, tibble, or matrix, as long as it has a valid set of variable names stored in the colnames()
attribute.
This approach does not require that every variable name in data
has a matching label.
library(vtable)
#Note that LifeCycleSavings has five variables
#with names 'sr', 'pop15', 'pop75', 'dpi', and 'ddpi'
data(LifeCycleSavings)
#These variable labels are taken from help(LifeCycleSavings)
labs <- data.frame(sr = 'numeric aggregate personal savings',
    pop15 = 'numeric % of population under 15',
    pop75 = 'numeric % of population over 75')
vtable(LifeCycleSavings,labels=labs)
labs <- as.matrix(labs)
vtable(LifeCycleSavings,labels=labs)
class
The class
flag will either report or not report the class of each variable in the resulting variable table. By default this is set to TRUE
.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,class=FALSE)
values
The values
flag will either report or not report the values that each variable takes. Numeric variables will report a range, logicals will report ‘TRUE FALSE’, and factor variables will report the first factor.limit
(default 5) factors listed.
If the variable is numeric but has value labels applied by the sjlabelled
package, vtable()
will find them and report the numeric-label crosswalk. This requires sjlabelled
to be loaded.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,values=FALSE)
#CO2 contains factor variables
data(CO2)
vtable(CO2)
#efc contains labeled values
#Note that the original value labels do not easily tell you what numerical
#value each label maps to, but vtable() does.
library(sjlabelled)
data(efc)
vtable(efc)
missing
The missing
flag, set to TRUE, will report whether or not the variable has any missing values. Defaults to FALSE.
library(vtable)
data(LifeCycleSavings)
LifeCycleSavings$sr[1] <- NA
vtable(LifeCycleSavings,missing=TRUE)
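As a rough illustration of the information the missing column conveys, the same yes/no check can be computed in base R. This is only a sketch of the idea, not vtable's own code; the names lcs and has_missing are made up for the example.

```r
# Work on a copy so the built-in data set is left untouched
data(LifeCycleSavings)
lcs <- LifeCycleSavings
lcs$sr[1] <- NA

# anyNA() returns TRUE if a vector contains at least one NA,
# so this gives one TRUE/FALSE flag per variable
has_missing <- sapply(lcs, anyNA)
print(has_missing)
```

Only sr should be flagged here, since it is the only variable that was given a missing value.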
index
The index
flag will either report or not report the index number of each variable. Defaults to FALSE.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings,index=TRUE)
factor.limit
If values
is set to TRUE
, then factor.limit
limits the number of factors displayed on the variable table. factor.limit
is by default 5, to cut down on clutter. The table will include the phrase “and more” to indicate that some factors have been cut off.
Setting factor.limit=0
will include all factors. If values=FALSE
, factor.limit
does nothing.
library(vtable)
#CO2 contains factor variables
data(CO2)
vtable(CO2)
vtable(CO2,factor.limit=1)
vtable(CO2,factor.limit=0)
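The truncation rule described above can be sketched in a few lines of base R. The helper show_levels() is hypothetical and is not part of vtable; the exact formatting vtable uses for the cut-off list may differ, so treat this as an illustration of the rule only.

```r
# Hypothetical helper, not part of vtable: list up to `limit` levels of a factor
show_levels <- function(f, limit = 5) {
  lev <- levels(f)
  if (limit > 0 && length(lev) > limit) {
    # Keep the first `limit` levels and flag that the rest were cut off
    lev <- c(lev[seq_len(limit)], 'and more')
  }
  paste(lev, collapse = ', ')
}

data(CO2)
show_levels(CO2$Type)              # two levels, so nothing is cut off
show_levels(CO2$Plant)             # twelve levels, truncated to the first five
show_levels(CO2$Plant, limit = 0)  # limit = 0 lists every level
```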
char.values
If values
is set to TRUE
, then char.values = TRUE
instructs vtable
to list the values that character variables take, as though they were factors. If you only want some of the character variables to have their values listed, use a character vector to indicate which variables.
library(vtable)
data(USJudgeRatings)
USJudgeRatings$Judge <- row.names(USJudgeRatings)
USJudgeRatings$SecondCharacter <- 'Less Interesting'
USJudgeRatings$ThirdCharacter <- 'Less Interesting Still!'
vtable(USJudgeRatings,char.values=TRUE)
vtable(USJudgeRatings,char.values=c('Judge','SecondCharacter'))
slow.ok
If the data contains labelled values, but the labels don’t line up properly with the actual values in the data, that’s a problem for vtable
as it tries to match up data values with labels.
vtable
has a fix for this but it’s very slow, on the order of a few minutes for larger data sets.
So in this case, by default (slow.ok = FALSE
), the Values column will not attempt to link values to labels, and will just show the label for labelled values and raw data for unlabelled values.
Set slow.ok = TRUE
to instead get the correct result, slowly.
To avoid this being an issue at all, ensure that labels match values exactly before calling vtable
, perhaps by running sjlabelled::drop_labels()
and sjlabelled::fill_labels()
.
data.title
data.title
will include a data title in the variable documentation file. If not set manually, this will default to the variable name for data
.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data')
desc
desc
will include a description of the data set in the variable documentation file. This will by default include information on the number of observations and the number of columns. To remove this, set desc='omit'
, or include any description and then include ‘omit’ as the last four characters.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data', desc='Data on the savings ratio 1960–1970.')
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data', desc='omit')
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data', desc='Data on the savings ratio 1960–1970. omit')
col.width
vtable()
will select default column widths for the variable table depending on which measures (name, class, label, values, summ)
are included. col.width
, as a vector of percentage column widths on the 0-100 scale, will override these defaults.
library(vtable)
library(sjlabelled)
data(efc)
vtable(efc)
#The variable names in this data set are pretty short, and the value labels are
#a little cramped, so let's move that over.
vtable(efc,col.width=c(10,10,40,40))
summ
summ
will calculate summary statistics for each numeric and logical variable. summ
is very flexible. It takes a character vector in which each element is of the form function(x)
, where function(x)
is any function that takes a vector and returns a single numeric value. For example, summ=c('mean(x)','median(x)','mean(log(x))')
would calculate the mean, median, and mean of the log for each variable.
summ
also takes two functions that are not R standards: propNA(x)
and countNA(x)
, which give the proportion and count of NA values in the variable, respectively. These two functions are calculated for all variables, not just numeric and logical ones.
library(vtable)
library(sjlabelled)
data(efc)
vtable(efc,summ=c('mean(x)','countNA(x)'))
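To make the 'function(x)' string format concrete, here is one way such strings could be evaluated against a single variable in base R. This is a sketch under assumptions, not vtable's actual implementation: apply_summ() is a hypothetical helper, and the definitions of propNA() and countNA() below are stand-ins written from the description above.

```r
# Assumed stand-ins for the two non-standard helpers described above
propNA <- function(x) mean(is.na(x))
countNA <- function(x) sum(is.na(x))

# Hypothetical helper: evaluate each string with `x` bound to the variable
apply_summ <- function(x, summ) {
  vapply(summ,
         function(expr) as.numeric(eval(parse(text = expr), list(x = x))),
         numeric(1))
}

data(LifeCycleSavings)
res <- apply_summ(LifeCycleSavings$sr, c('mean(x)', 'median(x)', 'countNA(x)'))
print(res)
```

Each element of the result is named after the string that produced it, which mirrors how each entry of summ yields one summary column.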
labeltable() function
labeltable()
syntax follows this outline:
labeltable(var,
...,
out=NA,
file=NA,
desc=NA)
labeltable()
is a function that shows the values that correspond to var
. This could be value label values, or it could be the values found in the data for the ...
variables.
#Include a single labelled variable to show how the values of that variable correspond to its value labels.
library(sjlabelled)
data(efc)
labeltable(efc$e15relat)
#Include more than one variable to show, for each value of the first, what values of the others are present in the data.
data(mtcars)
labeltable(mtcars$cyl,mtcars$carb,mtcars$am)
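The multi-variable case works like a compact cross-tabulation. A minimal base R sketch of the same idea, for one extra variable, is shown below; it does not use labeltable() and the name carb_by_cyl is made up for the example.

```r
data(mtcars)

# For each value of cyl, collect the distinct values of carb seen in the data
carb_by_cyl <- tapply(mtcars$carb, mtcars$cyl,
                      function(v) paste(sort(unique(v)), collapse = ', '))
print(carb_by_cyl)
```

The result has one entry per value of cyl, listing the carb values that occur alongside it.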
out
The out
option determines what will be done with the resulting label table file. There are several options for out
:
Option | Result |
---|---|
browser | Loads variable documentation in web browser. |
viewer | Loads variable documentation in Viewer pane (RStudio only). |
htmlreturn | Returns HTML code for variable documentation file. |
return | Returns variable documentation table in data frame format. |
file
The file
argument will write the variable documentation file to an HTML file and save it. It automatically appends the ‘html’ filetype if the filename does not include a period.
library(vtable)
library(sjlabelled)
data(efc)
labeltable(efc$e15relat,file='e15relat_values')
desc
desc
will include a description of the data set in the variable documentation file, which may be useful for documentation purposes.
dftoHTML() function
dftoHTML()
syntax follows this outline:
dftoHTML(data,out=NA,file=NA,col.width=NA,row.names=FALSE)
dftoHTML()
largely exists to serve vtable()
. It takes a data set data
and returns an HTML table with the contents of that data.
Outside of its use in vtable()
, dftoHTML()
can also be used to keep a view of the data file open while working on the data, avoiding repeated calls to head()
or similar, or switching back and forth between code tabs and data view tabs.
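To give a feel for the kind of conversion involved, here is a minimal base R sketch that turns a data frame into HTML table code. It is not the dftoHTML() implementation: df_to_html_sketch() is a hypothetical function, it ignores column widths and row names, and it does not escape special HTML characters.

```r
# Hypothetical sketch, not vtable's code: build a bare HTML table from a data frame
df_to_html_sketch <- function(df) {
  # One <th> cell per column name
  header <- paste0('<th>', colnames(df), '</th>', collapse = '')
  # One string of <td> cells per row of the data
  rows <- apply(as.matrix(df), 1,
                function(r) paste0('<td>', r, '</td>', collapse = ''))
  paste0('<table>',
         '<tr>', header, '</tr>',
         paste0('<tr>', rows, '</tr>', collapse = ''),
         '</table>')
}

small <- data.frame(a = 1:2, b = c('x', 'y'))
html <- df_to_html_sketch(small)
cat(html, '\n')
```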
data
dftoHTML()
will accept any data set with a colnames()
attribute.
library(vtable)
data(LifeCycleSavings)
dftoHTML(LifeCycleSavings)
out
The out
option determines what will be done with the resulting HTML version of the data. There are several options for out
:
Option | Result |
---|---|
browser | Loads HTML version of data in web browser. |
viewer | Loads HTML version of data in Viewer pane (RStudio only). |
htmlreturn | Returns HTML code for data . |
By default, dftoHTML()
will select ‘viewer’ if running in RStudio, and ‘browser’ otherwise.
library(vtable)
data(LifeCycleSavings)
dftoHTML(LifeCycleSavings)
dftoHTML(LifeCycleSavings,out='browser')
dftoHTML(LifeCycleSavings,out='viewer')
htmlcode <- dftoHTML(LifeCycleSavings,out='htmlreturn')
file
The file
argument will write the HTML version of data
to an HTML file and save it. It automatically appends the ‘html’ filetype if the filename does not include a period.
library(vtable)
data(LifeCycleSavings)
dftoHTML(LifeCycleSavings,file='lifecycledata_htmlversion.html')
col.width
dftoHTML()
will select, by default, equal column widths for all columns in data.
col.width, as a vector of percentage column widths on the 0-100 scale, will override these defaults.
library(vtable)
data(LifeCycleSavings)
dftoHTML(LifeCycleSavings)
#Let's make sr much bigger for some reason
dftoHTML(LifeCycleSavings,col.width=c(60,10,10,10,10))
row.names
The row.names
flag determines whether the row names of the data are included as the first column in the output table.
library(vtable)
data(LifeCycleSavings)
dftoHTML(LifeCycleSavings,row.names=TRUE)