library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(meantables)
Table of contents:
Univariate means and 95% confidence intervals
Bivariate means and 95% confidence intervals
In this example, we will calculate the overall mean and 95% confidence interval for the variable mpg in the mtcars data set.
By default, mean_table returns the n, mean, standard deviation, standard error of the mean, 95% confidence interval for the mean, minimum, and maximum. All returned statistics are rounded to the hundredths place. These are the numerical summaries I am most frequently interested in, and I rarely need estimates more precise than the hundredths place.
The confidence intervals are calculated as:
$$ {\bar{x} \pm t_{(1-\alpha / 2, n-1)}} \frac{s}{\sqrt{n}} $$
This matches the method used by SAS: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p0klmrp4k89pz0n1p72t0clpavyx.htm
mtcars %>%
mean_table(mpg)
#> # A tibble: 1 × 9
#> response_var n mean sd sem lcl ucl min max
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg 32 20.1 6.03 1.07 17.9 22.3 10.4 33.9
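As a cross-check, the confidence interval formula given above can be applied by hand. The following is a minimal base R sketch, not the internal code of mean_table; the variable names are chosen for illustration only.

```r
# Hand calculation of the mean and 95% confidence interval for mtcars$mpg.
x      <- mtcars$mpg
n      <- length(x)                        # number of observations
x_bar  <- mean(x)                          # sample mean
sem    <- sd(x) / sqrt(n)                  # standard error of the mean: s / sqrt(n)
t_crit <- qt(1 - 0.05 / 2, df = n - 1)     # critical value for alpha = 0.05

lcl <- x_bar - t_crit * sem                # lower confidence limit
ucl <- x_bar + t_crit * sem                # upper confidence limit

ci_95 <- round(c(n = n, mean = x_bar, sem = sem, lcl = lcl, ucl = ucl), 2)
ci_95
```

The limits work out to roughly 17.92 and 22.26, which agrees with the lcl and ucl columns in the tibble above.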
By adjusting the t_prob parameter, it is possible to change the width of the confidence intervals. The example below returns a 99% confidence interval.
The value for t_prob is calculated as 1 - alpha / 2.
alpha <- 1 - .99
t <- 1 - alpha / 2
mtcars %>%
mean_table(mpg, t_prob = t)
#> # A tibble: 1 × 9
#> response_var n mean sd sem lcl ucl min max
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg 32 20.1 6.03 1.07 17.2 23.0 10.4 33.9
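Because t_prob is a cumulative probability rather than a confidence level, it may be convenient to wrap the conversion in a small function. The helper below, conf_to_t_prob, is hypothetical and is not part of meantables; it simply applies the 1 - alpha / 2 rule stated above.

```r
# Hypothetical helper (not part of meantables): convert a confidence level
# such as 0.99 into the cumulative probability expected by t_prob.
conf_to_t_prob <- function(conf_level) {
  stopifnot(is.numeric(conf_level), conf_level > 0, conf_level < 1)
  alpha <- 1 - conf_level
  1 - alpha / 2
}

conf_to_t_prob(0.95)  # 0.975, corresponding to a 95% interval
conf_to_t_prob(0.99)  # 0.995, as in the example above

# Critical values for mpg (n = 32, so 31 degrees of freedom) at three levels.
# Higher confidence gives a larger critical value and a wider interval.
qt(conf_to_t_prob(c(0.90, 0.95, 0.99)), df = 31)
```

Under this sketch, the call above could be written as mean_table(mpg, t_prob = conf_to_t_prob(0.99)).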
With the output = "all" option, mean_table also returns the number of missing values (n_miss) and the critical value from Student's t distribution with n - 1 degrees of freedom (t_crit).
We can also control the precision of the statistics using the digits parameter.
mtcars %>%
mean_table(mpg, output = "all", digits = 5)
#> # A tibble: 1 × 11
#> response_var n_miss n mean sd t_crit sem lcl ucl min max
#> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg 0 32 20.1 6.03 2.04 1.07 17.9 22.3 10.4 33.9
This output matches the results obtained from SAS proc means and the Stata mean command (shown below).
Finally, the object returned by mean_table is given the class mean_table when the data frame passed to the .data argument is an ungrouped tibble.
The methods used to calculate bivariate means and confidence intervals are identical to those used to calculate univariate means and confidence intervals. Additionally, all of the options shown above work identically for bivariate analysis. To estimate bivariate (subgroup) means and confidence intervals over the levels of a categorical variable, the .data argument to mean_table should be a grouped tibble created with dplyr::group_by. Everything else should “just work.”
The object returned by mean_table is given the class mean_table_grouped when the data frame passed to the .data argument is a grouped tibble (i.e., grouped_df).
mtcars %>%
group_by(cyl) %>%
mean_table(mpg, output = "all", digits = 5)
#> # A tibble: 3 × 13
#> response_var group_var group_cat n_miss n mean sd t_crit sem lcl
#> <chr> <chr> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg cyl 4 0 11 26.7 4.51 2.23 1.36 23.6
#> 2 mpg cyl 6 0 7 19.7 1.45 2.45 0.549 18.4
#> 3 mpg cyl 8 0 14 15.1 2.56 2.16 0.684 13.6
#> # ℹ 3 more variables: ucl <dbl>, min <dbl>, max <dbl>
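The subgroup estimates can be reproduced by applying the same formula within each level of cyl. The sketch below uses only base R so that it does not depend on meantables; it illustrates the calculation and is not the package's implementation. The names by_cyl and group_ci are illustrative.

```r
# Base R sketch of the per-group calculation (not the meantables internals).
by_cyl <- split(mtcars$mpg, mtcars$cyl)    # list of mpg vectors, one per cyl level

group_ci <- do.call(rbind, lapply(names(by_cyl), function(g) {
  x      <- by_cyl[[g]]
  n      <- length(x)
  sem    <- sd(x) / sqrt(n)
  t_crit <- qt(0.975, df = n - 1)          # degrees of freedom depend on each group's n
  data.frame(
    cyl    = g,
    n      = n,
    mean   = mean(x),
    sem    = sem,
    t_crit = t_crit,
    lcl    = mean(x) - t_crit * sem,
    ucl    = mean(x) + t_crit * sem
  )
}))

group_ci
```

Note that the critical value differs by group (about 2.23, 2.45, and 2.16) because each group has its own sample size, which matches the t_crit column above.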
For comparison, here is the output from SAS proc means and the Stata mean command.
The method used by Stata to calculate subpopulation means and confidence intervals is available here: https://www.stata.com/manuals13/rmean.pdf