Basic Concepts

Paired-sample t-test (also called, dependent-sample t-test) compares means between two groups of observations that are NOT independent. For example, when you collected pre-test and post-test scores from the same group of students, the pre-test and post-test scores came from the same person (i.e., a within-subject design). Individuals with high pre-test scores were also likely to get high post-test scores, and those with lower pre-test scores were likely to have low post-test scores. In other words, the pre-test and post-test scores were correlated; they were NOT independent. Therefore, the assumption of independence observation was violated. We cannot just compare the means of pre- and post-test. Instead, we need to pair up the pre- and post-test scores and calcualte a difference (D) score for each person.

Let’s consider an example

p_id <- c(1:10) # participant's ID
pre <-  c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5) #pre-test scores
post <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7) #post-test scores
dat <- data.frame(p_id, pre, post) # create a data frame
dat
##    p_id pre post
## 1     1   3    5
## 2     2   5    8
## 3     3   8    9
## 4     4   8   10
## 5     5   6    5
## 6     6   9    9
## 7     7   2    4
## 8     8   3    2
## 9     9   7   10
## 10   10   5    7

We calculate the difference score (D = post - pre) for each person.

dat$D <- dat$post - dat$pre # calculate D score for each participant
dat
##    p_id pre post  D
## 1     1   3    5  2
## 2     2   5    8  3
## 3     3   8    9  1
## 4     4   8   10  2
## 5     5   6    5 -1
## 6     6   9    9  0
## 7     7   2    4  2
## 8     8   3    2 -1
## 9     9   7   10  3
## 10   10   5    7  2
library(psych)
describe(dat$D)
##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 10  1.3 1.49      2    1.38 1.48  -1   3     4 -0.46    -1.47 0.47

On average, the scores went up by 1.3 points (SD = 1.4944341). We would like to test whether this score increase was statistically significant. The null hypothesis for this test was that pre- and post-test were not different, i.e., pre - post = D = 0. Therefore, \[H_0: \mu_d = 0\].

At this point, we only need to test one variable, D, whether it is different from zero. Therefore, we can use the formula \[\begin{aligned} t &= \frac{\bar{D} - \mu_d}{SE_D} \\ &= \frac{\bar{D} - 0}{SE_D} \\ &=\frac{\bar{D}}{SE_D} \end{aligned}\].

Manual Calculation

We will need \(\bar{D}\) and its standard error. Recall that \(SE = SD/\sqrt{N}\)

n <- nrow(dat) # get sample size N
n
## [1] 10
se_D <- sd(dat$D)/sqrt(n) # calculate SE
se_D
## [1] 0.4725816
t <- mean(dat$D)/se_D # calculate t
t
## [1] 2.750848

Now we have the t value. To test whether it is significant, we would compare it the \(t_{critical}\) at a corresponding df, which is N -1 = 9. You can do this by looking up the t table or use t-test calculator on the internet.

t.test()

The base R provides the t.test()function to make it easier for us to conduct t-tests. For paired t-test, you would use t.test(score1, score2, paired = TRUE). The function will subtract score2 from score1 (i.e., score1 - score2). Therefore, it would make sense to the post-test in the score1 position.

t.test(dat$post, dat$pre, paired = TRUE)
## 
##  Paired t-test
## 
## data:  dat$post and dat$pre
## t = 2.7508, df = 9, p-value = 0.02245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2309462 2.3690538
## sample estimates:
## mean of the differences 
##                     1.3

The output includes the t value, its degrees of freedom (df), and the p-value to help determine whether it is statistically significantly different from 0. The function also gives use the 95% CI for the difference scores. The interval [0.23, 2.37] has a very high chance to capture the population mean of the difference scores. Note that the 95% CI does not include zero.

Looking at the p-value and 95% CI, we conclude that the difference between pre- and post-test was more than zero. That is, the post-test was significant higher than the pre-test.

Note: Paired-sample t-test is not limited to pre- vs. post-testing. It can be used with other dependent samples. For example, when we are studying twins, their genetics, personality, childhood background, etc. are not independent. We could use a paired-sample t-test to test, for example, whether older twins are more responsible than younger twins or not.

Wide vs. Long Format Data

When each person is represented in one row and all repeated observation are recorded as a new variable, such as pre and post. The dataset grows in columns or width with repeated observations. We call the data organized this way a wide formatted data.

p_id <- c(1:10) # participant's ID
time1 <-  c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5)
time2 <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7)
dat2 <- data.frame(p_id, pre, post)
dat2
##    p_id pre post
## 1     1   3    5
## 2     2   5    8
## 3     3   8    9
## 4     4   8   10
## 5     5   6    5
## 6     6   9    9
## 7     7   2    4
## 8     8   3    2
## 9     9   7   10
## 10   10   5    7
t.test(dat2$post, dat2$pre, paired = TRUE)
## 
##  Paired t-test
## 
## data:  dat2$post and dat2$pre
## t = 2.7508, df = 9, p-value = 0.02245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2309462 2.3690538
## sample estimates:
## mean of the differences 
##                     1.3

However, there is another way to organize the data, where each observation is recorded as a row. A repeated observations will create a new row. In this scheme, we will need variables to identify when the data was observed and who it belongs to.

ob_id <- c(1:20) # observation ID
p_id <- c(1:10, 1:10) # participant ID
score <-  c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5, 5, 8, 9, 10, 5, 9, 4, 2, 10, 7) # all test scores.
time <- c(rep("pre", 10), rep("post", 10)) # First 10 were pre-test, last 10 were post-test.
dat2.long <- data.frame(ob_id, p_id, time, score)
knitr::kable(dat2.long)
ob_id p_id time score
1 1 pre 3
2 2 pre 5
3 3 pre 8
4 4 pre 8
5 5 pre 6
6 6 pre 9
7 7 pre 2
8 8 pre 3
9 9 pre 7
10 10 pre 5
11 1 post 5
12 2 post 8
13 3 post 9
14 4 post 10
15 5 post 5
16 6 post 9
17 7 post 4
18 8 post 2
19 9 post 10
20 10 post 7

In this long formatted data, the first 10 rows were the pre-test scores, and the last 10 rows were from the post-test. The time column was used to identify when the observation happened. The p_id identified which observations belonged to which participants. Repeated observation makes the data grows in rows or length. Hence, it is called a long format.

You can also use t.test() even if you data is in a long format. However, you will need to change the code to t.test(y ~ x, data, paired = TRUE), where y is your dependent variable and x is your independent variable (i.e., the testing time: pre vs. post).

dat2.long$time <- factor(dat2.long$time) #convert time into a factor
t.test(score ~ time, data = dat2.long, paired = TRUE)
## 
##  Paired t-test
## 
## data:  score by time
## t = 2.7508, df = 9, p-value = 0.02245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2309462 2.3690538
## sample estimates:
## mean of the differences 
##                     1.3

Effect size

The effect size can be calculated with Cohen’s \(d = \frac{\bar{D}}{s_D}\)

# Let's go back to the "dat" dataset. 
d.manual <- mean(dat$D)/sd(dat$D)
d.manual
## [1] 0.8698945

effectsize package

The effectsize package provides a function to calculate Cohen’s d.

library(effectsize)
effectsize::cohens_d(dat$post, dat$pre, paired = TRUE)
## Cohen's d |       95% CI
## ------------------------
## 0.87      | [0.12, 1.67]
# OR save a t-test as an R object and put it into the function. 
mypair_t.test <- t.test(dat$post, dat$pre, paired = TRUE)
effectsize::cohens_d(mypair_t.test)
## Cohen's d |       95% CI
## ------------------------
## 0.87      | [0.12, 1.67]