# Basic Concepts

The paired-sample t-test (also called the dependent-sample t-test) compares means between two groups of observations that are NOT independent. For example, when you collect pre-test and post-test scores from the same group of students, each pair of pre-test and post-test scores comes from the same person (i.e., a within-subject design). Individuals with high pre-test scores are also likely to get high post-test scores, and those with low pre-test scores are likely to have low post-test scores. In other words, the pre-test and post-test scores are correlated; they are NOT independent. Therefore, the assumption of independent observations is violated, and we cannot simply compare the pre- and post-test means as if they came from two separate groups. Instead, we need to pair up the pre- and post-test scores and calculate a difference (D) score for each person.

Let’s consider an example.

p_id <- c(1:10) # participant's ID
pre <-  c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5) #pre-test scores
post <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7) #post-test scores
dat <- data.frame(p_id, pre, post) # create a data frame
dat
##    p_id pre post
## 1     1   3    5
## 2     2   5    8
## 3     3   8    9
## 4     4   8   10
## 5     5   6    5
## 6     6   9    9
## 7     7   2    4
## 8     8   3    2
## 9     9   7   10
## 10   10   5    7
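The dependence between the two measurements can be checked directly. The following is a minimal sketch, assuming the same pre-test and post-test scores as in the example; the name `r_pre_post` is introduced here only for illustration.

```r
# Same scores as in the example above
pre  <- c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5)
post <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7)

# Pearson correlation between the two measurements of the same people
r_pre_post <- cor(pre, post)
round(r_pre_post, 2)
```

A clearly positive correlation (about 0.84 for these data) is what we would expect when the same people are measured twice.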

We calculate the difference score (D = post - pre) for each person.

dat$D <- dat$post - dat$pre # calculate D score for each participant
dat
##    p_id pre post  D
## 1     1   3    5  2
## 2     2   5    8  3
## 3     3   8    9  1
## 4     4   8   10  2
## 5     5   6    5 -1
## 6     6   9    9  0
## 7     7   2    4  2
## 8     8   3    2 -1
## 9     9   7   10  3
## 10   10   5    7  2
library(psych)
describe(dat$D)
##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 10  1.3 1.49      2    1.38 1.48  -1   3     4 -0.46    -1.47 0.47

On average, the scores went up by 1.3 points (SD = 1.49). We would like to test whether this score increase is statistically significant. The null hypothesis for this test is that the pre- and post-test means are not different, i.e., post - pre = D = 0 on average. Therefore, $H_0: \mu_d = 0$.

At this point, we only need to test whether one variable, D, differs from zero. Therefore, we can use the one-sample t formula:

$$\begin{aligned} t &= \frac{\bar{D} - \mu_d}{SE_D} \\ &= \frac{\bar{D} - 0}{SE_D} \\ &= \frac{\bar{D}}{SE_D} \end{aligned}$$

# Manual Calculation

We will need $$\bar{D}$$ and its standard error. Recall that $$SE = SD/\sqrt{N}$$, where SD is the standard deviation of the D scores.

n <- nrow(dat) # get sample size N
n
## [1] 10
se_D <- sd(dat$D)/sqrt(n) # calculate SE
se_D
## [1] 0.4725816
t <- mean(dat$D)/se_D # calculate t
t
## [1] 2.750848
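The arithmetic behind these values can be written out by hand. This is only a worked restatement of the numbers above, using the ten D scores (which sum to 13 and have a sum of squared deviations of 20.1):

```latex
\begin{aligned}
\bar{D} &= \frac{\sum D_i}{N} = \frac{13}{10} = 1.3 \\
s_D &= \sqrt{\frac{\sum (D_i - \bar{D})^2}{N - 1}} = \sqrt{\frac{20.1}{9}} \approx 1.494 \\
SE_D &= \frac{s_D}{\sqrt{N}} \approx \frac{1.494}{\sqrt{10}} \approx 0.473 \\
t &= \frac{\bar{D}}{SE_D} \approx \frac{1.3}{0.473} \approx 2.75
\end{aligned}
```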

Now we have the t value. To test whether it is significant, we compare it to $$t_{critical}$$ at the corresponding df, which is N - 1 = 9. You can do this by looking up a t table or using a t-test calculator on the internet.
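Instead of a printed table, the critical value and the p-value can also be obtained in R. The sketch below assumes a two-tailed test at alpha = .05; the names `t_obs`, `t_crit`, and `p_val` are introduced here for illustration.

```r
# Values taken from the manual calculation above
t_obs <- 2.750848   # observed t
df    <- 9          # N - 1

# Two-tailed critical value at alpha = .05
t_crit <- qt(1 - 0.05 / 2, df = df)
t_crit

# Two-tailed p-value for the observed t
p_val <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
p_val

# Decision rule: reject H0 when |t| exceeds the critical value
abs(t_obs) > t_crit
```

With 9 degrees of freedom the critical value is about 2.26, so the observed t of 2.75 falls in the rejection region.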

# t.test()

Base R provides the t.test() function to make it easier for us to conduct t-tests. For a paired t-test, you would use t.test(score1, score2, paired = TRUE). The function subtracts score2 from score1 (i.e., score1 - score2). Therefore, it makes sense to put the post-test in the score1 position, so that a positive difference means the scores went up.

t.test(dat$post, dat$pre, paired = TRUE)
##
##  Paired t-test
##
## data:  dat$post and dat$pre
## t = 2.7508, df = 9, p-value = 0.02245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2309462 2.3690538
## sample estimates:
## mean of the differences
##                     1.3

The output includes the t value, its degrees of freedom (df), and the p-value, which together help determine whether the mean difference is statistically significantly different from 0. The function also gives us the 95% CI for the mean of the difference scores. Intervals constructed this way capture the population mean difference in 95% of repeated samples; here the interval is [0.23, 2.37]. Note that the 95% CI does not include zero.
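The confidence interval can be reproduced by hand as the mean difference plus or minus the critical t times the standard error. This is a minimal sketch using the same scores; the names `D`, `t_crit`, and `ci` are chosen here for illustration.

```r
# Same scores as in the example above
pre  <- c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5)
post <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7)

D    <- post - pre            # difference scores
n    <- length(D)             # sample size
se_D <- sd(D) / sqrt(n)       # standard error of the mean difference

# Critical t for a 95% two-sided interval with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Lower and upper limits: mean(D) -/+ t_crit * SE
ci <- mean(D) + c(-1, 1) * t_crit * se_D
ci
```

The two limits should agree with the interval reported by t.test().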

Looking at the p-value and the 95% CI, we conclude that the mean difference between the post- and pre-test was greater than zero. That is, the post-test scores were significantly higher than the pre-test scores.
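Because the paired test works entirely on the D scores, it should give the same result as a one-sample t-test of D against zero. The sketch below checks this with the same data; the object names `paired_res` and `one_sample_res` are introduced here for illustration.

```r
# Same scores as in the example above
pre  <- c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5)
post <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7)
D    <- post - pre

# Paired t-test on the two score vectors
paired_res <- t.test(post, pre, paired = TRUE)

# One-sample t-test of the difference scores against mu = 0
one_sample_res <- t.test(D, mu = 0)

# Compare the two t statistics and p-values
c(paired = unname(paired_res$statistic),
  one_sample = unname(one_sample_res$statistic))
all.equal(unname(paired_res$statistic), unname(one_sample_res$statistic))
all.equal(paired_res$p.value, one_sample_res$p.value)
```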

Note: The paired-sample t-test is not limited to pre- vs. post-test comparisons. It can be used with other dependent samples. For example, twins share genetics, childhood background, and so on, so observations from the two twins in a pair are not independent. We could use a paired-sample t-test to test, for example, whether older twins are more responsible than their younger co-twins.

# Wide vs. Long Format Data

In a wide format, each person is represented in one row, and each repeated observation is recorded as a new variable, such as pre and post. The dataset grows in columns (width) with repeated observations. We call data organized this way wide-format data.

p_id <- c(1:10) # participant's ID
pre <-  c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5) # pre-test scores (time 1)
post <- c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7) # post-test scores (time 2)
dat2 <- data.frame(p_id, pre, post)
dat2
##    p_id pre post
## 1     1   3    5
## 2     2   5    8
## 3     3   8    9
## 4     4   8   10
## 5     5   6    5
## 6     6   9    9
## 7     7   2    4
## 8     8   3    2
## 9     9   7   10
## 10   10   5    7
t.test(dat2$post, dat2$pre, paired = TRUE)
##
##  Paired t-test
##
## data:  dat2$post and dat2$pre
## t = 2.7508, df = 9, p-value = 0.02245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2309462 2.3690538
## sample estimates:
## mean of the differences
##                     1.3

However, there is another way to organize the data, where each observation is recorded as a row. Each repeated observation creates a new row. In this scheme, we need variables to identify when each observation was made and who it belongs to.

ob_id <- c(1:20) # observation ID
p_id <- c(1:10, 1:10) # participant ID
score <-  c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5, 5, 8, 9, 10, 5, 9, 4, 2, 10, 7) # all test scores.
time <- c(rep("pre", 10), rep("post", 10)) # First 10 were pre-test, last 10 were post-test.
dat2.long <- data.frame(ob_id, p_id, time, score)
knitr::kable(dat2.long)
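| ob_id | p_id | time | score |
|------:|-----:|:-----|------:|
| 1 | 1 | pre | 3 |
| 2 | 2 | pre | 5 |
| 3 | 3 | pre | 8 |
| 4 | 4 | pre | 8 |
| 5 | 5 | pre | 6 |
| 6 | 6 | pre | 9 |
| 7 | 7 | pre | 2 |
| 8 | 8 | pre | 3 |
| 9 | 9 | pre | 7 |
| 10 | 10 | pre | 5 |
| 11 | 1 | post | 5 |
| 12 | 2 | post | 8 |
| 13 | 3 | post | 9 |
| 14 | 4 | post | 10 |
| 15 | 5 | post | 5 |
| 16 | 6 | post | 9 |
| 17 | 7 | post | 4 |
| 18 | 8 | post | 2 |
| 19 | 9 | post | 10 |
| 20 | 10 | post | 7 |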

In this long-format data, the first 10 rows are the pre-test scores, and the last 10 rows are the post-test scores. The time column identifies when the observation happened, and p_id identifies which participant each observation belongs to. Repeated observations make the data grow in rows (length). Hence, it is called a long format.
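Typing the long data by hand is error-prone, so it can help to build it from the wide data instead. The following is a minimal base R sketch under the assumption that the wide data has the columns p_id, pre, and post; the names `dat_wide` and `dat_long` are hypothetical and not used elsewhere in this tutorial.

```r
# Wide data: one row per participant
dat_wide <- data.frame(
  p_id = 1:10,
  pre  = c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5),
  post = c(5, 8, 9, 10, 5, 9, 4, 2, 10, 7)
)

# Long data: stack the pre scores on top of the post scores,
# repeating the participant IDs and labelling the testing time
dat_long <- data.frame(
  ob_id = seq_len(2 * nrow(dat_wide)),
  p_id  = rep(dat_wide$p_id, times = 2),
  time  = rep(c("pre", "post"), each = nrow(dat_wide)),
  score = c(dat_wide$pre, dat_wide$post)
)

dim(dat_long)   # 20 rows, 4 columns
head(dat_long)
```

Building the long data this way keeps the participants in the same order within each testing time, which matters when the scores are later paired up.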

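You can also use t.test() when your data are in a long format. However, you will need to change the code to t.test(y ~ x, data, paired = TRUE), where y is your dependent variable and x is your independent variable (i.e., the testing time: pre vs. post). Because factor levels are sorted alphabetically, "post" comes before "pre", so the function computes post - pre. The rows must also list the participants in the same order within each testing time so that the scores are paired correctly.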

dat2.long$time <- factor(dat2.long$time) #convert time into a factor
t.test(score ~ time, data = dat2.long, paired = TRUE)
##
##  Paired t-test
##
## data:  score by time
## t = 2.7508, df = 9, p-value = 0.02245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2309462 2.3690538
## sample estimates:
## mean of the differences
##                     1.3
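Some newer R releases (4.4.0 and later, to the best of our knowledge) stop with an error when `paired = TRUE` is combined with a formula. A version-independent alternative is to pull the two sets of scores out of the long data and pass them as vectors. This is a sketch only; the names `dat_long`, `pre_scores`, `post_scores`, and `res_long` are introduced here for illustration.

```r
# Long data: one row per observation
dat_long <- data.frame(
  p_id  = rep(1:10, times = 2),
  time  = rep(c("pre", "post"), each = 10),
  score = c(3, 5, 8,  8, 6, 9, 2, 3,  7, 5,
            5, 8, 9, 10, 5, 9, 4, 2, 10, 7)
)

# Sort by participant so that the i-th pre score and the i-th post score
# belong to the same person
dat_long <- dat_long[order(dat_long$p_id), ]

pre_scores  <- dat_long$score[dat_long$time == "pre"]
post_scores <- dat_long$score[dat_long$time == "post"]

# Paired t-test on the extracted vectors (post - pre)
res_long <- t.test(post_scores, pre_scores, paired = TRUE)
res_long
```

Sorting by p_id before subsetting guards against long data whose rows are not already in matching participant order.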

# Effect size

The effect size can be calculated with Cohen’s $$d = \frac{\bar{D}}{s_D}$$, where $$s_D$$ is the standard deviation of the difference scores.

# Let's go back to the "dat" dataset.
d.manual <- mean(dat$D)/sd(dat$D)
d.manual
## [1] 0.8698945
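This d is closely tied to the t value computed earlier. As a short worked check, using only the definitions of t and d for difference scores given in this tutorial:

```latex
\begin{aligned}
t &= \frac{\bar{D}}{s_D / \sqrt{N}} = d\sqrt{N}
\quad\Longrightarrow\quad
d = \frac{t}{\sqrt{N}} \\
d &= \frac{2.7508}{\sqrt{10}} \approx \frac{2.7508}{3.1623} \approx 0.87
\end{aligned}
```

So for a paired design, d can be recovered from a reported t and N even when the raw difference scores are not available.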

## effectsize package

The effectsize package provides a function to calculate Cohen’s d.

library(effectsize)
effectsize::cohens_d(dat$post, dat$pre, paired = TRUE)
## Cohen's d |       95% CI
## ------------------------
## 0.87      | [0.12, 1.67]
# OR save a t-test as an R object and put it into the function.
mypair_t.test <- t.test(dat$post, dat$pre, paired = TRUE)
effectsize::cohens_d(mypair_t.test)
## Cohen's d |       95% CI
## ------------------------
## 0.87      | [0.12, 1.67]