Blocking and Variance

Preliminaries

Let \(g_1\) be a random subset of \(\{1, \ldots, n\}\) of size \(n_1\) that gets the treatment. Let \(g_0\) be a random subset of \(\{1, \ldots, n\}\) with \(n_0\) elements disjoint from \(g_1\) that gets the control.

We estimate the population means with the sample means

\[ \hat{\bar{Y}}_1 = \frac{1}{n_1} \sum_{i \in g_1} Y_i(1) \quad \quad \quad \hat{\bar{Y}}_0 = \frac{1}{n_0} \sum_{i \in g_0} Y_i(0) \]

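with variances and covariance (where \(\sigma_1^2\) and \(\sigma_0^2\) denote the population variances of the potential outcomes \(Y_i(1)\) and \(Y_i(0)\))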

\[ Var(\hat{\bar{Y}}_1) = \frac{n - n_1}{n - 1} \frac{\sigma_1^2}{n_1} \quad \quad Var(\hat{\bar{Y}}_0) = \frac{n - n_0}{n - 1} \frac{\sigma_0^2}{n_0} \]

\[ Cov(\hat{\bar{Y}}_1, \hat{\bar{Y}}_0) = -\frac{1}{n - 1}Cov(Y_i(1), Y_i(0)) \]
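These expressions can be checked by brute force on a small population. The sketch below is illustrative only: it assumes the twelve potential outcomes from the CR[1] full schedule shown later in this section, an equal split \(n_1 = n_0 = 6\) with \(g_0\) taken as the complement of \(g_1\), and population (divide-by-\(n\)) variances. The helper names `pvar` and `pcov` are made up for this example.

```r
# Potential outcomes copied from the CR[1] full schedule (n = 12).
y0 <- c(0, 1, 2, 4, 4, 6, 14, 15, 16, 16, 17, 18)
y1 <- c(0, 0, 1, 2, 0, 0, 12, 9, 8, 15, 5, 17)
n <- length(y0)
n_1 <- 6            # assumed number of treated units
n_0 <- n - n_1      # control group is the complement

# Population (divide-by-n) variance and covariance, matching sigma^2 above.
pvar <- function(x) mean((x - mean(x))^2)
pcov <- function(x, y) mean((x - mean(x)) * (y - mean(y)))

# Every possible treatment group g_1 (one column per subset).
g1_all <- combn(n, n_1)
ybar1 <- apply(g1_all, 2, function(g) mean(y1[g]))   # treated sample means
ybar0 <- apply(g1_all, 2, function(g) mean(y0[-g]))  # control sample means

# Variance over all assignments vs. the closed-form expressions.
c(enumerated = pvar(ybar1), formula = (n - n_1) / (n - 1) * pvar(y1) / n_1)
c(enumerated = pvar(ybar0), formula = (n - n_0) / (n - 1) * pvar(y0) / n_0)
c(enumerated = pcov(ybar1, ybar0), formula = -pcov(y1, y0) / (n - 1))
```

Each printed pair should agree up to floating-point error, because the enumeration covers every equally likely assignment.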

Completely Randomized Design \(CR[1]\)

CR[1]: Full Schedule
Project Y(0) Y(1)
1 0 0
2 1 0
3 2 1
4 4 2
5 4 0
6 6 0
7 14 12
8 15 9
9 16 8
10 16 15
11 17 5
12 18 17
\(\bar{Y}\) 9.42 5.75

CR[1]: Experiments
Each experiment draws a new random assignment from the same full schedule; only the group means and the resulting estimate change.

Experiment \(\hat{\bar{Y}}_0\) \(\hat{\bar{Y}}_1\) \(\widehat{ATE}\)
1 11.83 4.67 −7.17
2 12.00 2.67 −9.33
3 8.50 7.33 −1.17
4 7.50 7.83 0.33
5 9.67 5.67 −4.00
100 9.00 6.50 −2.50

\[\begin{equation} \begin{aligned} Var(\widehat{ATE}) &= Var(\hat{\bar{Y}}_1 - \hat{\bar{Y}}_0) \\ &= Var(\hat{\bar{Y}}_1) + Var(\hat{\bar{Y}}_0) - 2Cov(\hat{\bar{Y}}_1, \hat{\bar{Y}}_0) \\ &= \frac{n - n_1}{n - 1} \frac{\sigma_1^2}{n_1} + \frac{n - n_0}{n - 1} \frac{\sigma_0^2}{n_0} + 2 \frac{1}{n - 1}Cov(Y_i(1), Y_i(0)) \\ &= \frac{1}{n - 1} \left( \frac{n_0 \sigma_1^2}{n_1} + \frac{n_1 \sigma_0^2}{n_0} + 2 Cov(Y_i(1), Y_i(0)) \right) \end{aligned} \end{equation}\]

\[ SE(\widehat{ATE}) = \sqrt{ \frac{1}{n- 1} \left( \frac{n_0 \sigma_1^2}{n_1} + \frac{n_1 \sigma_0^2}{n_0} + 2 Cov(Y_i(1), Y_i(0)) \right) } \]

How does \(SE(\widehat{ATE})\) relate to the following quantities? What does it suggest about how to plan your design?

  1. The total number of units under study \(n\).
  2. The variance within each set of potential outcomes \(\sigma_1^2\), \(\sigma_0^2\).
  3. The relative size of the groups \(n_1\) and \(n_0\).
  4. The covariance in the potential outcomes \(Cov(Y_i(1), Y_i(0))\).
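One way to explore these questions is to evaluate the formula over a grid of inputs. The sketch below wraps the expression for \(SE(\widehat{ATE})\) in a hypothetical helper, `se_ate`, and assumes \(n = n_1 + n_0\). The variance and covariance values are made-up round numbers chosen only for illustration; they are not taken from the data.

```r
# Hypothetical helper: closed-form SE for a completely randomized design,
# assuming n = n_1 + n_0 and population (divide-by-n) variances.
se_ate <- function(n_1, n_0, sigsq_1, sigsq_0, cov_01) {
  n <- n_1 + n_0
  sqrt((n_0 / n_1 * sigsq_1 + n_1 / n_0 * sigsq_0 + 2 * cov_01) / (n - 1))
}

# Illustrative inputs (invented for this sketch).
s1 <- 36
s0 <- 46
c01 <- 35

# Question 1: grow n while keeping a 50/50 split.
by_n <- sapply(c(12, 24, 48, 96), function(n) se_ate(n / 2, n / 2, s1, s0, c01))
by_n

# Question 3: hold n = 12 fixed and vary the number of treated units.
by_split <- sapply(2:10, function(n_1) se_ate(n_1, 12 - n_1, s1, s0, c01))
by_split

# Question 4: hold n and the split fixed and vary the covariance.
by_cov <- sapply(c(-35, 0, 35), function(cv) se_ate(6, 6, s1, s0, cv))
by_cov
```

Comparing the three printed vectors shows how each design input moves the standard error while the others are held fixed.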

Exact SE:

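# Population (divide-by-n) variances and covariance of the potential outcomes
sigsq_1 <- mean((indo_cr$`Y(1)` - mean(indo_cr$`Y(1)`))^2)
sigsq_0 <- mean((indo_cr$`Y(0)` - mean(indo_cr$`Y(0)`))^2)
cov_01 <- mean((indo_cr$`Y(1)` - mean(indo_cr$`Y(1)`)) * 
               (indo_cr$`Y(0)` - mean(indo_cr$`Y(0)`)))
# n, n_1, n_0: total, treated and control counts (n = n_1 + n_0).
# sigsq_1 is weighted by n_0 / n_1 and sigsq_0 by n_1 / n_0, as in the formula.
var_ATE <- 1 / (n - 1) * (n_0 / n_1 * sigsq_1 + n_1 / n_0 * 
                          sigsq_0 + 2 * cov_01)

se_ATE <- sqrt(var_ATE)
se_ATE
[1] 3.729151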
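A simulation gives a rough cross-check of this exact value. The sketch below assumes the CR[1] full schedule with 6 of the 12 projects treated, which is consistent with the reported output but is an assumption, since the definitions of `n`, `n_1`, and `n_0` are not shown here. The helper `run_experiment`, the seed, and the number of replications are arbitrary choices for illustration.

```r
# Potential outcomes copied from the CR[1] full schedule.
y0 <- c(0, 1, 2, 4, 4, 6, 14, 15, 16, 16, 17, 18)
y1 <- c(0, 0, 1, 2, 0, 0, 12, 9, 8, 15, 5, 17)
n <- length(y0)
n_1 <- 6   # assumed number of treated units

# One experiment: pick n_1 treated units at random, then take the
# difference between the treated mean of Y(1) and the control mean of Y(0).
run_experiment <- function() {
  g1 <- sample(n, n_1)
  mean(y1[g1]) - mean(y0[-g1])
}

set.seed(1)                                  # arbitrary seed
ate_hat <- replicate(10000, run_experiment())

mean(ate_hat)   # should be close to the true ATE, mean(y1 - y0)
sd(ate_hat)     # should be close to the exact se_ATE
```

With many replications, the standard deviation of the simulated estimates should settle near the exact standard error, up to simulation noise.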

Generalized Complete Block Design \(GCB[1]\)

GCB[1]: Full Schedule
Project Region Y(0) Y(1)
1 A 0 0
2 A 0 0
3 A 2 4
4 A 2 4
5 A 4 8
6 A 6 8
7 B 14 12
8 B 16 8
9 B 16 8
10 B 17 5
11 B 17 5
12 B 18 5
\(\bar{Y}\) 9.33 5.58

GCB[1]: Experiments
Each experiment draws a new random assignment within the blocks of the same full schedule; only the group means and the resulting estimate change.

Experiment \(\hat{\bar{Y}}_0\) \(\hat{\bar{Y}}_1\) \(\widehat{ATE}\)
1 8.83 6.17 −2.67
2 9.83 4.83 −5.00
3 8.83 5.00 −3.83
4 9.83 3.67 −6.17
5 9.83 4.83 −5.00
100 10.33 4.33 −6.00

Let \(A\) and \(B\) be a (non-random) partition of the units \(\{1, \ldots, n\}\) into blocks, and let \(n_A\) and \(n_B\) be their sizes.

\[\begin{equation} \begin{aligned} ATE &= \frac{1}{n}(Y_1(1) - Y_1(0)) + \frac{1}{n}(Y_2(1) - Y_2(0)) + \ldots + \frac{1}{n}(Y_n(1) - Y_n(0)) \\ &= \sum_{i \in A} \frac{1}{n} \left( Y_i(1) - Y_i(0) \right) + \sum_{i \in B} \frac{1}{n} \left( Y_i(1) - Y_i(0) \right) \\ &= \frac{n_A}{n} \frac{1}{n_A} \sum_{i \in A} \left( Y_i(1) - Y_i(0) \right) + \frac{n_B}{n} \frac{1}{n_B} \sum_{i \in B} \left( Y_i(1) - Y_i(0) \right) \\ &= \frac{n_A}{n} ATE_A + \frac{n_B}{n} ATE_B \end{aligned} \end{equation}\]
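The decomposition can be confirmed numerically. The sketch below assumes the potential outcomes and regions from the GCB[1] full schedule; the variable names (`tau`, `share`, `ate_by_block`) are chosen for this example only.

```r
# Potential outcomes and blocks copied from the GCB[1] full schedule.
region <- rep(c("A", "B"), each = 6)
y0 <- c(0, 0, 2, 2, 4, 6, 14, 16, 16, 17, 17, 18)
y1 <- c(0, 0, 4, 4, 8, 8, 12, 8, 8, 5, 5, 5)

tau <- y1 - y0                                    # unit-level treatment effects
ate <- mean(tau)                                  # overall ATE
ate_by_block <- tapply(tau, region, mean)         # ATE_A and ATE_B
share <- as.vector(table(region)) / length(tau)   # n_A / n and n_B / n

ate
ate_by_block
sum(share * ate_by_block)   # block-size-weighted average of the block ATEs
```

The first and last printed values should coincide, since the overall ATE is the block-size-weighted average of the block ATEs.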

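Because treatment is assigned independently within each block, \(\widehat{ATE}_A\) and \(\widehat{ATE}_B\) are independent and their covariance is zero, so

\[\begin{equation} \begin{aligned} Var(\widehat{ATE}) &= Var\left(\frac{n_A}{n} \widehat{ATE}_A + \frac{n_B}{n} \widehat{ATE}_B\right) \\ &= \frac{n_A^2}{n^2} Var(\widehat{ATE}_A) + \frac{n_B^2}{n^2} Var(\widehat{ATE}_B) \end{aligned} \end{equation}\]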

\[\begin{equation} \begin{aligned} SE(\widehat{ATE}) &= \sqrt{ \frac{n_A^2}{n^2} (SE(\widehat{ATE}_A)^2) + \frac{n_B^2}{n^2} (SE(\widehat{ATE}_B)^2)} \\ \end{aligned} \end{equation}\]

Recall for \(CR[1]\),

\[ SE(\widehat{ATE}) = \sqrt{ \frac{1}{n - 1} \left( \frac{n_0 \sigma_1^2}{n_1} + \frac{n_1 \sigma_0^2}{n_0} + 2 Cov(Y_i(1), Y_i(0)) \right) } \]

Exact block SEs:

n_A <- 6

# ind_a is assumed to index the rows of block A in indo_cb
a_sigsq_1 <- mean((indo_cb$`Y(1)`[ind_a] - mean(indo_cb$`Y(1)`[ind_a]))^2)
a_sigsq_0 <- mean((indo_cb$`Y(0)`[ind_a] - mean(indo_cb$`Y(0)`[ind_a]))^2)
a_cov_01 <- mean((indo_cb$`Y(1)`[ind_a] - mean(indo_cb$`Y(1)`[ind_a])) * 
               (indo_cb$`Y(0)`[ind_a] - mean(indo_cb$`Y(0)`[ind_a])))
# The n_0 / n_1 and n_1 / n_0 ratios are omitted here, which assumes equal
# treatment and control group sizes within the block
a_var_ATE <- 1 / (n_A - 1) * (a_sigsq_0 + a_sigsq_1 + 2 * a_cov_01)
a_se_ATE <- sqrt(a_var_ATE)
a_se_ATE
[1] 2.389793

n_B <- 6
b_sigsq_1 <- mean((indo_cb$`Y(1)`[ind_b] - mean(indo_cb$`Y(1)`[ind_b]))^2)
b_sigsq_0 <- mean((indo_cb$`Y(0)`[ind_b] - mean(indo_cb$`Y(0)`[ind_b]))^2)
b_cov_01 <- mean((indo_cb$`Y(1)`[ind_b] - mean(indo_cb$`Y(1)`[ind_b])) * 
               (indo_cb$`Y(0)`[ind_b] - mean(indo_cb$`Y(0)`[ind_b])))
b_var_ATE <- 1 / (n_B - 1) * (b_sigsq_0 + b_sigsq_1 + 2 * b_cov_01)
b_se_ATE <- sqrt(b_var_ATE)
b_se_ATE
[1] 0.6191392

Exact SE:

ab_se_ate <- sqrt(n_A^2 / n^2 * a_se_ATE^2  + n_B^2 / n^2 * b_se_ATE^2)
ab_se_ate
[1] 1.234346
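The blocked standard error can also be checked by enumerating every within-block assignment. The sketch below assumes the GCB[1] full schedule with 3 of the 6 projects treated in each region; that equal split is an assumption consistent with the block formulas above, not something stated explicitly. The helpers `block_est` and `psd` are made up for this example. For comparison, it also enumerates complete randomization (6 of 12 treated, ignoring regions) on the same schedule.

```r
# Potential outcomes and blocks copied from the GCB[1] full schedule.
region <- rep(c("A", "B"), each = 6)
y0 <- c(0, 0, 2, 2, 4, 6, 14, 16, 16, 17, 17, 18)
y1 <- c(0, 0, 4, 4, 8, 8, 12, 8, 8, 5, 5, 5)
n <- length(y0)
idx_a <- which(region == "A")
idx_b <- which(region == "B")

# All ways to treat 3 units within each block (assumed 3 treated per block).
ta <- combn(idx_a, 3)
tb <- combn(idx_b, 3)
pairs <- expand.grid(a = seq_len(ncol(ta)), b = seq_len(ncol(tb)))

# Difference in means within one block, given its treated units g1.
block_est <- function(idx, g1) mean(y1[g1]) - mean(y0[setdiff(idx, g1)])

# Block-size-weighted estimator for every pair of block assignments.
blocked <- mapply(function(a, b) {
  length(idx_a) / n * block_est(idx_a, ta[, a]) +
    length(idx_b) / n * block_est(idx_b, tb[, b])
}, pairs$a, pairs$b)

# Complete randomization on the same schedule, ignoring regions.
cr <- apply(combn(n, 6), 2, function(g) mean(y1[g]) - mean(y0[-g]))

# Population (divide-by-count) standard deviation over all assignments.
psd <- function(x) sqrt(mean((x - mean(x))^2))
psd(blocked)   # compare with ab_se_ate
psd(cr)        # SE if the same units were completely randomized
```

Under these assumptions `psd(blocked)` should match `ab_se_ate`, and comparing it with `psd(cr)` shows how much precision blocking on region buys for this particular schedule.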