Interference

Study: School Attendance

A school district in a large city wants to boost student attendance rates. It identifies 20,000 households, each with exactly two enrolled students. In 10,000 randomly chosen households, the parents receive no mail. In the remaining 10,000 households, the parents receive a letter about one of their two children, chosen at random. Attendance rates for all 40,000 children are then measured for the rest of the school year.

Measurement unit? Experimental Unit? Response? Factors? Design?

  • Measurement unit: student
  • Experimental units: household (first stage) and student (second stage, within mailed households)
  • Response: attendance rate
  • Experimental factor: mail (yes/no)
  • Design: a mix of a cluster design (the household is the cluster) and a CB design (the household is the block).

This multistage design is specifically constructed to handle interference.

Question: How many potential outcomes are needed to represent this design?

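3: \(Y_{ij}(1, 0), Y_{ij}(0, 1), Y_{ij}(0, 0)\), where the first argument records whether the letter is about student \(i\) in household \(j\) and the second whether it is about that student's sibling. \(Y_{ij}(1, 1)\) never arises, since at most one child per household is the subject of a letter.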
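Each student reveals exactly one of these three potential outcomes, determined by whether the household was mailed and whether the letter was about that student. A minimal sketch in R; the helper name `exposure` is hypothetical, and it assumes 0/1 indicators like the `treated_household` and `treated_student` columns of the simulated data:

```r
# Hypothetical helper: which potential outcome does each student reveal?
# treated_household: 1 if the parents got a letter; treated_student: 1 if it was about this child
exposure <- function(treated_household, treated_student) {
  ifelse(treated_household == 0, "Y(0,0)",
         ifelse(treated_student == 1, "Y(1,0)", "Y(0,1)"))
}

exposure(treated_household = c(0, 0, 1, 1), treated_student = c(0, 0, 0, 1))
#> [1] "Y(0,0)" "Y(0,0)" "Y(0,1)" "Y(1,0)"
```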

Simulated School Attendance

# A tibble: 40,000 × 5
   household_id student_id treated_household treated_student     Y
          <int>      <int>             <dbl>           <dbl> <dbl>
 1            1          1                 0               0 0.802
 2            1          2                 0               0 0.859
 3            2          1                 0               0 0.778
 4            2          2                 0               0 0.838
 5            3          1                 1               0 0.891
 6            3          2                 1               1 0.956
 7            4          1                 1               0 0.924
 8            4          2                 1               1 0.975
 9            5          1                 1               1 0.897
10            5          2                 1               0 0.902
# ℹ 39,990 more rows

library(dplyr)  # provides filter(), summarize(), pull()
# treated students in treated households
ybar_1_0 <- attendance |>
  filter(treated_household == 1, treated_student == 1) |>
  summarize(avg_y = mean(Y)) |>
  pull()
ybar_1_0
[1] 0.8799312

# their untreated siblings
ybar_0_1 <- attendance |>
  filter(treated_household == 1, treated_student == 0) |>
  summarize(avg_y = mean(Y)) |>
  pull()
ybar_0_1
[1] 0.8697945

# students in untreated households
ybar_0_0 <- attendance |>
  filter(treated_household == 0) |>
  summarize(avg_y = mean(Y)) |>
  pull()
ybar_0_0
[1] 0.8502641

Estimates

direct_hat <- ybar_1_0 - ybar_0_0
direct_hat
[1] 0.0296671
spillover_hat <- ybar_0_1 - ybar_0_0
spillover_hat
[1] 0.01953034

Traditional estimate:

ate_hat_1 <- ybar_1_0 - ybar_0_1 # diff in means within treated households
ate_hat_1
[1] 0.01013676
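The within-household contrast is smaller than direct_hat because its comparison group, the untreated siblings, is itself affected by the letter. Since ybar_0_0 cancels, ate_hat_1 equals direct_hat minus spillover_hat exactly. The sketch below checks this using the rounded group means printed above (copied by hand, not recomputed from the data):

```r
# Group means as printed above
ybar_1_0 <- 0.8799312  # treated students in treated households
ybar_0_1 <- 0.8697945  # their untreated siblings
ybar_0_0 <- 0.8502641  # students in untreated households

direct_hat    <- ybar_1_0 - ybar_0_0
spillover_hat <- ybar_0_1 - ybar_0_0
ate_hat_1     <- ybar_1_0 - ybar_0_1

# ybar_0_0 cancels, so the traditional estimate = direct effect - spillover
all.equal(ate_hat_1, direct_hat - spillover_hat)
#> [1] TRUE
```

With a positive spillover, the traditional estimate (about 0.010) understates the direct effect (about 0.030).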

Randomization test under interference

Question: How would you do a randomization test for the direct effect of treatment?

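  1. Hold the outcome for each student fixed (this assumes the sharp null that no letter changes any student's attendance).
  2. Re-run the two-stage assignment: randomly reassign households to treatment or control, then, within each treated household, randomly pick one of the two students to be the subject of the letter.
  3. Calculate the direct_hat statistic.
  4. Repeat many times; the p-value is the share of re-randomized statistics at least as extreme as the observed direct_hat.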
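The steps above can be sketched in base R. This is a minimal, hypothetical implementation: the function names (`direct_stat`, `reassign`, `rand_test_direct`) and the toy data are illustrative rather than the lecture's code, and holding `Y` fixed relies on the sharp null that no letter changes any student's attendance.

```r
# Assumes a data frame shaped like `attendance`: household_id, student_id (1 or 2),
# treated_household, treated_student, Y.

# direct_hat: treated students minus students in untreated households
direct_stat <- function(hh, st, y) {
  mean(y[hh == 1 & st == 1]) - mean(y[hh == 0])
}

# Re-run the two-stage assignment: half of the households get a letter, then one
# of the two students in each treated household is picked at random.
reassign <- function(household_id, student_id) {
  ids <- unique(household_id)
  treated_ids <- ids[sample.int(length(ids), length(ids) %/% 2)]
  hh <- as.numeric(household_id %in% treated_ids)
  chosen <- sample(1:2, length(ids), replace = TRUE)  # letter subject per household
  st <- as.numeric(hh == 1 & student_id == chosen[match(household_id, ids)])
  list(hh = hh, st = st)
}

rand_test_direct <- function(df, n_reps = 1000) {
  observed <- direct_stat(df$treated_household, df$treated_student, df$Y)
  null_stats <- replicate(n_reps, {
    a <- reassign(df$household_id, df$student_id)
    direct_stat(a$hh, a$st, df$Y)  # outcomes held fixed
  })
  # two-sided p-value: share of re-randomized statistics at least as extreme
  list(observed = observed, p_value = mean(abs(null_stats) >= abs(observed)))
}

# Hypothetical toy data (not the lecture's simulation): 200 households with an
# invented direct effect of 0.03 and spillover of 0.02
set.seed(1)
n_hh <- 200
toy <- data.frame(household_id = rep(seq_len(n_hh), each = 2),
                  student_id   = rep(1:2, times = n_hh))
a <- reassign(toy$household_id, toy$student_id)
toy$treated_household <- a$hh
toy$treated_student <- a$st
toy$Y <- 0.85 + 0.03 * toy$treated_student +
  0.02 * (toy$treated_household == 1 & toy$treated_student == 0) +
  rnorm(nrow(toy), sd = 0.05)

rand_test_direct(toy, n_reps = 500)
```

Reusing `reassign()` both to create the toy assignment and to re-randomize keeps the null distribution tied to the actual design, which is the key requirement of a randomization test.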

Case Studies

How would one person’s treatment influence another’s outcome in each case below? What kind of bias could this introduce? Could you redesign the experiment to avoid the interference?

3,000 students at UC Berkeley agree to participate in a study, and 1,000 of them are chosen at random to receive a new kind of flu shot. All students are monitored for the flu virus for the following eight weeks.

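Saturation model: model spillover as a function of the fraction of treated individuals in a cluster (e.g., a dormitory) rather than of each other individual's treatment status. Here \(D_{ij}\) is the treatment of student \(i\) in cluster \(j\), and \(S_j\) is the fraction of cluster \(j\) that is treated: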

\[ Y_{ij}(D_{ij}, S_j) \]
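One way to make this model estimable is a two-stage randomized saturation design: first assign each cluster a saturation \(S_j\) at random, then treat that fraction of its members at random. The sketch below is hypothetical throughout (60 dorms of 40 students, saturations 0, 0.25, and 0.5, and an invented flu-risk model); it is not the Berkeley study's design.

```r
set.seed(2)
n_dorms <- 60
dorm_size <- 40
levels_S <- c(0, 0.25, 0.5)

# Stage 1: assign each dorm a saturation S_j at random (20 dorms per level)
S <- sample(rep(levels_S, length.out = n_dorms))

# Stage 2: within dorm j, give the shot to exactly S_j * dorm_size random students
D <- unlist(lapply(S, function(s) {
  d <- rep(0, dorm_size)
  d[sample.int(dorm_size, s * dorm_size)] <- 1
  d
}))

dorms <- data.frame(
  dorm = rep(seq_len(n_dorms), each = dorm_size),
  S    = rep(S, each = dorm_size),
  D    = D
)

# Invented outcome model: own shot lowers flu risk by 0.20; going from S = 0
# to S = 1 would lower everyone's risk by a further 0.10
p_flu <- 0.30 - 0.20 * dorms$D - 0.10 * dorms$S
dorms$flu <- rbinom(nrow(dorms), 1, p_flu)

# Mean flu rate by own treatment and dorm saturation; comparing untreated
# students (D = 0) across S traces out spillover as a function of saturation
est <- aggregate(flu ~ S + D, data = dorms, FUN = mean)
est
```

Dorms with \(S_j = 0\) supply a pure control group, which is what lets the untreated students in partially treated dorms reveal the spillover.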

Seventeen pairs of similar high-crime locations (“hot-spots”) in Lowell, MA were identified, and one hot-spot in each pair was randomly assigned to receive extra police visits and extra follow-up by police authorities. Control hot-spots received no extra attention, and police captains were not told their locations. Rates of emergency calls from each hot-spot were recorded before and after the study.

4.9 million eBay users were assigned either to a control condition or to a treatment under which they received an email notification six hours before the end of any auction they had bid in. The outcome is the amount of money each user spent on eBay.

A retail store is introducing a new “Salesperson of the Month” award, which will be given at random to one of its employees. The store records the sales of all employees.

Sales Award Revisited

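\[ Y_{i}(D_M, D_P, D_L) \]

where \(D_M\), \(D_P\), and \(D_L\) indicate whether Mary, Peter, and Linus, respectively, receive the award.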


store  agent  Y(1,0,0)  Y(0,1,0)  Y(0,0,1)  Y(0,0,0)
1      Mary        100        50        70        70
1      Peter        50        50        50        50
1      Linus        90        50        90        90
2      Priya        80        60        75        75
2      Leo          60        60        55        55
2      Ethan        70        60        85        85
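One way to use this table is to compare each store's average direct effect of winning (sales under one's own award minus \(Y_i(0,0,0)\)) with what a naive winner-versus-colleagues comparison gives on average when the award goes to a uniformly random employee. A minimal sketch in R; the column names are illustrative, and it assumes the three positions in store 2 refer to Priya, Leo, and Ethan in that order.

```r
# Potential sales copied from the table. Columns d1, d2, d3 hold Y when the 1st,
# 2nd, or 3rd listed employee in the store wins; `none` holds Y(0,0,0).
po <- data.frame(
  store = c(1, 1, 1, 2, 2, 2),
  agent = c("Mary", "Peter", "Linus", "Priya", "Leo", "Ethan"),
  pos   = c(1, 2, 3, 1, 2, 3),
  d1    = c(100, 50, 90, 80, 60, 70),
  d2    = c(50, 50, 50, 60, 60, 60),
  d3    = c(70, 50, 90, 75, 55, 85),
  none  = c(70, 50, 90, 75, 55, 85)
)

# Direct effect of winning: own-award outcome minus the no-award outcome
own <- as.matrix(po[, c("d1", "d2", "d3")])[cbind(seq_len(nrow(po)), po$pos)]
po$direct <- own - po$none
true_ade <- tapply(po$direct, po$store, mean)

# Naive estimate: winner's sales minus the mean of the other two, averaged over
# the three equally likely winners in each store
naive <- sapply(split(po, po$store), function(s) {
  mean(sapply(1:3, function(w) {
    y <- s[[paste0("d", w)]]  # realized sales when employee w wins
    y[s$pos == w] - mean(y[s$pos != w])
  }))
})

rbind(true_ade, naive)
```

In both stores the naive comparison (20 and about 11.67) exceeds the average direct effect (10 and about 3.33), because some employees sell less when a colleague wins, which depresses the comparison group.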