Interference

Study: School Attendance

A school district in a large city wants to boost student attendance rates. The district identifies 20,000 households, each containing exactly two enrolled students. In 10,000 randomly chosen households, the parents receive no mail. In the remaining 10,000 households, the parents receive a letter about one of their two children, chosen at random. Attendance rates are then measured for all children for the remainder of the school year.

Measurement unit? Experimental unit? Response? Factors? Design?

Measurement unit: student
Experimental unit: student and household
Response: attendance rate
Experimental factor: mail (yes/no)
Design: Mix of a cluster design (the household is the cluster) and a complete block (CB) design (the household is the block).

This multistage design is specifically constructed to handle interference.

Question: How many potential outcomes are needed to represent this design?

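3: \(Y_{ij}(1, 0), Y_{ij}(0, 1), Y_{ij}(0, 0)\), where the first argument is the student's own treatment and the second is the sibling's. The combination \((1, 1)\) never occurs because at most one child per household receives a letter.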
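
The count can be checked by enumeration. The sketch below assumes that interference is confined to the household, so a student's outcome depends only on their own letter and their sibling's. The names `exposures`, `own_letter`, `sibling_letter`, and `feasible` are illustrative and not part of the original analysis.

```r
# All combinations of (own letter, sibling's letter) for one student
exposures <- expand.grid(own_letter = 0:1, sibling_letter = 0:1)

# The design sends at most one letter per household, so (1, 1) is ruled out
feasible <- subset(exposures, own_letter + sibling_letter <= 1)

feasible
nrow(feasible)
```

Without the household-only assumption, each student's outcome could in principle depend on the full assignment vector, and three potential outcomes would no longer be enough.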

Simulated School Attendance

# A tibble: 40,000 × 5
   household_id student_id treated_household treated_student     Y
          <int>      <int>             <dbl>           <dbl> <dbl>
 1            1          1                 0               0 0.802
 2            1          2                 0               0 0.859
 3            2          1                 0               0 0.778
 4            2          2                 0               0 0.838
 5            3          1                 1               0 0.891
 6            3          2                 1               1 0.956
 7            4          1                 1               0 0.924
 8            4          2                 1               1 0.975
 9            5          1                 1               1 0.897
10            5          2                 1               0 0.902
# ℹ 39,990 more rows

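library(dplyr)  # provides filter(), summarize(), and pull()

# treated students in treated households
# (attendance is the tibble printed above)
ybar_1_0 <- attendance |>
  filter(treated_household == 1, treated_student == 1) |>
  summarize(avg_y = mean(Y)) |>
  pull()  # extract the single avg_y value as a number
ybar_1_0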

[1] 0.8799312

# their untreated siblings
ybar_0_1 <- attendance |>
  filter(treated_household == 1, treated_student == 0) |>
  summarize(avg_y = mean(Y)) |>
  pull()
ybar_0_1

[1] 0.8697945

# students in untreated households
ybar_0_0 <- attendance |>
  filter(treated_household == 0) |>
  summarize(avg_y = mean(Y)) |>
  pull()
ybar_0_0

[1] 0.8502641

Estimates

direct_hat <- ybar_1_0 - ybar_0_0
direct_hat

[1] 0.0296671

spillover_hat <- ybar_0_1 - ybar_0_0
spillover_hat

[1] 0.01953034

Traditional estimate:

ate_hat_1 <- ybar_1_0 - ybar_0_1 # diff in means within treated households
ate_hat_1

[1] 0.01013676
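
A short identity shows why the within-household comparison understates the direct effect. Here \(\bar{Y}_{1,0}\), \(\bar{Y}_{0,1}\), and \(\bar{Y}_{0,0}\) are shorthand (introduced here, not in the original) for `ybar_1_0`, `ybar_0_1`, and `ybar_0_0`:

\[ \bar{Y}_{1,0} - \bar{Y}_{0,1} = (\bar{Y}_{1,0} - \bar{Y}_{0,0}) - (\bar{Y}_{0,1} - \bar{Y}_{0,0}) \]

That is, `ate_hat_1` equals `direct_hat - spillover_hat`. With the values above, \(0.0296671 - 0.01953034 = 0.01013676\): the untreated siblings also benefit from the letter, so using them as the comparison group removes the spillover from the estimate.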

Randomization test under interference

Question: How would you do a randomization test for the direct effect of treatment?

Hold outcome for each student fixed.
Reassign households to treatment or control, then, within each treated household, reassign which one of the two students is treated.
Calculate direct_hat statistic.
Repeat many times.
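
The steps above can be sketched in base R. This is a minimal illustration rather than the original analysis: the data are simulated with assumed effect sizes (0.03 direct, 0.02 spillover) on 200 households, and the helpers `assign_treatment()` and `direct_stat()` are hypothetical names. Holding outcomes fixed corresponds to the sharp null that the letter affects no one, neither the named child nor the sibling.

```r
set.seed(1)
n_hh <- 200  # assumed number of households, two students each

# Two-stage assignment mirroring the design: half of the households get a
# letter, and within those households one of the two students is picked.
assign_treatment <- function(n_hh) {
  hh_treated <- rep(0, n_hh)
  hh_treated[sample(n_hh, n_hh / 2)] <- 1
  picked <- sample(1:2, n_hh, replace = TRUE)   # which sibling is named
  hh <- rep(hh_treated, each = 2)               # expand to student rows
  st <- hh * as.numeric(rep(picked, each = 2) == rep(1:2, times = n_hh))
  list(treated_household = hh, treated_student = st)
}

# Same statistic as direct_hat: treated students vs. untreated households
direct_stat <- function(y, hh, st) {
  mean(y[hh == 1 & st == 1]) - mean(y[hh == 0])
}

# Simulated "observed" experiment (effect sizes are assumptions)
obs <- assign_treatment(n_hh)
attendance <- data.frame(
  household_id = rep(seq_len(n_hh), each = 2),
  student_id   = rep(1:2, times = n_hh),
  treated_household = obs$treated_household,
  treated_student   = obs$treated_student
)
attendance$Y <- pmin(1, 0.85 +
  0.03 * attendance$treated_student +
  0.02 * (attendance$treated_household - attendance$treated_student) +
  rnorm(2 * n_hh, sd = 0.05))

observed <- direct_stat(attendance$Y, attendance$treated_household,
                        attendance$treated_student)

# Re-randomize with outcomes held fixed and recompute the statistic
null_stats <- replicate(2000, {
  a <- assign_treatment(n_hh)
  direct_stat(attendance$Y, a$treated_household, a$treated_student)
})

# Two-sided p-value: share of re-randomizations at least as extreme
p_value <- mean(abs(null_stats) >= abs(observed))
p_value
```

The re-randomization reuses the same two-stage procedure as the actual design, so the reference distribution reflects both the household-level and the student-level randomness.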

Case Studies

How would one person’s treatment influence another’s outcome in each one? What kind of bias could this introduce? Could you redesign the experiment to avoid the interference?

3000 students at UC Berkeley agree to participate in a study, and 1000 of them are chosen at random to receive a new kind of flu shot. All students are monitored for flu virus for the following eight weeks.

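Saturation model: model spillover effects as a function of the fraction of treated individuals in a cluster (e.g., a dormitory) rather than the treatment status of each other individual. Below, \(D_{ij}\) is the individual's own treatment and \(S_j\) is the treated fraction in the cluster.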

\[ Y_{ij}(D_{ij}, S_j) \]
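
One way to generate variation in both arguments is a two-stage assignment: first randomize a saturation level to each cluster, then randomize individuals within the cluster at that rate. The sketch below is illustrative only. The number of dorms, the dorm size, and the set of saturation levels are assumptions, and `design` is a hypothetical name.

```r
set.seed(2)
n_dorms   <- 12                    # assumed number of clusters
dorm_size <- 40                    # assumed students per dorm
saturations <- c(0, 0.25, 0.5, 0.75)  # assumed saturation levels

# Stage 1: randomly assign a saturation level S_j to each dorm (balanced)
S <- sample(rep(saturations, length.out = n_dorms))

# Stage 2: within dorm j, treat a random S_j fraction of students
design <- do.call(rbind, lapply(seq_len(n_dorms), function(j) {
  n_treat <- round(S[j] * dorm_size)
  D <- sample(rep(c(1, 0), times = c(n_treat, dorm_size - n_treat)))
  data.frame(dorm = j, student = seq_len(dorm_size), S = S[j], D = D)
}))

# Realized treated fraction per dorm, which should match S_j
tapply(design$D, design$dorm, mean)
```

Under such a design, comparing untreated students across dorms with different \(S_j\) speaks to spillover, while comparing treated and untreated students at the same \(S_j\) speaks to the direct effect.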

17 pairs of similar geographic locations in Lowell, MA with high crime incidences were identified (“hot-spots”) and one hot-spot in each pair was randomly assigned to receive extra visits from police and extra follow-up by police authorities. Control hot-spots did not receive attention and police captains didn’t know their locations. Rates of emergency calls from each hot-spot were recorded before and after the study.

4.9 million eBay users were assigned either to a control condition, or to a treatment under which they received an email notification six hours before the end of any auction they bid in. The outcome is the amount of money spent by the user on eBay.

A retail store is introducing a new “Salesperson of the Month” award, which will be given at random to one of the employees. They record the sales of all employees.

Sales Award Revisited

\[ Y_{i}(D_M, D_P, D_L) \]

agent	Y(1,0,0)	Y(0,1,0)	Y(0,0,1)
Mary	100	50	70
Peter	50	50	50
Linus	90	50	90

agent	Y(1,0,0)	Y(0,1,0)	Y(0,0,1)	Y(0,0,0)
Mary	100	50	70	70
Peter	50	50	50	50
Linus	90	50	90	90

store	agent	Y(1,0,0)	Y(0,1,0)	Y(0,0,1)	Y(0,0,0)
1	Mary	100	50	70	70
1	Peter	50	50	50	50
1	Linus	90	50	90	90
2	Priya	80	60	75	75
2	Leo	60	60	55	55
2	Ethan	70	60	85	85
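
Because the award can change coworkers' sales, one option is to compare whole stores rather than individual agents. The sketch below enters the table above by hand and sums the potential outcomes within each store. Treating the store as the unit of comparison is an assumption about where the example leads, and the column names `y_100`, `y_010`, `y_001`, and `y_000` are illustrative stand-ins for the four potential-outcome columns.

```r
# Potential outcomes copied from the two-store table
po <- data.frame(
  store = c(1, 1, 1, 2, 2, 2),
  agent = c("Mary", "Peter", "Linus", "Priya", "Leo", "Ethan"),
  y_100 = c(100, 50, 90, 80, 60, 70),
  y_010 = c(50, 50, 50, 60, 60, 60),
  y_001 = c(70, 50, 90, 75, 55, 85),
  y_000 = c(70, 50, 90, 75, 55, 85)
)

# Total store sales under each possible award assignment
totals <- aggregate(cbind(y_100, y_010, y_001, y_000) ~ store,
                    data = po, FUN = sum)

# Effect of running the award on store totals, averaging over which of the
# three agents is drawn at random, relative to giving no award
totals$award_effect <-
  rowMeans(totals[, c("y_100", "y_010", "y_001")]) - totals$y_000

totals
```

With these numbers, store 1 totals 240, 150, and 210 under the three award assignments versus 210 with no award, and store 2 totals 210, 180, and 215 versus 215. Averaged over the random recipient, the award lowers total sales in both stores even though some individual agents sell more when they receive it.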