Tracking how Santa Barbara County is faring with COVID-19. The data comes from the Santa Barbara County Public Health Department, which is updated every weekday; this page is updated weekly.

Hat tip to Chris Lortie for the idea.

library(tidyverse)
library(lubridate)

Load the data, which was previously scraped from the Department website using this script. The data is cumulative total confirmed cases broken down by date and by geographic area within the county.

raw_data <- read_csv("data/data.csv", col_types="fiD")
head(raw_data)
## # A tibble: 6 × 3
##   area                                                          cases date      
##   <fct>                                                         <int> <date>    
## 1 SOUTH COUNTY UNINCORPORATED AREA includes communities of Mon…  1353 2021-04-14
## 2 CITY OF SANTA BARBARA and the unincorporated area of Mission…  6323 2021-04-14
## 3 CITY OF GOLETA                                                 1765 2021-04-14
## 4 COMMUNITY OF ISLA VISTA                                        1285 2021-04-14
## 5 UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA           1191 2021-04-14
## 6 SANTA YNEZ VALLEY including the Cities of Solvang & Buellton…  1004 2021-04-14
range(raw_data$date)
## [1] "2020-04-27" "2021-04-14"

Sanity check: for each geographic area, the numbers of cases, being cumulative totals, should increase monotonically over time. But this being the real world, in a few cases they don’t. Perhaps decreasing case numbers reflect corrections?

raw_data %>%
  group_by(area) %>%
  arrange(date) %>%
  mutate(delta=cases-lag(cases)) %>% # computed within each group
  filter(delta < 0) %>%
  arrange(delta)
## # A tibble: 38 × 4
## # Groups:   area [11]
##    area                                                   cases date       delta
##    <fct>                                                  <int> <date>     <int>
##  1 CITY OF SANTA MARIA                                    11249 2021-04-10   -61
##  2 FEDERAL PRISON IN LOMPOC                                 984 2020-07-08   -10
##  3 FEDERAL PRISON IN LOMPOC                                 893 2020-05-14    -3
##  4 COMMUNITY OF ORCUTT                                     1697 2021-03-03    -3
##  5 CITY OF SANTA MARIA                                      139 2020-05-02    -2
##  6 SOUTH COUNTY UNINCORPORATED AREA includes communities…   166 2020-08-12    -2
##  7 FEDERAL PRISON IN LOMPOC                                1024 2020-09-21    -2
##  8 CITY OF SANTA MARIA                                     4089 2020-10-20    -2
##  9 UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA    1111 2021-03-03    -2
## 10 UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA      14 2020-05-06    -1
## # … with 28 more rows

Ignoring that, for simplicity we consolidate geographic areas into two regions, north county and south county. There’s no firm dividing line, but below is the definition we use. Notice that the Lompoc prison is excluded. To smooth the data we downsample it to weeks instead of days. Because we’re working with cumulative totals, we can do this simply by selecting all the Wednesdays (Wednesday because it avoids problems caused by missing data on weekends and holidays).

south_areas <- c("SOUTH COUNTY UNINCORPORATED AREA includes communities of Montecito, Summerland and the City of Carpinteria",
                 "CITY OF SANTA BARBARA and the unincorporated area of Mission Canyon",
                 "CITY OF GOLETA",
                 "COMMUNITY OF ISLA VISTA",
                 "UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA",
                 "SANTA YNEZ VALLEY including the Cities of Solvang & Buellton, and the communities of Santa Ynez, Los Alamos, Los Olivos and Ballard")
north_areas <- c("CITY OF LOMPOC and the communities of Mission Hills and Vandenberg Village",
                 "CITY OF SANTA MARIA",
                 "COMMUNITY OF ORCUTT",
                 "UNINCORPORATED AREAS of Sisquoc, Casmalia, Garey, Cuyama, New Cuyama, and the City of Guadalupe")
other_areas <- c("FEDERAL PRISON IN LOMPOC",
                 "People incarcerated at the Federal Prison in Lompoc",
                 "Out of County")

base_data <- raw_data %>%
  mutate(region=fct_collapse(area, south=south_areas, north=north_areas, other=other_areas)) %>%
  filter(region != "other") %>%
  mutate(region=fct_relevel(region, c("north", "south"))) %>% # set order for consistent graphing, labels
  group_by(region, date) %>%
  summarize(cases=sum(cases), .groups="drop_last") %>%
  filter(wday(date) == 4)

head(base_data)
## # A tibble: 6 × 3
## # Groups:   region [1]
##   region date       cases
##   <fct>  <date>     <int>
## 1 north  2020-04-29   271
## 2 north  2020-05-06   311
## 3 north  2020-05-13   348
## 4 north  2020-05-20   404
## 5 north  2020-05-27   486
## 6 north  2020-06-03   555

Cumulative total cases

special_date_lines <- function (max_y) {
  annotations <- data.frame(
    x=c(ymd("2020-11-26"), ymd("2020-12-25")),
    y=c(max_y*0.9, max_y*0.8),
    label=c("Thanksgiving", "Christmas")
  )
  list(
    geom_vline(xintercept=annotations$x[1], color="orange", linewidth=0.25),
    geom_vline(xintercept=annotations$x[2], color="orange", linewidth=0.25),
    geom_label(data=annotations, aes(x=x, y=y, label=label, hjust=1),
               color="orange", size=3, show.legend=FALSE)
  )
}

legend <- scale_color_discrete(
  labels=c("North county (excluding prison)", "South county (including SYV)"))

ggplot(base_data, aes(x=date, y=cases, color=region)) +
  special_date_lines(max(base_data$cases)) +
  geom_line() +
  labs(x="Date", y="Cumulative total cases", color="Region") +
  legend

Weekly new cases

week_data <- base_data %>%
  arrange(date) %>%
  mutate(new_cases=cases-lag(cases)) %>%
  drop_na # first row in each group is NA

ggplot(week_data, aes(x=date, y=new_cases, color=region)) +
  special_date_lines(max(week_data$new_cases)) +
  geom_line() +
  labs(x="Date", y="Weekly new cases", color="Region") +
  legend

Trajectory

Are we controlling the virus, or is it still spreading uncontrollably? Following Aatish Bhatia, we plot weekly new cases versus cumulative total cases. The idea is that as long as the virus is spreading exponentially, the number of new cases arising is proportional to the total number of cases at any given time. When plotted, such growth appears as a straight line. If/when the virus is controlled, the graph will plummet downward.

Note that time is implicit here. The graphs for each region evolve left to right, but not necessarily at a uniform rate, and not necessarily at the same rate.

ggplot(week_data, aes(x=cases, y=new_cases, color=region)) +
  geom_line() +
  geom_point(data=week_data[week_data$date==max(week_data$date),]) + # add terminal points
  scale_x_log10() + scale_y_log10() +
  annotation_logticks() +
  labs(x="Cumulative total cases", y="Weekly new cases", color="Region") +
  legend

Growth rate

How fast is the virus spreading? I.e., what growth rate does the general slope of the above graphs represent? Looking at north county growth over the first 13 weeks for which we have data (i.e., when the cumulative number of cases in that region went from approximately 300 to approximately 3,000), we can compute from \({\it week}_1 \cdot {\it growth\_rate}^{12} = {\it week}_{13}\):

data <- base_data %>% filter(region == "north") %>% arrange(date)
growth_rate <- (data[13,"cases"]/data[1,"cases"])^(1/12)
doubling_time <- log(2)/log(growth_rate) # weeks

So during that period the virus grew by 22% each week on average, meaning the number of cases doubled about every 3.4 weeks. Caveat: case counts are affected by the prevalence of testing and other factors.