Tracking how Santa Barbara County is faring with COVID-19. The data comes from the Santa Barbara County Public Health Department, which is updated every weekday; this page is updated weekly.
Hat tip to Chris Lortie for the idea.
library(tidyverse)
library(lubridate)
Load the data, which was previously scraped from the Department website using this script. The data is cumulative total confirmed cases broken down by date and by geographic area within the county.
raw_data <- read_csv("data/data.csv", col_types="fiD")
head(raw_data)
## # A tibble: 6 × 3
## area cases date
## <fct> <int> <date>
## 1 SOUTH COUNTY UNINCORPORATED AREA includes communities of Mon… 1353 2021-04-14
## 2 CITY OF SANTA BARBARA and the unincorporated area of Mission… 6323 2021-04-14
## 3 CITY OF GOLETA 1765 2021-04-14
## 4 COMMUNITY OF ISLA VISTA 1285 2021-04-14
## 5 UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA 1191 2021-04-14
## 6 SANTA YNEZ VALLEY including the Cities of Solvang & Buellton… 1004 2021-04-14
range(raw_data$date)
## [1] "2020-04-27" "2021-04-14"
Sanity check: for each geographic area, the numbers of cases, being cumulative totals, should increase monotonically over time. But this being the real world, in a few cases they don’t. Perhaps decreasing case numbers reflect corrections?
raw_data %>%
group_by(area) %>%
arrange(date) %>%
mutate(delta=cases-lag(cases)) %>% # computed within each group
filter(delta < 0) %>%
arrange(delta)
## # A tibble: 38 × 4
## # Groups: area [11]
## area cases date delta
## <fct> <int> <date> <int>
## 1 CITY OF SANTA MARIA 11249 2021-04-10 -61
## 2 FEDERAL PRISON IN LOMPOC 984 2020-07-08 -10
## 3 FEDERAL PRISON IN LOMPOC 893 2020-05-14 -3
## 4 COMMUNITY OF ORCUTT 1697 2021-03-03 -3
## 5 CITY OF SANTA MARIA 139 2020-05-02 -2
## 6 SOUTH COUNTY UNINCORPORATED AREA includes communities… 166 2020-08-12 -2
## 7 FEDERAL PRISON IN LOMPOC 1024 2020-09-21 -2
## 8 CITY OF SANTA MARIA 4089 2020-10-20 -2
## 9 UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA 1111 2021-03-03 -2
## 10 UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA 14 2020-05-06 -1
## # … with 28 more rows
Ignoring that, for simplicity we consolidate geographic areas into two regions, north county and south county. There’s no firm dividing line, but below is the definition we use. Notice that the Lompoc prison is excluded. To smooth the data we downsample it to weeks instead of days. Because we’re working with cumulative totals, we can do this simply by selecting all the Wednesdays (Wednesday because it avoids problems caused by missing data on weekends and holidays).
south_areas <- c("SOUTH COUNTY UNINCORPORATED AREA includes communities of Montecito, Summerland and the City of Carpinteria",
"CITY OF SANTA BARBARA and the unincorporated area of Mission Canyon",
"CITY OF GOLETA",
"COMMUNITY OF ISLA VISTA",
"UNINCORPORATED AREA OF THE GOLETA VALLEY AND GAVIOTA",
"SANTA YNEZ VALLEY including the Cities of Solvang & Buellton, and the communities of Santa Ynez, Los Alamos, Los Olivos and Ballard")
north_areas <- c("CITY OF LOMPOC and the communities of Mission Hills and Vandenberg Village",
"CITY OF SANTA MARIA",
"COMMUNITY OF ORCUTT",
"UNINCORPORATED AREAS of Sisquoc, Casmalia, Garey, Cuyama, New Cuyama, and the City of Guadalupe")
other_areas <- c("FEDERAL PRISON IN LOMPOC",
"People incarcerated at the Federal Prison in Lompoc",
"Out of County")
base_data <- raw_data %>%
mutate(region=fct_collapse(area, south=south_areas, north=north_areas, other=other_areas)) %>%
filter(region != "other") %>%
mutate(region=fct_relevel(region, c("north", "south"))) %>% # set order for consistent graphing, labels
group_by(region, date) %>%
summarize(cases=sum(cases), .groups="drop_last") %>%
filter(wday(date) == 4)
head(base_data)
## # A tibble: 6 × 3
## # Groups: region [1]
## region date cases
## <fct> <date> <int>
## 1 north 2020-04-29 271
## 2 north 2020-05-06 311
## 3 north 2020-05-13 348
## 4 north 2020-05-20 404
## 5 north 2020-05-27 486
## 6 north 2020-06-03 555
special_date_lines <- function (max_y) {
annotations <- data.frame(
x=c(ymd("2020-11-26"), ymd("2020-12-25")),
y=c(max_y*0.9, max_y*0.8),
label=c("Thanksgiving", "Christmas")
)
list(
geom_vline(xintercept=annotations$x[1], color="orange", linewidth=0.25),
geom_vline(xintercept=annotations$x[2], color="orange", linewidth=0.25),
geom_label(data=annotations, aes(x=x, y=y, label=label, hjust=1),
color="orange", size=3, show.legend=FALSE)
)
}
legend <- scale_color_discrete(
labels=c("North county (excluding prison)", "South county (including SYV)"))
ggplot(base_data, aes(x=date, y=cases, color=region)) +
special_date_lines(max(base_data$cases)) +
geom_line() +
labs(x="Date", y="Cumulative total cases", color="Region") +
legend
week_data <- base_data %>%
arrange(date) %>%
mutate(new_cases=cases-lag(cases)) %>%
drop_na # first row in each group is NA
ggplot(week_data, aes(x=date, y=new_cases, color=region)) +
special_date_lines(max(week_data$new_cases)) +
geom_line() +
labs(x="Date", y="Weekly new cases", color="Region") +
legend
Are we controlling the virus, or is it still spreading uncontrollably? Following Aatish Bhatia, we plot weekly new cases versus cumulative total cases. The idea is that as long as the virus is spreading exponentially, the number of new cases arising is proportional to the total number of cases at any given time. When plotted, such growth appears as a straight line. If/when the virus is controlled, the graph will plummet downward.
Note that time is implicit here. The graphs for each region evolve left to right, but not necessarily at a uniform rate, and not necessarily at the same rate.
ggplot(week_data, aes(x=cases, y=new_cases, color=region)) +
geom_line() +
geom_point(data=week_data[week_data$date==max(week_data$date),]) + # add terminal points
scale_x_log10() + scale_y_log10() +
annotation_logticks() +
labs(x="Cumulative total cases", y="Weekly new cases", color="Region") +
legend
How fast is the virus spreading? I.e., what growth rate does the general slope of the above graphs represent? Looking at north county growth over the first 13 weeks for which we have data (i.e., when the cumulative number of cases in that region went from approximately 300 to approximately 3,000), we can compute from \({\it week}_1 \cdot {\it growth\_rate}^{12} = {\it week}_{13}\):
data <- base_data %>% filter(region == "north") %>% arrange(date)
growth_rate <- (data[13,"cases"]/data[1,"cases"])^(1/12)
doubling_time <- log(2)/log(growth_rate) # weeks
So during that period the virus grew by 22% each week on average, meaning the number of cases doubled about every 3.4 weeks. Caveat: case counts are affected by the prevalence of testing and other factors.