ORCID API Example

Greg Janée, March 2024

Example of querying the ORCID Public API from R using the rorcid package. In this example our goal is to find all ORCID IDs belonging to people who are currently employed at UCSB, and to do some rudimentary analysis.

library(tidyverse)
library(rorcid)
library(stringdist)

Getting access

Log in to ORCID and, in the menu under your name, visit “Developer tools” to create a client ID. The web form is intended for developers registering OAuth applications. For the purpose of one-off API access it doesn’t seem to matter what values you enter.

There’s no need to stash the returned client ID and client secret anywhere; they can always be viewed on that page.

The next step is to obtain a token that will allow querying (and only querying) against the public API. Presumably writing to ORCID profiles requires a different kind of token. Run this curl command from the Bash command line:

curl -H 'Accept: application/json'    \
     -d grant_type=client_credentials \
     -d scope=/read-public            \
     -d client_id=...                 \
     -d client_secret=...             \
     https://orcid.org/oauth/token

A JSON document is returned. Copy the value of the access_token element and record it in a .Renviron file, located either in your home directory or in your R project’s root directory, like so:

ORCID_TOKEN="..."

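Restart R for it to take effect. The client ID page doesn’t mention an expiration date. The JSON returned by the token request does include an expires_in field (in seconds); ORCID’s documentation describes /read-public tokens as long-lived, on the order of 20 years.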
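
Before running any queries, it can help to confirm that the restarted session actually sees the token. A minimal sketch in base R; the helper name orcid_token_available is made up for illustration and is not part of rorcid:

```r
# Hypothetical helper: TRUE if the named environment variable is set
# and non-empty in the current R session.
orcid_token_available <- function(var = "ORCID_TOKEN") {
    nzchar(Sys.getenv(var))
}

if (!orcid_token_available()) {
    message("ORCID_TOKEN is not set; check .Renviron and restart R.")
}
```

If the message appears, the most likely causes are a .Renviron file in the wrong directory or an R session that was not restarted.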

Query process overview

The basic idea is to supply a query in the form of a string expression and get back a 3-column dataframe. For example:

result <- orcid("email:*@ucsb.edu")
head(result)
orcid-identifier.uri orcid-identifier.path orcid-identifier.host
https://orcid.org/0000-0002-8504-0049 0000-0002-8504-0049 orcid.org
https://orcid.org/0000-0002-9157-6640 0000-0002-9157-6640 orcid.org
https://orcid.org/0000-0003-0173-8042 0000-0003-0173-8042 orcid.org
https://orcid.org/0000-0002-6274-4852 0000-0002-6274-4852 orcid.org
https://orcid.org/0000-0003-0205-2261 0000-0003-0205-2261 orcid.org
https://orcid.org/0000-0001-5003-731X 0000-0001-5003-731X orcid.org

As can be seen, two of the columns are redundant. Let’s define a function to return a cleaner result.

orcid_query <- function(query, start=NULL, rows=NULL) {
    orcid(query, start, rows) %>%
    as_tibble %>%
    select(id=`orcid-identifier.path`)
}

result <- orcid_query("email:*@ucsb.edu")
head(result)
id
0000-0002-8504-0049
0000-0002-9157-6640
0000-0003-0173-8042
0000-0002-6274-4852
0000-0003-0205-2261
0000-0001-5003-731X

The number of rows returned may be explicitly limited (that’s the purpose of the rows argument above; more on pagination later), but even if not, ORCID limits the rows returned to 1,000 per call. Furthermore, there’s an overall limit of 10,000 rows maximum for any given query. A nice feature is that, regardless of how many rows are returned, the total number is always returned as the found attribute:

attr(result, "found")
[1] 154

(Only 154 ORCID IDs with a @ucsb.edu email address?! Remember that we’re using the public API, so only public information is visible to us. Clearly most people are not making their email addresses public via ORCID.)

Formulating the query

ORCID internally maintains structured records, and in principle its database could be queried for people whose current employment is UCSB. But that level of granularity is not exposed through the API. The only queries supported are freetext search over entire profiles and search over a handful of named fields, such as email as shown previously. For our purpose, affiliation-org-name is the relevant field. Note that this field aggregates all affiliations, including employment, education, and perhaps other types.

It is also possible to search by various types of organization identifiers (GRID, ROR, Ringgold), but these are unlikely to be entered by laypeople. If they appear, they will have been auto-populated by publishers or institutional integrations (the latter not applicable to UCSB), or by ORCID itself when an organization is selected from the menu that pops up as somebody starts typing. So, we stick with searching over the affiliation-org-name text field.

Here are the names for UCSB we’ll look for:

ucsb_names <- c(
    "University of California, Santa Barbara",  # official
    "University of California at Santa Barbara",
    "University of California Santa Barbara",
    "UC Santa Barbara",
    "UCSB"
)

Just out of curiosity, how many IDs are returned for each name?

tibble(
    name=ucsb_names,
    count=map_int(
        ucsb_names,
        function(name) {
            query <- paste('affiliation-org-name:"', name, '"', sep="")
            attr(orcid_query(query), "found")
        }
    )
)
name count
University of California, Santa Barbara 5305
University of California at Santa Barbara 25
University of California Santa Barbara 5305
UC Santa Barbara 70
UCSB 64
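
The query strings used here follow a simple pattern: a field name, a colon, and the phrase in double quotes. A small sketch of a helper that builds them; the name field_query is hypothetical, and the helper makes no attempt to escape quotes embedded in a phrase:

```r
# Hypothetical helper: build one quoted-phrase query per input phrase.
field_query <- function(field, phrases) {
    paste0(field, ':"', phrases, '"')
}

queries <- field_query("affiliation-org-name", c("UC Santa Barbara", "UCSB"))
writeLines(queries)
# affiliation-org-name:"UC Santa Barbara"
# affiliation-org-name:"UCSB"

# Several alternatives can be joined into a single query with OR
combined <- paste(queries, collapse = " OR ")
```

Because paste0 is vectorized over the phrases, the same helper serves both the one-name-at-a-time counts and a combined query.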

Results pagination

Query results are paginated. To retrieve all rows we write a function that requests 200 rows at a time and concatenates them into a single dataframe.

orcid_query_all_results <- function(query) {
    num_results <- attr(orcid_query(query, rows=1), "found")
    reduce(
        map(
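            # Stop at the last valid offset so that no empty page is
            # requested when the total is an exact multiple of 200.
            seq(0, max(num_results - 1, 0), 200),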
            function(offset) {
                orcid_query(query, offset, 200)
            }
        ),
        bind_rows
    )
}
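
The offsets that a paginated retrieval needs can be computed separately from the requests themselves, which makes the edge cases easy to check. A sketch, assuming the 10,000-row overall limit mentioned earlier caps what can be fetched; page_offsets is a hypothetical helper, not part of rorcid:

```r
# Hypothetical helper: zero-based start offsets needed to page through
# `found` results, `page_size` rows at a time.
page_offsets <- function(found, page_size = 200, max_rows = 10000) {
    n <- min(found, max_rows)  # rows past the overall limit are assumed unreachable
    if (n == 0) {
        return(integer(0))
    }
    seq(0, n - 1, by = page_size)
}

page_offsets(450)           # 0 200 400
page_offsets(400)           # 0 200
length(page_offsets(5442))  # 28
```

For a query that finds more rows than the limit allows, the result would be silently truncated, so comparing the number of rows retrieved against the found attribute is a cheap sanity check.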

Final query

Here’s our final query. We look for IDs that have an affiliation that matches any of our UCSB names. This query takes only a few seconds, but for good netiquette we cache the results.

cache_file <- "ids.RData"

if (file.exists(cache_file)) {
    load(cache_file)
} else {
    ucsb_ids <- orcid_query_all_results(
        paste(
            paste('affiliation-org-name:"', ucsb_names, '"', sep=""),
            collapse=" OR "
        )
    )
    save(ucsb_ids, file=cache_file)
}
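
The same load-or-compute pattern recurs for every slow call, so it could be wrapped in a small helper. The sketch below is one possible shape, not what this document uses: with_cache is a hypothetical name, and it stores a single object with saveRDS/readRDS instead of save/load. The slow query is replaced by a stand-in function with made-up behavior.

```r
# Hypothetical helper: return the cached value if `file` exists; otherwise
# call `compute()`, save its result to `file`, and return it.
with_cache <- function(file, compute) {
    if (file.exists(file)) {
        return(readRDS(file))
    }
    value <- compute()
    saveRDS(value, file)
    value
}

# Usage with a stand-in for the slow query
demo_cache <- tempfile(fileext = ".rds")
calls <- 0
slow_query <- function() {
    calls <<- calls + 1  # count how often the "query" really runs
    data.frame(id = c("0000-0002-8504-0049", "0000-0002-9157-6640"))
}
first  <- with_cache(demo_cache, slow_query)
second <- with_cache(demo_cache, slow_query)  # read back from the file
calls  # 1
```

One difference from save/load is that readRDS returns the object, so the caller chooses the variable name; load restores the object under whatever name it was saved with.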

How many IDs did we get?

attr(ucsb_ids, "found")
[1] 5442

Again, these are ORCID IDs that contain any kind of affiliation with UCSB, not necessarily employment, and not necessarily current employment.

Employment data

Getting employment data is super easy: just pass the entire list of ORCID IDs in one batch. This takes a while, on the order of 10 minutes, so we cache the 80 MB of data received.

cache_file <- "employment.RData"

if (file.exists(cache_file)) {
    load(cache_file)
} else {
    employment_data <- orcid_employments(ucsb_ids$id)
    save(employment_data, file=cache_file)
}

The return is a hierarchical list of dataframes whose values contain lists of dataframes whose values contain lists of… the R version of JSON? Fortunately, pluck is able to pick out, for each employment affiliation: the organization name; the department name and position title; and the affiliation end date. We then filter for those records that mention some form of the UCSB name. We also filter for those records that don’t have an affiliation end date, the theory being that those represent current employment. (Of course, it’s entirely possible that people neglected to update their ORCID profiles when they left UCSB, just as they might never have added any dates at all.)

df <- reduce(
        map(employment_data, pluck, "affiliation-group", "summaries"),
        bind_rows
    ) %>%
    as_tibble %>%
    # The path starts with "/" followed by the 19-character ORCID ID
    mutate(id = str_sub(`employment-summary.path`, 2, 20)) %>%
    select(
        id,
        institution=`employment-summary.organization.name`,
        department=`employment-summary.department-name`,
        title=`employment-summary.role-title`,
        end_date=`employment-summary.end-date.year.value`,
    ) %>%
    filter(institution %in% ucsb_names & is.na(end_date)) %>%
    select(id, department, title) %>%
    arrange(id)

head(df)
id department title
0000-0001-5011-4565 International Center for School Based Youth Development Distinguished Emeritus Professor and Research Professor
0000-0001-5013-0944 Chemical Engineering Associate Specialist
0000-0001-5020-8350 Counseling and Psychological Services Psychologist
0000-0001-5059-1804 Bren School of Environmental Science and Management PhD Candidate
0000-0001-5092-6037 Marine Science Institute Postdoctoral Scholar
0000-0001-5097-8259 NA Postdoctoral Scholar
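
The fixed character positions used to pull the ORCID ID out of the path work as long as the path always begins with a slash followed by the ID. A pattern-based extraction is a bit more defensive. This is a sketch in base R: extract_orcid is a hypothetical helper, and the example paths are made up in the shape the positional code implies.

```r
# Hypothetical helper: return the first ORCID ID found in each string,
# or NA where none is present.
extract_orcid <- function(x) {
    pattern <- "[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]"
    m <- regexpr(pattern, x)
    hit <- !is.na(m) & m > 0
    out <- rep(NA_character_, length(x))
    out[hit] <- regmatches(x, m)
    out
}

# Made-up path values for illustration
paths <- c(
    "/0000-0001-5011-4565/employment/1",
    "/0000-0001-5003-731X/employment/2",
    "no id here"
)
extract_orcid(paths)
# "0000-0001-5011-4565" "0000-0001-5003-731X" NA
```

The final character class allows X because the last position of an ORCID ID is a check character that can be a digit or X, as in one of the IDs shown earlier.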

How many records did we get?

nrow(df)
[1] 1059

That is, of the 5442 ORCID IDs that have some kind of UCSB affiliation, 1059 reflect current employment at UCSB.

Well, that’s not exactly correct, because 1059 is the count of employment records returned, not a count of IDs. If somebody has multiple concurrent positions at UCSB they might have multiple employment records. Or, as noted previously, they might have had serial UCSB employments and neglected to add end dates or any dates at all. The number of unique ORCID IDs among the employment records is:

length(unique(df$id))
[1] 1044

So, the vast majority of IDs have one current UCSB employment recorded.
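
To see which IDs account for the difference between the record count and the ID count, the records can be tallied per ID. A sketch on toy data; the ids vector below is a made-up stand-in for the id column of df:

```r
# Made-up stand-in for df$id: one entry per employment record
ids <- c("A", "B", "B", "C", "C", "C", "D")

records_per_id <- table(ids)

# IDs that have more than one open-ended employment record
multi <- names(records_per_id)[records_per_id > 1]
multi  # "B" "C"

# Records in excess of one per ID: the gap between the two counts
length(ids) - length(unique(ids))  # 3
```

Applied to the real data, the excess would equal 1059 - 1044 = 15, spread over at most 15 IDs.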

Grouping by department

Let’s group employment records by department to get a sense of the distribution of ORCID IDs across campus.

df %>%
    group_by(department) %>%
    summarize(count=n()) %>%
    arrange(desc(count), department, .locale="en")
department count
NA 95
Physics 64
Materials 33
Electrical and Computer Engineering 29
Marine Science Institute 29
Mechanical Engineering 29
Chemistry and Biochemistry 28
Geography 26
Chemical Engineering 24
Computer Science 23
Earth Science 21
Psychological and Brain Sciences 18
Mathematics 16
Ecology, Evolution, and Marine Biology 15
Anthropology 14
Bren School of Environmental Science & Management 14
Earth Research Institute 14
Neuroscience Research Institute 14
Bren School of Environmental Science and Management 13
Communication 13
ECE 13
History 13
Ecology, Evolution and Marine Biology 11
Materials Research Laboratory 11
Political Science 11
Chemistry 10
Linguistics 10
Molecular, Cellular, and Developmental Biology 10
English 9
Library 9
Materials Department 9
Sociology 9
Department of Chemical Engineering 8
MCDB 8
Economics 7
Education 7
Global Studies 7
Molecular, Cellular and Developmental Biology 7
Biomolecular Science and Engineering 6
Department of Chemistry and Biochemistry 6
Department of Geography 6
Film and Media Studies 6
Department of Communication 5
Department of Physics 5
National Center for Ecological Analysis and Synthesis 5
Spanish and Portuguese 5
Statistics and Applied Probability 5
Technology Management 5
Cheadle Center for Biodiversity and Ecological Restoration 4
Chemistry & Biochemistry 4
Counseling, Clinical, and School Psychology 4
Department of Linguistics 4
Philosophy 4
Psychological & Brain Sciences 4
Comparative Literature 3
East Asian Languages and Cultural Studies 3
EEMB 3
Electrical Engineering 3
Environmental Studies 3
Environmental Studies Program 3
Feminist Studies 3
French and Italian 3
History of Art and Architecture 3
Materials Science 3
physics 3
Religious Studies 3
Writing Program 3
Applied Probability and Statistics 2
Asian American Studies 2
Bren School 2
California NanoSystems Institute 2
Center for Polymers and Organic Solids 2
Climate Hazards Center 2
Computer Science 2
Computer Science Department 2
Counseling, Clinical, & School Psychology 2
Department of Earth Science 2
Department of Education 2
Department of Statistics and Applied Probability 2
Ecology, Evolution & Marine Biology 2
Electrical & Computer Engineering 2
Graduate Division 2
History of Art & Architecture 2
Institute for Collaborative Biotechnologies 2
Institute for Social, Behavioral, and Economic Research 2
Institute for Terahertz Science and Technology 2
Kavli Institute for Theoretical Physics 2
Long Term Ecological Research Network 2
Marine Sciences Institute 2
Materials 2
Materials Research Lab 2
Molecular, Cellular & Developmental Biology 2
Molecular, Cellular and Developmental Biology Department 2
Music 2
NRI 2
Institute for Collaborative Biotechnologies 1
1Department of Molecular, Cellular and Developmental Biology 1
Action Lab and Brain Imaging Center 1
Admissions 1
AlloSphere Research Group 1
Anthropology 1
Arctic Data Center 1
Art 1
Benioff Ocean Science Laboratory 1
Bioengineering 1
Bioengineering; Mechanical Engineering; Biomolecular Science and Engineering; Molecular, Cellular, and Developmental Biology 1
BioPACIFIC MIP 1
Black Studies 1
Black Studies, Sociology, Asian Studies 1
BMSE 1
Bren School of Environ. Science & Mgmt. 1
Bren School of Environment and Management 1
Bren School of Environmental Science & Management 1
Bren School of Environmental Science & Management; Marine Science Institute 1
BREN SCHOOL OF ENVIRONMENTAL SCIENCE AND MANAGEMENT 1
California NanoSystems Institute (CNSI) 1
Center for Black Studies Research 1
Center for Control, Dynamical Systems, and Computation 1
Center for Innovative Teaching, Research, and Learning 1
Center for Science and Engineering Partnerships 1
Center for Spatial Studies 1
Center for Spatial Studies [KnowWhereGraph Research Project] 1
Chamistry and Biochemistry 1
Chem & Biochem Dept. 1
Chemical Engineering 1
Chemical Engineering Department 1
Chemistry / Biochemistry 1
chemistry and biochemistry 1
Chemistry and Biochemistry 1
Chemistry&Biochemistry 1
Chemsitry 1
Classics 1
Coal Oil Point Reserve 1
College of Creative Studies, Writing and Literature and The Writing Program 1
College of Electrical and Computer Engineering 1
College of Engineering 1
College of Letters & Science 1
Comparative Literature & French Studies 1
Comparative Literature Program 1
Computer science 1
Computer Science / Psychological & Brain Sciences 1
Computer Science and Physics 1
Counseling and Career Services 1
Counseling and Psychological Services 1
Department of Chemical Engineering, Center for Bio-engineering 1
Department of Chemistry & Biochemistry 1
Department of Chicana and Chicano Studies 1
Department of Computer Science 1
Department of Earth Science and Earth Research Institute 1
Department of Ecology, Evolution, & Marine Biology 1
Department of Geography, Computer Science, Statistics 1
Department of Global Studies 1
Department of Materials 1
Department of Mathematics 1
Department of Mechanical Engineering 1
Department of Molecular, Cellular and Developmental Biology 1
Department of Political Science 1
Department of Psychological and Brain Sciences 1
Department of Psychology 1
Dept of Statistics and Applied Probability 1
Dept. of Electrical and Computer Engineering 1
Deptartment of Geography 1
Derpartment of mathematics 1
Dynamical Neuroscience 1
Earth Research Institute & Department of Geography 1
Earth Science, Biology and Marine Science Institute 1
Earth Science; Marine Science Inst 1
Earth Sciences 1
Ecology evolution and marine biology 1
Ecology Evolution and Marine Biology, Marine Science Institute 1
Ecology, Evolution, & Marine Biology 1
Ecology, Evolution, and Marine Biology 1
Ecology, Evolution, Marine Biology 1
economics 1
Education 1
EEMB/ES 1
Elecrical and Computer Engineering 1
Electrical & Computer Engineering Department 1
Electrical and Computer 1
electrical and computer engineering 1
Electrical and computer engineering 1
Electrical and COmputer Engineering 1
Electrical and Computer Enginering 1
Electrical and computer science 1
Electronic and Computer Engineering 1
Environmental Markets Lab 1
Environmental Sciences/Management 1
ETS 1
Evolution, Ecology, and Marine Biology 1
Film and Media 1
FRIT 1
Geography & Enviromental Studies 1
Geography Department 1
Germanic and Slavic 1
Gevirtz Graduate School of Education 1
Gevirtz Graduate School of Education (GGSE) 1
Gevirtz School of Education 1
Global Studies & Sociology 1
History 1
History & Anthropology 1
Humanities and Social Change Center 1
ICB/CEEM/CHEM ENG 1
IGPMS 1
Information Technology Services 1
Institute for Collaborative Biotechnologies 1
Institute for Energy Efficiency 1
Interdepartmental Graduate Program for Marine Science 1
International Center for School Based Youth Development 1
ISBER 1
Librarian 1
Library 1
Maine Science Institute 1
Marie Science Institute 1
Marine Science 1
Marine Science Institue 1
Marine Science Institute 1
Marine Science Institute- Sustainable Fisheries Group 1
Material Science 1
Materials and Mechanical Engineering 1
Materials Research Laboratory; BioPACIFIC 1
Materials/California NanoSystems Institute 1
MCD Biology 1
MCDB, Housing, Police 1
MCDB/ NRI 1
MCDB/NRI 1
McNair Scholars Program 1
Mechanical engineering 1
Mechanical Engineering 1
Mechanical Engineering and California NanoSystems Institute (CNSI) 1
Mechanical Engineering and Computer Science 1
Mechanical Engineering and Mathematics 1
Media Arts & Technology 1
Media Neuroscience Lab - Department of Communication & Department of Psychological and Brain Sciences 1
Molecular Cell & Developmental Biology 1
Molecular Cellular and Developmental Biology 1
Molecular, Cellular & Developmental Biology / Psychological & Brain Sciences 1
Molecular, Cellular and Developmental Biology 1
Molecular, Cellular and Developmental Biology (MCDB) . UCSB Neuroscience 1
Molecular, Cellular and DEvelopmentla Biology 1
Molecular, Cellular, & Developmental Biology 1
MRL 1
National Center for Ecological Analysis and Synthesis (NCEAS) and Earth Research Institute (ERI) 1
Natural Reserve System 1
Neuroscience Research Institut 1
Neuroscience Research Institute 1
Neuroscience Research Institute; Molecular, Cellular, and Developmental Biology 1
Office of Education Parntership 1
Office of Education Partnerships 1
Office Of Teaching and Learning, Center for Innovative Teaching, Research and Learning 1
Orientation Services 1
Patterson Group 1
philosophy 1
Philosophy Department 1
Physics 1
Physics and Biomolecular Science & Engineering 1
Physics, Materials 1
Political science, sociology 1
Psych and Brain Sciences 1
Psychological & Brain Science 1
Quantum Foundry 1
religious studies 1
Research Data Services 1
Richardson lab 1
Sociology and Global Studies 1
Spanish & Portuguese Department 1
Spanish and Portuguese Department 1
Special Research Collections, UCSB Library 1
Statistics & Applied Probability 1
Sustainable Fisheries Group 1
Technology Management Program 1
Technology Management, College of Engineering 1
Theater and Dance 1
UC Natural Reserve System 1
UCSB Library 1
UCSB Library DREAM Lab 1
UCSB Library Research Data Services 1
University Library 1
Veterans and Military Services 1
Vision Research Lab in Electrical and Computer Engineering 1
Weimbs Laboratory 1


Well, that’s the usual freetext mess. If you look closely, multiple variants of the same department name occur, there are varying abbreviations and typos, multiple departments are listed in the same record, and so forth.
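
Some of this variation is purely mechanical (letter case, "&" versus "and", a leading "Department of", stray punctuation) and can be removed with a few substitutions before any fuzzy matching. A rough sketch in base R; normalize_department is a hypothetical helper and its rules are only a guess at what is worth normalizing:

```r
# Hypothetical helper: reduce a department name to a canonical lowercase form.
normalize_department <- function(x) {
    x <- tolower(x)
    x <- gsub("&", " and ", x, fixed = TRUE)
    x <- gsub("[[:punct:]]", " ", x, perl = TRUE)
    x <- gsub("\\b(department|dept)( of)?\\b", " ", x, perl = TRUE)
    x <- gsub("\\s+", " ", x, perl = TRUE)
    trimws(x)
}

# Variants taken from the table above
variants <- c(
    "Chemistry & Biochemistry",
    "Department of Chemistry and Biochemistry",
    "Chemistry&Biochemistry",
    "chemistry and biochemistry"
)
unique(normalize_department(variants))
# "chemistry and biochemistry"
```

Rules like these do nothing for typos, abbreviations, or records that list several departments, so they complement an approximate-matching step without replacing it.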

Cleaning up department names

We can clean up the department names by classifying them against a list of known-good names (obtained mostly from here). For a distance metric we use Levenshtein edit distance.

seen_names <- df %>%
    select(name=department) %>%
    drop_na %>%
    filter(str_detect(name, "[a-z]")) %>%  # remove pure acronyms
    mutate(name_lc=str_to_lower(name)) %>%
    mutate(name_lc=str_replace(name_lc, "department( of)?", ""))

good_names <- read_csv("departments.csv") %>%
    mutate(name_lc=str_to_lower(name))

m <- stringdistmatrix(
    seen_names$name_lc,
    good_names$name_lc,
    method="lv"
)

by_row <- 1
seen_names$classified <- good_names$name[apply(m, by_row, which.min)]

seen_names %>%
    select(department=classified) %>%
    group_by(department) %>%
    summarize(count=n()) %>%
    arrange(desc(count))

Here’s the cleaned-up list.

department count
Physics 73
Chemistry and Biochemistry 46
Materials 46
Electrical and Computer Engineering 43
Chemical Engineering 38
Marine Science Institute 38
Ecology, Evolution, and Marine Biology 35
Mechanical Engineering 35
Geography 34
Computer Science 33
Molecular, Cellular, and Developmental Biology 32
Bren School of Environmental Science & Management 31
Psychological & Brain Sciences 27
Earth Science 26
History 26
Anthropology 18
Communication 18
Earth Research Institute 18
Mathematics 18
Neuroscience Research Institute 17
Materials Reseach Laboratory 15
Library 14
Linguistics 14
Political Science 14
Education 12
Black Studies 10
Sociology 10
Economics 9
English 9
Environmental Studies 9
Statistics and Applied Probability 9
Biomolecular Science and Engineering 8
Film and Media Studies 8
Long Term Ecological Research Network 7
Spanish and Portuguese 7
Center for Spatial Studies 6
Comparative Literature 6
Counseling, Clinical, and School Psychology 6
Institute for Collaborative Biotechnologies 6
Philosophy 6
Technology Management 6
California NanoSystems Institute 5
Cheadle Center for Biodiversity and Ecological Restoration 5
History of Art and Architecture 5
Marine Science 5
Religious Studies 5
Asian American Studies 4
Biology 4
Center for Science and Engineering Partnerships 4
Gevirtz Graduate School of Education 4
Writing Program 4
Chicana and Chicano Studies 3
Counseling & Psychological Services 3
East Asian Languages and Cultural Studies 3
Feminist Studies 3
French and Italian 3
Institute for Energy Efficiency 3
AlloSphere Research Group 2
Classics 2
College of Engineering 2
Institute for Terahertz Science and Technology 2
Music 2
Office of Education Partnerships 2
Art 1
Center for Control, Dynamical Systems, and Computation 1
College of Letters & Science 1
Germanic and Slavic Studies 1
Information Technology Services 1
International Center for School Based Youth Development 1
Media Arts and Technology 1
Quantum Foundry 1
Theater and Dance 1
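
One caveat with nearest-name classification is that which.min always picks something, so a name with no real counterpart in the good list (a lab name, say) is still assigned to whichever entry happens to be closest. A possible refinement is to reject matches whose distance is large relative to the string length. The sketch below is not what produced the table above: it uses base R's adist (generalized Levenshtein distance) instead of stringdist, classify_names is a hypothetical helper, the 0.3 cutoff is arbitrary, and the good list is a made-up three-entry example.

```r
# Hypothetical helper: nearest good name by edit distance, or NA when the
# nearest candidate is still too far away.
classify_names <- function(seen, good, max_rel_dist = 0.3) {
    d <- adist(tolower(seen), tolower(good))  # rows: seen, columns: good
    nearest <- apply(d, 1, which.min)
    best <- d[cbind(seq_along(seen), nearest)]
    # Scale by the longer of the two strings so long names are not penalized
    rel <- best / pmax(nchar(seen), nchar(good[nearest]))
    ifelse(rel <= max_rel_dist, good[nearest], NA_character_)
}

good <- c("Physics", "Chemistry", "Marine Science Institute")  # made-up list
seen <- c("Chemsitry", "Marie Science Institute", "physics", "Patterson Group")
classify_names(seen, good)
# "Chemistry" "Marine Science Institute" "Physics" NA
```

Names that come back as NA could then be reviewed by hand or counted separately, so they do not inflate the count of an unrelated department.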

Grouping by title

We can similarly group the employment records by position title, which might give us a sense of the extent to which different groups of people are using ORCID. Note that role/title isn’t populated as frequently as department in ORCID profiles.

df %>%
    group_by(title) %>%
    summarize(count=n()) %>%
    arrange(desc(count))
title count
NA 156
Professor 123
Assistant Professor 76
Associate Professor 44
Graduate Student Researcher 43
Postdoctoral Scholar 34
Graduate Student 33
PhD Student 26
Postdoctoral Researcher 26
Postdoc 21
Teaching Assistant 16
PhD Candidate 13
Distinguished Professor 11
Postdoctoral researcher 11
Postdoctoral Fellow 10
Lecturer 8
Research Associate 8
PhD student 7
Project Scientist 7
Assistant Project Scientist 6
Assistant Researcher 6
Researcher 6
Graduate Researcher 5
Graduate student 5
Ph.D. Student 5
Postdoctoral fellow 5
Postdoctoral scholar 5
Research Assistant 5
Adjunct Professor 4
Assistant Professor 4
Assistant Teaching Professor 4
Associate Specialist 4
Graduate Research Assistant 4
Graduate Student Researcher 4
Graduate Teaching Assistant 4
Postdoctoral Associate 4
Postdoctoral Researcher 4
Professor Emeritus 4
Undergraduate Researcher 4
Doctoral Candidate 3
Postdoc Researcher 3
Postdoctoral Scholar 3
Project Researcher 3
Staff Research Associate 3
Student 3
graduate student 3
professor 3
Assistant Specialist 2
Assistant professor 2
Associate Researcher 2
Associate Teaching Professor 2
Director 2
Doctoral student 2
Full Professor 2
Grad Student 2
Grad student 2
Graduate Student 2
Graduate Student Reseacher 2
Graduate student researcher 2
Junior Specialist 2
Librarian 2
Ph.D. Candidate 2
PhD 2
PhD Candidate 2
Post Doc 2
Post doc 2
Post-doctoral Scholar 2
Postdoc Fellow 2
Postdoc Scholar 2
Postdoctoral Research Fellow 2
Professor 2
Research Fellow 2
Teaching Associate 2
Visiting Assistant Professor 2
postdoc 2
Professor 1
Academic Affiliate 1
Academic Coordinator 1
Admissions Reader 1
Affiliate Academic Scholar 1
Affiliated Researcher 1
Anne and Michael Towbes Graduate Dean 1
Argyropoulos Professor of Hellenic Studies 1
Assc Project Scientist 1
Assistant Professor of Quantitative and Systems Biology 1
Assistant Professor of Teaching 1
Assistant Project Scientist 1
Assistant Research Specialist 1
Assistant Researcher IV 1
Assistant Specialist I 1
Assoc. Professor 1
Associate Dean 1
Associate Dean for Diversity, Equity, and Inclusion 1
Associate Director 1
Associate Professor 1
Associate Professor and Professor 1
Associate Professor of Writing 1
Associate Professor, Professor, Distinguished Professor 1
Associate Research Biologist 1
Associate Research Specialist-II 1
Associate Specialist I 1
Associate Specialist II 1
Associate Specialist, Caselle Lab 1
Associate University Librarian Digital Strategies 1
Associate specialist 1
Associated Research Specialist 1
Center Associate 1
Co-Director Center for Stem Cell Biology and Engineering 1
Community Engagement and Outreach 1
Coordinator, Veterans and Military Services 1
Curator 1
Curator, California Ethnic & Multicultural Archives (CEMA) 1
Data Services Librarian 1
Data Specialist 1
Dean 1
Deputy CIO 1
Digital Humanities Research Facilitator 1
Dirctor 1
Director of Informatics R&D 1
Director of Teaching & Learning 1
Director, California Global Education Project 1
Director, Carpinteria Salt Marsh Reserve 1
Director, Graduate Diversity Programs 1
Director, Network Office 1
Director, Santa Cruz Island Reserve 1
Distinguished Emeritus Professor and Research Professor 1
Distinguished Professor Emerita 1
Distinguished Professor of French 1
Distinguished Visiting Professor 1
Doctoral Student 1
Earth & Env. Sciences Research Facilitator 1
Editor-in-Chief 1
Education Specialist Credential Coordinator 1
Education/Outreach Coordinator 1
Elings Prize Fellow 1
Emeritus Professor 1
Executive Director 1
Facility Manager 1
Front Desk Associate 1
Fulbright Fellow 1
Fulbright visiting postdoc scholar 1
GIS Web Developer 1
GSR 1
Geospatial Data Curator 1
Grad Student Researcher 1
GradStudent 1
Graduate Research Student 1
Graduate Student / Teaching Associate 1
Graduate Student Researched 1
Graduate Student Researcher & Teaching Assistant 1
Graduate Student Researcher (PhD) 1
Graduate Student Researcher/Teaching Assistant and Associate 1
Graduate Student/Teaching Assistant 1
Graduate Teaching Associate & Teaching Assistant 1
Graduate research assistant 1
Graduate student researcher 1
Grauate Research Assistant 1
HPC Specialist 1
Harriman Professor of Neuroscience, Dept of Molecular, Cellular & Developmental Biology, University of California, Santa Barbara 1
Inclusion and Access Officer 1
Independent Contractor/Senior Advisor 1
Instruction & Reference Librarian, Librarian for Communication & News 1
Janed and Ian Duncan Endowed Chair Professor of Actuarial Science 1
K-12 Programs Director 1
Katherine Esau Director 1
Lab Assistant III 1
Lab Manager 1
Latin American/Iberian Studies Librarian 1
Lecturer PSOE 1
Lecturer in Ecology, Applied Marine Ecology, Aquatic Biology 1
Lecturer of French 1
Lecturer w Security of Employment 1
Lecturer/Organic Chemistry Lab Director 1
Librarian (previously Associate Librarian; multiple positions) 1
Marie Curie Post-Doctoral Fellow 1
Mellichamp Chair and Distinguished Professor 1
Mellichamp Professor of Global Governance 1
Microscopy Facility Director 1
Monitoring Coordinator 1
Moore Postdoctoral Fellow 1
Moving Image Collections Curator and Film and Media Studies Librarian 1
Museum scientist/botanical research 1
NASA Research Fellow 1
NSF Postdoctoral Fellow 1
NSF Postdoctoral Researcher 1
NSF Research Fellow, PhD Candidate 1
POSTDOC 1
Ph.D Candidate 1
Ph.D. Candidate 1
PhD Candidate / Researcher / Teaching Assistant 1
PhD Student 1
Post Doctor 1
Post Doctoral Fellow 1
Post doc fellow 1
Post doctoral fellow 1
Post doctoral researcher 1
Post-Doc 1
Post-Doc and Lecturer 1
Post-Doctoral Fellow 1
Post-Doctoral Research Scholar 1
Post-Doctoral Researcher 1
Post-Doctorant 1
Post-doctoral fellow 1
Post-doctoral scholar 1
PostDoc 1
Postdoctoral Fellow 1
Postdoctoral Investigator 1
Postdoctoral Researcher - CONACyT (Mexico) Fellow 1
Postdoctoral felow 1
Postdoctoral research scholar 1
Postdoctoral research scientist 1
Postodc 1
Principal Experimentalist 1
Professor & Graduate Director 1
Professor & Mellichamp Chair in Racial Environmental Justice 1
Professor / Chair 1
Professor and Chair 1
Professor and Kundan Kaur Kapany Chair for Sikh and Punjab Studies 1
Professor emeritus 1
Professor of French 1
Professor of Physics 1
Professor of Sociology 1
Professor of Technology Management 1
Professor of physics 1
Professor, 1971– present 1
Professor, Dangermond Chair in Conservation Science 1
Program Director 1
Project scientist 1
Psychologist 1
Psychologist II 1
Research & Engagement Librarian 1
Research Affiliate 1
Research Assistant 1
Research Assistant for Aline Ferreira 1
Research Assistant, Miller Memory Lab 1
Research Biologist 1
Research Computing Specialist 1
Research Ecologist 1
Research Facilitator - Social Sciences 1
Research Faculty 1
Research Geophysicist 1
Research Intern 1
Research Professor 1
Research Professor and Emeritus 1
Research Scientist 1
Research Seismologist 1
Research Specialist 1
Research assistant, Sprague Perception, Cognition, and Action Lab 1
Research fellow 1
Resource Manager 1
SRAIII 1
STEM Education Coordinator 1
Science Software Engineer 1
Senior Evaluation Strategist 1
Senior Fellow 1
Senior Manager 1
Senior Project Scientist 1
Senior lecturer 1
Software Engineer 1
Spanish foreign language instructor 1
Specialist III 1
Specialist Research Scientist 1
Staff Research Associate IV / Lab Manager 1
Staff Researcher 1
TA 1
TA / Associate 1
TA/GSR 1
Teaching Assistant 1
Teaching Assistant /Graduate Student Researcher 1
Teaching Assistant/Associate 1
Teaching Associate / Graduate Student Researcher 1
Teaching Professor 1
Teaching/ Research Assistant 1
Tenured Full Professor 1
USGS Mendenhall Postdoctoral Fellow 1
Undergraduate Research Assistant 1
University Librarian 1
Visiting Associate Professor 1
Visiting Researcher 1
Visiting Scholar/Researcher 1
Visiting assistant professor 1
Wilson Lab Researcher and Engineer 1
graduate Student 1
post-doc 1
postdoctoral scholar 1
professor emeritus 1
project scientist 1
research scientist 1
visiting scholar 1


Let’s consolidate these varying descriptions into a few broad categories as follows.

match <- Vectorize(
    function(title, patterns) {
        # Return TRUE if `title` contains any of the given patterns as a
        # substring (this relies on str_like() ignoring case)
        any(
            map_lgl(
                patterns,
                \(p) str_like(title, paste("%", p, "%", sep=""))
            )
        )
    },
    "title"
)

df %>%
    drop_na(title) %>%
    mutate(
        category=case_when(
            match(
                title,
                c("professor", "lecturer", "instructor", "dean")
            ) ~ "faculty",
            match(
                title,
                c("student", "graduate", "teaching", "TA", "PhD",
                  "candidate")
            ) ~ "student",
            match(
                title,
                c("post", "fellow")
            ) ~ "postdoc",
            match(
                title,
                c("research", "specialist", "scientist", "director",
                  "coordinator", "librarian", "curator", "associate",
                  "manager", "engineer", "developer")
            ) ~ "staff",
            .default="other"
        )
    ) %>%
    group_by(category) %>%
    summarize(count=n()) %>%
    arrange(desc(count))
category count
faculty 341
student 273
postdoc 176
staff 98
other 15
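
Because the patterns are matched as case-insensitive substrings, a short pattern such as "TA" also matches the letters "ta" inside words like "Staff" or "Data", which can pull some staff titles into the student category. A variant that matches whole words avoids this. The sketch below uses base R regular expressions: categorize_title is a hypothetical helper, the word lists are adapted from the ones above, and its counts would differ from the table shown.

```r
# Hypothetical helper: assign each title to the first category whose
# whole-word pattern matches, falling back to "other".
categorize_title <- function(title) {
    rules <- list(
        faculty = "\\b(professor|lecturer|instructor|dean)\\b",
        student = "\\b(student|graduate|teaching|ta|phd|candidate)\\b",
        postdoc = "\\b(post\\w*|fellow)\\b",
        staff   = paste0(
            "\\b(research\\w*|specialist|scientist|director|coordinator|",
            "librarian|curator|associate|manager|engineer|developer)\\b"
        )
    )
    out <- rep("other", length(title))
    # Apply rules last-to-first so that earlier rules take precedence
    for (category in rev(names(rules))) {
        hit <- grepl(rules[[category]], title, ignore.case = TRUE, perl = TRUE)
        out[hit] <- category
    }
    out
}

# Titles taken from the table above
titles <- c(
    "Assistant Professor", "TA/GSR", "Data Specialist",
    "Postdoctoral Scholar", "Staff Research Associate", "Psychologist"
)
categorize_title(titles)
# "faculty" "student" "staff" "postdoc" "staff" "other"
```

Whole-word matching trades one kind of error for another: misspelled titles such as "Postodc" no longer match anything and land in "other".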