Data collection and measurement
Our data collection approach focuses on browser activity data, which provide important advantages relative to the history data provided by the web browser's WebExtension Application Programming Interface (API). The browser APIs report the time when a given web page was first opened and the time when a user makes a transition from that page to another page (e.g., by clicking a link). To account for duplicate data, we dropped additional page views of the same URL within 1 s of the prior page view on the assumption that the user refreshed the page (43). However, the APIs do not report the total dwell time on a given web page taking into account changes in the active browser tab. For example, if someone opens web page A in a tab, then opens web page B in another tab, and then switches their browser tab back to A, the browser history APIs will not register this shift in attention, making it difficult to obtain accurate estimates of time spent on a given web page. Our passive monitoring records all changes in the active tab, allowing us to overcome this issue. (In the Supplementary Materials, we validate our browser activity data against browser history data from the extension.)
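As an illustration, here is a minimal sketch of one reading of the deduplication rule, assuming page views arrive as time-ordered (timestamp, URL) pairs; the function name and data layout are our own, not the paper's pipeline:

```python
def drop_refresh_duplicates(page_views, window_s=1.0):
    """Drop repeat views of the same URL within `window_s` seconds of the
    prior page view, treating them as refreshes. `page_views` is a list of
    (unix_timestamp, url) tuples sorted by timestamp."""
    kept = []
    for ts, url in page_views:
        if kept and url == kept[-1][1] and ts - kept[-1][0] <= window_s:
            continue  # assumed to be a refresh of the prior page view
        kept.append((ts, url))
    return kept
```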
In this article, we describe YouTube “views,” “consumption,” and “exposure” using the browser activity data described above. As with any passive behavioral data, we cannot verify that every user saw the content that appeared on their device in every instance.
We measured the amount of time a user spent on a given web page by calculating the difference between the timestamp of the page in question and that of the next page they viewed. This measure is imperfect because we lack a measure of eye gaze or any other proxy for active viewing. Although some participants might rewind and rewatch portions of a video, we are more concerned that our measure overstates watch time when users leave their browser idle. We therefore refine the measure by capping time spent at the length of the video in question (obtained from the YouTube API).
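A sketch of this calculation with the cap applied (video lengths would come from the YouTube API; names are illustrative):

```python
def capped_watch_time(view_ts, next_view_ts, video_length_s):
    """Estimate time spent on a video page as the gap to the next page view,
    capped at the video's length to avoid counting idle browser time."""
    return min(next_view_ts - view_ts, video_length_s)
```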
We measure which channels users subscribed to by extracting additional information from the HTML snapshots of the videos they watched. Specifically, we parsed the subscribe button in each HTML snapshot; it reads "subscribe" when the participant was not subscribed to the video's channel at the time the video was watched and "subscribed" when they already were. Because we must use this indirect method to infer channel subscriptions, we do not know the full set of channels to which participants subscribe. In particular, not all recommended videos in our dataset were viewed by participants, so we could not determine the subscription status for all recommended videos.
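A sketch of the button parsing, assuming the snapshot's subscribe button renders as visible "Subscribe"/"Subscribed" text; the selector is illustrative only, as YouTube's markup changes over time:

```python
import re
from bs4 import BeautifulSoup

def subscription_status(html_snapshot):
    """Return True if the snapshot shows "Subscribed", False for "Subscribe",
    or None if no subscribe button text is found."""
    soup = BeautifulSoup(html_snapshot, "html.parser")
    text = soup.find(string=re.compile(r"^\s*subscribed?\s*$", re.IGNORECASE))
    if text is None:
        return None
    return text.strip().lower() == "subscribed"
```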
We denote the web page that a participant viewed immediately before viewing a YouTube video as the referrer. We are unable to measure HTTP referrer headers using our browser extension, so we instead rely on browser activity data to identify referrers to YouTube videos. Using prior browsing history as a proxy in this way is a common approach to analyzing people's behavior on the web (33, 44).
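A sketch of the referrer lookup over the time-ordered activity stream (a proxy for, not a measurement of, the HTTP Referer header):

```python
def find_referrer(page_views, video_url):
    """Return the URL viewed immediately before the first view of `video_url`
    in a time-ordered activity stream, or None if there is no prior page."""
    for i, (_, url) in enumerate(page_views):
        if url == video_url:
            return page_views[i - 1][1] if i > 0 else None
    return None
```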
All analyses of the percentage of recommendations seen or followed are based on the full set of recommendations that we could extract from each video. The mean number of recommended videos captured was 17.9, and the median was 20, which aligns with the default number of recommendations shown on a YouTube video (20) at the time our study was conducted.
Channel definitions and measurement
Following studies of information consumption online that rely on ratings of content quality at the domain level (32, 33), we construct a typology of YouTube channel types to measure participant exposure. Given that YouTube has tens of millions of channels and that the types of content we are interested in are relatively rare, it is necessary to rely on the judgment of experts to help us identify alternative, extremist, and mainstream media channels. We use the resulting channel lists to classify all videos to which our participants are exposed as coming from an alternative channel, an extremist channel, a mainstream media channel, or some other type of channel ("other"). The process by which these channel lists were defined and compiled is described further below; the Supplementary Materials provide more detail on the procedures used by these experts to label channels.
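A minimal sketch of the resulting classification step, assuming the expert lists are loaded as sets of YouTube channel IDs (the IDs and the precedence order here are placeholders):

```python
# Expert-curated lists keyed by YouTube channel ID (IDs below are placeholders).
ALTERNATIVE_CHANNELS = {"UC_alt_example"}
EXTREMIST_CHANNELS = {"UC_ext_example"}
MAINSTREAM_CHANNELS = {"UC_msm_example"}

def classify_channel(channel_id):
    """Assign a video's channel to one of the four exposure categories.
    The precedence order is our assumption; it matters only if a channel
    somehow appears on more than one list."""
    if channel_id in EXTREMIST_CHANNELS:
        return "extremist"
    if channel_id in ALTERNATIVE_CHANNELS:
        return "alternative"
    if channel_id in MAINSTREAM_CHANNELS:
        return "mainstream"
    return "other"
```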
In our typology, alternative channels discuss controversial topics through a lens that attempts to legitimize discredited views by casting them as marginalized viewpoints (despite the channel owners often identifying as white and/or male). Our list combines the 223 channels classified by Ledwich and Zaitsev (26) as Men's Rights Activists or Anti-Social Justice Warriors, the 141 Intellectual Dark Web and Alt-lite channels from Ribeiro et al. (24), and the 24 channels from Lewis' Alternative Influence Network (35). After removing duplicates, our alternative channel list contains 322 channels, of which 68 appeared on two source lists and nine appeared on three. Example alternative channels in our typology include those hosted by Steven Crowder, Tim Pool, Laura Loomer, and Candace Owens. Joe Rogan's is the most prominent alternative channel in our typology (it appears on all three source lists), accounting for 11.6% (95% CI, 11.3 to 12.0) of all visits and 26.0% (95% CI, 26.0 to 26.1) of all time spent on alternative channel videos.
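A sketch of the de-duplication and overlap accounting, assuming each source list is an iterable of channel IDs; the same approach applies to the extremist list described next (variable names in the usage comments are hypothetical):

```python
from collections import Counter

def count_sources(*source_lists):
    """For each channel ID, count how many of the expert source lists include it."""
    counts = Counter()
    for source in source_lists:
        counts.update(set(source))  # de-duplicate within each source first
    return counts

# Hypothetical usage:
# counts = count_sources(ledwich_zaitsev_mra_asjw, ribeiro_idw_altlite, lewis_ain)
# merged = set(counts)                # de-duplicated channel list
# overlap = Counter(counts.values())  # how many channels appear on 1, 2, or 3 lists
```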
Our list of extremist channels consists of those labeled as white identitarian by Ledwich and Zaitsev (26) (30 channels), white supremacist by Charles (45) (23 channels), alt-right by Ribeiro et al. (24) (37 channels), extremist or hateful by the Center on Extremism at the Anti-Defamation League (16 channels), and those compiled by journalist Aaron Sankin from lists curated by the Southern Poverty Law Center, the Canadian Anti-Hate Network, the Counter Extremism Project, and the white supremacist website Stormfront (157 channels) (46). After removing duplicates, our extremist channel list contains 290 channels, of which 36.2% appeared on two or more source lists. Example extremist channels include those hosted by Stefan Molyneux, David Duke, Mike Cernovich, and Faith J. Goldy.
As the examples above suggest, the potentially harmful alternative and extremist channels identified by scholarly and subject matter experts are predominantly from the (far) right in the United States. Other forms of extremism exist, of course, especially outside the United States (e.g., Islamic extremism).
Following prior research, we define both alternative and extremist channels as potentially harmful (2, 26, 35, 45). Of the 302 alternative and 213 extremist channels that were still available on YouTube as of January 2021 (i.e., they had not been taken down by the owner or by YouTube), videos from 208 alternative and 55 extremist channels were viewed by at least one participant in our sample. We are not making these lists publicly available to avoid directing attention to them but are willing to privately share them with researchers and journalists upon request.
To create our list of mainstream media channels, we collected news channels from Buntain et al. (47) (65 mainstream news sources), Ledwich and Zaitsev (26) (75 mainstream media channels), Stocking et al. (48) (81 news channels), Ribeiro et al. (24) (68 popular media channels), Eady et al. (49) (219 national news domains), and Zannettou et al. (50) (45 news domains). When authors provided only websites (24, 36, 49), we manually found the corresponding YouTube channels via YouTube search. In cases where news organizations have multiple YouTube channels (e.g., Fox News and Fox Business), all YouTube channels under the parent organization were included. Any channels appearing in fewer than three of these sources were omitted. Last, we also included channels that were featured on YouTube's News hub (www.youtube.com/channel/UCYfdidRxbB8Qhf0Nx7ioOYw) from 10 February to 5 March 2021.
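Reusing count_sources from the sketch above, the screen that omits channels appearing in fewer than three sources is a simple threshold on the per-channel counts:

```python
def at_least_k_sources(counts, k=3):
    """Keep channels appearing in at least k of the collected news source lists.
    `counts` maps channel ID -> number of source lists (as from count_sources)."""
    return {channel for channel, n in counts.items() if n >= k}
```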
The resulting list of mainstream media channels was then checked to identify those that meet all of the following criteria (a screening sketch follows the list):
1) They must publish credible information, which we define as having a NewsGuard score greater than 60 (www.newsguardtech.com) and not being associated with any "black" or "red" fake news websites listed in Grinberg et al. (32).
2) They must meet at least one criterion for mainstream media recognition or distribution, which we define as having national print circulation, having a cable TV network, being part of the White House press pool, or having won or been nominated for a prestigious journalism award (e.g., Pulitzer Prize, Peabody Award, Emmy, George Polk Award, or Online Journalism Award).
3) They must be a United States–based organization with national news coverage.
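A minimal sketch of the screening step, assuming each candidate channel carries the relevant metadata as fields; the dataclass and field names are illustrative, not part of the paper's pipeline:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    newsguard_score: float   # from NewsGuard (www.newsguardtech.com)
    on_fake_news_list: bool  # "black"/"red" sites in Grinberg et al. (32)
    has_recognition: bool    # print circulation, cable network, press pool, or major award
    us_based_national: bool  # US organization with national coverage

def is_mainstream(c: Candidate) -> bool:
    """A channel must satisfy all three criteria above."""
    credible = c.newsguard_score > 60 and not c.on_fake_news_list
    return credible and c.has_recognition and c.us_based_national
```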
Our final mainstream media list consists of 127 YouTube channels. We then placed all YouTube channels in our dataset that did not fall into one of these three categories (alternative, extremist, or mainstream media) into a residual category that we call “other.” (These may include alternative, extremist, or mainstream media that were missed by the processes described above.)
Survey measures of racial resentment and hostile sexism
We measure anti-Black animus with a standard four-item scale intended to measure racial resentment (40). For example, respondents were asked whether they agree or disagree with the statement "It's really a matter of some people just not trying hard enough: If blacks would only try harder, they could be just as well off as whites." Responses are provided on a five-point agree/disagree scale and coded such that higher numbers represent more resentful attitudes. A respondent's racial resentment score is the average of these four questions. Responses to these questions are taken from respondent answers to the 2018 CCES (as noted above, participants were largely recruited from the pool of previous CCES respondents).
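A sketch of the scoring, assuming all four items are already coded so that higher values indicate more resentment (the published scale includes reverse-worded items that would need recoding first):

```python
def racial_resentment_score(items):
    """Average of four 5-point items (1 = least, 5 = most resentful).
    Returns a score in [1, 5]."""
    assert len(items) == 4
    return sum(items) / len(items)
```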
We operationalized hostile sexism using two items from a larger scale that was also asked on the 2018 CCES (41). For example, one of the questions asks whether respondents agree or disagree with the statement "When women lose to men in a fair competition, they typically complain about being discriminated against." Responses are provided on a five-point agree/disagree scale and coded such that higher numbers represent more hostile attitudes.
All other question wording is provided in the survey codebook in the Supplementary Materials. Racial resentment and hostile sexism measures were also included in our 2020 survey; responses showed a high degree of persistence over time [r = 0.92 (95% CI, 0.91 to 0.92) for racial resentment; r = 0.79 (95% CI, 0.78 to 0.81) for hostile sexism]. The two measures, which we refer to as measuring "resentment" or identifying "resentful" users following, e.g., Banda and Cassese (51) and Schaffner (52), were highly correlated with each other as well (r = 0.84).
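The paper does not state how the confidence intervals for these correlations were computed; a common approach is the Fisher z transformation, sketched here with a hypothetical sample size:

```python
import math

def pearson_r_ci(r, n, z_crit=1.96):
    """95% CI for a Pearson correlation via the Fisher z transformation.
    `n` is the number of paired observations."""
    z = math.atanh(r)          # Fisher transform
    se = 1 / math.sqrt(n - 3)  # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

# Example: r = 0.92 with a hypothetical n = 1,000 pairs
print(pearson_r_ci(0.92, 1000))
```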