“Is it a Qoincidence?”: An Exploratory Study of QAnon on Voat

Online fringe communities offer fertile grounds to users seeking and sharing ideas fueling suspicion of mainstream news and conspiracy theories. Among these, the QAnon conspiracy theory emerged in 2017 on 4chan, broadly supporting the idea that powerful politicians, aristocrats, and celebrities are closely engaged in a global pedophile ring. Simultaneously, governments are thought to be controlled by “puppet masters,” as democratically elected officials serve as a fake showroom of democracy. This paper provides an empirical exploratory analysis of the QAnon community on Voat.co, a Reddit-esque news aggregator, which has captured the interest of the press for its toxicity and for providing a platform to QAnon followers. More precisely, we analyze a large dataset from /v/GreatAwakening, the most popular QAnon-related subverse (the Voat equivalent of a subreddit), to characterize activity and user engagement. To further understand the discourse around QAnon, we study the most popular named entities mentioned in the posts, along with the most prominent topics of discussion, which focus on US politics, Donald Trump, and world events. We also use word embeddings to identify narratives around QAnon-specific keywords. Our graph visualization shows that some QAnon-related keywords are closely related to those from the Pizzagate conspiracy theory and to the so-called drops by “Q.” Finally, we analyze content toxicity, finding that discussions on /v/GreatAwakening are less toxic than in the broad Voat community.


INTRODUCTION
Conspiracy theories typically credit secret organizations or cabals for controversial, world-changing events [55]; in many cases, they posit that important political events or economic and social trends are the product of deceptive plots mostly unknown to the general public. A prominent example relates to the disappearance of Malaysia Airlines Flight MH370, which is alleged to have been taken over by hijackers and flown to Antarctica [38].
The ability to find like-minded people, at scale, on social media platforms has helped the spread of conspiracy theories, and especially politically oriented ones. For instance, "Pizzagate" [16] emerged during the 2016 US presidential elections, claiming that Hillary Clinton was involved in a pedophile ring. Even when widely debunked, conspiracy theories can help motivate detractors and demotivate supporters, thus potentially threatening democracies.
Over the past few years, the "QAnon" conspiracy has emerged on the anonymous Politically Incorrect (/pol/) board of 4chan. In October 2017, a user going by the nickname "Q" posted numerous threads claiming to be a US government official with a top-secret Q clearance [5]. They claimed that Pizzagate was real and that many celebrities, aristocrats, and elected politicians are involved in this vast, satanic pedophile ring. Q further claimed that President Donald Trump is actively working against a satanic pedophile cabal within the US government. QAnon incorporates many theories together into a broadly defined super-conspiracy theory. QAnon adherents also believe that many world events, including the COVID-19 pandemic, are part of a sinister plan orchestrated by "puppet masters" like Bill Gates [24]. Zuckerman [75] argues that QAnon supporters create a vast amount of material that eventually becomes viral. For example, the book "QAnon: An Invitation to a Great Awakening" [73], written by QAnon followers, ranked second on the Amazon best-selling books list [70].
After Reddit banned QAnon-related subreddits in September 2018 [64,67], QAnon followers reportedly migrated to Voat.co. Voat is a news aggregator, structured similarly to Reddit, where users subscribe to different channels of interest known as "subverses." Newcomers are not allowed to create new submissions, but can upvote or downvote submissions and comments, and comment on existing submissions. Once users manage to get ten upvotes on their comments, they can create new submissions to any subverse.
As with many "fringe" platforms (e.g., Gab), Voat was designed and marketed vigorously around unconditional support of freedom of speech against the alleged anti-liberal censorship perpetrated by mainstream platforms. A year after its creation, HostEurope.de stopped hosting Voat because of the content posted [56] and, shortly after, PayPal froze their account [66]. In August 2015, Voat was thrust into the spotlight when Reddit banned various hateful subreddits (e.g., /r/CoonTown and /r/fatpeoplehate [33,72]) and a large number of users reportedly migrated over [36,37,62]. The platform shut down in December 2020, with the owner explaining in a post that he "cannot keep up."1
Research Questions. In this paper, we focus on the QAnon-focused community on Voat. More specifically, we set out to answer the following research questions:
RQ1: How active is the QAnon movement on Voat?
RQ2: Which words and topics are most prevalent for and best describe the QAnon movement on Voat? What narratives are shared and discussed by QAnon adherents?
RQ3: How toxic is content posted on QAnon subverses? How does it compare to popular subverses focusing on general discussion?
Methodology. To address RQ1, we provide a temporal analysis of the most popular QAnon-focused subverse, /v/GreatAwakening, in comparison to a baseline of four of the most popular subverses (in terms of posting activity) focusing on general discussion: /v/news, /v/politics, /v/funny, and /v/AskVoat.2 We also analyze submission engagement and user activity. Then, we use named entity recognition, topic detection, and word embeddings, along with graph representations of QAnon-specific keywords, to define the narratives around the QAnon movement (RQ2). Finally, to study toxicity within these communities (RQ3), we use Google's Perspective API [41] to measure how toxic the posts in our dataset are.
Main Findings. Our work provides a first characterization of the QAnon community on Voat, through the lens of /v/GreatAwakening. This subverse attracts many more daily submissions than the four (popular) baseline subverses. Indeed, users tend to be quite engaged, with two of the most active QAnon submitters creating over 3.75% of the submissions of the baseline subverses as well. Also, we analyze user profile data and find that over 17.6% (2.3K) unique users registered a new account on Voat when Reddit banned QAnon subreddits in September 2018. Using word embeddings, we visualize words closely related to QAnon-specific keywords. The movement still discusses, among others, its predecessor conspiracy theory Pizzagate, the posts by the user Q, and other social media. The most prominent discussion topics are centered around the US, political matters, and world events, while the most popular named entity of the discussion is Donald Trump. Finally, we find that the QAnon community on /v/GreatAwakening is 16.6% less toxic than on baseline subverses.

BACKGROUND
In this section, we discuss the history, origins, and beliefs of the QAnon movement. We also provide a high-level explanation of the main functionalities and features of Voat.
1 https://searchvoat.co/v/announcements/4169936
2 As discussed later in Section 3, we also identify 16 other subverses related to QAnon but find them to be inactive; thus, we only focus on /v/GreatAwakening.

QAnon
Origins. QAnon originates from posts by an anonymous user with the nickname Q. On October 28, 2017, Q posted a new thread with the title "Calm before the Storm" on 4chan /pol/. In that thread, and over many subsequent cryptic posts, Q claimed to be a government insider with Q-level security clearance.3 The user claimed to have obtained documents related to, among other things, the struggle over power involving Donald Trump, Robert Mueller, the so-called "deep state," and the pedophile ring that Hillary Clinton supposedly ran [57]. The deep state is believed to be a secret network of powerful and influential people (including politicians, military officials, and others who have infiltrated governmental entities, intelligence agencies, etc.) that allegedly controls policy and governments around the world behind the scenes, while officials elected via democratic processes are merely puppets. Q claims to be a combatant in an ongoing war, actively participating in Donald Trump's crusade against the deep state [45].
Ongoing activities. Q has continued to drop "breadcrumbs" on 4chan and 8chan, giving birth to a community named after the anonymous (anon) user's nickname, "QAnon," devoted to decoding Q's cryptic messages. Adherents believe that decoding these messages reveals the truth about the evil intentions of the deep state, pedophile rings run by aristocrats, and updates on the war Donald Trump was waging. Although initially this movement was mostly confined to a small group [57], it has since grown substantially via mainstream social networks like Facebook, Reddit, and Twitter, and many QAnon adherents around the world have staged protests [6,35].
Relevance. Sternisko et al. [52] and Schabes [49] argue that conspiracy theories, including QAnon, are extremely dangerous for democracies. Government officials and media often start or promote such conspiracy theories to benefit their political agendas and interests. For instance, at a Trump 2020 rally, the person who introduced Donald Trump used the QAnon motto "where we go one, we go all" to conclude his speech [50]. During the 2020 US Congressional elections (November 3), about 25 US Congressional candidates who had expressed support for the conspiracy appeared on ballots. Of those candidates, two elected US House Representatives publicly endorsed the QAnon movement [18].
Notably, before the 2020 US Presidential Elections, the FBI described the QAnon movement as a domestic terror threat [50], and its followers as "domestic extremists." In fact, on January 6, 2021, a pro-Trump mob stormed the US Capitol claiming that "Q sent them." The insurrection resulted in five deaths [60]. QAnon followers have been arrested for various crimes, including vandalizing churches (on the belief that the Catholic Church supports human trafficking), kidnapping children to "save" them from pedophiles, and attempting to murder Canadian Prime Minister Justin Trudeau [58]. Overall, the history of violence surrounding the movement demonstrates that its radicalized followers pose a real danger.
QAnon on social networks. Mainstream social networks like Reddit, Twitter, and Facebook have moved to ban QAnon-related groups and conversations. Reddit banned numerous subreddits devoted to QAnon discussion in 2018 [34,53,67]. Twitter then put restrictions on 150K user accounts and suspended over 7K others that promoted this conspiracy theory. Twitter also reported that they would stop recommending content linked to QAnon [4,65]. In October 2020, Facebook banned QAnon conspiracy theory content across all their platforms [2], with YouTube following shortly thereafter [59].

Voat
Voat was a news aggregator launched in April 2014, initially under the name "WhoaVerse" and renamed to Voat in December 2014. As mentioned, the platform shut down on December 25, 2020.
Main features. Areas of interest, called "subverses," serve to group posts on Voat. Similar to Reddit, users could register new subverses on Voat, but this functionality was disabled in June 2020. When a user registers a new subverse, they become the owner of the subverse. They can delete it and nominate moderators and co-owners, who can in turn delete comments and submissions. Voat limits the number of subverses a user may own or moderate to prevent a single user from gaining outsized influence. Newcomers can subscribe to subverses of interest and see, vote on, and comment on submissions, but are ineligible to post new submissions at this point. Voat users refer to themselves as "goats," due to the platform's mascot, which resembles an angry goat.

Submissions. A user can create a new submission by posting a title and a description or sharing a link and a description. If sharing a link, the title of the submission becomes a hyperlink to the source website. The source website also appears next to the submission's title, along with the username of the user that posted the submission. Some subverses allow users to post anonymously. Other users can then comment on the submission and on the comments of other users. Also, users can "upvote" or "downvote" the submission or other users' comments. Submissions and comments may have a negative vote rating based on the votes they receive from users. A user becomes eligible for posting new submissions only if their Comment Contribution Points (CCP) are greater than or equal to ten. The upvotes a user receives are added towards their CCP, while downvotes are subtracted. Note that users lose their eligibility to post new submissions once their CCP falls under ten.
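As a minimal illustration of the eligibility rule just described, the following Python sketch models CCP as comment upvotes minus comment downvotes and checks it against the threshold of ten. The `VoatUser` class and its field names are hypothetical and only mirror the description above; they are not Voat's actual data model.

```python
from dataclasses import dataclass

# Threshold stated in the text: a CCP of at least ten is needed to post submissions.
CCP_THRESHOLD = 10


@dataclass
class VoatUser:
    """Hypothetical stand-in for a user's comment votes (not Voat's real schema)."""
    comment_upvotes: int = 0
    comment_downvotes: int = 0

    @property
    def ccp(self) -> int:
        # Upvotes on a user's comments add to CCP; downvotes subtract from it.
        return self.comment_upvotes - self.comment_downvotes

    def can_submit(self) -> bool:
        return self.ccp >= CCP_THRESHOLD


newcomer = VoatUser()
newcomer.comment_upvotes += 12
eligible_after_upvotes = newcomer.can_submit()      # CCP is 12 at this point
newcomer.comment_downvotes += 3
eligible_after_downvotes = newcomer.can_submit()    # CCP has dropped to 9
```

Because eligibility is recomputed from the current CCP, a user who drops below the threshold loses the ability to submit, matching the behavior noted at the end of the paragraph.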
Ephemerality. Each subverse has a limit of 500 active submissions at a time: up to 25 submissions in 20 pages (page 0 to page 19). When a user creates a new submission on Voat, it appears first on page 0, i.e., the subverse's home page. At the same time, the submission at the end of page 19, usually the one with the least recent comment, is archived. That submission is still reachable, but only if one knows its direct link; no new comments can be posted to it as it is archived. When a submission gets a new comment, it is bumped to the top of page 0, no matter when the submission was originally posted, similar to 4chan's "bumping system" [39]. However, it is not clear when submissions on Voat stop being bumped when they get new comments.
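The bumping and archiving behavior can be sketched with a small simulation. This is an illustrative model under simplifying assumptions, not Voat's implementation: the `Subverse` class is hypothetical, and it assumes that every new comment bumps an active submission (the text notes that it is unclear when bumping stops).

```python
from typing import List, Set

PAGE_SIZE = 25   # submissions per page, as stated in the text
NUM_PAGES = 20   # pages 0 through 19


class Subverse:
    """Toy model of a subverse's active listing (hypothetical, for illustration)."""

    def __init__(self, page_size: int = PAGE_SIZE, num_pages: int = NUM_PAGES) -> None:
        self.page_size = page_size
        self.limit = page_size * num_pages   # 500 with the default values
        self.active: List[int] = []          # index 0 is the top of page 0
        self.archived: Set[int] = set()

    def submit(self, submission_id: int) -> None:
        # A new submission appears at the top of page 0.
        self.active.insert(0, submission_id)
        # Whatever falls past the end of the last page is archived.
        while len(self.active) > self.limit:
            self.archived.add(self.active.pop())

    def comment(self, submission_id: int) -> bool:
        # Archived (or unknown) submissions cannot receive new comments.
        if submission_id not in self.active:
            return False
        # A new comment bumps the submission back to the top of page 0.
        self.active.remove(submission_id)
        self.active.insert(0, submission_id)
        return True

    def page(self, number: int) -> List[int]:
        start = number * self.page_size
        return self.active[start:start + self.page_size]


# Small example with 2 pages of 2 submissions so the effect is easy to follow.
tiny = Subverse(page_size=2, num_pages=2)
for sid in (1, 2, 3, 4):
    tiny.submit(sid)
tiny.comment(1)   # the oldest submission is bumped to the top
tiny.submit(5)    # pushes the least recently active submission (2) into the archive
```

In this model, a submission that keeps receiving comments never reaches the end of the last page, which is why a crawler has to revisit the listing pages repeatedly to capture a submission's final state.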

DATA COLLECTION
This section presents our data collection methodology and dataset.
Subverses. Our first step is to identify Voat subverses that are related to the QAnon movement. To do so, we start from several articles from the popular press [13,64,72], which highlight how a few subreddits banned from Reddit re-emerged on Voat. This happened for QAnon-related subreddits as well [34,53,67]; thus, we search for subverses with the same and similar names as the banned subreddits. We identify 17 subverses and, upon manual inspection, confirm that they are indeed devoted to QAnon-related discussions. However, we find that 16 out of 17 are essentially inactive, with fewer than 800 total posts over almost five months. Therefore, we focus on the most active QAnon subverse, /v/GreatAwakening.
We also use the four most active subverses as a baseline dataset. More precisely, we select the top four, in terms of posts, from the top-10 most subscribed subverses: /v/news, /v/politics, /v/funny, and /v/AskVoat. In the rest of the paper, we refer to these four general-discussion subverses as the "baseline subverses."
Crawling. We start crawling the five subverses on May 28, 2020, using Voat's JSON API,4 and stop on October 10, 2020. Voat does not list the archived submissions that fall out of the 20-page limit, but, as mentioned, these submissions are still reachable if one knows their direct link, i.e., the subverse they were posted in and the submission ID. A manual inspection of the submission IDs in our database indicates that they are monotonically increasing, and thus it is technically possible to collect submissions that fall out of the 20-page limit by using submission IDs smaller than the ones we collect on the first day that our data collection infrastructure started operating. If the submission ID does not exist within the subverses we are interested in, the API will return a 404, and thus we could indeed enumerate through all possible submissions. That said, doing this would require millions of requests to the Voat API, the majority of which would be 404s, placing excessive load on their servers, and, if we followed the Voat API usage limits, it would take several years to enumerate through all the possible submissions.
Hence, we use the following methodology to collect all the submissions' comments, focusing only on data posted after May 28, 2020, inclusive. For each subverse, our crawler continuously requests the submission pages from 0 to 19. We obtain each submission ID, and query the Voat API again to collect the comments posted on that submission. Voat's API returns only up to 25 comments at a time (aka comment segments) for a given submission. Next, we note that Voat has a hierarchical, tree-like commenting system, similar to Reddit, with some submissions resulting in branching threads of varying depth. Thus, to ensure we collect all comments on a submission, our crawler implements a depth-first search (DFS) algorithm that starts with the comments returned by the first request to the API and then iteratively queries for any child comments they might have. For each of the children discovered, we query for their children until we fully explore the submission's comment tree. We chose DFS over breadth-first search (BFS) primarily because the Voat API returns comment segments: a DFS simply requires a bit less bookkeeping and is a more natural fit considering we are not guaranteed to get all comments at a given level with a single request. The crawler revisits the pages of every subverse, looking for new submissions or updates on the ones already collected, numerous times per day, ensuring the collection of the full state of submissions before they fall off the page 19 limit.
Dataset. Table 1 lists the number of posts (submissions and comments) we collect for each subverse analyzed in this study. Our dataset spans posts from May 28 to October 10, 2020. Alas, our dataset is missing some posts between June 9 and June 13 due to a failure of our data collection infrastructure. Besides submissions and comments, we also collect publicly accessible user profile data. More specifically, we collect profile data of the users posting a submission or a comment on /v/GreatAwakening and the baseline subverses listed in Table 1. In total, we find 4.9K, 6.2K, 5.6K, 4.9K, and 4.2K usernames that have either created a submission or made a comment in /v/GreatAwakening, /v/news, /v/politics, /v/funny, and /v/AskVoat, respectively. The union of these results in 15K unique usernames, with 13K of these usernames having accessible profiles. The remaining ∼2K (13.16%) of usernames we query result in a 404 error, which we believe is due to deleted or deactivated profiles.
Ethics. We only collect openly available data and follow standard ethical guidelines [43]. We do not attempt to identify users or link profiles across platforms. Finally, the collection of data analyzed in this study does not violate Voat API's Terms of Service.
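The comment-collection strategy described in this section (a depth-first traversal over an API that returns comments in segments of up to 25) can be sketched as follows. This is a minimal illustration and not the crawler used in this study: `make_fake_api` and the `fetch_segment(parent_id, offset)` signature are hypothetical stand-ins for the real endpoint, and the comment tree is held in memory so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional, Tuple

SEGMENT_SIZE = 25  # the text states that at most 25 comments are returned per request

FetchFn = Callable[[Optional[int], int], List[int]]


def make_fake_api(children_by_parent: Dict[Optional[int], List[int]]) -> FetchFn:
    """Build an in-memory stand-in for the comment endpoint (hypothetical)."""
    def fetch_segment(parent_id: Optional[int], offset: int) -> List[int]:
        children = children_by_parent.get(parent_id, [])
        return children[offset:offset + SEGMENT_SIZE]
    return fetch_segment


def crawl_comments(fetch_segment: FetchFn, root_id: Optional[int] = None) -> List[int]:
    """Collect all comment IDs below root_id with an iterative depth-first search."""
    collected: List[int] = []
    # Each stack entry is (parent whose children we are reading, next offset to request).
    stack: List[Tuple[Optional[int], int]] = [(root_id, 0)]
    while stack:
        parent_id, offset = stack.pop()
        segment = fetch_segment(parent_id, offset)
        if len(segment) == SEGMENT_SIZE:
            # A full segment may hide more siblings: come back for them later.
            stack.append((parent_id, offset + SEGMENT_SIZE))
        collected.extend(segment)
        # Push children last (in reverse) so they are explored before remaining siblings.
        for comment_id in reversed(segment):
            stack.append((comment_id, 0))
    return collected


# Toy comment tree: 30 top-level comments (two segments) plus a few nested replies.
tree: Dict[Optional[int], List[int]] = {
    None: list(range(1, 31)),
    1: [101, 102],
    101: [1001],
    30: [301],
}
all_comments = crawl_comments(make_fake_api(tree))
```

The only state the traversal needs is the stack of (parent, offset) pairs, which reflects the "less bookkeeping" argument made above: an incomplete level is simply another stack entry to resume later.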

GENERAL CHARACTERIZATION
This section analyzes aggregate and user-specific activity, content engagement, and registrations for all subverses in our dataset.

Posting Activity
We start by looking at how often submissions and comments are posted on the collected subverses. Figure 1(a) plots the number of daily submissions for the baseline and /v/GreatAwakening subverses (note log-scale on the y-axis). From the figure, we see that over 4.5 months, /v/GreatAwakening has more submissions than the individual baseline subverses, with about 100 new submissions per day, on average. The next most active subverse is /v/news, with about 70 new submissions per day. This is remarkable considering that, as of October 2020, /v/GreatAwakening has only 20K subscribers, while /v/news has 100K. When looking at comment activity (Figure 1(b)), /v/news and /v/GreatAwakening are close, with 1.06K and 1.01K comments per day, respectively. We observe a peak in submission and comment posting activity on /v/GreatAwakening between June 29 and July 3, with the most submissions on July 2 (185 submissions and almost 1.9K comments). Manual inspection indicates the peak in submission activity may be related to Jeffrey Epstein's ex-girlfriend, Ghislaine Maxwell, being arrested by the FBI [3]. Another peak in posting activity appears between August 10 and August 21, with a peak of 183 submissions

Engagement
Next, we look at user engagement, in particular how often users upvote and downvote submissions. In Figure 2(b), we plot the CDF of upvotes, downvotes, and net votes (i.e., upvotes minus downvotes) the submissions get. On average, submissions on /v/GreatAwakening get 57.4 upvotes and 0.9 downvotes, while on baseline subverses, we find 61 upvotes and 1.5 downvotes. The most upvoted submission has 537 and 870 upvotes on /v/GreatAwakening and baseline subverses, respectively, while the most disliked submission has 37 downvotes on /v/GreatAwakening, and 114 downvotes in the baseline subverses. Specifically, the title of the most upvoted /v/GreatAwakening submission is "The United States of America will be designating ANTIFA as a Terrorist Organization" and it links
We observe that 62.4% and 50.5% of the /v/GreatAwakening and baseline submissions, respectively, have more than 20 upvotes. In contrast, only 0.46% and 1.79% of the submissions on /v/GreatAwakening and baseline subverses, respectively, get more than 10 downvotes. We also run a two-sample Kolmogorov-Smirnov (KS) test on the distributions of upvotes, downvotes, and net votes, and reject the null hypothesis that there is no difference between the distributions (p < 0.01 for all comparisons).
Similarly, we plot the CDF of the number of upvotes and downvotes of comments in Figure 2(c). On average, comments get 2.2 upvotes and 0.18 downvotes on /v/GreatAwakening. Comments of the baseline subverses get 2.8 upvotes and 0.35 downvotes, on average. Again, we find statistically significant differences between the distributions via the two-sample KS test (p < 0.01).
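For readers unfamiliar with the two-sample KS test, the sketch below shows how the statistic is obtained: it is the largest vertical gap between the two empirical CDFs, and an approximate p-value follows from the asymptotic Kolmogorov distribution. This is a standard-library illustration of the textbook procedure, not the statistical package used in this study, and the two vote samples are made-up toy data.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(sample_a: Sequence[float], sample_b: Sequence[float]) -> Tuple[float, float]:
    """Return (D, approximate p-value) for the two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    # D is the maximum absolute difference between the two empirical CDFs,
    # evaluated at every observed value.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    # Asymptotic approximation with a common small-sample correction.
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.3:
        # The series below converges slowly here; the p-value is effectively 1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Toy samples (hypothetical vote counts) whose distributions are clearly shifted.
votes_a = list(range(100))
votes_b = list(range(50, 150))
d_stat, p_value = ks_two_sample(votes_a, votes_b)
reject_null = p_value < 0.01
```

With large samples such as the ones analyzed here, even moderate gaps between the CDFs yield very small p-values, so the test is best read together with the CDF plots themselves.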
Overall, this shows that users of both communities tend to vote positively on the content they encounter. Baseline subverses' posts tend to be downvoted and upvoted more often than the /v/GreatAwakening posts. This is probably due to the significant difference in audience between the two communities. Notably, both communities seem to be engaged in commenting and voting on the posts they encounter on the platform.

User Activity
Next, we focus on user profile data to understand how often users post new submissions. More specifically, since Voat does not allow newcomers to post new submissions until they reach a CCP of at least 10, we investigate whether the audiences of /v/GreatAwakening and the baseline subverses consume information from a limited set of users.
To do so, we count the number of submissions users posted on /v/GreatAwakening and the baseline subverses. We find that only 346 users made the 13.5K submissions of /v/GreatAwakening. The 21.9K submissions of the baseline subverses were made by 1.8K users. Figure 3 reports the top 15 submitters and commenters of both communities. To protect users' privacy, we replace the original usernames with "user1," "user2," etc.
We observe that the top submitter, "user1" in Figure 3    as "All Others" in the figure) are responsible for 28.2% (3.8K) of the submissions made on /v/GreatAwakening. This is not the case for submissions of general discussion as the top 15 submitters together are only responsible for 26.8% (5.8K) of the total submissions, as depicted in Figure 3(c). Excluding the top 15 commenters, /v/GreatAwakening (Figure 3(b)) and baseline subverses (Figure 3(d)) comment activity seems to fall on the broader audience of the communities since "All Others" post 80.9% (112K) and 92% (308.5K) of all the comments, respectively.
Manual inspection of our dataset shows that 22.8% (3K) of usernames overlap between /v/GreatAwakening and the baseline subverses. Namely, "user8" and "user9" are amongst the top submitters of both communities, and "user30" ranks as the top commenter in both. Our results suggest that the audience of /v/GreatAwakening (20K subscribers) consumes content and submissions from a handful of users (349 submitters), and to a great extent, from "user1."

User Registrations
We also analyze all users' registration dates to understand when they registered a new account on Voat. Since 2015, online press outlets have reported that communities banned from Reddit often migrate to Voat [22,29,67]; thus, we investigate whether Voat user registrations increase when Reddit bans communities. During our data collection period, over 15K users posted a submission or a comment on the subverses. Also, 13.16% (2K) of these users deactivated their account, or had their account deleted by Voat, as indicated by the 404 errors our crawler received from Voat's API. [53,64,67]. We also observe another spike in user registration in both communities between June and July 2015, probably due to Reddit banning hate-focused subreddits [33,61,72].
Although our dataset might not represent Voat's user base as a whole, it indicates the dates users decided to join the platform. Looking only at users engaged in baseline subverses (Figure 4(b)), we confirm that Voat received a high volume of new user registrations close to the periods of Reddit banning hateful subreddits and QAnon related subreddits. Future work, in conjunction with Reddit data, might help shed more light on the effect of Reddit deplatforming and consequent user migration.

Take Aways
Overall, this section answers our RQ1, i.e., how active is the QAnon movement on Voat? The most popular QAnon-focused subverse, /v/GreatAwakening, attracts many more submissions than the baseline subverses, even though the latter are among the top 10 most popular on the platform by number of subscribers. Also, /v/GreatAwakening consistently has more than 50 new submissions per day, with that number steadily increasing over time and staying above 100 new submissions per day since September 25, 2020. In contrast, the number of daily submissions stays within the same range for the baseline subverses, except for /v/AskVoat, where we observe a decline in posting activity.
Moreover, both communities' audiences tend to comment on and upvote the submissions and comments they see in the subverse. Also, the audience of /v/GreatAwakening consumes information from just a handful of users, while top submitters and commenters seem to overlap between /v/GreatAwakening and the baseline subverses. Finally, we show that new user registrations peaked after Reddit banned hateful and QAnon subreddits in June 2015 and in September 2018, respectively.

NARRATIVE ANALYSIS
In this section, we shed light on the QAnon movement's narrative on Voat, aiming to answer RQ2. We explore the topics that /v/GreatAwakening discusses, and detect the most popular entities they mention using entity detection. Finally, we use word embeddings and graph representations to visualize keywords most similar to "qanon." We warn readers that some of the content presented and discussed in this section may be disturbing.

Topics
We analyze the most prominent topics in our dataset by running Latent Dirichlet Allocation (LDA) [7] on the text included in both the title and the body of all submissions as well as their comments. For every post, we remove all the URLs, stop words (e.g., "like," "to," "and"), and formatting characters, e.g., \n, \r. Then, we tokenize each post and analyze it to detect bigrams and include them in our corpus. We do this as previous work suggests that bigrams improve the accuracy of topic modeling [71]. Last, we create a term-frequency inverse-document frequency (TF-IDF) array to fit an LDA model. We use a TF-IDF array instead of the default LDA approach as TF-IDF statistically measures every word's importance within the overall collection of words. More importantly, previous work suggests it yields more accurate topics [30]. We use guidelines from Li [54] to build the LDA model.
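The preprocessing steps described above (URL removal, stop-word filtering, tokenization, and bigram detection) can be illustrated with a short standard-library sketch. It is not the pipeline used in this study: the stop-word list is a tiny placeholder, the token pattern is an assumption, and bigrams are detected with a simple count threshold (`min_count`, a hypothetical parameter) rather than with a dedicated phrase-detection library. TF-IDF weighting and LDA fitting are left out.

```python
import re
from collections import Counter
from typing import List

URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")
# Placeholder stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"like", "to", "and", "the", "a", "of", "is", "in"}


def clean_and_tokenize(post: str) -> List[str]:
    """Lowercase a post, drop URLs and line-break characters, and remove stop words."""
    text = URL_PATTERN.sub(" ", post.lower())
    text = text.replace("\n", " ").replace("\r", " ")
    return [tok for tok in TOKEN_PATTERN.findall(text) if tok not in STOP_WORDS]


def add_frequent_bigrams(token_lists: List[List[str]], min_count: int = 2) -> List[List[str]]:
    """Merge adjacent token pairs seen at least min_count times into one token."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    frequent = {pair for pair, count in counts.items() if count >= min_count}

    merged_docs = []
    for tokens in token_lists:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
                merged.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


posts = [
    "Check this out http://example.com/x\nDeep state and the deep state",
    "The deep state is like a cabal",
]
tokenized = [clean_and_tokenize(post) for post in posts]
corpus = add_frequent_bigrams(tokenized)
```

The resulting token lists, with frequent bigrams joined by an underscore, are the kind of input from which a TF-IDF matrix would then be built before fitting the topic model.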
To measure the appropriate number of topics for our model, we calculate the coherence value (c_v) of the model for topic numbers between 4 and 20 with step 1 [44]. For /v/GreatAwakening, the best coherence score is 0.385 with 8 topics, while for the baseline subverses, the highest coherence score is 0.357 with 10 topics.
In Table 2, we list the words per topic, along with their weights, discussed on both /v/GreatAwakening and the baseline subverses. For /v/GreatAwakening, users tend to discuss the US Presidential
Table 2: LDA analysis of /v/GreatAwakening and baseline subverses.
Overall, our topic detection analysis shows that discussions on /v/GreatAwakening revolve around Trump and political matters, whereas baseline subverses feature news, along with hateful and controversial words. We will further analyze toxicity in Section 6.

Named Entities
While topic modeling gives us an idea of what is being discussed, to get an understanding of who is being discussed, we extract the named entities used in our communities of interest. We do so to understand who conspiracies focus on and better define the narratives they might be pushing.
To obtain the named entities mentioned in each post, we use the en_core_web_lg (v2.3) model from the SpaCy library [51]. We select this specific model over alternatives, e.g., MonkeyLearn, since, to the best of our knowledge, it is trained on the largest training set. Moreover, previous work [25] ranks it as the second most accurate method for recognizing named entities in text, with the first being Stanford NER. We choose en_core_web_lg over Stanford NER as it detects dates more accurately. The model uses millions of online news outlet articles, blogs, and comments from various social networks to detect and extract various entities from text. Crucially for our purposes, it also provides an entity category label in addition to the entity itself. For example, the entity category for Donald Trump is "person." The different categories range from organizations to nationalities, products, and events.5 In Table 3, we list the ten most popular named entities and categories from /v/GreatAwakening and all the baseline subverses.
5 See https://spacy.io/api/annotation#named-entities for the full list of labels.
Overall, this suggests that discussions within these communities are related to US happenings and events, politics, and established organizations and institutions. Baseline subverses focus mostly on nationalities, and religious or political groups, while /v/GreatAwakening discussions focus on the US, Donald Trump, and the US Presidential elections.

Text Analysis
Word Embeddings. To assess how different words are interconnected with popular QAnon-specific keywords (e.g., "qanon"), we analyze our /v/GreatAwakening dataset using word2vec, a two-layer neural network that generates word representations as embedded vectors [31]. A word2vec model takes a large input corpus of text and maps each word in the corpus to a generated multidimensional vector space, yielding a word embedding. Words that are used in similar contexts tend to have similar vectors in the generated vector space.
To clean the QAnon posts before training the model, we follow a similar methodology as for the topic modeling presented in Section 5.1. We train the word2vec model using a context window (which defines the maximum distance between the current word and predicted words when generating the embedding) of 7, as suggested by [27]. We limit the corpus to words that appear at least 50 times due to our dataset's small size. Finally, we train the word2vec model with 8 iterations (epochs) as, on small corpora like ours, between 5 and 15 epochs are suggested to provide the best results [31,32]. (Choosing more than 8 epochs makes our model overfit and minimizes the word vocabulary, e.g., removing QAnon-specific keywords like "qanon.") After training, our model includes a 5.6K word vocabulary.
Table 4: Top ten similar words to the terms "qanon" and "q" and their respective cosine similarity.
QAnon similar keywords. Next, we find the top ten most similar words to "qanon" and "q" according to the model; see Table 4. We see that "qanon" is linked to words like "conspiracy," "theories," "movement," and "pizzagate." The term "q" seems to be closely related to Q's activity and the research the community does to decode his cryptic messages, as evidenced by "drops," which refers to the posts that Q leaves as breadcrumbs of information for adherents of the conspiracy to decode. These drops often hint at "psyops," the alleged psychological operations the deep state and governments deploy to control society. Interestingly, the term "larp," an acronym for "Live Action Role Playing," is sometimes used in a derogatory fashion to imply that Q is just a troll playing a game. This indicates that even in a community devoted to the QAnon conspiracy, there is at least some degree of pushback or dissent within the user base.
We use graph representations to analyze this finding below.
Graph representations. We follow the methodology from [74] to visualize topics within the word embeddings. Specifically, we transform the embeddings into a graph, where nodes are words and edges are weighted by the cosine similarity between the learned vectors of the nodes the edge connects. We perform community detection [8] on the resulting graph to gain new insights into the high-level topics that groups of words form.
Visualization. Figure 5 shows the two-hop ego network centered around the word "qanon." Figure 6 depicts a graph centered around the word "q." To improve readability (since our graph transformation results in a fully connected network), we remove all edges with a cosine similarity less than 0.6. We further color each node based on the community it belongs to. Finally, we apply the ForceAtlas2 algorithm [23], which considers the edges' weight when laying out the nodes in the 2-dimensional space, before producing the final visualization.
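The graph transformation, edge filtering, and two-hop ego network extraction can be sketched as below. This is an illustrative sketch only: the function names and toy vectors are hypothetical, and community detection and the ForceAtlas2 layout are omitted.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_similarity_graph(vectors, min_similarity=0.6):
    """Weighted undirected graph; edges below min_similarity are dropped."""
    graph = defaultdict(dict)
    for a, b in combinations(vectors, 2):
        sim = cosine(vectors[a], vectors[b])
        if sim >= min_similarity:
            graph[a][b] = sim
            graph[b][a] = sim
    return graph


def ego_network(graph, center, hops=2):
    """Set of nodes reachable from center in at most `hops` steps."""
    seen = {center}
    frontier = {center}
    for _ in range(hops):
        # Neighbors of the current frontier that have not been visited yet.
        frontier = {n for node in frontier for n in graph.get(node, {})} - seen
        seen |= frontier
    return seen


# Toy embeddings (hypothetical values, for illustration only).
toy_vectors = {
    "qanon": [1.0, 0.0],
    "movement": [0.8, 0.6],
    "cult": [0.2, 0.98],
    "weather": [-1.0, 0.0],
}
toy_graph = build_similarity_graph(toy_vectors)
two_hop = ego_network(toy_graph, "qanon", hops=2)
```

Here "cult" is not directly connected to "qanon" after filtering, but it enters the two-hop ego network through "movement," while "weather" has no surviving edges and is excluded.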
Remarks. Taking into account how communities form distinct themes, and that nodes' proximity implies contextual similarity, Figure 5 shows that the "qanon" community (red) is very close to the green community (far left on the figure), which seems to be discussing the movement itself ("qanons," "cult," "fascism," "believers," "movement"). The small blue community in the middle of the figure discusses "leaks" and "interviews" from Edward "snowden." Next, the purple community is focused on Q's activity and the posts he drops ("q," "drops," "timeline," "decode," "cryptic"). In the yellow community (far right on the figure), we come across the QAnon predecessor "pizzagate," Q drop aggregators (e.g., "qmap," which was recently shut down [9]), and other social media platforms (8"kun," 4"chan," "twitter," "instagram," and "parler"). Focusing on the conspiracy theory's originator, Figure 6 plots the discussion around Q. Interestingly, the community of "q" (red) has words like "larp," "disinfo," "doubts," and "shill" (a term used for someone who might be hired by the government to pretend to agree with a conspiracy) in close proximity to "q." On the other hand, we find terms like "followers" and "aj" (a term used to describe a man as supportive and perfect). This plot strengthens the hypothesis that, although the community is devoted to the QAnon movement, there might be signs of a rift with regard to what the users on /v/GreatAwakening think of Q. Finally, the blue community discusses Q's "cryptic" "drops," various social networks like 4"chan" and 8"kun," and the "qmap" aggregation site that archives Q's posts.

Take Aways
The analysis presented in this section allows us to identify and visualize the narratives around QAnon discussion (RQ2). We show that the QAnon community discusses online social media, political matters, and world events. Additionally, the main topic of conversation is Donald Trump and the US overall, and entities discussed are most typically organizations and individuals. These findings confirm that, regardless of the conspiracy theory's particular components, Trump's role in the conspiracy, e.g., as the alleged leader in the war against the deep state, is central.
Figure 6: Graph representation of the words associated with the term "q" on Voat.
Finally, our structural analysis of word embedding similarities provides some high-level discussion topics within the community. For example, we find that the term "larp," an oft-used criticism of Q implying he is merely playing a game, is often used in the same context as discussion of "q" himself. This is an indicator that adherents are well aware of criticisms of their information source, and that there is perhaps some dissent within the community itself. Additionally, we see that the movement is well embedded across the Web, with external q-drop aggregators (e.g., qmap) and social media platforms commonly discussed along with Q.

TOXICITY ANALYSIS
In this section, we analyze the toxicity of the /v/GreatAwakening community compared to the general discussion subverses.
Motivated by our earlier findings suggesting that toxicity, hate, and racism exist in all subverses of our dataset, we analyze the content of each post according to how toxic, obscene, insulting, profane, and inflammatory it is. To do so, we use Google's Perspective API [41]. We choose this tool, similar to prior work [39], as other methods mostly use short texts (tweets) for training [15], whereas Google's Perspective API is partly trained on crowdsourced annotations and on comments with no restriction in character length, such as Reddit and The New York Times comments, which resemble Voat posts [40]. We also acknowledge the limitations of the API, namely, false-negative results due to misspelled words [21] and bias against posts written in African American English [48]. However, we do not take the scores at face value but use them to compare the differences between QAnon-related and baseline posts on Voat.
We rely on six models to annotate posts from all subverses: toxicity, severe_toxicity, obscene, insult, profanity, and inflammatory. 6 Note that all models provide scores (0 to 1) for textual posts. Therefore, we do not have scores for 4.8% (24.6K) of the posts in our dataset since they only contain links or images but no text. In Figure 7, we plot the CDF of the scores for each model. The baseline subverses (B in Figure 7(a)) exhibit higher levels of toxicity and severe_toxicity, compared to /v/GreatAwakening (Q in the figure). Specifically, 39.9% and 28.2% of the baseline posts have, respectively, toxicity and severe_toxicity scores greater than 0.5, while only 23.3% and 13.7% of the QAnon posts have these scores greater than 0.5. We observe similar trends for the other models, with the baseline subverses always scoring higher than /v/GreatAwakening. Overall, 33.6% and 36% of the baseline subverses' posts have an obscene and insult score greater than 0.5, respectively (Figure 7(b)), and 33.6% for profanity and 46% for inflammatory (Figure 7(c)). For all six models, the percentage of the QAnon posts that have a Perspective score greater than 0.5 is at least 10% smaller than that of the general discussion posts. Last, we use a two-sample Kolmogorov-Smirnov (KS) test to check for statistically significant differences between all the distributions in Figure 7 and find that all of them differ significantly (p < 0.01).
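The two comparisons used above, the share of posts scoring above 0.5 and the two-sample KS statistic, can be sketched in plain Python. The function names and score lists are made up for illustration; in practice a library routine such as `scipy.stats.ks_2samp` would also return the p-value, which this sketch does not compute.

```python
from bisect import bisect_right


def share_above(scores, threshold=0.5):
    """Fraction of scores strictly greater than the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical toxicity scores, for illustration only.
baseline_scores = [0.1, 0.4, 0.6, 0.7, 0.9]
qanon_scores = [0.05, 0.1, 0.2, 0.3, 0.8]

baseline_share = share_above(baseline_scores)
qanon_share = share_above(qanon_scores)
d_stat = ks_statistic(baseline_scores, qanon_scores)
```

A larger KS statistic means the two empirical CDFs are further apart at their point of maximum separation; whether the difference is significant additionally depends on the sample sizes.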
Remarks. Although the QAnon community's content exhibits some levels of toxicity, the movement is not as toxic as other discussions on the platform. We believe this not to be entirely surprising as the community seems to be more focused on the conspiracy aspects of world events, politics, and Donald Trump, while racist or hateful agendas might more vigorously characterize Voat as a whole, or at least the popular general-discussion subverses in our baseline. In other words, toxicity in the QAnon discussions seems to target the so-called "deep state," the puppet masters, and the pedophile ring members, whereas baseline subverses like /v/news and /v/politics are likely to include inflammatory discussions between users with contradicting opinions, or comments on world events from a racist or hateful standpoint.
Interestingly, the baseline subverses' level of toxicity appears to be similar to that of 4chan's /pol/, which is measured in [39]. In particular, we find that the percentages of posts with scores above 0.5 are, across all models, very similar on /pol/ and our four baseline subverses. Considering that /pol/ is broadly considered to be a highly toxic place [20], this suggests that Voat is too.

RELATED WORK
In this section, we review previous work on QAnon and Voat.
Qualitative work on QAnon. Prooijen [69] studies why people believe in conspiracy theories like QAnon, arguing that their beliefs are not necessarily pathological or novel and can be followed by individuals who behave relatively normally. The author explains that, typically, individuals follow more than one conspiracy theory, as also discussed by Goertzel [19], and they believe that nothing happens coincidentally. At their core, conspiracy theories reinforce the idea that hostile or secret machinations permeate all social layers, thus forging an appealing account of events for individuals who seek "explanations," especially after experiencing anxiety and uncertainty due to societal events that traumatized them.
Sternisko et al. [52] argue that conspiracy theories pose a real threat to democracies, as governments and media might start or amplify them to benefit their political agendas and interests. Schabes [49] stresses that social networks help conspiracy theories spread faster, which threatens individual autonomy and public safety, reinforces political polarization, and harms trust in government and media. Rutschman [47] explains that misinformation spread by the QAnon movement can be dangerous to individuals, e.g., claiming that drinking chlorine dioxide prevents COVID-19 infections. Thomas and Zhang [68] explain that small groups of engaged conspiracists, like QAnon followers, can potentially influence recommendation algorithms to expose new, unsuspecting users to their beliefs. The same study notes that conspiracy theories often include information from legitimate sources or official documents framed with misleading and conspiratorial explanations of events, which creates illusions and further complicates moderation efforts against conspiratorial content.
Quantitative work on QAnon. McQuillan et al. [28] collect 81M tweets related to COVID-19 between January and May 2020, finding that the QAnon movement not only has grown throughout the pandemic but also that its content has reached more mainstream groups. In fact, the Twitter QAnon community almost doubled in size within two months. Darwish [14] gathers 23M tweets related to US Supreme Court judge Brett Kavanaugh for 3 days and 4 days in September and October 2018, respectively. They find that the hashtags #QAnon and #WWG1WGA (Where We Go One We Go All) are in the top 6 hashtags in their dataset. Chowdhury et al. [11] identify 2.4M accounts suspended from Twitter and collect 1M tweets, performing a retrospective analysis to characterize the accounts and their behavioral activities. They observe that politically motivated users consistently and successfully spread controversial and political conspiracies over time, including the QAnon conspiracy.
Faddoul et al. [17] collect the top-recommended YouTube videos from 1,080 YouTube channels between October 2018 and February 2020. In total, they analyze more than 8M recommendations from YouTube's watch-next algorithm and use 500 videos labeled as "conspiratory" to train a classifier to detect conspiracy-related videos with 78% precision. Using TF-IDF, they also find that, within the top 15 discriminating words in the snippet of the training set videos, the term "qanon" ranks third. Also, QAnon-related videos belong to one of the three top topics identified by an unsupervised topic modeling algorithm. The authors conclude that YouTube's recommendation engine might operate as a "filter bubble." Recently, Aliapoulios et al. [1] collect Q drops archived by six "aggregation sites" to study QAnon from Q's perspective, and how links to these sites are shared on platforms like Twitter and Reddit.
Voat. Chandrasekharan et al. [10] detect abusive content using data from 4chan, Reddit, MetaFilter, and Voat, introducing a novel approach called Bag of Communities (BoC). Part of the Voat data collected for their work originates from /v/CoonTown, /v/Nigger, and /v/fatpeoplehate: three communities focused on hate towards groups of individuals with specific body or race characteristics. These subverses were created on Voat after Reddit banned the original /r/CoonTown, /r/fatpeoplehate, and /r/nigger subreddits in 2015 [33,61,72]. Similarly, Salim et al. [46] use Reddit and Voat's /v/CoonTown, /v/fatpeoplehate, and /v/TheRedPill comments to train a classifier to detect hateful speech. Khalid and Srinivasan [26] collect 872K comments from /v/politics, /v/television, and /v/travel in an attempt to detect distinguishable linguistic styles across various communities; more specifically, they compare the features of Voat comments to Reddit and 4chan comments and train a classifier to predict the origin of a comment based on its style and content. Finally, Popova [42] uses data from /v/DeepFake and mrdeepfakes.com, finding that pornographic deepfakes are often created for circulation and enjoyment within the community. Note that both mrdeepfakes.com and the /v/DeepFake subverse were created after Reddit banned the subreddit /r/DeepFakes in 2018 [13,63].
Remarks. Our paper presents the first characterization of the QAnon community on Voat. Some of our findings are aligned with those from previous studies, e.g., a steady increase in posting activity on /v/GreatAwakening, somewhat similar to [28], which finds that the QAnon movement on Twitter increased in size over their collection period.

CONCLUSION
This work presented a first characterization of the QAnon movement on the social media aggregator site Voat. We collected over 510K posts from five subverses: /v/GreatAwakening, the largest QAnon-related subverse, as well as a baseline consisting of the four most active subverses, /v/news, /v/politics, /v/funny, and /v/AskVoat.
We showed that users on both the QAnon and baseline subverses tend to be engaged. However, the audience of /v/GreatAwakening consumes data from just a handful of content creators responsible for over 72.8% of the total submissions in the community. The /v/GreatAwakening subverse had a peak in registration activity shortly after Reddit banned QAnon-related communities in September 2018. Using topic modeling techniques, we showed that conversations focus on world events, US politics, and Donald Trump. We also trained a word2vec model to illustrate the connection of different terms to closely related words, finding that the terms "qanon" and "q" are closely related to other conspiracy theories like Pizzagate, other social networking platforms, the so-called deep state, and "research" activities the community performs to decode Q's cryptic posts. Finally, toxicity scores from Google's Perspective API show that posts in /v/GreatAwakening are less toxic than those on popular general-discussion (baseline) subverses.
Although this paper represents the first large-scale study of the QAnon movement on Voat, it is far from comprehensive, and numerous questions about the movement remain, leaving several directions for future work. First, while this paper focused on Voat, the QAnon movement is decidedly multi-platform, and thus we encourage work that examines it from a cross-platform perspective [1]. Next, even though it has only recently entered mainstream discourse, QAnon has a long and still somewhat muddied evolution. This calls for longitudinal studies that cover a much longer period than that in the present work to get a firm grasp on how the movement has evolved, both in terms of components of the conspiracy and in terms of user engagement and discussion (e.g., how adherents react when the predictions in a q-drop do not come to pass). Finally, while understanding the movement itself is important, there are real indications that it exhibits cult-like characteristics, e.g., recovery stories from former adherents [12] and communities devoted to emotional support for people whose loved ones have become followers. 7 We therefore believe it is crucial to understand more about the QAnon counter-movement, which might provide insights into the real-world impact of the spread of dangerous conspiracy theories and help devise mitigation strategies.