Priming and Actions: An Analysis in Conversational Search Systems

In order to accurately simulate users in conversational systems, it is essential to understand the factors that influence their behaviour. This is a critical challenge for the Information Retrieval (IR) field, as conventional methods are not well suited to the interactive, sequential structure of conversational contexts. In this study, we employed the concept of priming effects from the psychology literature to identify core stimuli for each abstracted effect. We then examined these stimuli on various datasets to investigate their correlations with users' actions. Finally, we trained Logistic Regression (LR) models on these stimuli to anticipate users' actions. Our findings offer a basis for creating more realistic user models and simulators: we identified the subset of stimuli strongly related to users' actions, and we built a model that can predict those actions.


INTRODUCTION
Conversational search systems (CSSs) are widely discussed and recognized as an established area within the Information Retrieval (IR) community. Although many studies and industry efforts are directed towards enhancing user experience, methods for evaluating CSSs remain limited [2,6,23]. Due to the interactive nature of CSSs, traditional metrics cannot capture user satisfaction [6,19,21,22,24]. Consequently, the industry primarily relies on online evaluation techniques for assessing CSSs [14].
Most traditional search systems treat queries as independent entities, but in conversational scenarios, context becomes crucial [8,17,24]: the user's current query also depends on preceding queries and responses. In a traditional search system, an irrelevant response is simply a failed result for that query, whereas in a CSS it can affect subsequent queries, leading the user to ask additional questions to refine their search.
One potential approach is to use user simulation (US) for CSSs [11]. User simulators (USs) aim to generate data close to the data we would collect from real users. Their benefits include: 1) they can operate without actual user data; 2) they can predict users' behaviour; and 3) they are suitable for both developing and assessing CSSs.
One question that remains unanswered is what motivates users to take certain actions. In this paper, we focus on priming effects. The priming effect refers to an unconscious influence of past experience on current performance or behaviour [4,30]. Tulving et al. [34] demonstrated the priming effect in an experiment in which participants saw a list of 96 words. The participants were then asked to complete several tasks, including completing graphemic word fragments one hour and seven days after having seen the list. This study showed that previous tasks help accelerate later tasks. The priming effect is widely used in social marketing. Fukawa [16] summarized how priming affects consumers' behaviour and judgments with an example: when an individual is exposed to primes, e.g., wholesome and nourishing, these can activate related concepts, e.g., being healthy, which makes the individual more susceptible to purchasing corresponding products, e.g., vegetable juice.
Although there is a long history of exploring priming effects, the topic is still developing in CSSs. Church [7] proposed an adaptive language model for lexical adaptation to depict priming effects, in which each document is divided into a prime and a target, and the prime influences the target. Many studies have found correlations between primes and targets, for example in active/passive voice and verb-particle placement [9,18,20,28,33]. Reitter and Moore [28] demonstrated that lexical and syntactic repetition can predict the success of task-oriented dialogues. While this line of work has been fruitful for modelling conversations, the relationship between priming effects and users' actions is not yet clearly understood.
The objective of this paper is to find a relationship between context-based stimuli and the actions performed by users. We first modelled the interaction between users and systems. We then adapted priming effects from the psychology literature, which have the potential to explain users' actions, to analyze our datasets. After that, we analyzed the correlations between priming effects and users' actions. Finally, we proposed a model to predict users' actions.
Our contributions can be summarized as follows:
• An analysis of correlations between stimuli of priming effects and users' actions.
• A logistic regression (LR) model to predict users' actions.
Our findings establish a basis for creating reliable user simulations and better evaluation metrics for CSSs.

RESEARCH QUESTIONS
In this paper, we aim to answer the following research questions:
RQ1. Are there relationships between stimuli and users' actions?
To answer this question, we calculate the correlation coefficients between various stimuli adapted from priming effects and users' actions. This is crucial for understanding priming effects in CSSs.
RQ2. What is the performance of models built on these stimuli?
To answer this question, we train LR models on the stimuli from RQ1. This shows whether these stimuli contain enough information to predict users' actions.

PRIMING EFFECTS
Researchers have identified various types of priming effects that have distinct effects on user behaviour. For example, some stimuli speed up processing [25], which is called positive priming, whereas others slow it down, which is called negative priming. This study focuses on different priming effects and their corresponding stimuli, considering three dimensions: depth of history, scope in history, and scope in the current turn.
Depth of history refers to the number of turns before the current turn over which the stimuli are calculated; in this study, we chose 1, 2 and 5 turns. Scope in history and scope in the current turn refer to the part of each turn (queries, replies or both) on which the stimuli are calculated. The scope is written as r for replies, q for queries and qr for both.
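To make the depth and scope dimensions concrete, the following Python sketch selects the utterances a stimulus would be computed over. It assumes a conversation is a list of (query, reply) pairs indexed by turn and that the current-turn scope uses the same q/r/qr codes; the function names `history_utterances` and `current_utterances` are hypothetical and not taken from the authors' code.

```python
from typing import List, Tuple

# Assumed representation: one (query, reply) pair per turn.
Turn = Tuple[str, str]
SCOPES = ("q", "r", "qr")


def _select(turn: Turn, scope: str) -> List[str]:
    """Pick the query, the reply, or both from a single turn."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    query, reply = turn
    parts = []
    if "q" in scope:
        parts.append(query)
    if "r" in scope:
        parts.append(reply)
    return parts


def history_utterances(conv: List[Turn], t: int, depth: int, scope: str) -> List[str]:
    """Utterances from at most `depth` turns before turn t (1, 2 or 5 in the paper)."""
    start = max(0, t - depth)
    return [u for turn in conv[start:t] for u in _select(turn, scope)]


def current_utterances(conv: List[Turn], t: int, scope: str) -> List[str]:
    """Utterances of the current turn t restricted to the given scope."""
    return _select(conv[t], scope)


conv = [("q0", "r0"), ("q1", "r1"), ("q2", "r2")]
print(history_utterances(conv, 2, 1, "q"))   # ['q1']
print(history_utterances(conv, 2, 5, "qr"))  # ['q0', 'r0', 'q1', 'r1']
```

Truncating at the start of the conversation means early turns simply get shorter histories; how the paper handles turns with fewer than `depth` predecessors is not stated.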
Repetition priming effects. Repetition priming effects refer to the response when stimuli are presented repeatedly [30]. Low-frequency words tend to show stronger repetition priming effects than high-frequency words [12]. The same study also identified two components of normal repetition priming effects: a short-term effect that is independent of word frequency and a long-term effect that depends on it. In this study, we consider two types of repetition: repetition of tokens from the tokenized text and repetition of noun phrases. For each combination of scopes, the number of repetitions is used as the stimulus.
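A token-level repetition stimulus could be computed roughly as follows. This is a sketch under assumptions: the paper does not specify its tokenizer or exactly how repetitions are counted, so here a simple lowercase regex tokenizer is used and the stimulus is the number of current-turn tokens that already occurred in the history window. The noun-phrase variant would swap in a noun-phrase extractor (for instance spaCy's `Doc.noun_chunks`), which is not shown.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (an assumption; the paper's tokenizer is unspecified)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def repetition_count(history: Iterable[str], current: Iterable[str]) -> int:
    """Count current-turn tokens that already appeared anywhere in the history scope."""
    seen = set()
    for utterance in history:
        seen.update(tokenize(utterance))
    return sum(1 for utterance in current for token in tokenize(utterance) if token in seen)


history = ["What is jazz?", "Jazz is a music genre."]
current = ["Who invented jazz music?"]
print(repetition_count(history, current))  # 2 ("jazz" and "music")
```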
Semantic priming effects. Foss [13] described the semantic priming effect as an "awakening" from the context, in which stimuli trigger the same semantic category. Blank and Foss [5] give an example of semantic priming: nurse is a semantic prime of doctor but not of butter. When a stimulus related to one word is present, not only that word but also related words are "awakened". To compute stimuli for this effect, we use Word2Vec [26] to calculate the semantic similarity between words, and we summarize the top 5 similarities in each scope with three methods: taking the average, summing up and taking the maximum.
Affective priming effects. Affective priming effects involve evaluating people, ideas, objects and goods not solely on their physical characteristics but also on their emotional context. Studies on affective priming typically present positive, neutral or negative cues before a stimulus to influence how it is evaluated or responded to. Affective priming effects may be more powerful and widespread when the cue is barely noticed by the individual [29]. In this study, we measure polarity and subjectivity with TextBlob and summarize them in the same way as for semantic priming effects.
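The following sketch shows how the top-5 similarity summaries and the sentiment summaries might be computed. It assumes word vectors are available as a plain dictionary (in practice they could come from a pretrained Word2Vec model, e.g. loaded with gensim's `KeyedVectors`) and that the top 5 values are taken over all pairwise cosine similarities between history and current-turn tokens; both are our assumptions. The sentiment function accepts any per-utterance scorer, such as `lambda u: TextBlob(u).sentiment.subjectivity`.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence

Vector = Sequence[float]

# The three summary methods named in the paper: mean, sum and max.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": lambda xs: sum(xs) / len(xs) if xs else 0.0,
    "sum": lambda xs: float(sum(xs)),
    "max": lambda xs: max(xs) if xs else 0.0,
}


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_stimulus(history_tokens: Iterable[str], current_tokens: Iterable[str],
                      vectors: Dict[str, Vector], method: str, k: int = 5) -> float:
    """Summarize the k highest history-vs-current cosine similarities."""
    current = [c for c in current_tokens if c in vectors]
    sims = [cosine(vectors[h], vectors[c])
            for h in history_tokens if h in vectors for c in current]
    return AGGREGATORS[method](sorted(sims, reverse=True)[:k])


def sentiment_stimulus(utterances: Iterable[str], score: Callable[[str], float],
                       method: str) -> float:
    """Summarize per-utterance polarity or subjectivity scores over a scope."""
    return AGGREGATORS[method]([score(u) for u in utterances])


# Toy 2-d vectors standing in for Word2Vec embeddings (hypothetical values).
vectors = {"nurse": (1.0, 0.0), "doctor": (0.8, 0.6), "butter": (0.0, 1.0)}
print(round(semantic_stimulus(["nurse", "butter"], ["doctor"], vectors, "max"), 3))  # 0.8
```

Injecting the scorer keeps the aggregation logic identical for semantic and affective stimuli, mirroring the paper's statement that both are summarized the same way.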

USERS' ACTIONS
In this study, we categorize users' actions when interacting with CSSs into three types, namely Stopping, Following up and Switching topic.
Stopping. Stopping occurs when users decide to terminate a conversation, and it marks the final turn of the exchange. This action is significant for measuring various effects, such as the principle of least effort [35] and the recency effect, whereby individuals tend to start recalling information from the most recent items.
Following up. Conversational sessions frequently involve follow-up queries that rely on prior interactions: such queries often omit context and refer to previously mentioned subjects [27]. Users ask follow-up queries to adjust their search space and seek better answers.
Switching topic. According to Stede and Schlangen [32], an inquisitive user in an ongoing interaction may develop an interest in additional, related topics based on the information presented in the responses. This phenomenon, commonly referred to as topic-switching behaviour, is frequently observed in information-seeking conversations, particularly when search systems are used for information gathering [31].
To broaden the range of scenarios, we selected the datasets summarized in Table 1. In some datasets, such as Topi and FD, agent responses are free-form, while others consist of passages from documents. Free-form answers make it difficult to detect repeated retrieved documents, since the same content can be expressed differently. The datasets also differ in other characteristics. For instance, conversations in Topi have an average length of 13 turns with a standard deviation of 3.3, while those in FD have an average length of 4.5 with a standard deviation of 0.5. Consequently, Topi conversations tend to be longer and more varied in length, while most FD conversations end after 4 or 5 turns.
To analyze the following-up action, we chose ORC, in which follow-up questions are labelled; 53% of its questions are follow-up questions. To analyze the switching-topic action, we chose Topi, in which topic-switching queries are labelled; 27% of its questions are topic-switching queries.
We created a new dataset, AllMix, from the original datasets above. Since the smallest of the seven datasets contains 26 conversations (TREC), we randomly sampled 26 conversations from each dataset for AllMix to keep the datasets balanced.
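The balancing step can be sketched as drawing the same number of conversations, the size of the smallest dataset, from every dataset. The dataset names, the fixed seed and the function name `build_allmix` are illustrative assumptions; the paper does not say how its random sample was drawn.

```python
import random
from typing import Dict, List, Sequence, TypeVar

Conversation = TypeVar("Conversation")


def build_allmix(datasets: Dict[str, Sequence[Conversation]], seed: int = 0) -> List[Conversation]:
    """Draw min-size conversations without replacement from each dataset."""
    n = min(len(convs) for convs in datasets.values())  # 26 (TREC) in the paper
    rng = random.Random(seed)
    mixed: List[Conversation] = []
    for name in sorted(datasets):  # fixed order keeps the sample reproducible
        mixed.extend(rng.sample(list(datasets[name]), n))
    return mixed


datasets = {"A": list(range(100)), "B": list(range(100, 126)), "C": list(range(200, 250))}
mix = build_allmix(datasets)
print(len(mix))  # 78
```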

EXPERIMENTS AND FINDINGS
6.1 Stimuli vs Actions: Correlation
To answer RQ1, we calculated correlation coefficients between stimuli and users' actions on various datasets. As mentioned in Section 3, we considered stimuli with diverse scopes and conditions, 270 stimuli in total. The labels on the y-axis in Figure 1 exemplify the names of these stimuli.
The name of each stimulus begins with its feature: W2V denotes semantic similarity assessed by Word2Vec, R denotes repetition, and Se denotes sentiment. Next comes the type of stimulus: T refers to stimuli computed from tokenized lists, N to noun lists, Sub to subjectivity and Pol to polarity. The following number denotes the depth of history. After that, the scope is indicated. For W2V and R, the scope consists of two parts and is written as h2c, where h is the scope of history and c is the scope of the current turn (e.g., RT1q2qsum). For Se, there is only one part, the scope of history. Each scope takes one of the three values introduced in Section 3. Finally, the method used to compute the stimulus is indicated: sum, max or mean.
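To illustrate the naming scheme, the sketch below enumerates and parses stimulus names. The exact grid is our reconstruction rather than something the paper spells out: it assumes W2V and R both come in T and N variants over all 3×3 history/current scope pairs, that repetition stimuli are only summed (every repetition example ends in sum), and that Se stimuli use all three methods. Under these assumptions the grid happens to contain 270 names, matching the reported total, but that agreement is not confirmation.

```python
import itertools
import re
from typing import Dict, List, Optional, Union

DEPTHS = (1, 2, 5)
SCOPES = ("q", "r", "qr")
METHODS = ("sum", "max", "mean")


def enumerate_stimulus_names() -> List[str]:
    names = []
    # Semantic similarity: feature + type + depth + h2c scope + method.
    for typ, d, h, c, m in itertools.product("TN", DEPTHS, SCOPES, SCOPES, METHODS):
        names.append(f"W2V{typ}{d}{h}2{c}{m}")
    # Repetition counts: assumed to be aggregated by sum only.
    for typ, d, h, c in itertools.product("TN", DEPTHS, SCOPES, SCOPES):
        names.append(f"R{typ}{d}{h}2{c}sum")
    # Sentiment: only a history scope.
    for typ, d, h, m in itertools.product(("Sub", "Pol"), DEPTHS, SCOPES, METHODS):
        names.append(f"Se{typ}{d}{h}{m}")
    return names


NAME_RE = re.compile(
    r"^(?P<feature>W2V|R|Se)(?P<type>T|N|Sub|Pol)(?P<depth>\d+)"
    r"(?P<hist>qr|q|r)(?:2(?P<cur>qr|q|r))?(?P<method>sum|max|mean)$"
)


def parse_stimulus_name(name: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a name such as RT1q2qsum into its components (cur is None for Se)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a stimulus name: {name}")
    parts: Dict[str, Optional[Union[str, int]]] = dict(match.groupdict())
    parts["depth"] = int(match.group("depth"))
    return parts


print(parse_stimulus_name("RT1q2qsum"))
```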
In this section, we used Spearman's rho to measure the correlation coefficients between stimuli and users' actions across various datasets. For each stimulus, we computed its value at every turn of each conversation and concatenated these values into one list per dataset. We then calculated the correlation coefficients with the corresponding actions at each turn. Figure 2 shows the results. For each dataset and action, we selected the top 10 stimuli by the absolute value of their Spearman's rho, without duplicates, giving a total of 44 stimuli in the figure. We make the following findings.
First, sentiment-based stimuli are reasonably associated with stopping points. For example, SeSub1qrsum is negatively correlated with stopping in most datasets, which suggests that this stimulus may help prolong the conversation.
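The correlation step can be sketched in dependency-free Python: per-turn values are concatenated across conversations, stopping labels mark the last turn of each conversation, and Spearman's rho is computed as the Pearson correlation of average ranks (the definition `scipy.stats.spearmanr` also uses). The helper names, and the choice to drop stimuli whose rho is undefined (e.g. constant columns), are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def stopping_labels(conversation_lengths: Sequence[int]) -> List[int]:
    """1 on the final turn of each conversation, 0 elsewhere, concatenated."""
    labels: List[int] = []
    for n in conversation_lengths:
        labels.extend([0] * (n - 1) + [1])
    return labels


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


def top_k_stimuli(stimuli: Dict[str, Sequence[float]], labels: Sequence[int],
                  k: int = 10) -> List[Tuple[str, float]]:
    """Rank stimuli by |rho| against the action labels, keeping the k strongest."""
    rhos = [(name, spearman(values, labels)) for name, values in stimuli.items()]
    rhos = [(name, rho) for name, rho in rhos if not math.isnan(rho)]
    return sorted(rhos, key=lambda item: abs(item[1]), reverse=True)[:k]


print(stopping_labels([3, 2]))  # [0, 0, 1, 0, 1]
```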
Second, stimuli based on the same feature tend to have a similar relationship with the same action. In Figure 2, there is a notable trend: almost all stimuli sharing a feature play a similar role for the same action. For example, most sentiment-based stimuli are negatively correlated with stopping under most conditions, whereas for switching topics this group of stimuli is consistently weaker than other stimuli. The same trend can be observed for W2V-based stimuli in stopping and switching topics, and for repetition-based stimuli in stopping, switching topics and following up.
Finally, only a few stimuli are even weakly correlated with switching topics and following up. RT1q2qsum, RT2qr2qsum and RT5qr2qsum are the only three stimuli with a correlation coefficient larger than 0.2 for switching topics, while SeSub1rsum is the only stimulus with a correlation coefficient larger than 0.2 for following up. This suggests that, unlike stopping points, these two actions are difficult to predict from the stimuli.

6.2 Stimuli vs Actions: LR models
To address RQ2, we trained LR models for the three actions using the 270 stimuli outlined in Section 3. Each dataset was randomly split into a 70% training set and a 30% test set. At each turn of a conversation, the LR models received the 270 stimuli as input features and the action taken as the target. Each LR model consisted of a Min-Max scaler for normalization followed by an LR layer for classification.
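A dependency-free sketch of this setup follows: a random 70/30 split, a Min-Max scaler fitted on the training set, and logistic regression trained by batch gradient descent. It stands in for whatever library the authors used (a scikit-learn `Pipeline` of `MinMaxScaler` and `LogisticRegression` would be a typical choice, but the paper does not say); the learning rate, epoch count, seed and lack of regularization are our assumptions.

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]


def split_70_30(X: Sequence[Row], y: Sequence[int], seed: int = 0):
    """Shuffle indices and hold out 30% of the examples as a test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(0.3 * len(idx))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MinMaxLogisticRegression:
    """Min-Max scaling followed by unregularized logistic regression."""

    def __init__(self, lr: float = 1.0, epochs: int = 1000):
        self.lr, self.epochs = lr, epochs

    def _scale(self, row: Row) -> List[float]:
        # Constant training features map to 0; test values may fall outside [0, 1].
        return [(x - lo) / (hi - lo) if hi > lo else 0.0
                for x, lo, hi in zip(row, self.mins, self.maxs)]

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "MinMaxLogisticRegression":
        d = len(X[0])
        self.mins = [min(r[j] for r in X) for j in range(d)]
        self.maxs = [max(r[j] for r in X) for j in range(d)]
        Xs = [self._scale(r) for r in X]
        self.w, self.b = [0.0] * d, 0.0
        n = len(Xs)
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for row, target in zip(Xs, y):
                err = _sigmoid(sum(w * x for w, x in zip(self.w, row)) + self.b) - target
                for j in range(d):
                    grad_w[j] += err * row[j]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, X: Sequence[Row]) -> List[float]:
        return [_sigmoid(sum(w * x for w, x in zip(self.w, self._scale(r))) + self.b) for r in X]

    def predict(self, X: Sequence[Row]) -> List[int]:
        return [int(p >= 0.5) for p in self.predict_proba(X)]
```

In use, each row would hold the 270 stimulus values for one turn and the label would be 1 if the action (e.g. stopping) occurs at that turn; fitting the scaler only on the training split avoids leaking test-set ranges.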
Table 2 reports the scores of the LR models on the corresponding test sets. The LR models perform well on all datasets for stopping, poorly on Topi for switching topics, and fairly on ORC for following up. This indicates that the 270 stimuli used in this study can represent stopping behaviour and capture some information about asking follow-up queries. However, predicting topic switches solely from repetition, semantic similarity and sentiment is difficult.
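Table 2 reports AUC as the area under the precision-recall curve. One common estimate of that area is average precision, the mean of the precision values at the rank of each positive example, sketched below. Whether the authors used this step-wise estimate or trapezoidal interpolation is not stated, so treat the choice as an assumption; ties in scores are broken by sort order here, which can differ slightly from library implementations such as scikit-learn's `average_precision_score`.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the P-R curve for binary labels (1 = action taken)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # ≈ 0.833, i.e. (1/1 + 2/3) / 2
```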
The performance of the LR models also agrees with the distribution of correlation coefficients presented in Section 6.1: the stopping action, which showed stronger relationships with the stimuli than the other two actions, is also the action the LR models predict best.
Unlike the correlation coefficients, the weights of the LR models favour stimuli aggregated with the mean. Sorted by the accumulated reciprocal of their rank, the top 3 stimuli are SeSub2qrmean, SeSub2rmean and SeSub2qmean for the stopping action; SeSub5qmean, SeSub5qrmean and W2VT1q2qmean for the switching-topic action; and W2VT5q2rmean, W2VT5qr2qrmean and SeSub1qmean for the following-up action.
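One reading of the accumulated reciprocal rank is: rank the stimuli of each fitted model by the magnitude of their weights, add 1/rank for every model, and sort by the total. The sketch assumes the ranks are accumulated over the models trained on different datasets for one action and that absolute weight is the ranking key; the paper does not spell out either detail, and the stimulus names in the example are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def accumulated_reciprocal_rank(weights_per_model: Sequence[Dict[str, float]],
                                top: int = 3) -> List[Tuple[str, float]]:
    """Sum 1/rank of each stimulus across models, ranking by |weight| within a model."""
    totals: Dict[str, float] = defaultdict(float)
    for weights in weights_per_model:
        ranked = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
        for rank, name in enumerate(ranked, start=1):
            totals[name] += 1.0 / rank
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical weights from two models (e.g. two datasets) for the same action.
models = [{"a": 2.0, "b": -3.0, "c": 0.1}, {"a": 1.5, "b": 0.2, "c": -0.3}]
print([name for name, _ in accumulated_reciprocal_rank(models)])  # ['a', 'b', 'c']
```

Summing reciprocal ranks rewards stimuli that are consistently near the top across models, rather than one that dominates a single model.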

DISCUSSION AND CONCLUSION
This paper examined different stimuli based on several priming effects. First, we combined six original datasets to create a diverse dataset. Next, we adapted priming effects to the CSS setting. Then, we generated 270 stimuli by varying feature, type, depth, scope and method. Following that, we computed and scrutinized the correlations between stimuli and users' actions. Finally, we trained logistic regression models on seven datasets with the 270 stimuli and evaluated their effectiveness in predicting users' actions.
Our results show:
• Stimuli based on sentiment have reasonable relationships with the stopping action.
• There is a limited correlation between certain stimuli and the acts of switching topics and following up.
• Stimuli based on the same feature share a similar relationship with the same action.
• LR models based on repetition, semantic similarity and sentiment stimuli are capable of predicting the stopping and follow-up actions but cannot predict when the topic of the conversation changes.
• Unlike the correlation coefficients, LR models favour stimuli aggregated using the mean rather than the sum or max.
Our study analyzed how different stimuli, such as repetition, semantic similarity and sentiment, influenced the three user actions across various datasets. According to our results, LR models that use these stimuli can predict when a conversation stops and when a follow-up question is asked, but they cannot anticipate a change of topic. These findings can benefit future research on simulating user behaviour in CSSs.
This study has two limitations that warrant attention in future research. First, repetition, semantic similarity and sentiment represent only a subset of potential priming effects, so incorporating a more diverse set of priming effects may be beneficial; if labelled data is available, introducing additional priming effects and stimuli could improve future studies. Second, since this study used only one dataset each for the topic-switching and follow-up actions, using more datasets would give us more confidence in these results.

Table 1: Properties of each dataset. Size refers to the total number of conversations, μ to their average length and σ to its standard deviation. The last two columns indicate whether the follow-up and switch-topic actions are labelled.

Table 2: Scores of LR models on each dataset. AUC refers to the area under the precision-recall (P-R) curve.