Confirmation bias and trust: Human factors that influence teachers' attitudes towards AI-based educational technology

Evidence from various domains underlines the key role that human factors, and especially trust, play in the adoption of AI-based technology by professionals. As AI-based educational technology increasingly enters K-12 education, issues of trust are expected to influence educators' acceptance of such technology as well, but little is known about this matter. In this work, we present the opinions and attitudes of science teachers who interacted with several types of AI-based technology for K-12. Among other things, our findings indicate that teachers are reluctant to accept AI-based recommendations when these contradict their previous knowledge about their students, and that they expect AI to be absolutely correct even in situations in which absolute truth may not exist (e.g., grading open-ended questions). The purpose of this paper is to provide initial findings and start mapping the terrain of this aspect of teacher-AI interaction, which is critical for the wide and effective deployment of AIED technologies in K-12 education.


Introduction
Despite its relatively small community of researchers, AI in Education (AIED) has been around for about half a century, since Carbonell's seminal work on an adaptive geography instruction system called SCHOLAR [1]. During this period, significant contributions have been made to the design and evaluation of AI systems that can be used in educational settings. For instance, adaptive instruction tools based on AI can be more effective than traditional classroom instruction, reading printed or digital texts, or doing homework alone [2], [3]. However, in contrast to the applied sciences, finance, and medicine, the adoption of AI in real-world educational settings seems to lag behind [4].
Perhaps this is, at least to a certain extent, due to the limited scope of AIED solutions, which focus mainly on the technical and pedagogical aspects of delivery in a closed system, rather than taking a "mixed initiative" approach [5] that aims to combine human and machine agency in complex educational systems [6], [7]. Such an approach, which highlights the significance of human agency in the adoption of AI in educational systems, requires an in-depth investigation of practitioners' perceptions of AI tools, as well as of their trust in these tools, as preconditions for adopting them into their practice.
Investigating educators' perceptions of AI is important, as such perceptions can lead to regulatory activity with potentially serious repercussions [8] and can help us define policy [9]. More specifically for AI in Education, the potential aversion of the public to AI is costly for society at large. For instance, AIED systems can be comparable to human tutors in terms of their effectiveness (e.g., [10]); however, many educators remain resistant to using them, and some actively demonstrate against their use in schools 1 , 2 .
The issue of educators' attitudes towards AIED systems is not disconnected from how AI is portrayed in the media. For instance, not all mentions of AI are associated with positive attitudes [11], and concerns regarding the potentially harmful impact of AI technologies are often raised in the media and in public rhetoric. Moreover, highly respected academics and public figures have contributed to the construction of dystopian scenarios of AI machines causing existential consequences to humankind. In addition, there are ongoing arguments about the observed and expected impact of AI on the future of the workforce, and related fears of mass unemployment [12], [13]. These negative connotations have the potential to skew educators' perceptions and can lead them to avoid and/or ignore AI systems in their practice. For example, in finance, although evidence-based algorithms predict the future more accurately than human forecasters do, people deciding between a human forecaster and an algorithm often choose the human forecaster [14]. Among the issues affecting humans' attitudes towards machine-based decisions, trust is a critical factor [15].
As AIED tools are expected to be increasingly used in education, it is important to investigate educators' attitudes toward them. However, little is known about this matter, and our research aims to fill this gap. In this paper, we present findings from a series of interviews and group discussions with ∼20 science teachers. The teachers participated in a year-long professional training program that involved co-design of AI-based solutions for the science classroom. Early on, it was clear that human factors, and especially trust in AI-based decision making, shape teachers' interaction with the tools. For example, we noted that when the teachers were not familiar with the students (e.g., a synthetic class or anonymized data), their level of trust in the tool's recommendations tended to be high, while they were much more reluctant when the analysis was performed on their own class. This research started as a response to this phenomenon. To the best of our knowledge, the issue of educators' trust in AI has not previously been studied within the learning analytics community. Our research takes a qualitative approach to start mapping this terrain of teachers' attitudes towards AI-based decision making, with special attention to factors affecting their willingness to accept it.

Research Setting
The teachers were presented with AI-based educational technology that is being piloted for PeTeL. To make the scenarios concrete, all the interactions revolved around specific artifacts presented through authentic use cases. In Physics, the teachers were presented with a tool that clusters students into groups that exhibit similar response patterns on a multi-dimensional assessment instrument (Figure 1). In Biology, the teachers were presented with an NLP-based algorithm that analyzes responses to open-ended questions (Figure 2). Teachers' interaction with the tools included two phases: in the first, teachers interacted with the tools using anonymized authentic student data, and in the second, they worked with data of their own class.
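To make the clustering idea concrete, the grouping step can be sketched as a minimal k-means over students' item-response vectors. The data, the number of clusters, and the deterministic initialization below are illustrative assumptions for this sketch, not details of the actual PeTeL tool:

```python
def kmeans(points, k, iters=10):
    """Minimal k-means over students' item-response vectors.

    Uses the first k points as initial centers (a deterministic
    choice for this sketch; real implementations pick better seeds).
    """
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each student to the nearest cluster center.
        assign = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its assigned students.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members)
                              for dim in zip(*members)]
    return assign

# Hypothetical data: each row is one student's correctness on four items.
responses = [
    (1, 1, 1, 1),  # strong on all items
    (0, 0, 0, 0),  # struggled on all items
    (1, 1, 1, 0),
    (0, 0, 1, 0),
]
labels = kmeans(responses, k=2)
# Students 0 and 2 end up in one group, students 1 and 3 in the other.
```

A tool like the one shown to the teachers would run such a grouping on many more students and items, and render the resulting clusters on a dashboard rather than as raw labels.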

Data collection and analysis
Data were collected via interviews and group discussions during the 2019-20 academic year (face to face) and the 2020-21 academic year (online, due to the COVID-19 pandemic). Face-to-face interviews and group discussions were audio-recorded, and online interviews and group discussions were video-recorded via Zoom. The collected data were later transcribed and analysed by the first author. The themes that emerged were discussed with the other authors based on the data and triangulated between the interviews and the group discussions.
The main goal of the qualitative data analysis was to identify emerging themes, in order to further investigate them in a more quantitative manner in a subsequent experiment with a much larger population of science teachers.
[Figure caption: Diagnosing students according to learning outcomes. The white squares represent the information given in the question; the green squares indicate that the student correctly mentioned the corresponding property, while the red squares indicate that the student missed the corresponding part of the written explanation.]
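The green/red display described above can be thought of as a coverage matrix: each expected element of an answer is marked present or missing. The sketch below imitates that output with naive keyword matching; the rubric, keywords, and question are invented for illustration and are far simpler than the actual NLP model:

```python
def rubric_coverage(answer, rubric):
    """Mark each expected rubric element as mentioned (green) or
    missing (red). Keyword matching here is a crude stand-in for
    the real NLP analysis."""
    text = answer.lower()
    return {element: any(kw in text for kw in keywords)
            for element, keywords in rubric.items()}

# Hypothetical rubric for an osmosis question (keywords illustrative).
rubric = {
    "membrane": ["membrane"],
    "concentration gradient": ["concentration", "gradient"],
    "water movement": ["water"],
}
answer = "Water moves across the membrane toward the saltier side."
coverage = rubric_coverage(answer, rubric)
# coverage marks "membrane" and "water movement" as mentioned,
# "concentration gradient" as missing.
```

Each True/False value corresponds to one green or red square per student per rubric element, which is what the teachers saw aggregated on the dashboard.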

Interviews
The interviews were conducted with several Physics teachers who participated in professional training programs in our department. The interviews were semi-structured, and their main goal was to shed light on teachers' attitudes towards applying AI-based personalized instruction in their classrooms. The artifact presented to the teachers was the clustering tool (Figure 1). The interviews (30-60 minutes each) focused on the following key questions:
• In your opinion, would assigning different learning activities to individual students or groups of students in your class improve student learning outcomes?
• In your opinion, what are the most suitable teaching scenarios for implementing personalized learning in your classroom?
• Do you implement them in your classroom? If yes, how? If not, what is preventing you?
• Do you believe that the information provided on the dashboard would help you plan follow-up activities for your students, tailored to their specific needs? If yes, how? If not, why not?

Group Discussions
The group discussions were held during several meetings of year-long professional training programs (a separate program for the physics and the biology teachers). Within these meetings (four throughout the year), the teachers were presented with the tools at various development phases (first a mock-up, later a beta version) and provided feedback. The interaction with the tools typically took place in small group discussions (3-6 participants, 20-30 minutes each) moderated by a teachers' teacher. The group discussions focused on the following questions:
• Do you believe that the information provided on the dashboard would help you plan follow-up activities for your students, tailored to their specific needs?
• If yes, how do you intend to use this information for planning a personalized follow-up activity (or a learning sequence)?
• If not, why not?
Each discussion group summarized its ideas on a shared sketch, and these were later presented to the community (about sixteen teachers) and were discussed in a wider forum.

Results
Based on the analysis of the interviews and the group discussions, a few themes emerged regarding teachers' trust in AI-based learning technology. Below we first briefly describe them, and then provide a few examples of how each of these themes was manifested in teachers' utterances (all quotes are translated from Hebrew).
1. Trust in one's ability to adopt AI tools and adjust the pedagogy: Teachers expressed concerns regarding their ability to shift their pedagogy towards AI-based personalized instruction. The concerns touched upon the necessity to change class routines, their ability to effectively manage situations where different students follow different learning sequences, and uncertainty about what learning would look like.

2. Confirmation bias - intuition and past experience override AI analysis: Teachers perceived their pedagogic experience, human intuition, and previous knowledge about the students as superior to the AI analysis, and resisted accepting its recommendations when these contradicted their image of individual students or of the whole class.
3. AI is an oracle that is either perfect or worthless: An additional issue that emerged was an unrealistic perception that AI-powered machines should make perfect predictions regardless of how imperfect the 'world' is. That is, teachers considered AI-based tools a sort of 'oracle' that should provide an objective, deterministic, and accurate answer even in situations (such as grading students' writing) where there is no absolute 'ground truth'. Among other things, teachers were tolerant of disagreements with peers on such tasks, while disagreements with the algorithm were considered a fatal failure and evidence that the tools are unreliable.
Below we provide more details on how each of these was reflected in teachers' utterances in the interviews and the group discussions. (Physics teachers are coded as Teachers 1-10, and the biology teachers are coded as Teachers 11-16).

Trust in one's ability to adopt AI tools and adjust the pedagogy
The findings from our research indicate that teachers usually prefer to base their pedagogy on a principle that can be described as "one size fits most" (of the students in the class). This is despite the fact that they believe in the value of personalizing their instruction for improving students' learning gains. One of the reasons they mentioned as preventing them from implementing individualized instruction was the lack of information on students' current state of knowledge, along with the absence of means to search for and discover tailored interventions that match individual needs. Another difficulty mentioned by the teachers in the interviews was related to classroom management - monitoring and controlling the progress of students who follow individual learning sequences (or even tasks). "It is simply impossible, it will take a huge amount of time and I do not have it", stated Teacher 6. Teacher 4 said: "I will be totally out of control in the class". The discussions that followed the pilot experiments with the tools clearly indicated that the teachers believe the AI-based tools can assist them in personalizing their instruction. However, they were unsure about the changes required to integrate the tools into their teaching practices, and whether they are capable of making these changes.
These are exemplified in the following utterances made by the teachers during the group discussions:
• Teacher 1 stated: "That if you want to use such a tool you probably need to change the instruction. So, the question is, is it possible?"
• Both Teachers 2 and 6 explicitly repeated several times: "We need to change our perceptions about how we teach".
• Teacher 3 mentioned that adopting a tool for content recommendation would require him to use a different scheme of blending online and face-to-face teaching: "Although I did not use it, a mechanism for allocating follow-up activities seems to be able to serve me in the future, but requires substantial changes in my work habits, since today I do not rely solely on PeTeL as a learning platform".
• Teacher 8 mentioned: "This is a new and important tool that allows [...] to see the real picture of the class and provides the ability to treat a specific problem based on automated analysis of the answers".
• "The truth is that the system is very clear about what should be addressed in the class", said Teacher 9, who also added: "This tool is very effective and will improve and focus the learning process".
• Teacher 11 described the tool as something that allows the teacher to realize which conceptual parts of student responses are missing: "[now] I can check it, may be I did not teach this subject good enough".
• Teacher 14 said: "It may be for the good. I agree that it will require a change in the pedagogy. [...] It will be a tool to get to every student. It will not save time or effort, but now my efforts will be more effective and focused if I know exactly what the student's problem is".
• "The computer helped a lot. The answers are sometimes vague. It is hard to evaluate. If the computer gives an indication of the missing knowledge based on an automatic assessment, it can clarify and help", said Teacher 16. She also added: "The tool can assess in a more objective way, which is something that I struggle to achieve".
To summarize, our impression from the interviews and group discussions is that the teachers believe that educational technology, and in particular the AI-based tools developed for PeTeL, can assist them in providing personalized instruction. However, they were uncertain about their capability to make effective use of the tools, as this would require substantial changes in their current teaching practices.

Confirmation bias -intuition and past experience override ML analysis
The idea that humans may treat evidence differently depending on whether the issue is of a personal matter is considered one of the flavors of the well-known Confirmation Bias phenomenon [17]. "If we have nothing personally in stake in a dispute between people who are strangers to us, we are remarkably intelligent about weighing the evidence and reaching a rational conclusion. [...] But let the fight be our own, or let our own friends, relatives, fraternity brothers, be parties to the fight, and we lose our ability to see any other side of the issue than our own" (Thurstone, 1924, p. 101). Closely related to Confirmation Bias are the Primacy Effect and Belief Perseverance (Nickerson, 1998). These phenomena provide an explanation for why people may tend to form an opinion at the beginning of a process based on partial evidence, and then either seek information that reinforces it or reject information that could disprove it.
We find these phenomena especially useful in explaining how some of the teachers reacted to the AI-based analysis provided by the tools. Below we provide several observations, supported by corresponding teacher utterances.
AI analysis is accurate, but not when performed on my students. As mentioned earlier, we presented the Physics teachers with a tool that groups students into clusters based on their response data (see Figure 1). When the tool was demonstrated on synthetic data, most of the teachers found its analysis sensible and were generally glad about the idea of an ML algorithm working for them. However (and quite surprisingly at first), they were much more reluctant about the results of running the ML algorithm on the data of their own students. Some examples from the group discussions are provided below.
• Teacher 1, when presented with the AI recommendation for the synthetic class, mentioned that the tool was as precise as he was in identifying groups of students with similar difficulties, and that it may be useful for planning the next activities for these groups. He did not raise any concern about the analysis. However, when presented with the results of the analysis on his own class, he rejected it: "The data did not tell me anything. The analysis does not tell anything. If more tasks were given, we would probably get more data and more meaningful information. "
• Similarly, Teacher 2 was very positive about the ability of the algorithm to identify groups of students with similar difficulties when operated on synthetic data: "Yes, I did a similar division, and the fact that the algorithm can do this for me is very convenient". However, when presented with the results of the analysis on the class of a fellow teacher (whom he knows closely), he mentioned that the results cannot be trusted without previous knowledge of the students: "It did not tell me too much. Because I .. do not know the students".
• Teacher 3, like Teachers 1 and 2, was keen to adopt the ML analysis conducted by the tool when run on data of a synthetic class: "It may help me to approach students and provide them with follow-up activities". However, he too reacted very differently when asked about the tool's recommendation for his own class: "My judgment is primarily based on my familiarity with the students, their learning habits, their willingness to deal with their own difficulties and their self-discipline".
Once a laggard, always a laggard. The majority of the teachers tended to see the students on a uni-dimensional scale - from weak to strong - and were less open to multi-dimensional, real-time analysis that may identify typically successful students as having difficulties (or the opposite), thus contradicting the teacher's former beliefs about these students.
• Teacher 4 mentioned that he used to divide the class into groups based on his intuition and previous knowledge of the students ("I divide them into the group even before they start the learning activity. I know who is strong and who is weak"), rather than based on the results of a recently completed diagnostic learning activity.
• Teacher 5 repeated several times: "I think it's hard to ignore my 'teacher intuition' in favor of this robot [the ML algorithm] that will do the job for me. [...] I have my intuition, I trust myself, I know my students, I know how to help them. As opposed to trusting the algorithm to tell me. "
Some of the teachers referred to the tool's recommendation as a sort of confirmatory analysis that should reinforce their former beliefs about their students; in case of disagreement, they argued that the tool was inaccurate and thus could not be trusted. For example:
• Teacher 3 stated that he did not trust the results of the algorithm as they contradicted his intuition: "I know who the student is, I know him well enough, I know where he is strong [...] When I use statistics to get to know the students it confuses me. No way? It's not possible that he [a certain student] did not understand the right-hand rule [a physics concept] at all. Can't be", and "When I see a student I look at it [the response], and I see. What is important for me to know is who answered a question, who answered a particular question. Because I know him or her. "
• Teacher 7 was very surprised to see two of his students, whom he perceived as weak, in the group with the best achievements: "Everything is very similar to what I know about the students. Except, how come the tool marked these two laggards as successful??? they usually do nothing and learn only before the final exam". In fact, he was so surprised that he asked to stop the interview and went to further examine the results of these students (only to find out that the tool was correct).
• Teacher 10 stated: "The analysis is good because it confirms my knowledge about the students". She completely ignored the fact that the analysis was conducted on a specific set of recently solved items from a specific topic, and referred to students' knowledge as something permanent: strong students always succeed, weak ones almost always fail.
"Reading between the lines". Confirmation bias also manifests itself in a tendency to "read between the lines" - interpreting students' open-ended responses according to previous knowledge about the student, rather than objectively, based solely on the specific response. We include this issue, even though it is not directly related to the tools, as it presents a different aspect of the confirmation bias.
• Teacher 12 described how he struggled to grade responses of anonymous students: "It was very hard to grade them because I'm not familiar with the students".
• Teacher 13 mentioned: "I should force myself sometimes to reduce a point because it is clear to me that the student knows, only she did not mention it in the answer".
• During a group discussion, Teacher 14 said: "I taught the topic and administered this activity in my class. It was a catastrophe. I need to teach them how to formulate. They definitely know the material, but do not know how to write it down in answers". We expected the other teachers to ask something like "But how do you know that they know, if they do not write it properly in their answer?" Instead, several teachers (Teachers 11, 12, and 15) fully agreed, making statements such as: "Sure, often they know, but are not able to formulate", and "Yes, we always 'read between the lines' and decide according to our prior knowledge of the student. "

(AI should be) Perfect in an Imperfect World
One of the defining characteristics of teachers' expectations of the AI-based tools was that the tools should provide absolute truth and fully comply with their own opinion. Even a slight disagreement was perceived as an indication that the tool is unreliable, impractical, and useless. This perception co-existed with teachers' awareness of the fact that absolute truth may not actually exist, and that there may be multiple interpretations of the analyzed data (e.g., how to grade an open response), as evidenced by the fact that they were much more open to disagreements with peer teachers on these matters.
Below are a few exemplifying statements from one of the discussions.
1. Teacher 3 started the discussion by arguing that, to be useful, the algorithm should be as precise as his own prediction. However, as the algorithm lacks previous knowledge about his students, he suggested that "It [the algorithm] may need more content to get better statistics. More assessment of the students. More, more, and more so that information will be helpful".
2. Teacher 1 used Netflix as an example of a well-known recommendation system, emphasizing that learning environments are very different: "It could be that this whole story of big data ... is irrelevant when a child at the end has to sit down and solve an exercise. It works when she watches dozens of TV series on Netflix. Netflix is learning from these [data]. You can't ask the child to solve more and more exercises just to be really accurate. So maybe we're in trouble. "
3. Interestingly, the participants completely ignored the fact that when several human teachers are given the same data, they typically have various opinions regarding the quality of student answers, what type of pedagogic intervention would best suit the needs of each student, etc. Despite the possible disagreements, teachers find the opinions of peers and experts valuable, as we frequently observe in professional development programs, social forums of teachers, etc. In contrast to this tolerance of the subjectivity of human judgement, the majority of the teachers expected to get an objective 'truth' (in their language) from AI-based systems, and when this (unrealistic) expectation was not met, they rejected the analysis as unreliable.

Discussion
Our findings demonstrate that the adoption of AI-based educational technology by teachers is not a straightforward process, and that teachers' willingness to use such tools in their classroom may be influenced by their attitudes towards, and perceptions of, AI-based decision making. While factors affecting the adoption of educational technology by educators have been studied for decades, the 'AI' nature of the current wave of technology seems to raise new types of issues, resembling the growing evidence on factors affecting human-AI interaction in other domains, such as healthcare.
Among our findings are that teachers see AIED as a potentially valuable technology, but are uncertain about their ability to make the adjustments to their pedagogy needed to fully utilize its value; that when interpreting AI-based analysis, teachers resist adopting recommendations that contradict their previous beliefs about the students, which we interpreted as Confirmation Bias and Belief Perseverance; and that teachers may expect AI to be an oracle that is always correct (even in situations where there is no fully objective 'truth'), and when it fails to meet this unrealistic expectation, reject it as worthless.
In addition to shedding light on how teachers' attitudes may shape their interaction with AIED technology, our research also underlines the importance of studying teacher-AI interactions in authentic settings, rather than in synthetic ones. For example, teachers' attitudes towards the AI-based recommendations changed dramatically from 'mostly positive' when the analysis was performed on the data of a synthetic class, to 'very critical' when it was performed on data from the teachers' own class. Performing experiments with synthetic data is a common methodology when studying human-AI interaction, for example in the healthcare domain, where synthetic data is frequently used due to privacy issues or lack of real data. Similar issues (privacy, lack of data) also arise in education research. As a growing field that still needs to prove its value, AIED research tends to take a techno-optimist position that is in many cases based on experiments made in controlled environments. As we showed, such experiments may not reveal teachers' genuine attitudes, as experiments in natural settings would.
To the best of our knowledge, this research is the first to study educators' trust in AI, and how attitudes impact the adoption of AIED technology by teachers. While the findings presented here are preliminary, they are a first step in developing our understanding of this important aspect of teacher-AI partnership, which we intend to further explore.