"What is the truth about lie detection?" This is a question that I am often asked and one that is profound because everyday we assess each other for veracity, be it at home, work, or social situations. Over the years, both in my writings and lectures, I have tried to give insight into this important question. I will try to do so again here.
As the best researchers can tell, and in my own experience as an FBI Special Agent (now retired), detecting deception is very difficult. Every study conducted since 1986, when the famed researcher Paul Ekman first wrote about this, has demonstrated that we humans are no better than chance at detecting deception (Ekman & O'Sullivan 1991, 913-920; Granhag & Strömwall, 2004, 169; Mann & Vrij 2004). That means that you would do about as well at telling lies from the truth by tossing a coin in the air. And while it is true that a very few people are better at detecting deception than others, they are barely above chance. In fact, those who are really good are only correct somewhere around 60% of the time; that means that 40% of the time they are wrong, and you would not like them sitting on a jury judging you.
Unfortunately, many people have come along and declared themselves deception experts over the years, and that has influenced professionals and society in significant ways. I have listened to jurors comment post-trial that they thought a witness was lying because they had "heard somewhere that if you touch your nose you are lying." Likewise, I have talked to many a law enforcement officer who is convinced that he or she is an expert at detecting deception. They have deluded themselves that they are, as have judges and other professionals. In fact, every time I hear Judge Judy (of TV fame) say, "I know you are lying," I cringe (unlike us, she is covered by judicial privilege in saying what she wishes; the rest of us would be sued for slander). What she and others don't realize is that, as Ekman, DePaulo, Frank, Mann, O'Sullivan, Vrij, and others have stated, there is no single behavior indicative of deception (Ekman 1985 et al., infra).
So much of lie detection is based on the verbal as well as the nonverbals that one would have to have expertise in psychology, anthropology, sociology, criminology, jurisprudence, sociobiology, neurobiology, psychiatry, anatomy, physiology, communications, zoology, ethnography, primatology, linguistics, language, and grammar (to name a few), to truly understand the depth of what is behind deception and how to detect it. Fortunately, there are those who have availed themselves of a wide disciplinary approach to the study of deception, but sadly few have.
Since 1971, when I first started studying the subject, I have heard claims of individuals being able to detect deception based on behavior such as when someone avoided eye contact, looked up and to the right, touched their lips while speaking, cleared their throat, or displayed micro expressions. Law enforcement instructors and even researchers came in and lectured us young FBI agents about deception, armed with videos of someone who touched their nose or covered their mouth when lying, or who showed signs of contempt, as if that were scientific proof of deception. They were wrong, and they were also incorrect in insisting that they were right; an anecdotal vignette of a person performing a behavior when lying is not science. It is interesting, but it is not science, nor is it reliable. There are other times when a person uses the same behavior merely to relieve or reduce stress based on circumstances (e.g., in a police interview, or when the person is worried about getting to work late during a stressful traffic stop) and is not lying, but those are never shown.
As I look back on everything that has been written since the 1970s, I have begun to question some of the research. Not because the studies were improperly conducted, but because I wonder what the experiments really accomplished. For over forty years, well-meaning researchers have studied deception in the lab using college students. Utilizing elaborate schemes, they got participants to lie about what they saw on a TV screen, or they got them to take money, hide it, and then lie about it; if they were successful, they could keep it. Observers were then asked to determine who was lying or telling the truth, and from that we get accuracy rates of 50-60%. These experiments sounded pretty good at the time and they are still being performed. There is only one problem: a sterile laboratory environment, using college students, to me is not reality.
I say this in no way criticizing the researchers and their intentions because I think they are honestly trying to figure out how to detect deception. Some of them I know personally and admire how long they have been at this and how clever some of their experiments have been. But what I do question is the assumption that what we see in the laboratory is the same as in real life. And I have to say it is not.
Unlike what we see on television, the majority of law enforcement interviews (in fact about 97% of all police interviews) are done at night or in low light conditions outdoors, where it is noisy, there are distractions, others may be present - conditions that are not ideal as they are in a laboratory (Schafer and Navarro 2004, 3-13).
While the laboratory uses college students, most of our prisons are not made up of college students. These experiments on deception, as far as I can tell, do not include individuals who are psychopaths (about 1% of Americans according to Robert Hare) or who are considered clinically antisocial (about 4-6% of Americans but about 60-70% of the prison population); nor do they include white-collar swindlers or "conmen," who are experienced habitual liars (Hare 1993). Likewise, most studies don't take into consideration that in 30-40% of arrests (with subsequent interviews), alcohol and drugs are a factor, and if you ever do that kind of an interview, it is nothing like a laboratory.
Nor do laboratories test people who are stressed from travel such as at airports with canceled flights (that have serious consequences) or who may be stressed by being interviewed in a police facility: surrounded by officers with guns and handcuffs, and where their future is literally in the hands of strangers with power; not a clipboard and a white coat.
I spent a career interviewing spies and terrorists as well as criminals and I am still waiting for researchers to see how well observers perform in a laboratory on trained intelligence officers from the Russian FSB or Cuban Intelligence Service. Having done those interviews, I can say they are in no way like interviewing college students. Nor are the interviews of mobsters and "made" mafia capos the same as college students. In fact, most of the people who are interviewed in a forensic setting will not likely be in any way similar to students in a laboratory experiment. College students are not people who have to live and survive by lying such as conmen, spies, or repeat criminal offenders. These types learn to master the lie and deceit - their lives depend on it.
Likewise, no college laboratory can ever match what goes on between an intelligence officer and an interviewer conducting an interview in a hostile environment or a "denied area" of the world. Nor can it replicate the countless interviews of individuals in domestic situations where you have the wife who has been battered, tugging at you, as you interview the intoxicated husband in handcuffs, while three kids are screaming at you to let their daddy go. That lab experiment has yet to be performed, and yet that is the reality of the swing shift (6 PM to 2 AM) for most law enforcement officers. And not just that: most lab interviews are done while the subject is seated; conversely, at least for patrol officers, most police interviews (except the very few at the police station and on television) are actually done while standing up.
Of concern also is the profound dissonance of priorities between a law enforcement officer (who is desperate to get the facts to solve a homicide and needs information for leads, or who seeks to fulfill the requirements of the statute's corpus delicti) and the interviewee, who wants to hide what he knows because of consequences. There is a significant dynamic that takes place in the interview room between an officer and a suspect in the form of nonverbals, as each feeds off of and reacts to the other. That alone affects perceptions, as does proximity to the interviewee. This is very different from experiments where there is little interaction between the observer/interviewer and the interviewee. And of course, there is no social experiment that can replicate either life imprisonment or capital punishment. And so, because humans are sensitive to initial conditions as well as situational context, I think it is very difficult to accept that what we see in a lab experiment with college students is congruous with what we see in real life when it comes to deception.
Recently there was talk of having machines detect deception based on cues from the face and eyes. To that I would ask, what about the rest of the body - that too transmits information? Additionally I would also ask, how have these machines been tested and vetted: in laboratories using college students?
I have to say that over the years, there has been an over-reliance on the face for clues to deceit by some researchers at the expense of other areas of the body. In fact I argue that there has been too much emphasis on the face when years of experience doing thousands of interviews teaches us that the whole body needs to be considered to get a more accurate read on a person.
Looking for cues to deception merely from ephemeral facial micro-expressions is questionable and likely fruitless. Micro gestures may be indicative of internal emotional turmoil that is being suppressed, but that is it. The distinguished Paul Ekman, who in fact coined the term micro-expression, has stated in his book Telling Lies that micro expressions are rare and they "don't occur that often" (Ekman 1985, 131, 165). Plus, as others have said, there is no single behavior indicative of deception (Matsumoto et al., 2011, 1-4). I am concerned that machines that focus solely on the face will no doubt miss other information from the body (sweating, jittery hand, etc.) or generate lots of false positives, because negative emotions abound, especially where such machines are intended to be used, such as airports (stress of travel, stress of being subjected to searches or inconvenient interviews, etc.) or in a police setting.
I think we need to listen to experts such as Paul Ekman, Bella DePaulo, Mark Frank, Maureen O'Sullivan, Aldert Vrij, and Judee Burgoon, who have repeatedly stated that there is no single behavior indicative of deception and that the detection of lies is very difficult (Navarro 2008, 205-208). And this of course includes micro gestures such as a sneer or look of contempt, which is just that: contempt, not necessarily deception. That there are people who have been photographed lying while showing signs of contempt is interesting but again, that is merely anecdotal. If you interview enough people on the streets where there is a lot of police presence due to high crime rates, you will see the look of contempt quite often, the same as in a prison or when interacting with street gangs.
As for the polygraph, what can I say? Here is a machine that is very precise, which is why polygraphers reverently refer to it as an "instrument," and yet it does not detect deception. Wait, what? That is correct. A polygraph machine is not a lie detector, and the so-called "instrument" does not and has never detected lies (Ford 1996, 221-236). It merely recognizes physiological changes in reaction to a cue (a question), but it doesn't detect lies and it can't. I repeat, it can't. It is the polygrapher who interprets the instrument and your reactions to it and decides whether or not there is deception. It is this human factor, not dissimilar from some of the activity noted above, that the courts have found wanting (this is why polygraph results cannot be used against you in court) and why the National Academy of Sciences had less than choice words for the use of the polygraph in its formal report on the polygraph in 2002.
As for other gimmicks out there including machines that read eye behavior or voice stress analysis, again, I am dumbfounded by how many people are convinced that these machines actually work. Test after test has shown that these systems do not detect deception.
The Significance of This Topic:
This topic of deception would not be anything more than a curiosity if it did not have very serious consequences. Historically and even recently, people have been accosted, jailed, tortured, prosecuted, even executed when those in authority deemed them to be lying or complicit, based on their body language. Sadly, many individuals have confessed to crimes they never committed merely because someone misread them.
The price we pay for believing the unrealistic expectation that some have handed us about the reliability of detecting deception through nonverbals or other means is this: In the 261 DNA exoneration cases I have looked at, where the suspect's DNA was not at the crime scene (it was someone else who committed the crime), in all of those cases 100% of the investigators and the prosecuting attorneys could not detect the truth. They weren't just coin toss wrong (50/50), they were 100% wrong (Navarro 2011). They were so arrogantly sure that the behaviors and protestations they saw were lies that they could not recognize the truth. That is the price of falsely believing we are good at detecting deception. And if that were not bad enough, in fully ¼ of these DNA exoneration cases, the individual gave a false confession. It's a funny thing about abuse and a coercive environment: in time most people, even the innocent, will yield, and so they admit to crimes just to make the interview process stop (Kassin 2004, 172-193).
We all have a stake in detecting deception; after all, no one wants to invest with another Bernard Madoff or date a Ted Bundy. But we have to be realistic as to what we can detect, as Paul Ekman warned us decades ago (Ekman 1985, 165-178). This goes for law enforcement officers, judicial officers, and clinicians, as well as the average person interested in the topic. It is also my hope that researchers in the future will consider who is tested, where they are tested, and how they are tested to give us a more accurate view as to who really is good at detecting deception and under what circumstances.
I have been the beneficiary of great instructors in my professional career and in my life and they have taught me how to use nonverbals to understand the thoughts, feelings, desires, and intentions of others. In forensic settings I was able to use it not so much to detect deception but rather to detect issues or concerns based on the questions that I asked. This allowed me to identify the innocent, to detect criminal activity, to uncover unknown conspirators, and to pursue leads in furtherance of investigations. But in the end, and this is cautionary, no matter what technique is used to look for deception, the only way to really know the truth is to verify and corroborate every single last detail of what someone says. And that is the truth about lie detection.
If you are interested in how body language is used in a forensic setting, please read Three Minutes to Doomsday: An FBI Agent, a Traitor, and the Worst Breach in U.S. History (Scribner), a true-life account of how body language was used to catch a spy and uncover the "worst espionage breach in U.S. history."
* * *
Joe Navarro, M.A. is a 25-year veteran of the FBI and is the author of What Every Body is Saying, as well as Louder Than Words. For additional information and a free bibliography please contact him through www.jnforensics.com or follow on Twitter: @navarrotells or on Facebook. Copyright © 2012-2017 Joe Navarro.
DePaulo, B.M. et al. 2003. Cues to Deception. Psychological Bulletin, 129 (1), 74-118.
Ekman, Paul. 1985. Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriages. New York: W.W. Norton & Co.
Ekman, Paul & M. O'Sullivan. 1991. Who can catch a liar? American Psychologist, 46 (9), 913-920.
Ford, Charles V. 1996. Lies!, Lies!, Lies!: the Psychology of Deceit. Washington, D.C.: American Psychiatric Press, Inc.
The Global Deception Research Team. 2006. A world of lies. Journal of Cross-Cultural Psychology, 37, 60-74.
Granhag, Pär Anders and Leif A. Strömwall, Eds. 2004. The Detection of Deception in Forensic Contexts. Cambridge, UK: Cambridge University Press.
Hare, Robert D. 1993. Without Conscience: The Disturbing World of the Psychopaths Among Us. New York: Pocket Books.
Hartwig, M., Granhag, P. A., Strömwall, L. A., & Kronkvist, O. 2006. Strategic use of evidence during police interviews: When training to detect deception works. Law and Human Behavior, 30, 603-619.
Hartwig, M., Granhag, P. A., Strömwall, L. A., & Vrij, A. 2005. Deception detection via strategic disclosure of evidence. Law and Human Behavior, 29, 469-484.
Inbau, Fred E. et al. 2001. Criminal Interrogation and Confessions, 4th ed. Gaithersburg, MD: Aspen Publishers, Inc.
Kassin, Saul. 2004. "True or false: 'I'd know a false confession if I saw one.'" In Granhag, P. A., & Strömwall, L. A. 2004. The detection of deception in forensic contexts. New York, NY: Cambridge University Press: 172-193.
Mann, S., Vrij, A., & Bull, R. 2004. Detecting true lies: Police officers' ability to detect suspects' lies. Journal of Applied Psychology, 89, 137-149.
Matsumoto, D., Hwang, H. S., et al. 2011. Evaluating truthfulness and deception: New tools to aid investigators. FBI Law Enforcement Bulletin, (June): 1-9.
Morris, Desmond. 2002. Peoplewatching: The Desmond Morris Guide to Body Language. London: Vintage Books.
Navarro, Joe. 2003. A Four Domain Model of Detecting Deception. FBI Law Enforcement Bulletin, (June): 19-24.
Navarro, Joe. 2011. Clues to Deceit: A Practical List. Amazon Kindle.
Navarro, Joe and John R. Schafer. 2001. Detecting Deception. FBI Law Enforcement Bulletin, (July): 9-13.
Navarro, Joe. 2009. The psychology of body language. Amazon Kindle.
Navarro, Joe. 2008. What Every Body Is Saying. New York: Harper Collins.
Schafer, John R. and Joe Navarro. 2004. Advanced Interviewing Techniques. Springfield, Il.: Charles C. Thomas Publisher.
Vrij, A. 2008. Detecting lies and deceit: Pitfalls and opportunities (2nd ed.). New York, NY: John Wiley & Sons.
Vrij, Aldert, Katherine Edward, Kim P. Roberts, and Ray Bull. 2000. Detecting Deception via Analysis of Verbal and Nonverbal Behavior. Journal of Nonverbal Behavior 24, (4), Winter 2000: 239-263.
Vrij, Aldert. 2000. Detecting Lies and deceit: the psychology of lying and the implications for professional practice. Chichester, England: John Wiley & Sons, Ltd.
Vrij, Aldert and G.R. Semin. 1996. Lie experts' beliefs about nonverbal indicators of deception. Journal of Nonverbal Behavior 20: 65-80.
Vrij, A., & Mann, S. 2001. Telling and detecting lies in a high-stake situation: The case of a convicted murderer. Applied Cognitive Psychology, 15, 187-203.
Walters, Stan B. 2003. Principles of kinesic interview and interrogation, 2nd ed. Boca Raton, Florida: CRC Press LLC.
Warren, G., Schertler, E., & Bull, P. 2009. Detecting deception from emotional and unemotional cues. Journal of Nonverbal Behavior, 33, 59-69.
The Scientific Content Analysis (SCAN) is a verbal veracity assessment method that is currently used worldwide by investigative authorities. Yet, research investigating the accuracy of SCAN is scarce. The present study tested whether SCAN was able to accurately discriminate between true and fabricated statements. To this end, 117 participants were asked to write down one true and one fabricated statement about a recent negative event that happened in their lives. All statements were analyzed using 11 criteria derived from SCAN. Results indicated that SCAN was not able to correctly classify true and fabricated statements. Lacking empirical support, the application of SCAN in its current form should be discouraged.
Keywords: deception detection, scan, Scientific Content Analysis, Lie Detection, verbal cues, verbal credibility assessment
Research has revealed that non-verbal cues (e.g., behavioral cues such as gaze aversion, sweating) are faint and differences between liars and truth tellers are small at best (DePaulo et al., 2003; Sporer and Schwandt, 2007). However, findings about verbal cues are less variable and are more strongly related to deception (Bond and DePaulo, 2006; Vrij, 2008b). Verbal cues (or content cues) are cues that can be found in the content and meaning of a statement, such as the number of details that are included in a story (e.g., he had a large spider tattoo on his neck). Indeed, lying has been shown to result in qualitative differences between deceptive and truthful language. As a result, various verbal credibility assessment tools have been developed that address these content criteria within statements. Although the exact content criteria included may differ depending on the method, the procedure is highly similar. The presence of the criteria within the statements is carefully checked, and based on the presence or absence of the various criteria, a conclusion is drawn about the statement's truthfulness.
One example of such a content criterion is “quantity of details”. In order to fulfill this criterion, a statement has to be rich in details, such as mentioning places (e.g., it happened in the kitchen), times (e.g., on Sunday evening at 8 p.m.), descriptions of people and objects (e.g., a tall man with bright blue eyes), etc. Additionally, deceit has been related to the use of fewer personal pronouns (e.g., using “the house” instead of “our house”) and fewer negations (e.g., no, never, not), less perceptual information (e.g., “I could smell the alcohol on his breath”), fewer details overall, and shorter statements (Newman et al., 2003; Masip et al., 2005; Hauch et al., 2014; Amado et al., 2015). As mentioned previously, several methods have been developed to address these issues.
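To make the surface cues just listed concrete, the following is a minimal sketch of how statement length, personal pronouns, and negations might be counted in a written statement. The word lists and the function name `verbal_cue_counts` are illustrative assumptions, not taken from any of the cited studies, and such raw counts are not by themselves a validated veracity measure.

```python
import re

# Illustrative word lists (assumptions for this sketch, not from the cited studies).
PERSONAL_PRONOUNS = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
NEGATIONS = {"no", "not", "never"}


def verbal_cue_counts(statement: str) -> dict:
    """Count a few surface cues: total words, personal pronouns, negations."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", statement.lower())
    return {
        "words": len(words),
        "personal_pronouns": sum(w in PERSONAL_PRONOUNS for w in words),
        "negations": sum(w in NEGATIONS for w in words),
    }


example = verbal_cue_counts("I never saw him in our house.")
print(example)
```

Counts like these would only become informative when compared across many truthful and fabricated statements, as in the studies cited above.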
Two well-established credibility assessment tools that tap into such content differences are the Criteria Based Content Analysis (CBCA) and Reality Monitoring (RM). For CBCA, two theoretical assumptions have been presented by Köhnken (1996). First, lying is seen as more cognitively challenging than telling the truth. Second, liars are expected to be more concerned with impression management than truth tellers. More precisely, a first subset of CBCA criteria is included because they are deemed too difficult to fabricate (e.g., descriptions of interactions with the perpetrator). Hence, their presence in a statement indicates an actual experience. The remainder of the CBCA criteria are concerned with the way an interviewee presents his or her story. It is expected that liars are concerned with how they are viewed by others and therefore leave out information that can possibly damage their image as an honest person (e.g., mentioning self-deprecating information). Consequently, a truthful person is more likely to include these criteria in their statement than a deceptive person. RM, in contrast, is derived from memory research and holds that memories of real events are obtained through sensory processes, making them more clear, sharp, and vivid. Fabricated statements, on the other hand, are the result of fantasy and are usually more vague and less concrete (Johnson and Raye, 1981). Indeed, various studies reported supportive evidence for these methods. Their overall accuracy for detecting deceit varies around 70%, and is considerably higher than chance level (Undeutsch, 1967; Johnson and Raye, 1981; Steller and Köhnken, 1989; Masip et al., 2005; Vrij, 2005; Amado et al., 2015).
Despite the research showing above chance accuracy for CBCA and RM, their field use seems limited. A third method – that is used by law enforcement worldwide – is Scientific Content Analysis (SCAN). SCAN was developed by former Israeli polygraph examiner Avinoam Sapir (2005), who – based on his experience with polygraph examinees – argues that people who tell the truth differ from liars in the type of language they use. Based on these assumed differences, Sapir developed criteria that, according to him, can assist in differentiating between true and fabricated statements, but without reporting a theoretical foundation as to why these specific criteria should differ. For example, SCAN includes the criterion “social introduction”. It is argued that people who are described in the statement should be introduced with name and role (e.g., My friend, John). If a person omits the name, the role, or both (e.g., We stole the key), this is taken to indicate deception. Another criterion is the “structure of the statement”. According to SCAN, 20% of the statement should consist of information that led up to the event, 50% should be about the main event and 30% of the statement should be about what happened after the event. The more the statement deviates from this structure, the higher the likelihood that the statement is deceptive. Yet, in contrast to CBCA and RM, no theoretical rationale is presented, and there is no evidence that these criteria are actually diagnostic (Nahari et al., 2012; Bogaard et al., 2014a; Vanderhallen et al., 2015).
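The “structure of the statement” criterion can be illustrated with a short calculation. The sketch below assumes that a statement has already been divided into three parts measured in lines (or words), and it uses the sum of absolute differences from the 20/50/30 split as a deviation score. That scoring rule and the name `structure_deviation` are hypothetical choices made for illustration only; SCAN itself does not come with a standardized scoring system.

```python
# Expected proportions named in the text: before, main event, after.
EXPECTED = (0.20, 0.50, 0.30)


def structure_deviation(before: int, main: int, after: int) -> float:
    """Sum of absolute differences between observed and expected proportions.

    The inputs are the sizes of the three parts of a statement (for example,
    line counts). The metric is a hypothetical illustration, not part of SCAN.
    """
    total = before + main + after
    if total == 0:
        raise ValueError("statement is empty")
    observed = (before / total, main / total, after / total)
    return sum(abs(o - e) for o, e in zip(observed, EXPECTED))


balanced = structure_deviation(4, 10, 6)   # matches the 20/50/30 split
skewed = structure_deviation(2, 2, 16)     # most of the text follows the event
print(balanced, skewed)
```

Under this assumed metric a score of 0 means the statement matches the prescribed structure exactly, and larger scores mean greater deviation. Whether such deviation relates to deception at all is precisely what the cited studies call into question.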
Research about SCAN is scarce, although the method is used worldwide (e.g., Australia, Belgium, Canada, Israel, Mexico, UK, US, the Netherlands, Qatar, Singapore, South Africa) and is also used by federal agencies, military law enforcement, private corporations, and social services (retrieved from www.lsiscan.com/id29.htm). Moreover, the third author asked during an investigative interviewing seminar which lie detection tool was used by the practitioners in the audience. These practitioners came from many different countries and the most frequent answer was SCAN (Vrij, 2008a). In a typical SCAN procedure, the examinee is asked to write down “everything that happened” in a particular period of time, to get a “pure version” of the facts (Sapir, 2005). This pure version is typically obtained without the interviewer interrupting or influencing the examinee. Next, a SCAN trained analyst investigates a copy of the handwritten statement, using several criteria that are described throughout the SCAN manual (Sapir, 2005). Criteria that are present within the written statements are highlighted according to a specific color scheme, circled or underlined. The presence of a specific criterion can either indicate truthfulness or deception, depending on the criterion itself. This SCAN analysis is then used to generate questions that could elucidate important details within the statement, and/or to make a judgment of the veracity of the statement. Although SCAN is used worldwide, it lacks a well-defined list of criteria, as well as a standardized scoring system. Bogaard et al. (2014b) have shown that 12 criteria primarily drove SCAN in sexual abuse cases, largely overlapping with the criteria list described in Vrij (2008a). Only six published studies examined the validity of SCAN (Driscoll, 1994; Porter and Yuille, 1996; Smith, 2001; Nahari et al., 2012; Bogaard et al., 2014a; Vanderhallen et al., 2015), of which only four were published in peer reviewed journals.
The two studies that were not published in peer reviewed journals [Driscoll (1994) and Smith (2001)] were both field studies investigating suspect statements.
Driscoll (1994) investigated 30 statements that were classified as either apparently accurate or doubtful. With the help of SCAN, 84% of the statements could be classified correctly. In the study of Smith, five groups of experts were asked to analyze 27 statements. These statements were previously classified by police officers as truthful, false, or undecided. This classification was made on the basis of confessions and supportive evidence. Three groups consisted of SCAN trained officers that had minimal, moderate, or extensive experience with using SCAN. The two other groups consisted of newly recruited officers and experienced officers. The first three groups used SCAN to analyze the statements, while the latter two groups judged the veracity of the statements without using SCAN. Overall, the SCAN groups correctly judged 78% of the statements, which was similar to the accuracy of the experienced officers. At first glance, these results seem to support SCAN. Yet, in both studies ground truth of the statements was unknown and statements were categorized as either truthful or doubtful without having hard evidence supporting this categorisation. Moreover, it cannot be excluded that the SCAN outcome influenced the course of the investigation, and therefore the confessions and supporting evidence that was gathered. A typical problem that can occur in such studies is that errors are systematically excluded from the sample. For example, if a statement is erroneously judged as truthful, no further investigation takes place. This means that no evidence will be found revealing that an error has been made, and such erroneous classifications are then excluded from the sample. This way of selecting the sample may therefore be biased to overestimate SCAN’s accuracy (for more information see Iacono, 1991; Meijer et al., 2016). Moreover, in Smith’s study, it was unclear whether the three undecided statements were included in the reported analyses (Armistead, 2011).
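The sampling problem described above can be made explicit with a small worked example. The sketch below uses entirely hypothetical counts for a tool that performs exactly at chance, and it assumes one simple exclusion rule: fabricated statements that were judged truthful are never investigated further and therefore never enter the field sample. Real field studies may lose cases in other ways as well.

```python
def accuracy(cases):
    """Share of cases in which the judgment matches the actual status."""
    correct = sum(1 for actual_true, judged_true in cases
                  if actual_true == judged_true)
    return correct / len(cases)


# Hypothetical counts: each case is (actually truthful?, judged truthful?).
cases = (
    [(True, True)] * 25      # truthful, judged truthful (correct)
    + [(True, False)] * 25   # truthful, judged deceptive (error)
    + [(False, True)] * 25   # fabricated, judged truthful (error)
    + [(False, False)] * 25  # fabricated, judged deceptive (correct)
)

# Assumed exclusion rule: fabricated statements judged truthful trigger no
# further investigation, so the error is never discovered and the case
# never reaches the sample with "known" ground truth.
field_sample = [(actual, judged) for actual, judged in cases
                if not (actual is False and judged is True)]

true_accuracy = accuracy(cases)
apparent_accuracy = accuracy(field_sample)
print(true_accuracy, round(apparent_accuracy, 3))
```

With these made-up numbers, the accuracy over all 100 cases is 0.5, while the accuracy in the 75 cases that survive into the field sample is about 0.667, even though the tool itself is no better than a coin toss.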
The following four studies investigating SCAN were published in peer-reviewed journals. Porter and Yuille (1996) resolved the problem of ground truth by asking participants to commit a mock crime. However, they only investigated three SCAN criteria (i.e., unnecessary connectors, use of pronouns, and structure of the statement), and results indicated no significant differences between true and fabricated statements concerning these criteria. Nahari et al. (2012) asked six independent raters to assess the presence of 13 SCAN criteria within various true and fabricated statements. Results showed that SCAN did not discriminate between truthful and fabricated statements, a conclusion that was also supported by Bogaard et al. (2014a). In their study, participants were asked to write down one truthful and one fabricated autobiographical statement about a negative event that recently happened to them. Two raters indicated the presence of 12 SCAN criteria, but no significant differences emerged between truth tellers and liars. Vanderhallen et al. (2015), finally, asked SCAN trained police officers to classify four statements as either truthful or deceptive based on SCAN, and compared their accuracy to students and police officers who made this classification without the help of SCAN. The SCAN group had an average accuracy of 68%, police officers without SCAN 72%, and students 65%. The accuracy of the SCAN group did not significantly differ from the police officers who did not use SCAN. Consequently, from these results it was concluded that SCAN did not have an incremental value in detecting deceit.
Given that SCAN is used worldwide in police investigations, providing support, or the lack thereof, is not trivial (Meijer et al., 2009). Using a data set of 234 statements, the current study aimed to extend previous SCAN findings and to investigate whether the different SCAN criteria can actually discriminate between truthful and fabricated statements. Although Nahari et al. (2012), Bogaard et al. (2014a), and Vanderhallen et al. (2015) investigated SCAN, they mainly focused on the SCAN total scores, and not on the separate criteria, or the accuracy of SCAN. Separate criteria scores were reported, but their power was too low to draw any conclusions from these results. In addition, Nahari et al. (2012) asked participants to perform a mock crime, meaning that the statements that were analyzed with SCAN were restricted to “false denials” (i.e., people who performed the mock crime but lied about it). Moreover, in the study of Vanderhallen et al. (2015) four statements on traffic accidents were used. The statements included in our study are broader than false denials or traffic accidents, as we requested participants to write about a negative autobiographical event. In this way, participants not only reported false denials, but also false allegations (i.e., stating they fell victim to a crime, while in fact they did not). Participants could report about whatever they preferred, thereby including various topics, as would also be the case in police investigations where SCAN is usually applied.
Materials and Methods
The study was approved by the standing ethical committee of the Faculty of Psychology and Neuroscience, Maastricht University.
All participants (N = 117) were first and second year health sciences students (i.e., Mental health or Psychology) of Maastricht University (37 men). The data of 85 participants were collected specifically for this study, while the remaining 32 came from the control group of Bogaard et al. (2014a). Instructions for these two datasets were identical, and they were combined to increase power. We report the analysis for the entire sample below, but also include the findings for the new dataset in Appendix B.
Participants could choose whether they wanted to receive one course credit or a €7.50 gift voucher for their participation. Approximately 50 students chose the gift voucher over the course credit. All participants read and signed a letter of informed consent before they took part in this study. Participants had a mean age of 21 years (SD = 2.35).
Upon arrival in the lab, participants were told that the study was about the accuracy of verbal lie detection methods. Participants were asked to write about a truthful and a fabricated event. The order in which participants wrote these statements was randomized. Approximately half of the participants started with the truthful statement, the other half started with the fabricated statement. For the truthful statement participants received the following instruction: “For this study we ask you to think about an event you actually experienced. More specifically, this event should be about a recent negative experience; think about a financial, emotional or physical negative event you’ve been through the past months.” For the fabricated statement participants received the following instruction: “For this study we ask you to think about an event that you have not actually experienced. This event should be about a recent negative experience; think about a financial, emotional, or physical negative event you could have been through the past months. This event should not be based on something that actually happened to you or your friends or family. Please pretend as if this event took place somewhere in the previous months. Although the story should be fabricated, the statement should consist of a realistic scenario.” After the instruction, participants had the opportunity to think about a real and a fabricated story for a maximum of 5 min. Participants were assured that their stories would be treated confidentially and anonymously. They were told that the length of the stories should be approximately one written page (A4). No time limit was set for the production of the statements.
After participants finished their stories, these were analyzed by four raters. One rater completed the three-day SCAN course. The other three raters received a 2-h training about SCAN, using the SCAN manual (Sapir, 2005), given by the SCAN trained rater. Moreover, they received the appropriate pages of Vrij (2008a) about SCAN (Chapter 10; 282-287). During the training all 12 criteria were discussed separately and examples of the specific criteria were presented and discussed. Next, raters received two practice statements of one page each, and were asked to analyze these statements. After all raters analyzed these statements, their analyses were discussed and questions they still had about SCAN were answered. When the training was completed, raters started analyzing the statements.
Although the raters were not blind to the aim of the study, they were blind to the veracity of the statements. The first author served as one of the raters; the other raters were research assistants of the first author and were not otherwise involved in the study. The rater who completed the original SCAN training scored all 234 statements, while the other three raters scored approximately 80 statements each. To control for potential order effects, the sequence of the statements to be scored was varied from rater to rater. Rater A scored all statements in the order 1–234, while the other raters scored their statements in reverse order (rater B from 79 to 1, rater C from 157 to 80, and rater D from 234 to 158).
A total of 12 criteria (Vrij, 2008a) were coded within the statements. According to SCAN, seven of these criteria indicate truthfulness: (1) Denial of allegations, (2) Social introductions, (3) Structure of the statement, (4) Emotions, (5) Objective and subjective time, (6) First person singular, past tense, and (7) Pronouns, while the remaining five indicate deception: (8) Change in language, (9) Spontaneous corrections, (10) Lack of conviction or memory, (11) Out of sequence and extraneous, and (12) Missing information. See Appendix A for a complete description of the different criteria. All criteria that are expected to indicate truthfulness were scored on a three-point scale ranging from 0 (not present) to 2 (strongly present), while the five criteria that are expected to indicate deception were scored in reverse, ranging from -2 (strongly present) to 0 (not present). With this scoring system, a higher score indicates a higher likelihood that the statement is truthful, and vice versa.
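The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion identifiers, the reading of the middle scale point as "present", and the example ratings are hypothetical and are not taken from the SCAN manual or from the study materials.

```python
# Illustrative sketch of the signed scoring scheme; identifiers are hypothetical.
TRUTH_CRITERIA = {
    "denial_of_allegations", "social_introductions", "structure_of_statement",
    "emotions", "objective_subjective_time", "first_person_past_tense",
    "pronouns",
}
DECEPTION_CRITERIA = {
    "change_in_language", "spontaneous_corrections",
    "lack_of_conviction_or_memory", "out_of_sequence_extraneous",
    "missing_information",
}


def signed_score(criterion, strength):
    """Convert a rated strength (0 = not present, 2 = strongly present;
    1 is assumed to mean 'present') into a signed score."""
    if strength not in (0, 1, 2):
        raise ValueError("strength must be 0, 1, or 2")
    if criterion in TRUTH_CRITERIA:
        return strength       # truthfulness cues count upward: 0..2
    if criterion in DECEPTION_CRITERIA:
        return -strength      # deception cues are reversed: -2..0
    raise KeyError(f"unknown criterion: {criterion}")


def sum_score(ratings):
    """Sum the signed scores; higher totals point toward truthfulness."""
    return sum(signed_score(c, s) for c, s in ratings.items())


# Hypothetical ratings for a single statement by a single rater.
example = {
    "emotions": 2,
    "pronouns": 1,
    "change_in_language": 2,
    "missing_information": 0,
}
print(sum_score(example))  # 2 + 1 - 2 - 0 = 1
```

The sketch handles one rater's integer ratings only; averaging across raters, as done in the analyses, would produce non-integer criterion scores before summing.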
Inter-rater reliability was calculated by means of Cohen’s Kappa for each of the 12 separate criteria. The Kappa values for the truthful statements varied from 0.60 to 1, with an average Kappa of 0.77. The Kappa values for the fabricated statements varied from 0.65 to 1, with an average Kappa of 0.78. These results indicate high agreement between the raters (Landis and Koch, 1977). Because variance was low for several criteria, Cohen’s Kappa could give a distorted image of the actual inter-rater reliability. We therefore also calculated inter-rater agreement as the percentage agreement on the presence of each criterion in the statement. To this end, we dichotomized the original data set, with presence coded as 1 and absence as 0. High agreement was achieved for all SCAN criteria, ranging from 80.34 to 100% with an average of 90.56%. The scoring of the three raters was always compared to that of the rater who completed the SCAN training. As reliability proved sufficient, this also showed that our 2-h SCAN training was adequate to score the investigated SCAN criteria reliably.
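To make the point about low variance concrete, the following minimal Python sketch computes both Cohen’s Kappa and percentage agreement for two raters on dichotomized scores. The function names and the ten example ratings are hypothetical and chosen only to show how a criterion that is rarely present can yield high percentage agreement alongside a low Kappa; they are not the study’s data.

```python
from collections import Counter


def dichotomize(scores):
    """Recode any non-zero score as present (1) and zero as absent (0)."""
    return [1 if s != 0 else 0 for s in scores]


def percent_agreement(a, b):
    """Percentage of items on which the two raters gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # no variance at all: Kappa is undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical criterion that each rater marks as present only once.
rater_a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
rater_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(percent_agreement(rater_a, rater_b))  # 80.0
print(cohens_kappa(rater_a, rater_b))       # about -0.11
```

In this constructed case the raters agree on 8 of 10 statements, yet Kappa is slightly negative because nearly all of that agreement is expected by chance when a criterion is almost always absent.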
Because the inter-rater reliability was high, we averaged the scores of the two raters for each criterion. Due to the nature of our instructions (i.e., autobiographical statements), the first criterion could not be coded in the statements. As such, we left “denial of allegations” out of the following analyses. Next, we calculated the sum score for each statement by summing the averaged scores of the separate criteria. To investigate the discriminability of SCAN, we conducted several Generalized Estimating Equation (GEE) analyses (see for example Burton et al., 1998), one for each separate criterion. Moreover, we conducted a paired samples t-test for the sum score, and a discriminant analysis to test SCAN’s predictive power concerning the veracity of the statements.
Number of Words
The length of the statements did not significantly differ between the true (M = 265.42; SD = 85.48) and fabricated statements (M = 261.86; SD = 88.12) [t(116) = 0.63, p = 0.53, d = 0.04].
SCAN Criteria Scores
Table 1 shows the mean differences in each of the SCAN criteria as a function of veracity. To analyze the separate criteria, we dichotomized our data by recoding presence as 1 (regardless of whether the score was a 1 or a 2) and absence as 0. Next, we analyzed the data with GEE to investigate the differences between truthful and fabricated statements for each of the separate criteria. Due to the very low variability of the criterion “Pronouns” (i.e., it was present in almost all of the statements), this criterion was left out of the analysis. To correct for multiple testing we used an alpha level of 0.01. As Table 2 shows, only one criterion differed significantly between the statements, namely “Change in language”. Participants included more changes in language in their fabricated statements than in their truthful statements. This criterion was present in 29 out of 117 fabricated statements (24.8%) and in 14 out of 117 true statements (12.0%). In Appendix B (Table B1) we present the results for the new data only; these again showed that “Change in language” differed significantly between statements.
Table 1. Means, standard deviations and percentage present for each SCAN criterion as a function of veracity.
Table 2. Overview of parameters from the GEE analysis.
SCAN Sum Scores
Results indicated that there were no differences in SCAN sum scores between true (M = 5.33; SD = 2.10) and fabricated (M = 5.15; SD = 2.25) statements [t(116) = 0.77, p = 0.44, d = 0.12].
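For readers less familiar with the test, a paired samples t statistic of the kind reported here can be computed from the per-participant differences. The sketch below uses only the Python standard library; the function name and the five pairs of sum scores are hypothetical and serve purely as an illustration.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired samples t-test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two pairs of equal-length data")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical sum scores for five participants (true vs. fabricated).
true_scores = [6, 5, 7, 4, 5]
fabricated_scores = [5, 5, 6, 5, 4]

t, df = paired_t(true_scores, fabricated_scores)
print(round(t, 2), df)  # 1.0 4
```

With 117 participants each contributing one true and one fabricated statement, the same calculation gives 116 degrees of freedom, which matches the t(116) reported above. The sketch stops at the t statistic, since obtaining the p-value requires the t distribution, which the standard library does not provide.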
Lastly, we conducted a discriminant analysis to investigate whether the SCAN criteria were able to predict veracity. As can be seen in Table 3, only one significant mean difference was observed, and this was for “Change in language” (p < 0.01). The discriminant function revealed a low association between veracity and SCAN criteria, accounting for only 7.20% of the variability. Closer analysis of the structure matrix revealed three criteria with moderate discriminant loadings (i.e., Pearson coefficients): again “Change in language” (0.664), followed by “Structure of the statement” (0.412) and “Social introduction” (-0.353). The uncorrected model resulted in correct classification of 59% of the truth tellers and 65% of the liars. The cross-validated classification, however, showed that 49.60% of the liars and 53% of the truth tellers were correctly classified, indicating that SCAN performed around chance level. In Appendix B (Table B2) we present the results for the new data only, which were similar. The uncorrected model resulted in correct classification of 63% of the truth tellers and 58% of the liars. The cross-validated classification showed that 50% of the liars and 55% of the truth tellers were correctly classified, again indicating that SCAN performed around chance level.
Table 3. Detailed overview of discriminant analysis coefficients.
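The gap between the uncorrected and the cross-validated classification rates reflects a general property of classifiers: accuracy measured on the same cases used to fit the model tends to be optimistic. The Python sketch below illustrates this with a deliberately simplified stand-in, a nearest-class-mean rule on a single score evaluated with leave-one-out cross-validation. It is not the discriminant analysis used in the study, and the eight scores are hypothetical.

```python
from statistics import mean


def nearest_mean_predict(train, score):
    """Predict the label whose class mean in `train` is closest to `score`.
    `train` is a list of (score, label) pairs; ties go to the label that
    sorts first alphabetically."""
    labels = sorted({label for _, label in train})
    class_means = {
        label: mean(s for s, lab in train if lab == label) for label in labels
    }
    return min(labels, key=lambda label: abs(score - class_means[label]))


def resubstitution_accuracy(data):
    """Accuracy when the model is fit and evaluated on the same cases."""
    hits = sum(nearest_mean_predict(data, s) == lab for s, lab in data)
    return hits / len(data)


def leave_one_out_accuracy(data):
    """Accuracy when each case is classified by a model fit without it."""
    hits = 0
    for i, (s, lab) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += int(nearest_mean_predict(train, s) == lab)
    return hits / len(data)


# Hypothetical sum scores with heavily overlapping groups.
data = [(4, "truth"), (5, "truth"), (6, "truth"), (7, "truth"),
        (3, "lie"), (4, "lie"), (5, "lie"), (6, "lie")]

print(resubstitution_accuracy(data))  # 0.625
print(leave_one_out_accuracy(data))   # 0.5
```

In this constructed example the rule classifies 62.5% of cases correctly when evaluated on its own training data but only 50% under leave-one-out evaluation, the same direction of shrinkage as in the results reported above.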
In the current study, we failed to find support for SCAN as a lie detection method. The total SCAN score did not significantly differ between true and fabricated statements, thereby confirming previous results (Nahari et al., 2012; Bogaard et al., 2014a). Interestingly, for a subset of our data CBCA and RM sum scores were coded, and these did discriminate between the truthful and fabricated statements (Bogaard et al., 2014a). As such, it seems that the absence of significant SCAN findings cannot be attributed to the quality of the statements used in this study. Furthermore, we investigated the separate SCAN criteria, and only one criterion, “Change in language”, significantly differentiated between true and fabricated statements; participants changed their language more in their fabricated statements than in their truthful statements.
Interestingly, the criterion “Change in language” is not described in other verbal credibility methods (e.g., CBCA, RM). Therefore, our findings concerning this criterion are noteworthy. Sapir (2005) explained in his manual that especially words describing family members (e.g., mother, father, dad, mom, etc.), people (e.g., someone, individual, man, guy, etc.), communication (e.g., told, spoke, talked, etc.), transport (e.g., vehicle, car, truck, etc.) and weapons (e.g., gun, rifle, revolver, pistol, etc.) should be investigated carefully. The idea is that such a change indicates something has altered in the mind of the writer. When the events in the statement justify the change, it does not indicate deception per se; in all other cases, however, these changes are taken to indicate deceit. What exactly counts as a justification is not described in the manual. Consequently, in the absence of clear guidelines for verifying whether a change is justified, the current study scored all changes in language as a cue to deceit, and might therefore differ from how SCAN is used in practice.
Both the analysis of the SCAN sum score and the discriminant analysis showed that SCAN did not perform above chance level. This chance-level performance can be understood when comparing the interpretation of its criteria with that of CBCA, as several interpretations contradict each other. More precisely, both methods describe “spontaneous corrections” and “lack of conviction or memory”, but differ in their use. For CBCA both criteria are interpreted as a sign of truthfulness, while for SCAN both criteria are interpreted as a sign of deceit. Logically, only one interpretation can be correct. As CBCA is far more embedded in the scientific literature and has been shown to detect deceit above chance level (Vrij, 2005; Amado et al., 2015), CBCA’s interpretations should be favored over those of SCAN. Also, SCAN does not consider criteria involved in judging distinctive types of details. Both CBCA and RM consist of various types of details that have to be checked. For example, with these methods it is checked whether there is information in the statement about when (i.e., temporal details) and where (i.e., spatial details) the event took place, about what the writer saw during the event (i.e., visual details) and whether there were any other perceptual details (i.e., smells, tastes, sensations, sounds). Research has shown that especially these types of criteria are significantly more present in truthful than in fabricated statements (DePaulo et al., 2003; Masip et al., 2005; Vrij, 2005).
Relatedly, recent meta-analytical research reveals that passively observing cues only has a limited influence on our deception detection abilities, as most of these cues are generally weak (Hartwig and Bond, 2011). The authors argue we should actively increase the verbal and non-verbal differences between liars and truth tellers. Various techniques have already been suggested, such as focusing on unanticipated questions during the interrogation (Vrij et al., 2009), applying the Strategic Use of Evidence technique (Granhag et al., 2007) or inducing cognitive load (Vrij et al., 2006, 2008, 2011, 2012). SCAN fails to actively influence the information that is provided by the interviewee, which potentially contributes to its chance performance.
Finally, users of SCAN may argue that the way SCAN is tested in laboratory studies such as this one is far from how it is applied in the field, and that the results therefore do not translate. However, the diagnostic value of SCAN and its criteria lies in its capability of discriminating between truthful and fabricated statements. SCAN makes no assumptions as to why or when these differences between truths and lies occur, only that they occur. As such, laboratory studies, for example those in which participants are asked to fabricate a negative event, should also be able to pick up such differences if they exist. Moreover, it has proven to be exceptionally difficult to test the accuracy of SCAN in field studies, as the reliability of SCAN has been shown to be extremely low (Bogaard et al., 2014b; Vanderhallen et al., 2015). The only way to control for this low reliability is to use a more standardized scoring system, as we have done in the current study. For example, as mentioned previously, SCAN does not consist of a fixed list of criteria, and the criteria are not scored on a scale. In field studies, SCAN analysts write a report about the presence or absence of the criteria, and on the basis of this report, they draw a conclusion about the truthfulness of the statement. As such, it is unclear how many criteria are actually taken into consideration when making a judgment, and whether these criteria are weighed equally.
Scientific Content Analysis has no empirical support to date, and fails to include criteria investigating different types of details. Only one criterion showed potential for lie detection research, but has to be investigated more thoroughly in order to overcome the problems that are inherent to SCAN and its criteria (e.g., vague description, ambiguous interpretation). As a result, we discourage the application of SCAN in its current form.
All authors have made substantial contributions to the conception and design of the work, GB acquired and analyzed the data, and all authors interpreted the data and revised the article concerning its content and approved the current version.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research was funded by a grant from the Ministry of the Interior and Kingdom Relations of the Netherlands.
- Amado B. G., Arce R., Fariña F. (2015). Undeutsch hypothesis and criteria based content analysis: a meta-analytic review. Eur. J. Psychol. Appl. Legal Context 7, 1–10. doi: 10.1016/j.ejpal.2014.11.002
- Armistead T. W. (2011). Detecting deception in written statements: the British Home Office study of scientific content analysis (SCAN). Policing 34, 588–605. doi: 10.1108/13639511111180225
- Bogaard G., Meijer E., Vrij A. (2014a). Using an example statement increases information but does not increase accuracy of CBCA, RM, and SCAN. J. Investig. Psychol. Offender Profil. 11, 151–163. doi: 10.1002/jip.1409
- Bogaard G., Meijer E., Vrij A., Broers N. J., Merckelbach H. (2014b). SCAN is largely driven by 12 criteria: results from field data. Psychol. Crime Law 20, 430–449. doi: 10.1080/1068316X.2013.793338
- Bond C. F., DePaulo B. M. (2006). Accuracy of deception judgments. Pers. Individ. Differ. 10, 214–234.
- Burton P., Gurrin L., Sly P. (1998). Tutorial in biostatistics. Extending the simple linear regression model to account for correlated responses: an introduction to generalized estimating equations and multi-level mixed modeling. Stat. Med. 17, 1261–1291. doi: 10.1002/(SICI)1097-0258(19980615)17:11<1261::AID-SIM846>3.0.CO;2-Z
- DePaulo B. M., Lindsay J. J., Malone B. E., Muhlenbruck L., Charlton K., Cooper H. (2003). Cues to deception. Psychol. Bull. 129, 74–118. doi: 10.1037/0033-2909.129.1.74
- Driscoll L. (1994). A validity assessment of written statements from suspects in criminal investigations using the SCAN technique. Police Stud. 4, 77–88.
- Granhag P. A., Strömwall L. A., Hartwig M. (2007). The SUE-technique: the way to interview to detect deception. Forensic Update 88, 25–29.
- Hartwig M., Bond C. F. (2011). Why do lie-catchers fail? A lens model meta-analysis of human lie judgments. Psychol. Bull. 137, 643–659. doi: 10.1037/a0023589
- Hauch V., Blandón-Gitlin I., Masip J., Sporer S. L. (2014). Are computers effective lie detectors? A meta-analysis of linguistic cues to deception. Pers. Soc. Psychol. Rev. 19, 307–342. doi: 10.1177/1088868314556539
- Iacono W. G. (1991). “Can we determine the accuracy of polygraph tests?,” in Advances in Psychophysiology, eds Jennings J. R., Ackles P. K., Coles M. G. H. (London: Jessica Kingsley Publishers), 201–207.
- Johnson M. K., Raye C. L. (1981). Reality monitoring. Psychol. Rev. 88, 67–85. doi: 10.1037/0033-295X.88.1.67
- Köhnken G. (1996). “Social psychology and the law,” in Applied Social Psychology, eds Semin G., Fiedler K. (London: Sage Publication), 257–282.
- Landis J. R., Koch G. G. (1977). The measurement of observer agreement for categorical data. Biometrics 33, 159–174. doi: 10.2307/2529310
- Masip J., Sporer A. L., Garido E., Herrero C. (2005). The detection of deception with the reality monitoring approach: a review of the empirical evidence. Psychol. Crime Law 11, 99–122. doi: 10.1080/10683160410001726356
- Meijer E. H., Verschuere B., Gamer M., Merckelbach H., Ben-Shakhar G. (2016). Deception detection with behavioral, autonomic, and neural measures: conceptual and methodological considerations that warrant modesty. Psychophysiology. doi: 10.1111/psyp.12609 [Epub ahead of print].
- Meijer E. H., Verschuere B., Vrij A., Merckelbach H., Smulders F., Leal S., et al. (2009). A call for evidence-based security tools. Open Access J. Forensic Psychol. 1, 1–4.
- Nahari G., Vrij A., Fisher R. P. (2012). Does the truth come out in the writing? SCAN as a lie detection tool. Law Hum. Behav. 36, 68–76. doi: 10.1037/h0093965
- Newman M. L., Pennebaker J. W., Berry D. S., Richards J. M. (2003). Lying words: predicting deception from linguistics styles. Pers. Soc. Psychol. Bull. 29, 665–675. doi: 10.1177/0146167203029005010
- Porter S., Yuille Y. C. (1996). The language of deceit: an investigation of the verbal clues to deception in the interrogation context. Law Hum. Behav. 20, 443–458. doi: 10.1007/BF01498980
- Sapir A. (2005). The LSI Course on Scientific Content Analysis (SCAN). Phoenix, AZ: Laboratory for Scientific Interrogation.
- Smith N. (2001). Reading between the lines: an evaluation of the scientific content analysis technique (SCAN). Police Res. Series Paper 135, 1–42.
- Sporer S. L., Schwandt B. (2007). Moderators of nonverbal indicators of deception. Psychol. Public Policy Law 13, 1–34. doi: 10.1037/1076-8971.13.1.1
- Steller M., Köhnken G. (1989). “Criteria based statement analysis,” in Psychological Methods in Criminal Investigation and Evidence, ed. Raskin D. C. (New York, NY: Springer), 217–245.
- Undeutsch U. (1967). “Beurteilung der Glaubhaftigkeit von Aussagen,” in Handbuch der Psychologie Vol 11: Forensische Psychologie, ed. Undeutsch U. (Göttingen: Hogrefe).
- Vanderhallen M., Jaspaert E., Vervaeke G. (2015). SCAN as an investigative tool. Police Pract. Res. 1–15. doi: 10.1080/15614263.2015.1008479
- Vrij A. (2005). Criteria based content analysis: a qualitative review of the first 37 studies. Psychol. Public Policy Law 11, 3–41. doi: 10.1037/1076-8971.11.1.3
- Vrij A. (2008a). Detecting Lies and Deceit: Pitfalls and Opportunities. Chichester: Wiley.
- Vrij A. (2008b). Nonverbal dominance versus verbal accuracy in lie detection: a plea to change police practice. Crim. J. Behav. 35, 1323–1336. doi: 10.1177/0093854808321530
- Vrij A., Fisher R., Mann S., Leal S. (2006). Detecting deception by manipulating cognitive load. Trends Cogn. Sci. 10, 141–142. doi: 10.1016/j.tics.2006.02.003
- Vrij A., Fisher R., Mann S., Leal S. (2008). A cognitive load approach to lie detection. J. Investig. Psychol. Offender Profiling 5, 39–43. doi: 10.1002/jip.82
- Vrij A., Granhag P. A., Mann S., Leal S. (2011). Outsmarting the liars: toward a cognitive lie detection approach. Curr. Dir. Psychol. Sci. 20, 28–32. doi: 10.1177/0963721410391245
- Vrij A., Leal S., Granhag P. A., Mann S., Fisher R., Hillman J., et al. (2009). Outsmarting the liars: the benefit of asking unanticipated questions. Law Hum. Behav. 33, 159–166. doi: 10.1007/s10979-008-9143-y
- Vrij A., Leal S., Mann S., Fisher R. (2012). Imposing cognitive load to elicit cues to deceit: inducing the reverse order technique naturally. Psychol. Crime Law 18, 579–594. doi: 10.1080/1068316X.2010.498422