
Towards an Automated Recognition System for Chat-based Social Engineering Attacks in Enterprise Environments

Nikolaos Tsinganos, University of Macedonia, Thessaloniki, Greece, tsinik@uom.edu.gr
Georgios Sakellariou, University of Macedonia, Thessaloniki, Greece, geosakel@uom.edu.gr
Panagiotis Fouliras, University of Macedonia, Thessaloniki, Greece, pfoul@uom.edu.gr
Ioannis Mavridis, University of Macedonia, Thessaloniki, Greece, mavridis@uom.edu.gr

ABSTRACT

The increased usage of electronic communication tools (email, IM, Skype, etc.) in enterprise environments has created new attack vectors for social engineers. Billions of people now use electronic equipment in their everyday workflow, which means billions of potential victims of Social Engineering (SE) attacks. Humans are considered the weakest link in the cybersecurity chain, and breaking this line of defense is nowadays the most accessible route for malicious internal and external users. While several methods of protection have already been proposed and applied, none of them focuses on chat-based SE attacks, and automation in the field is still missing. Social engineering is a complex phenomenon that requires interdisciplinary research combining technology, psychology, and linguistics. Attackers treat human personality traits as vulnerabilities and use language as their weapon to deceive, persuade and finally manipulate their victims as they wish. Hence, a holistic approach is required to build a reliable SE attack recognition system. In this paper we present the current state-of-the-art on SE attack recognition systems, we dissect a SE attack to recognize its different stages, forms, and attributes, and we isolate the critical enablers that can influence a SE attack to work. Finally, we present our approach for an automated recognition system for chat-based SE attacks that is based on Personality Recognition, Influence Recognition, Deception Recognition, Speech Act and Chat History.

CCS CONCEPTS

• Security and privacy → Phishing; • Computing methodologies → Supervised learning;

KEYWORDS

Social Engineering, Personality, Persuasion, Deception, Speech Act

ACM Reference Format:
Nikolaos Tsinganos, Georgios Sakellariou, Panagiotis Fouliras, and Ioannis Mavridis. 2018. Towards an Automated Recognition System for Chat-based Social Engineering Attacks in Enterprise Environments. In ARES 2018: International Conference on Availability, Reliability and Security, August 27-30, 2018, Hamburg, Germany. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3230833.3233277

1 INTRODUCTION

In an assessment made in 2006 of users' awareness of Social Engineering (SE) methods in the form of email phishing attacks, Karakasiliotis et al. [17] reported that out of 179 participants, 36% were successful in identifying legitimate emails, versus 45% that were successful in spotting illegitimate ones. Almost ten years later, in a similar assessment, Verizon's 2015 Data Breach Investigations Report [37] presented the results of a test conducted by sending 150,000 emails; they reported that within the first hour, 50% of the recipients had opened the email and clicked on phishing links. The first user clicked the phishing link after only 82 seconds. Social engineering is also recognized as the second most common cause of security breaches, at 35%, right behind traditional hacking methods. Furthermore, it is nowadays a very common practice in workplaces to enable employees to use their own computers or other electronic mobile devices under 'bring your own device' (BYOD) policies. This increase in working from home magnifies the SE problem due to insufficiently protected personal computers. A social engineer's successful attack on an employee could also result in the compromise of the entire employer's information system.

Until now, various methods have been used to protect the weakest link in the cybersecurity chain, the human. Such methods are penetration tests using social engineering techniques, security awareness training programs for employees, creation and enforcement of corporate cybersecurity policies, and development of a security-aware organizational culture. Trying to uncover the social engineer's behavior, cybersecurity researchers noticed that this category of attacks needed an interdisciplinary approach that would help understand the inner workings of the attack and the methods of social engineers, in combination with the psychological characteristics of the human being manipulated. SE attacks are here to stay and threaten all users in enterprises, government agencies and every single individual.

Although much research has been done regarding several forms of SE attacks, the rise in the usage of cyber communication tools is a strong motivation to design stronger defenses for chat-based SE attacks. While many social engineering attack vectors and different communication channels exist, direct human-to-human communication offers attackers a critical advantage and instantaneous results.

In Section 2, a comprehensive literature review of the current state-of-the-art in SE attack recognition systems, focusing on attacks that involve text-based conversation between the attacker and the victim, is presented. Section 3 summarizes our findings regarding the SE attack cycle, the various forms of SE attacks and the related attack attributes. Section 4 presents the SE attack enablers, namely human personality, influence, deception, speech act and chat history. Our proposed approach towards an automated recognition system for chat-based SE attacks in enterprise environments is presented in Section 5, and we conclude in Section 6.

2 RELATED WORK

Hoeschele and Rogers [14] presented the Social Engineering Defense Architecture (SEDA), an architecture for detecting social engineering attacks over phone conversations in real time. The architecture uses a storage facility to save caller details (voice signature, etc.) and can provide authentication services, too. Subsequently, Hoeschele [15] presented a SEDA proof-of-concept model where some simple SE attack detection processes were implemented, along with a database to store all gathered information. The model managed to process and then detect all attacks, resulting in 100% accuracy. Nevertheless, that system lacks the use of previous activity history and personality recognition characteristics for both attacker and victim. In [3] the authors propose an architecture called Social Engineering Attack Detection Model (SEADM). Their system helps users decide by using a simple binary decision tree model. The authors make many unrealistic assumptions in order to justify the logic behind their proposed system. SEADM had a second chance in [23], along with an Android implementation as a proof of concept. The authors revised SEADM to cater for SE attacks that utilize unidirectional, bidirectional and indirect communication between the attacker and the victim. The proposed and revised SEADMv2 extends the previous model. Bhakta et al. [4] argue that the most effective SE attacks involve a dialog between the attacker and the victim. Their approach uses a predefined Topic Blacklist (TBL) against which dialog sentences are checked. The TBL is manually populated with pairs of verbs (actions) and nouns (objects) using security policy documents or other expert knowledge. The dialogues are then processed using natural language processing (NLP) techniques. The authors claim 100% precision and 88.9% recall for their approach. Unfortunately, they do not present a classification accuracy value.

Following the above work, Sawa et al. [29] used more advanced language processing techniques, striking a balance between syntactic and semantic analysis. A handful of tools are used (Stanford parser, Penn tagset symbols, Tregex tool and others) in order to generate a parse tree and then search for questions or commands. This approach still uses the TBL, and the reported results are 100% precision and 60% recall on the first corpus (a fabricated dataset composed of three phone conversations between professional social engineers and unaware victims). The results on the second corpus (the Supreme Court Dialogs Corpus) show zero false positives. The researchers did not present an accuracy value, while at the same time the dataset is too small for measuring precision and recall; for the same reason (a small dataset with only three conversations), the reported precision and recall are weak as success measures. Furthermore, the researchers did not take into account any context information during the classification process. Therefore, the algorithm is unaware of the intricacies of the specific environment in which it operates. Another disadvantage is that the process is not fully automated, since the creation of the TBL requires human involvement. Finally, the authors did not consider the target as a factor of influence in the process, and they did not use cognitive models or any other personality traits.
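To make the TBL idea concrete, the following minimal sketch checks (verb, object) pairs extracted from a dialog sentence against a blacklist. It is an illustration of the general technique, not the implementation of [4], and the blacklist entries and the crude pair extraction are invented for demonstration; a real system would extract pairs from a proper parse.

```python
# Illustrative sketch of Topic Blacklist (TBL) matching; not the
# implementation of Bhakta et al. [4]. Blacklist entries are hypothetical.
BLACKLIST = {("reset", "password"), ("disable", "firewall"), ("send", "credentials")}

def extract_pairs(sentence: str):
    """Naive stand-in for an NLP parser: pairs every word with each
    word that follows it. A real system would use typed dependencies
    (see Section 5.1) to pair verbs with their objects."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return {(v, n) for i, v in enumerate(words) for n in words[i + 1:]}

def is_blacklisted(sentence: str) -> bool:
    # Flag the sentence if any extracted pair appears in the TBL.
    return bool(extract_pairs(sentence) & BLACKLIST)

print(is_blacklisted("Could you reset my password?"))  # True
```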
Finally, Uebelacker et al. [36] propose a taxonomy of SE attacks based on Cialdini's influence principles. More specifically, they study the relationship between the Big-5 Theory (personality traits) and Cialdini's influence (persuasion) principles and finally propose a theory-based SE Personality Framework (SEPF). Moreover, they propose a complete research road map for their future work on SEPF. They define three domains related to cyber security, namely: physical, digital, and social. They focus on the social domain related to the victims (employees), excluding the attackers. After a thorough study of the related literature, they summarize their findings as follows: "Conscientiousness, Extraversion, and Openness... show both increased and decreased susceptibility to SE depending on context and sub-traits". Furthermore, "Agreeableness increases and Neuroticism decreases susceptibility to SE".
3 BACKGROUND

In a typical social engineering attack, the attacker acts in a predetermined manner: she initially gathers information using every possible technique or tool, then approaches the potential victim and develops a trust relationship. Next, she exploits this trust relationship to manipulate the victim into performing an action that would enable her to violate the respective information system. At the final stage, the attacker reaches her original target, violating a CIA triad member (confidentiality, integrity, availability) of informational resources.

In order for the attacker to develop a trust relationship, she relies on specific human (victim) personality traits, treating them as vulnerabilities and adapting her tactics accordingly. Her aim is to influence the victim's way of thinking and to persuade him to behave in a mistaken way. The act of deception underlies the attacker's entire effort. A communication scenario between the SE attacker and her victims involves message exchange through an electronic chat system. This is the point on which our efforts to recognize SE attacks focus.

A SE attack is mainly related to deception and concerns every human activity, making it difficult to precisely predefine and recognize it by only syntactic or semantic analysis of the chat messages. Furthermore, human language ambiguity makes discriminating a sentence as malicious or not even harder. To cope with this challenge, a researcher has to employ a toolkit (e.g., machine learning tools) to process all available data and to infer in a probabilistic way. Moreover, for an automated SE attack recognition system to be efficient, it has to embrace several scientific disciplines.

3.1 SE Attack Cycle


Social engineering is defined in [31] as "a deceptive process in which crackers 'engineer' or design a social situation to trick others into allowing them access to an otherwise closed network, or into believing a reality that does not exist." According to Mitnick et al. [22], a SE attack, also known as the SE attack cycle, is composed of four stages:
• Information Gathering (IG)
• Development of Relationship (RD)
• Exploitation of Relationship (RE)
• Execution to achieve objective (EX)
The attacker gathers information from various public sources at the "Information Gathering" stage, develops a trusting relationship with the victim at "Relationship Development", exploits this relationship in order to steal valuable information at the "Relationship Exploitation" stage and finally, having all the necessary knowledge, attacks the real target at the "Execution" stage. These four stages correspond to the attacker's steps during a SE attack. For an attacker to be successful and move from one stage to the next, some conditions should be met. We focus on these conditions, which we call SE Attack Enablers. ISACA [1] defines enablers as "Factors that, individually and collectively, influence whether something will work". SE Attack Enablers are further discussed in Section 4.

3.2 SE Attack Attributes

A SE attack can be either human- or computer-based. In human-based attacks we have a human-to-human interaction (e.g., a phone conversation), while computer-based attacks require the use of a digital medium [26]. SE attacks can also be categorized as direct, if the attacker interacts with the victim (phone conversation, social media chat, etc.), or as indirect, if some electronic medium mediates (phishing email, rogue website, etc.).

In [24], the author proposes a new model to describe the SE attack cycle. This model is called "The Cycle of Deception" and is more of a conceptual model that combines models for the defense cycle, the victim behavior cycle and the attack cycle. Janczewski et al. [16] conducted an interview experiment with IT practitioners and proposed the following concepts as relevant to every SE attack: people, security awareness, psychological weakness, technology, defenses, attack methods, security strategy, technical controls, security-enhanced products, and education.

In [34], Tetri et al. tried to analyze the functions of different techniques by extrapolating three dimensions, persuasion, fabrication, and data gathering, into which they dissect all SE attacks to make them easier to understand. Heartfield et al. [12] claim that SE attacks aiming at deceiving the user by means of phishing emails, scareware, or spoofed websites are semantic attacks. The authors present a taxonomy for semantic attacks and defense mechanisms. Another interesting taxonomy of SE attacks is presented by Krombholz et al. in [18], where the authors define three main categories: Channel, which is the medium that the attacker uses (e.g., email, telephone, or physical contact with the target); Operator, which is a way to differentiate between human-based and automated attacks; and Type, which classifies an attack as socio-technical, technical, physical or social. This taxonomy seems more agile, and it is easy to classify existing or new attack vectors with it, as shown in the same work.

Figure 1: SE attack attributes.

Fig. 1 summarizes the previous works and presents a unified view of the most common attack attributes: the actor, approach, method, route, technique and medium used to manipulate victims. Our work focus is shown in bordered, bold fonts for every different attribute.

4 SE ATTACK ENABLERS

Social engineering is a term that characterizes the general phenomenon of deception involving the field of information systems. Its success depends on specific traits of human personality. These personality traits define the way humans behave. Our interest lies in traits that:
• Enhance the attacker's ability to influence and deceive.
• Make the victim vulnerable to manipulation.

An employee's previous conversations can also help us draw a more complete picture of his vulnerability level and trigger an alarm with more confidence if a threshold is exceeded. In the following sub-sections, the main SE attack enablers are presented that, in our belief, are decisive for the success or failure of a SE attack.

4.1 Personality

In psychology, human personality "refers to individual differences in characteristic patterns of thinking, feeling and behaving" and, although there is no universal acceptance, the Big-5 Theory analyzes a five-factor model (FFM) of personality traits, otherwise called factors, to classify personalities. These factors are believed to capture most of the individual differences in terms of personality. The five factors, usually measured between 0 and 1, are [33]:
• conscientiousness: "The degree to which individuals are hardworking, organized, dependable, reliable, and persevering versus lazy, unorganized, and unreliable."
• extraversion: "The extent to which individuals are gregarious, assertive, and sociable versus reserved, timid, and quiet."
• agreeableness: "The degree to which individuals are cooperative, warm, and agreeable versus cold, disagreeable, rude, and antagonistic."
• openness: "The extent to which an individual has richness in fantasy life, aesthetic sensitivity, awareness of inner feelings, need for variety in actions, intellectual curiosity, and liberal values."
• neuroticism: "The degree to which one has negative affect, and also disturbed thoughts and behaviors that accompany emotional distress."

Research in [5], [21] reports that high values of conscientiousness, extraversion and openness sometimes increase and sometimes decrease susceptibility to SE attacks. High values of agreeableness increase susceptibility, and high values of neuroticism decrease susceptibility to SE attacks. The results are contradictory in many situations and do not lead to a direct conclusion. Up till now, researchers have examined the relation between personality traits and social engineering by combining knowledge of human behavior from other fields (marketing, etc.). It would be of great benefit to analyze and measure the exact relation of personality traits to specific SE techniques.

Nevertheless, after the work of [19], several attempts have been made to exploit the results and apply the findings to different research fields.
4.2 Influence

As Schneier points out [28], human risk perception has evolved over thousands of years. Nevertheless, progress in technology has changed our lives very fast, without allowing enough time for our risk perception to adjust to new threats. This vulnerability in human design is exploited by social engineers and then transferred to information systems to compromise them. Schneier also discusses heuristics (called shortcuts) in human behavior, and biases. Both are causal factors for wrong appraisals and decisions. Robert Cialdini [35] agrees with Schneier and discusses the principles of influence and how heuristics and biases are exploited by one human to manipulate another. Cialdini also argues that there are two types of influence: compliance and persuasion. Using persuasion, the attacker sends a message and then the victim changes his behavior, attitude or knowledge as a result of the received message. Compliance forces the change of a behavior as a result of a direct request. The request can be explicit (hard) or implicit (soft). Cialdini conducted experiments and field studies on sales and marketing department employees, and defined six influence principles:

• Reciprocation: a social norm that makes us repay others for what we have received. It builds trust between humans, and we are all trained to adhere to it or suffer severe social disapproval. Humans feel obliged after receiving a gift.
• Commitment and Consistency: humans commit by stating who they are, based on what they do or think. They also like to be consistent, because that builds character. Attackers exploit that kind of belief by initially asking for a small favor, then a bigger one, and finally the big bad favor. Humans that have already served an attacker feel they have to show commitment and be consistent with their prior behavior.
• Social Proof: humans tend to believe that what others do or think is right.
• Liking: if someone likes us and makes it obvious, it is hard to resist liking him back. After that, it is easier for him to ask us a favor and difficult for us to deny him one. In the opposite direction, we all want to be liked.
• Authority: humans tend to trust and obey experts or someone in a high hierarchical position. It is difficult for an employee to deny a request from an IT manager, for example.
• Scarcity: limited information leads to wrong decisions, and limited resources are more desirable. If an attacker knows that an employee wants a specific application, then she can offer it (after injecting an exploit), or claim a reason to request a favor based on evidence that only the user possesses.

Apart from Cialdini, many researchers have tried to capture the psychological aspects of human behavior related to influence. Gragg [8] presents a list of such principles and calls them triggers: Strong Affect, Overloading, Reciprocation, Deceptive Relationships, Diffusion of Responsibility, Authority, and Integrity and Consistency. Scheeres [30] makes the relationship between Gragg's and Cialdini's treatments obvious by correlating all these principles and triggers. Granger [9] and Peltier [26] present similar factors of influence based on their points of view.

Table 1 summarizes the mapping of the above factors along with Cialdini's principles. In our approach, Cialdini's influence principles are chosen because there is a major overlap with all of the factors proposed by the other researchers.
Table 1: Mapping of Influence Principles and Factors.

Cialdini (2001) [27] | Harl (1997) [11] | Gragg (2003) [8] | Granger (2001) [9] | Peltier (2006) [26]
Authority | Authority | - | - | Impersonation
Scarcity | - | Strong Affect, Overloading | - | -
Liking & Similarity | Diffusion of Responsibility, Personal Persuasion, Ingratiation | Deceptive Relationship, Diffusion of Responsibility | Ingratiation, Impersonation, Diffusion of Responsibility, Friendliness | Diffusion of Responsibility, Ingratiation, Trust Relationship
Reciprocation | Co-operation | Reciprocation | - | -
Social Proof | Involvement, Moral Duty | - | Moral Duty | Guilt
Commitment & Consistency | Conformity | Integrity/Consistency | Conformity | -

4.3 Deception

An [2] describes deception as "an act or statement intended to make people believe something that the speaker does not believe to be true, or not the whole truth". A more precise definition of deception is given in [10], where "Deception is a successful or unsuccessful attempt, without forewarning, to create in another a belief which the communicator considers to be untrue". Over the years, the research community has become very interested in the detection of deception. Due to the interdisciplinary nature of the phenomenon, researchers from various scientific fields (psychology, computer science, linguistics, philosophy, etc.) have already presented results from studying and analyzing several different deceptive cues (e.g., biometric indicators, facial indicators or gestural indicators). There are two categories of deception [2]:

• face saving: when humans lie to protect themselves, to avoid tension and conflict in a social interaction, or to minimize hurt feelings and ill will;
• malicious: when humans lie with harmful intent.

Our primary interest is in detecting a malicious deception attempt in a text-based conversation and using this finding as an extra indicator for recognizing a social engineering attempt. So far, several research attempts have been made to study verbal or nonverbal cues in order to detect deceptive behavior [25], [7]. Current work in deception detection is mainly based on verbal cues and has shown that it is possible to reliably predict a deception attempt [38]. In most of these works, researchers collected data and manually annotated them for deceptive status. After that, the labeled data were fed to a classification algorithm for supervised learning. The features extracted for text-based deception detection are critical and directly connected to prediction accuracy [25], [7].
The common scientific approach is to use three types of features, namely lexical, acoustic, and speech features. The most frequently used techniques for lexical analysis are: Linguistic Inquiry and Word Count (LIWC), N-grams, Part-of-Speech (POS), and the Dictionary of Affect in Language (DAL).

LIWC is primarily used for detecting psychological characteristics by calculating several metrics on the usage of different word categories, the usage of causal words, the existence of positive or negative emotions in text, etc. In [13], [25], researchers used LIWC to examine text-based communication and managed to extract valuable knowledge regarding people's personality and their cognitive and emotional characteristics. These research works differ in accuracy results due to the use of different datasets, which lead to more or less accurate machine learning models. DAL is mostly used to analyze emotive content; its main difference from LIWC is its narrower focus. N-grams are usually combined with other, more advanced techniques, like LIWC, to train binary classifiers (e.g., Naive Bayes, SVM, etc.) during lexical analysis.
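As an illustration of this kind of lexical pipeline, the sketch below trains a unigram/bigram Naive Bayes classifier on a handful of labeled sentences. It is a minimal example of the general technique, not a reproduction of the cited systems, and the tiny inline dataset is invented purely for demonstration.

```python
# Minimal n-gram + Naive Bayes deception-classifier sketch (scikit-learn).
# The inline "dataset" is a toy illustration, not real annotated data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "I am absolutely delighted to help you with anything you need",
    "please send the report when you get a chance",
    "trust me, this is wonderful news for everyone involved",
    "the meeting is moved to three o'clock",
]
labels = [1, 0, 1, 0]  # 1 = deceptive, 0 = truthful (toy annotation)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # word unigrams and bigrams
    MultinomialNB(),
)
model.fit(texts, labels)
print(model.predict(["this is truly wonderful, trust me"]))
```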

4.4 Speech Act

Theoretical linguistics inquires into the nature of human language and seeks to answer fundamental questions as to what a language is and how it works internally. Several different levels of analysis are defined, such as the syntactic level (which studies the structure of the visible/audible form of the language), the semantic level (which studies the relations and dependencies between different language structures and their potential meanings), and the pragmatic level (which studies issues related to language use due to context, and uncovers the intention of the speaker in an utterance).

Our study of chat-based conversations can benefit from finding the ordering and patterns of interaction between two interlocutors. Our interest is in uncovering the actions that are hidden between the words, and pragmatic analysis seems to be the appropriate approach from such a language/action perspective [39]. The starting point for studying the pragmatics of language action is Speech Act Theory (SAT). According to SAT [32], the uttering of a sentence is an action; in short form, "saying is doing" or, similarly, "words are deeds". Austin claimed that "all utterances, in addition to meaning, perform specific acts via the specific communicative force of an utterance" and introduced a three-fold distinction among the acts one simultaneously performs when saying something:

• Locutionary act: the production of a meaningful linguistic expression.
• Illocutionary act: the action intended to be performed by a speaker in uttering a linguistic expression, either explicitly or implicitly. Examples include: accusing, apologizing, refusing, ordering, etc.
• Perlocutionary act: the effect of the illocutionary act on the hearer, such as persuading, deterring, surprising, misleading or convincing.

For example, the phrase of an IT technician "The operating system will reboot in five minutes." results in saying that the OS will reboot in five minutes (locutionary act) and informs the users of the imminent rebooting of the OS (illocutionary act). By producing his utterance, the IT technician intends to make users believe that the OS will reboot in five minutes and urges them to do housekeeping activities (perlocutionary act). The IT technician performs all these speech acts, at all three levels, just by uttering the above sentence.

Searle proposed that speech acts be classified into five categories along four dimensions (illocutionary point, direction of fit between words and world, psychological state, and propositional content):

• Representatives express the speaker's beliefs. Examples include claiming, reporting, asserting, stating and concluding. Using representatives, the speaker makes the words fit the world by representing the world as he believes it is.
• Directives express the speaker's desire to get the hearer to act in a specific way. Examples include commands, advice, orders and requests. Using directives, the speaker intends to make the world match the words via the hearer. E.g., "Double-click this file."
• Commissives are used to express the speaker's intention and commitment to do something in the future. Examples include offers, pledges, promises, refusals, and threats. Using commissives, the speaker adapts the world to the words; e.g., "I'll never give you access to your account."
• Expressives express the psychological state of the speaker, such as joy and sorrow. Examples include praising, blaming, apologizing, and congratulating. There is no direction of fit for expressives; e.g., "Well done, John!"
• Declaratives are used to express immediate changes in the current state of some affair. Examples are firing (from employment), declaring war, etc. Both directions of fit suit this type of speech act (words-to-world and world-to-words). E.g., "I object, Your Honor."

4.5 Chat History

This enabler refers to the technical challenge of assessing the risk of a potential SE attack through the history of a user's chat dialogues. In many cases, SE attacks take place in multiple repeated phases, where the offender is properly prepared before the attack commences. In particular, she creates a 'trust' relationship, which requires time, exploring her victim until she finds the right spot for the attack to take place. Therefore, the purpose of this process is to utilize all previous chat dialogues between the same interlocutors, transform them into a measurable value, and use it as an extra indicator for detecting a SE attack.
5 SYSTEM ARCHITECTURE

In order to protect users from SE attacks through person-to-person text communication, a technical solution is needed, beyond training programs and psychological preparation. Such a technical solution could make use of all the important factors to develop and implement an automated process for risk assessment during a chat conversation. However, this seems challenging, as human personality traits can lead someone to be influential, persuasive, and deceptive, while at the same time another human can be more or less vulnerable to deceptive acts.

Automated SE attack recognition means that there must be clear decision making (even though probabilistic) on whether a person aims to intentionally deceive another person. Working in this direction, we designed an automated recognition system which functions in a linear manner based on Natural Language Processing (NLP) techniques along with psychological characteristics detection for both interlocutors. The system includes five recognition subsystems, namely: Influence Recognition (IR), Deception Recognition (DR), Personality Recognition (PR), Speech Act (SA) and Past Experience (PE). Each subsystem calculates a separate risk value (R_ir, R_dr, R_pr, R_sa, R_pe), which is then fed to the Risk Aggregator that calculates the overall probability distribution of the SE attack risk R_SE. Figure 2 presents a conceptual diagram of the automated SE attack recognition system. The tools and techniques used in every stage of a SE attack (Information Gathering - IG, Relationship Development - RD, Relationship Exploitation - RE, and Execution to achieve objective - EX), in correspondence with the associated SE enablers, are depicted in Table 2.

Figure 2: System Architecture.

Table 2: Enablers, Stages and Techniques.

Enablers | Stages | Techniques
Personality Traits | IG, RD | Classification
Deception | IG, RD | Classification, Conversation for Action (Speech Act)
Influence/Persuasion | RD | Classification, Semantic Analysis
Speech Act | RE | Conversation for Action (Speech Act), Typed Dependency Trees, Named-Entity Recognition
Past Experience | IG, RD, RE, EX | Value Threshold
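To make the aggregation step concrete, the following minimal sketch combines the five subsystem risk values into R_SE. The paper does not prescribe an aggregation function, so the weighted average and the weights below are assumptions for illustration only.

```python
# Hedged sketch of the Risk Aggregator: the aggregation function is not
# specified in the text, so a weighted average is assumed here purely
# for illustration; the weights are hypothetical.
SUBSYSTEM_WEIGHTS = {"ir": 0.2, "dr": 0.25, "pr": 0.15, "sa": 0.25, "pe": 0.15}

def aggregate_risk(risks: dict) -> float:
    """Combine per-subsystem risk values (each in [0, 1]) into R_SE."""
    return sum(SUBSYSTEM_WEIGHTS[name] * risks[name] for name in SUBSYSTEM_WEIGHTS)

r_se = aggregate_risk({"ir": 0.7, "dr": 0.9, "pr": 0.4, "sa": 0.8, "pe": 0.6})
print(f"R_SE = {r_se:.2f}")  # e.g., raise an alert above some threshold
```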
5.1 Dialogs, Context and Preprocessing

The dialogue text between the two interlocutors is the initial input for all system processes, along with contextual information. Contextual information may include time and location details, which can be used by the Past Experience subsystem, as described below.

Location details (e.g., in the form of an IP address) are useful for separating insiders from outsiders and for controlling the use of different nicknames from the same location.

As a first step, the captured dialogue text is pre-processed with Natural Language Processing techniques. Depending on the original raw text, pre-processing (also collectively known as tokenization) comprises cleaning of unwanted tags and labels (e.g., HTML tags) and unnecessary capitalization, stemming (lemmatization), stop-word removal, syntactic analysis, vocabulary creation (for blacklist creation, topic modeling, etc.), and annotation (e.g., POS tagging). Subsequently, the vectorized representation of the pre-processed dialogue text and contextual information is input to the recognition subsystems.
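The sketch below illustrates such a pre-processing pass with spaCy, one possible toolkit; the paper does not mandate a specific library, and the example sentence and output are illustrative.

```python
# Minimal pre-processing sketch using spaCy (one possible toolkit).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import re
import spacy

nlp = spacy.load("en_core_web_sm")

def preprocess(raw_text: str) -> list[tuple[str, str]]:
    """Clean, tokenize, lemmatize, remove stop words, and POS-tag."""
    text = re.sub(r"<[^>]+>", " ", raw_text)   # strip HTML-like tags
    text = text.lower()                        # drop unnecessary capitalization
    doc = nlp(text)
    # Keep (lemma, POS tag) pairs for non-stop-word, alphabetic tokens.
    return [(tok.lemma_, tok.pos_) for tok in doc if not tok.is_stop and tok.is_alpha]

print(preprocess("Hello! <b>Please</b> RESET the password for me."))
# e.g. [('hello', 'INTJ'), ('reset', 'VERB'), ('password', 'NOUN')]
```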
To efficiently process a sentence and extract valuable information, typed dependency trees are utilized to represent the structure of a sentence and all the dependencies between the individual words. All the dependencies are also labeled with grammatical relations (e.g., subject, object, indirect object, etc.). After parsing a sentence and representing it as a typed dependency tree, information about predicate-argument structures, which is not readily available from other structure parses, can be extracted easily.

A hierarchy of grammatical relations, rooted at the most generic relation, is created, in which the relations between heads and their dependents can be easily identified. The creation of such a tree is based on special rules/patterns that are applied to the corresponding phrase structure tree [6]. First, a dependency extraction is performed, where a sentence is parsed using a phrase structure grammar parser; this is followed by dependency typing, where the head of each word of the sentence is identified using modified rules that retrieve the semantic head of each word rather than the syntactic head. In Figure 3, an example of a typed dependency tree is depicted, where nsubj means nominal subject, dobj means direct object, det means determiner, ref means referent, rel means relative (a word introducing a rcmod), and rcmod means relative clause modifier.

Figure 3: Typed Dependency Tree.
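As a sketch of how predicate-argument structures fall out of such a parse, the snippet below extracts (subject, verb, object) triples via spaCy rather than the phrase-structure tools cited above; the labels (nsubj, dobj) follow the same convention, and the snippet is illustrative, not the system's actual extractor.

```python
# Sketch of predicate-argument extraction from a typed dependency parse,
# here via spaCy instead of the Stanford tooling referenced in the text.
import spacy

nlp = spacy.load("en_core_web_sm")

def predicate_arguments(sentence: str):
    """Return (subject, verb lemma, direct object) triples from the parse."""
    doc = nlp(sentence)
    triples = []
    for tok in doc:
        if tok.pos_ == "VERB":
            subj = [c.text for c in tok.children if c.dep_ == "nsubj"]
            dobj = [c.text for c in tok.children if c.dep_ in ("dobj", "obj")]
            if subj and dobj:
                triples.append((subj[0], tok.lemma_, dobj[0]))
    return triples

print(predicate_arguments("The technician resets the password."))
# e.g. [('technician', 'reset', 'password')]
```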
5.2 Influence Recognition

The system calculates the degree of influence of the attacker by analyzing the text as described in Section 4.2. We are interested in modeling persuasion arguments using neural networks and performing semantic analysis of the dialogue to predict persuasiveness. Based on Cialdini's model (authority, scarcity, liking & similarity, reciprocation, social proof and commitment & consistency), well-known binary classifiers (Naive Bayes, Support Vector Machines), which are effective on feature vector models, are used. Feature vectors are populated with metric values for topic initiation, topic control, sentence structure and dialogue goal. Furthermore, two commonly used features in NLP, word unigrams and bigrams, are used along with the implied Bag-of-Words model.

5.3 Deception Recognition

The system is able to calculate the degree of deception that is hidden in the attacker's writings, according to Section 4.3. In our approach, deception detection is treated as a classification problem where lexical features are used to apply machine learning algorithms. There are many algorithms, like SVM, capable of handling a large number of features. To extract the lexical features, Linguistic Inquiry and Word Count (LIWC), Part-of-Speech (POS) and N-gram techniques are utilized. Discovering positive emotion words is a main objective of the Deception Recognition subsystem, because a great proportion of these appears more frequently in deceptive speech than in truthful speech [13]. Similar measurements are performed using DAL, while N-grams are used in conjunction with LIWC to train the classifiers.

Zuckerman [40] argues that deception cues can be categorized in three categories, namely: emotional stress, cognitive effort, and attempted behavioral control. Emotion recognition is simultaneously performed in the DR subsystem to detect the emotional stress that is generally caused (fear, guilt, delight, etc.) while an attacker tries to deceive. A deceiver might feel fear that she will be caught, she might feel guilty doing something wrong, or she could even feel delighted by fooling someone else.
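The sketch below illustrates one LIWC-style lexical feature of this kind: the share of positive-emotion words in an utterance. The small word list is a stand-in for a real LIWC category, and in practice such ratios would be appended to the n-gram feature vector before training.

```python
# Sketch of one LIWC-style lexical feature: the fraction of
# positive-emotion words, which [13] reports is more frequent in
# deceptive speech. The word list is a stand-in for a LIWC category.
POSITIVE_EMOTION = {"delighted", "wonderful", "great", "happy", "love", "perfect"}

def positive_emotion_ratio(tokens: list[str]) -> float:
    """Fraction of tokens that are positive-emotion words."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in POSITIVE_EMOTION)
    return hits / len(tokens)

print(positive_emotion_ratio("I would love to help what a wonderful idea".split()))
```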
5.4 Personality Recognition

Personality recognition is performed using classification tools that utilize the results of Mairesse [19]. The personality traits of the victim are used to calculate the related risk of being vulnerable to a SE attack. The main objective of the Personality Recognition subsystem is to identify the personality category (as defined in the Big-5 theory) of both interlocutors (attacker and victim) based on the captured dialog. To this end, a document-modeling technique [20] is utilized, based on a convolutional neural network feature extractor. Chat dialog sentences are fed to convolution filters to create a sentence model in the form of n-gram feature vectors. Each text-based dialog is then represented as the aggregated vectors of its sentences. The vectors are created at the preprocessing stage based on Mairesse's features and are then concatenated. All emotionally neutral sentences are discarded from the text-based dialog to further improve the results.
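A compact sketch of such a convolutional sentence-feature extractor is given below, in the spirit of the document-modeling approach of [20]; the layer sizes and the use of PyTorch are assumptions for illustration, not the authors' exact configuration.

```python
# Compact CNN sentence-feature extractor sketch (PyTorch); sizes are
# hypothetical, not the configuration of Majumder et al. [20].
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=50, n_filters=32, ngram=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution over ngram-sized windows of word embeddings.
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=ngram)

    def forward(self, token_ids):            # (batch, seq_len)
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))         # (batch, n_filters, seq_len-ngram+1)
        return x.max(dim=2).values           # max-pool into a sentence vector

model = SentenceCNN()
sentence = torch.randint(0, 5000, (1, 12))   # one toy sentence of 12 token ids
print(model(sentence).shape)                 # torch.Size([1, 32])
```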
5.5 Speech Act

Identifying and tracking proposed actions and corresponding responses over communication channels (like text-based chat) is crucial for protecting users from SE attacks. These are difficult tasks due to the syntactical, grammatical and structural idiosyncrasies of chat-based conversations. In our approach, every action is decomposed into different lexical units with accompanying parameters, while every response can be in different states (acceptance, denial). Actions and responses are then identified based on features extracted from the captured dialogues. We assume that every chat has a short life-cycle depending on the particular interlocutors (attacker, victim). A timeline depicting the aforementioned chat steps is shown in Figure 4. An attacker can initiate a chat (Initiation) and plan/offer a bait-project (Planning). If the victim's response is acceptance (Acceptance), then the attacker has taken control (Control) of the situation. After that, the victim executes the action (Execution), and the attack reaches completion (Completion).

Figure 4: Timeline of a chat offering a bait-project.

Our focus is on human chat-based conversations from a perspective based on language as action. Therefore, the Speech Act subsystem defines actions and responses based on features extracted from the text conversation (typed dependencies) in the context of SE attacks, identifies the response state, determines the achieved steps in the chat conversation timeline, and monitors the corresponding SE attack progression to raise an alert. Here, our main interest is in identifying a "conversation for action" in which the attacker (A) makes a request to the victim (B) either to do something or to say something (e.g., reveal information). The state transition diagram in Fig. 5 is an adapted version of the one that Winograd [39] developed to represent a Conversation for Action (CfA) as a pattern of Speech Acts.

More specifically, the state transition diagram represents a CfA initiated by a request from an Attacker (A) to a Victim (V). The circles represent conversation states and the labeled lines represent speech acts. After the initial request of A, V can promise to fulfill the request, reject it, or counter-offer. A can then accept the counter-offer, counter again, or withdraw. In case V promises to fulfill the request, he can later assert that the request is done. A can declare the request done, not done, or withdraw. To identify requests for action by the attacker and monitor the flow from state to state in a CfA, we utilize NLP techniques, Typed Dependency Trees, and Named-Entity Recognition (NER) techniques.

Figure 5: Conversation for Action state diagram adapted for SE attack recognition.
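A sketch of the adapted CfA pattern as an explicit state machine follows; the state names and transition table paraphrase the Winograd-style diagram described above and are illustrative rather than an exact transcription of Fig. 5.

```python
# Illustrative state machine for the Conversation-for-Action pattern;
# state and act names are paraphrases of the Fig. 5 diagram, not exact.
CFA_TRANSITIONS = {
    ("requested", "promise"): "promised",
    ("requested", "reject"): "declined",
    ("requested", "counter-offer"): "countered",
    ("countered", "accept"): "promised",       # A accepts V's counter-offer
    ("countered", "counter-offer"): "countered",
    ("countered", "withdraw"): "withdrawn",
    ("promised", "assert-done"): "asserted",
    ("asserted", "declare-done"): "completed",
    ("asserted", "declare-not-done"): "promised",
    ("asserted", "withdraw"): "withdrawn",
}

def step(state: str, speech_act: str) -> str:
    """Advance the CfA; unrecognized moves keep the current state."""
    return CFA_TRANSITIONS.get((state, speech_act), state)

state = "requested"                  # A has made the initial request to V
for act in ("counter-offer", "accept", "assert-done", "declare-done"):
    state = step(state, act)
print(state)  # 'completed': a fulfilled attacker request worth flagging
```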
5.6 Past Experience

The Past Experience process analyses features from dialogues captured over a long period of time, along with the accompanying previously stored risk values. History is expressed in number of dialogues rather than in some time metric. Since many SE attacks last long and take place in several phases, it is beneficial to use this past history. The PE subsystem handles the following values: the risk values of all previous chats between the same interlocutors, together with the proportion of the same user's conversations.

The exclusivity of an attacker's conversations with a particular victim is calculated as a ratio. The importance of calculating this ratio results from the fact that most attackers form a deceptive relationship with their victims before the attack begins; therefore, an elevated rate can signal possible attack preparation. Specifically, whenever a chat conversation is detected, the nickname used by an attacker is recorded together with his network connection details. Thus, considering the number of past recorded conversations in which the attacker (under the same nickname) and the victim participate, the ratio can be calculated.
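The sketch below computes this exclusivity ratio as described: the share of a user's recorded conversations held with one particular partner. The chat-history representation is a simplifying assumption for illustration.

```python
# Sketch of the conversation-exclusivity ratio described above; the
# (nickname, partner) history format is assumed for illustration.
from collections import Counter

def exclusivity_ratio(history: list[tuple[str, str]], attacker: str, victim: str) -> float:
    """history holds (nickname, partner) pairs for recorded chats."""
    by_attacker = Counter(partner for nick, partner in history if nick == attacker)
    total = sum(by_attacker.values())
    return by_attacker[victim] / total if total else 0.0

chats = [("mallory", "alice"), ("mallory", "alice"), ("mallory", "bob"), ("eve", "alice")]
print(exclusivity_ratio(chats, "mallory", "alice"))  # ~0.67: chats mostly target one victim
```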
Fig. 6 shows the utilization of the SE attack recognition subsystems during the evolution of the SE attack stages. Past Experience is able to utilize historical data from every SE attack stage. For Personality Recognition and Deception Recognition, data are gathered during the IG and RD stages. The Influence Recognition subsystem monitors the dialogues during the RD stage. Finally, during the EX stage, data are gathered by the Speech Act subsystem.

Figure 6: Utilization of SE attack enablers during SE attack stages.

6 CONCLUSIONS

In this paper, we demonstrated that SE attacks are a persistent cyber threat in enterprise environments and that detection is needed in the early stages. A thorough review of related works was conducted, which revealed the shortage of automated recognition systems for chat-based SE attacks. A dissection of the separate SE attack stages was presented, along with the related SE attack attributes and the various forms of the attacks. The major enablers were identified for every stage, namely: personality traits, influence (persuasion), deception, speech act and past experience. Finally, a system capable of recognizing chat-based SE attacks in their early stages was proposed, combining the indicators corresponding to the aforementioned SE attack enablers. The proposed system is required to comply with the European General Data Protection Regulation (GDPR) and other related international data protection regulations.

ACKNOWLEDGMENTS

This work has been partially supported by the European Commission through project FORTIKA, funded by the European Union Horizon 2020 programme under Grant Agreement No. 740690. The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of the European Commission.
REFERENCES

[1] ISACA. 2018. Information Technology - Information Security - Information Assurance. (2018). https://www.isaca.org/pages/default.aspx
[2] Guozhen An. 2015. Literature review for Deception detection. Dr. Diss. City Univ. New York (2015).
[3] Monique Bezuidenhout, Francois Mouton, and Hein S Venter. 2010. Social engineering attack detection model: SEADM. In Information Security for South Africa (ISSA), 2010. IEEE, 1-8.
[4] Ram Bhakta and Ian G Harris. 2015. Semantic analysis of dialogs to detect social engineering attacks. In Proc. 2015 IEEE 9th Int. Conf. Semant. Comput. (ICSC 2015). IEEE, 424-427. https://doi.org/10.1109/ICOSC.2015.7050843
[5] Ali Darwish, Ahmed El Zarka, and Fadi Aloul. 2012. Towards Understanding Phishing Victims' Profile. In 2012 Int. Conf. Comput. Syst. Ind. Informatics. IEEE, 13-17.
[6] Marie-Catherine De Marneffe, Bill MacCartney, Christopher D Manning, and others. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, Vol. 6. Genoa, Italy, 449-454.
[7] Song Feng, Ritwik Banerjee, and Yejin Choi. 2012. Syntactic stylometry for deception detection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, Volume 2. Association for Computational Linguistics, 171-175.
[8] David Gragg. 2003. A Multi-Level Defense Against Social Engineering. SANS Inst. (2003), 21.
[9] Sarah Granger. 2001. Social Engineering Fundamentals, Part I: Hacker Tactics. Secur. Focus (December 2001). http://www.symantec.com/connect/articles/social-engineering-fundamentals-part-i-hacker-tactics
[10] Pär Anders Granhag and Leif A Strömwall. 2004. The Detection of Deception in Forensic Contexts. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511490071
[11] Harl. 1997. Psychology of Social Engineering. (1997). http://barzha.cyberpunk.us/lib/cin/se10.html
[12] Ryan Heartfield and George Loukas. 2015. A Taxonomy of Attacks and a Survey of Defence Mechanisms for Semantic Social Engineering Attacks. ACM Comput. Surv. 48, 3 (2015), 1-39. https://doi.org/10.1145/2835375
[13] Julia Hirschberg, Stefan Benus, Jason M Brenier, Frank Enos, Sarah Friedman, Sarah Gilman, Cynthia Girand, Martin Graciarena, Andreas Kathol, Laura Michaelis, Bryan Pellom, Elizabeth Shriberg, and Andreas Stolcke. 2005. Distinguishing Deceptive from Non-Deceptive Speech. Proc. Interspeech 2005 (2005), 1833-1836.
[14] Michael D Hoeschele and Marcus K Rogers. 2004. Detecting Social Engineering. CERIAS Tech Report 2005-19. Purdue University.
[15] Michael D Hoeschele and Marcus K Rogers. 2006. Detecting Social Engineering. CERIAS Tech Report 2006-15. Center for Education and Research in Information Assurance and Security, Purdue University, West Lafayette, IN.
[16] Lech J Janczewski and Lingyan Fu. 2010. Social Engineering-Based Attacks: Model and New Zealand Perspective. IEEE, Piscataway, NJ.
[17] A Karakasiliotis, S M Furnell, and M Papadaki. 2006. Assessing end-user awareness of social engineering and phishing. (2006), 4-5. https://doi.org/10.4225/75/57a80e47aa0cb
[18] Katharina Krombholz, Heidelinde Hobel, Markus Huber, and Edgar Weippl. 2014. Advanced Social Engineering Attacks. J. Inf. Secur. Appl. 22 (2014), 11.
[19] Francois Mairesse and Marilyn Walker. 2006. Words Mark the Nerds: Computational Models of Personality Recognition through Language. In 28th Annu. Conf. Cogn. Sci. Soc., Vol. 28. 543-548.
[20] Navonil Majumder, Soujanya Poria, Alexander Gelbukh, and Erik Cambria. 2017. Deep learning-based document modeling for personality detection from text. IEEE Intelligent Systems 32, 2 (2017), 74-79.
[21] Maranda McBride, Lemuria Carter, and Merrill Warkentin. 2012. Exploring the role of individual employee characteristics and personality on employee compliance with cybersecurity policies. 2011 Dewald Roode Work. Inf. Syst. Secur. Res. (2012), 1-13.
[22] Kevin D Mitnick and William L Simon. 2011. The Art of Deception: Controlling the Human Element of Security. John Wiley & Sons.
[23] Francois Mouton, Louise Leenen, and H S Venter. 2015. Social Engineering Attack Detection Model: SEADMv2. In 2015 Int. Conf. Cyberworlds. IEEE, 216-223. https://doi.org/10.1109/CW.2015.52
[24] Marcus Nohlberg. 2008. Securing Information Assets: Understanding, Measuring and Protecting against Social Engineering Attacks. Ph.D. Dissertation. Department of Computer and Systems Sciences (together with KTH), Stockholm University, Kista.
[25] Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proc. 49th Annu. Meet. Assoc. Comput. Linguist. Hum. Lang. Technol., Vol. 1. Association for Computational Linguistics, 309-319.
[26] Thomas R Peltier. 2006. Social engineering: Concepts and solutions. Inf. Syst. Secur. 15, 5 (Nov. 2006), 13-21.
[27] Alan J Resnik and Robert B Cialdini. 1986. Influence: Science & Practice. J. Mark. Res. 23, 3 (1986), 305. https://doi.org/10.2307/3151490
[28] Samuel Galice and Marine Minier (auth.), Serge Vaudenay (ed.). 2008. Progress in Cryptology - AFRICACRYPT 2008, First International Conference on Cryptology in Africa, Casablanca, Morocco, June 11-14, 2008, Proceedings. Lecture Notes in Computer Science 5023. Springer-Verlag Berlin Heidelberg.
[29] Yuki Sawa, Ram Bhakta, Ian G Harris, and Christopher Hadnagy. 2016. Detection of Social Engineering Attacks Through Natural Language Processing of Conversations. In 2016 IEEE Tenth Int. Conf. Semant. Comput. IEEE, 262-265. https://doi.org/10.1109/ICSC.2016.95
[30] Jamison W Scheeres. 2008. Establishing the Human Firewall: Reducing an Individual's Vulnerability to Social Engineering Attacks. Technical Report. DTIC Document. 49 pages.
[31] Bernadette H Schell and Clemens Martin. 2006. Webster's New World Hacker Dictionary. Wiley Pub, Indianapolis, IN. 387 pages.
[32] John R Searle, Ferenc Kiefer, Manfred Bierwisch, and others. 1980. Speech Act Theory and Pragmatics. Vol. 10. Springer.
[33] Charles Donald Spielberger. 2004. Encyclopedia of Applied Psychology. Elsevier Academic Press.
[34] Pekka Tetri and Jukka Vuorinen. 2013. Dissecting social engineering. Behav. Inf. Technol. 32, 10 (Oct. 2013), 1014-1023. https://doi.org/10.1080/0144929X.2013.763860
[35] David R Tobergte and Shirley Curtis. 2013. INFLUENCE: The Psychology of Persuasion. Vol. 53. Harper Collins. 1689-1699 pages.
[36] Sven Uebelacker and Susanne Quiel. 2014. The social engineering personality framework. In Proc. 4th Work. Socio-Technical Asp. Secur. Trust (STAST 2014). 24-30. https://doi.org/10.1109/STAST.2014.12

[37] Verizon. 2015. 2015 Data Breach Investigations Report. Verizon Bus. J. 1 (May 2015), 1-70.
[38] Aldert Vrij. 2014. Detecting lies and deceit: Pitfalls and opportunities in nonverbal and verbal lie detection. In Interpers. Commun. 321-346. https://doi.org/10.1515/9783110276794.321
[39] Terry Winograd. 1986. A language/action perspective on the design of cooperative work. In Proceedings of the 1986 ACM Conference on Computer-Supported Cooperative Work. ACM, 203-220.
[40] Miron Zuckerman, Bella M DePaulo, and Robert Rosenthal. 1981. Verbal and nonverbal communication of deception. Adv. Exp. Soc. Psychol. 14 (1981), 1-59. https://doi.org/10.1016/S0065-2601(08)60369-X

