Applications of Machine Learning in Cyber Security
Vitaly Ford and Ambareen Siraj
Computer Science Department, Tennessee Tech University
Cookeville, TN, 38505, USA
vford42@students.tntech.edu, asiraj@tntech.edu
Abstract
Machine learning techniques have been applied in many
areas of science due to their unique properties like
adaptability, scalability, and potential to rapidly adjust to
new and unknown challenges. Cyber security is a
fast-growing field demanding a great deal of attention because
of remarkable progress in social networks, cloud and
web technologies, online banking, mobile environments,
smart grids, etc. Diverse machine learning methods have
been successfully deployed to address such wide-ranging
problems in computer security. This paper discusses and
highlights different applications of machine learning in
cyber security. This study covers phishing detection,
network intrusion detection, testing security properties of
protocols, authentication with keystroke dynamics,
cryptography, human interaction proofs, spam detection in
social networks, smart meter energy consumption profiling,
and issues in the security of machine learning techniques
themselves.
Keywords: Security, machine learning, survey.
1 Introduction
Alongside the rapid evolution of web and mobile
technologies, attack techniques are also becoming more
and more sophisticated in penetrating systems and evading
generic signature-based approaches. Machine learning
techniques offer potential solutions that can be employed
for resolving such challenging and complex situations due
to their ability to adapt quickly to new and unknown
circumstances. Diverse machine learning methods have
been successfully deployed to address wide-ranging
problems in computer and information security. This paper
discusses and highlights different applications of machine
learning in cyber security.
The paper is structured as follows. Section 2 describes
various applications of machine learning in information
security: phishing detection, network intrusion detection,
testing security properties of protocols, authentication with
keystroke dynamics, cryptography, human interaction
proofs, spam detection in social networks, smart meter
energy consumption profiling, and issues in the security of
machine learning techniques themselves. Section 3 concludes
with future work.
2 Methodology
2.1 Phishing Detection
Phishing is aimed at stealing personal sensitive
information. Researchers [2] have identified three principal
groups of anti-phishing methods: detective (monitoring,
content filtering, anti-spam), preventive (authentication,
patch and change management), and corrective (site
takedown, forensics) ones. These categories are
summarized in Table 1.
Table 1: Phishing and Fraud Solutions [1, 2]
Detective solutions: 1. Monitors account life cycle; 2. Brand monitoring; 3. Disables web duplication; 4. Performs content filtering; 5. Anti-Malware; 6. Anti-Spam
Preventive solutions: 1. Authentication; 2. Patch and change management; 3. Email authentication; 4. Web application security
Corrective solutions: 1. Phishing site takedown; 2. Forensics and investigation
A comparison of phishing detection techniques appears
in [1]. It was observed that many of the phishing detection
solutions under consideration have a high rate of missed
detections. Using 1,171 raw phishing emails and 1,718
legitimate emails, the researchers compared six machine
learning classifiers: "Logistic Regression (LR),
Classification and Regression Trees (CART), Bayesian
Additive Regression Trees (BART), Support Vector
Machines (SVM), Random Forests (RF), and Neural
Networks (NNets)". The error rates of these classifiers
are summarized in Figure 1.
Figure 1: The error rates of classifiers [1]
For experimentation, text indexing techniques were used
for parsing the emails. All attachments were removed, and
the "header information of all emails and html tags" in the
emails' bodies, as well as their specific elements, were
extracted. Afterwards, a stemming algorithm was applied
and all irrelevant words were removed. Finally, all
items were sorted according to their frequency in emails.
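The preprocessing steps described above (tag removal, stop-word filtering, stemming, frequency sorting) can be illustrated with a minimal Python sketch. The stop-word list, the crude suffix stripper, and the sample email are assumptions made for illustration only; they are not the tools or data used in [1], where a proper stemming algorithm was applied.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "your", "is", "in", "for"}

def strip_html(text):
    """Remove HTML tags with a simple regular expression (adequate for a sketch only)."""
    return re.sub(r"<[^>]+>", " ", text)

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer, not the one used in [1]."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_frequencies(email_body):
    """Return (term, count) pairs sorted from most to least frequent."""
    tokens = re.findall(r"[a-z]+", strip_html(email_body).lower())
    terms = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(terms).most_common()

# Made-up email body for illustration.
sample = "<html><body>Verify your account. Verifying accounts is required!</body></html>"
freqs = term_frequencies(sample)
```

The resulting frequency list is the kind of term-level feature representation that a classifier could then consume.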
From this work, it can be concluded that LR is the
preferable option for users due to its low false
positive rate (users usually do not want their legitimate
emails to be misclassified as junk). LR also has the
highest precision and relatively high recall in comparison
with the other classifiers under consideration. The comparison
of precision, recall, and F-measure is given in Table 2.
Table 2: Comparison of precision, recall, and F1 [1]
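For reference, the three measures compared in Table 2 can be computed from the counts of true positives, false positives, and false negatives. The sketch below uses made-up counts purely for illustration; they are not results from [1].

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only: 90 phishing emails caught, 10 legitimate emails
# wrongly flagged, 30 phishing emails missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

A low false positive count raises precision, which is the property the text above highlights for LR.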
Zhuang et al. [6] developed an automatic system for
phishing detection that applies a cluster ensemble of several
clustering solutions. A feature selection algorithm was used
for extracting various phishing email traits, and two base
clustering methods were combined: a Hierarchical Clustering
(HC) algorithm that adopts cosine similarity (using the
TF-IDF metric) for measuring the similarity between two
points, and a K-Medoids (KM) clustering approach. The
proposed methods for phishing website and malware
categorization are reported to achieve about 85%
performance. The architecture of their Automatic
Categorization System (ACS) is shown in Figure 3.
Figure 3: The Architecture of ACS [6]
First, the ACS parses the malware samples and phishing
websites. It extracts terms and specific malware
instructions and saves them to a database. After that, the
system applies the information retrieval algorithm for
calculating the TF-IDF metrics. Then, the ACS utilizes the
ensemble of clustering algorithms and, taking account of
constraints manually generated by security experts, splits
the data into clusters.
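As an illustration of the TF-IDF weighting and cosine similarity on which the clustering relies, the following Python sketch computes both for a few token lists. The weighting variant (term frequency times the natural log of inverse document frequency) and the toy documents are assumptions; the exact formulation used in [6] is not given in this survey.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """documents: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy documents (assumed): two phishing-like texts and one unrelated text.
docs = [
    "verify account password bank".split(),
    "verify account login bank".split(),
    "meeting agenda lunch friday".split(),
]
vecs = tf_idf_vectors(docs)
sim_phish = cosine_similarity(vecs[0], vecs[1])
sim_other = cosine_similarity(vecs[0], vecs[2])
```

Documents sharing weighted terms score higher, so a hierarchical or K-Medoids procedure operating on such similarities would tend to group the first two documents together.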
2.2 Network Intrusion Detection
Network Intrusion Detection (NID) systems are used to
identify malicious network activity that leads to
violations of the confidentiality, integrity, or availability
of the systems in a network. Many intrusion detection systems
are specifically based on machine learning techniques due
to their adaptability to new and unknown attacks.
Lu et al. [8] proposed a unified solution that improves
Genetic Network Programming (GNP) for misuse and anomaly
detection. Matching degree and a genetic algorithm were
combined so that redundant rules can be pruned and efficient
ones retained. The system was tested on KDDcup99 [22] data
to demonstrate its efficiency. The proposed pruning algorithm
does not require "prior knowledge from experience": a rule is
pruned if its average matching degree is less than some
threshold. In the training step, 8,068 randomly chosen
connections were fed into their system (4,116 were normal and
3,952 were smurf and neptune attacks). After training, the
proposed solution was tested on 4,068 normal connections and
4,000 intrusion connections. The reported accuracy (ACC) is
94.91%, the false positive rate (FP) is 2.01%, and the false
negative rate (FN) is 2.05%. Table 4 displays the performance
comparison of different algorithms including the proposed one.
Table 4: The performance comparison of NID systems [8]
Methods compared: Unified detection (w/ two-stage rule pruning); Unified detection (w/o two-stage rule pruning); GNP-based anomaly detection; GNP-based misuse detection
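The threshold-based pruning idea from [8] (drop a rule when its average matching degree falls below a threshold) can be sketched as follows. The definition of matching degree used here, the fraction of a rule's attribute conditions that a connection satisfies, is a simplifying assumption, as are the rule format, the threshold value, and the sample connections; the survey does not give the exact definition from [8].

```python
def matching_degree(rule, connection):
    """Assumed definition: fraction of the rule's conditions satisfied by the connection."""
    satisfied = sum(1 for attr, value in rule.items() if connection.get(attr) == value)
    return satisfied / len(rule)

def prune_rules(rules, connections, threshold):
    """Keep only rules whose average matching degree reaches the threshold."""
    kept = []
    for rule in rules:
        average = sum(matching_degree(rule, c) for c in connections) / len(connections)
        if average >= threshold:
            kept.append(rule)
    return kept

# Illustrative connection records with KDD-style attribute names (assumed).
connections = [
    {"protocol": "icmp", "service": "ecr_i", "flag": "SF"},
    {"protocol": "icmp", "service": "ecr_i", "flag": "SF"},
    {"protocol": "tcp", "service": "private", "flag": "S0"},
]
rules = [
    {"protocol": "icmp", "service": "ecr_i"},  # matches two of three connections
    {"protocol": "udp", "flag": "REJ"},        # matches none of them
]
kept = prune_rules(rules, connections, threshold=0.5)
```

Because the threshold is applied to a statistic computed from the data, no hand-tuned expert knowledge about individual rules is needed, which is the property quoted above.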
Subbulakshmi et al. [9] developed an Alert
Classification System using Neural Networks (NNs) and
Support Vector Machines (SVM) against Distributed
Denial of Service (DDoS) attacks. For simulating a real
DDoS attack, a virtual environment was used with the "Snort"
tool for intrusion detection and "packit" for generating
network packets and sending them to the target machine.
The alerts generated by Snort
were captured and fed into a back-propagation neural
network and support vector machines, which classified the
alerts as true positives or false positives. The researchers
claimed that this process reduced the total number of alerts
to process by 95%. The average accuracy of neural
network alert classification is 83%, whereas for support
vector machines it is 99%. A comparison of NNs and
SVM with the Threshold Based Method (TBM) and Fuzzy
Inference System (FIS) is shown in Table 5.
Table 5: The Comparison of NNs, SVM, TBM, and FIS [9]
Sedjelmaci and Feham [10] propose a hybrid solution for
detecting intrusions in a Wireless Sensor Network (WSN).
A clustering technique is employed to reduce the
amount of information to process and the energy
consumed. In addition, Support Vector Machines (SVMs)
with misuse detection techniques are used for identifying
network anomalies. The system consists of many
distributed intrusion detection nodes that communicate
with each other to identify attacks. The algorithm
for choosing optimal distributed SVMs is shown in Figure
4. Denial of Service and Probe attacks, which are more
common in WSN environments than other attacks, were
considered for testing. The performance evaluation of the
proposed distributed system is displayed in Table 6.
Figure 4: Optimal Distributed SVMs Selection Process [10]
Table 6: Performance Evaluation of the Distributed IDS
[10]
In comparison with a centralized intrusion detection
system [11], the proposed solution obtains a higher
accuracy when there is not enough training data (the
accuracy rate is 98%). The proposed approach is also
claimed to reduce energy consumption.
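The selection process of Figure 4 (train and test many SVMs while deleting one feature at a time, keep those with accuracy above 95%, then choose the one with the fewest input features) can be sketched in Python. This is one possible reading of the flowchart, implemented as greedy backward elimination. The `evaluate` callable stands in for "train and test an SVM on this feature subset"; the toy scoring function and the feature names are hypothetical and no SVM is actually trained here.

```python
def backward_elimination(features, evaluate):
    """Return [(feature_subset, accuracy)] obtained by deleting one feature at a time."""
    current = list(features)
    candidates = [(tuple(current), evaluate(current))]
    while len(current) > 1:
        # Try removing each remaining feature and keep the best reduced subset.
        trials = [[f for f in current if f != removed] for removed in current]
        scored = [(subset, evaluate(subset)) for subset in trials]
        current, accuracy = max(scored, key=lambda item: item[1])
        candidates.append((tuple(current), accuracy))
    return candidates

def select_model(candidates, min_accuracy=0.95):
    """Among sufficiently accurate candidates, pick the one with the fewest features."""
    accurate = [c for c in candidates if c[1] > min_accuracy]
    if not accurate:
        return None
    return min(accurate, key=lambda item: len(item[0]))

# Hypothetical stand-in for SVM training and testing: two features carry signal,
# the others only add a small penalty.
INFORMATIVE = {"pkt_rate", "dst_entropy"}

def toy_evaluate(subset):
    useful = len(INFORMATIVE & set(subset))
    noise = len(set(subset) - INFORMATIVE)
    return 0.70 + 0.14 * useful - 0.005 * noise

all_features = ["pkt_rate", "dst_entropy", "ttl", "payload_len"]
chosen = select_model(backward_elimination(all_features, toy_evaluate))
```

Preferring the smallest accurate model fits the WSN setting described above, where fewer input features mean less data to process on energy-constrained nodes.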
2.3 Authentication with Keystroke Dynamics
Revett et al. [12] proposed applying a Probabilistic
Neural Network (PNN) to keystroke dynamics. Generally,
keystroke dynamics represents "a class of behavioral
biometrics that captures the typing style of a user". The
system was evaluated on a dataset containing
login/password keystrokes of 50 people. Revett et al. asked
30 of them to log in as impostors multiple times in place of
legitimate users. Eight different attributes were monitored
during enrollment and authentication attempts. These
attributes were: digraphs (DG, two-letter combinations),
trigraphs (TG, three-letter combinations), total username
time, total password time, total entry time, scan code,
speed, and edit distance. Subsequently, the data was fed
into the PNN system and tested. The accuracy of
classifying users as legitimate or impostor was 90%. The
PNN was also compared to a multi-layer perceptron neural
network (MLPNN) with back-propagation, and it was
found that the PNN's training time is 4 times shorter than
the MLPNN's. The sum of the False Acceptance and False
Rejection Rates of the PNN is 1.5 times lower than that of
the MLPNN. The comparison of the MLPNN and PNN algorithms
can be seen in Table 7. The values in this table are the
sum of the False Acceptance Rate (FAR) and the False
Rejection Rate (FRR).
Table 7: FAR + FRR of PNN and MLPNN [12]
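To make the PNN idea concrete, the sketch below implements a basic probabilistic neural network: each class score is the average of Gaussian kernels centred on that class's training patterns, and the class with the highest score wins. The three timing features, their values, and the smoothing parameter `sigma` are assumptions for illustration; they are not the data or settings from [12].

```python
import math

def pnn_classify(sample, training, sigma=0.2):
    """training: {label: [feature_vector, ...]}. Returns (best_label, scores)."""
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for pattern in patterns:
            # Pattern layer: Gaussian kernel on the squared Euclidean distance.
            sq_dist = sum((a - b) ** 2 for a, b in zip(sample, pattern))
            total += math.exp(-sq_dist / (2 * sigma ** 2))
        # Summation layer: average kernel response per class.
        scores[label] = total / len(patterns)
    # Decision layer: pick the class with the largest score.
    return max(scores, key=scores.get), scores

# Hypothetical features: [mean digraph latency, mean trigraph latency, total entry time] in seconds.
training = {
    "legitimate": [[0.12, 0.20, 1.9], [0.11, 0.22, 2.0], [0.13, 0.19, 2.1]],
    "impostor": [[0.25, 0.35, 3.0], [0.30, 0.40, 3.4], [0.22, 0.31, 2.9]],
}
label, scores = pnn_classify([0.12, 0.21, 2.0], training)
```

A PNN has no iterative weight training (the training patterns themselves form the pattern layer), which is consistent with the shorter training time reported above relative to back-propagation.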
2.4 Testing Security of Protocol Implementation
Figure 4 (flowchart text) [10]: (1) Train and test many SVMs according to selected features in a distributed fashion (deleting one feature at a time). (2) Select the SVMs with an accuracy rate > 95%. (3) Select the SVM with the fewest input features. (4) Embed the selected training model in the IDS nodes.
Shu and Lee [13] introduced a new notion of applying machine
learning for "testing security of protocol implementation".
The researchers mainly focused on "Message
Confidentiality (secrecy) under Dolev-Yao model of
attackers", in which the attacker tries to inject a message
into the original one [14]. Generally, there is no
comprehensive solution for holistic testing of the security
of a protocol implementation. However, experiments can be
carried out on a problem restricted to a finite number of
messages. The main goal of their paper is to find weak spots
(that violate security) in a black-box protocol
implementation by deploying the L* learning algorithm [15].
In this algorithm, the researchers created a teacher that
performs three principal actions: 1) generating an output
query given an input sequence; 2) generating a
counterexample that a system outputs as an incorrect result
when analyzing it; 3) augmenting the alphabet by appending
new input symbols to the existing ones. They showed the
effectiveness of their proposed technique by testing three
real protocols: the Needham-Schroeder-Lowe (N-S-L) mutual
authentication protocol, the TMN key exchange protocol, and
the SSL 3.0 handshake protocol. Their system identified the
introduced flaws in N-S-L and TMN. It also confirmed that
SSL is secure.
2.5 Breaking Human Interaction Proofs (CAPTCHAs)
Chellapilla and Simard [16] discuss how Human
Interaction Proofs (HIPs, or CAPTCHAs) can be broken by
utilizing machine learning. The researchers experimented
with seven different HIPs and learned their common
strengths and weaknesses. The proposed approach is aimed
at locating the characters (segmentation step) and
employing a neural network [17] for character recognition.
Six experiments were conducted with EZ-Gimpy/Yahoo,
Yahoo v2, mailblocks, register, ticketmaster, and Google
HIPs. Each experiment was split into two parts: (a)
recognition (1,600 HIPs for training, 200 for validation,
and 200 for testing) and (b) segmentation (500 HIPs for
testing segmentation). For segmentation, different
computer vision techniques were applied, such as converting
to grayscale, thresholding to black and white, dilating and
eroding, and selecting large connected components (CCs) with
sizes close to HIP character sizes. Figure 5 demonstrates
some of those algorithms in operation.
Figure 5: Examples of segmentation [16]
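Two of the image-processing steps named above, thresholding to black and white and selecting connected components of character-like size, can be sketched in pure Python. The tiny grayscale grid, the cutoff value, the use of 4-connectivity, and the size limits are assumptions for illustration; they are not parameters from [16].

```python
from collections import deque

def threshold(gray, cutoff=128):
    """Binarize a grayscale image (list of rows): 1 = ink (dark), 0 = background."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]

def connected_components(binary):
    """Return a list of components, each a list of (row, col) ink pixels (4-connectivity)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and not seen[r][c]:
                queue, component = deque([(r, c)]), []
                seen[r][c] = True
                while queue:  # breadth-first flood fill from the seed pixel
                    y, x = queue.popleft()
                    component.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(component)
    return components

def character_candidates(components, min_size, max_size):
    """Keep components whose pixel count is plausible for a character."""
    return [comp for comp in components if min_size <= len(comp) <= max_size]

# Toy 5x7 image: a 2x2 blob, a 3-pixel vertical stroke, and one isolated noise pixel.
gray = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  10,  10, 255, 255,  20, 255],
    [255,  10,  10, 255, 255,  20, 255],
    [255, 255, 255, 255, 255,  20, 255],
    [ 40, 255, 255, 255, 255, 255, 255],
]
components = connected_components(threshold(gray))
candidates = character_candidates(components, min_size=3, max_size=10)
```

Each surviving component would then be cropped and passed to the character recognizer; the isolated noise pixel is discarded by the size filter.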
The following Table 8 compiles the experimentation
results:
Table 8: Success Rates for Segmentation and Recognition steps
Columns: Success rate for segmentation; Success rate for recognition given correct segmentation
It was reported that the segmentation stage is relatively
difficult for the following reasons: (a) computationally
expensive; (b) complex segmentation function because of
an immense non-valid pattern space; and (c) difficulty in
identification of valid characters.
2.6 Cryptography
Yu and Cao [18] developed a fast and efficient
cryptographic system based on delayed chaotic Hopfield
neural networks. The researchers claim that the proposed
system is secure due to "the difficult synchronization of
chaotic neural networks with time varying delay".
Kinzel and Kanter [20] show how two synchronized
neural networks can be used for a secret key exchange over
a public channel. In the training stage, the two
neural networks start with random weight vectors and
receive an arbitrary identical input sequence every cycle.
The weights are changed only if the outputs of both neural
networks are the same. After a short period of time, the
corresponding weight vectors of both neural networks
become identical. The researchers have demonstrated that
it is computationally infeasible to perform some attacks.
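The synchronization procedure can be illustrated with a small simulation. The network architecture is not spelled out in the text above; the sketch assumes the tree parity machine formulation commonly used in neural cryptography (K hidden units with N inputs each, integer weights bounded by L, Hebbian updates). The parameter values and the seed are chosen only to keep the example small and are far too small for real use.

```python
import random

K, N, L = 3, 4, 3  # hidden units, inputs per unit, weight bound (illustrative values)

def sign(value):
    return 1 if value > 0 else -1

def outputs(weights, inputs):
    """Return (hidden unit outputs, overall output) of one tree parity machine."""
    sigmas = [sign(sum(w * x for w, x in zip(weights[k], inputs[k]))) for k in range(K)]
    tau = 1
    for s in sigmas:
        tau *= s
    return sigmas, tau

def hebbian_update(weights, inputs, sigmas, tau):
    """Update only hidden units that agree with the overall output; clip to [-L, L]."""
    for k in range(K):
        if sigmas[k] == tau:
            weights[k] = [max(-L, min(L, w + tau * x)) for w, x in zip(weights[k], inputs[k])]

def synchronize(seed=0, max_steps=200_000):
    """Run both parties on shared public inputs until their weights coincide."""
    rng = random.Random(seed)

    def new_weights():
        return [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]

    alice, bob = new_weights(), new_weights()
    for step in range(max_steps):
        if alice == bob:
            return alice, bob, step
        inputs = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        sig_a, tau_a = outputs(alice, inputs)
        sig_b, tau_b = outputs(bob, inputs)
        if tau_a == tau_b:  # weights change only when the public outputs agree
            hebbian_update(alice, inputs, sig_a, tau_a)
            hebbian_update(bob, inputs, sig_b, tau_b)
    return alice, bob, max_steps

alice, bob, steps = synchronize(seed=1)
```

Only the inputs and the two output bits are exchanged publicly; the identical weight matrices reached at the end would serve as the shared secret key material.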
2.7 Social Network Spam Detection
K. Lee et al. [7] observed that spammers exploit social
systems to mount phishing attacks, disseminate
malware, and promote affiliate websites. To protect
social systems against those attacks, a social honeypot was
developed for detecting spammers in social networks like
Twitter and Facebook. The proposed solution is based on a
Support Vector Machine (SVM) and has a high precision
as well as a low false positive rate. A social honeypot
represents a legitimate user profile and a corresponding
bot, which gathers both legitimate and spam profiles and
feeds them into the SVM classifier. For evaluating the
performance of the proposed machine learning system, the
researchers examined the MySpace and Twitter networks.
Various legitimate user accounts were created in both
social networks and data were collected over several
months. Deceptive spam profiles, such as click traps, friend
infiltrators, duplicate spammers, promoters, and phishers,
were manually singled out into several groups. The SVM
was fed with the data (for MySpace: 388 legitimate
profiles and 627 deceptive spam profiles; for Twitter: 104
legitimate profiles, 61 spammers' and 107 promoters'
profiles). Results demonstrate spam precision to be 70% for
MySpace and 82% for Twitter.
2.8 Smart Meter Data Profiling
In our recent work, we have applied fuzzy c-means
clustering for smart meter data profiling [24]. Our research
demonstrates that by having access to energy consumption
traces captured by smart meters, one can implement a
disaggregation technique for deducing consumer energy
consumption profiles, which can compromise the privacy of
consumers and has the potential to be used in undesirable
ways. The time frame between when the customer leaves and
returns home offers opportunities for home invasion,
marketing by phone, or even profiling of children's behavior.
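As a rough illustration of fuzzy c-means clustering on meter readings, the sketch below clusters one-dimensional half-hourly consumption values into "low" and "high" usage levels. It is a textbook formulation with fuzzifier m = 2 and a deterministic initialisation; the readings are made up, and the code is not the implementation or the data used in [24].

```python
def memberships_for(values, centres, m):
    """Membership of each value in each cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        # A small floor avoids division by zero when a value sits on a centre.
        dists = [max(abs(x - c), 1e-12) for c in centres]
        rows.append([1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists])
    return rows

def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, iterations=50):
    """Textbook fuzzy c-means for scalar data; requires n_clusters >= 2."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: spread the centres evenly over the data range.
    centres = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    for _ in range(iterations):
        u = memberships_for(values, centres, m)
        centres = [
            sum((row[i] ** m) * x for row, x in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(n_clusters)
        ]
    return centres, memberships_for(values, centres, m)

# Made-up half-hourly readings: four night-time values and four evening values.
readings = [0.05, 0.06, 0.04, 0.05, 0.50, 0.55, 0.52, 0.53]
centres, memberships = fuzzy_c_means_1d(readings)
```

Unlike hard clustering, each reading receives a degree of membership in every cluster, which is convenient when consumption levels overlap between activity states.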
For instance, our analysis of a three-day data sequence
for a smart meter (Figure 6) reveals a certain pattern of
energy consumption behavior. Here the X axis denotes the
date/time of the measurement, and the Y axis denotes the
energy consumption value in kWh. From these observations, it
can be inferred that the consumer is a service-providing
business (like a store or eatery) rather than a household, as
the energy consumption is consistently at its peak between
8:30 A.M. and 10:00 P.M. (Figure 7). It can further be
inferred that the business uses certain types of appliances
consuming 0.55 kWh every half
an hour during the nighttime period. It is likely that these
appliances are security and/or fire detection devices with
small, periodic, and persistent energy consumption.
Figure 8 represents another pattern observed for a single
customer randomly chosen from the dataset. It shows that
the value of the consumed energy varies from 0 to 0.1
kWh between 1 A.M. and 8 A.M. Therefore, it can be
inferred that during this time period, the customer does not
usually use any appliances. This may be because: 1) if it is
a residential home, the customer sleeps at that time; 2) if it
is a business, it is not active during that period. Taking
into consideration the fact that the customer usually
consumes from 0.358 to 0.548 kWh between 8 P.M. and 12
A.M., it can be deduced that we are looking at a typical
working household (where people sleep at night, go to
work all day, and come back to have dinner, watch TV, and
then go to bed again).
Also, by having access to detailed energy consumption
data, one can infer information about appliance usage, and
spammers can exploit such information for their own
benefit. On the other hand, utility corporations can make
use of such knowledge to detect abrupt changes in
consumer usage patterns, which can be used to detect
energy fraud, an important issue in the smart grid.
Figure 6: Energy Consumption Profile for
One Smart Meter for Three Consecutive days [24]
Figure 7: Mean Energy Consumption per Half an Hour [24]
Figure 8: Energy Consumption Profile for
Single Customer [24]
2.9 Security of Machine Learning
M. Barreno et al. [21] discuss many diverse ways of
compromising machine learning systems. The researchers
provide a comprehensive taxonomy of different attacks
aimed at exploiting machine learning systems:
(a) causative attacks, which alter the training process;
(b) attacks on integrity and availability, which use
misclassifications (such as false positives) to breach or
disrupt a system; (c) exploratory attacks, which exploit
existing vulnerabilities; (d) targeted attacks, directed at a
certain input; (e) indiscriminate attacks, in which a broad
range of inputs fail.
The researchers proposed the Reject On Negative Impact
(RONI) defense. RONI ignores all the training data points
that have a substantial negative impact on the classification
accuracy.
They discussed two main types of defenses.
The first type is a defense against exploratory attacks, in
which an attacker can create an evaluation distribution that
the learner predicts poorly. To defend against this attack,
the defender can limit access to the training procedure
and data, making it harder for an attacker to apply reverse
engineering. Also, the more complicated a hypothesis
space is, the harder it is for an attacker to infer the learned
hypothesis. In addition, a defender can limit the feedback
given to an attacker (or send deceitful feedback) so that it
becomes harder to break into the system.
The second type is a defense against causative attacks, in
which an attacker can manipulate both training and
evaluation distributions. In this scenario, the defender can
deploy the RONI defense, in which the system has two
classifiers. One classifier is trained using a base training
set; the other is trained with not only the base set but also
the candidate instance. If the errors of those two classifiers
differ significantly from each other, the candidate instance
is treated as a malicious one.
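The RONI check described above can be sketched in a few lines of Python. For brevity, the two classifiers are simple one-dimensional nearest-centroid models rather than a real spam filter, and "significantly differ" is interpreted as an increase in validation error beyond a fixed threshold; the classifier, the threshold, and all data values are assumptions for illustration, not details from [21].

```python
def train_centroids(samples):
    """samples: [(value, label)]. Returns {label: mean value} (a stand-in classifier)."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def error_rate(centroids, validation):
    """Fraction of validation points assigned to the wrong (nearest-centroid) label."""
    wrong = 0
    for value, label in validation:
        predicted = min(centroids, key=lambda lab: abs(value - centroids[lab]))
        wrong += predicted != label
    return wrong / len(validation)

def roni_accepts(candidate, base_set, validation, max_error_increase=0.1):
    """Accept the candidate unless adding it raises validation error too much."""
    base_error = error_rate(train_centroids(base_set), validation)
    new_error = error_rate(train_centroids(base_set + [candidate]), validation)
    return (new_error - base_error) <= max_error_increase

# Made-up one-dimensional "spam score" data.
base = [(1, "ham"), (2, "ham"), (3, "ham"), (8, "spam"), (9, "spam"), (10, "spam")]
validation = [(1.5, "ham"), (2.5, "ham"), (4, "ham"),
              (7, "spam"), (8.5, "spam"), (9.5, "spam")]
benign_ok = roni_accepts((2.2, "ham"), base, validation)  # ordinary training point
poison_ok = roni_accepts((20, "ham"), base, validation)   # mislabeled outlier
```

In this toy run the mislabeled outlier shifts the "ham" centroid enough to misclassify a validation point, so it is rejected, while the ordinary point is accepted.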
As an example of applying the defensive RONI
algorithm, the researchers simulated attacking the
SpamBayes spam detection system [23] and showed the
effectiveness of the system against Indiscriminate
Causative Availability attacks.
3 Conclusion
Machine learning is an effective tool that can be
employed in many areas of information security. There
exist some robust anti-phishing algorithms and network
intrusion detection systems. Machine learning can be
successfully used for developing authentication systems,
evaluating protocol implementations, assessing the
security of human interaction proofs, smart meter data
profiling, etc. Although machine learning facilitates
keeping various systems safe, machine learning
classifiers themselves are vulnerable to malicious attacks.
There has been some work directed at improving the
effectiveness of machine learning algorithms and
protecting them from diverse attacks. There are many
opportunities in information security to apply machine
learning to address various challenges in such a complex
domain. Spam detection, virus detection, and surveillance
camera robbery detection are only some examples.
References
[1] S. Abu-Nimeh, D. Nappa, X. Wang, and S. Nair, "A Comparison of
Machine Learning Techniques for Phishing Detection", APWG
eCrime Researchers Summit, October 4-5, 2007, Pittsburgh, PA.
[2] Anti-Phishing Working Group, "Phishing and Fraud solutions".
[Online]. Available: http://www.antiphishing.org/. [Accessed: April
4, 2013].
[3] M. Wu, R. C. Miller, and S. L. Garfinkel, "Do security toolbars
actually prevent phishing attacks?" in Proceedings of the SIGCHI
conference on Human Factors in computing systems, 2006.
[4] L. F. Cranor, S. Egelman, J. Hong, and Y. Zhang, "Phinding phish:
An evaluation of anti-phishing toolbars", Technical Report
CMU-CyLab-06-018, CMU, November 2006.
[5] M. Chandrasekaran, K. Narayanan, and S. Upadhyaya, "Phishing
email detection based on structural properties", in NYS Cyber
Security Conference, 2006.
[6] W. Zhuang, Y. Ye, Y. Chen, and T. Li, "Ensemble Clustering for
Internet Security Applications", in IEEE xplore, December 17,
2012.
[7] K. Lee, J. Caverlee, and S. Webb, "Uncovering social spammers:
social honeypots + machine learning", SIGIR'10 , July 19-23, 2010,
Geneva, Switzerland.
[8] N. Lu, S. Mabu, T. Wang, and K. Hirasawa, "An Efficient Class
Association Rule-Pruning Method for Unified Intrusion Detection
System using Genetic Algorithm", in IEEJ Transactions on
Electrical and Electronic Engineering, Vol. 8, Issue 2, pp. 164 –
172, January 2, 2013.
[9] T. Subbulakshmi, S. M. Shalinie, and A. Ramamoorthi, "Detection
and Classification of DDoS Attacks using Machine Learning
Algorithms", European Journal of Scientific Research, ISSN 1450-
216X, Volume 47, No. 3, pp. 334 – 346, 2010.
[10] H. Sedjelmaci, and M. Feham, "Novel Hybrid Intrusion Detection
System for Clustered Wireless Sensor Network", International
Journal of Network Security & Its Applications (IJNSA), Vol.3,
No.4, July 2011.
[11] T. H. Hai, E. N. Huh and M. Jo, "A Lightweight Intrusion Detection
Framework for Wireless Sensor Networks", Wireless
Communications and mobile computing, Vol.10, Issue 4,
pp. 559-572, 2010.
[12] K. Revett et al., "A machine learning approach to keystroke
dynamics based user authentication", International Journal of
Electronic Security and Digital Forensics, Vol. 1, No. 1, 2007.
[13] G. Shu and D. Lee, "Testing Security Properties of Protocol
Implementations – a Machine Learning Based Approach", in
Proceedings of 27th International Conference on Distributed
Computing Systems (ICDCS'07), 2007.
[14] D. Dolev and A. Yao, "On the security of public-key protocols",
IEEE Transaction on Information Theory 29, pages 198-208, 1983.
[15] D. Angluin, "Learning regular sets from queries and
counterexamples", Information and Computation, 75, pp. 87-106,
1987.
[16] K. Chellapilla and P. Y. Simard, "Using Machine Learning to Break
Visual Human Interaction Proofs (HIPs)", in Advances in Neural
Information Processing Systems 17, pp. 265-272, 2005.
[17] Simard PY, Steinkraus D, and Platt J, (2003) "Best Practice for
Convolutional Neural Networks Applied to Visual Document
Analysis," in International Conference on Document Analysis and
Recognition(ICDAR), pp. 958-962, IEEE Computer Society, Los
Alamitos.
[18] W. Yu and J. Cao, "Cryptography based on delayed chaotic neural
networks", Physics Letters A, Vol. 356, Issues 4–5, pp. 333-338,
ISSN 0375-9601, August 14, 2006.
[19] J. Yang et al., "Cryptanalysis of a cryptographic scheme based on
delayed chaotic neural networks", Chaos, Solitons & Fractals, Vol.
40, Issue 2, pp. 821-825, ISSN 0960-0779, April 30, 2009.
[20] W. Kinzel and I. Kanter, "Neural Cryptography", in Proceedings of
the 9th International Conference on Neural Information Processing,
Vol. 3, pp. 1351-1354, November 18-22, 2002.
[21] M. Barreno et al., "The security of machine learning", Machine
Learning, Vol. 81, Issue 2, pp. 121-148, November 2010.
[22] Knowledge Discovery and Data Mining group, "KDD cup 1999".
[Online]. Available: http://www.kdd.org/kddcup/index.php.
[Accessed: March 3, 2013].
[23] SpamBayes Project Group, "SpamBayes". [Online]. Available:
http://spambayes.sourceforge.net/. [Accessed: February 15, 2013].
[24] V. Ford and A. Siraj, "Clustering of smart meter data for
disaggregation", in Proceedings of IEEE Global Conference on
Signal and Information Processing, December, 2013.
... Over the past decade ML techniques have been widely used to enable systematic learning and building of enterprise systems' normal profiles to detect anomalies and zero-day threats (Conti et al., 2018). ML includes a large variety of models in continuous evolution, presenting weak boundaries and cross relationships, and has already been successfully applied within various contexts in cybersecurity (Dua and Du, 2011;Ford and Siraj, 2014;Singh and Silakari, 2015;Buczak and Guven, 2016;Fraley and Cannady, 2017;Ghanem et al., 2017;Yadav et al., 2017;Apruzzese et al., 2018). The book by Dua and Du (2011) provides a comprehensive guide to how ML and data mining are incorporated in cybersecurity tools, and in particular, it provides examples of anomaly detection, misuse detection, profiling detection, etc. ...
... The study by Ghanem et al. (2017) develop an intrusion detection system which is enhanced by support vector machines. Among other cybersecurity issues, the study by Ford and Siraj (2014) investigates ML approaches for detecting phishing, intrusions, spam detection, etc. ...
Despite the significant increase in cybersecurity solutions investment, organizations are still plagued by security breaches, especially data breaches. As more organizations experience crippling security breaches, the wave of compromised data is growing significantly. The financial consequences of a data breach are set on the rise, but the cost goes beyond potential fines. Data breaches could have a catastrophic impact not only in loss of company's reputation and stock price, but also in economic terms. Threat Intelligence has been recently introduced to enable greater visibility of cyber threats, in order to better protect organizations' digital assets and prevent data breaches. Threat intelligence is the practice of integrating and analyzing disjointed cyber data to extract evidence-based insights regarding an organization's unique threat landscape. This helps explain who the adversary is, how and why they are comprising the organization's digital assets, what consequences could happen following the attack, what assets actually could be compromised, and how to detect or respond to the threat. Every organization is different and threat intelligence frameworks are custom-tailored to the business process itself and the organization's risks, as there is no "one-size-fits-all" in cyber. In this paper, we review the problem of data breaches and discuss the challenges of implementing threat intelligence that scales in today's complex threat landscape and digital infrastructure. This is followed by an illustration of how the future of effective threat intelligence is closely linked to efficiently applying Artificial Intelligence and Machine Learning approaches, and we conclude by outlining future research directions in this area.
... Çünkü özellikle otomasyonel düzeyde insanların yaptığı işlerin, çeşitli algoritmalarla makinelerce yapılmaya başlanması ve hala yeni yeni sektörlerde makine öğrenimine yönelik adımlar atılmasının iş süreçlerini optimize ettiği gibi toplumda sosyolojik ve hatta bireylerde psikolojik birçok değişimi beraberinde getirmesi beklenmektedir. Hâlihazırda siber fiziksel sistemlerle birlikte makine öğreniminin endüstriyel üretim yapan birçok işyerinde kullanılmasının yanı sıra finansal hizmetler (Ghoddusi vd., 2019), hastalık teşhisi (Rajkomar vd., 2019), siber güvenlik (Dua & Du, 2016), suç tespiti ve tahmini (Ateş vd., 2020), ulaşım hizmetleri (Zantalis vd., 2019) ve görüntü işleme (Wäldchen & Mäder, 2018) gibi birçok alanda kullanım etkinliği giderek artmakta olup, yakın gelecekte ordu yapısını robotların oluşturduğu yeni tip asker tasarımı çalışmaları (Lee vd., 2018;Modanval vd., 2021) (Ford & Siraj, 2014;Talpur & O'Sullivan, 2020;Ch vd., 2020). Bu kapsamda makine öğrenimi üzerinde kurulan modellemelerin siber suç ve güvenlik alanında yadsınamaz bir öneme haiz olduğu ve giderek dijitalleşmeye başlayan dünyamızda artan veri miktarının başka türlü efektif bir analize imkân vermemesinden kaynaklı yapay zekâ temelinde makine öğreniminin siber güvenlikteki öneminin daha da artacağı değerlendirilmektedir. ...
... Security practitioners have implemented protective measures against malicious URLs in the form of blacklisting and heuristic techniques using reputation systems [2]. ML has been recognised as a technology that can facilitate the scalability of security solutions, as well as having the potential to adaptively detect novel attacks [15]. URLs are either benign or malicious and are labelled as such in the datasets that exist in literature. ...
Web addresses, or Uniform Resource Locators (URLs), represent a vector by which attackers are able to deliver a multitude of unwanted and potentially harmful effects to users through malicious software. The ability to detect and block access to such URLs has traditionally been enabled through reactive and labour intensive means such as human verification and whitelists and blacklists. Machine Learning has shown great potential to automate this defence and position it as proactive through the implementation of classifier models. Work in this area has produced numerous high-accuracy models, though the algorithms themselves remain fragile to adversarial manipulation if implemented without consideration being given to their security. Our work aims to investigate the robustness of several classifiers for malicious URL detection by randomly perturbing samples in the training data. It is shown that without a measure of defence to adversarial influence, highly accurate malicious URL detection can be significantly and adversely affected at even low degrees of training data perturbation.
... Adversaries are using cyberattacks such as cross-site scripting, cross-site request forgery, session hijacking and remote access trojan attacks to commit cybercrimes such as modification of software, manipulation of online services, manipulation of electronic products, diversion of e-products and exploitation of other security misconfigurations. Ford and Siraj (2015) highlighted different issues in the applications of machine learning in cybersecurity, including phishing detection, network intrusion detection, testing security properties of protocols and smart meter energy consumption profiling [2]. ...
Predicting cyberattacks using machine learning has become imperative since cyberattacks have increased exponentially due to the stealthy and sophisticated nature of adversaries. To have situational awareness and achieve defence in depth, using machine learning for threat prediction has become a prerequisite for cyber threat intelligence gathering. Some approaches to mitigating malware attacks include the use of spam filters, firewalls, and IDS/IPS configurations to detect attacks. However, threat actors are deploying adversarial machine learning techniques to exploit vulnerabilities. This paper explores the viability of using machine learning methods to predict malware attacks and build a classifier to automatically detect and label an event as "Has Detection" or "No Detection". The purpose is to predict the probability of malware penetration and the extent of manipulation on the network nodes for cyber threat intelligence. To demonstrate the applicability of our work, we use a decision tree (DT) algorithm to learn from the dataset for evaluation. The dataset was taken from the Microsoft Malware threat prediction page on Kaggle. We identify probable cyberattacks on the smart grid and use attack scenarios to determine penetrations and manipulations. The results show that ML methods can be applied in a smart grid cyber supply chain environment to detect cyberattacks and predict future trends.
... Authors in [26] analyzed the applications of widely used machine learning techniques to protect the cyberspace from cybercriminals. The authors also depicted various obstacles faced during the implementation of machine learning techniques. ...
The present-day world has become entirely dependent on cyberspace for every aspect of daily living. The use of cyberspace is rising with each passing day, and the world is spending more time on the Internet than ever before. As a result, the risks of cyber threats and cybercrimes are increasing. The term 'cyber threat' refers to illegal activity performed using the Internet. Cybercriminals change their techniques over time to get through the wall of protection. Conventional techniques are not capable of detecting zero-day attacks and sophisticated attacks. Thus far, many machine learning techniques have been developed to detect cybercrimes and battle cyber threats. The objective of this research work is to present an evaluation of some of the widely used machine learning techniques for detecting some of the most serious threats to cyberspace. Three primary machine learning techniques are investigated: deep belief network, decision tree and support vector machine. We present a brief exploration to gauge the performance of these machine learning techniques in spam detection, intrusion detection and malware detection based on frequently used benchmark datasets.
- G D Asyaev
The paper presents an approach that increases the size of the training sample and reduces class imbalance for traffic classification problems. The basic principles and architecture of generative adversarial networks are considered. The mathematical model of network traffic classification is described. The training sample used to solve the problem is analyzed. Data preprocessing is carried out and justified. A generative adversarial network architecture is constructed and an algorithm for generating new features is developed. Machine learning models for the traffic classification problem are considered and built: logistic regression, k-nearest neighbors, decision tree, and random forest. A comparative analysis of the results of the machine learning models with and without the generation of new features is conducted. The results can be applied both to network traffic classification tasks and to general cases of multiclass classification and the elimination of unbalanced features.
The adoption of the Internet of Things (IoT) has raised significant concern about cyber-attacks at the edge of the network. As existing traditional intrusion detection system (IDS) solutions cannot be applied to the IoT, lightweight IDS schemes are essential to address the security challenges in severely resource-constrained and heterogeneous IoT systems. Three requirements are critical for designing and implementing such schemes successfully: handling distribution (scalability), managing resource constraints, and designing accurate and robust algorithms. Large-scale IoT traffic is so massively distributed in nature that centralized IDS architectures such as the cloud do not scale and suffer from delays too high for the real-time requirements of the IoT. In this regard, the emergence of fog computing provides a tremendous opportunity to detect suspicious events closer to things in a distributed manner, and makes it possible to offload processing, storage and communication overheads from the IoT for intrusion monitoring operations. Apart from employing fog nodes as a collaborative spot for intrusion detection, it is essential to adopt recent algorithms that provide lightweight, robust and autonomous operation for intrusion detection at the fog level, since existing intrusion detection systems fail to meet these requirements. Here, deep learning (DL) approaches have been found to be promising in securing the IoT by providing compressed data representations and fast processing. Thus, the proliferation of fog nodes coupled with DL techniques could provide lightweight, autonomous and efficient schemes with an improved level of robustness.
- Michael S. Gibson
Most interactions or relationships among objects or entities can be modelled as graphs. Some classes of entity relationships have their own name due to their popularity; social graphs look at people's relationships, computer networks show how computers (devices) communicate with each other and molecules represent the chemical bonds between atoms. Some graphs can also be dynamic in the sense that, over time, relationships change. Since the entities can, to a certain extent, manage their relationships, we say any changes in relationships reflect a change in entity behaviour. By comparing the relationships of an entity at different points in time, we can say there has been a change in behaviour. In this paper, we attempt to detect malicious devices in a network by showing a significant change in behaviour through analysing traffic data.
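One simple way to make "comparing the relationships of an entity at different points in time" concrete is to compare each device's neighbour set in two traffic snapshots. The sketch below uses Jaccard distance for this; the metric, the function names and the toy edge lists are assumptions chosen for illustration and are not claimed to be the measure used in the paper.

```python
def neighbour_sets(edges):
    """Map each device to the set of devices it communicated with."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

def behaviour_change(edges_t0, edges_t1):
    """Return, per device, the Jaccard distance between its neighbour sets in two snapshots."""
    n0, n1 = neighbour_sets(edges_t0), neighbour_sets(edges_t1)
    scores = {}
    for dev in set(n0) | set(n1):
        a, b = n0.get(dev, set()), n1.get(dev, set())
        union = a | b
        # 0.0 means identical neighbours, 1.0 means no neighbour in common.
        scores[dev] = (1.0 - len(a & b) / len(union)) if union else 0.0
    return scores

# Hypothetical snapshots: pc3 stops talking to the server and starts probing its peers.
t0 = [("pc1", "server"), ("pc2", "server"), ("pc3", "server")]
t1 = [("pc1", "server"), ("pc2", "server"), ("pc3", "pc1"), ("pc3", "pc2")]
scores = behaviour_change(t0, t1)
for dev in sorted(scores, key=scores.get, reverse=True):
    print(dev, round(scores[dev], 2))
```

A device whose score exceeds a chosen threshold would then be flagged for inspection; picking that threshold is a separate problem that this sketch does not address.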
Machine learning is increasingly linked to data privacy and has developed rapidly in recent years. For data privacy, searchable encryption (SE) is widely used as a ciphertext search technology that protects the privacy of users. However, existing schemes usually support only single-keyword search, so the remote cloud server (CS) may return some irrelevant results. To address this problem, we propose a server-aided public key encryption with multi-keyword search (SA-PEMKS) scheme. The scheme allows a data user to search for multiple keywords in a single search query. Security and performance analysis show that the SA-PEMKS scheme is secure and efficient.
Pervasive growth and usage of the Internet and mobile applications have expanded cyberspace. Cyberspace has become more vulnerable to automated and prolonged cyberattacks. Cyber security techniques provide enhancements in security measures to detect and react to cyberattacks. Previously used security systems are no longer sufficient because cybercriminals are smart enough to evade them, and they are inefficient at detecting previously unseen and polymorphic attacks. Machine learning (ML) techniques are playing a vital role in numerous applications of cyber security. However, despite the ongoing success, there are significant challenges in ensuring the trustworthiness of ML systems. Incentivized malicious adversaries in cyberspace are willing to game and exploit such ML vulnerabilities. This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by presenting a literature review of ML techniques for cyber security, including intrusion detection, spam detection, and malware detection on computer networks and mobile networks, over the last decade. It also provides brief descriptions of each ML method, frequently used security datasets, essential ML tools, and metrics for evaluating a classification model. It finally discusses the challenges of using ML techniques in cyber security. This paper provides the latest extensive bibliography and the current trends of ML in cyber security.
Distributed Denial of Service (DDoS) attacks fall into the category of critical attacks that compromise the availability of network resources, and detecting these attacks is a challenging task. The objective of this paper is to develop an alert classification system using two machine learning algorithms, artificial neural networks and support vector machines. An experimental testbed with 20 nodes and a server is used to generate, detect and classify these attacks along with normal traffic. The system is tested with real traffic, and its classification accuracy is found to be greater than that of rule-based and threshold methods.
- Ambareen Siraj
This research addresses privacy concerns in smart meter data. Smart meter data is analyzed for learning normal consumer usage of electricity. Clustering technique such as Fuzzy C-Means is used to disaggregate and learn energy consumption patterns in smart meter data. Results of experimentation with real world meter data demonstrate that it is realistically possible to profile the electricity consumption behavior of consumers analyzing their usage captured by smart meters.
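To give a feel for how Fuzzy C-Means can separate consumption levels, the following is a minimal one-dimensional sketch of the standard algorithm in plain Python. The readings, the initial centres and the fuzzifier value `m = 2.0` are hypothetical, and the abstract does not say which features or settings the authors actually used.

```python
def memberships(values, centers, m):
    """Standard FCM membership: u[i][k] is proportional to d(i, k) ** (-2 / (m - 1))."""
    u = []
    for x in values:
        # Clamp distances so a point sitting exactly on a centre does not divide by zero.
        d = [max(abs(x - c), 1e-12) for c in centers]
        w = [dk ** (-2.0 / (m - 1.0)) for dk in d]
        total = sum(w)
        u.append([wk / total for wk in w])
    return u

def fuzzy_c_means(values, init_centers, m=2.0, iters=50):
    """Run a fixed number of FCM iterations on 1-D data; return (centers, memberships)."""
    centers = list(init_centers)
    for _ in range(iters):
        u = memberships(values, centers, m)
        # Each centre becomes the mean of the data weighted by membership ** m.
        centers = [
            sum((u[i][k] ** m) * x for i, x in enumerate(values))
            / sum(u[i][k] ** m for i in range(len(values)))
            for k in range(len(centers))
        ]
    return centers, memberships(values, centers, m)

# Hypothetical hourly readings in kWh: a low "away" level and a high "at home" level.
readings = [0.2, 0.3, 0.25, 0.3, 2.1, 2.4, 2.2, 2.0]
centers, u = fuzzy_c_means(readings, init_centers=[0.0, 3.0])
print([round(c, 2) for c in centers])
```

Unlike hard clustering, each reading keeps a degree of membership in every cluster, which is convenient when appliance loads overlap.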
- Kumar Chellapilla
Machine learning is often used to automatically solve human tasks. In this paper, we look for tasks where machine learning algorithms are not as good as humans with the hope of gaining insight into their current limitations. We studied various Human Interactive Proofs (HIPs) on the market, because they are systems designed to tell computers and humans apart by posing challenges presumably too hard for computers. We found that most HIPs are pure recognition tasks which can easily be broken using machine learning. The harder HIPs use a combination of segmentation and recognition tasks. From this observation, we found that building segmentation tasks is the most effective way to confuse machine learning algorithms. This has enabled us to build effective HIPs (which we deployed in MSN Passport), as well as design challenging segmentation tasks for machine learning algorithms.
- Min Wu
- Robert C. Miller
Security toolbars in a web browser show security-related information about a website to help users detect phishing attacks. Because the toolbars are designed for humans to use, they should be evaluated for usability - that is, whether these toolbars really prevent users from being tricked into providing personal information. We conducted two user studies of three security toolbars and other browser security indicators and found them all ineffective at preventing phishing attacks. Even though subjects were asked to pay attention to the toolbar, many failed to look at it; others disregarded or explained away the toolbars' warnings if the content of web pages looked legitimate. We found that many subjects do not understand phishing attacks or realize how sophisticated such attacks can be.
- Weiwei Zhuang
- Yanfang Ye
- Yong Chen
- Tao Li
Because of the damage they do to Internet security, malware and phishing website detection have been Internet security topics of great interest. Compared with malware attacks, phishing website fraud is a relatively new Internet crime. However, they share some common properties: 1) both malware samples and phishing websites are created at a rate of thousands per day, driven by economic benefits; and 2) phishing websites represented by the term frequencies of the webpage content share similar characteristics with malware samples represented by the instruction frequencies of the program. Over the past few years, many clustering techniques have been employed for automatic malware and phishing website detection. In these techniques, the detection process is generally divided into two steps: 1) feature extraction, where representative features are extracted to capture the characteristics of the file samples or the websites; and 2) categorization, where intelligent techniques are used to automatically group the file samples or websites into different classes based on computational analysis of the feature representations. However, few have been applied in real industry products. In this paper, we develop an automatic categorization system that groups phishing websites or malware samples using a cluster ensemble, aggregating the clustering solutions generated by different base clustering algorithms. We propose a principled cluster ensemble framework that combines individual clustering solutions based on the consensus partition, and that can be applied not only to malware categorization but also to phishing website clustering. In addition, domain knowledge in the form of sample-level/website-level constraints can be naturally incorporated into the ensemble framework.
The case studies on large and real daily phishing websites and malware collection from the Kingsoft Internet Security Laboratory demonstrate the effectiveness and efficiency of our proposed method.
Genetic network programming (GNP)-based class association rule mining has been demonstrated to be efficient for misuse and anomaly detection. However, misuse detection is weak at detecting brand new attacks, while anomaly detection suffers from a high false positive rate. In this paper, a unified detection method is proposed that integrates misuse detection and anomaly detection to overcome their disadvantages. In addition, the GNP-based class association rule mining method extracts an overwhelming number of rules, which contain much redundant and irrelevant information. Therefore, in this paper, an efficient class association rule-pruning method is proposed based on matching degree and a genetic algorithm (GA). In the first stage, a matching degree-based method is applied to preprune the rules in order to improve the efficiency of the GA. In the second stage, the GA is used to select the effective rules among those remaining after the first stage. Simulations on KDDCup99 show the high performance of the proposed method. © 2012 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
- Marco A. Barreno
Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a broadly useful tool set for building applications. Second, the need to protect systems against malicious adversaries continues to increase across computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions. When a learning algorithm succeeds in adversarial conditions, it is an algorithm for secure learning. The crucial task is to evaluate the resilience of learning systems and determine whether they satisfy requirements for secure learning. In this thesis, we show that the space of attacks against machine learning has a structure that we can use to build secure learning systems. This thesis makes three high-level contributions. First, we develop a framework for analyzing attacks against machine learning systems. We present a taxonomy that describes the space of attacks against learning systems, and we model such attacks as a cost-sensitive game between the attacker and the defender. We survey attacks in the literature and describe them in terms of our taxonomy. Second, we develop two concrete attacks against a popular machine learning spam filter and present experimental results confirming their effectiveness. These attacks demonstrate that real systems using machine learning are vulnerable to compromise. Third, we explore defenses against attacks with both a high-level discussion of defenses within our taxonomy and a multi-level defense against attacks in the domain of virus detection. Using both global and local information, our virus defense successfully captures many viruses designed to evade detection. Our framework, exploration of attacks, and discussion of defenses provides a strong foundation for constructing secure learning systems.
In this Letter, a novel approach of encryption based on chaotic Hopfield neural networks with time varying delay is proposed. We use the chaotic neural network to generate binary sequences which will be used for masking plaintext. The plaintext is masked by switching of chaotic neural network maps and permutation of generated binary sequences. Simulation results were given to show the feasibility and effectiveness in the proposed scheme of this Letter. As a result, chaotic cryptography becomes more practical in the secure transmission of large multi-media files over public data communication network.
Recently, W. Yu and J. Cao [Phys. Lett. A 356, No. 4-5, 333–338 (2006; Zbl 1160.81356)] presented a new cryptographic scheme based on delayed chaotic neural networks. In this letter, a fundamental flaw in Yu's scheme is described. By means of chosen plaintext attack, the secret keystream used can easily be obtained. Editorial remark: There are doubts about a proper peer-reviewing procedure of this journal. The editor-in-chief has retired, but, according to a statement of the publisher, articles accepted under his guidance are published without additional control.
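The general principle behind a chosen plaintext attack on a masking cipher can be shown with a toy example. The sketch below is not the delayed chaotic neural network scheme and not the specific attack from the comment above; it swaps in a logistic map as a stand-in keystream generator, and `SECRET_X0`, `keystream` and `mask` are hypothetical names. It only illustrates that when the keystream does not depend on the plaintext and is reused, encrypting an all-zero message reveals it.

```python
def keystream(x0, n, r=3.99):
    """Generate n bytes from a logistic map (a stand-in chaotic generator, assumed here)."""
    x, out = x0, []
    for _ in range(n):
        byte = 0
        for _ in range(8):
            x = r * x * (1.0 - x)
            byte = (byte << 1) | (1 if x > 0.5 else 0)
        out.append(byte)
    return bytes(out)

def mask(data, key_x0):
    """XOR the data with the keystream; the same call both encrypts and decrypts."""
    ks = keystream(key_x0, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

SECRET_X0 = 0.3141592  # hypothetical secret key, known only to the legitimate parties
victim_ct = mask(b"transfer 100 USD", SECRET_X0)

# The attacker submits a chosen all-zero plaintext to the encryption oracle.
# Since 0 XOR k = k, the returned ciphertext is the keystream itself.
chosen = bytes(len(victim_ct))
recovered_ks = mask(chosen, SECRET_X0)  # oracle call; the attacker never sees SECRET_X0
recovered_pt = bytes(c ^ k for c, k in zip(victim_ct, recovered_ks))
print(recovered_pt)
```

Designs that avoid this weakness typically make the keystream depend on a fresh nonce per message, so that a keystream learned from one message is useless for another.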
Posted by: carmelmathene06815.blogspot.com
Source: https://www.researchgate.net/publication/283083699_Applications_of_Machine_Learning_in_Cyber_Security