Современные
информационные технологии/ 4.Информационная безопасность
Bilas O., Smetyukh L.
Lviv Polytechnic National University, Ukraine
Context-free grammar for identifying dangerous
hyperlinks
Introduction
Theoretical foundations
Users are working
with a large volume of electronic information received from various sources -
external and internal. This intruders organized ties with members of the
organization through the Internet remaining completely anonymous. Reports about
network attacks, based on e-mail, pop-up windows and etc., appeared regularly. Considering
attention to the threats associated with email, should note the spam that got
via corporate and private email system. Obviously, one of the most common ways
of stealing information is the user's transition with a hyperlink. Therefore
there is a necessity to verify the hyperlinks on the subject of threats [2].
One of the
Chomsky hierarchy of grammars is proposed to apply for implementing this
verification. The Chomsky hierarchy is a containment hierarchy of classes of
formal grammars. The Chomsky hierarchy, in essence, allows the possibility for
the understanding and use of a computer science model which enables a
programmer to accomplish meaningful linguistic goals systematically [3].
Described by Chomsky
four types of grammars come from the base, unlimited grammar (grammar type 0),
which consistently impose restrictions on production rules.
Depending on the
type of simple grammar that can generates a given formal language, formal
languages are divided into categories corresponding from the type
0 to type 3. Further formal grammar is denoted G = (N, Σ, P, S). N is the
set nonterminal symbols, Σ is a set of terminal symbols, P - a set of
inference rules and S - the initial
symbol.
Tables 1 is
providing an overview of four types of formal grammars, rules of inference,
formal languages and automaton that are able to recognize them[4].
Table 1. Grammar's classification
|
Grammar |
Languages |
Automaton |
Production rules |
|
Type-0 |
Recursively enumerable |
Turing machine |
α → β α
∈ V* N V*, β ∈ V* |
|
Type-1 |
Context-sensitive |
Linear-bounded non-deterministic Turing machine |
α A β
→ α'γ'β A ∈ N, α, β ∈ V*, γ∈ V+ |
|
Type-2 |
Context-free |
Non-deterministic pushdown automaton |
A → γ A ∈ N, γ ∈ V* |
|
Type-3 |
Regular |
Finite state automaton |
S → ε A → aB or A → Ba |
Sets symbols:
α ,β,
γ - strings; α ,β – may be empty, γ – must be
nonempty
·
Σ:f
inite set of terminal symbols
·
N:
finite set of nonterminal symbols
·
V=Σ
∪ N:grammar vocabulary (finite set of
terminal &
nonterminal symbols)
Оperations:
·
C
= complement
·
K
= concatenation
of formal languages
·
S
= intersection
·
V
= union
·
*=
Kleene star
This example applies to a context-free grammar
(grammar type 2) as fully satisfying the requirements of the task. For
context-free grammars defined different normal forms. For Chomsky normal form
is cut the right side of inference rules, i.e., the right side may consist of
one terminal symbol or two nonterminal. If the left side has the original
symbol, right side may lead to an empty word.
In this context-free grammar rule for production:
А@a,
А Î Σ,
a Î
N
N - set of nonterminal symbols {words, letters, numbers, spaces,
punctuation marks}
Σ - the set of terminal symbols {a, b, c, ..., 0, 1, 2, ...,,,,.
,%,?,! ...}
P- set of inference rules
S – initial symbol
{ 0@digit,
1@digit,
2@digit,
3@digit,
4@digit,
…,
a@ letter,
b@ letter,
c@ letter,
d@ letter,
e@ letter,
…,
@Space
?@ punctuation,
%@ punctuation,
₴@ punctuation,
~@ punctuation,
…}
These rules can be effectively used for the analysis
of hyperlinks for threats or other selfish actions. For example, if an attacker replaces in the hyperlink symbol O (the
letter) to 0 (digit), and there isn't any deviation between, it is easy for a
rule to identify the difference.
There is a necessity for analyzing hyperlinks due to
the fact that along with the obvious success of the international community in
the field of information clearly observed increase number of criminal acts
committed by high information technologies, improved traditional types of fraud
in view of what is necessary to implement research in order to improve
anti-fraud policies online.
Use of a context-free grammar for hyperlinks checking
could be considered as practical solving for such problems.
[1] Сабадаш
В.П. Фішинг як найбільш поширений вид інтернет-шахрайства / Університські
наукові записки, 2006 №1(17), ст. 228-233.
[2] Білас О.Є., Герун Л.М. Методи та загрози соціальної інженерії для
безпеки конфіденційних даних// Матеріали III-ї
Всеукраїнської науково-практичної конференції “Сучасні
інформаційні технології в економіці, менеджменті та освіті”.
-
Львів, 2012. – С. 159-163.
[3] Chomsky hierarchy [electronic resource] / Access: http://en.wikipedia.org/wiki/Chomsky_hierarchy
[4] Chomsky
N., Schützenberger M.P. The algebraic
theory of context free languages, Computer Programming and Formal Languages /
P. Braffort, D. Hirschberg. — Amsterdam: 1963. — С. 118–161.