Современные информационные технологии/ 4.Информационная безопасность

 

Bilas O., Smetyukh L.

Lviv Polytechnic National University,  Ukraine

Context-free grammar for identifying dangerous hyperlinks

 

Introduction

The information in the scientific and technological progress becomes the object of specific social relations, which are arising from its inception in the accumulation, storage, handling and use. Formation, storage and elaboration of information resources is under the control  of various human activities through the use of modern information technology. Using of modern information technology has the potential for using in computer technology with a mercenary motive that promotes threat in informatization [1].

 

Theoretical foundations

Users are working with a large volume of electronic information received from various sources - external and internal. This intruders organized ties with members of the organization through the Internet remaining completely anonymous. Reports about network attacks, based on e-mail, pop-up windows and etc., appeared regularly. Considering attention to the threats associated with email, should note the spam that got via corporate and private email system. Obviously, one of the most common ways of stealing information is the user's transition with a hyperlink. Therefore there is a necessity to verify the hyperlinks on the subject of threats [2].

One of the Chomsky hierarchy of grammars is proposed to apply for implementing this verification. The Chomsky hierarchy is a containment hierarchy of classes of formal grammars. The Chomsky hierarchy, in essence, allows the possibility for the understanding and use of a computer science model which enables a programmer to accomplish meaningful linguistic goals systematically [3].

Described by Chomsky four types of grammars come from the base, unlimited grammar (grammar type 0), which consistently impose restrictions on production rules.

Depending on the type of simple grammar that can generates a given formal language, formal languages ​​are divided into categories corresponding from the type 0 to type 3. Further formal grammar is denoted G = (N, Σ, P, S). N is the set nonterminal symbols, Σ is a set of terminal symbols, P - a set of inference rules and S -  the initial symbol.

Tables 1 is providing an overview of four types of formal grammars, rules of inference, formal languages ​​and automaton that are able to recognize them[4].

 

Table 1. Grammar's classification

Grammar

Languages

Automaton

Production rules

Type-0

Recursively enumerable

Turing machine

α → β

α V* N V*, β V*

Type-1

Context-sensitive

Linear-bounded non-deterministic Turing machine

α A β → α'γ'β

A N, α, β V*, γ V+

Type-2

Context-free

Non-deterministic pushdown automaton

A → γ

A N, γ V*

Type-3

Regular

Finite state automaton

S → ε

A → aB  or

A → Ba

 

Sets symbols:

α ,β, γ -  strings; α ,β – may be empty, γ – must be nonempty

·       Σ:f inite set of terminal symbols

·       N: finite set of nonterminal symbols

·       V=Σ N:grammar vocabulary (finite set of terminal & nonterminal symbols)

Оperations:

·       C = complement

·       K = concatenation of formal languages

·       S = intersection

·       V = union

·       *=  Kleene star

This example applies to a context-free grammar (grammar type 2) as fully satisfying the requirements of the task. For context-free grammars defined different normal forms. For Chomsky normal form is cut the right side of inference rules, i.e., the right side may consist of one terminal symbol or two nonterminal. If the left side has the original symbol, right side may lead to an empty word.

In this context-free grammar rule for production:

А@a,

А Î Σ,

a Î N

N - set of nonterminal symbols {words, letters, numbers, spaces, punctuation marks}

Σ - the set of terminal symbols {a, b, c, ..., 0, 1, 2, ...,,,,. ,%,?,! ...}

P- set of inference rules

S – initial symbol

 


{ 0@digit,

1@digit,

2@digit,

3@digit,

4@digit,

…,

a@ letter,

b@ letter,

c@ letter,

d@ letter,

e@ letter,

…,

@Space

?@ punctuation,

%@ punctuation,

₴@ punctuation,

~@ punctuation,

…}

 


These rules can be effectively used for the analysis of hyperlinks for threats or other selfish actions. For example, if an attacker replaces in the hyperlink symbol O (the letter) to 0 (digit), and there isn't any deviation between, it is easy for a rule to identify the difference.

Conclusion

There is a necessity for analyzing hyperlinks due to the fact that along with the obvious success of the international community in the field of information clearly observed increase number of criminal acts committed by high information technologies, improved traditional types of fraud in view of what is necessary to implement research in order to improve anti-fraud policies online.

Use of a context-free grammar for hyperlinks checking could be considered as practical solving for such problems.

 

References

[1] Сабадаш В.П. Фішинг як найбільш поширений вид інтернет-шахрайства / Університські наукові записки, 2006 №1(17), ст. 228-233.

[2] Білас О.Є., Герун  Л.М. Методи та загрози соціальної інженерії для безпеки конфіденційних даних// Матеріали III-ї Всеукраїнської науково-практичної конференції Сучасні інформаційні технології в економіці, менеджменті та освіті. - Львів, 2012. – С. 159-163.

[3] Chomsky hierarchy [electronic resource] / Access: http://en.wikipedia.org/wiki/Chomsky_hierarchy

[4] Chomsky N., Schützenberger M.P. The algebraic theory of context free languages, Computer Programming and Formal Languages / P. Braffort, D. Hirschberg. — Amsterdam: 1963. — С. 118–161.