ATFL Assignment 1
Asad Nasir
18-SE-37
Applications of Regular Expressions
Regular Expressions in Web Search Engines
One historically common use of regular expressions was in web search engines. Archie, one of the first search engines, used regular expressions exclusively to search through a database of filenames on public FTP servers [1]. Once the World Wide Web started to take form, its first search engines also used regular expressions to search through their indexes. Regular expressions were chosen for these early engines because of both their power and their ease of implementation.
It is a fairly trivial task to convert search strings into regular expressions that accept only strings with some relevance to the query. In the case of a search engine, the strings fed to the regular expression would be either whole web pages or a pre-computed index of a web page that holds only its most important information. The results returned to the user would be the set of web pages accepted by this regular expression. Many other features commonly seen in search engines are also easy to express as regular expressions; one example is putting quotes around a query to search for the exact phrase.
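As a rough illustration, the sketch below builds a regular expression from a user query: a quoted query is matched as an exact phrase, while an unquoted query is turned into one lookahead per term so that every term must appear somewhere on the page. The language (Python), the helper name build_query_regex, and the toy page index are assumptions made for this example, not how the early engines were actually written.

    import re

    def build_query_regex(query: str) -> re.Pattern:
        """Turn a search query into a regex that accepts relevant pages."""
        query = query.strip()
        if query.startswith('"') and query.endswith('"'):
            # "whole string" search: match the quoted phrase literally
            return re.compile(re.escape(query[1:-1]), re.IGNORECASE)
        # One lookahead per term, so every term must appear somewhere
        lookaheads = "".join(f"(?=.*{re.escape(term)})" for term in query.split())
        return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)

    pages = {
        "page1": "Regular expressions are used by search engines.",
        "page2": "Archie searched filenames on public FTP servers.",
    }
    pattern = build_query_regex("regular expressions")
    print([name for name, text in pages.items() if pattern.search(text)])  # ['page1']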
Regular expressions are no longer used in the large web search engines because, as the web grew, matching them against the entire index became far too slow. They are, however, still used in many smaller-scale search tools, such as the find/replace feature of a text editor or command-line utilities such as grep.
Regular Expressions in Software Testing
Regular expressions are also used in test case characterization. These characterizations relate either to the programs directly or to corresponding models of them. A generic framework is developed in which test cases are characterized and coverage criteria are defined for test sets; coverage analysis can then be performed, and test cases and test sets can be generalized. Regular language theory is used to handle paths, and regular expressions are written over the terminals and nonterminals of those paths, giving so-called “regular path expressions.” Operations on paths are restricted to the level of regular expressions, and the only paths of interest are the regular path expressions that are feasible. A regular expression is sufficient to abstractly describe a test case or a class of test cases, but sets of expressions require their own criteria. The use of regular language theory makes coverage analysis and test set generation straightforward.
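To make the idea of regular path expressions concrete, the following sketch abstracts each test case to the sequence of control-flow node labels it visits and uses a regular expression as a coverage criterion. The node labels (E for entry, L for loop body, X for exit) and the criterion EL+X are assumptions chosen for the example, not the framework described above.

    import re

    # Assumed coverage criterion: enter, run the loop body at least once, exit.
    coverage_criterion = re.compile(r"EL+X")

    # Each test case is abstracted to the path it exercises through the program.
    executed_paths = {
        "test_skips_loop":  "EX",     # loop body never executed
        "test_one_pass":    "ELX",    # one iteration
        "test_many_passes": "ELLLX",  # several iterations
    }

    for test, path in executed_paths.items():
        covered = bool(coverage_criterion.fullmatch(path))
        print(f"{test}: path {path!r} satisfies the criterion: {covered}")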
Most of the expenditure on software stems from maintaining it rather than from its initial development, and much of that expenditure goes to testing. Regression testing is an important part of the software development cycle: it is the process used to determine whether a modified program still meets its specifications or whether new errors have been introduced. Research is ongoing to make regression testing more efficient and, in particular, more economical. Regular language theory does not play a huge part in regression testing, but at the integration level a relation can be established to finite automata, regular languages, and their properties.
Regular Expressions in Tokenization
A lexeme is simply a block of symbols, and a token is the category assigned to a lexeme. The purpose of tokenization is to categorize the lexemes found in a string, sorting them by meaning. For example, a program in the C language could contain tokens such as numbers, string constants, characters, identifiers (variable names), keywords, and operators.
The best way to define a token is with a regular expression: we simply define a set of regular expressions, each matching the valid set of lexemes that belong to one token type. This is the process of scanning. Often this process can be quite complex and may require more than one pass to complete. Another option is to use backtracking, that is, rereading an earlier part of a string to see whether it matches a regular expression, based on information that could only be obtained by analyzing a later part of the string. It is important to note, however, that scanning does not produce the set of tokens in the document; it produces only a set of lexemes. The tokenizer must then assign these lexemes to tokens.
In tokenization, we generally use a finite state machine to define the lexical grammar of the language we are analyzing. To generate this finite state machine, we again turn to regular expressions to define which tokens may be composed of which lexemes. A typical identifier pattern, for example, is [A-Za-z_][A-Za-z0-9_]*: it says that identifiers must begin with a Roman letter or an underscore and may be followed by any number of letters, underscores, or digits.
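As a sketch of how this might look in practice, the scanner below gives each token type its own regular expression and matches them against the input. It assumes Python's re module; the token classes are illustrative and far from a complete C lexical grammar.

    import re

    # One regular expression per token class, combined with named groups.
    # The identifier pattern is the one described above; the other classes
    # are illustrative assumptions for a small C-like language.
    TOKEN_SPEC = [
        ("NUMBER",     r"\d+(?:\.\d+)?"),
        ("STRING",     r'"[^"]*"'),
        ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
        ("OPERATOR",   r"[+\-*/=]"),
        ("SKIP",       r"\s+"),
    ]
    MASTER_PATTERN = re.compile(
        "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    )

    def tokenize(source):
        """Scan the input and yield (token type, lexeme) pairs."""
        for match in MASTER_PATTERN.finditer(source):
            if match.lastgroup != "SKIP":
                yield match.lastgroup, match.group()

    print(list(tokenize('count = count + 1')))
    # [('IDENTIFIER', 'count'), ('OPERATOR', '='), ('IDENTIFIER', 'count'),
    #  ('OPERATOR', '+'), ('NUMBER', '1')]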
However, there is one problem with the process of tokenization: regular expressions cannot match complex recursive patterns, such as matching opening and closing parentheses. Such strings do not form a regular language and therefore cannot be matched by a regular expression. To deal with this problem we must turn to a parser, which is beyond the scope of this document.
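A quick illustration of this limitation, assuming Python's re module: a single regular expression handles only a fixed nesting depth, so arbitrarily nested parentheses require a parser.

    import re

    # A regular expression can match one level of parentheses...
    one_level = re.compile(r"\([^()]*\)")
    print(bool(one_level.fullmatch("(a + b)")))        # True

    # ...but no single regular expression matches parentheses nested to
    # arbitrary depth; each extra level would need a longer pattern.
    print(bool(one_level.fullmatch("((a + b) * c)")))  # False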
After we have our text broken up into a set of tokens, we must pass the tokens on to the parser so that
it can continue to analyze the text. Numerous programs exist to automate this process. For example, we
could use yacc to convert BNF-like grammar specifications to a parser which can be used to deal with
the tokens produced through lexical analysis. Similarly, many lexical analyzers (often called simply
lexers) exist to automate the process of scanning and tokenization—one of the best known is lex.
Regular Expressions in Data Cleaning
Regular expressions also shine in data cleaning. For instance, we may have web-scraped data, data that was collected manually, or data extracted from images using OCR techniques. Web-scraped data in particular is usually messy and full of noise, and cleaning it by hand can be a cumbersome task. This is exactly the kind of situation in which regular expressions can be used effectively.
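For example, a small cleaning routine might strip leftover HTML tags and entities, drop non-printable characters introduced by OCR, and collapse runs of whitespace. The sketch below assumes Python's re module; the specific noise patterns are illustrative assumptions rather than a general-purpose cleaner.

    import re

    def clean_scraped_text(raw):
        """Remove common kinds of noise left behind by scraping or OCR."""
        text = re.sub(r"<[^>]+>", " ", raw)        # strip leftover HTML tags
        text = re.sub(r"&[a-z]+;", " ", text)      # strip entities like &nbsp;
        text = re.sub(r"[^\x20-\x7E]", " ", text)  # drop non-printable characters
        text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
        return text.strip()

    raw = "<p>Price:&nbsp;$49.99\x0c  per   unit</p>"
    print(clean_scraped_text(raw))  # Price: $49.99 per unit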