International Journal on Recent and Innovation Trends in Computing and Communication

ISSN: 2321-8169
Volume: 3 Issue: 8
5311 - 5317

Data Extraction from Hand-filled Form using Form Template

Rohit Sachdeva
Department of Computer Science
Multani Mal Modi College, Patiala, India
rsachu.147@gmail.com

Dharam Veer Sharma
Department of Computer Science
Punjabi University, Patiala, India
dveer72@hotmail.com

Abstract- Databases are vital for day-to-day decision making and, in the long run, help in formulating an organization's policies and strategies. Considerable effort, time and money are spent to collect, store and process data. Data are collected from users through an interface known as a form, which may be paper based or online. Manually processing paper-based forms is error prone. It is therefore useful to deploy automated systems that read data from paper-based forms and store it in a database, where it can be further modified, processed and analyzed. In this paper, we propose a method to extract data from hand-filled, pre-designed forms based on form templates.
Keywords: Data Extraction, Hand-filled Form, Form Template, Color Drop

I. INTRODUCTION

Forms are used routinely to collect data from users. A form is a user interface for data collection, and many offices process various types of form documents to collect data. The data obtained through paper forms is a static representation of handwriting. Scanning these forms only produces an image of the document, which can be displayed on screen and printed but cannot be edited or reformatted. The forms are therefore entered into the system manually, which is a time-consuming and arduous task. Moreover, it puts considerable strain on the financial resources of the organization and gives rise to many non-sampling errors.
The need of the hour is an automated solution for the extraction and recognition of data from paper-based forms. Taking these facts into consideration, an algorithm has been developed that extracts handwritten data from paper-based forms. The extracted data may be further converted into an editable form. Compared with manual data entry, the proposed system requires fewer human resources, works at higher speed, is less error prone and is more reliable.
Application areas include railway reservation, census data collection, banking, tax payments, postal systems, educational institutes and office automation.
II. REVIEW OF LITERATURE

A general system for the extraction and cleaning of data from handwritten forms has been proposed by Ye et al. [1]. The items of interest are located on the form using a model template generated from a blank form; the template is used to remove the form frame from the actual forms before recognition. To clean handwriting that touches the pre-printed text, morphological operations based on statistical features are used. The authors reported a recognition rate of 95.5%.
Sako et al. [2] proposed a form reading technology based on form-type identification and form-data recognition. The authors reported a recognition rate of 97%.
Sharma et al. [3] proposed an algorithm for removing the field frame boundaries of hand-filled forms in Gurmukhi script. The authors discussed characteristics of Gurmukhi script, such as the use of a headline and varied writing styles, and the problems these cause; for example, filled data may overlap or merge with the field frame boundaries. They proposed a novel approach that removes the form field frame boundary while preserving the data contained therein.
While filling in a form, a person may write within a box or outside the prescribed box. Form field line removal methods extract data only from the prescribed box, so there is a need to extract words even when they extend outside the boundary area (box) of the field. In form recognition, frame line detection is a vital and difficult step. Existing methods of form field line removal for Roman script are given in [4-13]. The two most common methods for line detection are the Hough transform, discussed by Illingworth et al. [4], and vectorization, used by Liu et al. [5].
Simoncini et al. [6] developed a system in which a set of regions along each edge is extracted and a standard line-fitting method is used to parameterize them. The lines are then deleted, which leads to excessive erosion of the crossing strokes; finally, these strokes must be repaired and the crossing characters reconstructed.
The Hough transform has the great advantage that it can detect dashed or broken lines. However, it is not generally applied in form recognition because it is too slow. In forms, frame lines are generally horizontal and vertical. Liu et al. [7] and Chen et al. [8] used fast modified Hough transform methods, which are essentially projection methods. These methods cannot detect diagonal lines or frame lines with large skew angles. Moreover, when characters merge with or overlap the frame lines, the projections of the frame lines are overwhelmed by the projections of the characters and cannot be detected correctly. The vectorization method used by Liu et al. [5], a bottom-up approach, can solve the problems of projection methods: vectors are first extracted from the image, and whole objects are then detected by merging the extracted vectors.
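A projection method of the kind described above can be sketched in a few lines. This is an illustrative sketch only, not the algorithm of [7] or [8]: the function name `horizontal_lines`, the binary image representation (a list of rows, 1 = ink) and the `min_fill` ratio are assumptions. A row is reported as a candidate horizontal frame line when its ink count reaches a fixed fraction of the image width. The second example shows why large skew defeats the method: a slanted line spreads its pixels over many rows, so no single row reaches the threshold.

```python
def horizontal_lines(binary, min_fill=0.8):
    """Return indices of rows whose ink count (1 = ink) is at least
    min_fill * image width; these are candidate horizontal frame lines."""
    width = len(binary[0])
    return [y for y, row in enumerate(binary) if sum(row) >= min_fill * width]


# A 5 x 10 image with one straight horizontal line on row 2 (10 ink pixels).
straight = [[0] * 10 for _ in range(5)]
straight[2] = [1] * 10
print(horizontal_lines(straight))   # [2]

# The same number of ink pixels on a slanted line: 2 per row, so no row qualifies.
slanted = [[1 if x // 2 == y else 0 for x in range(10)] for y in range(5)]
print(horizontal_lines(slanted))    # []
```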
Yoo et al. [9] tested and analyzed their proposed system on a large number of real handwritten form samples and identified 13 types and 34 subtypes of overlapping modes formed by Korean characters overlapping frame lines. A different technique is applied to each overlapping mode when it is detected. This method is very tedious and cannot cover all overlapping modes. Moreover, it is more difficult to detect the overlapping mode in form documents that contain noise.

IJRITCC | August 2015, Available @ http://www.ijritcc.org
In most of the literature, frame line detection procedures depend on a critical threshold representing the character size. This threshold is a constant value in Liu et al. [7] and Chen et al. [8], whereas Pan [10] assumes that it is input by the user. Zheng et al. [11] used a vectorization method with a novel image structure element named the "Directional Single-Connected Chain (DSCC)" as the element vector. Most frame lines are detected correctly by merging DSCCs under some constraints, and the method also solves most types of character-line crossing problems, but pseudo lines and broken lines cannot be detected correctly. Because vectorization methods use a large number of vectors, they are much slower than projection methods. Shimamura et al. [12] used erosion and dilation to remove field frame lines. This method may not be practical because the stroke thickness of handwritten data varies and, in some cases, may be thinner than the frame boundaries.
These methods are not suitable for Indic scripts such as Devanagari and Gurmukhi, in which the characters of a word are connected by a headline. If the headline merges with the field frame boundaries, removing the boundaries will inadvertently remove the headline from the text and produce wrong recognition results.

III. PROPOSED METHOD OF DATA EXTRACTION

A. Form Designing
Before designing a form, the types of data and the number of data items required are determined, followed by the sequence from one field to the next; for example, in personal information the first field should be the name and the second the father's name. Data must be captured in the fields of the form in a logical arrangement, and each field must have a caption printed on the form to help in entering data in the appropriate field. Two square blocks serve as reference points marking the start and end of the form: a square in the top-left corner is the starting reference point, and a square in the bottom-right corner is the ending reference point, as shown in Figure 1-1. Once these two reference points are identified, the relative distances of the other fields from the starting reference point are used to compute the absolute coordinates of the fields on the form.

Figure 1-1: A sample of designed form
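The coordinate bookkeeping described in Form Designing can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the scanned page is assumed to be already binarized into a list of rows (1 = ink), and the reference square is assumed to be the only ink inside a small corner search window. Field rectangles are stored relative to the starting reference point once, from the template, and mapped to absolute pixel coordinates on each scanned form.

```python
def find_reference_point(binary, rows, cols):
    """Top-left corner (x, y) of the ink inside a search window, or None.
    Assumes the reference square is the only ink in that window."""
    hits = [(x, y) for y in rows for x in cols if binary[y][x]]
    if not hits:
        return None
    return (min(x for x, _ in hits), min(y for _, y in hits))


def to_relative(start_ref, rect):
    """Template rectangle (left, top, right, bottom) -> offsets from the start point."""
    sx, sy = start_ref
    left, top, right, bottom = rect
    return (left - sx, top - sy, right - sx, bottom - sy)


def to_absolute(start_ref, rel_rect):
    """Offsets from the start point -> absolute rectangle on one scanned form."""
    sx, sy = start_ref
    left, top, right, bottom = rel_rect
    return (left + sx, top + sy, right + sx, bottom + sy)


# Template: start square at (50, 60); a field drawn at (250, 400)-(900, 460).
rel = to_relative((50, 60), (250, 400, 900, 460))   # (200, 340, 850, 400)
# Scanned form: start square detected at (58, 71).
print(to_absolute((58, 71), rel))                    # (258, 411, 908, 471)
```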

B. Collection of data
The sample data-collection form collects data about students and covers the different types of data to be supported by the system. A total of 235 forms of the same type were used, each consisting of 22 fields (Figure 1-1). The forms were filled in by different students in their natural handwriting; one sample is shown in Figure 1-2.


Figure 1-2: A sample of filled form

C. Template Generation
Generating a form definition is the backbone of present-day form processing systems. A form contains field captions, which are printed on it, and data areas, where the user writes data in their own handwriting. A template is created from a blank form and is used to separate the pre-printed matter (captions) from the data areas. This template helps remove the form frame from the actual forms to obtain the required printed data, such as barcodes, and the handwritten data used for recognition.
For customized form processing, a new system has to be developed. This paper presents a method to extract data from a hand-filled, pre-designed form (Figure 1-1) based on the form template.
The recognizable fields of the form are highlighted to generate a form template, as shown in Figure 1-3. A definition of the form is generated from the form template. It contains information about the starting and ending reference points (for skew assessment); the number of fields; the data types, sequence and locations of the fields; the registered domain of each numeric field (where applicable), for validation; and the contextual dictionaries applicable to text fields, for post-processing. The data types considered by the system are given in Table 1-1 and are marked accordingly in Figure 1-3.

Table 1-1: Data types used in the system

S.No.  Data Type  Purpose
1.     Numeric    For storing digits only, as in the case of age, PIN code, phone number, etc.
2.     Alpha      For storing data consisting of alphabets only, such as parts of a name, city, state, etc.
3.     Alphanum   For storing data consisting of alphabets and numbers, such as house number, class, etc.
4.     Date       For storing date-type data, e.g. date of birth, date of joining, date of purchase, etc.
5.     Choice     Used where multiple options are available, as in the case of gender, which can be (1) male or (2) female.
6.     Picture    For sub-images in the form document, such as photographs, signatures, barcodes, etc.
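As an illustration of how the data types in Table 1-1 might be checked once a field has been recognized, the sketch below validates a recognized string against its declared type. The regular expressions, the dd/mm/yyyy date layout and the function name `matches_type` are assumptions made for this example; the paper does not specify them. Picture fields are not text, so they are accepted without checking.

```python
import re
from datetime import datetime

# Hypothetical patterns for the text-bearing types of Table 1-1.
PATTERNS = {
    "Numeric": r"[0-9]+",           # digits only: age, PIN code, phone number
    "Alpha": r"[A-Za-z]+",          # alphabets only: parts of a name, city, state
    "Alphanum": r"[A-Za-z0-9]+",    # alphabets and digits: house number, class
}


def matches_type(dtype, value, options=None):
    """Return True if a recognized string is plausible for the declared data type."""
    if dtype in PATTERNS:
        return re.fullmatch(PATTERNS[dtype], value) is not None
    if dtype == "Date":
        try:
            datetime.strptime(value, "%d/%m/%Y")  # assumed day/month/year layout
            return True
        except ValueError:
            return False
    if dtype == "Choice":
        return value in (options or ())       # e.g. {"1", "2"} for gender
    if dtype == "Picture":
        return True                           # sub-images are stored, not validated
    raise ValueError(f"unknown data type: {dtype}")


print(matches_type("Numeric", "147001"))    # True
print(matches_type("Alpha", "Patiala"))     # True
print(matches_type("Date", "31/02/2015"))   # False (there is no 31 February)
```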

In Figure 1-3, the starting and ending reference points (referred to as service fields) are marked as 0; they are also used to detect form-level skew and the top and bottom of the form. A new form definition is generated for every new type of form. This is a one-time process, and the definition is stored in a file from which it can be reused or further modified. At the time of form definition, all validation sources are provided, such as a domain for numeric fields and dictionaries for alphabetic fields. During post-processing, these sources are used to validate the data. The system is flexible and provides the facility to create and dynamically add new dictionaries.
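A form definition of the kind described above could be stored as a small JSON file. The sketch below is one possible layout, not the paper's file format: the class and function names, the JSON schema, and the use of Python's `difflib.get_close_matches` for dictionary-based correction are all assumptions. Field rectangles are kept relative to the starting reference point, numeric fields carry a (min, max) domain, and alphabetic fields carry a word list.

```python
import json
from dataclasses import asdict, dataclass, field
from difflib import get_close_matches
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]


@dataclass
class FieldDef:
    seq: int                                  # position of the field in the filling sequence
    name: str
    dtype: str                                # a data type from Table 1-1
    rect: Rect                                # (left, top, right, bottom) relative to the start point
    domain: Optional[Tuple[int, int]] = None  # (min, max) for numeric fields
    dictionary: Optional[List[str]] = None    # word list for alphabetic fields


@dataclass
class FormDefinition:
    start_ref: Tuple[int, int]                # starting reference point on the template
    end_ref: Tuple[int, int]                  # ending reference point on the template
    fields: List[FieldDef] = field(default_factory=list)


def to_json(defn: FormDefinition) -> str:
    return json.dumps(asdict(defn), indent=2)


def from_json(text: str) -> FormDefinition:
    raw = json.loads(text)
    fields = [
        FieldDef(
            seq=f["seq"], name=f["name"], dtype=f["dtype"], rect=tuple(f["rect"]),
            domain=tuple(f["domain"]) if f["domain"] is not None else None,
            dictionary=f["dictionary"],
        )
        for f in raw["fields"]
    ]
    return FormDefinition(tuple(raw["start_ref"]), tuple(raw["end_ref"]), fields)


def save_definition(defn: FormDefinition, path: str) -> None:
    """One-time step: write the definition so it can be reused or modified later."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(to_json(defn))


def load_definition(path: str) -> FormDefinition:
    with open(path, encoding="utf-8") as fh:
        return from_json(fh.read())


def post_process(fdef: FieldDef, value: str) -> Optional[str]:
    """Validate a recognized value against the field's validation source.
    Returns the (possibly corrected) value, or None to flag it for review."""
    if fdef.domain is not None:
        lo, hi = fdef.domain
        return value if value.isdecimal() and lo <= int(value) <= hi else None
    if fdef.dictionary:
        best = get_close_matches(value, fdef.dictionary, n=1, cutoff=0.6)
        return best[0] if best else None
    return value


defn = FormDefinition(
    start_ref=(50, 60), end_ref=(2400, 3420),
    fields=[FieldDef(1, "Age", "Numeric", (200, 340, 420, 400), domain=(15, 40)),
            FieldDef(2, "City", "Alpha", (200, 900, 850, 960), dictionary=["Patiala", "Ludhiana"])],
)
print(post_process(defn.fields[0], "21"))       # 21
print(post_process(defn.fields[0], "99"))       # None (outside the registered domain)
print(post_process(defn.fields[1], "Patiaia"))  # Patiala
```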


Figure 1-3: Template of a sample form with field types marked on it

A module has been developed to detect the fields on the form using a form template image; it interactively generates the form definition (Figure 1-4). The positions of the fields on the form are calculated, and the user is required to supply information regarding the type of each field, its name, the dictionary to be used for post-processing in the case of alphabetic fields, and the range of values for numeric fields. Picture fields (photograph and signature) are sub-images, which are extracted and stored separately.

Figure 1-4: Screenshot of the module while generating the form definition

D. Digitization
Digitization means scanning the original paper-based form document and storing it as a digital image. Brightness, contrast and scanning resolution, measured in dots per inch (dpi), are the key factors in digitizing a paper-based form. In the present system, the forms were scanned at a resolution of 300 dpi, with a brightness threshold of 100 and a contrast of 100; these values were determined experimentally and yield images without distortion.
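Because the field coordinates in the form definition are pixel positions, they carry over directly only to scans made at the same resolution as the template. The short sketch below shows the arithmetic at 300 dpi (1 inch = 25.4 mm), using an A4 page purely for illustration, and how a template rectangle could be rescaled if a form were scanned at a different resolution. The function names are hypothetical; the paper itself scans all forms at 300 dpi.

```python
MM_PER_INCH = 25.4


def mm_to_px(mm, dpi=300):
    """Physical length on paper (mm) -> pixels at the given scan resolution."""
    return round(mm / MM_PER_INCH * dpi)


def rescale_rect(rect, from_dpi, to_dpi):
    """Rescale a (left, top, right, bottom) pixel rectangle between resolutions."""
    s = to_dpi / from_dpi
    return tuple(round(v * s) for v in rect)


print(mm_to_px(210), mm_to_px(297))                  # 2480 3508 (an A4 page at 300 dpi)
print(rescale_rect((100, 200, 400, 260), 300, 150))  # (50, 100, 200, 130)
```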
E. Form Level Skew Correction
The form image may become skewed during scanning because the paper is improperly aligned on the scanner, which misaligns the text in the form document image. Therefore, before data is extracted from a form, its skew is checked and removed.
As discussed under form designing, reference points marking the start and end of the form are placed on it. These points are located in the scanned image, and the distance between them is calculated and compared with the distance between the starting and ending reference points of the template form. Any deviation between the two indicates skew in the form image and is used to calculate the skew angle. To save time, skew is only detected at this stage, and correction is deferred until field data extraction.
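The comparison of reference points can be turned into a skew angle, for instance by comparing the direction of the line joining the two points in the scan with the same line on the template. The sketch below does this with `math.atan2`; it is an assumed formulation for illustration, not the technique of Sharma et al. [13] that the system uses to obtain its skew parameters, and the tolerance `tol_deg` is hypothetical. Image coordinates are assumed, with y increasing downwards, so a positive angle means the scan is rotated clockwise relative to the template.

```python
import math


def skew_angle(tpl_start, tpl_end, scan_start, scan_end):
    """Rotation (degrees) of the scanned form relative to the template, estimated
    from the line joining the starting and ending reference points.
    Positive values mean clockwise rotation in image coordinates (y points down)."""
    tpl = math.atan2(tpl_end[1] - tpl_start[1], tpl_end[0] - tpl_start[0])
    scan = math.atan2(scan_end[1] - scan_start[1], scan_end[0] - scan_start[0])
    return math.degrees(scan - tpl)


def has_skew(tpl_start, tpl_end, scan_start, scan_end, tol_deg=0.1):
    """Detection only: correction is deferred to field data extraction."""
    return abs(skew_angle(tpl_start, tpl_end, scan_start, scan_end)) > tol_deg


# Pure translation (paper placed slightly off on the scanner glass): no skew.
print(has_skew((50, 60), (2400, 3420), (58, 71), (2408, 3431)))   # False
```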
The skew parameters XFactor and YFactor, calculated using the technique given by Sharma et al. [13], are used to correct the actual location of each field. The new coordinates of the field rectangle are obtained with the following formulae:
newrect.left = rect.left + rect.top/XFactor
newrect.top = rect.top + rect.left/YFactor
newrect.right = rect.right + rect.bottom/XFactor
newrect.bottom = rect.bottom + rect.right/YFactor
However, form images with large skew, identified from the values of XFactor and YFactor, are only flagged and not corrected. If the values of XFactor and YFactor are less than 12% (chosen experimentally) of the image height and image width respectively, the form image is considered highly skewed and is not processed.
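The correction formulae and the rejection rule above translate directly into code. In this sketch the function names are hypothetical; `math.inf` stands for a factor when there is no skew along that axis (so its correction term vanishes), and comparing absolute values of the factors, so that skew in either direction is handled, is an assumption of this sketch. How XFactor and YFactor themselves are computed follows Sharma et al. [13] and is not reproduced here.

```python
import math


def correct_rect(rect, x_factor, y_factor):
    """Apply the paper's formulae to a field rectangle (left, top, right, bottom)."""
    left, top, right, bottom = rect
    return (left + top / x_factor,
            top + left / y_factor,
            right + bottom / x_factor,
            bottom + right / y_factor)


def is_highly_skewed(x_factor, y_factor, img_height, img_width, ratio=0.12):
    """True when both factors fall below 12% of the image height and width
    respectively; such a form is rejected instead of corrected."""
    return abs(x_factor) < ratio * img_height and abs(y_factor) < ratio * img_width


print(correct_rect((100, 200, 300, 260), math.inf, math.inf))  # (100.0, 200.0, 300.0, 260.0)
print(correct_rect((100, 200, 300, 260), 400, -500))           # about (100.5, 199.8, 300.65, 259.4)
print(is_highly_skewed(200, 150, 3508, 2480))                   # True
```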
F. Field data extraction
In this step, the fields are traced on the form, their boundaries are removed and their data are extracted. Field boundaries can be removed using various techniques, such as color dropout, form templates and boundary line removal. In monochrome forms, the field boundary is removed by successively removing its boundary lines, whereas in colored forms the boundaries are removed by dropping the color used in designing the form fields.
In the proposed method, the fields containing hand-filled data are located using the coordinates of the designed form fields stored during form template generation, as shown in Figure 1-3.
Except for picture fields, this is applied to the rectangular area of each field. In the first step, picture fields, which are stored as images in the database, are extracted and stored directly. In the second step, the correct location and size of each field on the form are found. The actual bounding rectangle of a field may be larger than the one identified in the form definition phase, because the filled data may overlap the field boundaries (Figure 1-5). The actual rectangle of the field is identified by finding the bounding rectangle of the field's data.

Figure 1-5: Examples of overlapping on all four sides
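One simple way to find the actual bounding rectangle when handwriting spills over the printed box is to start from the rectangle in the form definition and push each side outwards while the row or column just beyond it still contains ink. The sketch below assumes that the field boundaries have already been dropped (so any remaining ink is handwriting), that the image is a list of rows with 1 = ink, and that an arbitrary `max_grow` limit keeps the rectangle from running into neighboring fields; none of these details come from the paper, and the function name is hypothetical.

```python
def grow_field_rect(binary, rect, max_grow=40):
    """Expand an inclusive (left, top, right, bottom) rectangle while the row or
    column just outside a side contains ink (1). Each pass moves a side by at most
    one pixel, so no side grows by more than max_grow pixels."""
    height, width = len(binary), len(binary[0])
    left, top, right, bottom = rect

    def column_has_ink(x):
        return any(binary[y][x] for y in range(top, bottom + 1))

    def row_has_ink(y):
        return any(binary[y][left:right + 1])

    for _ in range(max_grow):
        grew = False
        if left > 0 and column_has_ink(left - 1):
            left -= 1
            grew = True
        if right < width - 1 and column_has_ink(right + 1):
            right += 1
            grew = True
        if top > 0 and row_has_ink(top - 1):
            top -= 1
            grew = True
        if bottom < height - 1 and row_has_ink(bottom + 1):
            bottom += 1
            grew = True
        if not grew:
            break
    return (left, top, right, bottom)


# A stroke on row 4 spills two pixels past the right side of the box (3, 3)-(6, 6).
img = [[1 if (y == 4 and x in (7, 8)) else 0 for x in range(10)] for y in range(10)]
print(grow_field_rect(img, (3, 3, 6, 6)))   # (3, 3, 8, 6)
```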

Using the physical locations of the fields on the template form, the fields can be located on the filled form. The starting and ending reference points play a key role here: to rectify the field locations on the source form, the difference between the starting reference point of the template form and that of the source form is calculated and applied. Once the location of a field is found, the next step is to extract its data from the source form.
Sharma et al. [3] proposed a method for form field frame boundary removal in form processing systems for Gurmukhi script, based on some assumptions. This method removes the form field frame boundary while preserving the data contained therein. Its drawback is that if a word in the field contains a character without a headline, a headline is added to such characters, and they may then be wrongly recognized. Another problem is the overlapping of filled data with the field captions of the form. Both problems can be eliminated in color forms by using the color drop method, which is a better alternative to form field frame boundary removal for such forms.
In this method, the field captions and boundaries of the form are printed in lighter tones of some specific color, conventionally red. The user fills in data using a dark pen (blue or black) of any color other than red. By dropping the caption color, in this case red, the form field boundaries are removed. The main advantage of color dropout is that the field boundaries do not get mixed up with the data, which results in accurate data extraction. The method is a little more costly than form field boundary removal, because the form must be printed in color; however, the advantages derived outweigh the cost involved.
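A per-pixel version of color dropout might look like the following. The thresholds and the function name are illustrative assumptions rather than values from the paper: a pixel whose red channel exceeds both green and blue by at least `red_margin` is treated as pre-printed caption or boundary color and dropped to background, and any remaining pixel darker than `ink_threshold` (using the common 0.299/0.587/0.114 luminance weights) is kept as handwriting. Where a dark pen stroke crosses a red line, the pixel is no longer red-dominant, so the stroke survives the drop.

```python
def drop_red(image, red_margin=40, ink_threshold=128):
    """Color dropout for forms printed in red.

    image: rows of (r, g, b) tuples with 0-255 channels.
    Returns a binary mask with 1 for handwriting ink and 0 for background.
    """
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            if r - max(g, b) >= red_margin:
                out.append(0)  # red-dominant: pre-printed caption or boundary, dropped
                continue
            luminance = 0.299 * r + 0.587 * g + 0.114 * b
            out.append(1 if luminance < ink_threshold else 0)  # dark pen stroke vs. white paper
        mask.append(out)
    return mask


sample = [
    [(255, 255, 255), (255, 150, 150), (200, 30, 30)],  # paper, light red, dark red
    [(20, 20, 20), (0, 0, 200), (80, 80, 220)],         # black ink, blue ink, lighter blue ink
]
print(drop_red(sample))  # [[0, 0, 0], [1, 1, 1]]
```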
IV. RESULTS

A colored form containing 22 fields was designed, as shown in Figure 1-1. A total of 235 students filled in the form in their natural handwriting (one sample is shown in Figure 1-2). A template of the form was generated, as shown in Figure 1-3. The forms were then scanned at a resolution of 300 dpi, with threshold 100 and contrast 100. Using the color drop method, the red color was dropped; the output is shown in Figure 1-6.


Figure 1-6: Form after color drop

Data are then extracted using the template matching method. Picture fields are sub-images, which are extracted and stored as images. For fields of the other data types, field positions are located using the template matching method, and the fields are extracted and stored as images, as shown in Table 1-2.
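Once the form definition and the reference-point offset are known, extracting the fields amounts to cropping rectangles from the page. In the sketch below the page is a list of pixel rows, rectangles are inclusive (left, top, right, bottom) template coordinates, and `offset` is the starting reference point of the source form minus that of the template. The function name and data layout are hypothetical, and the shifted rectangles are assumed to lie inside the page.

```python
def extract_fields(page, fields, offset=(0, 0)):
    """Crop every field from a page.

    page:   list of pixel rows (any pixel type)
    fields: iterable of (name, dtype, (left, top, right, bottom)) in template coordinates
    offset: (dx, dy) = source starting reference point minus template starting reference point
    Returns (pictures, others): dicts mapping field name -> cropped sub-image.
    """
    dx, dy = offset
    pictures, others = {}, {}
    for name, dtype, (left, top, right, bottom) in fields:
        crop = [row[left + dx:right + dx + 1] for row in page[top + dy:bottom + dy + 1]]
        # Picture fields are stored as images directly; the rest go on to recognition.
        target = pictures if dtype == "Picture" else others
        target[name] = crop
    return pictures, others


page = [[10 * y + x for x in range(5)] for y in range(5)]
fields = [("Roll No", "Numeric", (1, 1, 2, 2)), ("Photograph", "Picture", (0, 3, 1, 4))]
pictures, others = extract_fields(page, fields, offset=(1, 0))
print(others["Roll No"])       # [[12, 13], [22, 23]]
print(pictures["Photograph"])  # [[31, 32], [41, 42]]
```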

Table 1-2: Results of data extraction from hand-filled form
[Extracted field images for: Roll No, Class, Student's First Name, Student's Middle Name, Student's Last Name, Father's First Name, Father's Middle Name, Father's Last Name, Mother's First Name, Mother's Middle Name, Mother's Last Name, Date of Birth (Date, Month, Year), Department, House No, Address, City, State, Pin code, Telephone]

V. CONCLUSION

This paper proposes a method for extracting data from hand-filled forms using the color drop and template matching methods. For this purpose, a colored form was designed with its captions printed in red. The results obtained are encouraging, and the extracted data fields are further used as input to the segmentation process.

REFERENCES

[1] X. Ye, M. Cheriet, C. Y. Suen, A Generic System to Extract and Clean Handwritten Data from Business Forms, in the Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 63-72, 2000.
[2] H. Sako, M. Seki, N. Furukawa, H. Ikeda, A. Imaizumi, Form Reading Based on Form-type Identification and Form-data Recognition, in the Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), pp. 926-930, 2003.
[3] D. V. Sharma, G. S. Lehal, Form Field Frame Boundary Removal for Form Processing System in Gurmukhi Script, in the Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR), pp. 256-260, 2009.
[4] J. Illingworth, J. Kittler, A Survey of the Hough Transform, Computer Vision, Graphics & Image Processing, vol. 44, pp. 87-116, 1988.
[5] W. Liu, D. Dori, From Raster to Vectors: Extracting Visual Information from Line Drawings, Pattern Analysis & Applications, no. 2, pp. 10-21, 1999.
[6] L. Simoncini, V. Kovacs, M. Zs, A System for Reading USA Census '90 Hand-Written Fields, in the Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 86-90, 1995.
[7] J. Liu, X. Ding, Y. Wu, Description and Recognition of Form and Automated Form Data Entry, in the Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR), pp. 579-582, 1995.
[8] J. L. Chen, H. J. Lee, An Efficient Algorithm for Form Structure Extraction Using Strip Projection, Pattern Recognition, vol. 31, no. 9, pp. 1353-1368, 1998.
[9] J. Y. Yoo, M. K. Kim, S. Y. Han, Y. B. Kwon, Line Removal and Restoration of Handwritten Characters on the Form Documents, in the Proceedings of the 4th International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 128-131, 1997.
[10] S. Pan, Research and Realization of a General Form Recognition System, Master's Thesis, Tsinghua University, 1999.
[11] Y. Zheng, C. Liu, X. Ding, S. Pan, Form Frame Line Detection with Directional Single-Connected Chain, in the Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR), pp. 699-704, 2001.
[12] T. Shimamura, B. Zhu, A. Masuda, M. Onuma, T. Sakurada, M. Nakagawa, A Prototype of an Active Form System, in the Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), vol. 2, pp. 921-926, 2003.
[13] D. V. Sharma, G. S. Lehal, A Fast Skew Detection and Correction Algorithm for Machine Printed Words in Gurmukhi Script, in the Proceedings of the International Workshop on Multilingual OCR, Article No. 15, ACM, NY, 2009.

No ratings yet
Importance of Similarity Measures in Effective Web Information Retrieval
5 pages
A Review of 2D &3D Image Steganography Techniques
No ratings yet
A Review of 2D &3D Image Steganography Techniques
5 pages
A Review of Wearable Antenna For Body Area Network Application
No ratings yet
A Review of Wearable Antenna For Body Area Network Application
4 pages
Channel Estimation Techniques Over MIMO-OFDM System
No ratings yet
Channel Estimation Techniques Over MIMO-OFDM System
4 pages
A Review of 2D &3D Image Steganography Techniques
No ratings yet
A Review of 2D &3D Image Steganography Techniques
5 pages
IJRITCC Call For Papers (October 2016 Issue) Citation in Google Scholar Impact Factor 5.837 DOI (CrossRef USA) For Each Paper, IC Value 5.075
No ratings yet
IJRITCC Call For Papers (October 2016 Issue) Citation in Google Scholar Impact Factor 5.837 DOI (CrossRef USA) For Each Paper, IC Value 5.075
3 pages
Channel Estimation Techniques Over MIMO-OFDM System
No ratings yet
Channel Estimation Techniques Over MIMO-OFDM System
4 pages
A Study of Focused Web Crawling Techniques
No ratings yet
A Study of Focused Web Crawling Techniques
4 pages
A Review of Wearable Antenna For Body Area Network Application
No ratings yet
A Review of Wearable Antenna For Body Area Network Application
4 pages
Predictive Analysis For Diabetes Using Tableau: Dhanamma Jagli Siddhanth Kotian
No ratings yet
Predictive Analysis For Diabetes Using Tableau: Dhanamma Jagli Siddhanth Kotian
3 pages
Prediction of Crop Yield Using LS-SVM
No ratings yet
Prediction of Crop Yield Using LS-SVM
3 pages
Diagnosis and Prognosis of Breast Cancer Using Multi Classification Algorithm
No ratings yet
Diagnosis and Prognosis of Breast Cancer Using Multi Classification Algorithm
5 pages
Itimer: Count On Your Time
No ratings yet
Itimer: Count On Your Time
4 pages
44 1530697679 - 04-07-2018 PDF
No ratings yet
44 1530697679 - 04-07-2018 PDF
3 pages
45 1530697786 - 04-07-2018 PDF
No ratings yet
45 1530697786 - 04-07-2018 PDF
5 pages
Hybrid Algorithm For Enhanced Watermark Security With Robust Detection
No ratings yet
Hybrid Algorithm For Enhanced Watermark Security With Robust Detection
5 pages
Image Restoration Techniques Using Fusion To Remove Motion Blur
No ratings yet
Image Restoration Techniques Using Fusion To Remove Motion Blur
5 pages
BUSINESS DIARY - An Interactive and Intelligent Platform For SME's
No ratings yet
BUSINESS DIARY - An Interactive and Intelligent Platform For SME's
3 pages
Lift Control System Based On PLC
No ratings yet
Lift Control System Based On PLC
3 pages
Safeguarding Data Privacy by Placing Multi-Level Access Restrictions
No ratings yet
Safeguarding Data Privacy by Placing Multi-Level Access Restrictions
3 pages
41 1530347319 - 30-06-2018 PDF
No ratings yet
41 1530347319 - 30-06-2018 PDF
9 pages
49 1530872658 - 06-07-2018 PDF
No ratings yet
49 1530872658 - 06-07-2018 PDF
6 pages
Motif and Conglomeration of Software Process Improvement Model
No ratings yet
Motif and Conglomeration of Software Process Improvement Model
3 pages
An Approach For Power Control in Vehicular Adhoc Network For Catastrophe Message
No ratings yet
An Approach For Power Control in Vehicular Adhoc Network For Catastrophe Message
7 pages
Paper On Design and Analysis of Wheel Set Assembly & Disassembly Hydraulic Press Machine
No ratings yet
Paper On Design and Analysis of Wheel Set Assembly & Disassembly Hydraulic Press Machine
4 pages
Intoduction To Computing
No ratings yet
Intoduction To Computing
292 pages
Java Performance Tuning Ver 1
No ratings yet
Java Performance Tuning Ver 1
72 pages
Canon I860, I865 SM - Printer1.Blogspot
No ratings yet
Canon I860, I865 SM - Printer1.Blogspot
35 pages
Robot Kits Manual - 006
No ratings yet
Robot Kits Manual - 006
128 pages
Turbine Alignment: Straightness Measurement of Diaphragms and Bearing Journals
No ratings yet
Turbine Alignment: Straightness Measurement of Diaphragms and Bearing Journals
8 pages
Compliance Dashboard v0.6
No ratings yet
Compliance Dashboard v0.6
449 pages
Top 5 Open Source Linux Firewalls
No ratings yet
Top 5 Open Source Linux Firewalls
5 pages
Topic One and Two Notes Comp 100
No ratings yet
Topic One and Two Notes Comp 100
56 pages
Datasheet PSH-PSHsmall Rev1.0
No ratings yet
Datasheet PSH-PSHsmall Rev1.0
1 page
Laptop Tender Nov 2024 Final
No ratings yet
Laptop Tender Nov 2024 Final
9 pages
Tool Manufacturin 1
No ratings yet
Tool Manufacturin 1
2 pages
PG5 User Manual en
No ratings yet
PG5 User Manual en
331 pages
Importance of RAM in A Computer
No ratings yet
Importance of RAM in A Computer
7 pages
2009MCS Chapter 4
No ratings yet
2009MCS Chapter 4
33 pages
STM32G070CB/KB/RB: Arm Cortex - M0+ 32-Bit MCU, 128 KB Flash, 36 KB RAM, 4x USART, Timers, ADC, Comm. I/Fs, 2.0-3.6V
No ratings yet
STM32G070CB/KB/RB: Arm Cortex - M0+ 32-Bit MCU, 128 KB Flash, 36 KB RAM, 4x USART, Timers, ADC, Comm. I/Fs, 2.0-3.6V
93 pages
Embedded System Components PDF
100% (1)
Embedded System Components PDF
208 pages
Manual de Programacao Serie H
No ratings yet
Manual de Programacao Serie H
260 pages
Mimic Diagram: Spider
No ratings yet
Mimic Diagram: Spider
2 pages
Abstract - Automatic Traffic and Street Light Controller
No ratings yet
Abstract - Automatic Traffic and Street Light Controller
3 pages
2009017EN 0 DC01 EN User-manual-NFC-480S
No ratings yet
2009017EN 0 DC01 EN User-manual-NFC-480S
287 pages
CLASS 7 - WH- QUESTIONS
No ratings yet
CLASS 7 - WH- QUESTIONS
33 pages
Beyond The Syllabus of Embedded System
No ratings yet
Beyond The Syllabus of Embedded System
3 pages
CS2002
No ratings yet
CS2002
2 pages
The Dallas Post 05-08-2011
No ratings yet
The Dallas Post 05-08-2011
18 pages
HP 15-Ay132ng Compal CDL50 LA-D707P r1.0
No ratings yet
HP 15-Ay132ng Compal CDL50 LA-D707P r1.0
58 pages
Computer Fundamentals Assignment
No ratings yet
Computer Fundamentals Assignment
5 pages
Chapter 5. Negative Feedback An Intuitive Approach (Analog IC Design An Intuitive Approach)
No ratings yet
Chapter 5. Negative Feedback An Intuitive Approach (Analog IC Design An Intuitive Approach)
39 pages
Chicago Electric Double Cut Saw 68316 Owner's Manual & Safety Instructions (Page 19 of 20)
No ratings yet
Chicago Electric Double Cut Saw 68316 Owner's Manual & Safety Instructions (Page 19 of 20)
20 pages





In daily routine, forms are used to get data from users.

A general system for extraction and cleaning of data from

styles and also discussed problems related to it, such as filled

IJRITCC | August 2015, Available @ http://www.ijritcc.org


overlapping modes. Moreover, it is more difficult to detect

inadvertently remove the headline from the text and produce

PROPOSED METHOD OF DATA EXTRACTION

A sample of the designed form

Each form consisted of 22 fields (Figure 1-1).


A sample of a filled form

Table 1-1: Data types used in the system

In Figure 1-3, the starting and ending reference points (referred


Figure 1-3: Template of a sample form with field types marked on it
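The starting and ending reference points marked on the template make it possible to locate each template field on a scanned form. A minimal sketch, assuming the scan differs from the template only by scaling and translation (no rotation or skew), so that two reference points fully determine the mapping. The names `Point` and `make_mapper` are hypothetical and not taken from the paper.

```python
from typing import Callable, NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def make_mapper(tpl_start: Point, tpl_end: Point,
                scan_start: Point, scan_end: Point) -> Callable[[Point], Point]:
    """Return a function that maps template coordinates onto the scanned image.

    Assumption: only scale and translation differ between template and scan,
    so the starting and ending reference points fix the transform per axis.
    """
    dx, dy = tpl_end.x - tpl_start.x, tpl_end.y - tpl_start.y
    if dx == 0 or dy == 0:
        raise ValueError("reference points must differ on both axes")
    sx = (scan_end.x - scan_start.x) / dx  # horizontal scale factor
    sy = (scan_end.y - scan_start.y) / dy  # vertical scale factor

    def to_scan(p: Point) -> Point:
        # Offset from the template's starting point, rescaled, then shifted
        # to the scan's starting point.
        return Point(scan_start.x + (p.x - tpl_start.x) * sx,
                     scan_start.y + (p.y - tpl_start.y) * sy)

    return to_scan
```

A skewed or rotated scan would need a full affine or projective transform estimated from three or more reference points; the two-point version is only adequate when the page is fed straight.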

A module has been developed to detect the fields on the

of the field, dictionary to be used for post-processing in case

Positions of the fields on the form are calculated and user is

Figure 1-4: Screenshot of the module while generating the form definition
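The form definition produced by this module records, for each field, its position, its type and, where applicable, a dictionary used for post-processing. A minimal sketch of such a record and of dictionary-based correction of recognized text, under stated assumptions: the type labels `"alpha"` and `"numeric"`, the class `FieldDef`, the function `post_process`, and the 0.6 similarity cutoff are all hypothetical, and Python's `difflib.get_close_matches` stands in for whatever matching the paper actually uses.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class FieldDef:
    name: str
    field_type: str   # hypothetical type labels, e.g. "alpha" or "numeric"
    box: tuple        # (x, y, width, height) in template pixels
    dictionary: list = field(default_factory=list)  # optional valid values


def post_process(fdef: FieldDef, recognized: str) -> str:
    """Clean the recognizer output for one field using its definition."""
    text = recognized.strip().upper()
    if fdef.field_type == "numeric":
        # Keep digits only; stray marks and spaces are discarded.
        return "".join(ch for ch in text if ch.isdigit())
    if fdef.dictionary:
        # Snap to the most similar dictionary entry, if any is similar enough.
        matches = difflib.get_close_matches(text, fdef.dictionary, n=1, cutoff=0.6)
        if matches:
            return matches[0]
    return text
```

For example, a "City" field whose dictionary contains `PATIALA` would turn a misread `patiaia` into `PATIALA`, while a field without a dictionary simply returns the normalized text.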

per inch are the key factors in the digitization of paper


computed after experimentation) which is without any

size on the form. The size of the actual bounding field

Figure 1-5: Examples of overlapping on all four sides
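Because handwriting can spill over the printed field boundary on any side, the region cut from the scan should be somewhat larger than the field's printed size. A minimal sketch, assuming the image is a list of pixel rows and boxes are `(x, y, width, height)` tuples; the helper names and the `margin` value are hypothetical (the paper states its margin was computed after experimentation, but the value is not reproduced here).

```python
def expand_box(box, margin, img_w, img_h):
    """Grow an (x, y, w, h) field box by `margin` pixels on every side,
    clipped to the image bounds, so strokes overlapping the printed
    boundary on any of the four sides are still captured."""
    x, y, w, h = box
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(img_w, x + w + margin)
    bottom = min(img_h, y + h + margin)
    return (left, top, right - left, bottom - top)


def crop(image, box):
    """Cut an (x, y, w, h) region out of an image stored as a list of rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

Clipping matters for fields printed near the page edge, where an unclipped expansion would produce negative coordinates.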

With the physical location of each field on the template form, we

A colored form is designed which contains 22 fields as


Figure 1-6: Form after color drop
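Color drop removes the printed form layout so that only the handwritten entries remain. A minimal sketch, assuming the form is printed in a single known color, the handwriting is in dark ink, and the scan is available as rows of `(r, g, b)` tuples; the function name, the per-channel tolerance `tol`, and the darkness threshold `dark` are hypothetical and would need tuning on real scans.

```python
def drop_color(pixels, form_rgb, tol=60, dark=100):
    """Return a binary image (1 = handwriting ink, 0 = background).

    A pixel is dropped if every channel lies within `tol` of the form's
    print color; of the remaining pixels, only those darker than `dark`
    (mean of R, G, B) are kept as ink.
    """
    fr, fg, fb = form_rgb
    out = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            near_form = abs(r - fr) <= tol and abs(g - fg) <= tol and abs(b - fb) <= tol
            gray = (r + g + b) / 3
            out_row.append(1 if (not near_form and gray < dark) else 0)
        out.append(out_row)
    return out
```

With a red form color such as `(220, 40, 40)`, white paper and red printed lines both map to 0, while dark blue or black pen strokes map to 1.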

Then, using the template matching method, data is

Student's First Name

Table 1-2: Results of data extraction from hand-filled form

This paper proposes a method for data extraction from the

Mother's Last Name

X. Ye, M. Cheriet, C. Y. Suen, A Generic System to Extract


W. Liu, D. Dori, From Raster to Vectors: Extracting Visual

[10] S. Pan, Research and Realization of a General Form Recognition System, Master Thesis of Tsinghua University,
