Data Extraction From Hand-Filled Form Using Form Template
ISSN: 2321-8169
Volume: 3 Issue: 8
5311 - 5317
Abstract- A database is vital for day-to-day decision making and, in the long run, helps an organization formulate its policies and strategies. Considerable effort, time and money are spent to acquire, store and process data. The interface designed to collect data from a user is known as a form. Forms range from paper-based to online. Manually processing paper-based forms is error-prone. It is therefore useful to deploy automated systems that read data from paper-based forms and store it in a database, where it can be further modified, processed and analyzed. In this paper, we propose a method to extract data from hand-filled, pre-designed forms based on form templates.
Keywords- Data Extraction, Hand-filled Form, Form Template, Color Drop
I. INTRODUCTION
REVIEW OF LITERATURE
Figure 1-1:
A. Form Designing
Before designing a form, the designer determines which types of data, and how many distinct items of data, are required, and then the sequence from one field to the next. For example, in personal information the first field should be the name and the second field the father's name. Data must be captured in the fields of the form in a logical arrangement. Each field must have a caption printed on the form, which helps the user enter data in the appropriate field. Two square blocks serve as reference points marking the start and end of the form: the starting reference point is a square placed at the top-left corner, and the ending reference point is a square placed at the bottom-right corner, as shown in figure 1. Once these two reference points are identified, the relative distances of the other fields from the starting reference point are used to compute the absolute coordinates of the fields on the form.
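The coordinate computation described above can be sketched as a two-point similarity transform. This is a minimal sketch, assuming the centres of the starting and ending reference squares have already been detected in the scanned image, and that the template stores each field's offset relative to its starting reference point; the function name `map_field_to_image` is hypothetical. The skew angle is estimated as the difference between the directions of the start-to-end diagonal in the template and in the scan, and the scale as the ratio of the two diagonal lengths.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def map_field_to_image(
    offset: Point,
    tpl_start: Point,
    tpl_end: Point,
    img_start: Point,
    img_end: Point,
) -> Point:
    """Map a field offset (relative to the template's starting reference
    point) to absolute coordinates in the scanned image.

    Hypothetical helper: the two reference points give the scale and skew
    between template and scan (a similarity transform).
    """
    # Diagonal vectors from the starting to the ending reference point.
    tdx, tdy = tpl_end[0] - tpl_start[0], tpl_end[1] - tpl_start[1]
    idx, idy = img_end[0] - img_start[0], img_end[1] - img_start[1]

    # Scale: ratio of diagonal lengths (absorbs differences in scan resolution).
    scale = math.hypot(idx, idy) / math.hypot(tdx, tdy)
    # Skew: rotation between the two diagonals, in radians.
    skew = math.atan2(idy, idx) - math.atan2(tdy, tdx)

    # Rotate and scale the relative offset, then anchor it at the
    # detected starting reference point.
    ox, oy = offset
    cos_s, sin_s = math.cos(skew), math.sin(skew)
    x = img_start[0] + scale * (ox * cos_s - oy * sin_s)
    y = img_start[1] + scale * (ox * sin_s + oy * cos_s)
    return (x, y)
```

For example, with a template diagonal from (0, 0) to (1000, 1400) and a scan whose reference points are found at (50, 60) and (2050, 2860), a field offset of (100, 200) maps to approximately (250, 460): the scan is at twice the template resolution and has no skew. Two points are enough for translation, rotation and uniform scaling, but not for perspective or non-uniform stretching, which would need additional reference marks.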
B. Collection of Data
The sample data collection form collects data about students and covers the different types of data to be supported by the system. A total of 235 forms of the same type have been
IJRITCC | August 2015, Available @ http://www.ijritcc.org
Figure 1-2:
C. Template Generation
Generating a form definition is the backbone of present-day form processing systems. A form contains field captions, which are printed on the form, and data areas, where the user writes data by hand. A template is created from a blank form and is used to separate the preprinted matter (i.e., the captions) from the data areas. This template helps remove the form frame from the filled forms, leaving the data required for recognition, such as barcodes and handwritten entries.
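Removing the form frame with the template can be illustrated as a pixel-wise operation on binarized images. This is a minimal sketch, assuming the filled form has already been aligned to the blank template and both images are binarized (1 = ink, 0 = background); the function name `remove_form_frame` and the optional `tolerance` parameter are illustrative choices, not taken from the paper. Growing the template mask by a few pixels absorbs slight misregistration, at the cost of also erasing handwriting that touches the printed lines.

```python
from typing import List

Image = List[List[int]]  # binarized image: 1 = ink, 0 = background


def remove_form_frame(filled: Image, template: Image, tolerance: int = 0) -> Image:
    """Erase every ink pixel of `filled` that lies on, or within
    `tolerance` pixels of, an ink pixel of the blank `template`."""
    h, w = len(template), len(template[0])
    # Mask of template ink, grown by `tolerance` pixels in each
    # direction (a square dilation).
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if template[r][c]:
                for dr in range(-tolerance, tolerance + 1):
                    for dc in range(-tolerance, tolerance + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            mask[rr][cc] = 1
    # Keep only ink that the template does not explain (the handwriting).
    return [
        [1 if filled[r][c] and not mask[r][c] else 0 for c in range(w)]
        for r in range(h)
    ]
```

For a template whose first and last rows are printed lines, `remove_form_frame([[1,1,1,1],[0,1,1,0],[1,1,1,1]], [[1,1,1,1],[0,0,0,0],[1,1,1,1]])` keeps only the middle-row strokes, `[[0,0,0,0],[0,1,1,0],[0,0,0,0]]`.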
Customized form processing requires a new system to be developed. This paper presents a method to extract data from a hand-filled, pre-designed form (Figure 1.1) based on the form template.
The recognizable fields of the form are highlighted to generate a form template, as shown in Figure 1.3. A form definition is then generated from the form template. It contains the starting and ending reference points, used for skew assessment; the number of fields; each field's data type, sequence and location; the registered domain of each numeric field (where applicable), used for validation; and the contextual dictionaries applicable to text fields, used for post-processing. The data types considered for the system are given in Table 1.1 and are marked accordingly in Figure 1.3.
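One way to hold the form definition described above is a small set of record types. This is a minimal sketch; the class and attribute names (`FormDefinition`, `FieldDef`, `numeric_domain`, `dictionary`) are hypothetical, and the paper does not specify a storage format. The `closest_word` method shows one possible contextual post-processing step, snapping a recognized text value to the nearest dictionary entry with the standard library's `difflib.get_close_matches`.

```python
import difflib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) relative to the start point


@dataclass
class FieldDef:
    name: str
    data_type: str   # one of the types in Table 1.1, e.g. "Numeric", "Alpha"
    sequence: int    # order of the field on the form
    box: Box         # location relative to the starting reference point
    numeric_domain: Optional[Tuple[int, int]] = None      # (min, max) for validation
    dictionary: List[str] = field(default_factory=list)  # words for post-processing

    def in_domain(self, value: int) -> bool:
        """Validate a numeric value against the registered domain, if any."""
        if self.numeric_domain is None:
            return True
        low, high = self.numeric_domain
        return low <= value <= high

    def closest_word(self, text: str) -> str:
        """Snap recognized text to the closest dictionary entry, if one is close enough."""
        matches = difflib.get_close_matches(text, self.dictionary, n=1, cutoff=0.6)
        return matches[0] if matches else text


@dataclass
class FormDefinition:
    start_ref: Tuple[int, int]  # starting reference point (top-left square)
    end_ref: Tuple[int, int]    # ending reference point (bottom-right square)
    fields: List[FieldDef]

    @property
    def num_fields(self) -> int:
        return len(self.fields)

    def in_sequence(self) -> List[FieldDef]:
        """Fields in the order in which they appear on the form."""
        return sorted(self.fields, key=lambda f: f.sequence)
```

For instance, a hypothetical "City" field with the dictionary `["Delhi", "Mumbai", "Chennai"]` would correct a misrecognized "Mumbal" to "Mumbai", while a hypothetical "Age" field with `numeric_domain=(15, 40)` would reject a recognized value of 70.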
Table 1.1: Data types considered for the system

S.No.  Data Type Name  Purpose
1.     Numeric         For storing digits only, as in the case of age, PIN code, phone number, etc.
2.     Alpha           For storing data consisting of alphabets only, such as parts of a name, city, state, etc.
3.     Alphanum        For storing data consisting of alphabets and numbers, such as house number, class, etc.
4.     Date            For storing date-type data, e.g. date of birth, date of joining, date of purchase, etc.
5.     Choice          Used where multiple options are available, as in the case of gender, which can be (1) male or (2) female.
6.     Picture         For sub-images in the form document, such as photographs, signatures, barcodes, etc.
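The data types in Table 1.1 suggest simple validation rules for recognized field values. This is a minimal sketch; the regular expressions, the DD/MM/YYYY date format, the default choice codes and the function name `is_valid` are assumptions for illustration, not rules stated in the paper. Picture fields are sub-images rather than text, so this sketch raises an error for them instead of validating them.

```python
import re
from datetime import datetime

# Hypothetical patterns for the text-based data types of Table 1.1.
PATTERNS = {
    "Numeric": re.compile(r"[0-9]+"),            # digits only: age, PIN code, phone number
    "Alpha": re.compile(r"[A-Za-z ]+"),          # alphabets only: name parts, city, state
    "Alphanum": re.compile(r"[A-Za-z0-9 /-]+"),  # letters and digits: house number, class
}


def is_valid(data_type: str, value: str, choices=("1", "2")) -> bool:
    """Check a recognized string against the assumed rule for its data type."""
    if data_type in PATTERNS:
        return PATTERNS[data_type].fullmatch(value) is not None
    if data_type == "Date":
        # Assumed format DD/MM/YYYY; strptime also rejects impossible dates.
        try:
            datetime.strptime(value, "%d/%m/%Y")
            return True
        except ValueError:
            return False
    if data_type == "Choice":
        # Option codes, e.g. (1) male or (2) female for gender.
        return value in choices
    raise ValueError(f"no text validation rule for data type {data_type!r}")
```

Using `datetime.strptime` rather than a digit pattern for dates means that a value such as 31/02/2015 is rejected even though every character is a valid digit.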
D. Digitization
Digitization means scanning the original paper-based form document and storing it as a digital image. Brightness, contrast and scanning resolution measured in terms of dots
Results
Field name and Extracted Image for: Date of Birth (Date, Month, Year); Roll No; Class; Department; Student's Middle Name; Student's Last Name; Father's First Name; Father's Middle Name; Father's Last Name; Mother's First Name; Mother's Middle Name

V. Conclusion