Introduction:
Some CAD programs can snap a line to a grid. In a similar way, Pac-n-Zoom® can "snap" things to a previously defined shape. Pac-n-Zoom snaps similar shapes to exactly the same shape to reduce the amount of noise in the image.

For example, if a page full of identical 'e's created on a word processor is printed out and scanned back in, the scanned 'e's will probably all be different. Among the many 'e's that were scanned back in, there would probably not be a single 'e' that was exactly correct, but they would all be within a known tolerance of being perfect (assuming that the printer, paper, and scanner were operating within their specified tolerances).

The golden files are a set of perfect shapes. When an imperfect shape is within an acceptable tolerance of a perfect shape, Pac-n-Zoom snaps away the imperfection (which is due to noise) to leave a perfect shape.
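
The snapping step can be sketched as a nearest-match search over the golden shapes. This is a hypothetical sketch, not Pac-n-Zoom's actual comparison: it assumes equal-sized binary bitmaps written as strings ('#' for black, '.' for white) and treats the tolerance as a maximum count of differing pixels. The names `snap`, `pixel_distance`, and `GOLDEN` are illustrative.

```python
# Hypothetical sketch of snapping a noisy shape to the closest golden shape.
# Assumptions (not from the Pac-n-Zoom documentation): shapes are equal-sized
# binary bitmaps written as strings ('#' = black, '.' = white), and the
# tolerance is a maximum number of differing pixels.
from typing import Dict, Optional, Tuple

Bitmap = Tuple[str, ...]


def pixel_distance(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Count differing pixels, or return None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def snap(shape: Bitmap, golden: Dict[str, Bitmap],
         tolerance: int) -> Tuple[Optional[str], Bitmap]:
    """Return (name, perfect shape) for the closest golden shape within
    tolerance, or (None, shape) unchanged if none is close enough."""
    best_name: Optional[str] = None
    best_dist = tolerance + 1
    for name, perfect in golden.items():
        dist = pixel_distance(shape, perfect)
        if dist is not None and dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is None:
        return None, shape
    return best_name, golden[best_name]


# Tiny hypothetical golden glyphs.
GOLDEN: Dict[str, Bitmap] = {
    "L": ("#..", "#..", "#..", "###"),
    "i": (".#.", ".#.", ".#.", ".#."),
}

noisy_l: Bitmap = ("#..", "##.", "#..", "###")  # one pixel of scanner noise
print(snap(noisy_l, GOLDEN, tolerance=2))
# → ('L', ('#..', '#..', '#..', '###'))
```

With a tolerance of 0 the same noisy shape is left untouched, which mirrors the rule that only shapes within an acceptable tolerance are snapped.
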

Exact conformation has three primary benefits:

1. Visual Appeal: We have all struggled with fuzzy FAXes and blurred images. It is hard to argue against a sharper image.

2. Better Compression: Many compression algorithms can tag a repeating shape so that it is stored only once, but as the example of the 'e's above shows, noise keeps shapes that were originally identical from repeating exactly. By storing or transmitting only perfect shapes, compression can be much higher. The exact gain depends on the tolerances of the noise sources (e.g., printer, paper, camera, camcorder, or scanner).

3. Mathematical Accuracy: The whole point of the blob compressor is to group shapes together to achieve mathematical consistency. With the golden shapes, we can deliver accurate mathematical shapes to the data tagger, which is an important step toward machines achieving human-like cognition.

   With accurate mathematical shapes provided to the data tagger and third-party support provided to the glider convolver, Pac-n-Zoom 2006 can become a graphical Rosetta Stone that converts pictures from one program into fully compatible files of an entirely different application.
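
The compression benefit can be illustrated with a symbol-dictionary scheme, in which each distinct shape is stored once and every occurrence on the page is just a reference to it. This scheme is an assumption chosen for illustration, not a description of the *.pz format, and the glyph bitmaps and the `build_dictionary` helper are hypothetical.

```python
# Hypothetical symbol-dictionary scheme used only to illustrate why snapping
# helps compression: each distinct shape is stored once, and every occurrence
# on the page becomes an index into that dictionary.
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]


def build_dictionary(shapes: List[Bitmap]) -> Tuple[List[Bitmap], List[int]]:
    """Store each distinct shape once; return (dictionary, reference list)."""
    dictionary: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    refs: List[int] = []
    for shape in shapes:
        if shape not in index_of:
            index_of[shape] = len(dictionary)
            dictionary.append(shape)
        refs.append(index_of[shape])
    return dictionary, refs


perfect: Bitmap = ("###", "#..", "##.", "#..", "###")  # hypothetical glyph

# Three scans of the same glyph, each with one different noisy pixel.
scanned: List[Bitmap] = [
    ("##.", "#..", "##.", "#..", "###"),
    ("###", "##.", "##.", "#..", "###"),
    ("###", "#..", "###", "#..", "###"),
]

noisy_dict, _ = build_dictionary(scanned)
snapped_dict, snapped_refs = build_dictionary([perfect] * len(scanned))
print(len(noisy_dict), len(snapped_dict), snapped_refs)
# → 3 1 [0, 0, 0]
```

Without snapping, every noisy copy needs its own dictionary entry; after snapping, one bitmap plus three small references describe the same glyphs.
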

In the golden database, the golden file shapes are divided into two groups: text and graphics. Graphics can be given names, and text is categorized by the following three fields:

1. Font: The golden files will come with a few fonts, and we expect to add more over time. By following the correct procedure, end users can add fonts as well.

2. Pitch: The pitch is stored as the maximum height and width of the characters across the entire character set.

3. Attributes: The person loading the golden files can specify as many attributes as desired.
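
One way to hold these categories in memory is sketched below. It is a hypothetical layout, not the golden file format: text shapes are keyed by font, pitch, attributes, and character; graphics are keyed by name; and pitch is computed, as described above, as the maximum glyph height and width over the whole character set. `GoldenDatabase`, `TextKey`, and the sample glyphs are illustrative names.

```python
# Hypothetical in-memory layout for golden shapes (not the golden file format).
# Text shapes are keyed by font, pitch, attributes, and character; graphics are
# keyed by name.
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Bitmap = Tuple[str, ...]


@dataclass(frozen=True)
class TextKey:
    font: str                   # e.g. "Courier"
    pitch: Tuple[int, int]      # (max height, max width) over the character set
    attributes: FrozenSet[str]  # e.g. frozenset({"bold", "italic"})
    char: str


@dataclass
class GoldenDatabase:
    text: Dict[TextKey, Bitmap] = field(default_factory=dict)
    graphics: Dict[str, Bitmap] = field(default_factory=dict)

    def add_font(self, font: str, attributes: FrozenSet[str],
                 glyphs: Dict[str, Bitmap]) -> Tuple[int, int]:
        """Load one character set and return its pitch."""
        if not glyphs:
            raise ValueError("a character set needs at least one glyph")
        # Pitch: maximum glyph height and width across the whole character set.
        pitch = (max(len(g) for g in glyphs.values()),
                 max(len(row) for g in glyphs.values() for row in g))
        for char, bitmap in glyphs.items():
            self.text[TextKey(font, pitch, attributes, char)] = bitmap
        return pitch

    def candidates(self, font: str, attributes: FrozenSet[str]) -> Dict[str, Bitmap]:
        """All golden characters for one font and attribute combination."""
        return {key.char: bitmap for key, bitmap in self.text.items()
                if key.font == font and key.attributes == attributes}


db = GoldenDatabase()
pitch = db.add_font("Courier", frozenset({"bold"}),
                    {"i": ("#", "#", "#"), "m": ("###", "#.#")})
db.graphics["company-logo"] = ("##", "##")  # graphics are simply named
print(pitch, sorted(db.candidates("Courier", frozenset({"bold"}))))
# → (3, 3) ['i', 'm']
```

Keying on font and attributes lets a matcher compare a cluster only against the relevant character set instead of every golden shape.
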

As shown in the diagram, if text from the image can be matched to text in a golden file, then, with the help of additional tools, a formatted page of text can be created from the scanned image.

Raster File Conversion:
Pac-n-Zoom® cannot currently read TIFF files. FAXes are almost always TIFF G3, and scanned images are typically TIFF G4, TIFF LZW, or JPEG. Of these formats, Pac-n-Zoom can only read JPEG. The TIFF formats are owned by Adobe, but Adobe allows the software and associated tools to be used without a license. As a result, a number of companies offer programs that convert TIFF to bitmap. A short Internet search turned up the following programs, which are probably a small fraction of the possibilities:

1. BMP Smartz from Smart Image Converter
2. 2Bitmap from fCoder Group
3. Advanced Batch Converter from Gold-Software Development
4. Able Fax Tif View from Graphic Region

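
Since only some incoming files need conversion, a batch job could check each file's leading signature bytes before deciding whether to run one of these converters. The sketch below is a hypothetical helper (`detect_format` and `needs_conversion` are illustrative names, and the routing rule is an assumption); the byte signatures themselves are the standard ones for JPEG, TIFF, and BMP.

```python
# Hypothetical helper: identify a raster file by its leading "magic" bytes so
# that TIFF files (e.g. FAX G3/G4, LZW) can be sent to a TIFF-to-bitmap
# converter first. The signatures are the standard ones for each format.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),  # JPEG start-of-image marker
    (b"II*\x00", "tiff"),       # TIFF, little-endian byte order ("II", 42)
    (b"MM\x00*", "tiff"),       # TIFF, big-endian byte order ("MM", 42)
    (b"BM", "bmp"),             # Windows bitmap
]


def detect_format(header: bytes) -> str:
    """Return 'jpeg', 'tiff', 'bmp', or 'unknown' for the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def needs_conversion(path: str) -> bool:
    """True if the file is TIFF and therefore has to be converted first."""
    with open(path, "rb") as f:
        return detect_format(f.read(8)) == "tiff"


print(detect_format(b"II*\x00\x08\x00\x00\x00"))  # → tiff
print(detect_format(b"\xff\xd8\xff\xe0"))          # → jpeg
```

Checking signatures rather than file extensions avoids trusting a misnamed file, such as a TIFF FAX saved with a ".jpg" suffix.
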
Segmentation: If the page is black and white (e.g., black text on a white background), threshold segmentation should be used. Threshold segmentation is a very simple (and therefore fast) segmentation: if a pixel is lighter than gray, it is set to white, and if it is darker, it is set to black. If there are any colors or shades of gray, full color segmentation should be used.
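
Threshold segmentation can be sketched in a few lines. This is a hypothetical illustration that works on 8-bit grayscale values only (detecting true color would need the RGB channels); the mid-gray cut-off of 128 and the "slack" rule for deciding that a page is black and white are assumptions, not documented Pac-n-Zoom settings.

```python
# Hypothetical sketch of threshold segmentation on an 8-bit grayscale page,
# stored as rows of values from 0 (black) to 255 (white). The mid-gray
# cut-off of 128 and the "slack" rule for spotting a black-and-white page
# are assumptions, not documented Pac-n-Zoom settings.
from typing import List

Gray = List[List[int]]
BLACK, WHITE = 0, 255


def threshold_segment(image: Gray, cutoff: int = 128) -> Gray:
    """Pixels at or above the cut-off become white; darker pixels become black."""
    return [[WHITE if p >= cutoff else BLACK for p in row] for row in image]


def choose_segmentation(image: Gray, slack: int = 32) -> str:
    """Treat the page as black and white if every pixel is within `slack`
    of pure black or pure white; otherwise fall back to full color."""
    black_and_white = all(p <= BLACK + slack or p >= WHITE - slack
                          for row in image for p in row)
    return "threshold" if black_and_white else "full color"


page = [[250, 12, 241],
        [8, 230, 30]]
print(choose_segmentation(page))  # → threshold
print(threshold_segment(page))    # → [[255, 0, 255], [0, 255, 0]]
```

Each pixel is examined once with a single comparison, which is why threshold segmentation is so fast.
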

Golden Database: The external database is optional, and it is not supported at the present time. A number of golden files can be loaded by Pac-n-Zoom when it launches, and if the document has a relatively homogeneous font set (such as a legal document), these files can be enough. However, all of these files are loaded into system memory, even when they are not needed by the document currently being processed.

Since all of these fonts are held in raster form, a single font (such as Courier) with various pitches and attributes might require 500 files (or the equivalent amount of memory grouped into fewer files). At that rate, 100 different fonts require 50,000 files. If each file requires 50 KBytes of system memory, 2.5 GBytes of system memory are needed. Because some commonality will be found between the fonts, this number would be reduced to about 900 MBytes. An external database reduces the amount of system memory that is required.
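
The arithmetic behind this estimate can be reproduced directly. The sketch assumes decimal units (1 KByte = 1,000 bytes, 1 GByte = 10^9 bytes); the defaults of 500 files per font and 50 KBytes per file are the illustrative figures from the text, and `golden_memory_bytes` is a hypothetical name.

```python
# Reproduces the memory estimate in decimal units (1 KByte = 1,000 bytes,
# 1 GByte = 1,000,000,000 bytes). The default 500 files per font and
# 50 KBytes per file are the text's illustrative figures.

def golden_memory_bytes(fonts: int, files_per_font: int = 500,
                        kbytes_per_file: int = 50) -> int:
    """Memory needed if every golden file for every font is loaded at launch."""
    return fonts * files_per_font * kbytes_per_file * 1_000


files = 100 * 500
total = golden_memory_bytes(100)
print(files, "files,", total / 1e9, "GBytes")  # → 50000 files, 2.5 GBytes
```

Because memory grows linearly with the number of fonts, every font added to the launch set costs another 25 MBytes under these assumptions.
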

By adjusting the flags and attributes in the configuration file, the database can improve the ability to harvest data from a graphic file.

Raster Files: By storing the files in an *.html wrapper, we gain the following benefits:

1. Formatted: Pac-n-Zoom will retain the original format within the acceptable tolerance.

2. Compressed: The *.html file will contain all the recognized text and the original *.pz file, so a small size penalty is paid for the text and the *.html overhead.

3. Accessible: With the *.html format, the document can be placed on the World Wide Web (WWW or Internet). The document can be viewed with a standard Internet browser, provided the browser has the free Pac-n-Zoom plug-in.

4. Secure: The normal Internet security procedures can be used.

5. Searchable: When the document is on the Internet, the viewer's favorite Internet search engine can be used. If the file is inside an intranet, the provider's search engine(s) can be used.

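
A wrapper of this kind might look like the sketch below: the recognized text sits in the page body, where browsers and search engines can read it, and the original *.pz file is referenced through an `<embed>` element for the viewer plug-in. This is a hypothetical illustration; the MIME type "application/x-pac-n-zoom" and the `build_wrapper` function are placeholders, not documented Pac-n-Zoom values.

```python
# Hypothetical sketch of an *.html wrapper. The recognized text goes into the
# page body, where browsers and search engines can read it, and the original
# *.pz file is referenced through an <embed> element for the viewer plug-in.
# The MIME type "application/x-pac-n-zoom" is a placeholder, not a documented
# value.
import html


def build_wrapper(title: str, recognized_text: str, pz_file: str) -> str:
    """Return an HTML page that carries the recognized text and the *.pz file."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body>\n"
        f'<embed src="{html.escape(pz_file)}" type="application/x-pac-n-zoom">\n'
        f"<div>{html.escape(recognized_text)}</div>\n"
        "</body>\n"
        "</html>\n"
    )


page = build_wrapper("Contract", "Total due: $1,250 & fees", "contract.pz")
print(page)
```

Escaping the recognized text with `html.escape` keeps characters such as "&" or "<" from a scanned page from breaking the markup.
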
*.pzh Type 3: When the blob compressor snaps an image cluster to a golden cluster, the image cluster inherits the golden cluster's attributes. Besides the golden attributes, the *.pzh file always provides the following information:

1. Size: The maximum pixel height and width of the cluster are given.

2. Location: The row and column of the initial (topmost, then leftmost) pixel of the cluster are given.

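
Both fields can be derived from a cluster's pixels. In this hypothetical sketch the cluster is a list of (row, column) pairs and "size" is taken as the bounding-box height and width; the representation and the function name are assumptions, not the *.pzh encoding.

```python
# Hypothetical sketch: derive the size and location fields for one cluster,
# given its black pixels as (row, column) pairs. "Size" is taken to be the
# bounding-box height and width.
from typing import Iterable, Tuple

Pixel = Tuple[int, int]


def cluster_size_and_location(pixels: Iterable[Pixel]) -> Tuple[Tuple[int, int], Pixel]:
    pts = list(pixels)
    if not pts:
        raise ValueError("empty cluster")
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    size = (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)  # (height, width)
    # Tuples compare row first, then column: the topmost, then leftmost pixel.
    location = min(pts)
    return size, location


# A small diagonal stroke: (7, 7) is the leftmost pixel overall, but the
# initial pixel is the topmost one, and the leftmost among those.
print(cluster_size_and_location([(5, 9), (6, 8), (7, 7), (5, 10)]))
# → ((3, 4), (5, 9))
```
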
The author of the golden file decides how many golden cluster attributes are included, but the following are some of the more common ones:

1. Text: The letter, number, or other text character is specified.

2. Font: The style of the character (e.g., Times Roman or Courier) is specified.

3. Pitch: The size of the letter (e.g., 10, 12, etc.) is specified. Since the size of the cluster is given in pixels, the pitch can be estimated from the page size and the number of dots (both height and width) used by the output device. If resizable fonts are used, the application file can resize the image in a limited way. Without raster-to-vector conversion, however, the graphics will not resize gracefully.

4. Attribute: The emphasis of the letter (e.g., bold, italics, or underlined) is specified. If the builder of the golden file is thorough, combinations of attributes (e.g., bold and italics) are included as well.

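
The pitch estimate from pixel size can be worked through as follows. The device resolution is its dot count divided by the page dimension, and a point is 1/72 inch. This sketch assumes pitch is expressed in points and treats the cluster height as the font size, which is a simplification because fonts include space above and below the glyphs; `estimate_points` is a hypothetical name.

```python
# Worked example: estimate a character's size in points from its pixel
# height. Assumes pitch is expressed in points (1/72 inch) and that the
# cluster height approximates the font size (a simplification).
POINTS_PER_INCH = 72


def estimate_points(cluster_px: int, page_inches: float, page_dots: int) -> float:
    dpi = page_dots / page_inches          # device resolution, dots per inch
    return cluster_px / dpi * POINTS_PER_INCH


# An 11-inch page height scanned as 3300 dots (300 dpi); a 50-pixel cluster:
print(round(estimate_points(50, 11.0, 3300), 2))  # → 12.0
```

The same calculation applies to width, using the page width and the horizontal dot count.
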
The type 3 *.pzh file contains all the graphics needed to reconstruct the image, and in addition, it contains attributes for the clusters that were snapped to a golden cluster.

Conversion Program: Accelerated I/O has no intention of writing this program, but such a conversion program would have some utility across the corporate world, as the following uses show:

1. Editing: A FAXed contract could be edited and FAXed back.

2. Checking: When the numbers on a page need to be audited, this program could load them from the page into a spreadsheet, which could be a real time saver for some professions.

3. Conformity: In a world where the customer is always right, input from customers can arrive in several different ways. If the information is on a paper form, the program can gather the different fields from the form to keep the database consistent with information gathered by electronic means (such as a web site).

Application Files: A number of different application files (such as spreadsheets, databases, and word processors) would benefit from being loaded with a FAX or scanned image. By cleaning most of the noise out of an image, Pac-n-Zoom can help turn scanned information into usable computer data.