Hand-Writing Recognition Software
Java Jive
2019-10-05 15:59:57 UTC
Cross-posting to one Linux and one Windows group, because I'm happy to
run a workable solution on either OS.

As per the post's title, I have a need for some software, preferably
free or at least low cost, to run on either OS, which can recognise
handwriting. The context is that I've inherited a trunkful of old
Macfarlane documents, nearly all hand-written, which I'm scanning my way
through, having completed probably three-quarters or more of it.

The documents consist of:

1) Some old deeds, wills, etc written on parchment (stretched animal
skin). These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly over
all the joins. This is the stage I'm currently at, and it's taking a
long time because the business of joining the images together is very,
very fiddly. It takes an entire evening to create a single image of
such a document, then, currently, after each is done, I convert it into
text manually, as that's the best way to ensure that the resulting
complete image is fully readable, and even this latter can take another
evening just for one document.

2) Many family letters

3) A few pages of accounts

4) Family trees

5) Log books of sea-voyages, diaries of holidays, hand-written books
containing historical research notes, etc.

6) Many loose pages of notes concerning family history, clan history,
and Scottish history, nearly all of which are on foolscap pages, which
again had to be scanned in two sections to capture the top and bottom of
the page and then joined together to form one image.

Some of the notes were in pencil, which required me to alter the
exposure settings. Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.

As I'm currently nearing the end of the scanning phase, next will be
sorting the documents into boxes and the files into sub-directories -
of course they are already, because I was trying to make sense of it all
as I worked through it, but only after I know entirely what is there
will I be able to finalise how best it should be divided up into
meaningful chunks of information. After that will come the probably
much longer phase of converting as much of it as possible into
searchable text. It's this last stage I really would like to shorten as
much as possible, because I'd like to have a chance of finishing it
before I die !-)

So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
Cybe R. Wizard
2019-10-05 18:49:36 UTC
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins. This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.

Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
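To give a feel for what "the seam between the images is invisible" involves, here is a minimal sketch of the simplest form of the idea: a linear cross-fade (feather) across the strip where two side-by-side tiles overlap. This is not Enblend's algorithm, which blends at several resolutions and chooses the seam line itself; it only shows why a gradual hand-over hides a join better than a hard cut. The tiles are assumed to be already aligned lists of 8-bit greyscale rows, and `feather_overlap` / `feather_tiles` are hypothetical helpers.

```python
def feather_overlap(left_row, right_row, overlap):
    """Join one row of a left tile to the matching row of a right tile.

    The last `overlap` pixels of left_row are assumed to show the same part
    of the document as the first `overlap` pixels of right_row. Across that
    strip the weight slides linearly from the left tile to the right tile,
    so there is no hard edge where one tile stops and the next begins.
    """
    if overlap < 1 or overlap > min(len(left_row), len(right_row)):
        raise ValueError("overlap must fit inside both rows")
    blended = []
    for i in range(overlap):
        # w runs from nearly 1 (mostly left tile) down to nearly 0 (mostly right tile)
        w = 1.0 - (i + 0.5) / overlap
        a = left_row[len(left_row) - overlap + i]
        b = right_row[i]
        blended.append(round(w * a + (1.0 - w) * b))
    return left_row[:-overlap] + blended + right_row[overlap:]


def feather_tiles(left_tile, right_tile, overlap):
    """Apply feather_overlap to every row of two equally tall tiles."""
    if len(left_tile) != len(right_tile):
        raise ValueError("tiles must have the same number of rows")
    return [feather_overlap(l, r, overlap) for l, r in zip(left_tile, right_tile)]
```

In a real stitch the overlap width and offsets come from the alignment step; this only covers the blending part.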
--
Cybe R. Wizard

My other computer is a HOLMES IV with the Mycroft OS
My other car is a Chandler MetalSmith Mark III
Wildman
2019-10-06 06:14:24 UTC
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins. This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
I have used it and it works pretty well.
--
<Wildman> GNU/Linux user #557453
The cow died so I don't need your bull!
Java Jive
2019-10-06 12:43:23 UTC
Post by Cybe R. Wizard
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
Yes, my Canon S40 came with stitching software, but it doesn't recognise
Portable Network Graphics format (*.png), which is a losslessly
compressed bitmap, and I don't want to use JPEGs because they're lossy.
I may well try some of the other suggestions made here tonight though.
I could probably install Emblend on another laptop which can dual-boot
into Linux or W7, and see if I can stitch the images over the network,
and I already have Irfanview installed on all the Windows PCs, but have
yet to get entirely to grips with it.
Paul
2019-10-06 17:41:25 UTC
Post by Java Jive
Post by Cybe R. Wizard
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
Yes, my Canon S40 came with stitching software, but it doesn't recognise
Portable Network Graphics format (*.png), which is a losslessly
compressed bitmap, and I don't want to use JPEGs because they're lossy.
I may well try some of the other suggestions made here tonight though. I
could probably install Emblend on another laptop which can dual-boot
into Linux or W7, and see if I can stitch the images over the network,
and I already have Irfanview installed on all the Windows PCs, but have
yet to get entirely to grips with it.
You can try out Microsoft ICE. If the result is big enough,
you get to output in "Photoshop Big" format (to handle pixel
counts too high on W x H). I tried this out several years
ago, with one output over 4 billion pixels. (You view it
in Irfanview... or, the computer runs out of RAM :-) )

https://www.microsoft.com/en-us/research/product/computational-photography-applications/image-composite-editor/

Needs scan overlap to work.
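As a rough planning aid for how many overlapping scans a large sheet needs: if the glass covers g mm in one direction and neighbouring scans overlap by a fraction o, the first scan covers g mm and each further scan adds g * (1 - o) mm of new material. The figures below (a typical 216 x 297 mm flatbed, 25% overlap, a hypothetical 600 x 700 mm parchment) are illustrative assumptions, not measurements from the thread.

```python
import math


def scans_needed(doc_mm, glass_mm, overlap=0.25):
    """Scans needed along one axis so that neighbours overlap by `overlap`."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    if doc_mm <= glass_mm:
        return 1
    step = glass_mm * (1 - overlap)  # new material added by each extra scan
    return 1 + math.ceil((doc_mm - glass_mm) / step)


def grid(doc_w, doc_h, glass_w=216, glass_h=297, overlap=0.25):
    """(columns, rows) of scans for a doc_w x doc_h mm sheet."""
    return (scans_needed(doc_w, glass_w, overlap),
            scans_needed(doc_h, glass_h, overlap))


cols, rows = grid(600, 700)
print(f"{cols} x {rows} = {cols * rows} scans")
```

For these assumed sizes it gives a 4 x 3 grid of 12 scans, which is within the 9-16 range mentioned at the start of the thread; more overlap costs extra scans but gives the stitcher more to match.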

Recognition of handwriting, on the other hand, is a tougher problem
to handle. The claims I can find are that such software can't handle
free-form cursive. Only well-controlled situations are
anywhere near to recognition (hand lettering on gridded paper).
And the slightest bit of "pixel noise" in the image
completely destroys the neural network method. The NN
can pick letters out of pure noise that humans can't
"see", making it a bit of a joke. This means that if you
did have candidate software to test, you'd need to
"threshold" your scans meticulously and remove all
noise around the cursive. Regular OCR benefits from
this too.
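One standard way to choose the threshold automatically is Otsu's method (a swapped-in technique; no particular method was named here): try every grey level and keep the one that best separates the histogram into "ink" and "paper" classes. A minimal pure-Python sketch on an 8-bit greyscale image held as a list of rows. In practice a library routine such as scikit-image's `skimage.filters.threshold_otsu` would be the usual choice, and faint pencil on the same page as ink may need a local (adaptive) threshold rather than one global value.

```python
def otsu_threshold(pixels):
    """Grey level t (0-255) that best splits pixels into dark and light classes.

    Otsu's method: keep the t maximising the between-class variance
    w_lo * w_hi * (mean_lo - mean_hi) ** 2, where the classes are
    pixels <= t (ink) and pixels > t (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_lo = 0    # number of pixels at or below t
    sum_lo = 0  # sum of their grey levels
    for t in range(256):
        w_lo += hist[t]
        sum_lo += t * hist[t]
        w_hi = total - w_lo
        if w_lo == 0 or w_hi == 0:
            continue  # one class is empty, so this is not a real split
        mean_lo = sum_lo / w_lo
        mean_hi = (sum_all - sum_lo) / w_hi
        between = w_lo * w_hi * (mean_lo - mean_hi) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarise(image, t=None):
    """Map an 8-bit greyscale image (list of rows) to black ink on white."""
    if t is None:
        t = otsu_threshold([p for row in image for p in row])
    return [[0 if p <= t else 255 for p in row] for row in image]
```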

Paul
NY
2019-10-07 18:59:35 UTC
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins. This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
I've used PanaVue V2.10 (dates back to 2002 but it's still good) for
stitching "tiled" scans together - eg of posters (A3 and larger) or
panoramic school photos. It copes well with scans which are not all exactly
parallel - eg if one part has rotated a few degrees. If the original is
small enough that you can always butt one edge or the other against the side
of the scanner glass, then you can get all the scans parallel without
rotation, but a larger original will require at least one row of scans where
there is no edge to align with the side of the scanner, so that will have
all manner of rotations and the software needs to be able to cope with it.
Always scan at the highest resolution that you can, even if you then reduce
the resolution of the final assembled image.

One thing: see if your scanner has a way of turning off auto exposure so all
the scans are guaranteed to be at the same exposure. This makes it easier to
match different tiles without such obvious joins.
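If auto exposure can't be turned off, a crude after-the-fact fix is to scale one tile so that its overlap strip has the same average brightness as its neighbour's. A minimal sketch, assuming the overlapping pixels have already been paired up (for example by the stitcher's alignment) and that one multiplicative gain per tile is a good enough model; stitchers such as Hugin have their own, more elaborate photometric optimisation of exposure and vignetting.

```python
def exposure_gain(ref_overlap, tile_overlap):
    """Gain that makes tile_overlap's mean brightness match ref_overlap's."""
    if len(ref_overlap) != len(tile_overlap) or not ref_overlap:
        raise ValueError("need two equal-length, non-empty overlap samples")
    mean_ref = sum(ref_overlap) / len(ref_overlap)
    mean_tile = sum(tile_overlap) / len(tile_overlap)
    if mean_tile == 0:
        raise ValueError("tile overlap is completely black")
    return mean_ref / mean_tile


def apply_gain(tile, gain):
    """Scale every 8-bit pixel of a tile (list of rows), clipping to 0..255."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in tile]
```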

It can also do the more difficult job of warping photos for a true panorama,
though I think Photoshop Elements (eg V11) does a better job with fewer
mismatches.


Have you also thought about using a conventional camera to expose the whole
original in one go? You need to hold the document flat to avoid the creases
showing and you need very even lighting. You may also find that the camera
doesn't have enough resolution to reproduce the text properly (eg an A3
original with small handwriting may require more than the couple of
thousand pixels on each axis that a camera can achieve). But it's worth a
try.
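To put numbers on the resolution point: the pixels needed along each side are the side length in inches times the resolution in dots per inch. A3 is 297 x 420 mm; the dpi values below are examples, not figures from the thread.

```python
MM_PER_INCH = 25.4


def pixels_needed(width_mm, height_mm, dpi):
    """Pixel dimensions needed to capture a sheet at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


for dpi in (150, 300, 600):
    w, h = pixels_needed(297, 420, dpi)
    print(f"A3 at {dpi} dpi: {w} x {h} px = {w * h / 1e6:.1f} megapixels")
```

At 300 dpi that is 3508 x 4961 pixels, about 17 megapixels, whereas 2000 pixels across the short side of A3 works out to only about 170 dpi.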
Java Jive
2019-10-08 18:41:29 UTC
Post by NY
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins.  This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly.  It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software.  I have used
several and all do the job, some better, some faster.  It might well be
worth your time to  look into.
I've used PanaVue V2.10 (dates back to 2002 but it's still good) for
stitching "tiled" scans together - eg of posters (A3 and larger) or
panoramic school photos.
Downloaded a trial version, but it was spectacularly bad at this:
www.macfh.co.uk/Temp/PanaVueIA-Auto.jpg
www.macfh.co.uk/Temp/PanaVueIA-Manual.jpg
NY
2019-10-08 19:13:10 UTC
Post by Java Jive
Post by NY
I've used PanaVue V2.10 (dates back to 2002 but it's still good) for
stitching "tiled" scans together - eg of posters (A3 and larger) or
panoramic school photos.
www.macfh.co.uk/Temp/PanaVueIA-Auto.jpg
www.macfh.co.uk/Temp/PanaVueIA-Manual.jpg
I can understand the auto stitching having problems, but usually manual
stitching makes a pretty good job of it, as long as the corresponding marks
(the cross, triangle, circle etc) are placed on the same point on two
overlapping images. Sorry to send you up a blind alley. I'm sure some of the
more modern packages make a better job of it automatically. Photoshop
Elements does remarkably well, with no means of giving it any manual help,
though I've never tried it for flat originals where it only needs to handle
sideways or up/down translations, as opposed to warping the image for
rotating-camera panoramas.
Java Jive
2019-10-10 15:21:24 UTC
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins. This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
For the last two days, besides trying to continue the work as previously
described, I've been spending some time trying out Linux-based options,
and have some questions.

I've tried scanning with xsane, but, although it does at least use the
full scanning glass area, I hate it - it's another of those programs
that makes a simple job complex:

:-( It plasters the screen with multiple windows, which is a GUI
paradigm I absolutely hate. There is bewildering clutter all over the
screen, and multiple icons for the same program fill up the taskbar and
you've no idea which is which - there is one for the program, one for
each of various controls, and one for every scan that you do, which, if
you don't want to keep it, needs to be closed before trying another scan
in order to avoid confusion.

:-( There is no quick way, for example a keystroke, to differentiate
between doing a draft scan and a quality scan.

:-( There doesn't seem to be any simple way of telling it to do a
greyscale scan, certainly I've not found out how to.

So I guess I'll stick with the HP Windows scanning software for now.

Enblend, not Emblend, seems to be a command-line tool, but its help
output doesn't suggest how to use it for merging a grid of scans as
described in my previous posts here. If anyone can suggest some
suitable command-lines with an explanation of what they mean, that might
be very helpful.
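On the Enblend question: Enblend only blends images that have already been aligned and remapped onto a common canvas; it does not find the overlaps or work out where each tile goes. That work is normally done by Hugin's own command-line tools, which are typically installed along with the GUI (on Debian/Ubuntu in the hugin-tools package). Below is a sketch of such a pipeline, driven from Python so that each step can carry a comment. The program names are real Hugin/Enblend tools, but options differ between versions, so treat the flags as assumptions to check against each tool's `--help`; the file names (`tile_*.png`, `project.pto`, `stitched`) are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def stitch_commands(images, project="project.pto", prefix="stitched"):
    """Build argv lists for a command-line Hugin stitch of scanned tiles."""
    images = [str(p) for p in images]
    # nona is assumed to write one remapped TIFF per input: <prefix>_0000.tif, ...
    remapped = [f"{prefix}_{i:04d}.tif" for i in range(len(images))]
    return [
        # 1. New project; -p 0 = rectilinear, and a small HFOV (-f, degrees)
        #    mimics the very long focal length used in the GUI for flat scans.
        ["pto_gen", "-p", "0", "-f", "2", "-o", project, *images],
        # 2. Find control points; --multirow suits a grid rather than one strip.
        ["cpfind", "--multirow", "-o", project, project],
        # 3. Discard control points that fit badly.
        ["cpclean", "-o", project, project],
        # 4. Optimise tile positions (-a) and exposure (-m).
        ["autooptimiser", "-a", "-m", "-o", project, project],
        # 5. Rectilinear output, canvas size and crop chosen automatically.
        ["pano_modify", "--projection=0", "--canvas=AUTO", "--crop=AUTO",
         "-o", project, project],
        # 6. Remap each tile onto the shared canvas as a masked TIFF.
        ["nona", "-m", "TIFF_m", "-o", f"{prefix}_", project],
        # 7. Only now does Enblend come in: blend the remapped tiles.
        ["enblend", "-o", f"{prefix}.tif", *remapped],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(stitch_commands(sorted(Path(".").glob("tile_*.png"))))
```

With `dry_run=True` it only prints the commands, which is a safe way to inspect the pipeline before running it. Recent Hugin versions also include `hugin_executor`, which can carry out the stitching stage of a saved project in one go.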

Hugin Panorama Creator seems to be a desperately complicated means of
accomplishing something essentially simple in concept, even if the
details are more complex. Other stitching programs ...
1) Ask for a bunch of files
2) Take in some layout details
(rows x columns or drag thumbnails around)
3) Optionally enter manual control points
4) Stitch
... but in Hugin I have no idea how to perform step 2 manually, and why
do I need a 'lens' when these are scans, not photographs? The result below,
though in many ways very good, certainly by far the best software effort
yet made, and certainly a lot quicker than doing it manually, has
pincushion distortion presumably because I have incorrect 'lens'
settings. Again, any further explanation from someone knowledgeable
would be helpful.

www.macfh.co.uk/Temp/Hugin.png (21MB)
www.macfh.co.uk/Temp/Manual.png (12MB)
William Unruh
2019-10-10 19:04:00 UTC
Post by Java Jive
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins. This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
For the last two days, besides trying to continue the work as previously
described, I've been spending some time trying out Linux-based options,
and have some questions.
I've tried scanning with xsane, but, although it does at least use the
full scanning glass area, I hate it - it's another of those programs
So tell it not to open all those options.

If you close say the preview window before closing the xsane, then the
preview window will not come up next time. Similarly with the other
windows.

Also if you do not want to keep a scan, close the window. You seem to be
getting annoyed for the sake of getting annoyed. Do you want to
accomplish the job or do you want to complain?
If you keep it open then you WILL clutter your workspace. The only
windows I open are the main scanning window and the preview window since
both are usually useful. (Also, because there is a bug in xsane, the
preview window will not open properly if it is not opened when xsane
starts. If you have that problem, open preview, close xsane, and then
reopen it. Now the preview window will behave itself.)
Post by Java Jive
:-( It plasters the screen with multiple windows, which is a GUI
paradigm I absolutely hate. There is bewildering clutter all over the
screen, and multiple icons for the same program fill up the taskbar and
you've no idea which is which - there is one for the program, one for
each of various controls, and one for every scan that you do, which, if
you don't want to keep it, needs to be closed before trying another scan
in order to avoid confusion.
:-( There is no quick way, for example a keystroke, to differentiate
between doing a draft scan and a quality scan.
What are you talking about? They look entirely different.
Post by Java Jive
:-( There doesn't seem to be any simple way of telling it to do a
greyscale scan, certainly I've not found out how to.
Push the "Greyscale" or Colour button and choose from the options.
Post by Java Jive
So I guess I'll stick with the HP Windows scanning software for now.
Go ahead.
Java Jive
2019-10-10 20:44:43 UTC
Post by William Unruh
Post by Java Jive
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly
over all the joins. This is the stage I'm currently at, and it's
taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create
a single image of such a document, then, currently, after each is
done, I convert it into text manually, as that's the best way to
ensure that the resulting complete image is fully readable, and even
this latter can take another evening just for one document.
For this part you might try some panorama-making software. I have used
several and all do the job, some better, some faster. It might well be
worth your time to look into.
Sample: Emblend (from the description in Synaptic (Debian Linux Sid))
Enblend is a tool for compositing images. Given a set of images that
overlap in some irregular way, Enblend overlays them in such a way that
the seam between the images is invisible, or at least very difficult to
see. It can, for example, be used to blend a panorama composed of
several images.
For the last two days, besides trying to continue the work as previously
described, I've been spending some time trying out Linux-based options,
and have some questions.
I've tried scanning with xsane, but, although it does at least use the
full scanning glass area, I hate it - it's another of those programs
So tell it not to open all those options.
If you close say the preview window before closing the xsane, then the
preview window will not come up next time. Similarly with the other
windows.
Also if you do not want to keep a scan, close the window. You seem to be
getting annoyed for the sake of getting annoyed. Do you want to
accomplish the job or do you want to complain?
You in turn seem to want just to complain about my complaining.

I want to accomplish the job without the software giving me cause to
complain - vomiting windows all over my desktop is a valid cause for
complaint. Whatever happened to the efficient paradigm that I've worked
with for 30 years since Windows 2 that an app launched in a single
window (which, incidentally, had its control buttons top right, not top
left), and everything was contained in the one window?
Post by William Unruh
If you keep it open then you WILL clutter your workspace. The only
windows I open are the main scanning window and the preview window since
both are usually useful. (Also, because there is a bug in xsane, the
preview window will not open properly if it is not opened when xsane
starts. If you have that problem, open preview, close xsane, and then
reopen it. Now the preview window will behave itself.)
Ah! A bug, kind of goes with an ill-thought out interface ...
Post by William Unruh
Post by Java Jive
:-( It plasters the screen with multiple windows, which is a GUI
paradigm I absolutely hate. There is bewildering clutter all over the
screen, and multiple icons for the same program fill up the taskbar and
you've no idea which is which - there is one for the program, one for
each of various controls, and one for every scan that you do, which, if
you don't want to keep it, needs to be closed before trying another scan
in order to avoid confusion.
:-( There is no quick way, for example a keystroke, to differentiate
between doing a draft scan and a quality scan.
What are you talking about? They look entirely different.
In HP's Windows software, <Ctrl-N> starts a new scan (always draft),
<Ctrl-S> saves the current scan at the current resolution settings
(which could also be draft, but are more likely to be higher resolution)
- just to be clear, a dialog box then opens to set the output
filename and once this is done the scanner operates a second time more
slowly to take the higher resolution scan to save under the chosen name.
AFAICT, there is no simple equivalent in xsane; you actually have to
go and change the setting for the desired resolution, thus giving the
possibility that, as you are constantly changing the settings between
draft and your chosen higher resolution, sooner or later you'll do the
latter incorrectly, and thus save parts of a large document at different
resolutions.
Post by William Unruh
Post by Java Jive
:-( There doesn't seem to be any simple way of telling it to do a
greyscale scan, certainly I've not found out how to.
Push the "Greyscale" or Colour button and choose from the options.
Can't see any such button with any such option - there's a colour drop
down list with various options for different types of source (photographic
negative, slide, etc), but greyscale is not an option there.
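For anyone who would rather avoid xsane's GUI altogether, SANE also ships the command-line `scanimage`, and fixed presets make it impossible to save part of a document at the wrong resolution. A sketch, assuming the common option names `--mode` (with a `Gray` value) and `--resolution`; these options come from the scanner's backend, so list the real ones with `scanimage --help` (or `scanimage -A`) first, and find the device name with `scanimage -L`. The preset values and file names are hypothetical.

```python
import subprocess

# Fixed presets: drafts are always quick and coarse, finals always the same
# resolution, so every tile of one document ends up at the same dpi.
# "Gray" and the dpi values are assumptions; check what your backend offers.
PRESETS = {
    "draft": {"mode": "Gray", "resolution": 75},
    "final": {"mode": "Gray", "resolution": 600},
}


def scan_command(preset, device=None):
    """Build the scanimage argv for a preset (TIFF output is lossless)."""
    settings = PRESETS[preset]
    argv = ["scanimage", "--format=tiff",
            "--mode", settings["mode"],
            "--resolution", str(settings["resolution"])]
    if device:
        argv[1:1] = ["-d", device]  # a name as reported by `scanimage -L`
    return argv


def scan(preset, out_path, device=None):
    """Run one scan; scanimage writes the image to stdout, saved to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(scan_command(preset, device), stdout=out, check=True)
```

Usage would be along the lines of `scan("draft", "preview.tif")` to check placement, then `scan("final", "tile_1a.tif")` for the keeper.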
Post by William Unruh
Post by Java Jive
So I guess I'll stick with the HP Windows scanning software for now.
Go ahead.
Has anyone anything *useful* to add, especially about Hugin Panorama
Creator, because I'm tantalisingly close with that? From basic physics
I deduced that if I wanted the pictures to be 'flat', without barrel or
pincushion distortion, I needed to increase the focal length, preferably
to infinity, so I tried various powers of 2, beginning at 2^10 = 1024,
but this and 512 caused the program to crash, so I had to settle for
256. The result still has pincushion distortion, but less noticeably so
than the example given above. It's probably acceptable, but I'd like to
get rid of it altogether if it's possible to do so.
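The focal-length reasoning above can be put into rough numbers. The model is an assumption, not something Hugin documents in these terms. By default Hugin treats each image as a photo from a camera turned about a fixed point. A rectilinear lens of focal length f on a sensor of width w (36 mm if the crop factor is left at 1) covers a horizontal field of view of 2 * atan(w / (2f)). A rectilinear output stretches content at yaw angle psi horizontally by about 1 / cos^2(psi) relative to the centre, the kind of growth towards the edges that looks like pincushion distortion. The sketch estimates that stretch at the outermost column of a 4-column grid, assuming 20% overlap between neighbours.

```python
import math

SENSOR_WIDTH_MM = 36.0  # full-frame width, i.e. crop factor 1 (assumed)


def hfov_deg(focal_mm):
    """Horizontal field of view of a rectilinear lens, in degrees."""
    return math.degrees(2 * math.atan(SENSOR_WIDTH_MM / (2 * focal_mm)))


def edge_stretch(focal_mm, columns=4, overlap=0.2):
    """Horizontal stretch at the outermost column's centre, relative to the middle.

    Tile centres are assumed to be hfov * (1 - overlap) apart in yaw, and a
    rectilinear output stretches horizontally by 1 / cos^2(yaw).
    """
    step = hfov_deg(focal_mm) * (1 - overlap)
    outer_yaw = math.radians(step * (columns - 1) / 2)
    return 1 / math.cos(outer_yaw) ** 2


for f in (256, 1000):
    print(f"f = {f:4d} mm: HFOV {hfov_deg(f):5.2f} deg, "
          f"edge stretch {100 * (edge_stretch(f) - 1):.2f}%")
```

Under this model the stretch is roughly 3% at 256 mm but only about 0.2% at 1000 mm, and it shrinks roughly as 1/f^2 without ever reaching zero: a longer focal length makes the distortion slight rather than removing it.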
Java Jive
2019-10-12 12:33:22 UTC
Post by Java Jive
Has anyone anything *useful* to add, especially about Hugin Panorama
Creator, because I'm tantalisingly close with that?  From basic physics
I deduced that if I wanted the pictures to be 'flat', without barrel or
pincushion distortion, I needed to increase the focal length, preferably
to infinity, so I tried various powers of 2, beginning at 2^10 = 1024,
but this and 512 caused the program to crash, so I had to settle for
256.  The result still has pincushion distortion, but less noticeably so
than the example given above.  It's probably acceptable, but I'd like to
get rid of it altogether if it's possible to do so.
Really getting good results with this now, so for the benefit of others,
this is how I'm stitching the scans together using Hugin Panorama
Creator (in some desktop menus aka 'Panorama stitcher') on a Linux PC:

I'm using Ubuntu 16.04 with Kernel v4.15 and Xfce Desktop v4.12.

0) Hugin is installed by default, but Enblend, which Hugin can use, is
not, so before beginning, go into a command prompt and type ...
sudo apt-get install enblend
... or whatever the equivalent command for your distro would be.

Nothing else needs to be done; Hugin will thenceforth know that
it's there and use it by default.

1) From the Graphics sub-menu, launch
Hugin Panorama Creator (aka 'Panorama stitcher')

2) It should by default come up in the Simple Interface, but check this
from the Interface menu. In this version of the interface, there are a
number of tabs labelled: Assistant, Preview, etc, and you should be in
the Assistant tab where there are three buttons across the top denoting
the relevant stages of the work:
1. Load images
2. Align
3. Create panorama

3) So begin by clicking '1. Load images', and in the dialogue box select
the images that you wish to stitch together. In my case there are
twelve in 3 rows x 4 columns labelled ...
* - <page> - <row><col>
... for example ...
xyz - 1 - 1a.png
xyz - 1 - 1b.png
[...]
xyz - 1 - 3c.png
xyz - 1 - 3d.png
... then click Open, at which point a dialogue box entitled ...
Camera and Lens data
... pops up. In the 'Lens type' drop-down list choose ...
Normal (rectilinear)
... and in the box labelled focal length, enter
1000
... and click OK.

4) Next, click the Projection tab, and choose ...
Rectilinear
... from the drop-down list

5) Click the Assistant tab, and then click ...
2. Align
... and a logging box will open up while the software searches the
images for matching borders, and arranges them correspondingly in a
grid, a process that may take a minute or two depending on the speed of
your PC. Usually it will successfully and automatically arrange the
images into the correct grid. If it can't, you have some sort of
problem such as insufficient overlap between two or more neighbouring
images.

6) Now you should have the combined image in the top left corner of the work
area, but you want it central, so click the 'Move/Drag' tab. Note that
the 'Center' button only seems to centre the image horizontally at best,
so you'll have to do it by eye, but it does zoom on the result, which is
helpful in itself, because you can see more detail. So drag the image
to centre it on the cross-hairs, and then check the result by clicking
Center, and if necessary repeat. For now, don't worry about part or
all of it being greyed out, that's dealt with in the next stage.

Also, if the result is not horizontal, you can correct this by
<rt-dragging> the image or by entering a roll parameter, +ve for
clockwise, -ve for anti-clockwise - usually only fractions of a degree
are required. Note that the roll parameter is cumulative, so each new
attempt will work on the picture as it now is, not as it originally was.

Don't forget to click 'Apply' when you are satisfied you've got it
centred and horizontal.

7) Click the Crop tab and then click ...
Autocrop
... and the greyed out areas that will be cropped out should be
automatically recentred around the edges of the final image in its new
position.

8) Click the Assistant tab, and now click ...
3. Create panorama
... the LDR Format for the output defaults to TIFF, but I prefer PNG;
you can also choose JPEG. Having made that choice, click OK.

Unless you've already saved the project ...

First, by default, a warning box appears concerning the need to save the
project; this can be disabled permanently through a checkbox on it, but
anyway clicking OK will launch ...

Second the actual project save dialogue box appears. I usually choose
to shorten the default name, which can be changed in the preferences,
but none of the possible defaults offered suit my purposes either, so I
usually just change it here. When satisfied click Save.

Either way, next a dialogue box for the final picture filename appears,
which defaults to the basename of the project name chosen above to which
a suitable extension will be added. When satisfied click Save.

Again, depending on the speed of your PC, the software will be away
processing for a few minutes, at the end of which, an empty batch
processing dialogue should appear, empty because by default finished
projects are removed, and hopefully yours finished successfully. If the
dialogue still contains your project, then you're probably already aware
from some earlier warning message that there was some sort of problem.
I haven't researched the possible errors very thoroughly, but one
possible cause seems to be if you set a long focal length like the 1000
I suggested above, but do not set the Projection to be Rectilinear.

But hopefully you will end up with something that has perfect, or nearly
so, legibility like these:
www.macfh.co.uk/Temp/Marriage_Settlement_J_MacFarlane_A_Alston_1.png
www.macfh.co.uk/Temp/Marriage_Settlement_J_MacFarlane_A_Alston_2.png

Each took about 20 mins to create as opposed to several hours trying to
achieve the same by hand, and the results are better in terms of cleaned
up background noise, etc. In particular, using the long focal length
seems effectively to remove the distortion that marred my earlier
efforts - probably it hasn't removed it entirely, just made it
invisibly slight, but, anyway, I think the results above are plenty good
enough.
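For anyone who would rather script these steps than click through the Assistant, Hugin also ships command-line tools. The sketch below only builds and prints the commands unless told otherwise. The tool names and flags are as I recall them from Hugin's scripting notes and man pages, so check each tool's `--help` before relying on them; the tile names, project name and the 2.06 degree field of view (roughly what a 1000 mm focal length gives at crop factor 1) are assumptions, not values from the post.

```python
import shlex
import subprocess

def hugin_commands(tiles, project="project.pto", prefix="stitched", hfov=2.06):
    """Build argv lists for a scripted Hugin run over scanned tiles.

    `tiles`, `project` and `prefix` are placeholder names. hfov=2.06 is
    roughly a 1000 mm focal length at crop factor 1 (an assumption).
    """
    return [
        # New project: lens projection 0 (rectilinear), fixed horizontal FOV.
        ["pto_gen", "-p", "0", "-f", str(hfov), "-o", project, *tiles],
        # Find matching control points; --multirow suits a grid of tiles.
        ["cpfind", "--multirow", "-o", project, project],
        # Discard control points that disagree with the rest.
        ["cpclean", "-o", project, project],
        # Optimise image positions from the control points.
        ["autooptimiser", "-a", "-o", project, project],
        # Rectilinear output, automatic canvas size and crop, PNG output.
        ["pano_modify", "--projection=0", "--canvas=AUTO", "--crop=AUTO",
         "--ldr-file=PNG", "-o", project, project],
        # Remap and blend everything into an image named after the prefix.
        ["hugin_executor", "--stitching", f"--prefix={prefix}", project],
    ]

def run(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(hugin_commands([f"tile_{i:02d}.png" for i in range(1, 10)]))
```

If distortion creeps back in a scripted run, one thing worth checking is whether the optimiser's automatic mode is also adjusting lens parameters; `autooptimiser --help` lists the alternatives.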
J. P. Gilliver (John)
2019-10-12 16:18:15 UTC
Thanks for these detailed instructions! (I don't at the moment
anticipate ever having to assemble from a grid of images, but you never
know.)
[]
Post by Java Jive
Really getting good results with this now, so for the benefit of
others, this is how I'm stitching the scans together using Hugin
Panorama Creator (in some desktop menus aka 'Panorama stitcher') on a
(Is it _available_ for Windows?)
[]
Post by Java Jive
... then click Open, at which point a dialogue box entitled ...
Camera and Lens data
... pops up. In the 'Lens type' drop-down list choose ...
Normal (rectilinear)
... and in the box labelled focal length, enter
1000
... and click OK.
Odd that there isn't any way of telling it infinite. I suppose it was
written assuming always that the component images come from a camera of
_some_ sort, rather than a scanner.
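Why a very long focal length comes close to "infinite" can be seen from the rectilinear field-of-view formula, HFOV = 2·atan(w / 2f). A minimal sketch, assuming Hugin treats the entered focal length as a 35 mm-equivalent value with a 36 mm frame width (crop factor 1). The "edge stretch" figure, the relative gap between tan θ and θ at a tile's edge, is my own rough yardstick for how far a rotated-camera model departs from a plain flat shift; Hugin does not report it.

```python
import math

FRAME_WIDTH_MM = 36.0  # assumed: 35 mm-equivalent frame width at crop factor 1

def hfov_degrees(focal_mm, crop_factor=1.0):
    """Horizontal field of view of a rectilinear lens, in degrees."""
    half_width = FRAME_WIDTH_MM / crop_factor / 2.0
    return math.degrees(2.0 * math.atan(half_width / focal_mm))

def edge_stretch(focal_mm, crop_factor=1.0):
    """Relative difference between tan(theta) and theta at the tile edge.

    A rectilinear projection puts an edge point at f*tan(theta); a flat
    scan would put it proportionally to theta. For small theta the two
    agree, so as f grows the tiles behave more and more like flat scans.
    """
    half_width = FRAME_WIDTH_MM / crop_factor / 2.0
    theta = math.atan(half_width / focal_mm)
    return (half_width / focal_mm) / theta - 1.0

for f in (256, 512, 1000, 1024):
    print(f"{f:5d} mm: HFOV {hfov_degrees(f):.3f} deg, "
          f"edge stretch {edge_stretch(f):.5%}")
```

As the focal length tends to infinity the field of view tends to zero and the projection becomes parallel, which is what a scanner actually produces; 1000 already gives a field of view of about 2 degrees per tile.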
Post by Java Jive
4) Next, click the Projection tab, and choose ...
Rectilinear
... from the drop-down list
5) Click the Assistant tab, and then click ...
2. Align
... and a logging box will open up while the software searches the
images for matching borders, and arranges them correspondingly in a
grid, a process that may take a minute or two depending on the speed of
your PC. Usually it will successfully and automatically arrange the
images into the correct grid. If it can't, you have some sort of
problem such as insufficient overlap between two or more neighbouring
images.
If that _is_ the case, can you manually tell it (say a couple of common
points), or do you have to go back and do that/those scan(s) again?
[]
Post by Java Jive
Also, if the result is not horizontal, you can correct this by
<rt-dragging> the image or by entering a roll parameter, +ve for
clockwise, -ve for anti-clockwise - usually only fractions of a
degree are required. Note that the roll parameter is cumulative so
each new attempt will work on the picture as now is, not as originally was.
I would tend to leave any such to the last possible moment, as I can't
help thinking it will degrade the image, especially if cumulative.
Rather like saving and reloading as JPEG repeatedly. (Though, like the
alleged - and real - degradations of JPEG, the deterioration may in
practice not be visible.)
[]
Post by Java Jive
cleaned up background noise, etc. In particular, using the long focal
length seems effectively to remove the distortion that marred by
earlier efforts - probably it hasn't removed it entirely, just made
it invisibly slight, but, anyway, I think the results above are plenty
good enough.
Maybe if you contact the developers, they might implement an
infinite-focal-length (or "scanned not photographed") setting; it ought
to be simple to do, as it's _removing_ one set of processing
calculations.
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

WANTED, Dead AND Alive: Schrodinger's Cat
Carlos E.R.
2019-10-12 17:23:24 UTC
Post by J. P. Gilliver (John)
Thanks for these detailed instructions! (I don't at the moment
anticipate ever having to assemble from a grid of images, but you never
know.)
...
Post by J. P. Gilliver (John)
Post by Java Jive
Also, if the result is not horizontal, you can correct this by
<rt-dragging> the image or by entering a roll parameter, +ve for
clockwise, -ve for anti-clockwise  -  usually only fractions of a
degree are required.  Note that the roll parameter is cumulative so
each new attempt will work on the picture as now is, not as originally was.
I would tend to leave any such to the last possible moment, as I can't
help thinking it will degrade the image, especially if cumulative.
Rather like saving and reloading as JPEG repeatedly. (Though, like the
alleged - and real - degradations of JPEG, the deterioration may in
practice not be visible.)
To rotate an image by a few degrees, the algorithm has to generate a new,
transformed image from the expanded image, i.e. from the image in memory.
To add a few more degrees, it could go back to the original and transform
it by the combined angle A+B. I wonder why it doesn't.
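Carlos's point, and the degradation worry it answers, can both be checked numerically. A minimal sketch, not how Hugin or any editor actually implements roll: rotations compose exactly (A then B equals A+B), so one resample from the original is always possible, whereas resampling an already-resampled image interpolates twice. A 1-D half-pixel shift stands in for a rotation here because it shows the same interpolation loss with numbers small enough to check by hand. The sign convention (anticlockwise, y axis up) is the maths one, not necessarily Hugin's.

```python
import math

def rotate_point(x, y, degrees):
    """Rotate (x, y) about the origin by `degrees` (anticlockwise, y up)."""
    t = math.radians(degrees)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def shift_row(row, offset):
    """Resample a row of pixel values shifted right by `offset` pixels using
    linear interpolation; samples from outside the row count as 0."""
    n = len(row)
    out = []
    for i in range(n):
        pos = i - offset            # where this output pixel is read from
        left = math.floor(pos)
        frac = pos - left
        a = row[left] if 0 <= left < n else 0.0
        b = row[left + 1] if 0 <= left + 1 < n else 0.0
        out.append((1 - frac) * a + frac * b)
    return out

# Rotations compose: 0.3 then 0.2 degrees lands where 0.5 degrees does.
print(rotate_point(*rotate_point(10.0, 0.0, 0.3), 0.2))
print(rotate_point(10.0, 0.0, 0.5))

# Resampling does not: two half-pixel shifts blur a sharp stroke, while a
# single one-pixel shift from the original keeps it sharp.
stroke = [0.0, 0.0, 1.0, 0.0, 0.0]
print(shift_row(shift_row(stroke, 0.5), 0.5))   # [0.0, 0.0, 0.25, 0.5, 0.25]
print(shift_row(stroke, 1.0))                    # [0.0, 0.0, 0.0, 1.0, 0.0]
```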
--
Cheers, Carlos.
Java Jive
2019-10-12 17:25:00 UTC
Post by J. P. Gilliver (John)
Thanks for these detailed instructions! (I don't at the moment
anticipate ever having to assemble from a grid of images, but you never
know.)
That's why I posted them, to remind myself and to help others in future.
Post by J. P. Gilliver (John)
[]
Post by Java Jive
Really getting good results with this now, so for the benefit of
others, this is how I'm stitching the scans together using Hugin
Panorama Creator (in some desktop menus aka 'Panorama stitcher') on a
(Is it _available_ for Windows?)
NAFAIK
Post by J. P. Gilliver (John)
Post by Java Jive
... then click Open, at which point a dialogue box entitled ...
      Camera and Lens data
... pops up.  In the 'Lens type' drop-down list choose ...
      Normal (rectilinear)
... and in the box labelled focal length, enter
      1000
... and click OK.
Odd that there isn't any way of telling it infinite. I suppose it was
written assuming always that the component images come from a camera of
_some_ sort, rather than a scanner.
Yes, I think that's probably it.
Post by J. P. Gilliver (John)
Post by Java Jive
4)     Next, click the Projection tab, and choose ...
      Rectilinear
... from the drop-down list
5)     Click the Assistant tab, and then click ...
      2. Align
... and a logging box will open up while the software searches the
images for matching borders, and arranges them correspondingly in a
grid, a process that may take a minute or two depending on the speed
of your PC.  Usually it will successfully and automatically arrange
the images into the correct grid.  If it can't, you have some sort of
problem such as insufficient overlap between two or more neighbouring
images.
If that _is_ the case, can you manually tell it (say a couple of common
points), or do you have to go back and do that/those scan(s) again?
Yes, you can try clicking the Layout tab and from there entering control
points manually.
Post by J. P. Gilliver (John)
Post by Java Jive
Also, if the result is not horizontal, you can correct this by
<rt-dragging> the image or by entering a roll parameter, +ve for
clockwise, -ve for anti-clockwise  -  usually only fractions of a
degree are required.  Note that the roll parameter is cumulative so
each new attempt will work on the picture as now is, not as originally was.
I would tend to leave any such to the last possible moment, as I can't
help thinking it will degrade the image, especially if cumulative.
Rather like saving and reloading as JPEG repeatedly. (Though, like the
alleged - and real - degradations of JPEG, the deterioration may in
practice not be visible.)
I noticed no such degradation.
Post by J. P. Gilliver (John)
Post by Java Jive
cleaned up background noise, etc.  In particular, using the long focal
length seems effectively to remove the distortion that marred my
earlier efforts  -  probably it hasn't removed it entirely, just made
it invisibly slight, but, anyway, I think the results above are plenty
good enough.
Maybe if you contact the developers, they might implement an
infinite-focal-length (or "scanned not photographed") setting; it ought
to be simple to do, as it's _removing_ one set of processing calculations.
Yes, I can't believe it's not possible.
Carlos E.R.
2019-10-12 14:10:50 UTC
Post by Java Jive
Post by William Unruh
Post by Java Jive
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
...
Post by Java Jive
Post by William Unruh
Post by Java Jive
I've tried scanning with xsane, but, although it does at least use the
full scanning glass area, I hate it  -  it's another of those programs
So tell it not to open all those options.
If you close say the preview window before closing the xsane, then the
preview window will not come up next time. Similarly with the other
windows.
Also if you do not want to keep a scan, close the window. You seem to
be getting annoyed for the sake of getting annoyed. Do you want to
accomplish the job or do you want to complain!
You in turn seem to want just to complain about my complaining.
I want to accomplish the job without the software giving me cause to
complain  -  vomiting windows all over my desktop is a valid cause for
complaint.  Whatever happened to the efficient paradigm that I've worked
with for 30 years since Windows 2 that an app launched in a single
window (which, incidentally, had its control buttons top right, not top
left), and everything was contained in the one window?
Well, it just uses another paradigm, which is used by other Windows
software too (Delphi, for instance). As I said, others hate the single
window paradigm :-p

Me, I don't care if the software does the job well.

...
Post by Java Jive
Post by William Unruh
Post by Java Jive
    :-(    There is no quick way, for example a keystroke, to
differentiate between doing a draft scan and a quality scan.
What are you talking about? They look entirely different.
In HP's Windows software, <Ctrl-N> starts a new scan (always draft),
<ctrl-S> saves the current scan at the current resolution settings
(which could also be draft, but are more likely to be higher resolution)
 -  just to be clear, a dialog box then opens to set the output filename
and once this is done the scanner operates a second time more slowly to
take the higher resolution scan to save under the chosen name.  AFAICT,
there is no simple equivalent in xsane, you actually have to go and
change the setting for the desired resolution, thus giving the possibility
that, as you are constantly changing the settings between draft and your
chosen higher resolution, sooner or later you'll do the latter
incorrectly, and thus save parts of a large document at different
resolutions.
You are doing it wrong. For a preview aka draft, just click the acquire
preview button in the preview window.
Post by Java Jive
Post by William Unruh
Post by Java Jive
    :-(    There doesn't seem to be any simple way of telling it to do a
greyscale scan, certainly I've not found out how to.
push the "Greyscale" or Colour button and choose from the options.
Can't see any such button with any such option  -  there's a colour drop
down list with various options for different types of source
photographic negative, slide, etc, but greyscale is not an option there.
The one above it.
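Java Jive's worry about mistyping the resolution between draft and final scans can also be handled outside the GUI with fixed presets for `scanimage`, SANE's command-line front end (the same scanning layer xsane uses). A minimal sketch: the preset values, the device string and the mode name "Gray" are hypothetical, since `--mode` and `--resolution` are backend options whose accepted values depend on the driver (`scanimage -d DEVICE --help` lists them), and `--format=png` and `--output-file` exist only in reasonably recent versions of scanimage.

```python
import subprocess

# Hypothetical presets: chosen once, so draft and final scans can never be
# taken at a mistyped resolution.
PRESETS = {
    "draft": {"resolution": 75, "mode": "Gray"},
    "final": {"resolution": 600, "mode": "Gray"},
}

def scan_command(preset, outfile, device=None):
    """Build a scanimage command line for one of the PRESETS."""
    opts = PRESETS[preset]
    cmd = ["scanimage"]
    if device:
        cmd += ["--device-name", device]
    cmd += [
        f"--mode={opts['mode']}",              # backend-specific value
        f"--resolution={opts['resolution']}",  # backend-specific value
        "--format=png",
        f"--output-file={outfile}",
    ]
    return cmd

def scan(preset, outfile, device=None):
    """Run the scan; raises CalledProcessError if scanimage fails."""
    subprocess.run(scan_command(preset, outfile, device), check=True)
```

For example, `scan("draft", "preview.png")` followed by `scan("final", "deed_part_01.png")` mirrors the Ctrl-N / Ctrl-S split of the HP software.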
Post by Java Jive
Post by William Unruh
Post by Java Jive
So I guess I'll stick with the HP Windows scanning software for now.
Go ahead.
Has anyone anything *useful* to add, especially about Hugin Panorama
Creator, because I'm tantalisingly close with that?  From basic physics
I deduced that if I wanted the pictures to be 'flat', without barrel or
pincushion distortion, I needed to increase the focal length, preferably
to infinity, so I tried various powers of 2, beginning at 2^10 = 1024,
but this and 512 caused the program to crash, so I had to settle for
256.  The result still has pincushion distortion, but less noticeably so
than the example given above.  It's probably acceptable, but I'd like to
get rid of it altogether if it's possible to do so.
Sorry, I don't know of software to join flat scans. Google suggests Hugin

<http://hugin.sourceforge.net/tutorials/scans/en.shtml>

<https://sibleyfineart.com/tutorial--join-scans.htm>

<https://www.davidrevoy.com/article314/autostiching-scan-with-hugin>

<https://graphicdesign.stackexchange.com/questions/114368/how-to-join-separate-scans-of-same-object>

<https://havecamerawilltravel.com/photographer/scan-oversize-images/>

<https://www.dpreview.com/forums/thread/3593334>
--
Cheers, Carlos.
William Unruh
2019-10-13 14:34:55 UTC
Post by Java Jive
Post by William Unruh
Post by Java Jive
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
...
Post by Java Jive
Post by William Unruh
Post by Java Jive
I've tried scanning with xsane, but, although it does at least use the
full scanning glass area, I hate it  -  it's another of those programs
So tell it not to open all those options.
If you close say the preview window before closing the xsane, then the
preview window will not come up next time. Similarly with the other
windows.
Also if you do not want to keep a scan, close the window. You seem to
be getting annoyed for the sake of getting annoyed. Do you want to
accomplish the job or do you want to complain!
You in turn seem to want just to complain about my complaining.
I want to accomplish the job without the software giving me cause to
complain  -  vomiting windows all over my desktop is a valid cause for
Impossible. Some people just like complaining, and become especially
vociferous if the tools they use do not conform exactly to their
prejudices as to how the tools should look or work.

As I said, you can alter the number of windows that are "spewed". I told
you how to alter that behaviour. Something you then ignored, instead of
saying "Thank you".
Post by Java Jive
complaint.  Whatever happened to the efficient paradigm that I've worked
with for 30 years since Windows 2 that an app launched in a single
It has never been a paradigm. Some programs worked that way some worked
differently. For example menu choices almost always produced a second window
listing the options. Why would they clutter up, resize, or cover up the
one window when they want to show extra information?
Post by Java Jive
window (which, incidentally, had its control buttons top right, not top
left), and everything was contained in the one window?
You can adjust where you want the control button for a window to be
placed. But this complaining is just getting silly.
Carlos E.R.
2019-10-12 13:31:36 UTC
Post by Java Jive
Post by Cybe R. Wizard
On Sat, 5 Oct 2019 16:59:57 +0100
For the last two days, besides trying to continue the work as previously
described, I've been spending some time trying out Linux-based options,
and have some questions.
I've tried scanning with xsane, but, although it does at least use the
full scanning glass area, I hate it  -  it's another of those programs
    :-(    It plasters the screen with multiple windows, which is a GUI
paradigm I absolutely hate.
Well, it has its uses. Others hate the single window paradigm.
Post by Java Jive
  There is bewildering clutter all over the
screen, and multiple icons for the same program fill up the taskbar and
you've no idea which is which  -  there is one for the program, one for
each of various controls,
Just close those you do not use, put the rest in the positions you like,
then save preferences. I use the main window, preview, and histogram.

And I dedicate a workspace to it.
Post by Java Jive
and one for every scan that you do, which, if
you don't want to keep it, needs to be closed before trying another scan
in order to avoid confusion.
So?
Post by Java Jive
    :-(    There is no quick way, for example a keystroke, to
differentiate between doing a draft scan and a quality scan.
Yes, there is. Just click on the "acquire preview" in the preview window.
Post by Java Jive
    :-(    There doesn't seem to be any simple way of telling it to do a
greyscale scan, certainly I've not found out how to.
Main window: Color, grayscale, lineart.
Post by Java Jive
So I guess I'll stick with the HP Windows scanning software for now.
Use what you like and what works for you (but remember that HP software
has a tendency to phone home).

xsane is not a "simple" program; it is a powerful one with many dozens
of options. It takes a bit of learning, as most such programs do.
--
Cheers, Carlos.
Big Al
2019-10-05 21:10:42 UTC
Post by Java Jive
Cross-posting to one Linux and one Windows group, because I'm happy to
run a workable solution on either OS.
As per the post's title, I have a need for some software, preferably
free or at least low cost, to run on either OS, which can recognise
handwriting.  The context is that I've inherited a trunkful of old
Macfarlane documents, nearly all hand-written, which I'm scanning my way
through, having completed probably three-quarters or more of it.
1)    Some old deeds, wills, etc written on parchment (stretched animal
skin).  These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly over
all the joins.  This is the stage I'm currently at, and it's taking a
long time because the business of joining the images together is very,
very fiddly.  It takes an entire evening to create a single image of
such a document, then, currently, after each is done, I convert it into
text manually, as that's the best way to ensure that the resulting
complete image is fully readable, and even this latter can take another
evening just for one document.
2)    Many family letters
3)    A few pages of accounts
4)    Family trees
5)    Log books of sea-voyages, diaries of holidays, hand-written books
containing historical research notes, etc.
6)    Many loose pages of notes concerning family history, clan history,
and Scottish history, nearly all of which are on foolscap pages, which
again had to be scanned in two sections to capture the top and bottom of
the page and then joined together to form one image.
Some of the notes were in pencil, which required me altering the
exposure settings.  Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
As I'm currently nearing the end of the scanning phase, next will be
sorting the documents into boxes and the files into sub-directories  -
of course they are already, because I was trying to make sense of it all
as I worked through it, but only after I know entirely what is there
will I be able to finalise how best it should be divided up into
meaningful chunks of information.  After that will come the probably
much longer phase of converting as much of it as possible into
searchable text.  It's this last stage I really would like to shorten as
much as possible, because I'd like to have a chance of finishing it
before I die !-)
So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
We had some family docs / letters too once that we OCR'd into MS Word.
I can just tell you that it can be done but OCR (optical character
recognition) is an imperfect tool. You have several hurdles to jump
over, especially if it's handwriting. Even with the typewritten text we
did, the hammers didn't strike consistently over the entire document, so
some o's looked like c's, and we had a lot of errors to correct once it
was in Word.

I can't suggest any, but you can google 'OCR software', find a few and
test them. Some are better because they allow you to adjust sensitivity
when scanning. But you're going to have to spend a lot of time
proofreading in detail for errors. Luckily Word (or whatever) will spell
check and catch some major issues like 'majcr'. Others you'll have to
read almost letter for letter to make sure.
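The spell-check pass Al describes can be approximated outside Word with Python's standard `difflib`, which ranks candidate words by a similarity ratio. This is a swapped-in technique, not what Word does internally, and the tiny vocabulary below is a placeholder: a real run would load a full word list plus the family, place and clan names that recur in the papers.

```python
import difflib
import re

# Placeholder vocabulary; replace with a proper word list plus local names.
VOCABULARY = ["the", "of", "and", "marriage", "settlement", "major", "manor", "minor"]

def suggest(word, vocabulary=VOCABULARY, n=3):
    """Closest vocabulary entries to a possibly mis-read word."""
    return difflib.get_close_matches(word.lower(), vocabulary, n=n, cutoff=0.6)

def flag_unknown(text, vocabulary=VOCABULARY):
    """Map each word not in the vocabulary to its suggested corrections."""
    known = set(vocabulary)
    return {w: suggest(w, vocabulary)
            for w in re.findall(r"[A-Za-z]+", text)
            if w.lower() not in known}

print(flag_unknown("The majcr settlement"))   # {'majcr': ['major', 'manor']}
```

Offering several suggestions rather than auto-correcting keeps the human proofreader in charge, which matters for old handwriting where the "unknown" word may be a genuine surname.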

Also, you can sometimes adjust the contrast and brightness of the image
you are trying to OCR in a photo editor, and sharpen it, to decrease your
error rate.
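One automatic version of that contrast clean-up is to binarise the page before OCR. A minimal sketch using Otsu's method, a standard global threshold that picks the grey level maximising the between-class variance of "dark" and "light" pixels; it is a swapped-in technique, not what Al used, and faint pencil on uneven paper may need an adaptive (local) threshold instead. Pixels are assumed to be 8-bit grey values in a flat list.

```python
def otsu_threshold(pixels):
    """Grey level (0-255) that best separates dark writing from light paper."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_dark = 0   # pixels at or below the candidate threshold
    sum_dark = 0      # sum of their grey levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def binarise(pixels, threshold):
    """Pixels at or below the threshold become black (0), the rest white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

With Pillow, `list(img.convert("L").tobytes())` gives a suitable list of grey values for a scanned page.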

Good Luck.

Al
Carlos E. R.
2019-10-05 22:05:56 UTC
On 05/10/2019 17.59, Java Jive wrote:
...
Post by Java Jive
6)    Many loose pages of notes concerning family history, clan history,
and Scottish history, nearly all of which are on foolscap pages, which
again had to be scanned in two sections to capture the top and bottom of
the page and then joined together to form one image.
Maybe you could use a high resolution camera instead of a scanner.

I saw a video of a book scanner that used two cameras, one focused on
the left-hand page and one on the right-hand page. The book was not fully
opened but held at an angle, and maybe there were two sheets of glass
holding the pages flat (at an angle to each other).
Post by Java Jive
Some of the notes were in pencil, which required me altering the
exposure settings.  Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
I have an idea that they might be enhanced by adjusting "the curve" in
two sections (I don't know how to explain). Or blend two enhanced
photos. Or just take two photos and OCR them separately.
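One reading of "blend two enhanced photos" for the ink-plus-pencil pages is a darken-only blend: scan the page once at settings that suit the ink and once at settings that suit the pencil, then keep whichever is darker at each pixel. A minimal sketch, assuming the two scans are perfectly aligned (the document was not moved between passes) and are greyscale rows of 0-255 values; the pixel values are made up. GIMP's "Darken only" layer mode does the same per pixel. Note the paper also takes the darker of the two backgrounds.

```python
def darken_only(scan_a, scan_b):
    """Per-pixel minimum of two aligned greyscale scans (0 = black).

    Whichever scan shows a pixel darker wins, so ink from the ink-exposed
    scan and pencil from the pencil-exposed scan both survive.
    """
    if len(scan_a) != len(scan_b) or any(
            len(ra) != len(rb) for ra, rb in zip(scan_a, scan_b)):
        raise ValueError("scans must have identical dimensions")
    return [[min(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(scan_a, scan_b)]

ink_exposure = [[250, 40, 245], [248, 35, 180]]      # made-up pixel values
pencil_exposure = [[230, 90, 120], [225, 80, 110]]
print(darken_only(ink_exposure, pencil_exposure))   # [[230, 40, 120], [225, 35, 110]]
```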

...
--
Cheers,
Carlos E.R.
Java Jive
2019-10-06 12:50:58 UTC
Post by Carlos E. R.
Maybe you could use a high resolution camera instead of a scanner.
I already have a decent camera, ring-flash, and tripod, which I used to
photograph the plates in an antique bird book before I sold it, but the
trouble is that old photos get curly, and the parchments are all creased
where they've been folded, and many of the papers are dog-eared and
crinkled at the edges, and a camera really doesn't get very good results
for these.
Post by Carlos E. R.
Post by Java Jive
Some of the notes were in pencil, which required me altering the
exposure settings.  Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
I have an idea that they might be enhanced by adjusting "the curve" in
two sections (I don't know how to explain). Or blend two enhanced
photos. Or just two photos and OCR them separately.
Yes, I think that's what the Exposure settings actually do.
Carlos E.R.
2019-10-06 13:04:58 UTC
Post by Java Jive
Post by Carlos E. R.
Maybe you could use a high resolution camera instead of a scanner.
I already have a decent camera, ring-flash, and tripod, which I used to
photograph the plates in an antique bird book before I sold it, but the
trouble is that old photos get curly, and the parchments are all creased
where they've been folded, and many of the papers are dog-eared and
crinkled at the edges, and a camera really doesn't get very good results
for these.
Put a glass over it.
Post by Java Jive
Post by Carlos E. R.
Post by Java Jive
Some of the notes were in pencil, which required me altering the
exposure settings.  Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
I have an idea that they might be enhanced by adjusting "the curve" in
two sections (I don't know how to explain). Or blend two enhanced
photos. Or just two photos and OCR them separately.
Yes, I think that's what the Exposure settings actually do.
No, far from it.

                   ***********
                  *
                 *
                *
         *******
        *
       *
      *
*****

Post-processing. Pre-processing can't do that.
--
Cheers, Carlos.
Java Jive
2019-10-06 15:06:52 UTC
Post by Carlos E.R.
Post by Java Jive
Post by Carlos E. R.
Maybe you could use a high resolution camera instead of a scanner.
I already have a decent camera, ring-flash, and tripod, which I used to
photograph the plates in an antique bird book before I sold it, but the
trouble is that old photos get curly, and the parchments are all creased
where they've been folded, and many of the papers are dog-eared and
crinkled at the edges, and a camera really doesn't get very good results
for these.
Put a glass over it.
Easy to say, but I have nothing suitable.
Post by Carlos E.R.
Post by Java Jive
Post by Carlos E. R.
Post by Java Jive
Some of the notes were in pencil, which required me altering the
exposure settings.  Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
I have an idea that they might be enhanced by adjusting "the curve" in
two sections (I don't know how to explain). Or blend two enhanced
photos. Or just two photos and OCR them separately.
Yes, I think that's what the Exposure settings actually do.
No, far from it.
                   ***********
                  *
                 *
                *
         *******
        *
       *
      *
*****
Post processing. Pre processing can't do that.
I don't understand what you're trying to say then.
F Russell
2019-10-06 19:51:28 UTC
For the splicing, you definitely need some sort of joining software.
On GNU/Linux, a little-known alternative to the popular Hugin/Enblend
tools is a stitching program called xmerge:

http://xmerge.sourceforge.net

The last release was 2005, but the source code still compiles provided
the "-std=gnu89" option is used with gcc.

It works well. I have tested it with the images available here:

http://hugin.sourceforge.net/tutorials/scans/en.shtml

scan-1.jpg, scan-2.jpg
Carlos E.R.
2019-10-06 21:21:47 UTC
Post by Java Jive
Post by Carlos E.R.
Post by Java Jive
Post by Carlos E. R.
Maybe you could use a high resolution camera instead of a scanner.
I already have a decent camera, ring-flash, and tripod, which I used to
photograph the plates in an antique bird book before I sold it, but the
trouble is that old photos get curly, and the parchments are all creased
where they've been folded, and many of the papers are dog-eared and
crinkled at the edges, and a camera really doesn't get very good results
for these.
Put a glass over it.
Easy to say, but I have nothing suitable.
Glass should be cheap to buy. But it has to be good quality, and
illuminating without reflections is a pain. There is a special
non-reflective glass used to cover paintings, but it loses some
transparency and adds some "grain".
Post by Java Jive
Post by Carlos E.R.
Post by Java Jive
Post by Carlos E. R.
Post by Java Jive
Some of the notes were in pencil, which required me altering the
exposure settings.  Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
I have an idea that they might be enhanced by adjusting "the curve" in
two sections (I don't know how to explain). Or blend two enhanced
photos. Or just two photos and OCR them separately.
Yes, I think that's what the Exposure settings actually do.
No, far from it.
                   ***********
                  *
                 *
                *
         *******
        *
       *
      *
*****
Post processing. Pre processing can't do that.
I don't understand what you're trying to say then.
Gimp, Tools, Colour tools, Curves.

Example:

Original: <https://susepaste.org/82042053>
curve: <https://susepaste.org/33160711>
result: <https://susepaste.org/37943478>

Quality reduced for posting. Post will expire in a month.

Playing with that "curve" produces interesting results, and can improve
readability in some cases. The colours can change dramatically though,
but the purpose is to see some hidden detail.

(Better png than jpg for this treatment)
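Under the hood, a Curves adjustment is a lookup table from input grey level to output grey level. A minimal sketch that builds such a table from control points; the two-step points are hypothetical, chosen to mimic the sketch above (stretch one darker range and one lighter range, flatten what lies between), and straight segments only approximate GIMP, which draws a smooth curve through the points.

```python
def curve_lut(points):
    """256-entry lookup table through (input, output) control points,
    joined by straight segments."""
    pts = sorted(points)
    if pts[0][0] != 0 or pts[-1][0] != 255:
        raise ValueError("curve must start at input 0 and end at input 255")
    if any(x1 <= x0 for (x0, _), (x1, _) in zip(pts, pts[1:])):
        raise ValueError("input values must be strictly increasing")
    lut = []
    for x in range(256):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                lut.append(round(y0 + (y1 - y0) * (x - x0) / (x1 - x0)))
                break
    return lut

# Hypothetical two-step curve: two steep sections separated by plateaus.
TWO_STEP = [(0, 0), (30, 0), (70, 110), (150, 110), (190, 255), (255, 255)]
lut = curve_lut(TWO_STEP)
print(lut[50], lut[100])   # 55 110
```

With Pillow, `img.convert("L").point(lut)` applies such a table to a greyscale image; PNG avoids adding JPEG artefacts before the stretch amplifies them.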
--
Cheers, Carlos.
Java Jive
2019-10-06 22:25:11 UTC
Post by Carlos E.R.
Glass should be cheap to buy. But it has to be good quality, and
illuminating without reflections is a pain. There is a special
non-reflective glass used to cover paintings, but it loses some
transparency and adds some "grain".
By the time I next have to visit a town (the nearest is at least 35
miles away), I'll probably have finished the job anyway.
David E. Ross
2019-10-06 00:05:23 UTC
Post by Java Jive
Cross-posting to one Linux and one Windows group, because I'm happy to
run a workable solution on either OS.
As per the post's title, I have a need for some software, preferably
free or at least low cost, to run on either OS, which can recognise
handwriting. The context is that I've inherited a trunkful of old
Macfarlane documents, nearly all hand-written, which I'm scanning my way
through, having completed probably three-quarters or more of it.
1) Some old deeds, wills, etc written on parchment (stretched animal
skin). These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
together into one so that each line of writing continues seamlessly over
all the joins. This is the stage I'm currently at, and it's taking a
long time because the business of joining the images together is very,
very fiddly. It takes an entire evening to create a single image of
such a document, then, currently, after each is done, I convert it into
text manually, as that's the best way to ensure that the resulting
complete image is fully readable, and even this latter can take another
evening just for one document.
2) Many family letters
3) A few pages of accounts
4) Family trees
5) Log books of sea-voyages, diaries of holidays, hand-written books
containing historical research notes, etc.
6) Many loose pages of notes concerning family history, clan history,
and Scottish history, nearly all of which are on foolscap pages, which
again had to be scanned in two sections to capture the top and bottom of
the page and then joined together to form one image.
Some of the notes were in pencil, which required me altering the
exposure settings. Worse still, quite a number of pages had both ink
and pencil, for which it was difficult to find settings that showed both
to best advantage.
As I'm currently nearing the end of the scanning phase, next will be
sorting the documents into boxes and the files into sub-directories -
of course they are already, because I was trying to make sense of it all
as I worked through it, but only after I know entirely what is there
will I be able to finalise how best it should be divided up into
meaningful chunks of information. After that will come the probably
much longer phase of converting as much of it as possible into
searchable text. It's this last stage I really would like to shorten as
much as possible, because I'd like to have a chance of finishing it
before I die !-)
So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
As for transcribing handwriting, I do not think you will find anything
that is not very expensive. Even humans trained in handwriting analysis
often have trouble reading handwritten documents. For a computer to do
this, you will need a very sophisticated artificial intelligence system.
By "very sophisticated", I mean very expensive and then only if it
exists.
--
David E. Ross
<http://www.rossde.com/>

Immigration authorities arrested 680 undocumented aliens in meat
processing facilities in Mississippi. Employing someone who is not
legally in the U.S. is also illegal. How many of the EMPLOYERS are
being criminally charged? If none, why not?
Carlos E.R.
2019-10-06 12:50:39 UTC
...
Post by David E. Ross
Post by Java Jive
So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
As for transcribing handwriting, I do not think you will find anything
that is not very expensive. Even humans trained in handwriting analysis
often have trouble reading handwritten documents. For a computer to do
this, you will need a very sophisticated artificial intelligence system.
By "very sophisticated", I mean very expensive and then only if it
exists.
Unless google puts up an online service...
--
Cheers, Carlos.
J. P. Gilliver (John)
2019-10-06 00:22:17 UTC
In message <qnaelv$5rt$***@gioia.aioe.org>, Java Jive
<***@evij.com.invalid> writes:
[]
animal skin). These are so large that they have had to be scanned in
sections, anything from 9-16 in number, and then the individual images
joined together into one so that each line of writing continues
seamlessly over all the joins. This is the stage I'm currently at, and
it's taking a long time because the business of joining the images
together is very, very fiddly. It takes an entire evening to create a
single image of such a document, then, currently, after each is done, I
convert it into text manually, as that's the best way to ensure that
the resulting complete image is fully readable, and even this latter
can take another evening just for one document.
I echo "Cybe"'s suggestion of panorama-making, or other similar
splicing, software. I _think_ even IrfanView (late versions, possibly
only the latest) has some such function; I haven't played with it so
don't know how good it is, but it's free and fairly fast, so it might be
worth a play, if only to see what you need from better software. (I
don't know if it has any automatic function or is just manual.)
[]
history, and Scottish history, nearly all of which are on foolscap
pages, which again had to be scanned in two sections to capture the top
and bottom of the page and then joined together to form one image.
You say that in the past tense so it sounds like you've already done it,
but if not: some sheetfed scanners, especially the portable ones, will
actually scan over a longer distance than flatbed scanners of the same
_width_. (British BMD certificates before a certain date are long and
narrow; I have such a scanner and use it for them.) It's the sort designed
for portable use - basically a long rod with motors in it, which you feed
the document through sideways. (I have some clear sleeves for putting more
delicate things through it.) Putting "portable scanner" into ebay
will find you lots (obviously ignore the barcode scanners it throws up).
A lot now seem to be described as "wand" scanners, so maybe are
wipe-across rather than feed-through, which might be better for you,
unless they actually only have a short scanning area.

For the larger documents, assuming price (and possibly somewhere to keep
it/them!) of large format scanners makes them beyond your reach, you
could also have a look at mouse scanners - here's one:
https://www.ebay.co.uk/itm/LG-LSM-150-Smart-Mouse-Scanner-BNIB-SDcard-8GB/333351004034
(9.99 - or offer! - post free in UK; if it shows otherwise from where
you are, just put "mouse scanner" into your local ebay). Have a look at
the YouTube video linked from it for how it works. They _claim_ they'll
do up to A3. I rather imagine they'd be tedious to use on large
documents, but maybe no more so than stitching multiple small segments.
They're basically small scanners whose driver software includes good
stitching software. I have one; it works well enough. (Obviously it's
the driver software, not the mouse scanner itself, that does the
stitching, but that doesn't make any difference in practice. [Ditto the
OCR function, despite what the man doing the video thinks; that's only
for print anyway, so probably not relevant to you - though may be for
any typed documents.])
[]
So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
Sorry, not handwriting.

If you have any blind clubs/institutions/whatever near you, you might
approach them: OCR is a subject of great interest to blind people,
including OCR of handwriting, and they may be able to show you what is
practical. (Especially if there are any exhibitions of VH/VI aids -
including software - coming up in your area. Though be prepared for
pricing to make your eyes water - the market is so small that such
equipment and software never sells in enough quantity to spread the
development cost thinly, so most prices include a fair share of it. At
least that's the case in the UK.)
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

Tact is the ability to describe others as they see themselves. -Abraham
Lincoln, 16th president of the U.S (1809-1865)
Java Jive
2019-10-06 12:59:09 UTC
Post by J. P. Gilliver (John)
I echo "Cybe"'s suggestion of panorama-making, or other similar
splicing, software. I _think_ even IrfanView (late versions, possibly
only the latest) has some such function; I haven't played with it so
don't know how good it is, but it's free and fairly fast, so it might be
worth a play, if only to see what you need from better software. (I
don't know if it has any automatic function or is just manual.)
I have versions of that on all my Windows PCs, so that's worth
investigating. Thanks.
Post by J. P. Gilliver (John)
history, and Scottish history, nearly all of which are on foolscap
pages, which again had to be scanned in two sections to capture the
top and bottom of the page and then joined together to form one image.
You say that in the past tense so it sounds like you've already done it,
Pretty much so. I think I'm about 3/4 of the way through - all I've
got left to do are about 5 or 6 more large parchments, some other large
legal documents on paper, some rolls of paper whose contents I have yet
to examine, and about 4 or 5 notebooks, which also I have yet to examine.
Post by J. P. Gilliver (John)
So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
Sorry, not handwriting.
Unfortunately that seems to be the general cry.
Post by J. P. Gilliver (John)
If you have any blind clubs/institutions/whatever near you, you might
approach them: OCR is a subject of great interest to blind people,
including OCR of handwriting, and they may be able to show you what is
practical. (Especially if there are any exhibitions of VH/VI aids -
including software - coming up in your area. Though be prepared for
pricing to make your eyes water - the market is so small that such
equipment and software never sells in enough quantity to spread the
development cost thinly, so most prices include a fair share of it. At
least that's the case in the UK.)
That may be a useful tip. Thanks.
F Russell
2019-10-06 00:55:14 UTC
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
You may want to try this DIY modification to make scanning and
stitching easier:

https://mpetroff.net/2013/09/scanner-modifications-to-scan-large-documents

He even includes scripts to automate the scanning/stitching process.


As far as handwriting recognition software goes, FOSS cannot compete
with the major commercial software firms.

If you cannot type fast enough to transcribe the material directly
then you may want to consider voice recognition software. Reading
each document aloud can give you a direct text output.
Java Jive
2019-10-06 12:17:45 UTC
Post by F Russell
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
You may want to try this DIY modification to make scanning and
https://mpetroff.net/2013/09/scanner-modifications-to-scan-large-documents
He even includes scripts to automate the scanning/stitching process.
Thanks, I haven't read all of that but enough.

The scanner I'm actually using is a fairly old HP C9850A; it's old
enough that it doesn't have 64-bit drivers so it's attached to an XP
32-bit laptop. It has ADF, but almost none of the documents to be
scanned are in a condition that I'd want to use ADF, and anyway when I
could do so, the scans of the foolscap sheets were missing the bottom
inch or so of each document. As it happens, some years ago the ADF packed up after a
close encounter of the coffee kind, so I bought another scanner of the
same model for spares or repair and cannibalised parts from it to get
mine fully working again. I did wonder whether to try and adapt the
carriage of the spare to do something similar to your suggestion, but
ruled it out as being too much work.

The big problems with items too big to scan in one go are:

:-( As above, although it's described as a flatbed scanner, it isn't
really because the surrounding bezel is raised above the glass like so
(a fixed font will be needed to display this ASCII art):

Bezel       Scanner Glass       Bezel
=====\                         /=====
      \=======================/

This means the stuff at the edge tends to be stretched and distorted,
making it difficult to marry up with adjoining images.

:-( Automatic exposure can make different sections of the same document
look very different in terms of background noise, greasy fingerprints,
yellowing of documents, etc, so it's necessary to remember after
scanning the first section to set the exposure manually to be the same
for the rest of that document, then to remember to set it back to
automatic again ready for the next. Needless to say, I forget
sometimes, curse, and have to start a document again.

:-( Although nominally the same, the vertical and horizontal
resolutions are actually slightly different, so if one section of a
document is scanned vertically, and another horizontally, one of the
sections will require squashing or stretching slightly to marry up with
the other.

:-( There is no easy way to align the different sections of scan. The
parchments have irregular edges that are stretched out at the corners,
and even the sheets of paper have often been cut by hand, so opposite
edges are not parallel and are not at right-angles to the other edges.
Thus even if the scanner travel was accurately aligned with the inner
edges of the bezel, which it isn't, and even if the edges of the scanner
glass were not masked out 1-5mm into the scan, which they are, by a
different amount on each edge (rolls eyes), *still* you wouldn't be able
simply to abut each sheet against an edge of the bezel! So I have
resorted to scanning each section several times at draft resolution,
trying to get the writing perfectly horizontal, and then doing a hi-res
scan. You might think I could just plonk each section on the scanner
roughly aligned, and then rotate it slightly in the image editing
program, which is Paint Shop Pro, as required to align them, but if you
have to do that on one side of the document, by the time you've reached
the other side the errors have multiplied to the point where things can
get very tricky. By trial and error I've found that it's best to start
with each section scanned as well aligned as possible with the rest,
and then to start stitching from the centre, working outwards to the
edges.
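A note on the "rotate slightly" step: if you read off the pixel coordinates of two points on the same line of writing in a draft scan (say, either end of a baseline), the correction angle follows from simple trigonometry. The sketch below is a minimal Python illustration, not how Paint Shop Pro works internally; the coordinates are made up, and it assumes the usual image convention that y increases downwards.

```python
import math

def skew_angle_degrees(x1, y1, x2, y2):
    """Angle of the line from (x1, y1) to (x2, y2) relative to horizontal.

    Image coordinates are assumed: x grows to the right, y grows downward,
    so a positive result means the line slopes down to the right.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def drift_after(width_px, angle_deg):
    """Vertical offset accumulated across width_px pixels at angle_deg."""
    return width_px * math.tan(math.radians(angle_deg))

# Hypothetical baseline: starts at (120, 400), ends at (3120, 412).
angle = skew_angle_degrees(120, 400, 3120, 412)
print(f"rotate by {-angle:.3f} degrees to level the line")
# Even a fraction of a degree matters once several sections are joined:
print(f"offset over 12000 px: {drift_after(12000, angle):.1f} px")
```

It also hints at why starting from the centre helps: any residual angle error adds up along the chain of joins, and working outwards from the middle roughly halves the longest chain.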
Post by F Russell
As far as handwriting recognition software goes, FOSS cannot compete
with the major commercial software firms.
If you cannot type fast enough to transcribe the material directly
then you may want to consider voice recognition software. Reading
each document aloud can give you a direct text output.
Yes, I think I still have a copy of Dragon.

Which brings me to a trip down memory lane ... My very first computer
was a Commodore Pet, into which I wanted to key in the hex coding for
Supermon from a listing of it in a magazine. I realised that it would
be a terrible chore trying to keep moving between the magazine and the
keyboard, so I read the listing aloud into my tape-recorder at a
measured pace which I hoped allowed enough time to type in the coding
during the next phase, and then did the actual typing as I played it
back. It was certainly much easier doing it like that. I barely made
an error, which certainly wouldn't have been the case typing it in
directly from the magazine.
Carlos E.R.
2019-10-06 13:01:02 UTC
...
    :-(    Automatic exposure can make different sections of the same
document look very different in terms of background noise, greasy
fingerprints, yellowing of documents, etc, so it's necessary to remember
after scanning the first section to set the exposure manually to be the
same for the rest of that document, then to remember to set it back to
automatic again ready for the next.  Needless to say, I forget
sometimes, curse, and have to start a document again.
In xsane, automatic adjustment isn't triggered automatically; by default
the settings are those of the previous scan.
    :-(    Although nominally the same, the vertical and horizontal
resolutions are actually slightly different, so if one section of a
document is scanned vertically, and another horizontally, one of the
sections will require squashing or stretching slightly to marry up with
the other.
Can't be. There has to be a calibration error or something.
    :-(    There is no easy way to align the different sections of
scan.  The parchments have irregular edges that are stretched out at the
I think a good camera would work better.
--
Cheers, Carlos.
Java Jive
2019-10-06 16:06:17 UTC
Post by Carlos E.R.
    :-(    Although nominally the same, the vertical and horizontal
resolutions are actually slightly different, so if one section of a
document is scanned vertically, and another horizontally, one of the
sections will require squashing or stretching slightly to marry up with
the other.
Can't be. There has to be a calibration error or something.
Why not? All that's needed is for the stepper motor to have a step
that's ever so slightly different from the resolution across the scanner
bar.
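If the two axes really do differ, the mismatch can be measured rather than found by trial and error: scan a ruler once lying along the sensor bar and once lying along the direction of carriage travel, and count the pixels between the same two marks. A minimal Python sketch of the resulting arithmetic; the pixel counts are hypothetical.

```python
def axis_scale(pixels_measured, true_length_in, nominal_dpi):
    """Ratio of the effective resolution along one axis to the nominal dpi."""
    return pixels_measured / (true_length_in * nominal_dpi)

def resize_factor(pixels_along_bar, pixels_along_travel):
    """Factor to scale the travel axis by so that it matches the sensor-bar axis.

    Both arguments are pixel distances between the same pair of marks,
    scanned once parallel to the sensor bar and once along the carriage travel.
    """
    return pixels_along_bar / pixels_along_travel

# Hypothetical: 10 inches of ruler at a nominal 300 dpi.
bar = 3000      # pixels counted along the sensor bar
travel = 3018   # pixels counted along the carriage travel
print(f"travel axis runs at {axis_scale(travel, 10, 300):.4f} x nominal")
print(f"scale the travel axis by {resize_factor(bar, travel):.5f}")
```

The resize factor is what you would apply along the travel axis of each section (in Paint Shop Pro or any other editor) before joining sections scanned in different orientations.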
Carlos E.R.
2019-10-06 20:44:19 UTC
Post by Carlos E.R.
     :-(    Although nominally the same, the vertical and horizontal
resolutions are actually slightly different, so if one section of a
document is scanned vertically, and another horizontally, one of the
sections will require squashing or stretching slightly to marry up with
the other.
Can't be. There has to be a calibration error or something.
Why not?  All that's needed is for the stepper motor to have a step
that's ever so slightly different from the resolution across the scanner
bar.
Then bad manufacturing.
--
Cheers, Carlos.
J. P. Gilliver (John)
2019-10-06 23:36:46 UTC
Post by Carlos E.R.
Post by Carlos E.R.
     :-(    Although nominally the same, the vertical and horizontal
resolutions are actually slightly different, so if one section of a
document is scanned vertically, and another horizontally, one of the
sections will require squashing or stretching slightly to marry up with
the other.
Can't be. There has to be a calibration error or something.
Why not?  All that's needed is for the stepper motor to have a step
that's ever so slightly different from the resolution across the scanner
bar.
Then bad manufacturing.
Not necessarily; could be wear, or something rubber drying out. Even 1%
will affect a scan noticeably.
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

The first banjo solo I played was actually just a series of mistakes. In fact
it was all the mistakes I knew at the time. - Tim Dowling, RT2015/6/20-26
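To put the 1% figure in perspective: the mismatch grows with distance along the affected axis, so it is worst at the far end of a long document. A quick Python calculation, assuming an A3 long edge (420 mm) scanned at 300 dpi; both figures are only examples.

```python
MM_PER_INCH = 25.4

def misregistration_px(length_mm, dpi, error_fraction):
    """Pixels of mismatch accumulated over length_mm if one axis is off by error_fraction."""
    pixels = length_mm / MM_PER_INCH * dpi
    return pixels * error_fraction

# A3 long edge (420 mm) at 300 dpi with a 1% error on one axis.
print(f"{misregistration_px(420, 300, 0.01):.1f} px")
```

That comes to roughly 50 pixels, which is easily enough to stop lines of writing meeting up across a join.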
Paul
2019-10-07 01:00:58 UTC
Post by J. P. Gilliver (John)
Post by Carlos E.R.
Post by Java Jive
Post by Carlos E.R.
:-( Although nominally the same, the vertical and horizontal
resolutions are actually slightly different, so if one section of a
document is scanned vertically, and another horizontally, one of the
sections will require squashing or stretching slightly to marry up with
the other.
Can't be. There has to be a calibration error or something.
Why not? All that's needed is for the stepper motor to have a step
that's ever so slightly different from the resolution across the scanner
bar.
Then bad manufacturing.
Not necessarily; could be wear, or something rubber drying out. Even 1%
will affect a scan noticeably.
The transport can use a belt.

https://upload.wikimedia.org/wikipedia/commons/e/e3/Scanner.view.750pix.jpg

That's probably the weakest component of the lot.

Paul
Java Jive
2019-10-07 10:47:59 UTC
Post by Paul
The transport can use a belt.
https://upload.wikimedia.org/wikipedia/commons/e/e3/Scanner.view.750pix.jpg
It does.
Post by Paul
That's probably the weakest component of the lot.
Yes, and the scanner is about 20 years old or thereabouts.
NY
2019-10-07 19:13:53 UTC
Post by Paul
The transport can use a belt.
https://upload.wikimedia.org/wikipedia/commons/e/e3/Scanner.view.750pix.jpg
That's probably the weakest component of the lot.
I have a 35 mm film scanner (so slightly OT for this thread) which moves the
single-row-of-pixels sensor along the film by rotating a shaft which drives
a worm gear. The motor connects to the shaft by a plastic sleeve which has
split over the years and sometimes slips, resulting in horizontal and
vertical resolution being different and therefore aspect ratio stretching. I
managed to repair it by wrapping magic tape (not old fashioned sellotape
whose glue starts oozing after a time) round the shaft to increase its
diameter enough for the sleeve to grip.

It is possible for flat-bed scanners to develop the same sort of problem,
though many are not thoughtfully supplied with screws so you can get inside
and repair.
Tin Man
2019-10-06 14:13:36 UTC
Post by Java Jive
The scanner I'm actually using is a fairly old HP C9850A;
You are in luck. The GNU/Linux scanning software SANE supports
the HP 5490C (which is the C9850A). The following driver is
https://sourceforge.net/projects/hp5400backend/files/hp5400-backend/beta_1_20030526
This is BETA software but it may do the basic job that you need.
Furthermore, with SANE, the scanning process can be scripted (automated).
You could also check eBay for cheap, used scanners that are supported
by SANE and then hack them according to the article.
http://www.sane-project.org/sane-mfgs.html
Wow!
The Linux driver is still in beta for a scanner that was released
circa 2005.
Go Linux!
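On the point that SANE lets the scanning be scripted: one way to avoid the forgotten-exposure problem described earlier in the thread is to generate every section's command from a single settings record. A minimal Python sketch that only builds the command lines; `--resolution`, `--mode` and `--format` are standard scanimage options, but valid mode names and any brightness or gamma option names depend on the backend, so `extra` is a hypothetical placeholder to be filled in from `scanimage --help` on the actual device.

```python
import shlex
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanSettings:
    """One set of options shared by every section of a document."""
    resolution: int = 300
    mode: str = "Color"   # mode names vary by backend
    extra: tuple = ()     # hypothetical slot for backend-specific options

def section_command(settings, doc_name, section):
    """Return (argv, output filename) for one section of a document."""
    argv = [
        "scanimage",
        f"--resolution={settings.resolution}",
        f"--mode={settings.mode}",
        "--format=pnm",
        *settings.extra,
    ]
    return argv, f"{doc_name}_{section:02d}.pnm"

settings = ScanSettings()  # never changes between sections of this document
for n in range(1, 4):
    argv, outfile = section_command(settings, "will", n)
    print(shlex.join(argv), ">", outfile)
```

Each command could then be run with `subprocess.run(argv, stdout=f, check=True)`, where `f` is the output file opened in binary mode, once the section is in place on the glass.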
William Unruh
2019-10-06 15:04:56 UTC
Post by Tin Man
Post by Java Jive
The scanner I'm actually using is a fairly old HP C9850A;
You are in luck. The GNU/Linux scanning software SANE supports
the HP 5490C (which is the C9850A). The following driver is
https://sourceforge.net/projects/hp5400backend/files/hp5400-backend/beta_1_20030526
This is BETA software but it may do the basic job that you need.
Furthermore, with SANE, the scanning process can be scripted (automated).
You could also check eBay for cheap, used scanners that are supported
by SANE and then hack them according to the article.
http://www.sane-project.org/sane-mfgs.html
Wow!
The Linux driver is still in beta for a scanner that was released
circa 2005.
Go Linux!
cdrecord, the CD writer software on Linux, is still in alpha and always
has been, despite the release of something like 100 versions. What
people call something does not necessarily relate to what it is.
William Unruh
2019-10-06 15:01:27 UTC
Post by Java Jive
Post by F Russell
Post by Java Jive
These are so large that they have had to be scanned in sections,
anything from 9-16 in number, and then the individual images joined
You may want to try this DIY modification to make scanning and
https://mpetroff.net/2013/09/scanner-modifications-to-scan-large-documents
He even includes scripts to automate the scanning/stitching process.
Thanks, I haven't read all of that but enough.
The scanner I'm actually using is a fairly old HP C9850A; it's old
enough that it doesn't have 64-bit drivers so it's attached to an XP
32-bit laptop. It has ADF, but almost none of the documents to be
scanned are in a condition that I'd want to use ADF, and anyway when I
could do so, the foolscap sheets were missing the bottom inch or so off
each document. As it happens, some years ago the ADF packed up after a
close encounter of the coffee kind, so I bought another scanner of the
same model for spares or repair and cannibalised parts from it to get
mine fully working again. I did wonder whether to try and adapt the
carriage of the spare to do something similar to your suggestion, but
ruled it out as being too much work.
It really sounds like you should get a new scanner that has a large
enough flatbed that it CAN scan your documents. Or go to a copying shop
and get them reduced to 8.5x11 or A4 (whatever you use). You have by this
time wasted enough time that, even if you only value your time at $10/hr,
you could have paid for a new scanner. It is stupid to try to paint your
house with a toothbrush.

Note that most copiers nowadays will allow you to scan documents to a USB
stick, so if this is a rare event (scanning those oversize articles) that
would also be a possible approach which would be much cheaper than what
you are trying to do now.
Java Jive
2019-10-06 15:23:10 UTC
Post by William Unruh
It really sounds like you should get a new scanner that has a large
enough flatbed that it CAN scan your documents. Or go to a copying shop
and get them reduced to 8.5x11 or A4 (whatever you use). You have by this
time wasted enough time that, even if you only value your time at $10/hr,
you could have paid for a new scanner. It is stupid to try to paint your
house with a toothbrush.
I value my time at £15/hr when I work for others, but, as I am retired
now except one remaining legacy client, and have both a house and a car
needing repair, I can't afford to spend money on a new scanner for a
one-off job.
Post by William Unruh
Note that most copiers nowadays will allow you to scan documents to a USB
stick, so if this is a rare event (scanning those oversize articles) that
would also be a possible approach which would be much cheaper than what
you are trying to do now.
That might be a possibility. I could enquire at the local library, but
it's a very small out of the way place, so I'm not hopeful for much help
there. They usually have something of a bother even getting books I
ask to borrow!
William Unruh
2019-10-06 18:12:40 UTC
Post by Java Jive
Post by William Unruh
It really sounds like you should get a new scanner that has a large
enough flatbed that it CAN scan your documents. Or go to a copying shop
and get them reduced to 8.5x11 or A4 (whatever you use). You have by this
time wasted enough time that, even if you only value your time at $10/hr,
you could have paid for a new scanner. It is stupid to try to paint your
house with a toothbrush.
I value my time at £15/hr when I work for others, but, as I am retired
now except one remaining legacy client, and have both a house and a car
needing repair, I can't afford to spend money on a new scanner for a
one-off job.
Post by William Unruh
Note that most copiers nowadays will allow you to scan documents to a USB
stick, so if this is a rare event (scanning those oversize articles) that
would also be a possible approach which would be much cheaper than what
you are trying to do now.
That might be a possibility. I could enquire at the local library, but
it's a very small out of the way place, so I'm not hopeful for much help
there. They usually have something of a bother even getting books I
ask to borrow!
If you are in any size of town/city, there should be commercial copy
shops which will usually have much higher end copiers (larger sheets,
etc). They will charge you of course, but a lot less than the 15 pounds
times the number of hours you have used I suspect. Of course if you are
way out in the countryside, with the nearest village having a population
of 20, then self-reliance is a necessity, not an option.
Java Jive
2019-10-06 23:03:29 UTC
Post by William Unruh
If you are in any size of town/city,
I'm not, Scottish Highlands.
Java Jive
2019-10-06 16:36:16 UTC
Post by Java Jive
The scanner I'm actually using is a fairly old HP C9850A;
You are in luck. The GNU/Linux scanning software SANE supports
the HP 5490C (which is the C9850A). The following driver is
https://sourceforge.net/projects/hp5400backend/files/hp5400-backend/beta_1_20030526
This is BETA software but it may do the basic job that you need.
Furthermore, with SANE, the scanning process can be scripted (automated).
You could also check eBay for cheap, used scanners that are supported
by SANE and then hack them according to the article.
http://www.sane-project.org/sane-mfgs.html
Thanks, I'll bookmark that.
Post by Java Jive
Bezel       Scanner Glass       Bezel
=====\                         /=====
      \=======================/
This means the stuff at the edge tends to be stretched and distorted,
making it difficult to marry up with adjoining images.
This is very easily fixed. The image is simply cropped to remove the
edges and keep the flat central portion. The overlaps would need to
be correspondingly greater but that's no problem.
Yes, that's exactly what I do.
scanimage [parameters] | pamcut -cropright X -cropleft Y -croptop Z -cropbottom W | pamtotiff > image.tif
The output from SANE scanimage is a PNM file, which is then piped to
pamcut, from the netpbm package, which removes the edges; pamtotiff then
converts the result to TIFF format for further processing.
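How much extra overlap the cropping costs can be worked out before scanning. A minimal Python sketch of the arithmetic: given the glass size, the margin cropped off each edge, the overlap wanted between neighbours and the document size, it counts the sections needed in each direction. All the dimensions below are hypothetical examples, not measurements of any particular scanner or document.

```python
import math

def sections_needed(doc_len, usable_len, overlap):
    """Scans needed to cover doc_len when each scan covers usable_len and
    neighbouring scans must share at least `overlap` (all in the same unit)."""
    if doc_len <= usable_len:
        return 1
    step = usable_len - overlap  # fresh material contributed by each extra scan
    if step <= 0:
        raise ValueError("overlap must be smaller than the usable scan length")
    return 1 + math.ceil((doc_len - usable_len) / step)

def grid(doc_w, doc_h, glass_w, glass_h, crop, overlap):
    """(columns, rows) of sections after cropping `crop` off every edge of each scan."""
    usable_w = glass_w - 2 * crop
    usable_h = glass_h - 2 * crop
    return (sections_needed(doc_w, usable_w, overlap),
            sections_needed(doc_h, usable_h, overlap))

# Hypothetical: 216 x 297 mm glass, 15 mm cropped from each edge,
# 25 mm overlap, and a 700 x 600 mm parchment.
cols, rows = grid(700, 600, 216, 297, crop=15, overlap=25)
print(f"{cols} x {rows} = {cols * rows} sections")
```

Increasing `crop` or `overlap` shrinks the fresh material each scan contributes, which is why the section count climbs quickly for large parchments.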
Post by Java Jive
Automatic exposure can make different sections of the same document
look very different
This can also be corrected, although perhaps not as easily.
The "exposure" can be defined exactly through GAMMA curves in SANE.
An optimal setting may have to be determined through some initial
test scans.
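A gamma curve is just a fixed lookup table from input level to output level, so once a good value has been found from test scans, the same table can be applied to every section of a document. A minimal Python sketch for 8-bit grey levels, applied after scanning rather than through any SANE backend option; the gamma value and sample pixels are arbitrary examples, and some tools define gamma as the reciprocal of the value used here.

```python
def gamma_table(gamma, levels=256):
    """Lookup table: out = top * (in / top) ** (1 / gamma), rounded to an integer."""
    top = levels - 1
    return [round(top * (v / top) ** (1.0 / gamma)) for v in range(levels)]

def apply_table(pixels, table):
    """Map a flat sequence of 8-bit samples through the table."""
    return [table[p] for p in pixels]

lut = gamma_table(0.8)  # with this convention, gamma < 1 darkens the mid-tones
pale_row = [255, 230, 200, 128, 40, 0]  # made-up samples from a pale scan
print(apply_table(pale_row, lut))
```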
Also, automated post-processing can be extremely effective. Here is
http://www.fmwconcepts.com/imagemagick/textcleaner/index.php
There are many other possibilities for post-processing but I would
need to see an actual example of the material to make recommendations.
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
Post by Java Jive
Although nominally the same, the vertical and horizontal
resolutions are actually slightly different,
I don't understand. A scanner will scan in strips (lines) and
the resolution (dpi) is the same in spite of direction.
I presume that the vertical distance it steps down the document is ever
so slightly different from the horizontal resolution of the sensors
across the scanning bar.
Post by Java Jive
There is no easy way to align the different sections of scan.
You need to establish a reference for each document before
scanning. It doesn't matter how it's done as long as it
is consistent across the entire document.
Since you are viewing the backs during scanning, perhaps you can
draw lines or other marks on the back of each document. These
marks don't need to be permanent and could be done with a special
tape that is only loosely bound.
A carpenter's square, or other alignment tools, can also help
if you are marking the documents.
I have a carpenter's set square, but:

:-( There is no reliable straight edge to measure from, because the
carriage mechanism is not quite precisely aligned with the inner edges
of the scanner glass aperture in the bezel - it strays a few pixels
across by the time it reaches the bottom of the document, and the outer
edges of the scanner are <sarcasm> streamlined for that impression of
swish speed so you can scan at GTX speeds </sarcasm> - of course it's
bollocks, a typical triumph of design over functionality that is the
curse of modern life, in this case the curved edge of the scanner makes
it well nigh impossible to measure from anything reliably.

:-( The scanner glass is not illuminated until you start the scan, so
you can't see the lines of text through the back, and therefore the only
practical way to align them is to do trial scans until you get it right.
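If the carriage drift described above is roughly linear, it is a shear rather than a rotation: content slides sideways by an amount proportional to how far down the scan it is, so rotating in an editor cannot fully correct it. Assuming the drift has been measured once (for example from a scan of a straight ruler edge), a minimal Python sketch of a per-row correction is below; it shifts whole pixels only, whereas real tools would interpolate.

```python
def row_shift(row, total_rows, drift_px):
    """Sideways shift to undo at `row`, assuming the drift grows linearly
    from 0 at the top row to drift_px at the bottom row."""
    if total_rows < 2:
        return 0.0
    return drift_px * row / (total_rows - 1)

def unshear_row(samples, shift, fill=255):
    """Move a row of samples left by round(shift) pixels (right if negative),
    padding the vacated end with `fill` (white for 8-bit grey)."""
    s = max(-len(samples), min(len(samples), round(shift)))
    if s > 0:
        return samples[s:] + [fill] * s
    if s < 0:
        return [fill] * (-s) + samples[:s]
    return list(samples)

# Hypothetical: a ruler edge drifts 4 px to the right over a 3000-row scan.
for r in (0, 1500, 2999):
    print(r, round(row_shift(r, 3000, 4.0), 2))
print(unshear_row([10, 20, 30, 40, 50], 2))
```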
https://github.com/Flameeyes/unpaper
But deskewing can be very tedious when applied to the type of
scanning that you are doing.
Yes, that rather sounds like doing rotations of fractions of a degree
within Paint Shop Pro; it can work, but it can also be very
time-consuming. In my experience it's easier to get the scan aligned as
well as possible in the first place.
Finally, good software is essential and GNU/Linux has a great
deal of professional image processing tools.
For stitching, panotools is the basis for everything
http://panotools.sourceforge.net
If you can establish a reliable scanning protocol, much or
even most of the process can be fully automated with command-line
tools, and this could save much time and labor.
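To illustrate the core of what stitching software does at each join (panotools does far more, including lens and perspective models, so this is not its algorithm): find the offset at which the overlapping parts of two neighbouring scans agree best. A minimal one-dimensional Python sketch, assuming each scan has been reduced to a brightness profile (for example column averages) and that the true overlap lies within a search range you choose; the profile data are synthetic.

```python
def best_overlap(left, right, min_overlap, max_overlap):
    """Overlap length (in samples) at which the tail of `left` best matches
    the head of `right`, by minimum mean squared difference."""
    best_len, best_err = None, float("inf")
    for n in range(min_overlap, min(max_overlap, len(left), len(right)) + 1):
        tail, head = left[-n:], right[:n]
        err = sum((a - b) ** 2 for a, b in zip(tail, head)) / n
        if err < best_err:
            best_len, best_err = n, err
    return best_len

# Synthetic test: cut one profile into two pieces that share 7 samples.
profile = [200, 198, 90, 60, 190, 201, 70, 65, 180, 199, 120, 200, 85, 197, 203, 66]
left, right = profile[:10], profile[3:]
print(best_overlap(left, right, min_overlap=3, max_overlap=9))
```

Very short overlaps can match well by chance, so a sensible `min_overlap` matters; real scans also need the two-dimensional version of this, plus tolerance for exposure differences between sections.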
To me, this is a very challenging project. Certainly there is
professional equipment available at very high cost which would
be used by institutions. But the challenge is to do a professional
job using the inexpensive tools available with GNU/Linux.
Yes, challenging is not a description I would argue with! Thanks for
your suggestions.
F Russell
2019-10-06 17:48:12 UTC
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
The scan is not optimal. The histogram is heavily shifted to the
"white" end.

But I managed to improve things considerably through automated
post-processing. Manual tweaking in an image editor would give
the best results but, in your project, automation through scripting
is probably desired.

Your scan was first histogram normalized with pnmnorm (from the
netpbm package) after converting the PNG to PNM:

pngtopnm Will.png | pnmnorm > will-normalized.pnm

Then this normalized image was passed, without parameters, through
the textcleaner script:

textcleaner will-normalized.pnm will-cleaned.png

The resulting image is posted here:

http://s000.tinyupload.com/?file_id=13280754916408973062

(the result also contains the same PNG gamma chunk, 0.4545, as the original)

Both pnmnorm and textcleaner can be further tweaked using
extra parameters and this may give even better results.
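As a rough picture of what the normalisation step does: pnmnorm stretches the brightness range so that a small fraction of the darkest pixels become black and a small fraction of the lightest become white, which pulls apart a histogram that is bunched up at the white end. The pure-Python sketch below shows that percentile-based stretch on a flat list of 8-bit grey values; the function name and the default percentages are assumptions for illustration, not pnmnorm's exact algorithm or defaults.

```python
def percentile_stretch(pixels, black_pct=2.0, white_pct=1.0):
    """Linearly stretch 8-bit grey values (0-255).

    Roughly the darkest black_pct percent of pixels map to 0 and the
    lightest white_pct percent map to 255; values in between are rescaled.
    """
    pixels = list(pixels)
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[min(n - 1, int(n * black_pct / 100))]
    hi = ordered[max(0, n - 1 - int(n * white_pct / 100))]
    if hi <= lo:  # flat image: nothing to stretch
        return pixels
    scale = 255 / (hi - lo)
    # Rescale, then clamp anything beyond the chosen percentiles.
    return [max(0, min(255, round((p - lo) * scale))) for p in pixels]
```

Raising black_pct darkens faint ink more aggressively, at the cost of also blackening some of the paper texture.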

But a bit of automated post-processing can give considerable
improvement, and the stitching process can also be automated.
Java Jive
2019-10-06 23:02:25 UTC
Post by F Russell
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
The scan is not optimal. The histogram is heavily shifted to the
"white" end.
But I managed to improve things considerably through automated
post-processing. Manual tweaking in an image editor would give
the best results but, in your project, automation through scripting
is probably desired.
Your scan was first histogram normalized with pnmnorm (from the
pngtopnm Will.png | pnmnorm > will-normalized.pnm
Then this normalized image was passed, without parameters, through
textcleaner will-normalized.pnm will-cleaned.png
http://s000.tinyupload.com/?file_id=13280754916408973062
Yes, that's certainly better, thank you. I think what I may do is
finish the scanning first, and then start investigating things like this.
Java Jive
2020-02-14 22:54:48 UTC
Post by F Russell
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
The scan is not optimal. The histogram is heavily shifted to the
"white" end.
But I managed to improve things considerably through automated
post-processing. Manual tweaking in an image editor would give
the best results but, in your project, automation through scripting
is probably desired.
Your scan was first histogram normalized with pnmnorm (from the
pngtopnm Will.png | pnmnorm > will-normalized.pnm
Then this normalized image was passed, without parameters, through
textcleaner will-normalized.pnm will-cleaned.png
http://s000.tinyupload.com/?file_id=13280754916408973062
(the result also contains the same gamma chunk of 0.4545)
Both pnmnorm and textcleaner can be further tweaked using
extra parameters and this may give even better results.
But a bit of automated post-processing can give considerable
improvement, and the stitching process can also be automated.
Thanks for this. Now that I've finished the actual scanning of the
entire trunk's contents (4,081 images, but many of them stitched from
several scans, so probably about 6,000-7,000 scans in all), I've been
playing around with the above commands, and some of the scans are
certainly improved. So I've decided to make four versions of each:
original, normed, textcleaned, and normed then textcleaned. For each
item or group of items I'll pick out the best and use that.
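Generating the three processed versions of several thousand images is a natural job for a script. The sketch below builds the shell command lines for each scan, so they can be checked with a dry run first, and can then execute them. It assumes the netpbm tools pngtopnm, pnmnorm and pnmtopng and the textcleaner script are on the PATH, and that textcleaner accepts the plain "infile outfile" form used above; the output naming scheme and the function names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def _q(path):
    """Quote a path for safe use in a shell command line."""
    return shlex.quote(str(path))


def variant_commands(src):
    """Map each processed variant of one scan to the shell command making it.

    The "original" variant is the scan itself, so it needs no command.
    Output names such as Will-normed.png are a hypothetical convention.
    Order matters: "normed-cleaned" reads the file that "normed" writes.
    """
    stem = Path(src).with_suffix("")
    normed = f"{stem}-normed.png"
    cleaned = f"{stem}-cleaned.png"
    both = f"{stem}-normed-cleaned.png"
    return {
        # PNG -> PNM, stretch the histogram, then back to PNG.
        "normed": f"pngtopnm {_q(src)} | pnmnorm | pnmtopng > {_q(normed)}",
        "cleaned": f"textcleaner {_q(src)} {_q(cleaned)}",
        "normed-cleaned": f"textcleaner {_q(normed)} {_q(both)}",
    }


def run_all(folder, dry_run=True):
    """Print (dry run) or execute the commands for every original PNG."""
    for png in sorted(Path(folder).glob("*.png")):
        if png.stem.endswith(("-normed", "-cleaned")):
            continue  # output of an earlier run, not an original scan
        for cmd in variant_commands(png).values():
            if dry_run:
                print(cmd)
            else:
                # shell=True is needed for the pipe; without "set -o pipefail"
                # only the exit status of the last command is checked.
                subprocess.run(cmd, shell=True, check=True)
```

Running `run_all("scans")` first only prints the commands; `run_all("scans", dry_run=False)` then does the work.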

NY
2019-10-07 19:21:29 UTC
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
That is a wonderful document to find, and you've managed to get a legible
scan from ink that looks as if it varies in darkness (I won't say blackness,
because I imagine it's a sort of muddy brown!). And the handwriting is not
as bad as I was expecting for 1810.

Genealogists of the future will have it very easy, since all modern
documents are typed or computer printed, so bad handwriting isn't an issue.
Mind you, modern laser-printer paper doesn't last as long as vellum or old
paper before it starts to crack and go yellow: I've found documents that I
printed maybe 30 years ago which are starting to suffer "paper rot".
Java Jive
2019-10-08 11:54:18 UTC
Post by NY
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
That is a wonderful document to find, and you've managed to get a
legible scan from ink that looks as if it varies in darkness (I won't
say blackness, because I imagine it's a sort of muddy brown!). And the
handwriting is not as bad as I was expecting for 1810.
What you're probably forgetting is that in those days they didn't have
fountain pens - apparently the first patent for a fountain pen dates
from 1827. Think back to writers such as Jane Austen and the
dramatisations of them you've seen on TV, where quill pens are used.
With this type of pen, it is inevitable that the density of the ink will
be patchy - darker where the quill has just been dipped in the ink,
becoming fainter over the following words until the writer deems it time
to dip it again.
Java Jive
2019-10-08 12:05:40 UTC
Post by Java Jive
Post by NY
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
That is a wonderful document to find, and you've managed to get a
legible scan from ink that looks as if it varies in darkness (I won't
say blackness, because I imagine it's a sort of muddy brown!). And the
handwriting is not as bad as I was expecting for 1810.
What you're probably forgetting is that in those days they didn't have
fountain pens  -  apparently the first patent for a fountain pen dates
from 1827.
Actually, looking further, this Wikipedia article suggests that they go
back *much* further than that, including similar inventions from the
medieval Muslim world, a Leonardo da Vinci device, and English patents
around the time of C P Macfarlane's Will! However ...

"First patents

Progress in developing a reliable pen was slow until the mid-19th
century because of an imperfect understanding of the role that air
pressure plays in the operation of pens. Furthermore, most inks were
highly corrosive and full of sedimentary inclusions."

https://en.wikipedia.org/wiki/Fountain_pen

... hence ...
Post by Java Jive
Think back to writers such as Jane Austen and the
dramatisations of them you've seen on TV, where quill pens are used.
With this type of pen, it is inevitable that the density of the ink will
be patchy  -  darker where the quill has just been dipped in the ink,
becoming fainter over the following words until the writer deems it time
to dip it again.
NY
2019-10-08 19:06:46 UTC
Post by Java Jive
Post by NY
Post by Java Jive
www.macfh.co.uk/Temp/Will_Of_C_P_MacFarlane_-_Double_Probate.png
That is a wonderful document to find, and you've managed to get a legible
scan from ink that looks as if it varies in darkness (I won't say
blackness, because I imagine it's a sort of muddy brown!). And the
handwriting is not as bad as I was expecting for 1810.
What you're probably forgetting is that in those days they didn't have
fountain pens - apparently the first patent for a fountain pen dates
from 1827. Think back to writers such as Jane Austen and the
dramatisations of them you've seen on TV, where quill pens are used. With
this type of pen, it is inevitable that the density of the ink will be
patchy - darker where the quill has just been dipped in the ink,
becoming fainter over the following words until the writer deems it time
to dip it again.
I hadn't realised that goose-feather quills were the only way to write in
ink until as late as the 1820s. Was it the steel nib and small reservoir
that was patented in 1827 (still requiring the pen to be dipped into an
inkwell every so often), or was it the larger "tank" of ink (often a rubber
bladder compressed by a lever action to make it suck up ink) that was
patented?

I've never used a quill or a steel-nibbed pen that needed dipping every few
words, but I have "fond" memories of carrying around a bottle of Quink
(blue-black, royal blue or black, as the mood took me!) to fill up my
fountain pen at school - biros were banned ;-) I can't remember whether
Pentel and other water-based-ink ballpoint pens were allowed: they came out
a few years after I started at the school where everyone used fountain pens.
NY
2019-10-07 19:07:29 UTC
:-( As above, although it's described as a flatbed scanner, it isn't
really because the surrounding bezel is raised above the glass like so (a
Bezel     Scanner Glass      Bezel
=====\                      /=====
      \====================/
This means the stuff at the edge tends to be stretched and distorted,
making it difficult to marry up with adjoining images.
The best way to get round that is to arrange a fair amount of overlap
between tiles, and crop off the edges of each scan where the document is not
in contact with the glass, such that what remains still has a bit of
overlap.
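Once the distorted edges are cropped, finding where two neighbouring tiles line up is the part that can be automated. The toy sketch below tries every candidate overlap width between a left and a right tile and keeps the one where the shared columns differ least. It assumes a pure horizontal shift (no rotation, scale change or vertical offset) and represents tiles as lists of rows of grey values; stitching tools built on panotools also solve for rotation and blend the seams, which this does not attempt. The names best_overlap and join are hypothetical.

```python
def best_overlap(left, right, min_overlap=1, max_overlap=None):
    """Return how many columns of `right` duplicate the end of `left`.

    left, right: equal-height 2-D lists of grey values (rows of pixels).
    Toy model: assumes a pure horizontal shift between the tiles.
    The width with the lowest mean absolute difference wins.
    """
    if len(left) != len(right):
        raise ValueError("tiles must have the same height")
    width = min(len(left[0]), len(right[0]))
    max_overlap = width if max_overlap is None else min(max_overlap, width)
    best_w, best_err = None, float("inf")
    for w in range(min_overlap, max_overlap + 1):
        total = 0
        for lrow, rrow in zip(left, right):
            # Compare the last w columns of left with the first w of right.
            total += sum(abs(a - b) for a, b in zip(lrow[-w:], rrow[:w]))
        err = total / (w * len(left))
        if err < best_err:
            best_w, best_err = w, err
    return best_w


def join(left, right, overlap):
    """Abut the tiles, dropping the duplicated columns from the right one."""
    return [lrow + rrow[overlap:] for lrow, rrow in zip(left, right)]
```

In practice min_overlap should be set to a sizeable number of pixels, and the comparison is only meaningful where the overlap contains writing: over blank paper almost any width matches.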
pyotr filipivich
2019-10-06 15:05:05 UTC
Post by Java Jive
As I'm currently nearing the end of the scanning phase, next will be
sorting the documents into boxes and the files into sub-directories -
of course they are already, because I was trying to make sense of it all
as I worked through it, but only after I know entirely what is there
will I be able to finalise how best it should be divided up into
meaningful chunks of information. After that will come the probably
much longer phase of converting as much of it as possible into
searchable text. It's this last stage I really would like to shorten as
much as possible, because I'd like to have a chance of finishing it
before I die !-)
So, does anyone here have any experience of software to recognise
hand-writing in an image file (*.png) and produce a text file from it?
I had the opportunity to examine a facsimile of a 15th-century
document. "Yep, those are letters," says I, "but that's some kind of
ligature, and that's an abbrev, and is that a 'jot', 'title', or
error?" And then there is me, trying to read what I wrote last year,
ten years ago, forty years ago. Actually, that's "easy": it is my hand,
and the subject is (mostly) known. What you're trying to do is
recognize someone else's handwriting, in a "foreign" language (it
might nominally be English, but the then-contemporary understanding
and abbreviations may be quite different), from pictures.

Training a neural network to read various handwritings is
difficult. It is even harder when the neural network is not a human
being, but a computer.

And do not, repeat, do not lose those originals. They are more
likely to survive than computer images. At least the retrieval
technology is not likely to go obsolete in a decade or two.


tschus
pyotr
--
pyotr filipivich
The two oldest cliches are "The Old Days were better."
and "After all, these are Modern Times."
Mayayana
2019-10-06 17:15:14 UTC
"Java Jive" <***@evij.com.invalid> wrote

| So, does anyone here have any experience of software to recognise
| hand-writing in an image file (*.png) and produce a text file from it?
|

I got curious. My sense is that Ray Kurzweil makes the best
stuff, though it's not free and he may have downloaded himself
to a chip by now. (That was his plan -- to take mega-vitamins
in order to live long enough to download himself to a chip.... I
guess everyone needs a hobby.)

Another thing well regarded is Google's Tesseract. I found
FreeOCR. Simple. Small. No hiccups. Uses Tesseract v. 3.
But some reasonably readable handwriting came through
as complete nonsense. There's now a v. 4 but so far I
haven't found a free program using Tesseract v. 4 that
will install properly for me. That might be worth a try, but
I'd be very surprised if it works. Even non-typical fonts
can be a problem, so it's hard to imagine how they'd get
a method to recognize handwriting dependably.
Ken Hart
2019-10-07 00:28:59 UTC
Post by Mayayana
| So, does anyone here have any experience of software to recognise
| hand-writing in an image file (*.png) and produce a text file from it?
|
I got curious. My sense is that Ray Kurzweil makes the best
stuff, though it's not free and he may have downloaded himself
to a chip by now. (That was his plan -- to take mega-vitamins
in order to live long enough to download himself to a chip.... I
guess everyone needs a hobby.)
Another thing well regarded is Google's Tesseract. I found
FreeOCR. Simple. Small. No hiccups. Uses Tesseract v. 3.
But some reasonably readable handwriting came through
as complete nonsense. There's now a v. 4 but so far I
haven't found a free program using Tesseract v. 4 that
will install properly for me. That might be worth a try, but
I'd be very surprised if it works. Even non-typical fonts
can be a problem, so it's hard to imagine how they'd get
a method to recognize handwriting dependably.
I installed gImageReader and tesseract-ocr 3.04, just for fun, since
Mr "Jive" posted a sample (the last will of Charles P McFarlane), and Mr
Russell posted a cleaned image of the same.

(I assume that the tesseract data files would be the same for any other OCR
front end, so it would be a waste of time to try other OCR programs that use
tesseract v3.)

The original image resulted in gibberish, and not that much of it,
considering the size of the document.

The cleaned image resulted in about three times as much gibberish, with
a few (very few) readable phrases. For example, near the end was this
example of the lawyer's art:
" j : :— fitted, the last Will and/Lesmnsifii Vzafifiaz‘jf (fag:
Pg/grc‘axfzfflive Cgmm’ '[awfuny comm."

I noticed that there was a tesseract language file for Middle English
(1100-1500). Since Gutenberg's movable type dates from the mid-1400s, I
thought this might be good. No such luck: absolutely no intelligible text.

Then I noticed there was a tesseract-script. I combined that with both
Middle English and English. It churned away for about 20 minutes and
came up with what appeared to be the proper amount of gibberish.
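When comparing engines, language files, or differently cleaned copies of the same page, one crude way to put a number on "gibberish" is the fraction of output words that appear in a word list. The sketch below assumes you supply your own vocabulary (for example a plain-text dictionary file loaded into a set); the function names are hypothetical, and a high score only means the output contains real words, not that they are the right ones.

```python
import re


def word_score(text, vocabulary):
    """Fraction (0.0-1.0) of alphabetic tokens in text found in vocabulary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    vocab = {word.lower() for word in vocabulary}
    hits = sum(1 for token in tokens if token.lower() in vocab)
    return hits / len(tokens)


def pick_best(outputs, vocabulary):
    """Given {variant_name: ocr_text}, return the name with the best score."""
    return max(outputs, key=lambda name: word_score(outputs[name], vocabulary))
```

For example, `pick_best({"original": text_a, "cleaned": text_b}, words)` would choose whichever OCR output reads more like English.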


In my opinion, the suggestion to check with organizations that deal with
the blind will be the best bet.

Good luck!
--
Ken Hart
***@frontier.com
Paul
2019-10-07 01:46:20 UTC
Post by Mayayana
| So, does anyone here have any experience of software to recognise
| hand-writing in an image file (*.png) and produce a text file from it?
|
I got curious. My sense is that Ray Kurzweil makes the best
stuff, though it's not free and he may have downloaded himself
to a chip by now. (That was his plan -- to take mega-vitamins
in order to live long enough to download himself to a chip.... I
guess everyone needs a hobby.)
Another thing well regarded is Google's Tesseract. I found
FreeOCR. Simple. Small. No hiccups. Uses Tesseract v. 3.
But some reasonably readable handwriting came through
as complete nonsense. There's now a v. 4 but so far I
haven't found a free program using Tesseract v. 4 that
will install properly for me. That might be worth a try, but
I'd be very surprised if it works. Even non-typical fonts
can be a problem, so it's hard to imagine how they'd get
a method to recognize handwriting dependably.
I installed gImageReader and tesseract-ocr 3.04, just for fun, since Mr
"Jive" posted a sample (the last will of Charles P McFarlane), and Mr
Russell posted a cleaned image of the same.
(I assume that the tesseract files would be the same for any other OCR
frontend, so it would be a waste of time to try other OCR's that use
tesseract V3.)
The original image resulted in gibberish, and not that much of it,
considering the size of the document.
The cleaned image resulted in about three times as much gibberish, with
a few- very few- readable phrases. For example, near the end was this
Pg/grc‘axfzfflive Cgmm’ '[awfuny comm."
I noticed that there was a tesseract language file for Middle English
1100-1500. Since Gutenberg's movable type was late 1400's, I thought
this might be good. No such luck- absolutely no intelligible text.
Then I noticed there was a tesseract-script. I combined that with both
Middle English and English. It churned away for about 20 minutes and
came up with what appeared to be the proper amount of gibberish.
In my opinion, the suggestion to check with organizations that deal with
the blind will be the best bet.
Good luck!
Here's an article that claims some success in this area
in 2011, using multiple convolutional neural networks.

https://towardsdatascience.com/https-medium-com-rachelwiles-have-we-solved-the-problem-of-handwriting-recognition-712e279f373b

But what is being done here is different from most of what
the previous link describes. It's not even handwriting
in the example, just OCR.

https://cloud.google.com/blog/products/ai-machine-learning/how-the-new-york-times-is-using-google-cloud-to-find-untold-stories-in-millions-of-archived-photos

What the academics have worked on is this problem:
individually scribbled characters, not cursive writing.
They've beaten the piss out of this problem (2011 and later),
and it's like a "neural network convention" on this topic.

https://miro.medium.com/max/1750/1*cf7luBAWvHApT-ohFZta5Q.jpeg

That is rather than this problem, cursive handwriting, which
I'm not seeing any claims they can handle.
(And for the record, this looks like *machine generated* cursive!)

http://static.nautil.us/10346_c8661fbb8d748c08800779b570047110.png

Paul
Java Jive
2019-10-07 11:22:32 UTC
Last night I spent most of my time housekeeping the results so far, but also ...

I rediscovered why I'm driving the scanner from Windows XP: when I
tried from Linux (the menu option in a recent edition of Ubuntu with
Xfce), it didn't know the size of the scanner glass; "auto" left the
scan area too small, and none of the manual sizes offered was right
either.

I also installed Enblend (that's its actual name, not Emblend), but
afterwards couldn't find a menu option for it. As it was getting late,
I didn't pursue this further.

I discovered how to stitch non-overlapping images in IrfanView, but that
just seemed to glue them together at the edges; I've yet to try images
with overlap to see if they're stitched appropriately.

Also ...
Post by Paul
Post by Ken Hart
Post by Mayayana
| So, does anyone here have any experience of software to recognise
| hand-writing in an image file (*.png) and produce a text file from it?
[...]
Another thing well regarded is Google's Tesseract. I found
FreeOCR. Simple. Small. No hiccups. Uses Tesseract v. 3.
But some reasonably readable handwriting came through
as complete nonsense. There's now a v. 4 but so far I
haven't found a free program using Tesseract v. 4 that
will install properly for me. That might be worth a try, but
I'd be very surprised if it works. Even non-typical fonts
can be a problem, so it's hard to imagine how they'd get
a method to recognize handwriting dependably.
I installed gImageReader and tesseract-ocr 3.04, just for fun, since
Mr "Jive" posted a sample (the last will of Charles P McFarlane), and
Mr Russell posted a cleaned image of the same.
[...] and
came up with what appeared to be the proper amount of gibberish.
In my opinion, the suggestion to check with organizations that deal
with the blind will be the best bet.
Thanks for doing this, it saved me the disappointment! It is beginning
to seem that speaking the documents out loud into Dragon might be my
best bet.
Post by Paul
Here's an article that claims there was some success in
this area, in 2011. Using multiple convolutional neural networks.
https://towardsdatascience.com/https-medium-com-rachelwiles-have-we-solved-the-problem-of-handwriting-recognition-712e279f373b
But what is being done here, is different than the bulk of the
description in the previous link. It's not even handwriting
in the example, just OCR.
https://cloud.google.com/blog/products/ai-machine-learning/how-the-new-york-times-is-using-google-cloud-to-find-untold-stories-in-millions-of-archived-photos
What the academics have worked on, is this problem.
Individually scribbled characters, not cursive writing.
They've beaten the piss out of this problem (2011 and later),
and it's like a "neural network convention" on this topic.
https://miro.medium.com/max/1750/1*cf7luBAWvHApT-ohFZta5Q.jpeg
Rather than solving this problem. I'm not seeing
any claims they can handle this. Cursive handwriting.
(And for the record, this looks like *machine generated* cursive!)
http://static.nautil.us/10346_c8661fbb8d748c08800779b570047110.png
Interesting links, for which thanks, though, as you suggest, perhaps not
very helpful to my central problem! I'm wondering though, whether it
might be worth contacting someone like the academic Rachel Wiles who
wrote the first article above to see if she can offer any advice.
J. P. Gilliver (John)
2019-10-07 11:39:59 UTC
Post by Java Jive
Last night I spent mostly housekeeping the results so far, but also ...
[]
Post by Java Jive
I discovered how to stitch non-overlapping images in Irfanview, but
that just seemed to glue them together at the edges, I've yet to try
images with overlap to see if they're stitched appropriately.
Ah, right. It may not do other than abut; I hadn't tried, just
remembered noticing some mention of an ability at some point. You really
need something automatic, that will handle overlaps, and rotations, and
slight scale variation. (Unless you find something that really does grid
assembly, I suspect you might need to create images from rows of scans,
and then put _those_ together, too, rather than just adding individual
scans.)

Out of curiosity, did you look at the video of the scanner mouse in use
(http://youtu.be/M9Oj02ZVE_Y, skip to 1:45)? That's how
automatic stitching _should_ work! (Not that I'm suggesting you buy one
of those [though they're cheap enough and rather fun], just that the
automatic software is so good it's seamless, you don't realise that it's
_doing_ autostitching.)
[]
Post by Java Jive
Post by Ken Hart
In my opinion, the suggestion to check with organizations that deal
with the blind will be the best bet.
Still worth considering. Though I suspect there won't be much in your
remote location )-:.
[]
Post by Java Jive
Interesting links, for which thanks, though, as you suggest, perhaps
not very helpful to my central problem! I'm wondering though, whether
it might be worth contacting someone like the academic Rachel Wiles who
wrote the first article above to see if she can offer any advice.
Can't hurt!
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

"The people here are more educated and intelligent. Even stupid people in
Britain are smarter than Americans." Madonna, in RT 30 June-6July 2001 (page
32)
Paul
2019-10-07 13:03:23 UTC
Post by J. P. Gilliver (John)
Post by Java Jive
Last night I spent mostly housekeeping the results so far, but also ...
[]
Post by Java Jive
I discovered how to stitch non-overlapping images in Irfanview, but
that just seemed to glue them together at the edges, I've yet to try
images with overlap to see if they're stitched appropriately.
Ah, right. It may not do other than abut; I hadn't tried, just
remembered noticing some mention of an ability at some point. You really
need something automatic, that will handle overlaps, and rotations, and
slight scale variation. (Unless you find something that really does grid
assembly, I suspect you might need to create images from rows of scans,
and then put _those_ together, too, rather than just adding individual
scans.)
Out of curiosity, did you look at the video of the scanner mouse in use
http://youtu.be/M9Oj02ZVE_Y skip to 1:45)? That's how
automatic stitching _should_ work! (Not that I'm suggesting you buy one
of those [though they're cheap enough and rather fun], just that the
automatic software is so good it's seamless, you don't realise that it's
_doing_ autostitching.)
[]
Post by Java Jive
Post by Ken Hart
In my opinion, the suggestion to check with organizations that deal
with the blind will be the best bet.
Still worth considering. Though I suspect there won't be much in your
remote location )-:.
[]
Post by Java Jive
Interesting links, for which thanks, though, as you suggest, perhaps
not very helpful to my central problem! I'm wondering though, whether
it might be worth contacting someone like the academic Rachel Wiles
who wrote the first article above to see if she can offer any advice.
Can't hurt!
They call it handwriting, but it's still script, and
very clean script as input, at that.

https://pdf.iskysoft.com/ocr-pdf/handwriting-ocr.html

*******

And this picture was titled

"OCR methods fail with cursive handwriting"

https://miro.medium.com/max/1688/1*iJdmiFLdhKWDxpkE_NtUfw.jpeg

*******

And each time when they make claims, like in the "Description" tab...

http://www.recogniform.net/eng/chr-cursive-handwritten-recognition.html

the "Implementation" tab claims seem much more "lukewarm".

http://www.recogniform.net/eng/recogniform-desktop-reader.html

CHR recognition engine (cursive handwritten text). Speed: UNLIMITED

Yet there is no picture of cursive writing to suggest
it actually recognizes such text.

I suppose if there was a trial version, you wouldn't care
about the ever-so-deceptive marketing materials.

Paul
Java Jive
2019-10-07 15:40:54 UTC
Post by Paul
Post by J. P. Gilliver (John)
Post by Java Jive
I discovered how to stitch non-overlapping images in Irfanview, but
that just seemed to glue them together at the edges, I've yet to try
images with overlap to see if they're stitched appropriately.
Ah, right. It may not do other than abut; I hadn't tried, just
remembered noticing some mention of an ability at some point. You
really need something automatic, that will handle overlaps, and
rotations, and slight scale variation. (Unless you find something that
really does grid assembly, I suspect you might need to create images
from rows of scans, and then put _those_ together, too, rather than
just adding individual scans.)
Out of curiosity, did you look at the video of the scanner mouse in
use http://youtu.be/M9Oj02ZVE_Y skip to 1:45)?
That's how automatic stitching _should_ work! (Not that I'm suggesting
you buy one of those [though they're cheap enough and rather fun],
just that the automatic software is so good it's seamless, you don't
realise that it's _doing_ autostitching.)
No, I haven't looked at that yet.
Post by Paul
Post by J. P. Gilliver (John)
Post by Java Jive
Post by Ken Hart
In my opinion, the suggestion to check with organizations that deal
with the blind will be the best bet.
Still worth considering. Though I suspect there won't be much in your
remote location )-:.
[]
Post by Java Jive
Interesting links, for which thanks, though, as you suggest, perhaps
not very helpful to my central problem!  I'm wondering though,
whether it might be worth contacting someone like the academic Rachel
Wiles who wrote the first article above to see if she can offer any
advice.
Can't hurt!
They call it handwriting, but it's still script, and
very clean script as input, at that.
https://pdf.iskysoft.com/ocr-pdf/handwriting-ocr.html
See below ...
Post by Paul
And this picture was titled
"OCR methods fail with cursive handwriting"
https://miro.medium.com/max/1688/1*iJdmiFLdhKWDxpkE_NtUfw.jpeg
And each time when they make claims, like in the "Description" tab...
http://www.recogniform.net/eng/chr-cursive-handwritten-recognition.html
the "Implementation" tab claims seem much more "luke-warm".
http://www.recogniform.net/eng/recogniform-desktop-reader.html
   CHR recognition engine (cursive handwritten text). Speed: UNLIMITED
Yet there is no picture of cursive writing, suggesting
it actually recognizes such.
I suppose if there was a trial version, you wouldn't care
about the ever-so-deceptive marketing materials.
Yes, claims are one thing, actuality another ...

I did some trials today, working from the following pages which link to
each other, and cover much the same ground as Paul's link above:

https://www.makeuseof.com/tag/convert-handwriting-text-ocr/
How to Convert an Image With Handwriting to Text Using OCR

Microsoft OneNote
Recommended 2016 Desktop version now seems unavailable and I'm not
opening a cloud account just to try something out.

Google Drive, etc
I'm not opening a cloud account just to try something out.

SimpleOCR
Doesn't recognise Portable Network Graphics (*.png) files, and even with
a bitmap produced rubbish. Uninstalled.

OnlineOCR
https://www.onlineocr.net/
I've been using this free online service for many years on ad hoc
occasions - examples are a book I used to own which I found serialised
in an Australian newspaper, text from LP covers, etc. I think it's
quite good for printed text, and it also gets a good review below, but
it too produced rubbish for hand-writing.

FreeOCR
I *think* this is the download version of the goodish OnlineOCR above.
I've installed it, and tried it out, but it too produces rubbish for
hand-writing. However, because I think it's the download version of the
above goodish online service, I've decided not to uninstall it for now.

TopOCR
Crashed when I fed it F Russell's cleaned up version of the will.
Uninstalled.

https://www.makeuseof.com/tag/4-free-online-ocr-tools-put-ultimate-test/
4 Free Online OCR Tools Put to the Ultimate Test

"The Final Outcome

If, like most people, you’re just looking to scan a few magazine
articles, and some household bills, you won’t need to edit these
documents. Therefore, converting direct to a PDF will be suitable for
you, because you’ll still be able to search those documents. For this,
Free Online OCR was definitely the best free tool we tested. That being
said, if you’re willing to pay $5 per month for near-perfection, ABBYY’s
FineReader Online was slightly more accurate.

When it comes to converting documents to DOC, we didn’t manage to find
any solution that was perfect, but by far the best results came from
Online OCR. The conversion wasn’t perfect, but the integrity of the
formatting was largely kept intact, and mistakes were negligible. When
we compare these results to the “premium” offering from ABBYY, you can’t
help but be massively impressed."

https://www.makeuseof.com/tag/free-vs-paid-ocr/
Free vs. Paid OCR Software: Microsoft OneNote and Nuance OmniPage Compared

Conclusion:
"Would You Pay for an OCR Tool After This?

With an unbelievably close score of 13 to 14, OmniPage just barely beat
out OneNote. OmniPage was able to recognize more characters than OneNote
but, at the end of the day, both were equally useful (or useless). The
handwriting, printed writing, and downloaded JPG tests stumped both
programs, but they each did well with the PDF to text and smartphone
image to text recognition.

But is it worth it to invest in a paid OCR tool? In my opinion, no. If
OneNote can succeed and fail in the same areas where OmniPage can, why
spend the $60?"
Java Jive
2019-10-07 17:19:48 UTC
Post by Java Jive
OnlineOCR
https://www.onlineocr.net/
I've been using this free online service for many years on ad hoc
occasions  -  examples are a book I used to own which I found serialised
in an Australian newspaper, text from LP covers, etc.  I think it's
quite good for printed text, and it also gets a good review below, but
it too produced rubbish for hand-writing.
FreeOCR
I *think* this is the download version of the goodish OnlineOCR above.
I've installed it, and tried it out, but it too produces rubbish for
hand-writing.  However, because I think it's the download version of the
above goodish online service, I've decided not to uninstall it for now.
I was wrong: the online version of FreeOCR is supposedly ...

free-ocr.com

... but that redirects to ...

https://www.sodapdf.com/ocr-pdf/

... which errored when I fed it the cleaned up version of the will, and
if on that page you click the link for the desktop version, you end up
here ...

https://www.sodapdf.com/buy/freeonlinetools/dw-success/?mkey1=ocr-pdf

So that's obviously not the online version of FreeOCR either!
Java Jive
2019-10-07 17:50:00 UTC
Post by Mayayana
FreeOCR
I *think* this is the download version of the goodish OnlineOCR above.
I've installed it, and tried it out, but it too produces rubbish for
hand-writing.  However, because I think it's the download version of the
above goodish online service, I've decided not to uninstall it for now.
It produced rubbish for a newscutting also, whereas OnlineOCR made sense
of it, so now uninstalled.
Soviet_Mario
2019-10-07 18:34:32 UTC
Post by Java Jive
Post by Mayayana
FreeOCR
I *think* this is the download version of the goodish
OnlineOCR above. I've installed it, and tried it out, but
it too produces rubbish for hand-writing.  However,
because I think it's the download version of the above
goodish online service, I've decided not to uninstall it
for now.
It produced rubbish for a newscutting also, whereas
OnlineOCR made sense of it, so now uninstalled.
Do they happen to offer any guarantee of privacy for your text?
(I mean, a guarantee you could actually verify?)
--
1) Resist, resist, resist.
2) If everyone pays taxes, taxes are paid by everyone
Soviet_Mario - (aka Gatto_Vizzato)
Java Jive
2019-10-08 11:05:12 UTC
Post by Java Jive
It produced rubbish for a newscutting also, whereas OnlineOCR made
sense of it, so now uninstalled.
Do they happen to offer any guarantee of privacy for your text? (I mean,
a guarantee you could actually verify?)
https://www.onlineocr.net/service/faq

FWIW, their FAQs say:

"What happens with uploaded file?

All documents recognized under the "Guest" account are deleted
automatically after ending process. For registered users source
documents and converted files are stored into user's document list one
month"

For the sort of items that I've used it for, that's good enough for me.
Java Jive
2019-10-07 16:27:29 UTC
Post by J. P. Gilliver (John)
Out of curiosity, did you look at the video of the scanner mouse in use
(http://youtu.be/M9Oj02ZVE_Y, skip to 1:45)? That's how
automatic stitching _should_ work! (Not that I'm suggesting you buy one
of those [though they're cheap enough and rather fun], just that the
automatic software is so good it's seamless, you don't realise that it's
_doing_ autostitching.)
I've watched it now. I agree it looks quite impressive, but I'm
wondering which model he was using, as I've found apparently similar
models at wildly different prices:

LG LSM-150 Scanner Mouse £35.98
https://www.amazon.co.uk/LG-Electronics-LSM-150-Scanner-Mouse/dp/B00B83CR5Y/ref=sr_1_2

LG LSM-100 5 Button Scanner Mouse £75.00
https://www.amazon.co.uk/LG-LSM-100-Button-Scanner-Scroll/dp/B0053T0HNI/ref=sr_1_9

However, I don't think this would cope well with damaged documents which
are badly creased, dog-eared, letters where the seal area is, etc.
J. P. Gilliver (John)
2019-10-08 03:43:20 UTC
Post by Java Jive
Post by J. P. Gilliver (John)
Out of curiosity, did you look at the video of the scanner mouse in
use (http://youtu.be/M9Oj02ZVE_Y, skip to 1:45)?
That's how automatic stitching _should_ work! (Not that I'm
suggesting you buy one of those [though they're cheap enough and
rather fun], just that the automatic software is so good it's
seamless, you don't realise that it's _doing_ autostitching.)
I've watched it now. I agree it looks quite impressive, but I'm
wondering which model he was using, as I've found apparently similar
models at wildly different prices:

LG LSM-150 Scanner Mouse £35.98
https://www.amazon.co.uk/LG-Electronics-LSM-150-Scanner-Mouse/dp/B00B83CR5Y/ref=sr_1_2
Ouch!
https://www.ebay.co.uk/itm/LG-LSM-150-Smart-Mouse-Scanner-BNIB-SDcard-8GB/333351004034
9.99 if you're in the UK. I think that's the one in use in the DABS
review, as it's that listing I got the YouTube link from.
Post by Java Jive
LG LSM-100 5 Button Scanner Mouse £75.00
https://www.amazon.co.uk/LG-LSM-100-Button-Scanner-Scroll/dp/B0053T0HNI/ref=sr_1_9
Double ouch!
https://www.ebay.co.uk/sch/i.html?_odkw=LSM-150&_osacat=0&_from=R40&_trksid=m570.l1313&_nkw=LSM-100&_sacat=0
has several, though they _do_ appear more expensive.
Post by Java Jive
However, I don't think this would cope well with damaged documents
which are badly creased, dog-eared, letters where the seal area is, etc.
No, you'd probably need at least a clear sheet of plastic.

I just felt that the driver software for them showed what automatic
stitching _can_ do. Pity it isn't (AFAIK) sold in a form that could use
pre-existing images.
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

A waist is a terrible thing to mind.
Carlos E.R.
2019-10-07 11:43:29 UTC
Post by Java Jive
Last night I spent mostly housekeeping the results so far, but also ...
I rediscovered why I'm working the scanner from Windows XP  -  when I
tried from Linux (a menu option in a recent edition of Ubuntu with XFCE), it
didn't know what size the scanner glass was: auto left it too small,
while the manual settings offered weren't the right size either.
What application was that? I don't have Ubuntu.

The most powerful one is xsane, or xsane called from inside GIMP.
--
Cheers, Carlos.
Java Jive
2019-10-07 15:49:57 UTC
Post by Carlos E.R.
Post by Java Jive
Last night I spent mostly housekeeping the results so far, but also ...
I rediscovered why I'm working the scanner from Windows XP  -  when I
tried from Linux (a menu option in a recent edition of Ubuntu with XFCE), it
didn't know what size the scanner glass was: auto left it too small,
while the manual settings offered weren't the right size either.
What application was that? I don't have Ubuntu.
The most powerful one is xsane, or xsane called from inside GIMP.
Menu, Graphics, Simple Scan.
Carlos E.R.
2019-10-12 13:02:36 UTC
Post by Java Jive
Post by Carlos E.R.
Post by Java Jive
Last night I spent mostly housekeeping the results so far, but also ...
I rediscovered why I'm working the scanner from Windows XP  -  when I
tried from Linux (a menu option in a recent edition of Ubuntu with XFCE), it
didn't know what size the scanner glass was: auto left it too small,
while the manual settings offered weren't the right size either.
What application was that? I don't have Ubuntu.
The most powerful one is xsane, or xsane called from inside GIMP.
Menu, Graphics, Simple Scan.
That tells me nothing: I said I do not have Ubuntu. Right-click on the
menu entry, choose Properties, and find out the actual application name.

Google tells me that it probably comes from GNOME, by Robert Ancell, and
the actual name is "simple-scan".

I have it, so I'm just testing it. I'm not impressed. I would not use it
for anything serious. The only good thing is that it is /simple/. It
lacks power.

You should switch to xsane, or gimp+xsane.
--
Cheers, Carlos.
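Since Simple Scan kept guessing the size of the scanner glass wrongly, one way round it is SANE's command-line front end, `scanimage`, which uses the same backends as xsane and lets you state the scan area explicitly. The following is a minimal Python sketch that only builds such a command. It assumes your scanner's backend exposes the common geometry options `-l`/`-t`/`-x`/`-y` (in millimetres) plus `--resolution`, which you should confirm with `scanimage --help -d DEVICE`; the helper name `build_scanimage_command`, the 216 x 297 mm area and the device name `YOUR_DEVICE` are hypothetical examples, not values from this thread.

```python
import shlex
from typing import List


def build_scanimage_command(device: str,
                            width_mm: float,
                            height_mm: float,
                            resolution_dpi: int = 300,
                            left_mm: float = 0.0,
                            top_mm: float = 0.0) -> List[str]:
    """Build a SANE `scanimage` argument list with an explicit scan area.

    Assumes the backend exposes the common geometry options -l/-t/-x/-y
    (millimetres) and --resolution; check `scanimage --help -d DEVICE`.
    """
    if width_mm <= 0 or height_mm <= 0:
        raise ValueError("scan area must be positive")
    return [
        "scanimage",
        "-d", device,             # device first, so backend options are known
        "--format=tiff",
        "--resolution", str(resolution_dpi),
        "-l", f"{left_mm:g}",     # left edge of the scan area
        "-t", f"{top_mm:g}",      # top edge of the scan area
        "-x", f"{width_mm:g}",    # width of the scan area
        "-y", f"{height_mm:g}",   # height of the scan area
    ]


if __name__ == "__main__":
    # Hypothetical device name; list the real ones with `scanimage -L`.
    cmd = build_scanimage_command("YOUR_DEVICE", 216, 297)
    print(shlex.join(cmd))
    # To actually scan (scanimage writes the image to stdout):
    # import subprocess
    # with open("page.tiff", "wb") as out:
    #     subprocess.run(cmd, stdout=out, check=True)
```

Building the argument list separately from running it makes it easy to check the geometry before committing to a slow high-resolution scan.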
Java Jive
2019-10-08 14:43:40 UTC
Post by Java Jive
I discovered how to stitch non-overlapping images in Irfanview, but that
just seemed to glue them together at the edges, I've yet to try images
with overlap to see if they're stitched appropriately.
Irfanview just abuts overlapping images as well.
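The difference between merely abutting two sections and genuinely overlapping them can be sketched in a few lines. The following is a minimal Python illustration, assuming two grayscale scans of the same width, already deskewed and with no horizontal shift, stored as plain lists of pixel rows. It slides the bottom section up over the top one and keeps the overlap height whose shared rows differ least. The function names are hypothetical, and this is a simplified stand-in for what real stitchers such as Image Composite Editor do; they also correct rotation, horizontal drift and projection.

```python
from typing import List, Optional

# A grayscale image as a list of pixel rows, each pixel 0-255.
Image = List[List[int]]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two equally sized strips."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("empty strips")
    return total / count


def find_vertical_overlap(top: Image, bottom: Image,
                          min_overlap: int = 1,
                          max_overlap: Optional[int] = None) -> int:
    """Return how many rows at the start of `bottom` repeat the end of `top`.

    Tries every candidate height h and compares the last h rows of `top`
    with the first h rows of `bottom`, keeping the best-matching h.
    Assumes no rotation and no horizontal shift between the two scans.
    """
    limit = min(len(top), len(bottom))
    if max_overlap is not None:
        limit = min(limit, max_overlap)
    if limit < min_overlap:
        raise ValueError("images too short for the requested overlap range")
    best_h, best_score = min_overlap, float("inf")
    for h in range(min_overlap, limit + 1):
        score = mean_abs_diff(top[-h:], bottom[:h])
        # '<=' prefers the larger overlap when scores tie exactly.
        if score <= best_score:
            best_h, best_score = h, score
    return best_h


def stitch_vertical(top: Image, bottom: Image, min_overlap: int = 1) -> Image:
    """Join two sections, dropping the rows of `bottom` that repeat `top`."""
    widths = {len(row) for row in top + bottom}
    if len(widths) != 1 or 0 in widths:
        raise ValueError("all rows must have the same non-zero width")
    h = find_vertical_overlap(top, bottom, min_overlap)
    return [row[:] for row in top] + [row[:] for row in bottom[h:]]


if __name__ == "__main__":
    # Synthetic 10-row "page" scanned as rows 0-5 and rows 4-9 (2 rows shared).
    page = [[(r * 7 + c * 3) % 256 for c in range(4)] for r in range(10)]
    top, bottom = page[:6], page[4:]
    print(find_vertical_overlap(top, bottom))    # 2
    print(stitch_vertical(top, bottom) == page)  # True
```

On real scans you would first load the images as grayscale pixel rows (for example with Pillow) and set `min_overlap` to a realistic strip height, since a few blank margin rows can otherwise match spuriously.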
Paul
2019-10-08 19:20:58 UTC
Post by Java Jive
Post by Java Jive
I discovered how to stitch non-overlapping images in Irfanview, but
that just seemed to glue them together at the edges, I've yet to try
images with overlap to see if they're stitched appropriately.
Irfanview just abuts overlapping images as well.
For small image sets, Microsoft's Image Composite Editor (below)
seems to handle overlapping images too, except when you apply a
"projection", in which case it will mess with things a bit more.
It also attempts to fill in missing images in a panorama (with
comedic results).

https://www.microsoft.com/en-us/research/product/computational-photography-applications/image-composite-editor/

That's possibly four years old, and the author has
stopped working on it (as near as I can tell).

Paul