The Luftwaffe Archives & Records Reference Group  

  #1  
03-04-2004, 12:39 PM
Richard T Eger
Administrator
 
Join Date: Jun 2000
Location: Seaford, DE, U.S.A.
Posts: 23,700
Microfilm & Hardcopy to Digital & Hardcopy, Scanning, OCR, Digital or Analog Photos - 4

I am continuing a discussion here started on the topic "Microfilm & Hardcopy to Digital & Hardcopy, Scanning, OCR, Digital or Analog Photos - 3".

Dear Rod,

Ah, you have brought up a subject near and dear to my heart - microfilm reader/printers. Unfortunately, I've yet to find a truly good one, especially one likely to be good enough to try with OCR. Frequently, the microfilming of the original left something to be desired. Adding to that, the original itself was frequently a poor copy to start with.

Given these two constraints, my goal in selecting a reader/printer is legibility to the eye. The ideal reader/printer will print out very weak sections of a letter, such as the stem of an "r" in a font like Times New Roman, but without bleeding them. For OCR, some bleed can actually be a benefit. However, you then run the risk of filling in small openings, such as in an "e", so, while you might gain with the "r" in OCR, you will lose with the "e".
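As a rough illustration of that trade-off, here is a minimal Python sketch that models bleed as a one-pixel spread of ink. The two tiny bitmaps are made up for the example (1 = ink, 0 = paper) and are not taken from any real printout, and real bleed is certainly messier than this.

```python
def dilate(bitmap):
    """Spread every ink pixel into its four neighbours (a crude model of bleed)."""
    rows, cols = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]  # copy, so the input is left untouched
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c]:
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out

# Hypothetical broken stem, as in a weak "r": the middle of the stroke is missing.
broken_stem = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Hypothetical closed bowl, as in an "e": one pixel of white inside a ring of ink.
bowl = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

stem_after = dilate(broken_stem)
bowl_after = dilate(bowl)

print("stem gap bridged:", stem_after[2][1] == 1)
print("bowl opening filled:", bowl_after[2][2] == 1)
```

The same spread that bridges the gap in the stem also closes the opening in the bowl, which is the gain on the "r" and the loss on the "e" described above.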

The best I have found to date is the Canon 90. It has many of the features I desire, notably its ability to tease information out of very poor microfilm images. The Canon 80 is slightly poorer, but a possibility if you have nothing else. I have tried a Canon computerized reader/printer with laserjet output and was dissatisfied with the results. It did best with an archive newspaper microfilm, but, even then, it varied in darkness from top to bottom such that, if the letters were well formed at the top, they would be too light and broken up at the bottom. It also digitized the image to strictly black and white, discarding any gray tonality on the microfilm. The impact of this is that a slightly gray background, which could be eliminated in typical photo editing software, is converted into hard black dots, which cannot be eliminated. My guess is that OCR software would not be able to handle such a conversion. To add to the misery, in the sample of a historical document that I tried, the letters reproduced very, very poorly. There is a big difference between working with a pristine image and what one typically runs into with historical documents.

One would think that a computerized microfilm reader/printer with great output would be a slam dunk, but, so far, I haven't found one. When I finished evaluating the computerized Canon system, I left in disgust, saying to myself that this isn't rocket science and there is no excuse for such a disgustingly inadequate design. Thus, a word of caution here: new is not necessarily better. And tying the scanning system to a laserjet rather than an inkjet is, I believe, a huge mistake. There was no gray tonality whatsoever with the laserjet tied to the Canon. Frankly, though, my preference is still a direct drum image.

As regards translating German to English via Babelfish, I agree with your observation that translating one sentence at a time, or even part of a sentence at a time, may sometimes give clearer translations. At the least, it allows you to focus on the sentence in question and reason out how the translated sentence should actually be arranged. As I said earlier, one needs to see how the original German sentence was arranged, as this can be far more fluent than what Babelfish decides to do with the grammar in its translation.

Regards,
Richard
  #2  
03-04-2004, 06:12 PM
Rod Mackenzie
Junior Member
 
Join Date: Jan 2004
Location: Invercargill, New Zealand
Posts: 15
Microfilm/Fiche Readers

Hi Richard,

I must say that I was disappointed in some ways with the Canon M305 that I used yesterday.

I am primarily concerned with obtaining a good digital copy, just as much as a printed hardcopy.

In my view, a scanner needs to be able to convert an A4-sized document on 16mm microfilm or fiche to a full A4 image at 300 dpi. The image also needs to be sharp and have good contrast.
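To put rough numbers on that target, here is a small Python sketch of the arithmetic. The 24x reduction ratio and the 4000 dpi scanner figure are assumptions chosen purely for illustration, not values taken from any particular film or scanner, so substitute the figures for your own material.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # ISO A4 width and height in millimetres

def a4_pixels(dpi):
    """Pixel dimensions of a full A4 page at the given resolution."""
    return tuple(round(side / MM_PER_INCH * dpi) for side in A4_MM)

def required_film_dpi(target_dpi, reduction):
    """Resolution needed at the film so the enlarged page reaches target_dpi."""
    return target_dpi * reduction

def effective_dpi(scanner_dpi, reduction):
    """Resolution at original page size delivered by a given film scanner."""
    return scanner_dpi / reduction

REDUCTION = 24      # hypothetical reduction ratio of the microfilm
SCANNER_DPI = 4000  # hypothetical optical resolution of a film scanner

print(a4_pixels(300))                                # (2480, 3508)
print(required_film_dpi(300, REDUCTION))             # 7200
print(round(effective_dpi(SCANNER_DPI, REDUCTION)))  # 167
```

Under these assumed figures, a full A4 page at 300 dpi is about 2480 x 3508 pixels, and the scanner would have to resolve 7200 dpi at the film plane to deliver it.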

The best result I have obtained so far on 16mm was a sample scanned on a CanoScan 4000US - not a practical solution, because the film needs to be cut. I obtained an image slightly smaller than A4 at 200 dpi, with good contrast and reasonable sharpness.

By the way, I tried scanning the microfilm reader output and putting it through OCR. It was horrendous, so that isn't a viable option.

I am going to try digitising using a professional service. I will be interested to see the results and whether they meet the target...


Cheers

Rod
  #3  
03-04-2004, 11:29 PM
Richard T Eger
Administrator
 
Join Date: Jun 2000
Location: Seaford, DE, U.S.A.
Posts: 23,700

Dear Rod,

I'm not surprised that you had OCR problems with the Canon M305.

Early on, when I started this general topic, I had some hopes that OCR software really could take broken characters and determine what the intended letters were. Broken characters are what I run into with historical documents. But my initial look-see wasn't promising, and I fell back on transcribing, obviously a very slow and laborious procedure and one requiring careful proofreading. If there is a "magical", "omniscient" OCR software out there, I'd love to hear of it. My thinking is that there is a long distance between the hype and reality.

Having personally retrenched on this, but still holding out hope, the key thing in my mind is legibility: being able to see or infer as much of each character as possible. This desire is actually in conflict with the needs of OCR. With OCR, as you say, you really want black on white, with no tonality. But, for greatest legibility to the naked eye, you want just the opposite: as much tonality as possible. This allows for the presentation of even the faintest of details, essentially resulting in darker gray against lighter gray, the difference in shade potentially being quite subtle. Take the same document and put it through text mode, giving strictly black on white, and much of the character information disappears. Thus, for many historical documents, the needs of readability and of OCR compatibility are in distinct conflict. What is needed is OCR software that acts more like the human eye, integrating all that it sees and inferring what is missing.
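A minimal Python sketch of why text mode loses information follows. The gray values are invented for the example (0 = black, 255 = white), and a single fixed cutoff is only an assumption about how a text mode might work, not a description of any particular reader/printer or OCR package.

```python
def binarize(scanline, threshold):
    """Return 1 (black) where a pixel is darker than the threshold, else 0 (white)."""
    return [1 if value < threshold else 0 for value in scanline]

# Hypothetical scanline across a line of text on a light gray background:
#   index 2 = strong stroke, index 4 = faint stroke, index 6 = background speckle
scanline = [200, 198, 60, 202, 170, 201, 176, 199]

strict = binarize(scanline, 128)   # cautious cutoff
lenient = binarize(scanline, 180)  # cutoff raised to catch the faint stroke

print(strict)   # [0, 0, 1, 0, 0, 0, 0, 0]
print(lenient)  # [0, 0, 1, 0, 1, 0, 1, 0]
```

With the cautious cutoff the faint stroke vanishes entirely. With the cutoff raised far enough to keep it, the background speckle turns into a hard black dot that looks just like ink. In the gray original the eye can still tell the three apart.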

Regards,
Richard