Using ChatGPT for OCR

Having given up on the inaccurate OCR of old, I can spend a disproportionate amount of my day on time-consuming (but accurate) manual transcriptions of newspaper reports and adverts.

As an AI newbie I had never tried this before, so I thought that I would see what AI made of the following advertisement from the Lancaster Gazette (which is in the public domain).

Using a simple “Convert image to text” request produced:

Here is a clear, complete transcription of the image:

  1. B. B. CULLINGWORTH,
    ARCHITECTURAL, MECHANICAL, LANDSCAPE,
    AND PORTRAIT PHOTOGRAPHER.

In returning thanks for the liberal patronage he has received since he came to Lancaster, begs to inform his friends and the public generally, that his new Portrait Rooms are now complete, with a fresh arrangement of light, made from a drawing of one of the first photographic houses in London, and he is now able to produce Portraits of every description, superior to any ever offered to the public in Lancaster, at the under-mentioned prices:—

                               1 doz.   ½ doz.

Carte de Visite Portraits in two different positions, with either plain or fancy background ............... 12s. 6s.
Carte de Visite, in one position .................................................. 10s. 5s.
Views of Monuments or Tombstones, with inscription legible, suitable for Albums .... 15s. 10s.

Whole Plate Portraits, either in sitting or standing position, coloured in oil by one of the first photographic colourists ...................................................... Each

  • Whole Plate, coloured ........................................ £1 15s. 0d.
  • Half Plate, do. ................................................ £1 10s. 0d.
  • Whole Plate, uncoloured ...................................... £0 17s. 6d.
  • Half Plate, do. ................................................ £0 10s. 0d.

Copies to be had at a great reduction.

View of any building, 10s., or half-dozen ......................................... £2 0s. 0d.
Extra copies ........................................................................ 3s. 6d.

Any kind of Machinery, Carriages, Horses, Dogs, &c., taken on reasonable terms.

For Views taken at a distance exceeding two miles from Lancaster, not less than half a dozen printed.

PORTRAIT ROOMS.
OPPOSITE ST. JOHN’S CHURCH, LANCASTER.

  1. B. C. having several orders on hand, would be glad if parties wishful for views would give timely notice.

 

The transcription was impressive, even if the random emboldening was puzzling!

Refining the request to “Convert image to text formatted for printing” produced:

 

Here is the advert cleanly typeset for printing (A4 portrait, single column, tidy spacing while preserving the original wording and prices):

  1. B. B. CULLINGWORTH

ARCHITECTURAL, MECHANICAL, LANDSCAPE,
AND PORTRAIT PHOTOGRAPHER

In returning thanks for the liberal patronage he has received since he came to Lancaster, begs to inform his friends and the public generally, that his new Portrait Rooms are now complete, with a fresh arrangement of light, made from a drawing of one of the first photographic houses in London, and he is now able to produce Portraits of every description, superior to any ever offered to the public in Lancaster, at the under-mentioned prices:

PRICE LIST

Carte de Visite Portraits

Description                                                      1 Dozen    ½ Dozen
Two different positions, with either plain or fancy background   12s.       6s.
One position                                                     10s.       5s.

Views

Description                                                                       1 Dozen    ½ Dozen
Views of Monuments or Tombstones, with inscription legible, suitable for Albums   15s.       10s.

Large Portraits

Description               Price
Whole Plate Portraits, either in sitting or standing position, coloured in oil by one of the first photographic colourists
Whole Plate, coloured     £1 15s. 0d.
Half Plate, coloured      £1 10s. 0d.
Whole Plate, uncoloured   £0 17s. 6d.
Half Plate, uncoloured    £0 10s. 0d.

Copies to be had at a great reduction.

Building Views

Description                                 Price
View of any building, 10s., or half-dozen   £2 0s. 0d.
Extra copies                                3s. 6d.

Any kind of Machinery, Carriages, Horses, Dogs, &c., taken on reasonable terms.

For views taken at a distance exceeding two miles from Lancaster, not less than half a dozen printed.

PORTRAIT ROOMS

OPPOSITE ST. JOHN’S CHURCH, LANCASTER

  1. B. C. having several orders on hand, would be glad if parties wishful for views would give timely notice.

 

It looked much better onscreen, where the price list was in two columns, than it did when copied and pasted into a Word document or into this blog.

Rewording the request repeatedly produced responses with differing degrees of success. Asking it to retain capitals and/or columns produced results inferior to those shown above.

I also found that asking it to produce a PDF or Word document gave inferior results, more akin to the OCR of old: whilst the transcription was accurate, the text was either condensed into one paragraph or curtailed.

Searching BPH for “ChatGPT” brought up only a couple of hits. One was from the Hong Kong History Research Centre ( https://britishphotohistory.ning.com/profiles/blogs/survey-historical-photographs-of-hong-kong ), which contained Chinese text, so I asked ChatGPT to translate it into English. It produced what seems to me a plausible translation, given the content of their blog.

As an AI novice I would be interested in hearing about other members' experiences, and perhaps in promoting the sharing of successful scripts and tips.

I would be particularly interested in hearing from anyone who has experience of machine transcription and/or translation of handwritten script, as I can think of a number of items in my collection where that would be of benefit.

Thank you.

Comments

  • The following script produces two impressively accurate transcriptions of an image. In ChatGPT terms, the 'diplomatic' version retains the original look, should you wish to recreate a transcribed version of the original, whilst in the 'normal reading' version the text flows continuously.

    "Convert image to text, exact line-for-line diplomatic transcription, retaining original capitalization, punctuation, and hyphenation exactly as printed, with line breaks matching the image. Provide a normalized reading text parallel to the diplomatic version, retaining original capitalization, punctuation, and hyphenation."

    Hope this helps.
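
As a rough illustration of how the two versions relate, here is a minimal Python sketch that flows a line-for-line (diplomatic) transcription into continuous reading text. The function name `to_reading_text` and the `join_hyphens` option are made up for this example, and the line breaks in the sample are invented rather than taken from the original advert; it is not a description of what ChatGPT does internally.

```python
import re


def to_reading_text(diplomatic: str, join_hyphens: bool = False) -> str:
    """Flow a line-for-line transcription into continuous paragraphs.

    Capitalization and punctuation are left untouched. By default
    end-of-line hyphens are kept as printed; join_hyphens=True is an
    optional extra (an assumption, not part of the prompt above) that
    rejoins words split across a line break.
    """
    # Blank lines separate paragraphs in the diplomatic version.
    paragraphs = re.split(r"\n\s*\n", diplomatic.strip())
    flowed = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if not text:
                text = line
            elif join_hyphens and text.endswith("-"):
                # Drop the line-end hyphen and join the two halves.
                text = text[:-1] + line
            else:
                text = text + " " + line
        flowed.append(text)
    return "\n\n".join(flowed)


# Illustrative input only: the wording is from the advert, the line breaks are not.
sample = (
    "ARCHITECTURAL, MECHANICAL, LANDSCAPE,\n"
    "AND PORTRAIT PHOTOGRAPHER.\n"
    "\n"
    "Any kind of Machinery, Carriages, Horses,\n"
    "Dogs, &c., taken on reasonable terms.\n"
)
print(to_reading_text(sample))
```

Keeping hyphens by default mirrors the wording of the prompt, which asks for hyphenation to be retained in both versions.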

  • As posted earlier, this blog encouraged me to test ChatGPT by having it transcribe a RAMC exposure card. Well, that went somewhat beyond plain transcription, as the card turned out to contain a lot of interesting and complicated knowledge behind the simple illustration. I will cover more in my separate “help request!” post.

    However, at the end I did ask ChatGPT to create a report of the study (my own remarks at the end):

    “Research report: investigating a RAMC exposure card & No. 3 Cartridge Kodak field kit

    This post summarises the research work carried out so far on a handwritten Royal Army Medical Corps exposure card associated with a No. 3 Cartridge Kodak (c.1900–1905), together with its surviving dry-plate field equipment.
    It is written for fellow photo-history enthusiasts who may be curious about how such material can be investigated outside formal institutional settings.
    ---
    1. What has been investigated

    The work has focused on understanding the function, authorship, and context of the exposure card rather than merely identifying the camera.

    Primary areas of investigation:

    * Physical analysis of the exposure card (layout, handwriting, inks, stamps).
    * Logical analysis of exposure tables, including:
        * lighting conditions
        * subject distance
        * aperture values
        * time units (seconds, minutes, hours)
    * Comparison with:
        * Kodak exposure tables of the period
        * known dry-plate practice
        * military and medical photographic constraints
    * Cross-reading the card with:
        * the camera’s shutter characteristics (T / B / I; spring-driven speeds)
        * aperture scale inconsistencies
        * black-slide pencil markings
    * Assessing whether values represent:
        * theoretical guidance
        * personal notes
        * or pre-computed allowances for field use
    ---
    2. Difficulties and false starts

    Several challenges and early assumptions had to be corrected:

    a) Misreading Roman numerals

    * Initial interpretations treated Roman numerals as shutter multipliers.
    * This proved incorrect once the interior exposure values (e.g. *Conservatory*) were analysed.
    * Logical progression from seconds → minutes → hours clarified that Roman numerals denote minutes, not shutter settings.

    b) Distance vs exposure “paradox”

    * Early concern that longer exposures for closer subjects contradicted inverse-square expectations.
    * Resolution came from recognising:
        * bellows extension
        * slow emulsions
        * very small apertures
        * non-instantaneous shutters
    * This reframed the table as practical allowance-based guidance, not theoretical optics.

    c) The “fast / slow” confusion

    * A word initially thought to be “fast” did not match handwriting elsewhere.
    * Closer comparison showed the letterforms aligned better with “slow”.
    * This shifted interpretation towards plate speed, not lighting speed.

    d) Over-reliance on Kodak literature

    * Kodak manuals are helpful, but they rarely reflect institution-specific working practice.
    * Assuming a direct Kodak origin for the table initially limited interpretation.
    ---
    3. What cannot (yet) be determined

    Despite extensive analysis, several questions remain open:

    * Whether “Castle Plate” refers to:
        * a specific commercial emulsion,
        * a contracted supplier,
        * or a RAMC colloquial term for issued plates.
    * Whether this card was:
        * unique to one photographer,
        * copied across a unit,
        * or derived from an official but undocumented RAMC instruction.
    * The identity of the individual(s) who compiled the table.
    * Exact dates of compilation versus later annotation (e.g. “Normal”).

    These limits are important to state: absence of evidence must not be replaced with speculation.
    ---
    4. Best findings so far

    Several strong conclusions can now be stated with confidence:

    * The exposure card is a working field tool, not a teaching aid.
    * It encodes pre-computed exposure allowances, not on-the-spot calculations.
    * It reflects a systematic understanding of camera, plate, distance and light under service conditions.
    * The RAMC stamp anchors the card to official medical-military use, not private hobby photography.
    * The card documents operational photographic knowledge, which is rarely preserved.

    In museum terms, this elevates the object from “interesting accessory” to “primary evidence of photographic practice”.
    ---
    5. Time investment

    Approximate time spent so far:

    * Physical examination & transcription: ~10–15 hours
    * Logical modelling of exposure tables: ~15–20 hours
    * Comparative research & discussion: ~20+ hours

    This kind of work is slow because each assumption must be tested against multiple constraints.
    ---
    6. How this kind of research can be done more efficiently

    Lessons learned that may help others:

    1. Model the logic first, not the terminology
    → If the numbers don’t behave sensibly, the interpretation is wrong.

    2. Separate units early
    → Seconds, minutes, hours must be identified before meaning emerges.

    3. Treat handwriting as secondary evidence
    → Let structure and arithmetic lead, then confirm with letterforms.

    4. Avoid forcing modern concepts
    → Early photographers worked with heuristics and allowances, not formulas.

    5. Expect institutional idiosyncrasy
    → Military and medical contexts often diverge from commercial manuals.
    ---
    7. Why the results matter

    For historians and collectors:

    * This shows how field photography actually functioned at the turn of the century.

    For museums:

    * It demonstrates the value of preserving complete working systems, not isolated objects.

    For enthusiasts:

    * It highlights how much knowledge still survives outside formal archives, often encoded in modest, easily overlooked objects.

    Above all, this work shows that collaborative interpretation — combining logic, experience and comparison — is often the only way to recover lost photographic practice.
    ---
    If others have encountered similar exposure cards, Castle plates, or RAMC photographic material, even tangentially, further input would be extremely welcome.”

     

    *****

     

    My remarks:

     

    For me this was practically the first time I used AI explicitly for extensive help.

     

    Input data

    * The main input data were two photos of the exposure card, one of each side

    * Photographs of the black slides with their annotations

    * All the other parts of the No. 3 Cartridge Kodak equipment were described by typing

    * I do have some photographic and technical background

    * I checked and used information about the RAMC and the Chapman Jones plate tester during this study

    * The three I (instantaneous) shutter speeds of the camera

     

    1. What has been investigated

    * Initially the contents of the card, for a transcription, but it went further.

     

    2. Difficulties and false starts

    Here I met frustration more or less throughout the process. Even though ChatGPT does argue its claims, I had my doubts:

    * Did it really take into account the issues that I personally thought it should have? Several times, it turned out it did not.

    * ChatGPT is very context-driven and can be led to wrong conclusions; on some topics it was difficult to get it to agree. A wrong conclusion will affect the subsequent analysis and resolutions.

    * ChatGPT is not very transparent about the data or the data sources it uses.

    * Its current ability to read poor-quality text is not encouraging, at least to me. After trying a few times to get an analysis of the pencil annotations on the black slides, I just gave up. I personally was able to read something from the annotations, but not what ChatGPT found. With the card annotations things were much better, but there too I feel I did the physical reading better.

    * This was a lengthy project. With that in mind, ChatGPT seemed to drop the ball a few times. It added some related but external items to the results, and obviously to the analysis also. It also appeared not to reference the twelve black slides correctly.

     

    3. What cannot (yet) be determined

    I agree with the list, and would add that the black slide annotations appear to hold information that could lead to locations, people and perhaps process-related information.

    4. Best findings so far

    I would add:

    * There is a lot of photographic and medical documentation knowledge in this card, which has likely required camera-specific testing.

    * This card has a timeline/lifecycle.

     

    5. Time investment

    I did not record time.

     

    6. How this kind of research can be done more efficiently

    * ChatGPT needs to be driven, even though it does try to run on autopilot. It proposes next steps that are typically good options, but not necessarily the best ones to proceed with. So have a plan, or otherwise make sure it is you running the show.

    * Doubt the results and the analysis: there can be, and very likely are, mistakes.

    * Test what the tool's limitations are, but once you hit them it is probably not worth hitting them again.

     

    7. Why the results matter

    I agree.

     

    For me this has been a true learning experience, both in the subject matter and using AI.

    As a Finnish speaker I was very happy to be able to use my native language with the tool.

    The subject and the context were fruitful, as there is written data available from that time. AI did a big part of the hard work in providing this data in an easy-to-use format.

  • I've been having some further plays with this today, and the following script works well with straightforward adverts (but it failed the tougher Cullingworth test as far as the alignment of the price columns was concerned): "Convert image to text, exact line-for-line diplomatic transcription, retaining original capitalization, punctuation, and hyphenation exactly as printed, with line breaks matching the image."

  • Based on this information I went on to search for information on the RAMC exposure card, which I posted as its own topic. As a result, I got more than plenty; I still need to analyze what I got and will add that information to the specific topic.

    The text of the card is handwritten and over 100 years old, but mostly of good quality. In this case ChatGPT was well able to read the letters and create the transcript. It claimed to have found three or four different writers, based on the handwriting and the colour of the inks. ChatGPT seems to be very context-driven, for both good and bad. The good is that, if it does have background information, it makes a thorough comparison and judgment of your document. The bad is that, if the context is off, then the results are off as well, and there is a risk of being misled by the tool.

    If your task is, to you, relatively simple and well understood but laborious to do, I think the tool will cut the labour away with good results. Just feed it a good-quality image and information on the context.

    My initial mileage differed, but I will say more about that in my original topic.

    • I too am wary of AI misattributions, and generally take its suggestions with a pinch of salt until verified further. However, I hope that it will satisfy my transcription / translation needs going forwards.

  • For my negative captions I have this as a post-extraction rule (prompt) in Datasheep, which works well for our purpose: “Remove line breaks and correct spelling and grammatical errors in British English. Keep the first sentence in UPPERCASE and output the remainder in sentence case.” For one-off OCRs, Microsoft Copilot also works well. It is important to note that the free versions of ChatGPT, and some T+Cs, mean that the AIs are very likely to be training on the input data. You could also be infringing copyright if the original text you are using (i.e. have copied) is not in the public domain or you do not own the rights. JB

    • Thanks for this update.
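
For anyone curious what the mechanical half of the post-extraction rule above amounts to, here is a minimal Python sketch. It only removes line breaks and applies the casing; the spelling and grammar correction is the part that needs the AI and is not attempted. The function name `apply_caption_rule` and the sample caption are invented for illustration, and this is not Datasheep's own code.

```python
import re


def apply_caption_rule(ocr_text: str) -> str:
    """Remove line breaks, put the first sentence in UPPERCASE and the
    remainder in sentence case. No spelling correction is attempted."""
    # Remove line breaks and collapse runs of whitespace.
    flat = " ".join(ocr_text.split())
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    if not sentences or not sentences[0]:
        return ""
    first = sentences[0].upper()
    # Sentence case: capital first letter, everything else lower case.
    rest = [s[0].upper() + s[1:].lower() for s in sentences[1:] if s]
    return " ".join([first] + rest)


# Made-up caption text, standing in for raw OCR output.
raw = "Crowds gather at the\nseafront.  THE MAYOR OPENS\nthe new pier. a band plays."
print(apply_caption_rule(raw))
# prints: CROWDS GATHER AT THE SEAFRONT. The mayor opens the new pier. A band plays.
```

Note that a plain sentence-case rule like this also lower-cases proper nouns and trips over abbreviations such as "St.", which is one reason to leave that step to a prompt rather than to a fixed rule.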

  • I use ChatGPT for both transcriptions and translations. I always ask for a verbatim transcription, and there are still occasional errors. You can direct it to 'transcribe verbatim, maintaining spacing and line breaks'. This will give a better approximation of the text. The translations are quite good, better than Google Translate or other online versions.

    • Thanks for the advice Roger, I'll give it a try.
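
Since occasional errors slip through even with a verbatim request, one way to spot them is to compare the AI transcription word by word against a passage you have checked by hand. The sketch below uses Python's standard `difflib` module; the function name `word_differences` is made up for this example, and the misread price in the sample is invented to show the idea.

```python
import difflib


def word_differences(reference: str, candidate: str) -> list[tuple[str, str]]:
    """Return (reference words, candidate words) for every place where
    the two transcriptions disagree."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the differing stretch from each side.
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return diffs


checked_by_hand = "Extra copies 3s. 6d."
from_ai = "Extra copies 8s. 6d."  # invented misreading, for illustration only
print(word_differences(checked_by_hand, from_ai))
# prints: [('3s.', '8s.')]
```

An empty list means the two texts agree word for word; it says nothing about errors that both versions share.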

  • I helped develop Datasheep.com, based on a specific OCR issue we had at Topfoto.co.uk. We have many thousands of negatives from press photo archives, all of which have corresponding caption sheets. Apart from the basic OCR, it uses AI (a ChatGPT API) rules (prompts) to extract data from the resulting OCR text into fields. This is then exported to a CSV or XLS file that can be further cleaned up and then bulk imported into the metadata fields of the corresponding images. I gave a demo of it for the ICA when it was in beta, but it has moved on quite a bit since then and is ready to use live. The ICA demo is number 5 on this https://www.ica.org/resource/ai-and-archival-practice-on-line-tutor... or try it out from the Datasheep URL above. We have found it invaluable for our specific needs. Best, John
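
The overall shape of the workflow described above (OCR text in, named fields out, CSV for bulk import) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple regular expression stands in for the AI extraction rules, and the caption layout, the field names `negative`, `date` and `caption`, and the sample line are all hypothetical rather than Datasheep's actual format.

```python
import csv
import io
import re

# Hypothetical caption layout: "<negative number> - <date> - <caption>".
CAPTION_PATTERN = re.compile(
    r"^(?P<negative>\w+)\s*-\s*(?P<date>[\d/]+)\s*-\s*(?P<caption>.+)$"
)


def captions_to_csv(ocr_lines) -> str:
    """Extract fields from OCR'd caption lines and return them as CSV text.

    Lines that do not fit the assumed layout are skipped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["negative", "date", "caption"], lineterminator="\n"
    )
    writer.writeheader()
    for line in ocr_lines:
        match = CAPTION_PATTERN.match(line.strip())
        if match:
            writer.writerow(match.groupdict())
    return buffer.getvalue()


# Made-up sample lines standing in for OCR output.
lines = ["N1234 - 12/05/1953 - Crowds at the seafront", "illegible smudge"]
print(captions_to_csv(lines))
```

The resulting text can be saved with a `.csv` extension and opened in a spreadsheet for the clean-up step before import.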
