BetaNews.Com

gImageReader extracts text from images, PDFs, more

Published: July 12, 2015, 1:06pm CEST by Mike Williams

Extracting text from a PDF can be very easy. Just select a section and copy it to the clipboard, or maybe -- in Adobe Reader -- click File > Save As Other > Text to save the entire document.

This all works just fine until you come across a PDF that consists entirely of images. That’s when you need something a little more powerful.

gImageReader is an open source front end for the Tesseract OCR engine. It can extract text from PDFs and image files, or from pages acquired directly from your scanner. If that's not enough, it also accepts images from the clipboard or from a screenshot.
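
For readers curious about what a front end like this wraps, here is a minimal Python sketch that drives the Tesseract command-line tool directly. It is not how gImageReader itself talks to the engine (the article doesn't say); the function names are hypothetical, and it assumes the `tesseract` executable is installed and on your PATH.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a basic Tesseract CLI run (illustrative helper)."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognized text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def recognize(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumes the tesseract binary and the requested language data are installed;
    raises CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `recognize("scan.png")` on a hypothetical image file would return whatever text the engine finds, with accuracy depending heavily on the quality of the source.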

A one-click "Autodetect layout" option attempts to detect all the text regions within the source. The reliability of this can be anything from "amazing" to "useless", depending on the image, but you can delete or reorder the regions as necessary. You can also select a block manually by clicking and dragging with the mouse.

If the task is a simple one -- just a paragraph or two of high quality text -- you could just right-click a region and select "Recognize to clipboard". gImageReader grabs whatever text it can from the image and copies it to the clipboard, ready for immediate reuse elsewhere.

Longer blocks can be sent to an "Output" pane for cleaning up. There’s nothing too advanced -- search and replace, stripping line breaks, a chance for manual editing -- but it might be helpful, and when you’re done the results can be saved as a TXT file.
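
The kind of cleanup described here is easy to picture in code. The following is a rough Python sketch of stripping line breaks and doing a simple search and replace on OCR output; it only illustrates the idea and is not gImageReader's own implementation, and both function names are made up for this example.

```python
import re


def strip_line_breaks(text):
    """Join hard-wrapped lines inside each paragraph, keeping blank-line paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words split by a hyphen at a line end ("recog-\nnition").
        # Caveat: this also merges genuinely hyphenated words that happen
        # to break across lines, so the result still needs a manual check.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse remaining line breaks and runs of whitespace into single spaces.
        para = re.sub(r"\s+", " ", para).strip()
        cleaned.append(para)
    return "\n\n".join(cleaned)


def search_and_replace(text, replacements):
    """Apply plain-text substitutions, e.g. to fix recurring OCR misreads."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text
```

A typical pass would run `strip_line_breaks` first, then `search_and_replace` with a small dictionary of known misreads, before saving the result as a TXT file.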

gImageReader’s interface is a little awkward in places, but once you've figured it out, it’s easy enough to use, and the Tesseract engine can be very accurate. The program is available now for Windows XP+ and Linux.
