#PDF OCR
queerliblib · 7 months ago
Note
So the question about books not available on Libby raised another question in my mind. If one of us following you on social media has one of those books that’s unavailable on Libby could we scan it and submit it to you as a PDF somehow so others could access it? I don’t have the several hundred dollar book that was mentioned, and I know this could be dipping my toes into copyright law territory, but it could be beneficial to try and crowd source some of our history, Zine style
ah. okay, love the crowdsource-y punk vibes. however we are NOT in a position to play fast and loose with copyright laws. we can’t even take pdf’s directly from the authors! we have formal non-profit status* and for us, it’s really important that we maintain access nationwide to as many folks as possible, for as many books as we can (and we’re still buying more as fast as our budgets allow - we’re not close to being done yet!)
we’ve got lots of plans to keep growing and expanding our catalogue, but what you’re suggesting is not one of the feasible options for us.
in the meantime, some other great options are to keep requesting queer books from your local public libraries, to use InterLibrary Loan if you (or a friend) have access to a university system, and to explore some (legal) Open Access or Public Domain projects that are out there (queer zine archive project, directory of open access books, project gutenberg, etc.)
85 notes · View notes
finnlongman · 9 months ago
Note
i hope this doesn't sound patronising if you've already tried that route, but in case you haven't: if this is a text that has been given to you as is and been produced by a general OCR, it might be worth looking into whether ppl have already trained language-specific OCR models for your field of study and using that to transcribe the scanned text again. there's a lot of transcription solutions/software and different fields prefer different ones, and idk the standard for Celtic studies personally, but a site i use often (transkribus) has 2 Irish models whose related projects you might be able to use as a starting point for research at least. best of luck to you either way!
So there are several factors at work with the OCR problems with this text specifically.
The PDF of the text is from Archive. The library copy it was scanned from has various pencil markings and annotations that are interfering with the printed text -- it's not a clean scan. It's also not super high definition, so letters like "h" sometimes get misread as "li", even though they're totally readable to human eyes.
The edition uses frequent italics and brackets to show where abbreviations in the manuscript have been expanded. Individual italicised letters confuse the OCR, as do random square brackets in the middle of words.
It also has a lot of superscript numbers corresponding to manuscript variants in the footnotes. Sometimes these are in the middle of a word. This also confuses most OCR systems, even if they can tell that the footnotes are separate from the main text.
The language of the text is late Middle / Early Modern Irish, from two different manuscripts that have their own unique spelling quirks (for example, one of them loves to spell Cú Chulainn's name "Cú Cholain", which is a vibe).
In order to run the text through a more sophisticated OCR system that was equipped to cope with a) annotations, b) weird formatting and punctuation, c) incredibly frequent footnotes (variants), and d) non-standardised spelling (which throws off many language models), I would probably still need to have a reliable, clear, and high-definition scan of the text. Which would require re-digitising it from scratch.
So, the quickest and easiest way to get a version of the text that I personally can use is to sit here and type up 20,000 words into a document. This is 2-4 days' work, depending on how focused I am, and gives me the chance to go through the text in detail and spot things I might miss otherwise, so it's probably a whole lot less effort for more benefit than trying to adapt an entire language model that could read this terrible PDF. Especially as I have no experience of using these programmes so would have a steep learning curve.
Now, somebody absolutely should do that, so we could get proper searchable editions of more things. But honestly, if using transcription tools for medieval/early modern Irish I think there are higher priorities than things already available in printed form, so I doubt it's at the top of anyone's to-do list!
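For anyone who does want to experiment, the bracket and footnote-marker problems described above can be partly handled after OCR rather than during it. The following is only a minimal Python sketch, assuming the OCR output is already plain text; the function name `clean_ocr_line` and the sample words are made up for illustration, and it does nothing about misread letters such as "h" becoming "li". Note that it throws away the editorial information the brackets carry, so it suits a reading or search copy, not an edition.

```python
import re

# ASCII digits plus Unicode superscript digits, as they might appear
# when footnote markers are glued onto words in OCR output.
FOOTNOTE_DIGITS = "0-9\u00b9\u00b2\u00b3\u2070\u2074-\u2079"

# Square brackets around a whitespace-free run, e.g. "r[o]ine".
# Assumption: such brackets only mark expanded abbreviations.
_BRACKETED = re.compile(r"\[([^\[\]\s]*)\]")

# A run of digits that directly follows a letter, e.g. "Cho1lain".
# Ordinary numbers such as "20,000" are not preceded by a letter.
_MARKER = re.compile(rf"(?<=[^\W_{FOOTNOTE_DIGITS}])[{FOOTNOTE_DIGITS}]+")


def clean_ocr_line(line: str) -> str:
    """Drop editorial brackets and word-attached footnote markers."""
    without_brackets = _BRACKETED.sub(r"\1", line)
    return _MARKER.sub("", without_brackets)


if __name__ == "__main__":
    print(clean_ocr_line("do-r[o]ine2 Cú Cho1lain"))
```

A cleaned copy like this would sit alongside the original OCR text, not replace it, since the variant numbers are still needed to match readings to the footnotes.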
16 notes · View notes
snarp · 7 months ago
Text
Kentucky court system needs to fire its web developers.
8 notes · View notes
istherewifiinhell · 2 months ago
Text
not only does the (points above my head) blog descriptor always stand. but im also picky about how much dead space people are including in their screenshots of text. if only there was some way we could condense this into just the relevant information that was easily and efficiently displayed on all devices.......
2 notes · View notes
joyrom · 16 days ago
Text
Google Chrome: OCR for reading scanned PDFs
The new version of Google Chrome will include an #OCR feature that lets you open a PDF produced by scanning (so, in practice, an image) and use copy and paste as if the text had actually been typed.
0 notes
mohemakjasa · 2 months ago
Text
Master editing text on images without software
Editing the text on images without software is one of the most searched-for topics right now, especially given the major advances in AI tools, which now make it possible to remove or edit text on images easily and accurately, without downloading heavy or complicated programs. So whether you work in design or marketing, or are simply a social media user, you will certainly need to edit the text in an image at some point without losing its quality or meaning.
0 notes
algodocs · 5 months ago
Text
🖼️📃 🔍🔀 PDFs come in various forms: standard editable PDFs and scanned image PDFs. Standard PDFs facilitate easy data editing and copy-pasting, whereas scanned image PDFs are not directly editable. But what if you need to extract data from scanned image PDFs? Unlike standard PDFs, you can’t simply copy and paste the information. So, how can you efficiently extract data from scanned image PDFs? In this post, we’ll explore: ✅What is PDF image extraction? ✅The challenges of PDF image extraction ✅The best tools available to streamline the process ✅How AlgoDocs AI simplifies and automates data extraction from scanned PDFs Read our comprehensive guide to learn more ⬇️ https://www.algodocs.com/pdf-image-extraction-comprehensive-guide-2025/
0 notes
lets-steal-an-archive · 8 months ago
Text
https://www.tvwriting.co.uk/tv_scripts/Collections/Drama/Supernatural/Supernatural_15x18_-_Despair.pdf
NOW you are prepared.
[Image ID: A screenshot of three folders which have been labeled 'memes if Donald Trump wins', 'memes if Georgia turns blue again' and 'memes if Kamala Harris wins' respectively. /End ID]
I am prepared.
14K notes · View notes
codician · 11 months ago
Video
youtube
Extract text from PDF(OCR/Image) File using Python / Voter data extraction
0 notes
linuxtldr · 1 year ago
Text
1 note · View note
meisnerd · 10 months ago
Text
Putting it out of tags because it's important, but if you ever need to do readings for xyz and you don't do well with text, use a text reader! It's easy to find somewhat good ones for free (if you read in English), like microsoft edge, and even better ones if you can pay.
Do not torture yourself.
In Prince's funky name, amen.
63K notes · View notes
bearbench · 1 year ago
Text
0 notes
Text
OCR technology has revolutionized data collection processes, providing many benefits to various industries. By harnessing the power of OCR with AI, businesses can unlock valuable insights from unstructured data, increase operational efficiency, and gain a competitive edge in today's digital landscape. At Globose Technology Solutions, we are committed to leading innovative solutions that empower businesses to thrive in the age of AI.
0 notes
onlineocrtool · 1 year ago
Text
0 notes
starvista · 1 year ago
Text
Best Free Online OCR Tool
The StarVista Online OCR tool converts scanned PDF files that have not yet been OCR'ed into Section 508 compliant PDFs.
0 notes
mastersindia · 2 years ago
Text
Invoice OCR
The main job of Invoice OCR is to convert the data in an invoice PDF or image into a machine-readable format. Masters India Invoice OCR is a pre-trained AI & ML model with no template dependency. It is 100% AI. Drag and drop or upload an invoice, receipt, purchase order, e-invoice QR code, or bill of entry as a PDF or image (such as PNG) and see the OCR in action.
0 notes