#optical-character-recognition
Text
AI Tool Reproduces Ancient Cuneiform Characters with High Accuracy

ProtoSnap, developed by Cornell and Tel Aviv universities, aligns prototype signs to photographed clay tablets to decode thousands of years of Mesopotamian writing.
Cornell University researchers report that scholars can now use artificial intelligence to “identify and copy over cuneiform characters from photos of tablets,” greatly easing the reading of these intricate scripts.
The new method, called ProtoSnap, effectively “snaps” a skeletal template of a cuneiform sign onto the image of a tablet, aligning the prototype to the strokes actually impressed in the clay.
By fitting each character’s prototype to its real-world variation, the system can produce an accurate copy of any sign and even reproduce entire tablets.
Cuneiform, like Egyptian hieroglyphs, is one of the oldest known writing systems and contains over 1,000 unique symbols.
Its characters change shape dramatically across different eras, cultures and even individual scribes, so that “even the same character… looks different across time,” Cornell computer scientist Hadar Averbuch-Elor explains.
This extreme variability has long made automated reading of cuneiform a very challenging problem.
The ProtoSnap technique addresses this by using a generative AI model known as a diffusion model.
It compares each pixel of a photographed tablet character to a reference prototype sign, calculating deep-feature similarities.
Once the correspondences are found, the AI aligns the prototype skeleton to the tablet’s marking and “snaps” it into place so that the template matches the actual strokes.
In effect, the system corrects for differences in writing style or tablet wear by deforming the ideal prototype to fit the real inscription.
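The correspondence step described above can be illustrated with a toy sketch. This is not the researchers' implementation: the feature vectors, keypoint names, and pixel coordinates below are hypothetical stand-ins for the deep features a diffusion model would supply, and the sketch covers only the matching step, not the deformation that follows.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_matches(prototype_feats, image_feats):
    """For each prototype keypoint, find the image location whose
    feature vector is most similar, and keep the similarity score."""
    matches = {}
    for name, p_vec in prototype_feats.items():
        loc, score = max(
            ((loc, cosine(p_vec, i_vec)) for loc, i_vec in image_feats.items()),
            key=lambda pair: pair[1],
        )
        matches[name] = (loc, score)
    return matches

# Hypothetical 3-dimensional features (real deep features are far larger).
prototype_feats = {
    "stroke_head": [1.0, 0.0, 0.0],
    "stroke_tail": [0.0, 1.0, 0.0],
}
image_feats = {
    (2, 3): [0.9, 0.1, 0.0],
    (5, 7): [0.1, 0.8, 0.1],
    (8, 1): [0.0, 0.0, 1.0],
}

matches = best_matches(prototype_feats, image_feats)
for name, (loc, score) in matches.items():
    print(name, "->", loc, round(score, 3))
```

Once each prototype keypoint has a matched location, a later step could fit a deformation that moves the template onto those locations, which is the "snapping" the article describes.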
Crucially, the corrected (or “snapped”) character images can then train other AI tools.
The researchers used these aligned signs to train optical-character-recognition models that turn tablet photos into machine-readable text.
They found the models trained on ProtoSnap data performed much better than previous approaches at recognizing cuneiform signs, especially the rare ones or those with highly varied forms.
In practical terms, this means the AI can read and copy symbols that earlier methods often missed.
This advance could save scholars enormous amounts of time.
Traditionally, experts painstakingly hand-copy each cuneiform sign on a tablet.
The AI method can automate that process, freeing specialists to focus on interpretation.
It also enables large-scale comparisons of handwriting across time and place, something too laborious to do by hand.
As Tel Aviv University archaeologist Yoram Cohen says, the goal is to “increase the ancient sources available to us by tenfold,” allowing big-data analysis of how ancient societies lived – from their religion and economy to their laws and social life.
The research was led by Hadar Averbuch-Elor of Cornell Tech and carried out jointly with colleagues at Tel Aviv University.
Graduate student Rachel Mikulinsky, a co-first author, will present the work – titled “ProtoSnap: Prototype Alignment for Cuneiform Signs” – at the International Conference on Learning Representations (ICLR) in April.
In all, roughly 500,000 cuneiform tablets are stored in museums worldwide, but only a small fraction have ever been translated and published.
By giving AI a way to automatically interpret the vast trove of tablet images, the ProtoSnap method could unlock centuries of untapped knowledge about the ancient world.
#protosnap#artificial intelligence#a.i#cuneiform#Egyptian hieroglyphs#prototype#symbols#writing systems#diffusion model#optical-character-recognition#machine-readable text#Cornell Tech#Tel Aviv University#International Conference on Learning Representations (ICLR)#cuneiform tablets#ancient world#ancient civilizations#technology#science#clay tablet#Mesopotamian writing
5 notes
Note
So the question about books not available on Libby raised another question in my mind. If one of us following you on social media has one of those books that’s unavailable on Libby could we scan it and submit it to you as a PDF somehow so others could access it? I don’t have the several hundred dollar book that was mentioned, and I know this could be dipping my toes into copyright law territory, but it could be beneficial to try and crowd source some of our history, Zine style
ah. okay, love the crowdsource-y punk vibes. however we are NOT in a position to play fast and loose with copyright laws. we can’t even take pdf’s directly from the authors! we have formal non-profit status* and for us, it’s really important that we maintain access nationwide to as many folks as possible, for as many books as we can (and we’re still buying more as fast as our budgets allow - we’re not close to being done yet!)
we’ve got lots of plans to keep growing and expanding our catalogue, but what you’re suggesting is not one of the feasible options for us.
in the meantime, some other great options are to keep requesting queer books from your local public libraries, to use InterLibrary Loan if you (or a friend) have access to a university system, and to explore some (legal) Open Access or Public Domain projects that are out there (queer zine archive project, directory of open access books, project gutenberg, etc.)
#*through our fiscal sponsor NOPI -we link on our website#asks#also slightly unrelated but pdfs can be AWFUL for accessibility#like if they dont have OCR (optical character recognition) built in then a screen reader can’t read it#and we also take accessibility pretty seriously around here
85 notes
Text
A most unusual Christmas to entertain the eyes during the holidays. Please share if you like this.
#optical illusion art#optical illusion#optical art#optical character recognition#double exposure#christmas decorations#christmas tree#christmas gift#christmas ornament
2 notes
Text
My Chemical Romance?
No, comrade! Our Chemical Romance.
so like I said, I work in the tech industry, and it's been kind of fascinating watching whole new taboos develop at work around this genAI stuff. All we do is talk about genAI, everything is genAI now, "we have to win the AI race," blah blah blah, but nobody asks - you can't ask -
What's it for?
What's it for?
Why would anyone want this?
I sit in so many meetings and listen to genuinely very intelligent people talk until steam is rising off their skulls about genAI, and wonder how fast I'd get fired if I asked: do real people actually want this product, or are the only people excited about this technology the shareholders who want to see lines go up?
like you realize this is a bubble, right, guys? because nobody actually needs this? because it's not actually very good? normal people are excited by the novelty of it, and finance bro capitalists are wetting their shorts about it because they want to get rich quick off of the Next Big Thing In Tech, but the novelty will wear off and the bros will move on to something else and we'll just be left with billions and billions of dollars invested in technology that nobody wants.
and I don't say it, because I need my job. And I wonder how many other people sitting at the same table, in the same meeting, are also not saying it, because they need their jobs.
idk man it's just become a really weird environment.
#it's optical character recognition#where computers can read text from images#very useful#not a trillion dollar problem#but people with money want to make more money#I'm guessing that some engineers read#The 48 Laws of Power#and it all went to shit
33K notes
Text
Top 10 Best OCR Models You Need to Know in 2025
In an increasingly digital world, a surprising amount of critical information remains locked away in physical documents, scanned images, or unstructured digital formats. This is where Optical Character Recognition (OCR) technology steps in – converting various types of documents, such as scanned paper documents, PDFs, or images, into editable and searchable data.
But OCR is no longer just about basic text extraction. In 2025, advanced OCR models, powered by sophisticated AI and deep learning, are moving beyond mere character recognition to truly understand document layouts, extract structured data from complex forms, and even decipher diverse handwriting. The right OCR model can be the linchpin for digital transformation, automating workflows, enhancing data accessibility, and unlocking invaluable insights.
Here are the top 10 OCR models and technologies that are making waves and defining the landscape in 2025:
Cloud-Powered & Enterprise Solutions
These offerings typically provide robust, scalable, and often AI-infused solutions with extensive language support and pre-built models for common document types.
Google Cloud Document AI
Strength: More than just OCR, it's a comprehensive document processing platform. It uses specialized parsers trained on specific document types (invoices, receipts, contracts, IDs) to extract structured data, not just raw text. Its underlying OCR is highly accurate, especially for complex layouts and tables.
Why for 2025: Integrated with Google Cloud's broader AI ecosystem, it's ideal for businesses needing deep document understanding and automation across various industries, pushing beyond simple text extraction.
Amazon Textract
Strength: A machine learning service that goes beyond simple OCR to automatically extract text, handwriting, and data from scanned documents. It excels at identifying forms, tables, and key-value pairs, making it powerful for automating data entry from semi-structured documents.
Why for 2025: Part of the AWS ecosystem, Textract is known for its high accuracy and seamless integration into cloud-native applications, perfect for scalable document processing pipelines.
Azure Document Intelligence (formerly Form Recognizer)
Strength: Microsoft's offering provides powerful OCR alongside intelligent document processing. It supports pre-built models for common document types (invoices, receipts, business cards), custom model training for unique layouts, and layout analysis to preserve document structure.
Why for 2025: Its tight integration with Azure services and strong capabilities in understanding both printed and handwritten text, even with complex layouts, make it a top choice for enterprises leveraging Microsoft's cloud.
ABBYY FineReader PDF (and ABBYY Vantage)
Strength: A long-standing leader in OCR, ABBYY offers highly accurate text recognition across a vast number of languages (over 190). FineReader is excellent for converting scanned documents and PDFs into editable formats. ABBYY Vantage extends this to Intelligent Document Processing (IDP) with AI-powered data capture from complex business documents.
Why for 2025: Known for its precision and comprehensive language support, ABBYY remains a go-to for high-fidelity document conversion and advanced IDP needs, especially where accuracy in diverse languages is paramount.
Advanced Open-Source Models & Frameworks
For developers and researchers who need customization, specific integrations, or budget-friendly solutions.
Tesseract OCR (open source; originally developed at HP and long sponsored by Google)
Strength: The most widely used open-source OCR engine. While traditionally needing pre-processing, its latest versions (Tesseract 4+ with LSTM-based engine) offer significantly improved accuracy, especially for line-level recognition, and support over 100 languages.
Why for 2025: It's a foundational tool, highly customizable, and perfect for projects where you need a free, powerful OCR engine with extensive language support. Often used as a baseline or integrated into larger systems.
PaddleOCR (Baidu)
Strength: A comprehensive, open-source toolkit for OCR that boasts strong performance across various scenarios, including complex layouts, multi-language support (over 80 languages, including complex Chinese/Japanese characters), and impressive accuracy, often outperforming Tesseract out-of-the-box on certain benchmarks.
Why for 2025: Its ease of use, robust pre-trained models, and strong community support make it an excellent choice for developers looking for a high-performance, flexible open-source solution.
docTR (Mindee)
Strength: An open-source, end-to-end OCR library built on deep learning frameworks (TensorFlow 2 & PyTorch). It focuses on document understanding, offering strong performance in text detection and recognition, particularly for structured documents and various fonts.
Why for 2025: Offers a modern, deep-learning based approach, known for good accuracy on challenging document types like scanned forms and screenshots. It's a strong option for developers building custom document processing workflows.
Emerging & Specialized Models
These models represent newer advancements, often leveraging vision-language models or focusing on specific niches.
Florence-2 (Microsoft)
Strength: A powerful vision-language model that excels at various computer vision tasks, including detailed OCR. Its ability to understand the spatial relationships between text and other visual elements makes it excellent for complex document layouts, scene text, and even visual question answering.
Why for 2025: As a versatile foundation model, Florence-2 pushes the boundaries of multimodal understanding, suggesting a future where OCR is deeply integrated with broader visual intelligence.
Surya
Strength: A Python-based OCR toolkit specifically designed for line-level text detection and recognition across 90+ languages. It's gaining popularity for its efficiency and accuracy, often touted as outperforming Tesseract in speed and recognition for certain tasks.
Why for 2025: For developers who need fast, accurate line-level OCR, especially in a Python environment, Surya offers a compelling lightweight alternative to larger models.
Mistral OCR (Mistral AI)
Strength: Launched recently in 2025, Mistral OCR is quickly gaining recognition for its robust performance on complex documents including PDFs, scanned images, tables, and even equations. It accurately extracts text and visuals, making it useful for Retrieval Augmented Generation (RAG) applications.
Why for 2025: As a product from a leading AI firm, it represents the cutting edge in highly accurate, context-aware OCR, especially for integrating document intelligence with advanced AI systems.
Key Trends Shaping OCR in 2025
Beyond Text: The focus is shifting from mere text extraction to comprehensive Document Understanding, including layout analysis, table extraction, and key-value pair identification.
AI Integration: OCR is increasingly powered by sophisticated deep learning models and integrated with larger AI pipelines, including Large Language Models (LLMs) for semantic understanding and post-processing.
Handwriting Recognition (ICR, Intelligent Character Recognition): Significant advancements are being made in accurately recognizing diverse handwriting styles.
Cloud-Native & API-Driven: Most leading solutions are offered as scalable cloud services with robust APIs for seamless integration into enterprise applications.
Multimodal OCR: Models are leveraging both visual and textual cues to improve accuracy and contextual understanding.
Choosing Your OCR Model
The "best" OCR model depends entirely on your specific needs:
For high-volume, structured documents (invoices, receipts): Consider cloud services like Google Document AI, Amazon Textract, or Azure Document Intelligence.
For broad language support and customizability (open-source): Tesseract or PaddleOCR are strong contenders.
For complex layouts and modern deep learning approaches: docTR, Florence-2, or Mistral OCR are excellent choices.
For specific tasks like line-level text detection: Surya offers a specialized solution.
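One way to ground that choice is to score each candidate on a sample of your own documents. The sketch below assumes you already have every engine's output as a plain string plus a hand-checked reference transcription; the engine names and sample strings are made up for illustration. It computes character error rate (CER), a common OCR metric: edit distance divided by reference length.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate relative to the reference text."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)

# Hypothetical outputs from two engines for the same scanned line.
reference = "Invoice 1042"
outputs = {
    "engine_a": "Invoice 1042",
    "engine_b": "lnvoice 1O42",  # typical confusions: I/l and 0/O
}

scores = {name: cer(reference, text) for name, text in outputs.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Lower is better. A few dozen representative pages usually say more about fit than a published benchmark, because layouts, fonts, and scan quality vary so much between document collections.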
The landscape of OCR in 2025 is dynamic and exciting. By understanding these top models and the underlying trends, you can choose the right tools to unlock the vast potential hidden within your documents and drive meaningful automation and insights.
0 notes
Text
Chrome on Android will now let you enlarge text without zooming the whole page
Google's TalkBack screen reader now also lets you ask follow-up questions about an image.
https://tefida.com/chromes-android-app-will-now-let-you-zoom-in-on-text-without-affecting-the-webpage/
0 notes
Text
Optical Character Recognition (OCR) technology has revolutionized our interaction with printed and handwritten text. It enables seamless digitization and automation. However, while the technology is widely used for English text, applying it to regional languages like Hindi and Gujarati poses its own challenges and opens up new opportunities. In this post, we will look at what OCR technology is, how it works, its advantages and disadvantages, and its role in handling different languages.
0 notes
Text
#AWS#Amazon Bedrock#AI#Generative AI#API#AWS SDK#Anthropic Claude 3.7 Sonnet#Anthropic Claude 3.7#Anthropic#Claude 3.7 Sonnet#Claude 3.7#Claude#Optical Character Recognition#OCR
0 notes
Text
What is ANPR Based Vehicle Access Control System?
ANPR based vehicle access control system refers to the use of Automatic Number Plate Recognition (ANPR) technology to manage and automate vehicle entry and exit in restricted areas. By capturing and reading vehicle license plates in real-time, the system determines whether a vehicle is authorized to access a particular zone (e.g., parking facilities, gated communities, toll roads, or secure premises). This modern approach replaces or supplements traditional methods of access control, allowing for seamless and automated management of vehicle movements.
Definition of Vehicle Access Control Using ANPR Technology
ANPR based vehicle access control systems employ high-resolution cameras to capture the license plates of vehicles approaching an entry point. Using Optical Character Recognition (OCR), the system extracts the license plate number and compares it against a pre-approved database or watchlist. If the plate is recognized as authorized, the system automatically grants access by opening gates or barriers. This eliminates the need for physical credentials like RFID tags, access cards, or manual inspections by security personnel.
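The compare-and-grant step can be sketched in a few lines. This is a simplified illustration, not any vendor's implementation: it assumes the OCR stage has already returned the plate as a string, and the plate values, function names, and "open"/"deny" actions are hypothetical.

```python
import re

def normalize_plate(raw):
    """Uppercase the OCR text and drop everything except letters and digits,
    so spacing and separator differences do not cause false rejections."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

def gate_action(ocr_text, allowlist):
    """Return 'open' if the normalized plate is pre-approved, else 'deny'."""
    return "open" if normalize_plate(ocr_text) in allowlist else "deny"

# Hypothetical pre-approved database, stored in normalized form.
allowlist = {normalize_plate(p) for p in ["AB12 CDE", "XY-99-ZZZ"]}

for reading in ["ab12cde", " XY 99 ZZZ ", "QQ11 QQQ"]:
    print(repr(reading), "->", gate_action(reading, allowlist))
```

Normalizing both the stored plates and the OCR reading in the same way keeps the lookup a simple set membership test. A production system would also need to handle OCR confusions such as `O`/`0`, which this sketch does not attempt.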
#ANPR based vehicle access control#ANPR Technology#ANPR System#Optical Character Recognition#Vehicle Access Control#Automatic Number Plate Recognition
0 notes
Text
OCR & AI: Powering Smart Document Processing.
Hey there, document wranglers and data tamers! Tired of drowning in paperwork and digital files? Well, buckle up, because we're diving into how Optical Character Recognition (OCR) and Artificial Intelligence (AI) are teaming up to revolutionize document management.
Imagine having a super-smart, tireless assistant who can read, organize, and make sense of your documents faster than you can say "Where did I put that file?" That's exactly what the powerful combo of OCR and AI brings to the table. This dynamic duo is giving traditional document management a major upgrade, automating tasks, boosting accuracy, and slashing costs.

Tech Progress in OCR.
The move from extracting data by hand to automating it kicked off with OCR tech. Originally, this was just for turning printed words into digital ones, but wow, it’s come a long way! Now it can handle all sorts of document formats like business cards, invoices, receipts, & even complex documents that take up multiple pages.
This change has made OCR super important for going digital. It lets businesses turn their paper files into editable & search-friendly digital formats quickly. Picture a company that used to stack files in cabinets; now they can convert all that into digital form! This makes finding & managing documents way easier. Plus, it saves time and clears up space.
The Key Role of AI in Document Management.
While OCR lays the groundwork for text conversion, AI kicks it up a notch in document management. It goes beyond just recognizing text by diving into the context & grabbing useful info from messy or partly organized documents.
With tech like machine learning & natural language processing, systems can learn from what they see, spot patterns, and even make smart decisions. This cuts down on how much humans need to be involved in processing documents. For instance, an AI can look at past invoices to guess future billing trends, helping finance teams keep cash flow in check.
Boosting Skills with OCR and AI Together.
Mixing OCR with AI creates a strong base for smart document processing—here are some highlights:
Better Accuracy: AI's pattern recognition handles the varied document formats & layouts that trip up OCR on its own. That means way fewer mistakes when pulling out text! Businesses can trust their data more, and better decisions follow.
Automated Data Pull: These smart processing systems use AI models to automatically pull out data from documents. This streamlines workflows and cuts back on manual entry errors big time! Employees then get to focus on important tasks instead of boring data entry.
Instant Document Classification: Thanks to AI, these systems can quickly sort and send documents where they need to go. Think big organizations where invoices go directly to finance or contracts head straight to legal—this makes sure everything lands in the right inbox without making anyone wait.
Strong Security Measures: By adding AI into the mix, smart document software can crank up security measures like encrypted storage and multi-factor authentication—keeping sensitive info safe from prying eyes & cyber risks is crucial today when data breaches are so common.
Detailed Audit Trails: These systems keep track of every action related to documents too! This is super important for staying clear & accountable while following regulations. Organizations have an easy way to see who looked at what info and when—great for compliance audits!
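To make the classification-and-routing idea from the list above concrete, here's a tiny sketch. Heads up: a simple keyword counter stands in for the trained AI model a real system would use, and the department names, keywords, and `manual_review` fallback are all made up for illustration.

```python
# Hypothetical routing table: department -> keywords that suggest it.
ROUTES = {
    "finance": ("invoice", "amount due", "payment"),
    "legal": ("agreement", "contract", "party"),
}

def route(text, routes=ROUTES, default="manual_review"):
    """Send OCR'd text to the department whose keywords appear most often;
    fall back to a human queue when nothing matches."""
    lowered = text.lower()
    scores = {
        dept: sum(lowered.count(keyword) for keyword in keywords)
        for dept, keywords in routes.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

samples = [
    "INVOICE #88: amount due 40 USD",
    "This Agreement binds each party to the contract",
    "Holiday photo",
]
for text in samples:
    print(route(text), "<-", text)
```

The fallback queue matters: anything the classifier can't place goes to a person instead of landing in the wrong inbox.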
Various Industry Uses.
The combo of OCR and AI is beneficial across tons of sectors:
Finance: Automating loan processes or spotting fraud helps banks run smoother & stay secure while handling compliance docs like KYC checks.
Healthcare: Digitizing patient records and processing insurance claims becomes simpler, leading to better patient care and smoother operations.
Retail: Purchase orders, invoices, & inventory records get easier to manage, which makes audits a breeze while boosting overall efficiency.
Legal: Speeding up case file management or checking contracts helps law firms save time while cutting down on errors in vital actions.
Immigration: Making application processes faster helps improve accuracy—all super important for timely decisions!
Conclusion:
The mix of OCR and AI doesn't just tidy up document handling; it brings real precision & efficiency that can change how businesses operate. As industries continue their digital journeys, using smart document processing will be key. By leaning into these technologies, organizations can become more agile, comply better with regulations, and keep customers happy too! These tech tools are not just about upgrades—they’re about gaining an edge in today’s business landscape!
#ocr technology#ai#intelligentautomation#document ai#optical character recognition#artificial intelligence
0 notes
Text
#handwritten character recognition neural network#optical character recognition#ocr#inter-layer webs#four (4) layer neural network
0 notes
Text
#writing#script#calligraphy#pinyin#ocr#optical character recognition#ideographs rendered in pinyin for easier digital input and output and processing
0 notes
Text

How to Select the Right KYC Verification Service for Your Business?
Knowing who your customers are has always been crucial, whether you run an online marketplace, a banking institution, or almost any other organisation that needs to identify its clientele. KYC verification protects the business against fraud and helps ensure that you comply with the law. With so many options available, though, finding the right KYC verification service matters.
#kyc verification#id verification#optical character recognition software#anti money laundering#transaction monitoring
0 notes
Text
Anyone who says AI is the future has never run OCR on a document.
0 notes
Text
Recommendations for the Best Optical Character Recognition (OCR) Service from Verihubs
We have entered an era in which digitalization is the key to operational efficiency. Optical Character Recognition (OCR) technology has emerged as one of the most useful tools for companies to manage their archives more easily, securely, accurately, and efficiently. With its ability to convert printed or handwritten text into text that can be…

0 notes