Almost Timely News: 🗞️ Using Local AI for Document Scanning (2026-02-01)

Mundane? Yes. Useful? Also yes.

The Big Plug

Two new things to try out this week: 1. Got a stuck AI project? Try out Katie's new, free AI Readiness Assessment tool. A simple quiz to help predict project success.

Content Authenticity Statement

100% of this week's newsletter content was originated by me, the human. Learn why this kind of disclosure is a good idea and might be required for anyone doing business in any capacity with the EU in the near future.

Watch This Newsletter On YouTube 📺

Click here for the video 📺 version of this newsletter on YouTube » Click here for an MP3 audio 🎧 only version »

What's On My Mind: Using Local AI for Document Scanning

This week, let's dig into a very specific application of local AI. Last week we covered how to get started with private, local AI, which I recommend you review and do - we'll be building on that. If you don't want to, or can't, get a local AI model running, consider using an infrastructure provider like DeepInfra or Groq (note the spelling), as they can provide low-cost access to today's best models, often with zero-data-retention APIs.

The specific application of local AI we're looking at this week is something seemingly mundane: document scanning. Now, you might say, "Chris, that is the most boring, mundane, unsexiest use of generative AI, don't you have anything more interesting?" But something like document scanning is the epitome of the Shirky Principle: once a technology is technologically boring, it can be societally interesting. Using generative AI for document scanning is boring, but there are plenty of documents in the world that are very difficult to read through normal scanning. Photographic scans of paperwork. Documents with charts, graphs, and images embedded in them. Weirdly formatted tables. Partially redacted documents. All of those can throw regular document scanners for a loop. Generative AI models trained to be document scanners can overcome many of those issues. So let's dig in.

Part 1: A Pre-Emptive Glossary

Before we dig into the how-to, let's take a few moments to describe the what. Document scanning is a profession unto itself and has a lot of lingo and jargon - jargon that, if you know it, makes it easier to work with AI.

The most common term you'll hear is OCR - optical character recognition. This is what a lot of computer vision software started with: the need to scan letters and convert analog text (like the printed page) into digital text. Additionally, as models got more powerful and compute got bigger, OCR improved to the point where it can read handwriting. Today, you can take a photo of handwritten text even from centuries past and most AI models will be able to transcribe it. As a fun aside, I tried that with some Sumerian text from a local museum, the Museum of Fine Arts in Boston, and Google's Gemini was actually able to read it with reasonable accuracy.

Transcribe is itself a specific word in document scanning, something you'll want to note for AI prompts. To transcribe is to write down text, word for word, as closely as possible to the original. In general, you often want AI to transcribe something first before doing anything else, so you can check the quality of its work. Many people make the mistake of trying to have AI do too much and just process an entire document in one shot, rather than breaking the workflow down into steps.
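To make the transcribe-first idea concrete, here's the kind of two-step split I'm describing. The wording below is purely illustrative - adapt it to your own documents:

```
Step 1 (transcribe): "Transcribe this page word for word, preserving headings, tables,
and handwriting. Do not summarize, correct, or omit anything."

Step 2 (process): "Using only the transcript above, list every date and dollar amount
that appears, along with the sentence each one came from."
```

Splitting the job this way lets you spot-check the transcript before you trust anything built on top of it.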
When we talk about using AI for document scanning, we are often talking about VLMs, vision language models. VLMs are models that can work with images as well as text (and sometimes video). They can "see" in ways that a text-only model cannot, because they've been trained on images as well as text. When we're doing document scanning, we want to make sure we're using a vision language model for the processing part. Speaking of which, there is a distinct workflow for doing document scanning with AI: split your documents into individual pages, convert each page to an image, have the vision language model transcribe each page, and store each transcription somewhere you can work with it, such as a database.
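To show what the heart of that workflow looks like in code, here's a minimal sketch that transcribes a single PDF page with a local vision language model. It assumes LM Studio's built-in server is running at its default address and that you've loaded a vision-capable model; the file path and model name are placeholders you'd swap for your own:

```python
# Minimal sketch: transcribe one PDF page with a local vision language model.
# Assumes LM Studio's built-in server is running (default: http://localhost:1234/v1)
# and a vision-capable model is loaded. Install dependencies first:
#   pip install openai pymupdf
import base64

import fitz  # PyMuPDF, used to render PDF pages as images
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")


def transcribe_page(pdf_path: str, page_number: int = 0, model: str = "your-vision-model") -> str:
    """Render one page as a PNG and ask the local model to transcribe it."""
    doc = fitz.open(pdf_path)
    pixmap = doc[page_number].get_pixmap(dpi=200)
    image_b64 = base64.b64encode(pixmap.tobytes("png")).decode("utf-8")

    response = client.chat.completions.create(
        model=model,  # placeholder; use the identifier shown in LM Studio
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page word for word. Do not summarize."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(transcribe_page("sample_document.pdf"))  # placeholder file name
```

The automation we'll build in Part 3 is essentially this, wrapped in a loop over every page of every document, with the results written to a database.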
A few other terms you'll want to know:

SQLite is one of the most useful database formats there is, because it's a single file that lives on your computer. Unlike bigger systems that require servers (MySQL, PostgreSQL, Microsoft SQL Server, Google BigQuery, etc.), SQLite is just a single flat file that lives in any folder. You can pick it up and move it around if needed. It's also a database format that AI is especially fluent in and knows how to manipulate, which comes in handy when we're talking about document scanning and storage.

Open source software is any software that is licensed for other people to use and modify, often for free, even for commercial use (depending on the license). Many of the world's top systems and software are open source, such as the Apache web server, the Linux operating system, many programming languages, and other core technologies. Often abbreviated FOSS (free and open source software), open source is what powers a lot of the modern Internet. Generative AI has extensive knowledge of open source software, which comes in handy for not reinventing the wheel.

Python is probably the most common programming language in the world now, and certainly the most popular language in open source software. Python versions 3.12 and 3.13 are the ones that many of the libraries (basically like plugins) AI tools depend on are built for.

Finally, context window. All AI has two kinds of memory, long-term and short-term. Long-term memory is the data the AI has been trained on, and as of today, when you use any AI model, that memory does not change. It's why so many AI tools integrate web search. The short-term memory, or working memory, is called the context window. For the purposes of building and working with your own local AI, the bigger you make it (you set it in your software, like LM Studio/AnythingLLM), the more memory your AI consumes and the slower it runs. That's another reason why lots of small tasks are better than one big task - it's far more resource efficient.

Okay, now that we've got the book learning out of the way, let's dig in.

Part 2: Choosing a Model

Assuming you completed the setup from last week's newsletter, you should have either LM Studio or AnythingLLM set up on your computer. We now need to find an AI model that will work for document scanning. There are a ton of excellent choices out there, including Qwen's vision language models, DeepSeek's OCR models, and Mistral OCR 3.
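When it comes time to pick one, a tiny harness like the sketch below makes side-by-side testing easy. It reuses the transcribe_page function from the earlier sketch, and the model identifiers are placeholders for whatever you've actually downloaded:

```python
# Sketch: run the same page through several local models and compare the transcripts.
# transcribe_page() is the helper from the earlier sketch; model names are placeholders -
# use the identifiers shown in your LM Studio model list.
CANDIDATES = ["qwen-vl-placeholder", "deepseek-ocr-placeholder", "mistral-ocr-placeholder"]

for model_name in CANDIDATES:
    transcript = transcribe_page("test_page.pdf", model=model_name)
    print(f"\n===== {model_name} =====\n{transcript[:500]}")  # first 500 characters is usually enough to judge
```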
Ideally, if you have the time and resources, take a couple of pages from the kind of document you want to work with, install all 3 models, and do a test run to see which model is most accurate for the specific kind of task you're doing. For example, if you're working with old manuscripts handwritten in 18th-century penmanship, you might find that Mistral OCR 3 does a better job than Qwen or DeepSeek. If you're dealing with scanned notes from cold cases from the 1970s, typed on old typewriters, you might find that DeepSeek does a better job. Logically, you wouldn't be investigating OCR models unless you had a LOT of documents you needed scanned, so take the time to test out models and see how accurate each one is.

You might ask at this point, why wouldn't we just use Gemini or ChatGPT or Claude for these kinds of tasks? The answer goes back to last week's newsletter - my assumption is that there are plenty of documents that are confidential, that you wouldn't want in the hands of third parties, especially medical and legal documents. Maybe there are documents that you don't want someone else to know you're looking at. For those kinds of documents, local AI is the best choice. Additionally, there are plenty of documents that you might want to process that you simply don't want to pay for. When you use cloud-based artificial intelligence, you are going to pay per token. And while an individual token may not cost much, if you're talking about millions of pages of documents, that bill can get very large, very quickly. Local AI will save you that money in exchange for electricity and the computer you're running it on.

Part 3: The Automation

Here's the challenge with lots of document scanning. You can, pretty easily, take a short document that's a few pages long, drop it into chat, and have nearly any AI model give you a good transcript. This is true of local models and cloud-based models (like ChatGPT). But when a document is dozens or hundreds of pages long, or you have archives of thousands of documents, that's no longer practical. Even fifty pages is a lot for a model like Gemini to handle, and the risk of it hallucinating or skipping pages gets higher as you add more work to its plate. If, on the other hand, you could feed AI one page at a time? It's perfectly comfortable with that and will give you great results. As I mentioned in the trashy romance novel issue, AI does great when you take a big task and turn it into a ton of small tasks.

So our next step is to do exactly that. Using the AI of your choice, we want to design a Python script that can take apart a big document or set of documents, split them into single pages, and then process each page and store the results in a database. If you followed the instructions from last week's newsletter, you should have Python on your computer and ready to go. If you didn't, go back and do that.

Here's the prompt you can paste into any big AI model like Gemini, ChatGPT, or Claude. Modify the part in the curly braces, then copy and paste into your favorite AI.
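Your exact wording will vary, but the prompt needs to cover these points; everything in curly braces is a placeholder for your own details:

```
You are an expert Python developer. Help me design a single Python script that:
1. Reads every PDF in {path to my documents folder}.
2. Splits each document into individual pages and renders each page as an image.
3. Sends each page image to my local vision model running in {LM Studio or AnythingLLM}
   with instructions to transcribe the page word for word.
4. Stores each transcription in a SQLite database, {database file name}, along with the
   source file name and page number.
Ask me any clarifying questions you need answered before writing the code, and tell me
which libraries I will need to install.
```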
After a whole bunch of thinking, you should end up with a single Python script that you download and put somewhere on your computer, ideally alongside the folder of documents you want to scan. If you're not sure what to do next, give the AI this prompt and follow its guidance, editing the parts in curly braces:
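Again, the wording is flexible; something along these lines does the job:

```
Here is the Python script you wrote for me: {paste the script}. I am not a programmer.
Give me step-by-step instructions for installing anything the script needs and for
running it on {my operating system}, including how I can tell whether it worked.
```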
This will give you step-by-step instructions for how to actually use the script the AI spit out. Run a quick test on a folder with a couple of PDFs and see how it goes. Using AI for troubleshooting is just as important: if you paste the exact error messages the script produces back into your AI, it will help you straighten out any bugs it created as well.

Part 4: Using the Data

Once your script has used your local AI to do vision-based OCR, you should have a database of results. The question is, what do you do with that database? You have a couple of options. First, if you know your way around databases, SQLite is nothing more than a SQL database. You can use any FOSS SQLite app to browse the data, write queries, etc. My recommended client is DB Browser for SQLite - it's available for free for Mac, Windows, and Linux.

If you're not fluent in SQL (the language of SQLite), then as long as you have an AI agent that can see the database file itself - Claude Code, Claude Cowork, Google Antigravity, OpenAI Codex, etc. - you can ask natural language questions and have the agent interrogate the database directly. Here's an example prompt - note that you MUST have the AI agent running in the folder you're working in; see this Trust Insights livestream for getting started with Claude Cowork and this one for getting started with Claude Code.
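As an illustration, the prompt can be as plain as this (the curly braces are placeholders for your own details):

```
In this folder there is a SQLite database named {your database file}. Look at its schema,
then export every page transcription that mentions {topic you care about}, along with the
source file name and page number, to a CSV file named results.csv.
```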
This will give you a CSV spreadsheet you can open in the spreadsheet software of your choice - though I do recommend learning SQL, as it's a super handy language to know. Once you've got the data, it's up to you what you do with it. And you can ask generative AI to route the data any way you want; you don't have to use SQLite. If you've got existing systems or APIs that work well for you, you can have your AI of choice modify your Python script to use those instead. For example, suppose you were scanning in a ton of receipts and you have a system like QuickBooks. As long as QuickBooks has an API you're allowed to use, you could have your script send your receipt text there instead.

Part 5: Wrapping Up

Document transcription can be incredibly boring, but once you start learning how to work with local AI on it, you'll find tons of uses. For example, one of my favorite use cases for a vision language model, set up exactly as we did in this article, is to rename all those "Screenshot 2026-01-31 14:00:12.png" files littering my computer into things like "cat_sitting_on_my_head.png" so that I have a better sense of what's in my actual images folder. The same thing is true for PDFs - "EFTA01264412.pdf" becomes "att_wireline_phone_records.pdf", a far more helpful filename. The sky's the limit once you start using local AI, in partnership with code you write with big cloud AI, to get seemingly boring, mundane tasks done. Is it the fanciest thing that's gonna get you lots of clout on LinkedIn or the other social networks of your choice? No. Will you be more productive? Yes.

How Was This Issue?

Rate this week's newsletter issue with a single click/tap. Your feedback over time helps me figure out what content to create for you.

Here's The Unsubscribe

It took me a while to find a convenient way to link it up, but here's how to get to the unsubscribe. If you don't see anything, here's the text link to copy and paste: https://almosttimely.substack.com/action/disable_email

Share With a Friend or Colleague

Please share this newsletter with two other people. Send this URL to your friends/colleagues: https://www.christopherspenn.com/newsletter For enrolled subscribers on Substack, there are referral rewards if you refer 100, 200, or 300 other readers. Visit the Leaderboard here.

ICYMI: In Case You Missed It

Here's content from the last week in case things fell through the cracks:
On The Tubes

Here's what debuted on my YouTube channel this week:

Skill Up With Classes

These are just a few of the classes I have available over at the Trust Insights website that you can take.

Premium

Free
Advertisement: New AI Book!

In Almost Timeless, generative AI expert Christopher Penn provides the definitive playbook. Drawing on 18 months of in-the-trenches work and insights from thousands of real-world questions, Penn distills the noise into 48 foundational principles - durable mental models that give you a more permanent, strategic understanding of this transformative technology. In this book, you will learn to:
Stop feeling overwhelmed. Start leading with confidence. By the time you finish Almost Timeless, you won't just know what to do; you will understand why you are doing it. And in an age of constant change, that understanding is the only real competitive advantage. 👉 Order your copy of Almost Timeless: 48 Foundation Principles of Generative AI today!

Get Back To Work!

Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you're looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.

Advertisement: New AI Strategy Course

Almost every AI course is the same, conceptually. They show you how to prompt, how to set things up - the cooking equivalents of how to use a blender or how to cook a dish. These are foundation skills, and while they're good and important, you know what's missing from all of them? How to run a restaurant successfully. That's the big miss. We're so focused on the how that we completely lose sight of the why and the what. This is why our new course, the AI-Ready Strategist, is different. It's not a collection of prompting techniques or a set of recipes; it's about why we do things with AI. AI strategy has nothing to do with prompting or the shiny object of the day - it has everything to do with extracting value from AI and avoiding preventable disasters. This course is for everyone in a decision-making capacity because it answers the questions almost every AI hype artist ignores: Why are you even considering AI in the first place? What will you do with it? If your AI strategy is the equivalent of obsessing over blenders while your steakhouse goes out of business, this is the course to get you back on course.

How to Stay in Touch

Let's make sure we're connected in the places that suit you best. Here's where you can find different content:
Listen to my theme song as a new single:

Advertisement: Ukraine 🇺🇦 Humanitarian Fund

The war to free Ukraine continues. If you'd like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia's illegal invasion needs your ongoing support. 👉 Donate today to the Ukraine Humanitarian Relief Fund »

Events I'll Be At

Here are the public events where I'm speaking and attending. Say hi if you're at an event also:
There are also private events that aren't open to the public. If you're an event organizer, let me help your event shine. Visit my speaking page for more details. Can't be at an event? Stop by my private Slack group instead, Analytics for Marketers.

Required Disclosures

Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them. Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them. My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.

Thank You

Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness. Please share this newsletter with two other people. See you next week,

Christopher S. Penn

Invite your friends and earn rewards

If you enjoy Almost Timely Newsletter, share it with your friends and earn rewards when they subscribe.
