How to write easily-scannable notes
About two years ago, I started a project to digitize all of my university degree notes. I've been doing it off and on, and here are the things I wish I had done while writing the notes to make them easily scannable.
- The most important thing: don't use thin paper. Only use 20lb or 24lb paper: this doesn't flex as much when it's forcefully pulled through the sheet feeder.
- Write on one side of the paper. Yes, this doubles the size of your notes, but trust me, it's worth it: you don't need to find a sheet-fed scanner that can handle double-sided paper, or do weird magic with scanning one side, then the other, then interleaving the scanned files.
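If you do end up with double-sided notes and a single-sided feeder, the "interleaving" step can be scripted. The following is a minimal sketch, not a finished tool: the file names are made up, and the `backs_reversed` flag assumes that flipping the whole stack for the second pass means the last sheet's back gets scanned first, which may not match your scanner.

```python
def interleave_pages(fronts, backs, backs_reversed=True):
    """Merge two single-sided scan passes into reading order.

    fronts: files from the first pass (pages 1, 3, 5, ...).
    backs:  files from the second pass, in the order they were scanned.
    backs_reversed: True if the second pass ran back-to-front
        (an assumption about how the stack was flipped).
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes have different page counts")
    if backs_reversed:
        backs = list(reversed(backs))
    merged = []
    for front, back in zip(fronts, backs):
        merged.extend([front, back])
    return merged


# Hypothetical file names, for illustration only.
fronts = ["front_001.png", "front_002.png", "front_003.png"]
backs = ["back_001.png", "back_002.png", "back_003.png"]
print(interleave_pages(fronts, backs))
```

The length check doubles as a cheap test for a double-fed sheet: if the two passes disagree on page count, something was skipped.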
Some other 'ideal' (but probably not reasonable) circumstances:
- Don't hole-punch the notes. Especially if they're on thin paper.
- Don't staple: this bends the paper and makes the sheet feeder work harder to pull the pages apart. If you haven't stapled too far into the paper, cut the stapled portion off the paper with a utility knife: this keeps the sheets smooth, and as long as your sheet-feeder doesn't grab the paper where it was stapled, you should be alright.
Other tips you might find useful after the fact:
- A scanner with a sheet feeder is worth the extra money. It saves so much time, even if you have to feed sheets into it by hand. With lots of double-sided notes, hand-feeding is still faster than trying to make two passes on a pile of paper.
- Don't bother with color-correcting and cropping: it's not worth the time when you have 800+ course-hours of notes, and there's always the chance you won't need to reference large sections of those notes ever again. If your scanner is good you need not worry about horribly misaligned notes: it just will not happen.
- Scan in grayscale at 150 DPI: modern scanners can do this all in one pass. It makes scanning quick and easy. If scanning in grayscale is not feasible (e.g. colored diagrams), do as much in grayscale as possible.
- Save the files as PNG: there will be lots of whitespace, and PNG's lossless DEFLATE compression shrinks those uniform regions to almost nothing. As a result, a 4 MB raw file will condense down to 30-50 kB.
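To get a feel for why mostly-white pages shrink so much, here is a rough sketch using Python's `zlib`, which implements the same DEFLATE compression that PNG uses internally. It is only an approximation: a real PNG encoder also applies per-row filters first, and the page here is synthetic (letter size at 150 DPI, with a few made-up dark strokes standing in for handwriting), so the printed numbers will not match any real scan.

```python
import zlib

DPI = 150
WIDTH = int(8.5 * DPI)    # letter-size width in pixels (assumed page size)
HEIGHT = 11 * DPI         # letter-size height in pixels


def synthetic_page(width, height, text_lines=40, line_length=800):
    """Build an 8-bit grayscale page: white background, a few dark strokes."""
    page = bytearray(b"\xff" * (width * height))
    for line in range(text_lines):
        row = 100 + line * 35          # one stroke every 35 pixel rows
        start = row * width + 150      # left margin of 150 pixels
        for x in range(line_length):
            page[start + x] = (x * 7 + line) % 128   # varying dark values
    return bytes(page)


raw = synthetic_page(WIDTH, HEIGHT)
compressed = zlib.compress(raw, 9)     # 9 = strongest compression level

print(f"raw: {len(raw) / 1e6:.1f} MB")
print(f"deflated: {len(compressed) / 1e3:.1f} kB")
```

One byte per pixel also explains the raw sizes: a grayscale page is simply width times height bytes before compression.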
I suppose that, in an ideal world, you'd buy 24lb white bond paper, go to class each day, take your notes, return home and scan them, then hole-punch them and put them chronologically in a binder. But how close do we ever get to 'ideal'?
Any other tips that people want to offer for scanning large volumes of paper?
Comments
Always check to make sure the scanner got each page, and didn't pull a few through together.
Many modern scanners can scan straight to PDF. For a course-note situation, if you can get it to scan to Searchable PDF, that can be even better. Note that this involves some OCR but leaves all the imagery exactly intact (unlike regular OCR, which screws up everything and puts it into Word).
The problem I'm currently having is that I'm trying to have a Brother multifunction dealie scan to 300 dpi in Black and White (not by choice, this is required), but the Brother refuses, flat out refuses, to scan in Black and White. Which means I then have to go into PaperPort and convert the PDF to Black and White. Which is doable, but a pain as it adds an extra, theoretically unnecessary step.
Oh, PaperPort can be superhandy, at that. It can stack and unstack PDF pages (as long as they're made up of images), which is handy if you have to add in a page the scanner missed or something. Other brands have similar software, Sharp's SharpDesk for example.
> Always check to make sure the scanner got each page, and didn’t pull a few through together.
I have an easy way to ensure this: my scanner jams with an almost 100% certainty if it does.
As for the source image, I usually scan direct to bitmaps, then run a mass-converter on the results (mogrify is a wonderful, wonderful thing) to get PNGs.
Recently I've been scanning my notes at 200dpi: I find it doesn't take any longer per page. For some reason, 300DPI takes more than twice as long with this scanner.
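For anyone wanting to script the bitmap-to-PNG mass conversion mentioned above, here is a small sketch that wraps ImageMagick's `mogrify -format png`, which writes a `.png` next to each input and leaves the bitmaps in place. The function names and the `scans` folder are my own placeholders, and it assumes the bitmaps have a `.bmp` extension.

```python
import shutil
import subprocess
from pathlib import Path


def mogrify_command(folder, pattern="*.bmp"):
    """Build the ImageMagick call that writes a .png beside each bitmap."""
    files = sorted(str(p) for p in Path(folder).glob(pattern))
    return ["mogrify", "-format", "png", *files]


def convert_folder(folder):
    """Convert every bitmap in `folder`; return how many files were passed."""
    cmd = mogrify_command(folder)
    if len(cmd) == 3:
        return 0  # no bitmaps found, nothing to do
    if shutil.which("mogrify") is None:
        raise RuntimeError("ImageMagick's mogrify is not on PATH")
    subprocess.run(cmd, check=True)
    return len(cmd) - 3


if __name__ == "__main__":
    # "scans" is a hypothetical folder name.
    print(convert_folder("scans"), "bitmaps converted")
```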
If I were in a world where I had tons of free time, I would scan a page, straighten it and color-correct it, OCR it to Word, then save it as PDF, then add metadata and bookmarks to the PDF.
Until then I have a Photoshop action (macro) that scans them at 200 dpi color from the sheet feeder, then applies metadata based on the course being scanned, and then saves them sequentially, all while I'm having lunch. When I get some free time I OCR and PDF them.
Acrobat 8 has OCR built in.
G,
I had the same problem with a 1-bit scan on a Brother MFC-8860DN converting into an 8-bit JPEG when scanning straight to PDF.
Brother told me to scan directly into PaperPort instead of using ControlCenter, and that fixes the problem. A B&W scan into PDF produces a PDF that is just B&W. No conversion to an 8-bit JPEG.
Thanks Bob, that saves me a call to Brother.
I had discovered what PaperPort could do, but I was still going to call Brother because the PaperPort solution isn't as good as ControlCenter actually working properly. PaperPort adds steps - extra software open, onscreen dialogs, and it saves into whatever folder you last looked in rather than a default. I'll have to check the dialogs again to see if I can at least set a default folder, else it's going to cause chaos.
I have to have it ready for someone low-tech to use at times, so I may end up going the ControlCenter route + conversion if I can't get PaperPort to act predictably.