CommandLineFu – REMOVE BATES NUMBER REFERENCE FROM OCR TEXT FILES

find . -type f -iname "*.TXT" -exec sed -i 's/<< ...-.-........ >>//g' '{}' \;
* Assumes the original Bates number uses the format ABC-X-01234567 and appears between << and >>.
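To see what the expression does before running it across a whole production, it can be tried on a single line first. This is a minimal sketch: the sample text is made up, and the stricter pattern is my own variant (not part of the original one-liner) that only matches capital letters and digits in the ABC-X-01234567 positions.

```shell
# Hypothetical sample line containing a Bates reference in the assumed format.
sample='Deposition text << ABC-X-01234567 >> continues here.'

# Loose pattern from the one-liner: each "." matches any single character.
loose=$(printf '%s\n' "$sample" | sed 's/<< ...-.-........ >>//g')

# Stricter variant (an assumption): 3 capitals, 1 capital, 8 digits.
strict=$(printf '%s\n' "$sample" | sed 's/<< [A-Z]\{3\}-[A-Z]-[0-9]\{8\} >>//g')

printf '%s\n' "$loose" "$strict"
```

Both versions leave the two surrounding spaces behind, so the cleaned line has a double space where the reference used to be.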

Depending on how Cygwin is configured, sed may write the files back with Unix line endings, so be sure to convert them back to DOS format afterwards:
find . -type f -iname "*.TXT" -exec unix2dos '{}' \;
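If you are not sure whether the conversion is needed, one way to check is to look for a carriage return at the end of the lines. The sketch below is only an illustration: has_crlf is a helper name I made up, the sample files are throwaway, and on some Cygwin setups grep itself may treat line endings differently.

```shell
# has_crlf FILE: succeeds if at least one line of FILE ends in a carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Throwaway sample files (hypothetical content) to exercise the check.
sample_dir=$(mktemp -d)
printf 'page one\r\npage two\r\n' > "$sample_dir/dos.TXT"
printf 'page one\npage two\n' > "$sample_dir/unix.TXT"

for f in "$sample_dir"/*.TXT; do
    if has_crlf "$f"; then
        echo "$f: DOS line endings, leave as is"
    else
        echo "$f: Unix line endings, run unix2dos on it"
    fi
done
```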

The find command is useful when your files are not organized in a cleanly numbered subfolder structure, or are mixed in with other file types. Otherwise, plain shell globbing usually does the trick a bit faster:

sed -i 's/foo/bar/g' IMAGES/00/0[0-9]/*.TXT ; unix2dos IMAGES/00/0[0-9]/*.TXT
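When globbing is not an option, find can still get closer to that speed by ending -exec with + instead of \; so that many file names are handed to each sed call rather than one per call. A rough sketch on a throwaway tree follows; the folder and file names are invented for the demo, and sed -i is assumed to be GNU sed, as on Cygwin.

```shell
# Build a throwaway tree (hypothetical names) to try the command on.
demo=$(mktemp -d)
mkdir -p "$demo/IMAGES/00/01" "$demo/IMAGES/00/02"
printf 'foo one\n' > "$demo/IMAGES/00/01/A.TXT"
printf 'foo two\n' > "$demo/IMAGES/00/02/b.txt"
printf 'foo three\n' > "$demo/IMAGES/00/02/notes.log"

# "+" batches the matching files into as few sed invocations as possible.
# -iname matches both A.TXT and b.txt; notes.log is left alone.
find "$demo" -type f -iname '*.TXT' -exec sed -i 's/foo/bar/g' {} +

cat "$demo"/IMAGES/00/0[0-9]/*
```

The same + form works for the unix2dos pass, since unix2dos also accepts several file names at once.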

sed == http://sed.sourceforge.net/sedfaq.html
