
Thursday 21 June 2012

Extract Text from PDF, DOC, HTML, CHM, and RTF Files


Have a document in PDF format that you would like to convert to a text document? Or maybe an HTML or CHM (Windows Help) file that you need to convert into plain text? Why might this be useful, you ask? Most PDF documents are not editable, and selecting the text manually can be a tedious process.
You can use Text-Mining-Tool to automatically extract text from a PDF file so that you can use it in any program freely. Or if you cannot open a PDF file because you do not have a PDF viewer installed, you can use this tool to extract the text and read the document.
Text Mining Tool is completely free and does not even require an installation. Simply unzip it and run the program.
Click the Open button and choose the file that you want to convert to text. Click OK, and the large window below the buttons will fill with the text extracted from the document.
Click Save to save the extracted text to your computer. You can also click Clipboard to copy the mined text to the Windows clipboard.
For convenience, the following hotkeys can be used to perform the operations:
  • Open – F3 or O.
  • Save – F2 or S.
  • Clipboard – F5 or C.
  • Exit – F10 or Escape.
You can also use the minetext console tool to create a batch script for extracting text from multiple files. This can be useful if you have a directory with a large number of files that need to have text extracted.
The included console tool minetext has the following syntax:
minetext <input file>

minetext <input file> <output file>

where:

  <input file>  - any file with one of the following extensions:
                  pdf, doc, rtf, chm, htm, html
  <output file> - file you want to write text mined from input file
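The batch idea mentioned above can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: that minetext is on your PATH (otherwise pass its full path as exe), and that you want each output saved next to its input with .txt appended. The function names build_commands and extract_all are made up for this example and are not part of the tool.

```python
import subprocess
from pathlib import Path

# Extensions listed in the minetext syntax above.
SUPPORTED = {".pdf", ".doc", ".rtf", ".chm", ".htm", ".html"}


def build_commands(folder, exe="minetext"):
    """Return one [exe, input file, output file] command per supported file."""
    commands = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # Append ".txt" instead of replacing the extension, so that
            # report.pdf and report.doc do not write to the same output file.
            output = path.with_name(path.name + ".txt")
            commands.append([exe, str(path), str(output)])
    return commands


def extract_all(folder, exe="minetext"):
    """Run minetext once per supported file in the folder."""
    for cmd in build_commands(folder, exe):
        # Assumption: nothing is known here about minetext's exit codes,
        # so a failure on one file does not stop the rest of the batch.
        subprocess.run(cmd, check=False)
```

Calling extract_all(r"C:\docs") would then run minetext for every PDF, DOC, RTF, CHM, and HTML file in that folder. Running it a second time skips the .txt files written by the first run, since .txt is not in the list of supported extensions.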
If you’re a web designer, this program can be very useful to grab the text from a Word document without getting all of the extra Microsoft Office styling code included with the text.
This is a very simple program that is just as easy to use! It has one basic purpose and it does the job well. Enjoy!
