PDF-to-Text

A few simple Python scripts to extract text from text-based or OCR-ed PDF files:

PDF-to-Text-A

This code searches only the specified directory (not its subdirectories) for PDF files, extracts their text, and saves each PDF's text as a separate text file in the specified output directory.
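The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual script: the function names `extract_pdf_text` and `convert_directory` are hypothetical, and it assumes a PyPDF2 release that provides `PdfReader` (older releases use `PdfFileReader` instead).

```python
import os


def extract_pdf_text(pdf_path):
    """Return the text of every page in one PDF, joined by newlines."""
    # Imported here so the rest of the sketch can be read and tested
    # without PyPDF2; assumes a release that provides PdfReader.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def convert_directory(pdf_directory, output_directory, extract=extract_pdf_text):
    """Write one .txt file per PDF found directly inside pdf_directory."""
    os.makedirs(output_directory, exist_ok=True)
    written = []
    for name in sorted(os.listdir(pdf_directory)):
        source = os.path.join(pdf_directory, name)
        # Skip subdirectories and anything that is not a PDF.
        if not os.path.isfile(source) or not name.lower().endswith(".pdf"):
            continue
        out_path = os.path.join(output_directory, os.path.splitext(name)[0] + ".txt")
        with open(out_path, "w", encoding="utf-8") as handle:
            handle.write(extract(source))
        written.append(out_path)
    return written
```

The `extract` parameter is only there so the directory logic can be tried with a stand-in function; calling `convert_directory(pdf_directory, output_directory)` uses the PyPDF2-based extractor.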

PDF-to-Text-B

This code searches only the specified directory (not its subdirectories) for PDF files, extracts their text, and combines it into a single text file in the specified output directory.
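A rough sketch of the combining step might look like this. The function name `combine_pdfs`, the `extract` callable (any function that returns the text of one PDF, for example a small wrapper around PyPDF2), and the `=====` separator line between documents are all assumptions made for illustration; the actual script may join the text differently.

```python
import os


def combine_pdfs(pdf_directory, output_directory, extract,
                 combined_text_file_name="combined_text.txt"):
    """Write the text of every PDF in pdf_directory into one output file."""
    os.makedirs(output_directory, exist_ok=True)
    out_path = os.path.join(output_directory, combined_text_file_name)
    with open(out_path, "w", encoding="utf-8") as combined:
        for name in sorted(os.listdir(pdf_directory)):
            if not name.lower().endswith(".pdf"):
                continue
            # Hypothetical separator so each source file stays identifiable.
            combined.write(f"===== {name} =====\n")
            combined.write(extract(os.path.join(pdf_directory, name)))
            combined.write("\n\n")
    return out_path
```

Sorting the file names keeps the order of the combined file predictable from run to run.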

PDF-to-Text-C

This code searches through the specified directory and all its subdirectories for PDF files, extracts their text, and saves each PDF's text as a separate text file in the specified output directory.
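The recursive search can be sketched with `os.walk` from the standard library. The helper names `find_pdfs` and `output_name` are hypothetical, and flattening the relative path into the output file name is just one possible way to keep same-named PDFs in different subdirectories from overwriting each other; the actual script may name its output files differently.

```python
import os


def find_pdfs(pdf_directory):
    """Yield the path of every PDF under pdf_directory, including subdirectories."""
    for root, _dirs, files in os.walk(pdf_directory):
        for name in sorted(files):
            if name.lower().endswith(".pdf"):
                yield os.path.join(root, name)


def output_name(pdf_path, pdf_directory):
    """Build a flat .txt name from the PDF's path relative to pdf_directory."""
    relative = os.path.relpath(pdf_path, pdf_directory)
    stem = os.path.splitext(relative)[0]
    # Replace path separators so the result is a single file name.
    return stem.replace(os.sep, "_") + ".txt"
```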

PDF-to-Text-D

This code searches through the specified directory and all its subdirectories for PDF files, extracts their text, and combines it into a single text file in the specified output directory.

How to use

PDF-to-Text-A

Open the Python script in your code editor.
In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
Save the script and you're ready to go.
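After editing, the two settings would look something like the lines below. The `check_paths` helper is a hypothetical addition, not part of the script; it only illustrates a quick sanity check you could run before starting a long conversion.

```python
import os

# Replace both placeholder paths with real locations on your machine.
pdf_directory = '/path/to/pdf/files'
output_directory = '/path/to/output/directory'


def check_paths(pdf_directory, output_directory):
    """Return a list of problems with the configured paths (empty if none)."""
    problems = []
    if not os.path.isdir(pdf_directory):
        problems.append(f"PDF directory not found: {pdf_directory}")
    if os.path.isfile(output_directory):
        problems.append(f"Output path is a file, not a directory: {output_directory}")
    return problems
```

Once the paths are set, the script can typically be started from a terminal with `python PDF-to-Text-A.py`.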

PDF-to-Text-B

Open the Python script in your code editor.
In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
Rename the output file 'combined_text.txt' as desired.
Save the script and you're ready to go.

PDF-to-Text-C

Open the Python script in your code editor.
In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
Save the script and you're ready to go.

PDF-to-Text-D

Open the Python script in your code editor.
In pdf_directory = '/path/to/pdf/files' replace /path/to/pdf/files with the actual directory path.
In output_directory = '/path/to/output/directory' replace /path/to/output/directory with the desired output directory path.
In combined_text_file_name = 'combined_text.txt' rename the output file as desired.
Save the script and you're ready to go.

Requirements

To run any of these Python scripts, you need the PyPDF2 library installed in your Python environment. You can install it from your terminal using pip: pip install PyPDF2.
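If you are unsure whether the library is already installed, a small check like the following can tell you before you run a script. The function name `pypdf2_available` is hypothetical; the check itself uses only the standard library.

```python
import importlib.util


def pypdf2_available():
    """Return True if the PyPDF2 package can be imported in this environment."""
    return importlib.util.find_spec("PyPDF2") is not None


if __name__ == "__main__":
    if pypdf2_available():
        print("PyPDF2 is installed.")
    else:
        print("PyPDF2 is missing; install it with: pip install PyPDF2")
```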

Scripts written with the help of GPT-3.5.

damianodamiani/PDF-to-Text
