pdf2epubEX

This Bash script uses the pdf2htmlEX tool to convert a PDF file to an ePub file.

The result is a fixed layout ePub (version 3): the page layout is retained and all the fonts are embedded.

The pdf2htmlEX tool converts a PDF file into HTML5 (with CSS, JS, fonts, and bitmap and/or vector images). This means that the pages are not simply converted into images, as many converters do.

Using the Bash script

Usage

To convert myfile.pdf to myfile.epub, run the following command in the directory where the PDF file is located:

./pdf2epubEX.sh myfile.pdf

The result will be: myfile.epub

Prerequisites

  • Download the Bash script: pdf2epubEX.sh.
  • Install pdf2htmlEX and some other utilities: poppler-utils, bc, zip and file. On Debian or a Debian-based Linux distribution (Ubuntu, Mint, etc.):
apt-get install ./pdf2htmlEX-0.18.8.rc1-master-20200630-Ubuntu-focal-x86_64.deb
apt-get install poppler-utils bc zip file

The Debian package (.deb) is available in this repository.
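Before running the script, it can help to confirm that the prerequisites are on the PATH. The following is a minimal sketch, not part of pdf2epubEX.sh: the helper name `missing_tools` is made up for this example, and `pdfinfo` is used only as a stand-in for poppler-utils (the package ships it; which poppler command the script actually calls is an assumption).

```shell
# Sketch: print the name of every given command that is not on the PATH.
# missing_tools is a hypothetical helper, not part of pdf2epubEX.sh.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# pdfinfo is one of the commands shipped in poppler-utils (assumed check).
missing=$(missing_tools pdf2htmlEX pdfinfo bc zip file)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All prerequisites found."
fi
```

If anything is reported as missing, install it with the apt-get commands above.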

If you install Git, you can also just do:

git clone https://github.com/dodeeric/pdf2epubEX.git

Using the Docker image

A Docker image is available on my DockerHub repository.

Usage

To convert myfile.pdf to myfile.epub, run the following command in the directory where the PDF file is located:

docker run -ti --rm -v "$(pwd)":/pdf dodeeric/pdf2epubex pdf2epubEX.sh myfile.pdf

The result will be: myfile.epub

You can also use pdf2htmlEX with this same Docker image:

To convert myfile.pdf to myfile.html, run the following command in the directory where the PDF file is located:

docker run -ti --rm -v "$(pwd)":/pdf dodeeric/pdf2epubex pdf2htmlEX myfile.pdf

The result will be: myfile.html

pdf2htmlEX has a lot of parameters. To see them:

docker run -ti --rm -v "$(pwd)":/pdf dodeeric/pdf2epubex pdf2htmlEX --help

Prerequisites

You need to install Docker, which is available for Windows, macOS, Linux and other Unix systems. See the Docker website for installation instructions.

Parameters

When you launch pdf2epubEX.sh, it first displays some information, such as the width and height of the book/PDF (in inches and cm), and then asks a few questions, such as:

  • Format of the images in the epub (png, jpg or svg) [default: jpg]
  • Resolution of the images in the epub in dpi (e.g.: 150 or 300) [default: 150]
  • Title, Author, Publisher, Year, Language: (e.g.: fr), ISBN number, Subject (e.g.: history)

If you want, you can simply press ENTER at every question.
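Pressing ENTER at each prompt can also be scripted, which is handy when converting several files. The sketch below assumes that the script reads its answers from standard input; that is not confirmed here, and it will not work through `docker run -ti`, which expects a terminal. The helper name `answer_with_enter` is made up for this example.

```shell
# Sketch: answer every prompt of an interactive command with an empty line,
# which is the same as pressing ENTER each time.
# Assumption: the command reads its answers from standard input.
answer_with_enter() {
    # yes '' prints empty lines until the command stops reading.
    yes '' | "$@"
}

# Hypothetical usage (not run here):
#   answer_with_enter ./pdf2epubEX.sh myfile.pdf
```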

Image formats:

  • if you choose png or jpg (bitmap formats), the vector images of the PDF will be converted to bitmap images (rasterized).
  • if you choose svg (vector and bitmap format), the vector images of the PDF will remain in vector format, but: a) you cannot choose the resolution of the bitmap images (it is the one from the PDF); b) the bitmap images will be included in the svg files (Base64 encoded); c) this format is not always rendered correctly by eBook readers; d) the generated epub file does not always pass the epub check.

A vector image can be as simple as a line, a rectangle, a table frame, a colored background, etc.

For eBooks with a lot of bitmap images, it is better to choose JPG (lossy compression) to keep the file from getting too big. For eBooks with mainly vector images, it is better to choose PNG (lossless compression).
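The resolution also has a large effect on file size. As a rough illustration (not taken from the script), the sketch below computes the pixel size of a rasterized page from its size in PDF points, where one point is 1/72 inch. The helper name `page_pixels` and the A4 page size of 595 x 842 points are assumptions chosen for this example.

```shell
# Sketch: pixel size of a rasterized page at a given resolution.
# A PDF point is 1/72 inch, so pixels = points / 72 * dpi.
# page_pixels is a hypothetical helper, not part of pdf2epubEX.sh.
page_pixels() {
    awk -v w="$1" -v h="$2" -v d="$3" \
        'BEGIN { printf "%dx%d\n", w / 72 * d + 0.5, h / 72 * d + 0.5 }'
}

page_pixels 595 842 150   # A4 page at 150 dpi: prints 1240x1754
page_pixels 595 842 300   # A4 page at 300 dpi: prints 2479x3508
```

Doubling the resolution doubles both dimensions, so each page holds about four times as many pixels before compression.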

The ePub cover image will be made from the first page of the PDF file (png format).

Examples

In the examples below, the HTML version is one big file that includes everything (all the pages with HTML5, CSS, JS, fonts and images; fonts and images are Base64 encoded, which can make the file quite big). pdf2htmlEX can also put all that content in separate files (.html, .css, .js, .woff, .png, .jpg, .svg); that is basically what the pdf2epubEX.sh script does before wrapping all the files in one ePub container file (.epub). An ePub is sometimes referred to as a "website in a box".
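To make the "website in a box" idea concrete, here is a minimal sketch of how a directory of web content can be wrapped into an ePub container with zip. It is not the code of pdf2epubEX.sh: the helper names and the OEBPS/content.opf layout are illustrative assumptions, and a valid ePub additionally needs the package document (.opf) and a navigation document, which are omitted here.

```shell
# Sketch: wrap a directory of web content into an ePub container.
# Helper names and the OEBPS/content.opf layout are assumptions.
make_epub_skeleton() {
    dir=$1
    mkdir -p "$dir/META-INF" "$dir/OEBPS"
    # The mimetype file holds exactly this string, with no trailing newline.
    printf 'application/epub+zip' > "$dir/mimetype"
    # container.xml tells the reader where the package document (.opf) is.
    cat > "$dir/META-INF/container.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
EOF
}

pack_epub() {
    dir=$1 out=$2   # out must be an absolute path
    (
        cd "$dir" || exit 1
        rm -f "$out"
        # mimetype goes in first, stored uncompressed (-0) and without
        # extra file attributes (-X), as the ePub container format requires.
        zip -X0 "$out" mimetype &&
        # Everything else is added recursively (-r) with compression (-9),
        # without separate directory entries (-D).
        zip -Xr9D "$out" META-INF OEBPS
    )
}

# Hypothetical usage (not run here), after copying the HTML, CSS, fonts
# and images produced by pdf2htmlEX into book/OEBPS:
#   make_epub_skeleton "$PWD/book"
#   pack_epub "$PWD/book" "$PWD/myfile.epub"
```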

Legends:

  • Number in parentheses: the size of the file in MB.
  • Hashtag in parentheses: the ePub file does not pass the epub check validation against the epub 3.2 rules (some svg files contain commands that are not allowed). This does not mean that the ePub will not be displayed properly in most ePub readers.
  • ePub written in bold: the recommended ePub version.

CEB 2015 - Solides et figures
(24 pages, only vector images in the PDF)

Format   150 DPI          300 DPI
PDF      PDF (0.3)
SVG      ePub (1.0)(#)
JPG      ePub (0.6)       ePub (1.1)
PNG      ePub (0.7)       ePub (1.5)
SVG      HTML (2.2)
JPG      HTML (1.8)       HTML (5.7)
PNG      HTML (1.1)       HTML (2.5)

Install your own OpenStack Cloud
(49 pages, bitmap and vector images in the PDF)

Format   150 DPI          300 DPI
PDF      PDF (1.0)
SVG      ePub (1.4)
JPG      ePub (1.5)       ePub (2.0)
PNG      ePub (1.6)       ePub (3.2)
SVG      HTML (2.9)
JPG      HTML (5.3)       HTML (14.0)
PNG      HTML (3.0)       HTML (6.4)

La dynastie belge en images
(248 pages, many bitmap images in the PDF)

Format   150 DPI          300 DPI
PDF      PDF (56)         PDF (396)
SVG      ePub (78)(#)     ePub (504)(#)
JPG      ePub (48)        ePub (150)
PNG      ePub (209)       ePub (628)
SVG      HTML (142)       HTML (895)
JPG      HTML (69)        HTML (217)
PNG      HTML (296)       HTML (869)

Vector image quality in different formats (zoom of 500 %):

  1. SVG (vector format): [image: SVG]
  2. PNG (bitmap format, lossless compression): [images: PNG at 150 DPI, PNG at 300 DPI]
  3. JPG (bitmap format, compression with loss): [images: JPG at 150 DPI, JPG at 300 DPI]

Additional information

Book

The script is based on the method described in my book published in 2014: Fixed Layout ePub: A Practical Guide to Publish eBooks from PDF Files. It is available on Amazon and on Google Play Books.

Fixed Layout ePub

To read a fixed layout ePub, the best device is a tablet (Android or iOS/iPad). A smartphone is usually not suitable because its screen is too small.

Many ePub reader apps (for both reflowable text ePub and fixed layout ePub) are available on different platforms (Android, iOS, Windows, MacOS, or Linux): Google Play Books, BookShelf, PocketBook, Adobe Digital Editions, Apple Books (only on iOS; formerly known as Apple iBooks), etc.

Amazon Kindle does not support the standard ePub format (they have their own format which is based on the ePub format).

To use Google Play Books, you have to go to Settings, then set Enable uploading. The uploaded eBooks (PDF or ePub) will be available on all devices using the same Google account. You can also upload eBooks from the Google Play Books web interface (see the Upload files button on the top right corner). Please note that the ePub file has to pass a pre-check to be able to be hosted in the Google cloud.

More about fixed layout (FXL) ePub version 3 specifications (IDPF / W3C): Fixed Layouts (EPUB Content Documents 3.2) and Fixed-Layout Properties (EPUB Packages 3.2).

Other Git Repositories

Repositories for pdf2htmlEX: the original one and the new one (with updated .deb packages).

This script is based on the Bash scripts written by Robert Clayton (RNCTX) and available in his Git repository.