Converting PDF to ePub is hard work. Converting PDF to reflowable ePub3 becomes even more daunting when the eBooks are written in native languages. It takes a lot of effort from the type designer and the linguistic expert, who have to work in perfect sync to put the character-encoding pieces of the puzzle together.
This is Part 1 of our Digital Publishing blog series. It covers the major obstacles you should expect when you convert PDF books written in local-script languages to ePub.
With millions of pages of ePub3 conversions already done on our cloud publishing platform Kitaboo, our answer to creating multiple volumes of eBooks in native Indian languages from PDF files was an instant “Yes we can!”
No sooner had we started than we realized that our estimates of the time needed for each PDF to ePub conversion had to be reset. What followed was a search for the shortest, fastest and most accurate route of ePub conversion that would still deliver on timelines.
The two major hurdles slowing down our PDF to ePub conversion process were:
- PDF is a character-position driven format: it defines each character only by its X and Y coordinates, whereas ePub3 depends on character sequence/order to create eBooks.
- Character encoding and font shaping had to be matched to render the linguistically correct reading order in ePub3. Thousands of errors appeared during the eBook conversion, and we realized there was a long road ahead, much of it in reverse gear, to untangle the character representation problems in HTML5.
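To make the first hurdle concrete, here is a minimal sketch of how positioned glyphs might be turned back into a character sequence. It is an illustration only, not the Kitaboo implementation: the `Glyph` class, the `glyphs_to_lines` function and the `line_tolerance` value are hypothetical names and numbers chosen for this example, and it assumes simple left-to-right text in a single column.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One positioned character, as a PDF content stream might place it (hypothetical type)."""
    char: str
    x: float  # left edge, in points
    y: float  # baseline, in points; PDF's default origin is bottom-left, so larger y is higher up


def glyphs_to_lines(glyphs, line_tolerance=2.0):
    """Group glyphs into lines by baseline, then order each line left to right."""
    lines = []  # each entry is [baseline_y, [glyphs on that line]]
    for g in sorted(glyphs, key=lambda g: -g.y):  # top of the page first
        if lines and abs(lines[-1][0] - g.y) <= line_tolerance:
            lines[-1][1].append(g)  # same baseline, within tolerance
        else:
            lines.append([g.y, [g]])  # start a new line
    return ["".join(g.char for g in sorted(row, key=lambda g: g.x)) for _, row in lines]


# Glyphs arrive in arbitrary drawing order; positions alone decide the sequence.
page = [Glyph("b", 20, 700), Glyph("a", 10, 700.5), Glyph("d", 20, 680), Glyph("c", 10, 680)]
print(glyphs_to_lines(page))
```

Note that a sort like this recovers only the visual order of the glyphs. For complex scripts, the visual order is not always the order in which the characters must be stored, which is where the second hurdle comes in.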
The misbehaving characters had to be disciplined and put in order. What followed was in-depth complex-script analysis, prediction of character behaviour, and font shaping and matching in reflowable ePub3. The team had to look for errors manually, word by word, carry out corrections on each page and proofread the final pages all along the conversion route.
Speed Breakers while converting PDF to ePub
A broad breakdown of the major hurdles we faced:
- PDF character encoding: Our first speed-breaker was the appearance of multiple aberrations in the character encoding order for the local Indian languages during PDF to ePub3 conversion. The PDF did not record the sequence and order of characters in words and sentences, which is a prerequisite for reflowable ePub3. This led to many mismatch errors that had to be examined with character-by-character sequence/order analysis.
- Character mismatch: The native-language fonts imposed many preconditions on how a character is sequenced with consonants, vowels and ligatures, and these had to be handled to bring the logical order and the visual order of the text into sync. The linguistic, phonetic and graphical order was incorrect in many words and sentences.
- Font/character mapping: The shaping features of characters, and their composition and decomposition, were inconsistent and had to be constantly monitored across all book pages with respect to the Universal Shaping Engine. Continuous re-testing was needed to make sure the font rules and specifications were being applied successfully.
- Manual proofreading: The large number of character encoding errors made PDF to ePub conversion very slow and highly dependent on manual proofreading. Making the eBooks error-free reflowable ePub3 took the team days of going back and forth on manual character-by-character validation. Even OCR did not give a 100 percent error-free document and required manual re-checking.
- Formatting errors: There were many challenges with the overall synchronized reading order, the positioning of tables, superscripts and subscripts, header and footer formats, and image layout.
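The gap between logical order and visual order can be illustrated with one well-known case in Devanagari: the vowel sign i (U+093F) is drawn to the left of its consonant cluster but must be stored after it. The sketch below moves that sign back into logical order and flags vowel signs that have no consonant before them. It is a simplified illustration under stated assumptions, not the Kitaboo tool: the function names `visual_to_logical` and `find_orphan_signs` are hypothetical, the input is assumed to be already in Unicode, and only this one reordering rule is covered.

```python
import re

VOWEL_SIGN_I = "\u093F"  # Devanagari vowel sign i, drawn before its consonant
VIRAMA = "\u094D"        # joins consonants into a conjunct
NUKTA = "\u093C"
CONSONANT = "[\u0915-\u0939\u0958-\u095F]"  # Devanagari consonants (ka..ha, qa..yya)

# A consonant cluster: consonant, optional nukta, then any number of
# (virama + consonant + optional nukta).
CLUSTER = f"{CONSONANT}{NUKTA}?(?:{VIRAMA}{CONSONANT}{NUKTA}?)*"
VISUAL_I = re.compile(f"{VOWEL_SIGN_I}({CLUSTER})")

# Common dependent vowel signs (U+093E..U+094C) not preceded by a consonant or nukta.
ORPHAN_SIGN = re.compile(f"(?<!{CONSONANT})(?<!{NUKTA})[\u093E-\u094C]")


def visual_to_logical(text):
    """Move a vowel sign i that precedes a consonant cluster to after that cluster."""
    return VISUAL_I.sub(lambda m: m.group(1) + VOWEL_SIGN_I, text)


def find_orphan_signs(text):
    """Return the indexes of vowel signs that do not follow a consonant."""
    return [m.start() for m in ORPHAN_SIGN.finditer(text)]


# The word sthiti as it might come out in visual (left-to-right drawing) order.
visual = "\u093F\u0938\u094D\u0925\u093F\u0924"
logical = visual_to_logical(visual)
print(find_orphan_signs(visual), find_orphan_signs(logical))
```

A check like `find_orphan_signs` does not replace proofreading, but it shows how some character-order errors could be flagged automatically for a human to review instead of being hunted down word by word.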
The speed we were working at was simply not viable, and we had to find a faster way to accelerate our performance without compromising on quality.
The result: the Kitaboo team developed an innovative character encoding tool to support automated conversion of eBooks in native languages.
The fastest way to convert PDF to ePub with Kitaboo:
Books written in native languages need ePub super-specialists to solve the character encoding maze. The Kitaboo team, a pro at eBook publishing, solved this cumbersome task for the world's largest online book publisher, cutting their time to market by more than 70 percent.
If you have books written in native languages and are looking to convert them to eBooks, all you need to do is sign up for a free demo.