How to fix a broken ePUB for Smashwords
Summary: I show you how to troubleshoot and repair EPUBs generated by InDesign that fail to pass the Smashwords and Apple validators.
A couple of months ago, my cover designer asked me to troubleshoot an ebook she was creating for another client. Like all good designers, she was creating the book in InDesign, Adobe’s powerful desktop publishing application. InDesign began life as a print competitor to QuarkXPress but since CS5 or 5.5, it’s allowed designers to export their creations to the popular EPUB format.
EPUB is an open format based on standard web technologies (HTML, CSS, XML) and because it’s open, not only is it widely supported by reading devices, it’s commonly used as the basis for transforming into other formats – notably Amazon’s proprietary Kindle format.
Case in point is Smashwords, an ebook distribution service that will take an EPUB and distribute it to just about anywhere. This includes Apple’s iBooks platform and herein begins our problem.
Apple is extremely strict about EPUB compliance – much more so than other vendors – and so to avoid rejections, Smashwords has adopted the same standards. Unfortunately, InDesign can be a bit loose with the EPUB standard and so ebooks generated with it can fail to pass the test.
So how to fix them?
1. Identify the problem
Smashwords will provide you with a list of issues if your file isn’t correctly validated. Unfortunately, the information can be confusing if you are not familiar with HTML and the XML schema on which EPUB is based.
If the information isn’t clear you can use the official EPUB Validator. This online service is provided by the developers of the EPUB standard using the open source EpubCheck tool. It can check against the EPUB 2 and 3 standards but the online version is limited to files smaller than 10MB. If your file is larger than this, you will have to download the Java version from GitHub and use it offline.
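For the offline route, EpubCheck is normally run as `java -jar epubcheck.jar yourbook.epub`. Here is a minimal sketch of wrapping that call in Python, assuming Java is installed. The function names and the paths in the comments are my own placeholders, not anything EpubCheck ships with.

```python
import subprocess


def build_epubcheck_command(jar_path, epub_path):
    """Return the argument list for running EpubCheck on one EPUB."""
    return ["java", "-jar", jar_path, epub_path]


def run_epubcheck(jar_path, epub_path):
    """Run EpubCheck and return (exit_code, combined_output)."""
    result = subprocess.run(
        build_epubcheck_command(jar_path, epub_path),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr


# Hypothetical paths, adjust to wherever you unpacked EpubCheck:
# code, report = run_epubcheck("epubcheck/epubcheck.jar", "mybook.epub")
# print(report)
```

The report printed offline carries the same file, line and position details as the online table.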
If you use the online version, you’ll get a table reporting any errors and warnings, for example:
| Type | File | Line | Position | Message |
|---|---|---|---|---|
| ERROR | OEBPS/Text/ch22.xhtml | 228 | 82 | Error while parsing file 'element "font" not allowed anywhere; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")'. |
| ERROR | OEBPS/Text/ch22.xhtml | 228 | 145 | Error while parsing file 'element "font" not allowed anywhere; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")'. |
| ERROR | OEBPS/Text/bio.xhtml | 49 | 193 | Referenced resource could not be found in the EPUB. |
| ERROR | OEBPS/Text/title.xhtml | 49 | 143 | Referenced resource could not be found in the EPUB. |
| WARNING | OEBPS/Styles/idGeneratedStyles.css | 223 | 2 | CSS selector specifies absolute position. |
| WARNING | OEBPS/Styles/idGeneratedStyles.css | 235 | 2 | CSS selector specifies absolute position. |
Anything that’s an ERROR must be fixed, otherwise Apple (and Smashwords) will reject it. You can ignore warnings if you like, but it’s a good idea to fix them if you can.
The table above shows four errors. The first two are schema errors, i.e. XHTML that uses an element the EPUB schema does not allow, while the last two reveal that the EPUB has broken links.
The errors are most likely caused by a bug in InDesign so you can’t actually fix them in your InDesign project – instead you have to fix the generated EPUB manually.
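With a longer report it can help to tally the rows rather than count by eye. Below is a small sketch that parses pipe-delimited rows in the shape shown above (type, file, line, position, message). The helper names are mine, and I’m assuming you have copied the rows out of the report as plain text.

```python
from collections import Counter


def parse_report_row(row):
    """Split one pipe-delimited report row into a dict, or return None."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    cells = [c for c in cells if c]  # drop empty cells left by doubled pipes
    if len(cells) < 5 or cells[0] not in ("ERROR", "WARNING"):
        return None  # header, separator or unrelated line
    return {
        "type": cells[0],
        "file": cells[1],
        "line": int(cells[2]),
        "position": int(cells[3]),
        "message": "|".join(cells[4:]),
    }


def tally(rows):
    """Count how many ERROR and WARNING rows the report contains."""
    parsed = [r for r in map(parse_report_row, rows) if r is not None]
    return Counter(r["type"] for r in parsed)
```

Feeding it the six rows from my report gives four under ERROR and two under WARNING, which matches the count above.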
2. Fix the errors
EPUB files are constructed similarly to a website, but the directory of files is zipped to give it the appearance of being a single file. This means you can easily crack them open to repair the contents.
The easiest way to do so is to use Sigil, an open-source EPUB editor for Linux, macOS, and Windows. Alternatively, you can change the file extension to .zip and then open it with any archive extractor and fix the broken files with a programmer’s text editor.
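If you go the archive route, be careful when zipping the book back up: the EPUB container format expects the `mimetype` file to be the first entry in the archive and to be stored uncompressed, which a generic "compress folder" command won’t guarantee. Here is a minimal sketch of unpacking and repacking with Python’s standard `zipfile` module. The function names are my own.

```python
import os
import zipfile


def unpack_epub(epub_path, out_dir):
    """Extract every file in the EPUB into out_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)


def repack_epub(src_dir, epub_path):
    """Zip src_dir back into an EPUB, with 'mimetype' first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if rel != "mimetype":  # already written above
                    zf.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)


# Hypothetical file names; write the new EPUB outside the unpacked folder:
# unpack_epub("mybook.epub", "mybook_unpacked")
# repack_epub("mybook_unpacked", "mybook_fixed.epub")
```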
With the EPUB open in Sigil, it’s simply a matter of opening the problematic file and locating the problem. Thankfully EpubCheck makes this easy by providing us with the file name as well as the LINE number and POSITION.
To find it, open the file, in my example ch22.xhtml, and switch to the code view (View -> Code View). Sigil helpfully numbers the lines and in my example the offending bit of code is:
<h2 class="Basic-Paragraph ParaOverride-8"> <font face="Adobe Garamond Pro"> <span style="font-size: 15px; line-height: 17px;"> </span> </font> </h2>
The message in the table reveals the problem. It’s a schema error: InDesign has included an errant font element, which the XHTML schema used by EPUB does not allow. Given that this piece of code is only adding empty space, it can be safely deleted. If there was text inside, I’d rewrite the code manually instead: since the font element is not permitted, I’d remove it and make sure the enclosing element references the correct CSS class.
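When the same stray element turns up in many places, deleting each one by hand gets tedious. As a rough sketch, a regular expression can strip the font tags while keeping whatever they wrap. This is plain pattern matching rather than a real XML parser, so check the result, and remember the typeface then has to come from the CSS class instead.

```python
import re

# Matches both opening <font ...> tags and closing </font> tags.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def unwrap_font_tags(xhtml):
    """Remove <font ...> and </font> tags but keep the content between them."""
    return FONT_TAG.sub("", xhtml)


snippet = '<h2 class="x"> <font face="Adobe Garamond Pro"> <span>Hi</span> </font> </h2>'
print(unwrap_font_tags(snippet))
```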
The next problem is our missing references in the bio.xhtml and title.xhtml files.
Here’s the first one:
<p class="Basic-Paragraph ParaOverride-6" style="text-align: center;"> <span class="CharOverride-8">Website</span> <span class="CharOverride-7">: </span> <a href="www.REDACTED.com">www.REDACTED.com</a> </p>
This is a simple fix: the href attribute is missing the http:// prefix. Without it, www.REDACTED.com is treated as a relative path to a file inside the EPUB, which is why the validator reports a missing resource.
<p class="Basic-Paragraph ParaOverride-6" style="text-align: center;"> <span class="CharOverride-8">Website</span> <span class="CharOverride-7">: </span> <a href="http://www.REDACTED.com">www.REDACTED.com</a> </p>
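The same repair can be sketched as a find-and-replace for any href that starts with a bare www. I’m assuming double-quoted attributes, which is what the InDesign output in this book uses, and the function name is mine.

```python
import re

# An href whose value starts with "www." and therefore has no scheme.
BARE_WWW_HREF = re.compile(r'href="(www\.[^"]*)"')


def add_scheme(xhtml, scheme="http://"):
    """Prefix scheme-less www links so they are no longer read as relative paths."""
    return BARE_WWW_HREF.sub(
        lambda m: 'href="' + scheme + m.group(1) + '"', xhtml
    )


print(add_scheme('<a href="www.example.com">www.example.com</a>'))
```

Only the attribute value changes; the visible link text is left alone.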
The second file, title.xhtml (line 49, position 143) is looking for a jpeg file that doesn’t exist.
<div id="_idContainer002"> <img alt="" class="_idGenObjectAttribute-1" src="../Images/REDACTED.jpg"/> </div>
A quick search of the images directory (look in the Book Browser in Sigil’s left-hand column) showed it was incorrectly named, so all I had to do was fix the reference in the img tag’s src attribute to show the actual name.
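To catch every mismatch like this in one pass, you can compare each img src against the files actually present in the archive. This is a minimal sketch with a name of my own choosing. It resolves each src relative to the document that contains it, and it won’t understand URL-encoded file names.

```python
import posixpath
import re
import zipfile

SRC_ATTR = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"')


def missing_images(epub_path):
    """Return (document, src) pairs whose image is not in the archive."""
    missing = []
    with zipfile.ZipFile(epub_path) as zf:
        names = set(zf.namelist())
        for member in sorted(names):
            if not member.endswith((".xhtml", ".html")):
                continue
            text = zf.read(member).decode("utf-8")
            for src in SRC_ATTR.findall(text):
                if "://" in src:
                    continue  # remote image, not a file inside the archive
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(member), src)
                )
                if target not in names:  # names are case-sensitive
                    missing.append((member, src))
    return missing
```

Because the comparison is case-sensitive, a file saved as Cover.jpg but referenced as cover.jpg shows up in the list.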
2.1 Fixing the warnings
I’m not bothering to fix the two WARNINGS for the following reasons:
- Warnings will pass validation and Smashwords/Apple’s checkers.
- I don’t want to muck with the designer’s styles.
- Many eReaders typically ignore styles anyway, or only use a subset of what’s available in the CSS standards. For example, Apple’s iBooks allows users to override many style settings.
3. Save and re-validate
Once you’ve fixed each error, save the file and re-validate using EpubCheck.
Another thing I did was to load the EPUB onto my iPad so I could visually check and confirm that everything was still working and looked good in iBooks.
Assuming you’ve fixed everything, you should pass the validator. However, I encountered one more problem thanks to the file’s embedded fonts…
3.1 Missing META-INF/encryption.xml file
If you’ve chosen to generate your EPUB with embedded fonts, you may receive the following error after editing your EPUB manually:
File ‘META-INF/encryption.xml’ in EPUB not listed in manifest! Your .epub file is missing one or more elements in its manifest. A complete manifest is required for distribution to Apple. Here’s how Wikipedia (http://en.wikipedia.org/wiki/EPUB ) defines “Manifest”:“The manifest element lists all the files contained in the package. Each file is represented by an item element, and has the attributes id, href, media-type. All XHTML (content documents), stylesheets, images or other media, embedded fonts, and the NCX file should be listed here.”
The META-INF/encryption.xml file is added by InDesign when you attempt to embed fonts. To prevent font piracy, InDesign rather unhelpfully encrypts the fonts and so this file is needed by the reader to be able to decrypt them.
Note that you cannot see this file in Sigil as it is located one directory up from OEBPS. You must extract the EPUB with an archive manager to access the /META-INF directory.
If Sigil for some reason removes this file, you must either restore it or you must remove the encrypted font files from the /OEBPS/Fonts/ directory and replace them with the unencrypted originals. Ensure the file names are exactly the same, otherwise your CSS will break and your file will fail validation.
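Before and after editing, it’s worth checking whether the file is still in the archive. This small sketch lists the embedded fonts and reports whether META-INF/encryption.xml is present. The font extensions it looks for are my assumption about what a typical export contains.

```python
import zipfile

FONT_EXTENSIONS = (".otf", ".ttf", ".woff")  # assumed, extend as needed


def encryption_status(epub_path):
    """Report whether encryption.xml exists and which font files are embedded."""
    with zipfile.ZipFile(epub_path) as zf:
        names = zf.namelist()
    fonts = [n for n in names if n.lower().endswith(FONT_EXTENSIONS)]
    return {
        "has_encryption_xml": "META-INF/encryption.xml" in names,
        "fonts": fonts,
    }


# Hypothetical file name:
# print(encryption_status("mybook_fixed.epub"))
```

If the original export lists fonts and has the file, but your edited copy lists the same fonts without it, that is the situation described above.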
So there you have it, a guide to troubleshooting and fixing InDesign’s broken EPUBs. Note this isn’t exhaustive (there’s heaps of ways you can screw up an EPUB) but it got us over the line for this particular client and her work is now available for sale in iTunes. Yes, the fix requires a little technical knowledge, but it’s not hard to learn and Sigil can certainly make your life easier.
The fact you have to fix InDesign’s EPUB output raises the question: should you use InDesign in the first place?
My cautious answer is yes, with the caveat that it only makes sense if you are also creating a print version. Designing for multiple platforms in the one app is obviously a big time saver. The troubleshooting process isn’t particularly onerous and really, four errors in a 75,000 word manuscript isn’t bad.
However, if you are only interested in ebook production, there are much better and cheaper tools out there – not least of course is Sigil.