HPR3384: Page Numbers in EPUB eBook Files




Hacker Public Radio show

Summary: This episode is a response to hpr3367 by Andrew Conway and Dave Morriss. One of the topics they brought up was the thorny issue of page numbers in e-books. Most of the time you don't need to worry about page numbers in ebooks, if you're reading fiction for example. The whole point of an ebook is that the texts can reflow to fit the page no matter what size the screen is or what font-size you've chosen. This is a major accessibility feature of all e-book formats. One reason you might want to specify actual page numbers, though, is if you're dealing with a technical or academic book, and you need to be able to refer to specific passages in the book by page number, as you are expected to do in academic research. Or, as Andrew and Dave were discussing, you might need to create an index in your ebook that would send your readers back to specific pages like in a paper book. I've thought about this before but never really gotten into the weeds and figured out how to make it happen. In fact, when I was creating the new digital editions of the Counterpoint textbooks like I discussed in hpr1512, I actually took the trouble to put page number anchors through the entire thing, so that at a future date I would be able to enable real page numbers. This was a key part of the source file's infrastructure, which helped me quickly find the passages I was working on in my huge HTML file. Those anchors are not quite in the correct format for EPUB, but they are consistent and I will easily be able to write a script to fix them. I haven't done that yet, but now that I figured out how to do it on some smaller examples, this is on my to-do list. Anyway while I was listening to Dave and Andrew talk about this, I thought I remembered reading somewhere that in the newest ePub specification, EPUB 3, there was support for publisher's page numbers to deal with precisely this issue. Their discussion prompted me to see if I could make it work. I'm happy to report success, although with some qualifications, which I will get into. Converting to EPUB 3 The first thing to do is to upgrade your ebook from EPUB2 to EPUB3. There are a couple of ways to do this. The way I did it was to use the ebook editor in a recent version of Calibre. When you open up the EPUB for editing, go to the Tools menu and choose Upgrade book internals. This will create the new navigation file nav.xhtml to replace the old toc.ncx file. You'll need to edit this new file later to enable the page numbers. Insert page anchors Next you need to put your page anchors in there. This could be very tedious if you haven't done any preparatory work, such as putting visible page numbers in plain sight in square brackets [21] the way I did for a couple of ebooks. It wasn't very elegant, but at least it was easy to find where the page breaks were. I have a Blather voice command that triggers a python script to create these things. Here's an example of page number anchor, which goes in the main text of the book wherever you want to insert a page number. This will not be visible to the reader inline. This is for page 57: <span epub:type="pagebreak" id="page57" title="57"></span> Page List in Navigation File Finally you need to put a page list in the new navigation file. This is simply an ordered list with hyperlinks to every page anchor that you put in your ebook. This will not be visible to the reader, but it's critical to making everything work. Here's a minimal example from my first attempt. This only covers Pages 122 to 126. This is the kind of page numbering you might need if you created an ebook from a five-pa