Canonical Fragment Identifier

Summary

Canonical Fragment Identifier (epubcfi) defines a standard method for referencing arbitrary content within an EPUB publication through the use of fragment identifiers.

Features

It supports interoperability: a reference to a reading position created by one reading system can be used by another.
References to EPUB content are enabled in the same way that existing hyperlinks enable references throughout the web.
Each location in an EPUB file can be identified without the need to modify the document.
All fragment identifiers that reference the same location are equal when compared.
Comparison operations, including tests for sorting and comparison, can be performed without accessing the referenced files.
Simple manipulation is possible without access to the original files (e.g. given a reference deep in a file, it is possible to generate a reference to the start of the file).
Identifier resolution is reasonably efficient (e.g. processing of the first chapter is not required to resolve a fragment identifier that points to the last chapter).
References are able to recover their target locations through parser variations and document revisions.
Expression of simple, contiguous ranges is supported.
An extensible mechanism to accommodate future reference recovery heuristics is provided.
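
The claim that CFIs can be sorted and compared without opening the referenced files can be illustrated with a minimal sketch. It assumes simple, non-range CFIs with plain character offsets, and it treats bracketed assertions as not affecting order; the helper name sort_key is hypothetical, and the full comparison rules of the specification cover more cases (ranges, temporal and spatial offsets) than this does.

```python
import re

def sort_key(cfi: str):
    """Build a sortable key from a simple (non-range) CFI string."""
    if not (cfi.startswith("epubcfi(") and cfi.endswith(")")):
        raise ValueError("not an epubcfi fragment: " + cfi)
    body = cfi[len("epubcfi("):-1]
    # Assumption: bracketed assertions are ignored when ordering.
    body = re.sub(r"\[[^\]]*\]", "", body)
    # Split off the optional character offset that follows ":".
    path, _, offset = body.partition(":")
    # The "!" separator is skipped; only the step integers are kept.
    steps = tuple(int(n) for n in re.findall(r"/(\d+)", path))
    return (steps, int(offset) if offset else 0)

cfis = [
    "epubcfi(/6/4[chap01ref]!/4[body01]/16[svgimg])",
    "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3)",
    "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/1:0)",
]
# Only the strings are inspected; no EPUB file is read.
for cfi in sorted(cfis, key=sort_key):
    print(cfi)
```

Because the key is built from the step numbers and the offset alone, two CFIs that differ only in their assertions produce the same key in this sketch.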

There are two types of CFI:

Standard EPUB CFI
Intra publication EPUB CFI

EPUB CFI Syntax

Following are a few points on EPUB CFI syntax:

For HTML documents, IDs and named anchors are used as fragment identifiers.
For XML documents, the shorthand XPointer [XPTRSH] notation is used to refer to a given point.

Following is a sample:

book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

A CFI consists of the initial sequence epubcfi, which identifies this particular reference method, and a parenthesized path or range.
A path is built as a sequence of structural steps that reference a location.
A range is a path followed by two local (or relative) paths that identify the start and end of the range.
Steps are denoted by the forward slash character (/) and are used to traverse XML content.
The last step in a CFI path represents a location within a document, whether structural (an element), textual, or visual.
Such a terminating step may be complemented by an optional offset, which denotes a particular character position or a temporal or spatial fragment.
Substrings in brackets are extensible assertions that improve the robustness of traversing paths and migrating them from one revision of the document to another.
These assertions preserve additional information about the traversed elements of the document, which makes it possible to recover the intended location even after some modifications are made.
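
The pieces described above (the epubcfi prefix, slash steps, bracketed assertions, and a trailing offset) can be pulled apart with a short parser. This is a minimal sketch, not a complete implementation: it assumes a single path with an optional character offset, and it does not handle ranges, escaped characters inside assertions, or temporal and spatial offsets. The names Step, ParsedPath, and parse_cfi are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Step:
    index: int                # positive integer after the slash
    assertion: Optional[str]  # bracketed text such as "para05", if any

@dataclass
class ParsedPath:
    parts: List[List[Step]]   # groups of steps, split at each "!"
    offset: Optional[int]     # character offset after ":", if any

STEP_RE = re.compile(r"/(\d+)(?:\[([^\]]*)\])?")
OFFSET_RE = re.compile(r":(\d+)(?:\[[^\]]*\])?$")

def parse_cfi(fragment: str) -> ParsedPath:
    if not (fragment.startswith("epubcfi(") and fragment.endswith(")")):
        raise ValueError("not an epubcfi fragment")
    body = fragment[len("epubcfi("):-1]
    offset = None
    match = OFFSET_RE.search(body)
    if match:
        # Keep the number; a trailing assertion on the offset is discarded.
        offset = int(match.group(1))
        body = body[:match.start()]
    parts = []
    for chunk in body.split("!"):
        steps = [Step(int(n), a or None) for n, a in STEP_RE.findall(chunk)]
        parts.append(steps)
    return ParsedPath(parts, offset)

parsed = parse_cfi("epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)")
for group in parsed.parts:
    print([(s.index, s.assertion) for s in group])
print("offset:", parsed.offset)
```

Keeping the assertion next to each step index makes it possible for a later stage to check the assertion against the element actually found at that position.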

EPUB Example

Package Document

<?xml version="1.0"?>

<package version="3.0"
         unique-identifier="bookid" 
         xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" 
         xmlns:opf="http://www.idpf.org/2007/opf">
    
    <metadata>
        <dc:title>…</dc:title>
        <dc:identifier id="bookid">…</dc:identifier>
        <dc:creator>…</dc:creator>
        <dc:language>en</dc:language>
    </metadata>
    
    <manifest>
        <item id="toc"
              properties="nav"
              href="toc.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="titlepage" 
              href="titlepage.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter01" 
              href="chapter01.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter02" 
              href="chapter02.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter03" 
              href="chapter03.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter04" 
              href="chapter04.xhtml" 
              media-type="application/xhtml+xml"/>
    </manifest>
    
    <spine>
        <itemref id="titleref"  idref="titlepage"/>
        <itemref id="chap01ref" idref="chapter01"/>
        <itemref id="chap02ref" idref="chapter02"/>
        <itemref id="chap03ref" idref="chapter03"/>
        <itemref id="chap04ref" idref="chapter04"/>
    </spine>
    
</package>

Sample chapter01.xhtml

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>…</title>
    </head>
    
    <body id="body01">
        <p>…</p>
        <p>…</p>
        <p>…</p>
        <p>…</p>
        <p id="para05">xxx<em>yyy</em>0123456789</p>
        <p>…</p>
        <p>…</p>
        <img id="svgimg" src="foo.svg" alt="…"/>
        <p>…</p>
        <p>…</p>
    </body>
</html>

Sample EPUB CFI

Reference to the location just after the digit 9.

epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

Reference to the img element.

epubcfi(/6/4[chap01ref]!/4[body01]/16[svgimg])

Reference to the location just before xxx.

epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/1:0)

Reference to the location just before yyy.

epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:0)

Reference to the location just after yyy.

epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3)

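The same CFI used in a hyperlink from another document in the publication; the trailing [;s=b] is a side-bias assertion.

<a href="../pub.opf#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[;s=b])">location</a>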
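
Every sample CFI starts with /6/4[chap01ref]!, which is resolved against the package document before the content document is opened. The sketch below walks that first part against a trimmed copy of the package document shown earlier. It assumes that the step before the ! lands on a spine itemref and that the itemref's idref is then looked up in the manifest; the helper name spine_target is hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Trimmed copy of the sample package document (metadata left empty).
OPF = """<package xmlns="http://www.idpf.org/2007/opf">
  <metadata/>
  <manifest>
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref id="titleref" idref="titlepage"/>
    <itemref id="chap01ref" idref="chapter01"/>
  </spine>
</package>"""

def spine_target(package, steps):
    """Follow (index, assertion) steps from <package> and return the item href."""
    node = package
    for index, assertion in steps:
        # Even step n selects the (n / 2)-th child element.
        node = list(node)[index // 2 - 1]
        # An ID assertion, when present, must match the element reached.
        if assertion is not None and node.get("id") != assertion:
            raise ValueError("assertion %r does not match" % assertion)
    idref = node.get("idref")
    for item in package.iter(OPF_NS + "item"):
        if item.get("id") == idref:
            return item.get("href")
    raise LookupError("no manifest item with id %r" % idref)

package = ET.fromstring(OPF)
# /6 is the third child (spine), /4 is its second child (chap01ref).
print(spine_target(package, [(6, None), (4, "chap01ref")]))
```

A mismatch is raised as an error here to keep the sketch short; a reading system could instead use the assertion to search for the intended element after the document has been revised.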

EPUB CFI Processing

Step with a slash (/)

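A step consisting of a slash (/) followed by a positive integer refers to either a child element or a chunk of character data. Child elements are assigned even indices (2, 4, 6, and so on), while the chunks of character data before, between, and after them are assigned odd indices (1, 3, 5, and so on).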
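
The slash-step rule can be tried out against a trimmed copy of the sample chapter. This sketch assumes the indexing that the sample CFIs follow: an even step n selects the (n / 2)-th child element, and an odd step selects the text chunk lying between child elements. The helper name resolve is hypothetical, assertions are ignored, and only character offsets are handled.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample chapter; placeholder text replaces the elided content.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>t</title></head>
  <body id="body01">
    <p>a</p><p>b</p><p>c</p><p>d</p>
    <p id="para05">xxx<em>yyy</em>0123456789</p>
    <p>e</p><p>f</p>
    <img id="svgimg" src="foo.svg" alt="x"/>
    <p>g</p><p>h</p>
  </body>
</html>"""

def resolve(root, steps, offset=None):
    """Walk integer steps from the root element.

    Returns an element for an element target, the text chunk for a text
    target, or a (before, after) pair when a character offset is given.
    """
    node = root
    for position, step in enumerate(steps):
        if step % 2 == 0:
            # Even step: descend into the (step / 2)-th child element.
            node = list(node)[step // 2 - 1]
            continue
        if position != len(steps) - 1:
            raise ValueError("a text step must be the last step")
        # Odd step: k child elements precede the chunk. ElementTree keeps
        # leading text in .text and text after a child in that child's .tail.
        k = (step - 1) // 2
        text = node.text if k == 0 else list(node)[k - 1].tail
        text = text or ""
        return text if offset is None else (text[:offset], text[offset:])
    return node

root = ET.fromstring(XHTML)
print(resolve(root, [4, 16]).get("id"))   # element target
print(resolve(root, [4, 10, 3], 10))      # text target with offset
print(resolve(root, [4, 10, 2, 1], 3))    # text inside the em element
```

Whitespace between elements does not shift the numbering, because elements always take the even indices and any text around them takes the odd ones.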

Monday, November 18, 2013