Summary
The EPUB Canonical Fragment Identifier (epubcfi) defines a standard method for referencing arbitrary content within an EPUB publication through the use of a fragment identifier.
Features
- Interoperability: a reference to a reading position created by one reading system can be used by another reading system.
- Documents can reference EPUB content in the same way that existing hyperlinks enable references across the web.
- Any location in an EPUB file can be identified without modifying the document.
- All fragment identifiers that reference the same location are equal when compared.
- Comparison operations, such as sorting and comparing two references, can be performed without accessing the referenced files.
- Simple manipulation is possible without accessing the original files (e.g. given a reference deep in a file, it is possible to generate a reference to the start of that file).
- Identifier resolution is reasonably efficient (e.g. processing the first chapter is not required to resolve a fragment identifier that points to the last chapter).
- References can recover their target location despite parser variations and document revisions.
- Expression of simple, contiguous ranges is supported.
- An extensible mechanism is provided to accommodate future reference-recovery heuristics.
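The comparison and manipulation features listed above work because a CFI is essentially a sequence of integers. The following Python sketch orders CFIs and truncates one without opening any file; the function names sort_key and start_of_document are hypothetical. It is a simplification under stated assumptions: assertions are ignored, ranges and temporal or spatial offsets are not handled, and "start of the file" is interpreted as the spine-item part of the CFI (everything before the "!").

```python
import re

# Simplified sketch: supports "/N" steps, one "!" indirection and an optional
# trailing ":offset". Assertions in [...] are dropped before comparing; ranges,
# temporal/spatial offsets and escaped brackets are NOT handled.

def _body(cfi: str) -> str:
    """Text between 'epubcfi(' and the final ')'."""
    return cfi[cfi.index("(") + 1 : cfi.rindex(")")]

def sort_key(cfi: str) -> tuple:
    """Integers of all steps, then the offset: an ordering key needing no file access."""
    body = re.sub(r"\[[^\]]*\]", "", _body(cfi))
    path, _, offset = body.partition(":")
    steps = tuple(int(n) for n in re.findall(r"/(\d+)", path))
    return steps + ((int(offset),) if offset else ())

def start_of_document(cfi: str) -> str:
    """Keep only the part before '!', i.e. a reference to the spine item itself."""
    return f"epubcfi({_body(cfi).split('!', 1)[0]})"

cfis = [
    "epubcfi(/6/4[chap01ref]!/4[body01]/16[svgimg])",
    "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3)",
    "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/1:0)",
]
for cfi in sorted(cfis, key=sort_key):
    print(cfi)
print(start_of_document(cfis[0]))
```

Two CFIs that differ only in their assertions produce the same key, which mirrors the "equal when compared" property for this restricted subset of the syntax.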
There are two types of EPUB CFI:
- Standard EPUB CFI
- Intra-publication EPUB CFI
EPUB CFI Syntax
The following are a few points on EPUB CFI syntax:
- For HTML documents, IDs and named anchors are used as fragment identifiers.
- For XML documents, the shorthand XPointer [XPTRSH] notation is used to refer to a given point.
Following is a sample EPUB CFI:
book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)
- A CFI consists of the initial sequence epubcfi, which identifies this particular reference method, followed by a parenthesized path or range.
- A path is built as a sequence of structural steps that reference a location.
- A range is a path followed by two local (or relative) paths that identify the start and the end of the range.
- Steps are denoted by the forward slash character (/) and are used to traverse XML content.
- The last step in a CFI path represents a location within a document: a structural element, or a textual or visual location.
- Such a terminating step may be complemented by an optional offset, which denotes a particular character position, or a temporal or spatial fragment.
- Substrings in square brackets are extensible assertions that improve the robustness of traversing paths and of migrating them from one revision of a document to another.
- These assertions preserve additional information about the traversed elements of the document, which makes it possible to recover the intended location even after some modifications have been made.
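To make the syntax concrete, here is a minimal Python parser sketch for a single (non-range) CFI path. It splits the path at the "!" indirection into package-level and content-level steps, keeps each bracketed assertion, and extracts a trailing character offset. The names parse_cfi, Step and ParsedCFI are illustrative only; escaped characters inside assertions, ranges, and temporal or spatial offsets are assumed away.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# One token: "/N" step, ":N" character offset or "!" indirection, each
# optionally followed by a bracketed assertion such as [chap01ref] or [;s=b].
TOKEN = re.compile(r"(/(\d+)|:(\d+)|!)(?:\[([^\]]*)\])?")

@dataclass
class Step:
    index: int                 # even: child element, odd: chunk of character data
    assertion: Optional[str]   # text inside [...], e.g. an id such as "body01"

@dataclass
class ParsedCFI:
    package_steps: List[Step] = field(default_factory=list)  # steps before "!"
    content_steps: List[Step] = field(default_factory=list)  # steps after "!"
    offset: Optional[int] = None
    offset_assertion: Optional[str] = None

def parse_cfi(cfi: str) -> ParsedCFI:
    """Parse a single-path CFI like epubcfi(/6/4[id]!/4/10/3:10); ranges are not handled."""
    match = re.fullmatch(r"epubcfi\((.*)\)", cfi)
    if match is None:
        raise ValueError(f"not an epubcfi expression: {cfi!r}")
    body, pos = match.group(1), 0
    result = ParsedCFI()
    target = result.package_steps
    while pos < len(body):
        token = TOKEN.match(body, pos)
        if token is None:
            raise ValueError(f"unexpected text: {body[pos:]!r}")
        whole, step, offset, assertion = token.groups()
        if whole == "!":
            target = result.content_steps   # indirection into the content document
        elif step is not None:
            target.append(Step(int(step), assertion))
        else:
            result.offset, result.offset_assertion = int(offset), assertion
        pos = token.end()
    return result

cfi = parse_cfi("epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[;s=b])")
print([s.index for s in cfi.package_steps], [s.index for s in cfi.content_steps], cfi.offset)
```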
EPUB Example
Package Document
<?xml version="1.0"?>
<package version="2.0"
         unique-identifier="bookid"
         xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:opf="http://www.idpf.org/2007/opf">
  <metadata>
    <dc:title>…</dc:title>
    <dc:identifier id="bookid">…</dc:identifier>
    <dc:creator>…</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="toc"
          properties="nav"
          href="toc.xhtml"
          media-type="application/xhtml+xml"/>
    <item id="titlepage"
          href="titlepage.xhtml"
          media-type="application/xhtml+xml"/>
    <item id="chapter01"
          href="chapter01.xhtml"
          media-type="application/xhtml+xml"/>
    <item id="chapter02"
          href="chapter02.xhtml"
          media-type="application/xhtml+xml"/>
    <item id="chapter03"
          href="chapter03.xhtml"
          media-type="application/xhtml+xml"/>
    <item id="chapter04"
          href="chapter04.xhtml"
          media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref id="titleref" idref="titlepage"/>
    <itemref id="chap01ref" idref="chapter01"/>
    <itemref id="chap02ref" idref="chapter02"/>
    <itemref id="chap03ref" idref="chapter03"/>
    <itemref id="chap04ref" idref="chapter04"/>
  </spine>
</package>
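Resolving the package-level part of a CFI only requires the package document. In the Python sketch below, which uses the standard xml.etree.ElementTree module (the helper names child_element and resolve_spine_item are hypothetical), /6 selects the third child element of package, which is spine, and /4 selects its second itemref, whose idref leads to the manifest item chapter01. The embedded package is a trimmed copy of the one above that keeps the element order the step indices depend on.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Trimmed copy of the package document: metadata, manifest and spine keep
# their order, so /6 still selects <spine> and /4 the second <itemref>.
PACKAGE = """<?xml version="1.0"?>
<package version="2.0" unique-identifier="bookid"
         xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata><dc:identifier id="bookid">…</dc:identifier></metadata>
  <manifest>
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref id="titleref" idref="titlepage"/>
    <itemref id="chap01ref" idref="chapter01"/>
  </spine>
</package>"""

def child_element(parent, index):
    """An even CFI index 2k selects the k-th child element of parent."""
    if index % 2 or index < 2:
        raise ValueError(f"{index} does not address an element")
    children = list(parent)
    if index // 2 > len(children):
        raise IndexError(f"no child element at CFI index {index}")
    return children[index // 2 - 1]

def resolve_spine_item(package_xml, steps):
    """Follow (index, id_assertion) steps from <package>; return the manifest href."""
    root = ET.fromstring(package_xml)
    node = root
    for index, assertion in steps:
        node = child_element(node, index)
        # A real reading system would attempt recovery here; this sketch just raises.
        if assertion is not None and node.get("id") != assertion:
            raise ValueError(f"id assertion {assertion!r} does not match")
    item = root.find(f"{OPF}manifest/{OPF}item[@id='{node.get('idref')}']")
    return item.get("href")

print(resolve_spine_item(PACKAGE, [(6, None), (4, "chap01ref")]))
```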
Sample chapter01.xhtml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>…</title>
  </head>
  <body id="body01">
    <p>…</p>
    <p>…</p>
    <p>…</p>
    <p>…</p>
    <p id="para05">xxx<em>yyy</em>0123456789</p>
    <p>…</p>
    <p>…</p>
    <img id="svgimg" src="foo.svg" alt="…"/>
    <p>…</p>
    <p>…</p>
  </body>
</html>
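Inside a content document, steps count character data as well as elements. Python's ElementTree stores character data in an element's text attribute and in the tail attribute of each child, so the CFI child slots can be rebuilt as [text, child1, tail1, child2, tail2, ...], and index i selects slot i. The sketch below (helper names cfi_children and resolve are hypothetical) follows !/4[body01]/10[para05]/3 in an inline copy of the chapter above and checks the id assertions along the way.

```python
import xml.etree.ElementTree as ET

# Inline copy of chapter01.xhtml. Whitespace between elements only affects
# the odd (text) slots, never the even indices of elements.
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>…</title></head>
<body id="body01">
<p>…</p><p>…</p><p>…</p><p>…</p>
<p id="para05">xxx<em>yyy</em>0123456789</p>
<p>…</p><p>…</p>
<img id="svgimg" src="foo.svg" alt="…"/>
<p>…</p><p>…</p>
</body>
</html>"""

def cfi_children(elem):
    """Child slots in CFI order: slot 1 is the leading text, then element/tail pairs."""
    slots = [elem.text or ""]
    for child in elem:
        slots += [child, child.tail or ""]
    return slots

def resolve(root, steps):
    """Follow (index, id_assertion) steps from root; return an Element or a str."""
    node = root
    for index, assertion in steps:
        if isinstance(node, str):
            raise ValueError("cannot step into character data")
        slots = cfi_children(node)
        if not 1 <= index <= len(slots):
            raise IndexError(f"CFI index {index} out of range")
        node = slots[index - 1]
        if assertion is not None and (isinstance(node, str) or node.get("id") != assertion):
            raise ValueError(f"id assertion {assertion!r} does not match")
    return node

html = ET.fromstring(CHAPTER)
print(resolve(html, [(4, "body01"), (10, "para05"), (3, None)]))
print(resolve(html, [(4, "body01"), (16, "svgimg")]).get("src"))
```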
Sample EPUB CFIs
Reference to the location just after 0123456789 (character offset 10 in that text).
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)
Reference to the img element.
epubcfi(/6/4[chap01ref]!/4[body01]/16[svgimg])
Reference to the location just before xxx.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/1:0)
Reference to the location just before yyy.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:0)
Reference to the location just after yyy.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3)
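The references above differ only in their last steps and character offset. A character offset n denotes the position after the first n characters of the addressed text chunk, so :0 is just before the text and an offset equal to the chunk's length is just after it. This Python sketch (the helper names cfi_children and split_at are hypothetical) splits each addressed chunk of para05 at its offset; the steps are written relative to <p id="para05">, that is, they are what follows /10[para05].

```python
import xml.etree.ElementTree as ET

# Only <p id="para05"> is needed: the steps below are relative to /10[para05].
PARA05 = ET.fromstring('<p id="para05">xxx<em>yyy</em>0123456789</p>')

def cfi_children(elem):
    """Child slots in CFI order: leading text, then element/tail pairs."""
    slots = [elem.text or ""]
    for child in elem:
        slots += [child, child.tail or ""]
    return slots

def split_at(elem, steps, offset):
    """Follow integer steps to a text chunk and split it at the character offset."""
    node = elem
    for index in steps:
        node = cfi_children(node)[index - 1]
    if not isinstance(node, str) or not 0 <= offset <= len(node):
        raise ValueError("offset must fall inside a text chunk")
    return node[:offset], node[offset:]

for steps, offset in [([1], 0), ([2, 1], 0), ([2, 1], 3), ([3], 10)]:
    before, after = split_at(PARA05, steps, offset)
    print(f"{steps}:{offset} -> {before!r} | {after!r}")
```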
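The location just after yyy can also be the target of a hyperlink from another content document. The link points at the package document (here ../pub.opf) with the CFI appended as a fragment, and the [;s=b] assertion is a side-bias hint indicating that the location belongs with the content before the offset rather than after it.
<a href="../pub.opf#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[;s=b])">location</a>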
EPUB CFI Processing
Step with a slash (/)
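A step consisting of a slash (/) followed by a positive integer refers to either a child element or a chunk of character data. Even integers refer to child elements (2 is the first child element, 4 the second, and so on), and odd integers refer to the chunks of character data before, between and after them.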
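Because a chunk of character data is counted even when it is empty, elements always receive even indices and character data odd ones, even when two elements are adjacent. This Python sketch (cfi_slots is a hypothetical name) lists the CFI index assigned to every child slot of an element; ElementTree keeps character data in the text and tail attributes, so empty chunks show up as empty strings.

```python
import xml.etree.ElementTree as ET

def cfi_slots(elem):
    """Return (index, kind, value) for each CFI child slot of elem."""
    slots = [(1, "text", elem.text or "")]
    for k, child in enumerate(elem, start=1):
        slots.append((2 * k, "element", child.tag))          # k-th element -> 2k
        slots.append((2 * k + 1, "text", child.tail or ""))  # chunk after it -> 2k+1
    return slots

for source in ['<p id="para05">xxx<em>yyy</em>0123456789</p>', "<div><a/><b/></div>"]:
    for index, kind, value in cfi_slots(ET.fromstring(source)):
        print(index, kind, repr(value))
    print()
```

In the second input, <a/> and <b/> are adjacent, yet they still get indices 2 and 4 because the empty chunk between them occupies index 3.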