Monday, November 18, 2013

Canonical Fragment Identifier (epubcfi) Specification

Summary

Canonical Fragment Identifier (epubcfi) defines a standard method for referencing arbitrary content within an EPUB publication through the use of fragment identifiers.

Features

  1. It is interoperable, i.e. a reference to a reading position created by one reading system can be used by another.
  2. Document references to EPUB content are enabled in the same way that existing hyperlinks enable references throughout the web.
  3. Each location in an EPUB file can be identified without the need to modify the document.
  4. All fragment identifiers that reference the same location are equal when compared.
  5. Comparison operations, including tests for sorting and comparison, can be performed without accessing the referenced files.
  6. Simple manipulation is possible without access to the original files (e.g. given a reference deep in a file, it is possible to generate a reference to the start of the file).
  7. Identifier resolution is reasonably efficient (e.g. processing of the first chapter is not required to resolve a fragment identifier that points to the last chapter).
  8. References are able to recover their target locations through parser variations and document revisions.
  9. Expression of simple, contiguous ranges is supported.
  10. An extensible mechanism to accommodate future reference recovery heuristics is provided.
There are two types of CFI:
  1. Standard EPUB CFI
  2. Intra-publication EPUB CFI

EPUB CFI Syntax

A few points on EPUB CFI syntax:
  • For HTML documents, IDs and named anchors are used as fragment identifiers.
  • For XML documents, the shorthand XPointer [XPTRSH] notation is used to refer to a given point.
The following is a sample EPUB CFI:
book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

  • A CFI consists of the initial sequence epubcfi, which identifies this particular reference method, and a parenthesized path or range.
  • A path is built as a sequence of structural steps that reference a location.
  • A range is a path followed by two local (or relative) paths that identify the start and end of the range.
  • Steps are denoted by the forward slash character (/) and are used to traverse XML content.
  • The last step in a CFI path represents a location within a document, whether structural, textual or visual.
  • Such terminating steps may be complemented by an optional offset, which denotes a particular character position or a temporal or spatial fragment.
  • Substrings in brackets are extensible assertions that improve the robustness of traversing paths and of migrating them from one revision of the document to another.
  • These assertions preserve additional information about the traversed elements of the document, which makes it possible to recover the intended location even after some modifications are made.
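
To make the syntax points above concrete, here is a minimal Python sketch that splits a simple (non-range) CFI into its steps, bracketed assertions and trailing character offset. The function name parse_cfi and the regular expressions are illustrative assumptions, not something defined by the specification. The sketch also assumes that the `!` character separates the package-document part of the path from the content-document part, and it ignores ranges, temporal or spatial offsets and escaped characters.

```python
import re

# One step: "/" + integer, optionally followed by a bracketed assertion like [para05].
STEP_RE = re.compile(r"/(\d+)(?:\[([^\]]*)\])?")
# Optional character offset like ":10", optionally followed by an assertion like [;s=b].
OFFSET_RE = re.compile(r":(\d+)(?:\[[^\]]*\])?")


def parse_cfi(cfi):
    """Return (parts, offset): one list of (step, assertion) pairs per document."""
    prefix = "epubcfi("
    start = cfi.index(prefix) + len(prefix)  # raises ValueError if the prefix is missing
    if not cfi.endswith(")"):
        raise ValueError("CFI must end with ')'")
    body = cfi[start:-1]

    parts, offset = [], None
    for segment in body.split("!"):  # assumption: "!" jumps into the referenced document
        steps, pos = [], 0
        while True:
            match = STEP_RE.match(segment, pos)
            if match is None:
                break
            steps.append((int(match.group(1)), match.group(2)))
            pos = match.end()
        match = OFFSET_RE.match(segment, pos)
        if match is not None:
            offset = int(match.group(1))
            pos = match.end()
        if pos != len(segment):
            raise ValueError(f"unparsed text in CFI: {segment[pos:]!r}")
        parts.append(steps)
    return parts, offset


parts, offset = parse_cfi("book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)")
print(parts)
print(offset)
```

Keeping the steps as plain integer and string pairs means they can be compared or truncated without opening the publication itself.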

EPUB Example

Package Document
<?xml version="1.0"?>

<package version="2.0" 
         unique-identifier="bookid" 
         xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" 
         xmlns:opf="http://www.idpf.org/2007/opf">
    
    <metadata>
        <dc:title>…</dc:title>
        <dc:identifier id="bookid">…</dc:identifier>
        <dc:creator>…</dc:creator>
        <dc:language>en</dc:language>
    </metadata>
    
    <manifest>
        <item id="toc"
              properties="nav"
              href="toc.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="titlepage" 
              href="titlepage.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter01" 
              href="chapter01.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter02" 
              href="chapter02.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter03" 
              href="chapter03.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter04" 
              href="chapter04.xhtml" 
              media-type="application/xhtml+xml"/>
    </manifest>
    
    <spine>
        <itemref id="titleref"  idref="titlepage"/>
        <itemref id="chap01ref" idref="chapter01"/>
        <itemref id="chap02ref" idref="chapter02"/>
        <itemref id="chap03ref" idref="chapter03"/>
        <itemref id="chap04ref" idref="chapter04"/>
    </spine>
    
</package>
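
As a rough illustration of how the leading /6/4[chap01ref] part of the sample CFI relates to this package document, the following Python sketch walks the two steps and looks up the manifest entry. It assumes that element steps are even numbers (2 for the first child element, 4 for the second, and so on), which is why /6 lands on the spine and /4 on its second itemref. The names child_element and spine_href are hypothetical, and the embedded package is an abbreviated copy of the one above with the metadata left empty.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Abbreviated copy of the package document above (metadata contents omitted).
PACKAGE = """<package version="2.0" unique-identifier="bookid"
         xmlns="http://www.idpf.org/2007/opf">
    <metadata/>
    <manifest>
        <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
        <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
    </manifest>
    <spine>
        <itemref id="titleref"  idref="titlepage"/>
        <itemref id="chap01ref" idref="chapter01"/>
    </spine>
</package>"""


def child_element(parent, step):
    """Return the child element addressed by an even step (2 -> first, 4 -> second)."""
    if step < 2 or step % 2 != 0:
        raise ValueError("element steps must be even and at least 2")
    return list(parent)[step // 2 - 1]


def spine_href(package_xml, steps):
    """Follow (step, expected_id) pairs to a spine itemref and return the item's href."""
    root = ET.fromstring(package_xml)
    node = root
    for step, expected_id in steps:
        node = child_element(node, step)
        # The bracketed assertion lets us detect that the document has changed.
        if expected_id is not None and node.get("id") != expected_id:
            raise ValueError(f"expected id {expected_id!r}, found {node.get('id')!r}")
    idref = node.get("idref")
    for item in root.iter(OPF_NS + "item"):
        if item.get("id") == idref:
            return item.get("href")
    raise KeyError(idref)


print(spine_href(PACKAGE, [(6, None), (4, "chap01ref")]))  # chapter01.xhtml
```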

Sample chapter01.xhtml
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>…</title>
    </head>
    
    <body id="body01">
        <p>…</p>
        <p>…</p>
        <p>…</p>
        <p>…</p>
        <p id="para05">xxx<em>yyy</em>0123456789</p>
        <p>…</p>
        <p>…</p>
        <img id="svgimg" src="foo.svg" alt="…"/>
        <p>…</p>
        <p>…</p>
    </body>
</html>

Sample EPUB CFI

Reference to the location just after the digit 9.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

Reference to the img element.
epubcfi(/6/4[chap01ref]!/4[body01]/16[svgimg])

Reference to the location just before xxx.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/1:0)

Reference to the location just before yyy.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:0)

Reference to the location just after yyy.
epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3)

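Link to the location just after yyy, with a side-bias assertion ([;s=b], i.e. the location belongs with the content before it).
<a href="../pub.opf#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/2/1:3[;s=b])">location</a>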
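
Going the other way, a CFI path for an element can be generated by walking from the element up to the document root. The Python sketch below is one possible approach, not the method prescribed by the specification: it assumes each element step is twice the element's 1-based position among its sibling elements, adds the id as a bracketed assertion when one is present, and does not escape special characters in ids. The function name element_path is hypothetical, and the package part /6/4[chap01ref]! is simply copied from the samples above.

```python
import xml.etree.ElementTree as ET

# Copy of the sample chapter above, with the paragraphs placed on fewer lines.
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>…</title></head>
    <body id="body01">
        <p>…</p><p>…</p><p>…</p><p>…</p>
        <p id="para05">xxx<em>yyy</em>0123456789</p>
        <p>…</p><p>…</p>
        <img id="svgimg" src="foo.svg" alt="…"/>
        <p>…</p><p>…</p>
    </body>
</html>"""


def element_path(root, target):
    """Build the content-document part of a CFI path from root down to target."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        index = 2 * (list(parent).index(node) + 1)  # 1st child -> 2, 2nd -> 4, ...
        node_id = node.get("id")
        steps.append(f"/{index}[{node_id}]" if node_id else f"/{index}")
        node = parent
    return "".join(reversed(steps))


root = ET.fromstring(CHAPTER)
img = next(el for el in root.iter() if el.get("id") == "svgimg")
path = element_path(root, img)
print("epubcfi(/6/4[chap01ref]!" + path + ")")
```

For the img element this reproduces the /4[body01]/16[svgimg] path used in the sample above.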

EPUB CFI Processing

Step with a slash (/)
A step with a slash (/) followed by a positive integer refers to either a child element or a chunk of character data.
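
The following Python sketch shows one way such steps could be resolved against the sample chapter. It assumes that even integers select child elements and odd integers select the chunks of character data between them (1 is the text before the first child element, 3 the text after it, and so on), which is consistent with the sample CFIs above. The function name resolve is hypothetical, and the chapter is abbreviated to the paragraphs needed for the example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample chapter (only the first five paragraphs).
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>…</title></head>
    <body id="body01">
        <p>…</p><p>…</p><p>…</p><p>…</p>
        <p id="para05">xxx<em>yyy</em>0123456789</p>
    </body>
</html>"""


def resolve(root, steps, offset=None):
    """Walk (step, expected_id) pairs from root; return (element or text, offset)."""
    node = root
    for index, expected_id in steps:
        if index % 2 == 0:
            # Even step: the (index / 2)-th child element.
            node = list(node)[index // 2 - 1]
            if expected_id is not None and node.get("id") != expected_id:
                raise ValueError(f"expected id {expected_id!r}, found {node.get('id')!r}")
        else:
            # Odd step: character data before the first child (k == 0)
            # or directly after the k-th child element.
            k = (index - 1) // 2
            text = node.text if k == 0 else list(node)[k - 1].tail
            return (text or ""), offset  # a text step ends the path
    return node, offset


root = ET.fromstring(CHAPTER)

# /4[body01]/10[para05]/3:10 -> offset 10 in "0123456789", i.e. just after the 9.
text, off = resolve(root, [(4, "body01"), (10, "para05"), (3, None)], 10)
print(repr(text[:off]), repr(text[off:]))

# /4[body01]/10[para05]/2/1:3 -> offset 3 in "yyy", i.e. just after yyy.
text, off = resolve(root, [(4, "body01"), (10, "para05"), (2, None), (1, None)], 3)
print(repr(text[:off]), repr(text[off:]))
```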

1 comment:

  1. Thanks for the information. I want to know how we can generate a CFI for an EPUB.
