


1- INTRODUCTION.

This document describes actual and proposed changes to the djvu
format since the release of the DjVu3 specification by Lizardtech in
november 2005.



2- ESCAPE SEQUENCES IN ANNOTATION CHUNK STRINGS.

The treatment of escape sequence in annotation chunk strings has
historically been slightly different in Lizardtech DjVu and
DjVuLibre.  We are expecting that the DjVuLibre solution will
eventually become the standard.

Lizardtech DjVu uses the "old" rule described in section 8.3.4.2.
The sequence of characters BACKSLASH DOUBLEQUOTE represents a
DOUBLEQUOTE character without terminating the string.  There are no
other escape sequences. All other utf8 characters are written
directly. The main drawback of this approach is the inability to
write a string containing the sequence of characters BACKSLASH
DOUBLEQUOTE since there is no way to escape the first BACKSLASH
character.

DjVuLibre has introduced a more flexible scheme a few years ago.
Annotation strings are similar to strings in the C language.
Character sequences starting with a backslash have special meaning.
A BACKSLASH followed by "a", "b", "t", "n", "v", "f", "r", or "\"
stands for the ascii character BEL, BS, HT, LF, VT, FF, CR,
BACKSLASH or DOUBLEQUOTE.  A BACKSLASH followed by one to three
digits stands for the byte whose octal code is expressed by the
digits.  All other backslash sequences are illegal.  Non printable
ascii characters must be escaped. Multibyte characters should either
be entered directly, or represented using octal sequences.

DjVuLibre minimizes the compatibility problems by searching illegal
escape sequences in the annotation chunk.  If any illegal sequence
is found, the Lizardtech rule is used instead of the DjVuLibre rule.
It is expected that Lizardtech will at some point adopt the improved
DjVuLibre rules. We will then be able to state that all DjVu files
with version greater than some constant use the new convention.



3- PAGE TITLES.

Each page in a DjVu document is identified by three strings named
the ID, the NAME, and the TITLE. The semantic distinction between
these three strings is no longer very clear. DjVu libraries
does not work consistently when these strings are different.  See
the comment in the table, section 8.3.2.2 of the specification.

Recent versions of DjVuLibre still require that the ID and NAME
string are equal. The TITLE string however can be different and
should be used to display friendly page names. The djvused program
now features a command 'set-page-title' to install a TITLE different
from the ID string. The djview program then displays and recognizes
these page titles in lieu of the sequential page numbers.

Things get complicated when some page titles are purely numerical.
For instance, the page titled "4" might not be the fourth page of 
the document, and there might be many pages titled "4".
The djview4 viewer (in preparation) uses the following rules:


3.1- PAGE REFERENCES IN MAPAREA LINKS.

Page references in hyperlink annotations 
always start with the character "#".

"#$N": If N is numerical, go to page number N.
"#+N": If N is numerical, go N pages forward.
"#-N": If N is numerical, go N pages backward.
"#X":  Search a page with title "X".
       Otherwise, if X is numerical, go to the X-th page.
       Otherwise, search a page with NAME "X".
       Otherwise, search a page with ID "X".

Title searches progress forward from the page containing 
the hyperlink and wrap around the end of the document.
The first matching page is selected.


3.2- DJVU CGI ARGUMENT "PAGE=".

The meaning of the djvu cgi argument "page=" 
depends whether the first character is a "#" symbol.

"...?djvuopts&page=$N":  
    If N is numerical, go to page number N.

"...?djvuopts&page=X":
    First search a page with TITLE "X".
    Otherwise, if X is numerical, go to the X-th page.
    Otherwise, search a page with NAME "X".
    Otherwise, search a page with ID "X".




4- METADATA ANNOTATIONS.

DjVuLibre has introduced metadata annotations a few years ago.
Metadata entries for each page is represent by key/value pairs
located in a metadata directive in the annotation chunk.

The metadata directive has the form

  (metadata ... (key "value") ... )

Each entry is identified by a symbol <key> representing the nature
of the meta data entry.  Typical keys include 'year', 'booktitle',
'editor', 'author', etc.  It is suggested to use the same key names
as the BibTeX bibliography system.  The string <"value"> represents
the value associated with the corresponding key.



5- ANNOTATIONS FOR THE DOCUMENT.

This scheme provides a simple way to specify metadata for each page.
But it is often useful to provide metadata that applies to the whole
document. Document wide metadata are represented using one or
several metadata directives in the shared annotations chunk.

This scheme has a potential drawback. Since the shared annotations
is included by all pages, the document wide metadata also appears as
page metadata for all pages. This might not be adequate for some
uses. As a workaround, the djview4 viewer (in preparation) only
displays page metadata that differ from the document metadata.

A more definitive answer would be the definition of a document
annotation chunk located after the DIRM chunk and before any
component file. This space is already used by the NAVM chunk.  
This is being considered.



6- THE SO CALLED "SECURE DJVU" FORMAT.

Recently Lizardtech introduced an incompatible "Secure DJVU" format.
This format encrypt djvu data in the hope of controlling
whether users can copy or print djvu documents.
The specification is not open because there is no 
durable way to enforce such constraints.
This is just "security through obscurity".

The current djvulibre library simply emits an 
error message when encountering such files.

Some observations regarding these files:

- They use the same IFF85 structure as djvu files.
  Chunk "SINF" contains a scrambled version of the decryption 
  key and describes which actions are authorized or denied.
  Chunks "CELX" encapsulate the regular DjVu chunks.  
  
- Each CELX chunk starts with four bytes for the original chunk 
  name, and four bytes for the original chunk length.
  This is followed encrypted data, composed of enough
  blocks of 8 bytes to covert the initial chunk length.

- Lizardtech claims encryption is 32 bit blowfish.
  It appears to be in fact composed of 64 bits block
  as you would expect with blowfish.

- Despite this strong encryption, the scheme is very weak.
  The file contains the key at a relatively well known location.
  What is not known is the key scrambling scheme.
