============================================
Document library: an open Zope 3 application
============================================

Introduction
------------

The Document Library is an open source web application written on top
of the Zope 3 application server platform. Information in the Document
Library can be accessed using the Open Archives Initiative Protocol
for Metadata Harvesting (OAI-PMH), meaning that besides being open
source, the Document Library is also a good example of an *open data*
application. Because it is open data, the Document Library is easier
to integrate with other applications, such as the Silva CMS or any
other application capable of OAI-PMH harvesting.

Document Library goals
----------------------

Organizations deal with numerous documents, such as word processor
documents and PDFs. These documents often reside on someone's computer
and are not network accessible. Versions of documents are hard to
track: the same document may be passed around by email in multiple
versions over time. In large organizations it therefore becomes
important to structure the flow of documents. This is typically done
using a document management system. 

The Document Library is one such document management system. It can
help organizations in the following ways:

* Internal communication in an organization about documents is
  enhanced as documents are all available in a central location.

* Organizations, especially public ones, have to deal with more and
  more legal requirements concerning retention and publication of
  documents. The Document Library can help an organization in making
  sure it is compliant with legal requirements surrounding documents.

* Information about documents in an organization can be accessed and
  published, for instance on a website.

The main focuses of the Document Library are:

* It's easy to use. Users are not exposed to many complicated screens
  when they just want documents to appear in the system. There's just
  a single screen that takes care of everything if you want to put a
  document in the document library. It is web-based, so no custom
  client installation is necessary.

* It integrates with other systems. The Document Library is not a
  monolithic black box in which documents and metadata disappear and
  cannot be retrieved anymore, but instead makes sure it's easy to
  integrate with other systems, such as web sites that publish content
  in the document library. Features can often be added outside the
  Document Library instead of having to expand the scope of the
  Document Library beyond document management itself. Web publication
  of documents is for instance better left to a CMS (like Infrae's
  Silva_).

* It can be made to scale. Uploads and downloads of documents can be
  handled seamlessly using sophisticated Apache integration technology
  (tramline_).

.. _Silva: http://www.infrae.com/products/silva

.. _tramline: http://www.infrae.com/products/tramline

A document life cycle
---------------------

This is the typical life cycle of a document in the Document Library:

* A user submits a document to the Document Library. The Document
  Library sends a notification email to all authors listed for the
  document, and another to all librarians responsible for the section
  in which the document was submitted.

* A librarian receives the email and knows a new document was
  submitted. A librarian can also see all newly submitted documents in
  the sections the librarian is responsible for on an overview page in
  the Document Library.

* The librarian reviews the document and then either rejects it or
  approves it for publication. An email about the decision is sent to
  all authors listed for the document, and to all librarians who
  manage the document's section.

* The *available date* of the document determines when the document is
  made available. Once the document becomes available, it shows up in
  the OAI-PMH feed and can be harvested. External systems can then
  publish the metadata and a link to the document.

* Documents may expire automatically or can be retracted. Documents
  leave information behind even when they are deleted, so that it is
  always possible to find out what happened to them. This is
  important, for instance, in the context of freedom of information
  legislation.
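
The life cycle above can be read as a small state machine. The
following is only an illustrative sketch, not the Document Library's
actual workflow code: the state names, the ``Document`` class and its
methods are all hypothetical, and treating a rejected document as
final is an assumption. Expiry is modelled as an ordinary transition
rather than as something triggered automatically.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical state names; the allowed transitions mirror the steps above.
TRANSITIONS = {
    "submitted": {"approved", "rejected"},
    "approved": {"expired", "retracted"},
    "rejected": set(),   # assumption: rejection is final in this sketch
    "expired": set(),
    "retracted": set(),
}


@dataclass
class Document:
    title: str
    available_date: date
    state: str = "submitted"
    # Every transition is recorded, so a trace remains after retraction.
    history: list = field(default_factory=list)

    def transition(self, new_state: str, on: date) -> None:
        """Move to new_state if the life cycle allows it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.history.append((self.state, new_state, on))
        self.state = new_state

    def is_harvestable(self, today: date) -> bool:
        """Approved documents are exposed once the available date is reached."""
        return self.state == "approved" and today >= self.available_date
```

In this sketch a document is only harvestable when it has been
approved *and* its available date has been reached, and the
``history`` list keeps a record of what happened to a document after
it has been retracted or has expired.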

Features of the Document Library
--------------------------------

* Automatic conversion service: using OpenOffice, the Document Library
  can convert Word documents into PDF and plain text, and PDFs into
  plain text. The plain text version is important because it allows
  full-text indexing of document contents, and also makes documents
  more accessible to people with disabilities.

* Publication workflow: documents only become available for harvesting
  and download after a review process.

* Delegation of control: reviewers ("librarians") can be assigned to
  particular sections.

* Dynamic access: authors have automatic access to all the documents
  that list them as an author.

* Versions: multiple versions of the same document can coexist, one
  public and one under preparation.

* Email reminder functionality: users receive emails about the
  progress of the document through the workflow.

* OAI-PMH data provider: allows other systems to harvest document
  metadata using a standard protocol.

* Integration with Silva CMS (using OAI-PMH).

* Fast upload and download integration with Apache using Tramline.

* Easy overview screens for librarians.

* Smart file upload user interface: files need to be uploaded only
  once, even if the rest of the form needs to be amended.
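
To give an idea of what harvesting from an OAI-PMH data provider
involves, here is a minimal sketch of a harvester. The ``ListRecords``
verb, the ``metadataPrefix``, ``from`` and ``set`` arguments and the
``oai_dc`` format come from the OAI-PMH standard; the base URL, the
function names and the sample response are made up for illustration
and do not describe the Document Library's actual endpoint or
metadata.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # only records changed since this date
    if set_spec:
        params["set"] = set_spec     # restrict harvesting to one set
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier and Dublin Core title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(
            f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = next(
            (el.text for el in record.iter(f"{{{DC_NS}}}title")), None)
        records.append({"identifier": identifier, "title": title})
    return records


# Made-up response, trimmed to the elements the parser looks at.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:doc-1</identifier>
        <datestamp>2006-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Annual report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai", from_date="2006-01-01"))
print(parse_records(SAMPLE))
```

A real harvester would fetch the URL over HTTP and follow
``resumptionToken`` elements to page through large result lists; both
are left out here to keep the sketch self-contained.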

Silva integration features
--------------------------

The Document Library can be integrated with external systems using the
OAI-PMH protocol. This protocol is discussed in much more detail in
the next section. In this section we will discuss the features of the
Document Library's integration with the Silva CMS.

* Uses the OAI-PMH standard to harvest documents from the Document
  Library, but is aware of Document Library-specific metadata.

* Ability to add references to individual documents in CMS documents.

* Ability to add listings of document references, based on metadata
  selection criteria, in CMS documents.

* Ability to create search pages for documents in the Document Library
  in the CMS. Not only metadata is indexed, but also the full-text
  content of these documents. This means that end users of the website
  can do full-text searches in document contents.

* Document download (.doc, .pdf, .txt, etc.) is handled by the
  Document Library; the CMS just handles presentation.
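
A listing based on metadata selection criteria can be pictured as a
simple filter over harvested records. The sketch below is not Silva's
implementation: the ``select_documents`` function and the record
layout (a dictionary mapping each metadata field to a list of values,
since Dublin Core elements are repeatable) are assumptions made for
illustration.

```python
def select_documents(records, **criteria):
    """Return the records that match every given field/value pair.

    Each record maps a metadata field name to a list of values; a
    criterion matches when its value appears in that list.
    """
    selected = []
    for record in records:
        if all(value in record.get(name, [])
               for name, value in criteria.items()):
            selected.append(record)
    return selected


# Made-up harvested metadata.
harvested = [
    {"title": ["Annual report"], "creator": ["A. Author"],
     "subject": ["finance"]},
    {"title": ["Privacy policy"], "creator": ["B. Writer"],
     "subject": ["legal"]},
]

finance_listing = select_documents(harvested, subject="finance")
```

Here ``finance_listing`` contains only the first record; adding more
keyword arguments narrows the selection further.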

Conclusion
----------

The Document Library is a document management system with a wide set
of features that remains simple to use and can be introduced into an
organization relatively easily. It does not try to take over all
document-related activities, such as publication on the web, but
limits itself to managing the documents in a single repository.

Through OAI-PMH integration, the information in the Document Library
can be used in numerous ways, such as web publication using Silva.
