Document Management Jargon Buster

Electronic Document Management: Definition of Terms

Acrobat: suite of software products from Adobe for creating, viewing and editing PDF files

ADF (Automatic Document Feed): either an attachment to a document scanner allowing a collection of pages to be scanned one after another, or the process of scanning in this way.

Bar Code: a machine-readable code in the form of stripes (bars and spaces) representing standard alphanumeric characters and punctuation

Bar Code Recognition: used to identify scanned documents. Data read from bar codes is often used to index and store documents (see Indexing).

Batch processing: carrying out the same actions on more than one document as a result of a single command

Batch separator: specific pages or identifiers within a batch of documents that indicate where each individual document begins
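
As an illustration, the sketch below splits a scanned batch into documents at separator pages. It assumes pages arrive in scan order and that a separator can be recognised by a test passed in as `is_separator` (a hypothetical hook; in practice this is often a bar code or patch code on a dedicated sheet). Separator pages are discarded and empty documents are skipped.

```python
from typing import Callable, List, TypeVar

Page = TypeVar("Page")


def split_batch(pages: List[Page], is_separator: Callable[[Page], bool]) -> List[List[Page]]:
    """Split a batch of pages into documents at separator pages.

    Separator pages are dropped; two separators in a row do not
    produce an empty document.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:  # the last document has no separator after it
        documents.append(current)
    return documents


# Hypothetical batch in which separator sheets read "SEP".
batch = ["inv-1 p1", "inv-1 p2", "SEP", "inv-2 p1", "SEP", "SEP", "inv-3 p1"]
docs = split_batch(batch, lambda page: page == "SEP")
print(docs)  # [['inv-1 p1', 'inv-1 p2'], ['inv-2 p1'], ['inv-3 p1']]
```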

Batch: a collection of documents

Bates Numbering: process in which a unique, often sequential, code is either applied to an electronic document or printed on the original paper document during scanning
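
A minimal sketch of electronic Bates numbering, assuming a text prefix followed by a zero-padded six-digit sequence (a common convention, not a fixed standard). Each page gets one number and numbering runs on from one document to the next, so each document can be cited by its first and last number. The function names and the "ABC" prefix are hypothetical.

```python
def bates_label(prefix: str, number: int, width: int = 6) -> str:
    """Format one Bates number, e.g. ABC000042."""
    return f"{prefix}{number:0{width}d}"


def number_documents(prefix: str, page_counts: list[int], start: int = 1) -> list[tuple[str, str]]:
    """Give each document the Bates numbers of its first and last page.

    Numbering continues across documents, so no number is reused.
    """
    ranges = []
    next_number = start
    for pages in page_counts:
        first = bates_label(prefix, next_number)
        last = bates_label(prefix, next_number + pages - 1)
        ranges.append((first, last))
        next_number += pages
    return ranges


# Hypothetical production set: three documents of 2, 3 and 1 pages.
print(number_documents("ABC", [2, 3, 1]))
# [('ABC000001', 'ABC000002'), ('ABC000003', 'ABC000005'), ('ABC000006', 'ABC000006')]
```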

Bitonal: an image consisting of only black or white pixels
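
One simple way to produce a bitonal image from an 8-bit grey-scale scan is a global threshold: every pixel darker than the threshold becomes black and the rest become white. The sketch below assumes the image is a list of rows with values from 0 (black) to 255 (white) and uses 128 as an illustrative threshold; production scanning software often uses adaptive thresholds instead.

```python
def to_bitonal(grey: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Convert 8-bit grey scale (0 = black, 255 = white) to bitonal.

    Returns 1 for black pixels and 0 for white ones, using one
    global threshold for the whole image.
    """
    return [[1 if value < threshold else 0 for value in row] for row in grey]


print(to_bitonal([[0, 200], [127, 128]]))  # [[1, 0], [1, 0]]
```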

Business Process Management: Software that enables implementation of computer-based rules to automate paper-intensive business processes.

CAD (Computer Aided Design): vector-based design packages such as Autodesk's AutoCAD

Capturing: the process by which paper-based information is transferred to an electronic medium by use of a scanner or keyboard

CD-R (Compact Disc - Recordable): a type of compact disc that can be written once by a computer's CD writer and then read many times

CM (Content Management): see content management system

Compression: the process by which computer data is rewritten to make the original file size smaller
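
A quick way to see lossless compression at work is Python's standard `zlib` module (DEFLATE). Repetitive data such as report text shrinks considerably, and decompression returns the original bytes exactly; lossy formats such as JPEG, by contrast, discard some detail in exchange for smaller files. The sample text is made up for illustration.

```python
import zlib

# Made-up, highly repetitive text, similar in spirit to a printed report.
original = ("Invoice 1001  Paid  0.00\n" * 500).encode("utf-8")

compressed = zlib.compress(original, level=9)  # 9 = smallest output, slowest
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print(restored == original)  # True: nothing was lost
```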

COLD (Computer Output to Laser Disc)/ERM (Enterprise Report Management) Processing: automatic import of large computer-generated text reports, which are split into smaller files and indexed.

Content management system: a system in which the content of a document is separated from the way in which it is displayed, so that the display of the content can be controlled dynamically
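
The separation of content from presentation can be shown with Python's standard `string.Template`: the content is stored once as plain data, and two hypothetical templates render it as an HTML fragment and as plain text. Real content management systems use far richer template engines, but the principle is the same: changing a template changes how every piece of content is displayed without touching the content itself.

```python
from string import Template

# The content, stored once and independently of any layout.
article = {
    "title": "Expense policy",
    "body": "Claims must be filed within 30 days.",
}

# Two hypothetical presentations of the same content.
web_template = Template("<h1>$title</h1>\n<p>$body</p>")
text_template = Template("$title\n\n$body")

web_page = web_template.substitute(article)
plain_page = text_template.substitute(article)
print(web_page)
print(plain_page)
```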

Data compression: see compression

Data Mining: Software that enables users to enter complex search criteria and then locates pertinent data from within the stored archive.

Database: a structured set of data held in a computer usually accessible in many different ways

Digitisation: see scanning

DM (Document Management): see document management

DMS (Document Management System): see document management system

Document: one or more pages of written or printed matter, or their electronic equivalent, providing a record of events, identification of ownership, an agreement, etc.

Desktop Publishing: Software providing the ability to scan, edit and incorporate images with other graphics for professional publishing.

Document Distribution: Software application that distributes documents by email or fax, or to an archive

Document retrieval: process by which an archived document is accessed

Document scanner: see scanning

Document Imaging: see scanning

Document Imaging Software: applications that convert paper documents to a digital format, usually through scanning.

Document Management System: electronic system, often proprietary, that scans, stores, secures and retrieves documents received or created by an organization.

Document Management: Software designed to manage all types of documents, including scanned, electronic and paper.

Double feed: an instance of two (or more) pages being fed into a document scanner at the same time, resulting in a single image so that the content of the hidden pages is lost

DPI (dots per inch): a measure of the resolution of a scanned image, referring to the number of pixels per linear inch (strictly this is PPI [pixels per inch], but DPI is widely used in its place)
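
Resolution determines how many pixels a scan contains, and therefore its uncompressed size. The sketch below works this out for an A4 page (210 x 297 mm) at 300 dpi; the page size and resolution are example values only.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at `dpi` (strictly, pixels per inch)."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


width_px, height_px = pixel_dimensions(210, 297, 300)  # A4 portrait at 300 dpi
pixels = width_px * height_px

print(width_px, height_px)  # 2480 3508
print(pixels // 8)          # 1087480 bytes uncompressed if bitonal (1 bit per pixel)
print(pixels)               # 8699840 bytes uncompressed in 8-bit grey scale
```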

Drawing: a document, conventionally larger than A3, containing engineering or architectural plans, schematic designs, etc.; often created by CAD software today

Duplex: either the ability of a scanner to scan both sides of a piece of paper in one pass, or a scanning process in which both sides of a piece of paper are scanned (cf. simplex)

DVD: essentially a new generation of CD with greater storage capacity; several recordable formats exist from different manufacturers (such as DVD-R, DVD+R and DVD-RAM), although most modern drives can read the common formats

DWG: file extension denoting a vector graphics file created by AutoCAD

DXF (Drawing Interchange Format): a standard text-based file format describing CAD drawings, which can be read and interpreted by different CAD packages
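
Because text DXF consists of alternating lines (a numeric group code, then its value), even a few lines of code can read simple geometry from it. The sketch below extracts the start and end points of `LINE` entities using group codes 10/20 (start X/Y) and 11/21 (end X/Y). It ignores everything else, including Z coordinates, layers and binary DXF, and the sample drawing is made up for illustration.

```python
from typing import Iterator


def dxf_pairs(text: str) -> Iterator[tuple[int, str]]:
    """Yield (group code, value) pairs: text DXF alternates code lines and value lines."""
    lines = text.splitlines()
    for i in range(0, len(lines) - 1, 2):
        yield int(lines[i].strip()), lines[i + 1].strip()


def read_lines(text: str) -> list[tuple[float, ...]]:
    """Return (start_x, start_y, end_x, end_y) for every LINE entity."""
    found = []
    current = None  # group codes collected for the LINE being read, if any
    for code, value in dxf_pairs(text):
        if code == 0:  # code 0 starts the next entity or section marker
            if current is not None:
                found.append(tuple(current.get(c, 0.0) for c in (10, 20, 11, 21)))
            current = {} if value == "LINE" else None
        elif current is not None and code in (10, 20, 11, 21):
            current[code] = float(value)
    return found


# A made-up drawing containing a single line from (0, 0) to (10, 5).
SAMPLE = "\n".join([
    "0", "SECTION", "2", "ENTITIES",
    "0", "LINE", "8", "0",
    "10", "0.0", "20", "0.0", "30", "0.0",
    "11", "10.0", "21", "5.0", "31", "0.0",
    "0", "ENDSEC", "0", "EOF",
])
print(read_lines(SAMPLE))  # [(0.0, 0.0, 10.0, 5.0)]
```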

ECM (Enterprise Content Management): the use of electronic techniques for organizing, processing, managing, globalizing, updating and presenting digitally created content.

EDM (Electronic Document Management): document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities. The main components are described below.

  • Metadata

Metadata is typically stored for each document. Metadata may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing for full text search capability, or can be used on its own. Extracted text can also be stored as a component of metadata, stored with the image, or separately as a source for searching document collections.

  • Integration

Many document management systems attempt to integrate document management directly into other applications, so that users may retrieve existing documents directly from the document management system repository, make changes, and save the changed document back to the repository as a new version, all without leaving the application. Such integration is commonly available for office suites and e-mail or collaboration/groupware software. Integration often uses open standards such as ODMA, LDAP, WebDAV and SOAP to allow integration with other software and compliance with internal controls.

  • Capture

Capture of images of paper documents using scanners or multifunction printers. Optical Character Recognition (OCR) software is often used, whether integrated into the hardware or as stand-alone software, to convert digital images into machine-readable text.

  • Indexing

Tracking of electronic documents. Indexing may be as simple as keeping track of unique document identifiers, but it often takes a more complex form, providing classification through the documents' metadata or even through word indexes extracted from the documents' contents. Indexing exists mainly to support retrieval. One area of critical importance for rapid retrieval is the creation of an index topology.

  • Storage

Storage of electronic documents. Storage often includes management of those documents: where they are stored, for how long, migration from one storage medium to another (hierarchical storage management) and eventual document destruction.

  • Retrieval

Retrieval of electronic documents from storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful. Simple retrieval of individual documents can be supported by allowing the user to specify the unique document identifier and having the system use the basic index (or a non-indexed query on its data store) to retrieve the document. More flexible retrieval allows the user to specify partial search terms involving the document identifier and/or parts of the expected metadata; this typically returns a list of documents that match the user's search terms. Some systems also accept a Boolean expression containing multiple keywords or example phrases expected to occur within the documents' contents. Such queries may be answered from previously built indexes, or by more time-consuming searches through the documents' contents, returning a list of potentially relevant documents. See also Document retrieval.

  • Distribution
  • Security
  • Workflow
  • Collaboration
  • Versioning
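
The indexing and retrieval components described above can be illustrated with a toy in-memory index. The sketch below, whose class and method names are hypothetical, stores metadata for each document, builds a word index from each document's extracted text, and answers two kinds of query: a Boolean AND over keywords and an exact match on metadata fields. A real DMS would persist its index and support richer queries (OR, NOT, phrases, partial terms).

```python
import re
from collections import defaultdict


class DocumentIndex:
    """Toy in-memory index: metadata per document plus a word -> document-ID map."""

    def __init__(self) -> None:
        self.metadata: dict[str, dict[str, str]] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str, **metadata: str) -> None:
        """Store metadata and index every word of the document's extracted text."""
        self.metadata[doc_id] = metadata
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)

    def search_all(self, *keywords: str) -> set[str]:
        """Boolean AND: IDs of documents whose text contains every keyword."""
        matches = [self.postings.get(word.lower(), set()) for word in keywords]
        return set.intersection(*matches) if matches else set()

    def search_metadata(self, **criteria: str) -> set[str]:
        """IDs of documents whose metadata matches every given field exactly."""
        return {
            doc_id
            for doc_id, fields in self.metadata.items()
            if all(fields.get(name) == value for name, value in criteria.items())
        }


index = DocumentIndex()
index.add("DOC-001", "Invoice for office chairs", author="jsmith", doc_type="invoice")
index.add("DOC-002", "Purchase order for office desks", author="akhan", doc_type="order")

print(index.search_all("office", "invoice"))    # {'DOC-001'}
print(index.search_metadata(doc_type="order"))  # {'DOC-002'}
```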

Email management: because email is the de facto standard for business communication, simply removing emails from the server and saving them to a repository isn't enough. Email must be classified, stored and destroyed in line with business standards, just like any other document or record.
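
As a small illustration of treating email as a record, the sketch below uses Python's standard `email` package to pull index fields (sender, subject, date) from a message and then applies a classification rule. The sample message and the `classify` rule (anything mentioning an invoice is filed as a finance record) are hypothetical; real policies come from the organisation's own retention schedule.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# A made-up message; the backslash keeps the headers on the first line.
RAW_MESSAGE = """\
From: Jane Smith <jane@example.com>
To: accounts@example.com
Subject: Invoice 1001 overdue
Date: Tue, 05 Mar 2024 09:30:00 +0000

Please find the reminder attached.
"""


def index_fields(raw: str) -> dict[str, str]:
    """Extract header values that can serve as index fields for a stored email."""
    message = message_from_string(raw)
    return {
        "sender": parseaddr(message["From"])[1],
        "subject": message["Subject"],
        "date": parsedate_to_datetime(message["Date"]).date().isoformat(),
    }


def classify(fields: dict[str, str]) -> str:
    """Hypothetical rule: invoice-related mail becomes a finance record."""
    return "finance" if "invoice" in fields["subject"].lower() else "general"


fields = index_fields(RAW_MESSAGE)
print(fields)            # {'sender': 'jane@example.com', 'subject': 'Invoice 1001 overdue', 'date': '2024-03-05'}
print(classify(fields))  # finance
```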

Endorsing: process by which a reference is printed onto a scanned piece of paper to indicate it has been scanned

Flatbed (scanner): usually a small scanner designed for very low volumes of paper, in which the paper remains stationary on a glass bed during scanning; useful for specialised applications but too slow for production scanning

Index: either the information contained within an index field or the process by which the information is put into the index fields

Form: see structured form

Forms Processing: Software designed to extract data from a form as it is scanned into the system. The form is then routed or processed based on the extracted data. Forms can contain typed, coded or handwritten text.

Grey scale: an image made up of greys between black and white with no colour information
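
Colour scans are commonly reduced to grey scale by taking a weighted sum of the red, green and blue values, with green weighted most because the eye is most sensitive to it. The sketch below uses the ITU-R BT.601 luma weights, one common choice among several, and assumes 8-bit channels and an image stored as rows of RGB tuples.

```python
def rgb_to_grey(pixel: tuple[int, int, int]) -> int:
    """8-bit grey value from an 8-bit RGB pixel, using the ITU-R BT.601 luma weights."""
    red, green, blue = pixel
    return round(0.299 * red + 0.587 * green + 0.114 * blue)


def image_to_grey(image: list[list[tuple[int, int, int]]]) -> list[list[int]]:
    """Apply the conversion to every pixel of an image stored as rows of RGB tuples."""
    return [[rgb_to_grey(pixel) for pixel in row] for row in image]


print(image_to_grey([[(255, 255, 255), (255, 0, 0)], [(0, 0, 0), (0, 0, 255)]]))
# [[255, 76], [0, 29]]
```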

ICR (Intelligent Character Recognition): process by which a computer interprets an image to convert scanned handwriting to editable text usually within the confines of a structured form (cf. OCR)

Image: an electronic copy of a scanned page

Imaging: see scanning

Image Enhancement/Editing: Software applications used to improve the quality of scanned or existing digital graphic image files. Quality improvements vary from simple image de-speckling and contrast cleanup to modifying the image with an editing tool.
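
De-speckling removes small specks of noise left by dust or paper texture. The sketch below shows one simple approach for bitonal images (1 = black, 0 = white, stored as a list of rows): any black pixel with no black pixel among its eight neighbours is turned white. Commercial tools use more sophisticated filters and usually let the maximum speck size be configured.

```python
def despeckle(image: list[list[int]]) -> list[list[int]]:
    """Remove isolated black pixels (1s) that have no black 8-neighbours.

    The input is left unchanged; a cleaned copy is returned.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


noisy = [
    [1, 0, 0, 0],  # the pixel in the corner is an isolated speck
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(despeckle(noisy))  # [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
```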

Index fields: search terms taken from each document to identify it within a batch

Indexing: the process of assigning descriptive searchable data to each scanned document.

Integration: merging processes, functions and data between two or more systems so the end result is a seamless and tight-knit singular system.

ISIS (Image and Scanner Interface Specification): scanner driver standard developed by Pixel Translations (www.pixtran.com) for production scanners

JPG (shortened form of JPEG): file extension, common on Windows, denoting an image file in JPEG format

JPEG: (Joint Photographic Experts Group) lossy compression format for photographic images, commonly used to transmit images over the Internet.

Key field: sometimes used as a synonym for index field; also refers to a collection of unique search terms within a set of documents

KM (Knowledge Management): see knowledge management

Knowledge management: system by which information is controlled and made available to all users, usually allowing for collaboration on developing documents within the system

Long-term Archival: Content that must be preserved over decades must be saved to media, such as paper and film-based imaging, with longevity to match.

Metadata: information about the data in a system.

Microfilm: film containing photographs of documents or pages of books and journals; also the process of reproducing images on microfilm

OCR: Optical Character Recognition is a recognition software process that "reads" images and returns text versions. This text can be used to re-create a scanned document or fax for editing, indexing or full-text searches.

Online backup solution: service that copies data over a network to off-site storage, protecting it against local loss or damage and allowing it to be recovered quickly when needed.

OMR: Optical Mark Recognition is used to scan paper-based documents and recognize data or marks in a predefined location.
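
In its simplest form, OMR checks how dark each predefined area of the page is. The sketch below assumes a bitonal image (1 = black) and a hypothetical set of named zones given as (top, left, bottom, right) pixel coordinates, with bottom and right exclusive; a box counts as marked when at least half of its pixels are black. The zone layout and the 0.5 threshold are illustrative only.

```python
Zone = tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right exclusive


def fill_ratio(image: list[list[int]], zone: Zone) -> float:
    """Fraction of black pixels (1s) inside the zone."""
    top, left, bottom, right = zone
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(pixels) / len(pixels)


def read_marks(image: list[list[int]], zones: dict[str, Zone], threshold: float = 0.5) -> dict[str, bool]:
    """Report, for each named zone, whether it is filled in enough to count as marked."""
    return {name: fill_ratio(image, zone) >= threshold for name, zone in zones.items()}


# A tiny made-up answer sheet: the "yes" box is mostly filled, the "no" box is empty.
sheet = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
zones = {"yes": (0, 0, 2, 2), "no": (0, 2, 2, 4)}
print(read_marks(sheet, zones))  # {'yes': True, 'no': False}
```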

Orientation: the position of a page, either horizontal (landscape) or vertical (portrait) depending on the information on the page

Parallel processing:many people performing work on a document simultaneously.

PDF: (Portable Document Format) self-contained, cross-platform document format used for representing documents in a manner that is independent of the original application software, hardware, and operating system.

Pixel: the smallest graphic element of an image that makes up the overall picture

Production Capture: Software products designed to capture high volumes of paper. Scanned images are then released to other software applications for storage and retrieval.

Publishing: Routing of content to the appropriate recipients either through paper or electronically through portals, Intranet, Extranet, Email or fax.

Raster (image): an image made up of a grid of individual pixels that together form the picture

Records Management: Content of long-term business value is deemed a record and managed according to a retention schedule that determines how long the record is kept, based on either outside regulations or internal business practices. Any piece of content can be designated a record.
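
A retention schedule ultimately reduces to date arithmetic: a trigger date (for example, when a record was created or a contract ended) plus the retention period for the record's class gives the earliest date on which the record may be destroyed. The schedule below is hypothetical and is not a statement of actual legal retention periods; a record dated 29 February is moved to 28 February in non-leap years.

```python
from datetime import date

# Hypothetical retention schedule, in years, keyed by record class.
RETENTION_YEARS = {"invoice": 7, "contract": 10}


def add_years(start: date, years: int) -> date:
    """Same day and month `years` later; 29 February falls back to 28 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def disposal_date(record_class: str, trigger: date) -> date:
    """Earliest date on which a record may be destroyed under the schedule."""
    return add_years(trigger, RETENTION_YEARS[record_class])


print(disposal_date("invoice", date(2024, 2, 29)))   # 2031-02-28
print(disposal_date("contract", date(2020, 6, 30)))  # 2030-06-30
```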

Repositories: searchable space where data is stored for retrieval and organisation.

Sarbanes-Oxley Act: US federal law of 2002 that targets the accountability of financial practices in publicly traded companies and assigns liability to CEOs and CFOs for the information released in company financial statements.

Scanner transport: mechanism by which paper is moved through a production scanner past the scan head to obtain an image

Scanner: a device usually attached to a computer used to convert media, such as paper and microfilm, to electronic images

Scanning: the typical way of entering paper-based information into a system, using either a scanner or a multifunction device.

SCSI (Small Computer System Interface): standard method of connecting external devices to a computer; a specification defining both hardware and software standards for such a connection

Simplex:either the ability of a scanner to scan just one side of a piece of paper in one pass, or a scanning process in which one side of a piece of paper is scanned (cf. duplex)

Storage Technology: Ranges from optical and magnetic disks to tape, microfilm, RAID and paper, providing access to data held online or near-line that is needed only infrequently.

Structured form: a document specifically designed for inserting information into pre-defined areas; completed forms are often scanned and the data is extracted by techniques including ICR, OMR, and OCR

Syndication: Distribution of content for reuse and integration into other content.

Systems audit: a log of every action performed in the system, kept as a record for use in an audit.
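
A minimal sketch of an audit log, assuming each entry records when an action happened, who performed it, what it was and which document it concerned. The class offers methods to add and read entries but none to change or delete them; a real system would also write entries to protected storage rather than keep them in memory. All names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable log line: when, who, what, and on which document."""
    timestamp: datetime
    user: str
    action: str
    doc_id: str


@dataclass
class AuditLog:
    """Append-only by design: methods exist to add and read entries, none to change them."""
    _entries: list[AuditEntry] = field(default_factory=list)

    def record(self, user: str, action: str, doc_id: str) -> None:
        """Append an entry stamped with the current UTC time."""
        self._entries.append(AuditEntry(datetime.now(timezone.utc), user, action, doc_id))

    def history(self, doc_id: str) -> list[AuditEntry]:
        """All entries for one document, oldest first."""
        return [entry for entry in self._entries if entry.doc_id == doc_id]


log = AuditLog()
log.record("jsmith", "view", "DOC-001")
log.record("akhan", "edit", "DOC-001")
log.record("jsmith", "delete", "DOC-002")
print([(e.user, e.action) for e in log.history("DOC-001")])  # [('jsmith', 'view'), ('akhan', 'edit')]
```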

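TIFF: (Tagged Image File Format) file format for storing raster images such as photographs and line art; it supports several compression schemes, including CCITT Group 4, which is widely used for bitonal scanned documents.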

TWAIN: standardised software method to link computer applications and image acquisition devices (usually scanners)

Vector (image): an image consisting of a series of instructions giving details about the magnitude and direction of each element of the picture as a whole; usually found in CAD applications

Vectorise: the process of converting a raster image to a vector image using a computer to interpret the scanned image

Web Content Management: technology that addresses the content creation, review, approval and publishing processes of Web-based content. Key features include creation and authoring tools or integrations, input and presentation template design and management, content re-use management and dynamic publishing capabilities.

Workflow: the step-by-step series of tasks or transactions that comprise a business process. In today's document management systems, workflow refers to the automatic routing of electronic documents through these steps.
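
Automatic routing is essentially a set of rules deciding which steps a document passes through. The sketch below routes a scanned invoice through hypothetical steps, adding an approval step only when the amount exceeds an illustrative limit; real workflow engines add queues, users, deadlines and escalation on top of this idea.

```python
APPROVAL_LIMIT = 1000  # hypothetical: larger invoices need a manager's approval


def route(invoice: dict) -> list[str]:
    """Return the ordered steps an invoice is routed through."""
    steps = ["scan", "index"]
    if invoice["amount"] > APPROVAL_LIMIT:
        steps.append("approve")
    steps.append("archive")
    return steps


print(route({"id": "INV-1001", "amount": 250}))   # ['scan', 'index', 'archive']
print(route({"id": "INV-1002", "amount": 5000}))  # ['scan', 'index', 'approve', 'archive']
```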

ZIP: file extension denoting a compressed archive file, often created with tools such as WinZip, or the process of creating a ZIP file
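
ZIP archives can be created and read by many tools; Python's standard `zipfile` module is one example. The sketch below bundles two made-up text documents into an in-memory archive using DEFLATE compression, then lists each file's original and compressed size.

```python
import io
import zipfile

buffer = io.BytesIO()  # in-memory stand-in for a .zip file on disk

with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("invoice-1001.txt", "Invoice 1001\n" * 200)
    archive.writestr("invoice-1002.txt", "Invoice 1002\n" * 200)

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    sizes = {info.filename: (info.file_size, info.compress_size) for info in archive.infolist()}

print(names)  # ['invoice-1001.txt', 'invoice-1002.txt']
for name, (original, packed) in sizes.items():
    print(f"{name}: {original} bytes stored as {packed}")
```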

Zonal OCR: process by which a computer interprets predefined areas of an image to obtain data to populate index fields (see OCR)

Zone: an area, usually rectangular, on an image to be used for further processing, often some form of OCR
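
Zonal OCR combines the two entries above: zones are cut out of the page image and only those areas are passed to the recognition engine, each result populating one index field. In the sketch below the image is a list of pixel rows, the zone names and coordinates are hypothetical, and `ocr` is a placeholder for whatever OCR engine is actually used; the example substitutes a fake engine so the code runs on its own.

```python
from typing import Callable

Image = list[list[int]]          # rows of pixel values
Box = tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right exclusive


def crop(image: Image, top: int, left: int, bottom: int, right: int) -> Image:
    """Cut a rectangular zone out of the page image."""
    return [row[left:right] for row in image[top:bottom]]


def zonal_index(image: Image, zones: dict[str, Box], ocr: Callable[[Image], str]) -> dict[str, str]:
    """Fill one index field per zone by running `ocr` on that zone only."""
    return {field: ocr(crop(image, *box)).strip() for field, box in zones.items()}


def fake_ocr(zone: Image) -> str:
    """Stand-in for a real OCR engine: reports how many dark pixels it was given."""
    return f"{sum(map(sum, zone))} dark pixels"


page = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
result = zonal_index(page, {"header": (0, 0, 2, 4), "footer": (2, 0, 3, 4)}, fake_ocr)
print(result)  # {'header': '3 dark pixels', 'footer': '4 dark pixels'}
```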