Electronic Document Management Definition of terms
Acrobat: suite of software products from Adobe for creating PDF files
ADF (Automatic Document Feed):either an attachment to a document scanner allowing a collection of pages to be scanned one after another, or the process of scanning in this way.
Bar Code:a machine readable code in the form of stripes representing standard alphanumeric characters and punctuation
Bar Code Recognition:Used to identify scanned documents. Data found on bar codes is often used to index and store documents. (See Indexing)
Batch processing:carrying out the same actions on more than one document as a result of a single command
Batch separator:specific pages or identifiers within a batch of documents indicating individual documents
Batch:a collection of documents
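As a minimal sketch of how batch separators divide a batch into individual documents, the Python example below assumes that each scanned page is represented by a string and that separator pages have already been recognised and replaced by a marker. The marker value and the function name are hypothetical and are used for illustration only.

```python
SEPARATOR = "SEPARATOR"  # hypothetical marker standing in for a recognised separator page

def split_batch(pages, separator=SEPARATOR):
    """Split a flat list of scanned pages into documents at each separator page."""
    documents, current = [], []
    for page in pages:
        if page == separator:
            # A separator closes the current document; empty documents are skipped.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents

batch = ["SEPARATOR", "p1", "p2", "SEPARATOR", "p3", "SEPARATOR", "SEPARATOR", "p4"]
docs = split_batch(batch)
print(docs)  # [['p1', 'p2'], ['p3'], ['p4']]
```

Separator pages themselves are dropped from the output, which is usually the desired behaviour because they carry no document content.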
Bates Numbering: process in which a unique, often sequential, code is either applied to an electronic document or printed on the original paper document during scanning
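A sequential Bates code can be sketched in Python as follows. The prefix, the starting number and the zero-padded width are assumptions chosen for the example; real schemes vary between organisations.

```python
def bates_numbers(prefix, start, count, width=6):
    """Return `count` sequential identifiers, e.g. 'ABC000001', 'ABC000002', ..."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]

labels = bates_numbers("ABC", 1, 3)
print(labels)  # ['ABC000001', 'ABC000002', 'ABC000003']
```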
Bitonal: an image consisting of only black or white pixels
Business Process Management: Software that enables implementation of computer-based rules to automate paper-intensive business processes.
CAD (Computer Aided Design): vector-based design packages such as Autodesk's AutoCAD
Capturing: the process by which paper based information is transferred to an electronic medium by use of a scanner or keyboard
CD-R (Compact Disc - Recordable): a type of compact disc that can be written once by a computer's CD writer
CM (Content Management):see content management
Compression: the process by which computer data is re-encoded so that it occupies less space than the original file
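The short Python example below illustrates lossless compression using the standard library's zlib module. The sample data is invented, and zlib is only one of many compression methods; lossy formats such as JPEG behave differently because the original data cannot be recovered exactly.

```python
import zlib

# Highly repetitive data compresses well; real documents will vary.
original = b"INVOICE " * 1000
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed size is much smaller
print(restored == original)            # True: lossless compression restores every byte
```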
COLD/ERM (Computer Output to Laser Disk / Enterprise Report Management) Processing: automatic import of large text reports, which are split into smaller files and indexed.
Content management system: a system in which the content of a document is separated from the way in which it is displayed, so that the display of the content can be controlled dynamically
Data compression: see compression
Data Mining:Software that enables users to enter complex search criteria and then locates pertinent data from within the stored archive.
Database: a structured set of data held in a computer usually accessible in many different ways
Digitisation: see scanning
DM (Document Management): see document management
DMS (Document Management System): see document management
Document: one or more pages of written or printed matter, or their electronic equivalent, providing a record of events, identification of ownership, an agreement, etc.
Desktop Publishing: Software providing the ability to scan, edit and incorporate images with other graphics for professional publishing.
Document Distribution: software application that distributes documents via email or fax, or sends them to an archive
Document retrieval: process by which an archived document is accessed
Document scanner: see scanning
Document Imaging: see scanning
Document Imaging Software: applications that convert paper documents to a digital format, usually through scanning.
Document Management System: proprietary electronic system that scans, stores, secures and retrieves documents received or created by an organization.
Document Management: Software designed to manage all types of documents, including scanned, electronic and paper.
Double feed: an instance of two (or more) pages being fed into a document scanner at the same time, resulting in a single image
DPI (dots per inch): a measure of the resolution of a scanned image referring to the number of pixels per linear inch (used incorrectly in place of PPI [pixels per inch])
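The relationship between resolution, page size and image size can be worked through in a few lines of Python. The page size (US Letter, 8.5 x 11 inches) and the 300 DPI setting are assumptions chosen for the example, and the byte counts ignore file headers, row padding and compression.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image size in bytes, rounded up to a whole byte."""
    return (width_px * height_px * bits_per_pixel + 7) // 8

w, h = pixel_dimensions(8.5, 11, 300)
print(w, h)                          # 2550 3300
print(uncompressed_bytes(w, h, 1))   # 1051875 (bitonal, 1 bit per pixel)
print(uncompressed_bytes(w, h, 8))   # 8415000 (grey scale, 8 bits per pixel)
```

Doubling the resolution quadruples the pixel count, which is why production scanning usually settles on the lowest resolution that still gives legible images.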
Drawing:a document, conventionally larger than A3, containing engineering or architectural plans, schematic designs, etc; often created by CAD software today
Duplex:either the ability of a scanner to scan both sides of a piece of paper in one pass, or a scanning process in which both sides of a piece of paper are scanned (cf. simplex)
DVD: essentially a new generation of CD with a greater storage capacity; there are many different formats by different manufacturers although the majority of readers can read all formats
DWG: file extension denoting a vector graphics file created by AutoCAD
DXF (Drawing Interchange File):a standard text file containing details of CAD drawings which can be read and interpreted by different CAD packages
ECM (Enterprise Content Management): the use of electronic techniques for organizing, processing, managing, globalizing, updating and presenting digitally created content.
EDM (Electronic Document Management): document management systems commonly provide storage, versioning, metadata and security, as well as indexing and retrieval capabilities. These components are described below.
Metadata is typically stored for each document. Metadata may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing for full text search capability, or can be used on its own. Extracted text can also be stored as a component of metadata, stored with the image, or separately as a source for searching document collections.
Many document management systems attempt to integrate document management directly into other applications, so that users may retrieve existing documents directly from the document management system repository, make changes, and save the changed document back to the repository as a new version, all without leaving the application. Such integration is commonly available for office suites and e-mail or collaboration/groupware software. Integration often uses open standards such as ODMA, LDAP, WebDAV and SOAP to allow integration with other software and compliance with internal controls.
Capture: accepting and processing images of paper documents from scanners or multifunction printers. Optical Character Recognition (OCR) software is often used, whether integrated into the hardware or as stand-alone software, to convert digital images into machine-readable text.
Indexing: tracking electronic documents. Indexing may be as simple as keeping track of unique document identifiers, but often it takes a more complex form, providing classification through the documents' metadata or even through word indexes extracted from the documents' contents. Indexing exists mainly to support retrieval. One area of critical importance for rapid retrieval is the creation of an index topology.
Storage: storing electronic documents. Storage often includes management of those same documents: where they are stored, for how long, migration of the documents from one storage medium to another (hierarchical storage management) and eventual document destruction.
Retrieval: retrieving electronic documents from storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful. Simple retrieval of individual documents can be supported by allowing the user to specify the unique document identifier, and having the system use the basic index (or a non-indexed query on its data store) to retrieve the document. More flexible retrieval allows the user to specify partial search terms involving the document identifier and/or parts of the expected metadata. This would typically return a list of documents which match the user's search terms. Some systems provide the capability to specify a Boolean expression containing multiple keywords or example phrases expected to exist within the documents' contents. The retrieval for this kind of query may be supported by previously built indexes, or may perform more time-consuming searches through the documents' contents to return a list of the potentially relevant documents. See also Document retrieval.
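The indexing and retrieval components described above can be illustrated with a small word index in Python. This is a sketch under stated assumptions, not how any particular product works: the document identifiers, the sample text and the function names are hypothetical, and the text is assumed to have been extracted already (for example by OCR).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(documents):
    """Build a word index mapping each word to the identifiers of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, *keywords):
    """Boolean AND: documents that contain every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

def search_any(index, *keywords):
    """Boolean OR: documents that contain at least one keyword."""
    return set().union(*(index.get(k.lower(), set()) for k in keywords))

documents = {
    "DOC-001": "Invoice from Acme for scanner maintenance",
    "DOC-002": "Purchase order for a new document scanner",
    "DOC-003": "Invoice from Acme for paper supplies",
}
index = build_index(documents)

print(sorted(search_all(index, "invoice", "acme")))    # ['DOC-001', 'DOC-003']
print(sorted(search_all(index, "invoice", "scanner"))) # ['DOC-001']
print(sorted(search_any(index, "order", "supplies")))  # ['DOC-002', 'DOC-003']
```

Building the index once at storage time is what makes keyword queries fast; without it, each query would have to scan the contents of every document.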
Email management: Because email is the de facto standard for business communication, removing emails from the server and saving them to a repository is not enough. Email must be classified, stored and destroyed in line with business standards, just like any other document or record.
Endorsing:process by which a reference is printed onto a scanned piece of paper to indicate it has been scanned
Flatbed (scanner): usually a small scanner designed for very low volumes of paper in which the paper remains stationary on a glass bed during scanning; used for specialised applications but is very slow for production scanning
Index:either the information contained within an index field or the process by which the information is put into the index fields
Form: see structured form
Forms Processing: Software designed to extract data from a form as it is scanned into the system. The form is then routed or processed, based on the extracted data. Forms can contain typed, coded or handwritten text.
Grey scale:an image made up of greys between black and white with no colour information
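The difference between a grey scale image and a bitonal image can be shown by thresholding, one simple way of converting the former to the latter. The sketch below assumes 8-bit grey values (0 = black, 255 = white), a fixed threshold of 128, and an output convention of 0 for black and 1 for white; scanners and imaging software often use more adaptive methods.

```python
def to_bitonal(grey_rows, threshold=128):
    """Convert rows of 8-bit grey values to rows of 0 (black) and 1 (white)."""
    return [[1 if value >= threshold else 0 for value in row] for row in grey_rows]

grey = [
    [0, 127, 128, 255],
    [200, 50, 130, 10],
]
bitonal = to_bitonal(grey)
print(bitonal)  # [[0, 0, 1, 1], [1, 0, 1, 0]]
```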
ICR (Intelligent Character Recognition): process by which a computer interprets an image to convert scanned handwriting to editable text usually within the confines of a structured form (cf. OCR)
Image: an electronic copy of a scanned page
Imaging:see scanning
Image Enhancement/Editing: Software applications used to improve the quality of scanned or existing digital graphic image files. Quality improvements vary from simple image de-speckling and contrast cleanup to modifying the image with an editing tool.
Index fields: search terms taken from each document to identify it within a batch
Indexing: the process of assigning descriptive searchable data to each scanned document.
Integration: merging processes, functions and data between two or more systems so that they behave as a single seamless system.
ISIS (Image and Scanner Interface Specification): scanner driver standard developed by Pixel Translations (www.pixtran.com) for production scanners
JPG (shortened form of JPEG): Windows file extension denoting an image file in a JPEG format
JPEG: (Joint Photographic Experts Group) compression format for photographic images, commonly used to transmit images over the Internet.
Key field: sometimes used as a synonym for index field, also refers to a collection of unique search terms within a set of documents
KM (Knowledge Management): see knowledge management
Knowledge management: system by which information is controlled and made available to all users, usually allowing for collaboration on developing documents within the system
Long-term Archival: Content that must be preserved over decades must be saved to media, such as paper and film-based imaging, with longevity to match.
Metadata: information about the data in a system.
Microfilm: film containing photographs of documents or pages of books and journals; the process of reproducing images on microfilm
OCR: Optical Character Recognition is a software process that "reads" images and returns text versions. This text can be used to re-create a scanned document or fax for editing, indexing or full-text searches.
Online backup solution: a service that copies data over a network to a remote facility so that it can be recovered quickly after loss or corruption.
OMR:Optical Mark Recognition is used to scan paper-based documents and recognize data or marks in a predefined location.
Orientation: the position of a page, either horizontal (landscape) or vertical (portrait) depending on the information on the page
Parallel processing:many people performing work on a document simultaneously.
PDF:(Portable Document Format) self-contained, cross-platform document format used for representing documents in a manner that is independent of the original application software, hardware, and operating system.
Pixel:the smallest graphic element of an image that makes up the overall picture
Production Capture: Software products designed to capture high volumes of paper. Scanned images are then released to other software applications for storage and retrieval.
Publishing: Routing of content to the appropriate recipients either through paper or electronically through portals, Intranet, Extranet, Email or fax.
Raster (image): an image made up of individual pixels that together form the picture
Records Management: Content of long-term business value is deemed a record and managed according to a retention schedule, which determines how long a record is kept based on either outside regulations or internal business practices. Any piece of content can be designated a record.
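A retention schedule can be sketched as a simple lookup from record type to retention period. The record types and the periods in years below are hypothetical values for illustration and do not reflect any particular regulation.

```python
from datetime import date

# Hypothetical retention schedule: record type -> years to keep.
RETENTION_YEARS = {"invoice": 7, "contract": 10}

def disposal_date(record_type, created, schedule=RETENTION_YEARS):
    """Return the date on which a record reaches the end of its retention period."""
    years = schedule[record_type]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return created.replace(year=created.year + years, day=28)

def is_due_for_disposal(record_type, created, today):
    """True once the retention period for the record has elapsed."""
    return today >= disposal_date(record_type, created)

print(disposal_date("contract", date(2015, 6, 1)))                      # 2025-06-01
print(is_due_for_disposal("invoice", date(2015, 1, 1), date(2022, 1, 1)))  # True
```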
Repositories: searchable space where data is stored for retrieval and organisation.
Sarbanes-Oxley Act: US legislation that targets the accountability of financial practices of publicly traded companies and assigns liability to CEOs and CFOs in regard to information released in company financials.
Scanner transport:mechanism by which paper is moved through a production scanner past the scan head to obtain an image
Scanner:a device usually attached to a computer used to convert media, such as paper and microfilm, to electronic images
Scanning: typical way of entering information into a system, either through a scanner or a Multi-function device.
SCSI (Small Computer Systems Interface): standard method of connecting external devices to a computer; specification defining both hardware and software standards for such a connection
Simplex:either the ability of a scanner to scan just one side of a piece of paper in one pass, or a scanning process in which one side of a piece of paper is scanned (cf. duplex)
Storage Technology: Ranges from optical disks to magnetic media, tape, microfilm, RAID and paper, providing access to online data as well as nearby storage for data that is infrequently needed.
Structured form:a document specifically designed for inserting information into pre-defined areas; completed forms are often scanned and the data is extracted by techniques including ICR, OMR, and OCR
Syndication: Distribution of content for reuse and integration into other content.
Systems audit:a log of every action performed in the system, to present a record in case of auditing.
TIFF: (Tagged Image File Format) file format used for storing raster images such as photographs and line art; it supports several compression schemes.
TWAIN:standardised software method to link computer applications and image acquisition devices (usually scanners)
Vector (image): an image consisting of a series of instructions giving details about the magnitude and direction of each element of the picture as a whole; usually found in CAD applications
Vectorise: the process of converting a raster image to a vector image using a computer to interpret the scanned image
Web Content Management: technology that addresses the content creation, review, approval and publishing processes of Web-based content. Key features include creation and authoring tools or integrations, input and presentation template design and management, content re-use management and dynamic publishing capabilities.
Workflow:the step by step series of tasks or transactions that comprise a business process. In today's document management systems, workflow refers to the automatic routing of electronic documents through these steps.
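Automatic routing through a fixed series of steps can be sketched as below. The step names and the dictionary used to represent a document are hypothetical; real workflow engines typically add branching rules, user assignment and deadlines.

```python
# Hypothetical business process: each document passes through these steps in order.
STEPS = ["capture", "index", "approve", "archive"]

def advance(document):
    """Move a document to the next step; return the new step, or None if it has finished."""
    position = STEPS.index(document["step"])
    if position + 1 < len(STEPS):
        document["step"] = STEPS[position + 1]
        document["history"].append(document["step"])
        return document["step"]
    return None

doc = {"id": "DOC-001", "step": "capture", "history": ["capture"]}
print(advance(doc))  # index
print(advance(doc))  # approve
```

Keeping the history alongside the current step gives a simple record of the route each document has taken.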
ZIP: Windows file extension denoting a compressed file usually created using WinZip, or the process of creating a ZIP file
Zonal OCR: process by which a computer interprets predefined areas of an image to obtain data to populate index fields (q.v. OCR)
Zone: an area, usually rectangular, on an image to be used for further processing, often some form of OCR
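A rectangular zone can be sketched as a crop of the image grid. In the Python example below the image is assumed to be a list of rows, with letters standing in for pixel values so that the result is easy to read; the coordinates and the function name are hypothetical. In zonal OCR the cropped region would then be passed to a recognition engine, which is outside the scope of this sketch.

```python
def crop_zone(image, left, top, width, height):
    """Return the rectangular region of `image` starting at (left, top)."""
    return [row[left:left + width] for row in image[top:top + height]]

# A 4 x 4 "image" with letters standing in for pixel values.
image = [list("ABCD"), list("EFGH"), list("IJKL"), list("MNOP")]
zone = crop_zone(image, left=1, top=1, width=2, height=2)
print(zone)  # [['F', 'G'], ['J', 'K']]
```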

