<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2006 sp2 U (http://www.altova.com) by Carlo Blum (Bibliothèque Nationale) -->
<METS_Profile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.bnl.lu/schemas/mets.profile.xsd">
	<URI LOCTYPE="URN"/>
	<title>BnL profile for digitized newspapers</title>
	<abstract>This profile describes the XML output required for the digitization project of the Bibliothèque nationale de Luxembourg (BnL).
The default XML output consists of a METS file that describes the structure of a printed document. The content files, typically image files, are described by ALTO XML files. This profile explains the core elements of the METS file: the physical structure (image page linking), the logical structure of the document (front, main, back, issues, chapters, paragraphs, ...), the descriptive, administrative and technical metadata, and in particular the descriptive metadata of structure elements.
	</abstract>
	<date>2006-08-11T12:00:00</date>
	<contact>
		<name>Blum Carlo</name>
		<address>Bibliothèque nationale de Luxembourg</address>
		<phone>00352 26 09 59 227</phone>
		<email>carlo.blum@cie.etat.lu</email>
	</contact>
	<related_profile>There are no related profiles.</related_profile>
	<extension_schema>
		<name>METS Schema</name>
		<URI>http://www.bnl.lu/schemas/mets.xsd</URI>
	</extension_schema>
	<extension_schema>
		<name>NISO Data Dictionary: Technical Metadata for Digital Still Images, version 0.2</name>
		<URI>http://www.bnl.lu/schemas/mix.xsd</URI>
		<note>linked relative via basic schema with namespace 'http://www.loc.gov/mix/', extended by jp2 support</note>
	</extension_schema>
	<extension_schema>
		<name>MODS: Metadata Object Description Schema, version 3.0</name>
		<URI>http://www.bnl.lu/schemas/mods.xsd</URI>
		<note>linked relative via basic schema</note>
	</extension_schema>
	<extension_schema>
		<name>The "xml:" Namespace</name>
		<URI>http://www.w3.org/2001/03/xml.xsd</URI>
		<note>linked via MODS schema with namespace 'http://www.w3.org/XML/1998/namespace'</note>
	</extension_schema>
	<extension_schema>
		<name>XML Linking Language (XLink), version 1.0</name>
		<URI>http://www.loc.gov/standards/mods/xlink.xsd</URI>
		<note>linked via MODS schema with namespace 'http://www.w3.org/1999/xlink'</note>
	</extension_schema>
	<extension_schema>
		<name>ALTO: Analyzed Layout and Text Object, version 1.1</name>
		<URI>http://www.bnl.lu/schemas/alto.xsd</URI>
		<note>referenced in ALTO files only</note>
	</extension_schema>
	<description_rules>
	</description_rules>
	<controlled_vocabularies>
		<vocabulary>
			<name>ALTO</name>
			<maintenance_agency>CCS Content Conversion Specialists GmbH</maintenance_agency>
			<URI>http://www.ccs-gmbh.com/alto</URI>
			<context>
				<p>
					ALTO stores the layout information and OCR-recognized text of pages of any kind of printed document, such as books, journals and newspapers.
					ALTO is a standardized XML format for storing layout and content information.
					It is designed to be used as an extension schema to METS.
					METS holds the metadata and structural information, while ALTO holds the content and physical information.
				</p>
			</context>
			<description>
				<p>Schema for the page-based description of printed material.</p>
			</description>
		</vocabulary>
	</controlled_vocabularies>
	<structural_requirements>
		<metsHdr>
			<requirement>
				<p>Each document contains a &lt;metsHdr&gt; element that records the creation date and the last modification date.</p>
				<p>It contains at least one agent with the attribute ROLE="CREATOR" that identifies the software and version used to process the document and create the METS file.</p>
				<p>Further agents recording the versions of transformation scripts are optional.</p>
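				<p>The following fragment is a minimal sketch of a header that would satisfy these rules. The dates and the agent name are placeholders, and the TYPE and OTHERTYPE values are assumptions chosen for illustration; none of them is prescribed by this profile.</p>
				<p><![CDATA[
```xml
<metsHdr xmlns="http://www.loc.gov/METS/"
         CREATEDATE="2006-08-11T12:00:00"
         LASTMODDATE="2006-08-11T12:00:00">
  <!-- Required: at least one agent with ROLE="CREATOR" naming the software and its version -->
  <agent ROLE="CREATOR" TYPE="OTHER" OTHERTYPE="SOFTWARE">
    <name>Example conversion software, version 1.0</name>
  </agent>
</metsHdr>
```
]]></p>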
			</requirement>
		</metsHdr>
		<dmdSec>
			<requirement>
				<p>A conforming METS document must contain at least one &lt;dmdSec&gt; for the global document metadata, referenced by the identifiers "MODSMD_PRINT MODSMD_ELEC".</p>
				<p>It must also contain one &lt;dmdSec&gt; for every article, section, illustration and supplement that is referenced by a &lt;div&gt; element in the logical structure.</p>
				<p>A &lt;dmdSec&gt; always contains one &lt;mdWrap&gt; as its child.</p>
				<p>The &lt;mdWrap&gt; node contains the metadata conforming to MODS.</p>
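				<p>As an illustration only, a descriptive metadata section of this shape might look as follows. Treating "MODSMD_PRINT" as the ID of a single &lt;dmdSec&gt; is an assumption, as are the &lt;xmlData&gt; wrapper, the MDTYPE value and the sample title; the MODS content is reduced to one placeholder element.</p>
				<p><![CDATA[
```xml
<dmdSec xmlns="http://www.loc.gov/METS/" ID="MODSMD_PRINT">
  <!-- Exactly one mdWrap child, holding the MODS record -->
  <mdWrap MDTYPE="MODS">
    <xmlData>
      <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
        <mods:titleInfo>
          <mods:title>Example newspaper title</mods:title>
        </mods:titleInfo>
      </mods:mods>
    </xmlData>
  </mdWrap>
</dmdSec>
```
]]></p>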
			</requirement>
		</dmdSec>
		<amdSec>
			<requirement>
				<p>For each image page an &lt;amdSec&gt; element is generated.</p>
				<p>The physical metadata of the scanned image is stored in MIX format under the path "techMD/mdWrap/xmlData".</p>
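				<p>A sketch of the nesting described above, for one image page. Both ID values and the MDTYPE value are hypothetical, and the MIX content is left as a placeholder because this profile does not list the individual MIX elements.</p>
				<p><![CDATA[
```xml
<amdSec xmlns="http://www.loc.gov/METS/" ID="IMGPARAM00001">
  <techMD ID="IMGPARAM00001TECHMD">
    <mdWrap MDTYPE="NISOIMG">
      <xmlData>
        <mix:mix xmlns:mix="http://www.loc.gov/mix/">
          <!-- MIX elements describing the scanned image go here -->
        </mix:mix>
      </xmlData>
    </mdWrap>
  </techMD>
</amdSec>
```
]]></p>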
			</requirement>
		</amdSec>
		<fileSec>
			<requirement>
				<p>The linked files are grouped by type:</p>
				<p>- one file group for the scanned images [@ID="IMGGRP"]</p>
				<p>- one file group for the ALTO files (containing OCR text) [@ID="ALTOGRP"]</p>
				<p>- one file group for the PDF files [@ID="PDFGRP"]</p>
				<p>- one file group for the original optimized scans [@ID="ORIGIMGGRP"]</p>
				<p>A file ID is generated here for each referenced file. These file IDs are used for identification and linking in the physical and logical structmaps.</p>
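				<p>The grouping above could be sketched as follows for a single page. The file group IDs come from this profile; the file IDs, MIME types and relative paths are hypothetical placeholders.</p>
				<p><![CDATA[
```xml
<fileSec xmlns="http://www.loc.gov/METS/"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileGrp ID="IMGGRP">
    <file ID="IMG00001" MIMETYPE="image/tiff">
      <FLocat LOCTYPE="URL" xlink:href="images/00001.tif"/>
    </file>
  </fileGrp>
  <fileGrp ID="ALTOGRP">
    <file ID="ALTO00001" MIMETYPE="text/xml">
      <FLocat LOCTYPE="URL" xlink:href="text/00001.xml"/>
    </file>
  </fileGrp>
  <fileGrp ID="PDFGRP">
    <file ID="PDF00001" MIMETYPE="application/pdf">
      <FLocat LOCTYPE="URL" xlink:href="pdf/00001.pdf"/>
    </file>
  </fileGrp>
  <fileGrp ID="ORIGIMGGRP">
    <file ID="ORIGIMG00001" MIMETYPE="image/tiff">
      <FLocat LOCTYPE="URL" xlink:href="origimages/00001.tif"/>
    </file>
  </fileGrp>
</fileSec>
```
]]></p>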
			</requirement>
		</fileSec>
		<structMap>
			<requirement>
				<p>physical structmap</p>
				<p>This structmap describes the physical sequence of the document and the linking of its image pages.</p>
				<p>The &lt;structMap&gt; node contains exactly one top-level &lt;div&gt; element representing the document. This element carries the attribute DMDID="MODSMD_ELEC MODSMD_PRINT" to reference the global document metadata.</p>
				<p>The physical structmap is therefore suited to a page-turning interface. It does not reflect structural information (issues, chapters, paragraphs spanning more than one page).</p>
				<p>On the next level each &lt;div&gt; element represents one physical page. These nodes contain parallel &lt;fptr&gt; nodes pointing to the related image (tif), PDF and ALTO files.</p>
				<p>The following attributes are used on each page &lt;div&gt;:</p>
				<p>- attribute "TYPE" contains the value "page" for normal or unspecified pages; for pages created as a result of special treatment (cut-outs, crops) it contains the value "CROPPED_FROM_PAGE"</p>
				<p>- attribute "ORDER" contains automatically incremented values starting at '1'. It reflects the physical sequence of the images.</p>
				<p>- attribute "LABEL" contains the OCR result of the area that has been recognized as the printed page number on this particular page (optional)</p>
				<p>- attribute "ORDERLABEL" contains the page number within the pagination. It is filled in automatically for pages without a printed page number (optional)</p>
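				<p>The rules for the physical structmap can be sketched as follows for a document with one page. The structMap TYPE value, the FILEID values and the attribute values on the page &lt;div&gt; are hypothetical; the FILEID values are assumed to match file IDs generated in the &lt;fileSec&gt;.</p>
				<p><![CDATA[
```xml
<structMap xmlns="http://www.loc.gov/METS/" TYPE="PHYSICAL">
  <!-- Single top-level div representing the whole document -->
  <div DMDID="MODSMD_ELEC MODSMD_PRINT">
    <!-- One div per physical page, with parallel file pointers -->
    <div TYPE="page" ORDER="1" ORDERLABEL="1" LABEL="1">
      <fptr FILEID="IMG00001"/>
      <fptr FILEID="PDF00001"/>
      <fptr FILEID="ALTO00001"/>
    </div>
  </div>
</structMap>
```
]]></p>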
			</requirement>
		</structMap>
		<structMap>
			<requirement>
				<p>logical structmap</p>
				<p>The logical structmap describes the logical structure of the document.</p>
				<p>It provides data such as the separation into issues, chapters and even paragraphs spanning more than one page. This information is used by presentation systems that display data beyond simple page turning.</p>
				<p>Full-text search can be applied within particular structure elements and also within certain zone types such as headlines or captions of illustrations and tables.</p>
				<p>Each &lt;div&gt; element states its node type in the attribute "TYPE", such as "ISSUE" or "HEADLINE". These types are defined by the newspaper.xsd schema.</p>
				<p>
					These basic logical structures need to be flexible enough to represent any document. On the other hand, restrictions are needed to provide consistent documents, definitions and patterns, e.g. for presentation systems.
					An indirect validation is used to verify the logical structure: the logical structmap is transformed into a separate document in which the values of the attribute "TYPE" are used as node names, so that it can be checked against the newspaper schema.
				</p>
				<p>The top-level &lt;div&gt; element represents the document type Newspaper.</p>
				<p>The second-level &lt;div&gt; element describes the root element volume.</p>
				<p>
					All further levels depend on the document structure and are only limited by the indirect structure definitions in the newspaper schema:
					http://www.bnl.lu/schemas/newspaper.xsd
				</p>
				<p>
					The logical structure is based on a few basic types, e.g. section, body, article.
					This simplifies the handling of the data, e.g. loading data into a presentation system for search purposes.
				</p>
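				<p>To illustrate the indirect validation, the first fragment below is a small logical structmap and the second is the document it might be transformed into, with each "TYPE" value used as a node name. The structMap TYPE value, the casing of the type names and the nesting of HEADLINE directly under ISSUE are assumptions; the nesting actually allowed is defined by newspaper.xsd.</p>
				<p><![CDATA[
```xml
<structMap xmlns="http://www.loc.gov/METS/" TYPE="LOGICAL">
  <div TYPE="Newspaper">
    <div TYPE="VOLUME">
      <div TYPE="ISSUE">
        <div TYPE="HEADLINE"/>
      </div>
    </div>
  </div>
</structMap>
```
]]></p>
				<p><![CDATA[
```xml
<Newspaper>
  <VOLUME>
    <ISSUE>
      <HEADLINE/>
    </ISSUE>
  </VOLUME>
</Newspaper>
```
]]></p>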
			</requirement>
		</structMap>
		<structLink>
			<requirement>
				<p>A conforming METS document must not contain a behaviorSec element.
				</p>
			</requirement>
		</structLink>
		<behaviorSec>
			<requirement>
				<p>A conforming METS document must not contain a behaviorSec element.
				</p>
			</requirement>
		</behaviorSec>
	</structural_requirements>
	<technical_requirements>
		<content_files>
			<requirement>
				<p>Each page must be described in an ALTO file containing the layout information and OCR results of the related image page.</p>
			</requirement>
			<requirement>
				<p>More detailed information on the required structure and elements of METS and ALTO files is given in: Appendix G - METS and ALTO Requirements Specification</p>
			</requirement>
			<requirement>
				<p>Each page needs at least one referenced image file (tif).</p>
			</requirement>
			<requirement>
				<p>A reference to a PDF file with hidden text must be included for the issue and for each page.</p>
			</requirement>
			<requirement>
				<p>The encoding of the XML files is "UTF-8".</p>
			</requirement>
		</content_files>
		<behavior_files>
			<requirement>
				<p>No behavior files are associated with a conforming document.</p>
			</requirement>
		</behavior_files>
		<metadata_files>
			<requirement>
				<p>No metadata files are associated with a conforming document.</p>
			</requirement>
		</metadata_files>
	</technical_requirements>
	<tool/>
</METS_Profile>

