Standard Generalized Markup Language,
also known as SGML, became an International Organization for Standardization
(ISO) standard in 1986 (ISO 8879). It is used to define the structure of
electronic text files or documents, and it is concerned primarily
with structure rather than with the content of the document.
An SGML document consists of text contained within a series of
fields called elements, which are delimited by markup tags at the
beginning and end of each field. These tags are enclosed in
angle brackets, <>. The start tag and end tag carry
the same name, but the name in the end tag is preceded by a forward
slash, /.
SGML is designed primarily for defining the structure
of electronic documents and not for direct viewing by the user.
But if you did look at an SGML file, what would it look like?
Below is an example of an extract from an SGML document:
<anzmeta>
<title>Eucalypts of Australia: 1996</title>
<abstract>
<p>This data is a compilation of Eucalyptus species site data from all over Australia. </p>
</abstract>
........
</anzmeta>
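As a rough illustration of how a program sees this structure, the sketch below reads the extract and lists its elements by nesting depth. It rests on assumptions: the extract is treated as XML and read with Python's standard xml.etree.ElementTree module, which only works because every element here has an explicit end tag (SGML in general allows tags to be omitted, which an XML parser would reject). The elided portion of the extract is simply left out, and the outline function is a hypothetical helper, not part of any ANZLIC tool.

```python
import xml.etree.ElementTree as ET

# The extract from the text, with the elided portion omitted.
EXTRACT = """<anzmeta>
<title>Eucalypts of Australia: 1996</title>
<abstract>
<p>This data is a compilation of Eucalyptus species site data from all over Australia. </p>
</abstract>
</anzmeta>"""


def outline(element, depth=0):
    """Return element names, indented two spaces per nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(EXTRACT)
structure = outline(root)           # names only: structure, not content
title = root.findtext("title")      # content of one named element

print("\n".join(structure))
print(title)
```

The outline contains only element names, which reflects the point made earlier: the markup describes structure, and the content is reached through it.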
The tagged format of the extract above may look familiar to you, probably
because you have seen another group of marked-up documents
on the World Wide Web: HyperText Markup Language, or HTML,
documents. HTML is an application of
SGML whose tags have been defined specifically for use
on the web. Web documents are written in HTML, which is
recognised and interpreted for display by web browsers.
The elements, order and structure of an SGML file
are defined in a separate file called a Document Type Definition,
or DTD. There is a DTD which defines HTML. A DTD allows different
documents of the same type to be processed in a similar fashion.
The DTD is read and used by programs such as SGML parsers or
indexing programs to check that all the required elements are present
and correctly ordered. The DTD can also be used by display programs
such as word processors or web browsers to present SGML to the
user.
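The kind of check described above, that required elements are present and correctly ordered, can be sketched in a few lines. This is not a DTD parser and not how an SGML parser works internally: the CONTENT_MODEL dictionary below is a hypothetical, greatly simplified stand-in for a DTD, based only on the element names in the earlier extract, and the documents are again treated as XML so that Python's standard library can read them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified content model: each element maps to the child
# elements it requires, in order. A real DTD can express far more
# (optional and repeatable elements, alternatives, attributes).
CONTENT_MODEL = {
    "anzmeta": ["title", "abstract"],
    "abstract": ["p"],
}


def check(element, model=CONTENT_MODEL):
    """Return a list of problems found in `element` and its descendants."""
    problems = []
    required = model.get(element.tag)
    if required is not None:
        found = [child.tag for child in element]
        if found != required:
            problems.append(
                f"<{element.tag}>: expected {required}, found {found}"
            )
    for child in element:
        problems.extend(check(child, model))
    return problems


good = ET.fromstring(
    "<anzmeta><title>T</title><abstract><p>text</p></abstract></anzmeta>"
)
# Same elements, but <abstract> comes before <title>.
bad = ET.fromstring(
    "<anzmeta><abstract><p>text</p></abstract><title>T</title></anzmeta>"
)

print(check(good))
print(check(bad))
```

Because the same model is applied to every document of the type, any number of metadata files can be checked in the same way.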
The ANZLIC metadata DTD v1.1 defines a standard set
of elements and standard structure for text files of ANZLIC metadata
using SGML.
ANZLIC metadata entries are structured: they
contain some 22 data items which are grouped into 10 categories.
This information lends itself to storage in a system which can
handle structured text, and both SGML and database systems are designed
to store and manage this type of information. SGML also provides
a convenient mechanism for exchanging metadata entries, as information
in SGML can be transferred easily from one hardware and software
environment to another. The SGML format can also easily be read
by databases and programs for searching, checking, reporting and
other functions.
SGML also has the potential to be used as a core
component of the Australian Spatial Data Directory (ASDD) which
is part of the Australian Spatial Data Infrastructure (ASDI).
SGML documents can be created directly or output from metadatabases.
These documents can then be indexed, searched and presented to
the user via the World Wide Web using a range of technologies
for searching distributed indexes.
Z39.50 is an international standard, first published
in 1988, which defines a protocol for computer-to-computer
information retrieval. It was originally designed for use in library systems
for retrieval of bibliographic information, but more recently it has
been used for the search and retrieval of information about geographic
datasets. Z39.50 is a network protocol which allows information
to be retrieved from a number of servers on the Internet and the results
to be combined and presented to the user.
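The idea of querying several servers and combining the results can be illustrated with a toy example. This is not an implementation of Z39.50 and uses no Z39.50 library: the NODES dictionary is a hypothetical stand-in for remote directory servers, the record titles are invented for illustration, and search_node and federated_search are names made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote directory nodes: name -> record titles.
NODES = {
    "node_a": ["Eucalypts of Australia: 1996", "Coastal wetlands survey"],
    "node_b": ["Eucalyptus plantation boundaries", "Road centrelines"],
}


def search_node(name, term):
    """Return (node, title) pairs whose title contains the term (case-insensitive)."""
    return [(name, t) for t in NODES[name] if term.lower() in t.lower()]


def federated_search(term):
    """Send the same query to every node, then merge and sort the hits by title."""
    names = sorted(NODES)
    with ThreadPoolExecutor() as pool:
        per_node = list(pool.map(lambda name: search_node(name, term), names))
    merged = [hit for hits in per_node for hit in hits]
    return sorted(merged, key=lambda hit: hit[1])


results = federated_search("eucalypt")
for node, title in results:
    print(f"{title} [{node}]")
```

The user issues one query and receives one combined list, without needing to know which server holds which record.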
Examples of Internet-based data directories which
employ indexed SGML formatted metadata and the Z39.50 search and
retrieval protocol include the US Federal Geographic Data Committee (FGDC)
Geospatial Data Clearinghouse and the
Australian Spatial Data Directory.
The United States has initiated a number of metadata
standards which have gained international recognition, including
the FGDC GEO profile for describing geospatial metadata and the
Global Information Locator Service (GILS) standard for describing
information resources. Both of these standards
use SGML and define their own DTDs. By using consistent
metadata SGML tags wherever possible, together with the Z39.50 protocol,
we greatly enhance the potential for interoperability in directory
searching.
The ISO/TC 211 Working Group on Geographic Information
is also currently working on the development of an international
metadata standard which incorporates SGML as a standard format
for input and output of metadata entries.
The World Wide Web Consortium (W3C), which is responsible for coordinating
the development of web standards, has just released XML v1.0. XML
is the "eXtensible Markup Language" (extensible because
it is not fixed like HTML). XML is designed to bring the benefits
of SGML to the web, namely the ability to handle large and complex
documents and the ability to define your own class of documents
with their own unique structure.
XML is a "cut-down" version of SGML and
is fully compliant with the ISO SGML standard. The goal is to
enable XML documents to be served, received and processed on the
web in the way that HTML documents are today.
The ANZLIC metadata SGML DTD v1.1 is also XML v1.0
compliant. Future versions of web browsers such as Internet Explorer
and Netscape are expected to support XML documents, which would enable ANZLIC
metadata in XML to be viewed and marked up directly by web browsers.
As far as possible, the DTD reflects the structure
of the ANZLIC Core Elements as outlined in the Metadata Guidelines
- Version 1.0, July 1996. At times the DTD departs
from this structure to facilitate the addition of elements for jurisdictional
and thematic directories; the content model (schema), however, is
unchanged.
The DTD has a number of key elements which need to
be highlighted.
One of the advantages of this approach is that interoperability between ANZLIC and FGDC systems and SGML documents is enhanced.
The ANZLIC unique dataset ID has been included to facilitate identification of metadata records and exchange of metadata between directory systems.
There are also four additional elements, the Bounding Coordinates, which give summary-level information on the geographic coverage of the dataset. They can be used when performing spatial searches on the SGML documents across the Internet.
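A spatial search on four bounding coordinates usually comes down to a rectangle-overlap test. The sketch below shows one way to write it, under stated assumptions: coordinates are in decimal degrees, boxes do not cross the 180 degree meridian, and the BoundingBox class, its field names and the sample values are all hypothetical and are not taken from the ANZLIC DTD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Four bounding coordinates in decimal degrees (assumed convention)."""
    north: float
    south: float
    east: float
    west: float

    def intersects(self, other):
        """True if the two boxes overlap or touch."""
        return (
            self.west <= other.east
            and other.west <= self.east
            and self.south <= other.north
            and other.south <= self.north
        )


# Made-up dataset extents, for illustration only.
datasets = {
    "dataset_a": BoundingBox(north=-39.0, south=-44.0, east=149.0, west=143.5),
    "dataset_b": BoundingBox(north=-11.0, south=-13.0, east=132.0, west=130.0),
}

# The area the user is searching.
query = BoundingBox(north=-42.0, south=-43.5, east=148.0, west=146.5)

matches = [name for name, box in datasets.items() if box.intersects(query)]
print(matches)
```

Because only four numbers per record are compared, a directory can screen many metadata entries quickly before retrieving any of them in full.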
The ANZLIC fields "Custodian" and "Jurisdiction" together identify the key organisation responsible for the data. This concept of custodianship is unique to Australia. These fields do not map exactly to the FGDC and GILS field "Originator", but "Originator" is a key search field across international directories. For implementation reasons, an additional element, origin, made up of custod and jurisdic, has been proposed.
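One possible reading of the proposed origin element is a parent element with custod and jurisdic as its children. The sketch below builds such an element and derives a single searchable string from it. The nesting is an assumption (the text says only that origin is "made up of" the two elements), the values are placeholders, and make_origin and originator_key are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def make_origin(custodian, jurisdiction):
    """Build an <origin> element with <custod> and <jurisdic> children (assumed nesting)."""
    origin = ET.Element("origin")
    ET.SubElement(origin, "custod").text = custodian
    ET.SubElement(origin, "jurisdic").text = jurisdiction
    return origin


def originator_key(origin):
    """Combine both parts into one string a directory could index as an originator."""
    return f"{origin.findtext('custod')} ({origin.findtext('jurisdic')})"


# Placeholder values, not real ANZLIC entries.
origin = make_origin("Example Mapping Agency", "Example Jurisdiction")
serialized = ET.tostring(origin, encoding="unicode")

print(serialized)
print(originator_key(origin))
```

Grouping the two fields under one element gives international directories a single place to look when they search on the originator.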