About the New Zealand Electronic Text Centre

About the NZETC

The New Zealand Electronic Text Centre collections provide open access to significant New Zealand and Pacific Island texts and materials.

This encompasses both digitised heritage material and born-digital resources. The collections contain over 2,600 texts (around 65,000 pages) which are made available in several formats and, where possible, under a Creative Commons license.

The NZETC was created in 2002 as part of the School of English at Victoria University of Wellington. In 2004 the NZETC became part of the Library at Victoria University, working on a range of digital initiatives alongside the Digital Services team, which managed the library management system, web presence, intranet and specialist application support. In 2010 a new Library Technology Services team was established, taking on strategic and operational responsibilities for all Library technology services and projects, including the NZETC collections.

News

To read news about developments at the NZETC, go to the NZETC blog. Here you can find out what texts have been added to the collection and what projects we have been working on. The blog also acts as an archive, with news stories going back to the creation of the Centre in 2002.

Subscribe to the NZETC Blog RSS Feed

Research papers and reports produced by NZETC staff can be found in VUW's ResearchArchive.

Projects

Projects with partners within the University

Projects with external partners

Library Technology Services no longer seeks this kind of contract work but continues to work as a partner with other libraries and digital content projects including Matapihi, Digital NZ, Creative Commons NZ, and the Kiwi Research Information Service.

Technology

XML and TEI are the document mark-up standards which underpin the work of the NZETC. Information on TEI can be found through the Text Encoding Initiative. Other key technologies used at the NZETC include topic maps and XTM, XSLT, Apache Cocoon and Lucene. More information is given below.

Books, images, and collections are navigable through a dynamically generated semantic framework, which represents the first release of a large-scale XML Topic Map (XTM) site in New Zealand. Users can move around the resources on the site by tracking topics of interest, rather than merely browsing the material linearly or through text searching. In a topic map, web-based resources are grouped around items called "topics", each of which represents some subject of interest. In the NZETC topic map, the topics represent books, chapters, and illustrations, as well as the people and places mentioned in those books.

Topics in a topic map are linked together with hyperlinks called "associations". There can be different types of association in a topic map, representing the different kinds of relationship in the real world. For instance, in the NZETC topic map, the topic which represents a particular person may be linked to a topic which represents a chapter of a book which mentions that person. This association would be labelled to indicate that it represents a "mention". Similarly, the same person's topic might be linked to a particular photograph topic, via a "depiction" association.
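As a rough illustration of the structure described above, the following Python sketch models topics and typed associations in memory. It is not the NZETC's implementation (which used XTM and the TM4J engine); the class names, topic ids and sample data are hypothetical, and associations are simplified to source/target pairs, whereas XTM associations list members that play named roles.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical in-memory model for illustration; not the XTM or TM4J API.


@dataclass(frozen=True)
class Topic:
    id: str    # unique topic identifier (hypothetical ids below)
    type: str  # e.g. "book", "chapter", "illustration", "person", "place"
    name: str


@dataclass(frozen=True)
class Association:
    type: str    # e.g. "mention" or "depiction"
    source: str  # topic id
    target: str  # topic id


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, topic):
        self.topics[topic.id] = topic

    def associate(self, assoc_type, source_id, target_id):
        self.associations.append(Association(assoc_type, source_id, target_id))

    def related(self, topic_id):
        """Group every topic linked to topic_id by association type (undirected)."""
        groups = defaultdict(list)
        for a in self.associations:
            if a.source == topic_id:
                groups[a.type].append(self.topics[a.target])
            elif a.target == topic_id:
                groups[a.type].append(self.topics[a.source])
        return dict(groups)


tm = TopicMap()
tm.add_topic(Topic("p1", "person", "Example Person"))
tm.add_topic(Topic("ch3", "chapter", "Chapter III"))
tm.add_topic(Topic("ill7", "illustration", "Portrait"))
tm.associate("mention", "ch3", "p1")    # the chapter mentions the person
tm.associate("depiction", "ill7", "p1")  # the illustration depicts the person
print({k: [t.name for t in v] for k, v in tm.related("p1").items()})
# {'mention': ['Chapter III'], 'depiction': ['Portrait']}
```

Grouping neighbours by association type is what lets a person's page list "mentioned in" chapters separately from "depicted in" images.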

To construct our topic map, we use XSLT stylesheets to extract metadata from each of our XML text files, and express it in the XTM format. In this way we automatically create hundreds of topic maps, each of which describes one of our texts. We also harvest information about people, places and organisations from an entity authority file which we construct from what is mentioned in our collection. Finally we merge the harvested topic maps together to create a unified topic map which describes our entire website.
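The extract-and-merge process might be sketched as follows. The NZETC used XSLT stylesheets to emit XTM; this Python sketch instead uses `xml.etree.ElementTree` and plain dictionaries so that it runs with the standard library alone. The simplified, namespace-free TEI-like sample, the `key` and `id` attribute conventions, the topic ids and the merge rule (the first map to define a topic wins, so the authority file is passed first) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like sample (no TEI namespace).
SAMPLE_TEXT = """<TEI>
  <teiHeader><fileDesc><titleStmt><title>Example Book</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div type="chapter" id="book1-ch1">
      <head>Chapter I</head>
      <p><persName key="person-a">A. Person</persName> arrived at
         <placeName key="place-wellington">Wellington</placeName>.</p>
    </div>
  </body></text>
</TEI>"""


def extract_topic_map(tei_xml, book_id):
    """Per-text topic map: {"topics": {id: (type, name)}, "associations": {(type, from, to)}}."""
    root = ET.fromstring(tei_xml)
    topics = {book_id: ("book", root.findtext(".//titleStmt/title"))}
    associations = set()
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue
        chapter_id = div.get("id")
        topics[chapter_id] = ("chapter", div.findtext("head"))
        associations.add(("part-of", chapter_id, book_id))
        # Each tagged name becomes a topic plus a "mention" association.
        for tag, topic_type in (("persName", "person"), ("placeName", "place")):
            for el in div.iter(tag):
                key = el.get("key")
                topics.setdefault(key, (topic_type, "".join(el.itertext()).strip()))
                associations.add(("mention", chapter_id, key))
    return {"topics": topics, "associations": associations}


def merge(*maps):
    """Union topic maps; the first map to define a topic id supplies its name."""
    merged = {"topics": {}, "associations": set()}
    for m in maps:
        for topic_id, value in m["topics"].items():
            merged["topics"].setdefault(topic_id, value)
        merged["associations"] |= m["associations"]
    return merged


authority = {"topics": {"person-a": ("person", "Person, A.")}, "associations": set()}
site_map = merge(authority, extract_topic_map(SAMPLE_TEXT, "book1"))
print(site_map["topics"]["person-a"])  # ('person', 'Person, A.')
```

Keying every person and place by a shared identifier is what allows topics from hundreds of separate texts to collapse into a single topic when the maps are merged.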

Each page on the website represents one of these topics, along with any associated topics.
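One way to picture "one page per topic" is a function that renders a topic together with links to its associated topics. This is a hypothetical Python sketch, not the site's actual Cocoon/XSLT templates; the sample data, the `render_topic_page` name and the `/tm/<id>.html` URL scheme are invented for the example.

```python
from html import escape

# Hypothetical merged topic map: topic id -> (type, name), plus typed associations.
TOPICS = {
    "p1": ("person", "Example Person"),
    "ch3": ("chapter", "Chapter III"),
    "ill7": ("illustration", "Portrait of Example Person"),
}
ASSOCIATIONS = [("mention", "ch3", "p1"), ("depiction", "ill7", "p1")]


def render_topic_page(topic_id, topics, associations):
    """Render one page: the topic itself plus a link to every associated topic."""
    topic_type, name = topics[topic_id]
    items = []
    for assoc_type, a, b in associations:
        if topic_id not in (a, b):
            continue
        other = b if a == topic_id else a
        other_type, other_name = topics[other]
        # The /tm/<id>.html URL scheme is an assumption made for this sketch.
        items.append(
            f'<li>{escape(assoc_type)}: '
            f'<a href="/tm/{escape(other)}.html">{escape(other_name)}</a> '
            f'({escape(other_type)})</li>'
        )
    return (
        f"<h1>{escape(name)}</h1>\n<p>Type: {escape(topic_type)}</p>\n"
        "<ul>\n" + "\n".join(items) + "\n</ul>"
    )


print(render_topic_page("p1", TOPICS, ASSOCIATIONS))
```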

The Topic Map framework for the NZETC website was presented at the launch of the new information architecture on 5 May 2005. PowerPoint slides from the presentation are available.

Papers on the NZETC technical infrastructure are available through the Victoria University ResearchArchive.

We use the open source TM4J Topic Map engine for merging and querying our topic map.

We use an XML publishing framework called Apache Cocoon to publish the NZETC website.

Cocoon is a Java servlet and hence it can be deployed on a wide variety of systems. We run Cocoon inside the Apache Tomcat servlet container (the official reference implementation of the Java Servlet specification), using JVM version 1.6 from Sun Microsystems.

Cocoon offers a flexible environment based on the separation of concerns between content, logic and style.

Cocoon can deliver documents in a variety of formats, including HTML, PDF, RTF, SVG, JPEG, PNG, and any other XML-based format. We have also integrated software to produce Microsoft's eBook Reader format.

We use Cocoon to transform our XML texts into readable documents using XSLT stylesheets.

Cocoon can perform these transformations on demand, i.e. when a request is received from a web browser. Each request is handled by reading the appropriate XML document or documents and processing the XML data in a succession of stages, applying first logical and then presentational transformations.

Each stage is distinct and can be effectively managed by different people. Our web designer can edit the look of the site, the web developer can edit the structure of the site, and the text editors can edit the content of the site (the e-texts), all independently of each other.

To install a new text, the editors simply upload the XML document and associated image files to the web server via FTP. The document is then automatically converted to HTML and divided into separate pages for each chapter, and scaled-down thumbnail versions of the JPEG graphics are created using the XML graphics format SVG. To change the overall look of the site, the web designer can upload new design elements, such as CSS stylesheets or new versions of the logo and navigation menu, in the same way. When a document is displayed to the reader, its content is automatically inserted into the new design.
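The staged, on-demand processing described above can be sketched as a chain of small functions, each corresponding to a different role. In Cocoon these stages are configured in a sitemap and implemented mainly with XSLT; this Python version substitutes `xml.etree.ElementTree` and `string.Template` purely to show the separation of concerns. The `DESIGN` template, the `<div type="chapter" id="...">` convention and the function names are assumptions, and thumbnail generation is omitted.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template

# Hypothetical site design owned by the web designer; swapping this template
# changes every page's look without touching the e-texts or the other stages.
DESIGN = Template(
    '<html><head><title>$title</title><link rel="stylesheet" href="site.css"/></head>'
    "<body><nav>$nav</nav><main>$content</main></body></html>"
)


def generate(xml_text):
    """Stage 1: read the e-text (the editors' concern) into an element tree."""
    return ET.fromstring(xml_text)


def split_chapters(root):
    """Stage 2, logical: one page per chapter <div>, as (id, title, body HTML)."""
    pages = []
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue
        body = "".join(f"<p>{escape(''.join(p.itertext()))}</p>" for p in div.findall("p"))
        pages.append((div.get("id"), div.findtext("head", ""), body))
    return pages


def apply_design(pages):
    """Stage 3, presentational: insert each chapter into the current design."""
    nav = "".join(f'<a href="{escape(pid)}.html">{escape(title)}</a>' for pid, title, _ in pages)
    return {
        pid: DESIGN.substitute(title=escape(title), nav=nav, content=body)
        for pid, title, body in pages
    }


def render(xml_text):
    """Run the stages in order, as the pipeline would for each request."""
    return apply_design(split_chapters(generate(xml_text)))


pages = render('<TEI><text><body>'
               '<div type="chapter" id="c1"><head>One</head><p>Hello</p></div>'
               '<div type="chapter" id="c2"><head>Two</head><p>World</p></div>'
               '</body></text></TEI>')
print(sorted(pages))  # ['c1', 'c2']
```

Because the design lives only in `DESIGN`, replacing it changes the look of every chapter page on the next request, while the editors' XML and the chapter-splitting logic stay untouched.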

We use Lucene for searching. Lucene is a full-text search engine written entirely in Java, published by the Apache Software Foundation.
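Full-text search engines such as Lucene are built around an inverted index, which maps each term to the documents that contain it. The following Python sketch shows only that core idea; it is not Lucene's API, and it leaves out Lucene's analyzers, relevance ranking and on-disk index format. The `InvertedIndex` class and the sample chapter ids are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lower-cased word tokens; a crude stand-in for a real analyzer."""
    return [t.lower() for t in TOKEN.findall(text)]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("ch1", "The ship arrived at Wellington")
idx.add("ch2", "Wellington harbour in winter")
print(sorted(idx.search("wellington")))  # ['ch1', 'ch2']
```

Python's `\w` is Unicode-aware, so precomposed macron vowels such as ā count as word characters in this tokenizer.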

Contact Information

Reporting an error

Max Sullivan, Digital Projects Officer, Victoria University of Wellington Library
Email: max.sullivan@vuw.ac.nz
Phone: +64 4 463 7418
Postal Address: The Library, Victoria University of Wellington, P O Box 3438, Wellington 6140, New Zealand

General inquiries

Michael Parry, Digital Initiatives Co-ordinator, Victoria University of Wellington Library
Email: michael.parry@vuw.ac.nz
Phone: +64 4 463 9734
Postal Address: The Library, Victoria University of Wellington, P O Box 3438, Wellington 6140, New Zealand

NZETC Privacy Policy

This website uses Google Analytics, a web analytics service provided by Google, Inc. ("Google"). Google Analytics uses "cookies", which are text files placed on your computer, to help the website analyze how users use the site. The information generated by the cookie about your use of the website (including your IP address) will be transmitted to and stored by Google on servers in the United States. Google will use this information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and internet usage. Google may also transfer this information to third parties where required to do so by law, or where such third parties process the information on Google's behalf. Google will not associate your IP address with any other data held by Google. You may refuse the use of cookies by selecting the appropriate settings on your browser. By using this website, you consent to the processing of data about you by Google in the manner and for the purposes set out above.

Library Technology Services at Victoria University of Wellington makes use of Google Analytics in order to evaluate the usage of our site, and this information is useful in allowing us to:

  • determine which resources are heavily used, and so indicate areas that we should consider focusing future digitisation efforts upon;
  • determine which resources are lightly used, and so indicate areas where we should consider improving navigation and promotion of these resources;
  • measure the usage of particular resources so that we can provide feedback to those parties that are assisting us in making these resources available through financial or other support.

If you wish to opt out of cookies from Google, you can do so on the Google site.

If you would like further information about this privacy policy, please contact us.