Report from January 2000 ZIG Meeting Working Group Session "Z39.50 and the Web"

Working Group Session
January 20, 2000
at San Antonio ZIG Meeting

Z39.50 and the Web

Meeting Report

March 6, 2000

There were approximately 26 attendees; see the list below.

The numbered items below were original agenda items, and these are interspersed (as noted) with additional discussion items.

Purpose/application
In addressing Z39.50 over the web, what application or purpose are we addressing? Bibliographic? Resource discovery?

The consensus of the group was that the question as posed is not very meaningful. The reason to promote Z39.50 over the web is that it is a tool that can potentially allow users to get to more sites that have interesting information and to present results coherently. Another reason is to provide web access to information that is Z39.50 accessible.
Topology
What will be the predominant topological setting for Z39.50 in the web environment? Will the Z39.50/web gateway continue to predominate or will we see the evolution of Z39.50 clients in web browsers?

The consensus was that the gateway topology will predominate for the foreseeable future, but it really is not an important distinction (gateway versus local client). What is important is that there be a good interface to the gateway. The meaningful view is that the so-called gateway is really a remote Z39.50 origin with part of a client interface, where the whole interface is split across the browser and gateway and effected by http-to-http communication. Since the term "client" (or "Z39.50 client") in the Z39.50 context is usually thought to mean the origin together with an interface, the gateway could be thought of as a "remote Z39.50 client". The term "gateway" is unfortunate and misleading (it is used because of the protocol conversion that it performs, between Z39.50 and http). There would be less confusion if the term "remote Z39.50 client" were used instead of "gateway"; however, the term "gateway" is probably too entrenched to change now.

So the distinction mentioned above, "gateway vs. local client", is more meaningfully cast as "remote vs. local client"; for the remote client, if a good interface is provided then the "remoteness" of the client is not significant.

The important point in this discussion is this: features that apply to a local client can apply just as well to a remote client. A web browser, in lieu of incorporating a Z39.50 client, should incorporate interface capability to a "remote client". The reason that existing Z39.50 gateways are seen as less functional than local clients is that this interface capability doesn't yet exist. So our job in the Z39.50 community is not so much to try to get browsers to implement the Z39.50 protocol, but rather to get them to implement the capability to interface with remote Z39.50 clients. If they do, then the gateways (remote clients) will have justification to build more functionality.

Distributed Searching

This discussion was a digression from the agenda, although related to the "gateway" discussion (related because "gateway" is sometimes used to mean "meta-searcher").

However, one of the reasons that distributed search does not fit well in the web is that it cannot accommodate the web's advertisement model, which has become so firmly rooted. So in order for Z39.50 to provide useful distributed searching, it will need to explicitly deal with advertisements.

Distributed Directory of Z39.50 Servers

This discussion was a digression from the agenda.

How can clients "discover" Z39.50 servers?

A mechanism is very much needed for a client to dynamically discover Z39.50 servers. There was some discussion of a broadcast mechanism, and also of a "robot", but these were rejected in favor of a much simpler mechanism, a "distributed Z-directory".

This idea will be further developed in the near future, and a preliminary specification will be drafted. The basic approach is that a Z39.50 server may include (unsolicited) within the Init response an otherInfo item that points the client to one or more servers. The client may then initiate associations with one or more of these servers, possibly for the sole purpose of discovering more servers, and so on. The information to be supplied for each server would be: name, IP address, port, and a URL pointing to information about the server.

The structure will be defined in XML, not ASN.1 (see discussion below).
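
The discovery mechanism described above can be sketched roughly as follows. This is only an illustration: the specification had not yet been drafted, so the XML element names (zDirectory, server, name, address, port, infoUrl) are hypothetical, and fetch_directory is an assumed stand-in for opening an association with a server and reading the directory item out of its Init response. Only the four fields per server (name, IP address, port, URL) come from the discussion itself.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical XML layout; the report fixes only the four fields per server.
SAMPLE = """<zDirectory>
  <server>
    <name>Example Library</name>
    <address>192.0.2.10</address>
    <port>210</port>
    <infoUrl>http://example.org/z3950.html</infoUrl>
  </server>
</zDirectory>"""


def parse_directory(xml_text):
    """Return a list of server descriptions from one directory item."""
    root = ET.fromstring(xml_text)
    servers = []
    for node in root.findall("server"):
        servers.append({
            "name": node.findtext("name"),
            "address": node.findtext("address"),
            "port": int(node.findtext("port")),
            "info_url": node.findtext("infoUrl"),
        })
    return servers


def discover(seed, fetch_directory):
    """Breadth-first discovery starting from a seed (address, port) pair.

    fetch_directory(address, port) is an assumed callback that returns the
    directory XML a server supplied, or None if it supplied nothing.
    """
    seen = {seed}            # servers already queued, to avoid loops
    queue = deque([seed])
    found = []
    while queue:
        address, port = queue.popleft()
        xml_text = fetch_directory(address, port)
        if xml_text is None:
            continue
        for server in parse_directory(xml_text):
            key = (server["address"], server["port"])
            if key not in seen:
                seen.add(key)
                found.append(server)
                queue.append(key)
    return found
```

Tracking the servers already seen matters here because directory pointers can be circular (two servers may each point to the other), and without it the "and so on" step would never terminate.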

Stateness
In the web environment, should Z39.50 operate as a stateful or stateless protocol?
This was deemed to be a dead issue. The merits of statefulness are now appreciated, even by web people. Z39.50 is inherently a stateful protocol, and there is no longer a reason to try to impose statelessness.
Z39.50 Functionality
What Z39.50 functionality is desired for web access?
There was no discussion of this item.
Additional Functionality
What functionality, not currently in Z39.50 (or not easily achieved) is desired? (e.g. ranking, query reformulation)
There was no discussion of this item.
Profiles
What profiles are necessary/relevant? Bath? ZHTTP? ZDSR? Should there be developed a single, unified profile for Z39.50/web access?
This topic was motivated by an observation (from Dan Brickley, who was not at this meeting) that when trying to "sell" Z39.50 to web people, it would be useful to be able to point to a single profile; instead, people are overwhelmed by the large number of existing profiles. Two profiles in the past have been directly aimed at the web: ZDSR and ZHTTP. Neither, however, was completed.
There was no enthusiasm at the meeting for any discussion of "profiling" Z39.50 for the web. However, the discussion turned to the need for an implementors guide. In order for Z39.50 to become useful and accepted within the web, an implementors guide will be important, perhaps critical. We need to identify people who can (and are willing to) help develop a guide. The guide would address two audiences, information providers and information seekers; thus there would be distinct guides for client and server.
Application Level Issues
There were four sub-topics:
1. Query
  Will the type-1 query continue to be the predominant Z39.50 query in the web environment, or will xml query begin to play a stronger role? Zsql? RDF Query?
  There was discussion only of the XML query. Mark Needleman (a W3C AC member and member of the XML Query working group) reported that the requirements document would be available soon. The ZIG should comment on the requirements, and try to ensure that the query definition is compatible with the Z39.50 model, particularly the result set model.
2. GRS-1 or XML
  Will GRS-1 continue to be used as the general syntax for structured data, or will xml be used instead? If xml, will grs-1 schemas be recast in xml, and will there be xml dtds or will xml schemas be used instead?
  There seemed to be strong consensus that grs-1 will continue (in the foreseeable future and possibly indefinitely) to play a strong role as a syntax for structured data while the use of xml to represent structured data will increase.
  In considering grs-1 vs. xml three considerations discussed were functionality, utility/interoperability, and existing investment in grs.
  - Functionality. Some people assert grs functionality is completely achievable with xml, and others assert that it is not. The fact is that an analysis of this question has not yet been undertaken. An analysis will be initiated, led by Sebastian Hammer with help from Ray Denenberg and Mark Needleman (and whoever else volunteers).
  - Utility/interoperability. Sebastian speculates that yes, on the one hand, xml can handle any structured-data requirement of Z39.50 (e.g. variants, hits), but on the other hand, this would mean tailoring xml to Z39.50, by developing a special DTD or schema, and it is not clear that vendors would support it, and this would defeat the purposes of xml (those purposes being (1) universal support, and (2) interoperability). Most everyone felt that this was a very important observation.
  - Existing investment. We must distinguish existing formats, defined for grs-1, from new, to-be-developed formats, where there is no existing investment in grs-1. Neither case is clear-cut but they do present different considerations.
    There are two interesting cases:
    - Holdings schema
      The holdings schema has been fully specified as a grs-1 schema; however, there is little or no investment in implementation yet. On the other hand, vendors are anxious to begin implementation, so re-specifying the schema in xml would cause an unacceptable delay. It is important to note, though, that nobody felt that re-specifying holdings in xml would mean that significant effort had been wasted by specifying it in grs first, for two reasons: (1) the significant intellectual effort that went into the schema development was independent of the abstract syntax, that is, it took place before the development of the grs abstract record structure was begun, and (2) the grs style schema could probably be converted to an xml dtd or schema more easily than creating xml from scratch, so the effort wouldn't really be wasted.
      We decided that an experimental xml definition for the holdings schema would be developed (by Joe Zeeman and Mark Hinnebusch). This effort is not intended to delay implementation though we recognize that some vendors, upon learning of this effort, may choose to wait for the xml definition; that would be a choice that an individual vendor must make; the ZIG takes no position on this. We will note on the grs schema that an experimental xml definition is being developed and that vendors should watch for it.
    - "Distributed Directory of Servers" Format
      (See discussion above.) This isn't strictly a grs vs. xml question but more generally, asn.1 vs. xml. We decided that this format would be developed using xml. The primary reason is that these directory infoItems will find their way to applications that don't directly support Z39.50 (though they may communicate with applications that do).
    On the question of whether xml dtds or schemas will be used: The answer isn't clear, but it really doesn't much matter. For now, developing xml schemas is premature, because the xml schema specification isn't stable, so dtds should be developed instead. In the future, dtds may be converted to schemas without any change to encoding.
3. Marc or XML
  Will MARC continue to be the predominant bibliographic format or will xml be used also? In the gateway setting, will the gateway pull marc records and convert to xml to send to the browser? In the browser/client scenario, will the server send marc records or xml?
  This is of course a much different question than "grs vs. xml". Grs is a home-grown Z39.50 format; marc, though important in the Z39.50 world, is not a Z39.50 format any more than xml is (the Z39.50 protocol supports both). Thus, grs vs. xml is an issue where the technical merits must be considered, but not so for marc vs. xml. The key point in this discussion is that there are no browsers that can render marc, so a marc record is going to have to be converted, and if it is to retain structure, converted to xml. Conversion could occur at the server, gateway, or client.
4. RDF
  How will rdf fit into the Z39.50/ web picture? Can we reconcile the rdf model with the Z39.50 model?
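
The marc-to-xml conversion raised under item 3 can be illustrated with a minimal sketch. The element and attribute names used here (record, field, subfield, tag, ind, code) are hypothetical, chosen only to show that the tag/indicator/subfield structure of a marc record survives the conversion; no particular DTD was discussed at the meeting. The input is assumed to be an already-decoded record, so parsing of the binary marc format is not shown.

```python
import xml.etree.ElementTree as ET


def marc_to_xml(fields):
    """Convert decoded marc fields to an XML string.

    fields is a list of (tag, indicators, subfields) tuples, where
    subfields is a list of (code, value) pairs. Element names are
    illustrative only, not taken from any agreed DTD.
    """
    record = ET.Element("record")
    for tag, indicators, subfields in fields:
        field = ET.SubElement(record, "field", tag=tag, ind=indicators)
        for code, value in subfields:
            sub = ET.SubElement(field, "subfield", code=code)
            sub.text = value
    return ET.tostring(record, encoding="unicode")


# A tiny made-up record: main entry (100) and title (245).
sample = [
    ("100", "1 ", [("a", "Melville, Herman.")]),
    ("245", "10", [("a", "Moby Dick.")]),
]
xml_text = marc_to_xml(sample)
```

The same function could run at the server, the gateway, or the client, which matches the observation that the conversion point is a deployment choice and not a protocol one.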
PDU Issues

Abstract Syntax
Will PDUs continue to be described in ASN.1, or will they be recast in xml?
ASN.1 as an abstract syntax for Z39.50 PDUs will not be abandoned in favor of xml in the foreseeable future. However, it is a near certainty that Z39.50 will never be recast in ASN.1-1994, so if the 1990 version (as is currently used) ever becomes obsolete or no longer usable, then another abstract syntax notation will need to be adopted. Xml would be one possibility; however, there are other possible choices as well.
Encoding
Assuming ASN.1 as the abstract syntax, will BER be used for encoding or XER?
BER and XER will be used. Aside from that there was no discussion of this question.

Transport
Will Z39.50 continue to run directly over TCP, or will it run over HTTP?
There was no discussion of this (not enough time).
Attendees:
1. Jacob Hallén; Royal Library, LIBRIS; Sweden; jacob.hallen@libris.kb.se
2. Jo Rademakers; KULeuven/LIBIS-Net; Belgium; johan.rademakers@libis.kuleuven.ac.be
3. Margery Tibbetts; California Digital Library; USA; margery.tibbetts@ucop.edu
4. Pat Stevens; OCLC; USA; pat_stevens@oclc.org
5. Heikki Levanto; Index Data; Denmark; heikki@indexdata.dk
6. Sebastian Hammer; Index Data; Denmark; quinn@indexdata.dk
7. Chris Peterson; Texas State Library and Archives Commission; USA; chris.peterson@tsl.state.tx.us
8. John Lowery; British Library; United Kingdom; john.lowery@bl.uk
9. Joe Zeeman; CGI; Canada; joe.zeeman@cgi.ca
10. Denise A. Troll; Carnegie Mellon; USA; troll@andrew.cmu.edu
11. Poul Henrik Jørgensen; Danish Library Centre; Denmark; phj@dbc.dk
12. Barbara Shuh; National Library of Canada; Canada; barbara.shuh@nlc-bnc.ca
13. Lennie Stovel; RLG; USA; bl.mds@rlg.org
14. Kevin Gladwell; British Library; UK; kevin.gladwell@bl.uk
15. Luca Lelli; Finsiel S.p.A.; Italy; l.lelli@tlcpi.finsiel.it
16. Andrew Goodchild; DSTC; Australia; andrewg@dstc.edu.au
17. Leif Andresen; Danish National Library Authority; Denmark; lea@bs.dk
18. Kevin C. Marsh; Information Access Institute; USA; Kmarsh@Information.org
19. Ted Koppel; OCLC Inc. (Distributed Systems); USA; koppelt@oclc.org
20. Mark Hinnebusch; FCLA; USA; mark@mark.fcla.ufl.edu
21. Ronald van der Meer; ADLIB Information Systems; The Netherlands; ronald@nl.adlibsoft.com
22. Glenn Evans; SilverPlatter Information; UK; gevans@silverplatter.com
23. Adrian Riley; Fretwell Downing Informatics; UK; adrian.riley@fdgroup.com
24. Thomas Ross; Ameritech Library Services; USA; tross@amlibs.com
25. Mark Needleman; Data Research Associates, Inc; USA; mneedleman@dra.com
26. Ray Denenberg; Library of Congress; USA; rden@loc.gov
