Working Group Session
January 20, 2000
at San Antonio ZIG Meeting
Z39.50 and the Web
Meeting Report
March 6, 2000
The consensus of the group was that the question as posed is not very meaningful. The reason to promote Z39.50 over the web is that it is a tool that can potentially allow users to get to more sites that have interesting information and to present results coherently. Another reason is to provide web access to information that is Z39.50 accessible.
The consensus was that the gateway topology will predominate for the foreseeable future, but that the distinction (gateway versus local client) is not really important. What is important is that there be a good interface to the gateway. The meaningful view is that the so-called gateway is really a remote Z39.50 origin with part of a client interface, where the whole interface is split across the browser and gateway and effected by http-to-http communication. Since the term "client" (or "Z39.50 client") in the Z39.50 context is usually thought to mean the origin together with an interface, the gateway could be thought of as a "remote Z39.50 client". The term "gateway" is unfortunate and misleading (it is used because of the protocol conversion that it performs, between Z39.50 and http). There would be less confusion if the term "remote Z39.50 client" were used instead of "gateway"; however, the term "gateway" is probably too entrenched to change now.
So the distinction mentioned above, "gateway vs. local client", is more meaningfully cast as "remote vs. local client"; for the remote client, if a good interface is provided then the "remoteness" of the client is not significant.
The important point in this discussion is this: features that apply to a local client can apply just as well to a remote client. A web browser, in lieu of incorporating a Z39.50 client, should incorporate interface capability to a "remote client". Existing Z39.50 gateways are seen as less functional than local clients because that interface capability doesn't yet exist. So our job in the Z39.50 community is not so much to try to get browsers to implement the Z39.50 protocol, but rather to get them to implement the capability to interface with remote Z39.50 clients. If browsers do so, then the gateways (remote clients) will have justification to build more functionality.
Distributed Searching
However, one of the reasons that distributed search does not fit well in the web is that it cannot accommodate the web's advertisement model, which has become so firmly rooted. So in order for Z39.50 to provide useful distributed searching, it will need to deal explicitly with advertisements.
Distributed Directory of Z39.50 Servers
A mechanism is very much needed for a client to dynamically discover Z39.50 servers. There was some discussion of a broadcast mechanism and of a "robot", but both were rejected in favor of a much simpler mechanism, a "distributed Z-directory".
This idea will be further developed in the near future, and a preliminary specification will be drafted. The basic approach is that a Z39.50 server may include (unsolicited) within the Init response an otherInfo item that points the client to one or more servers. The client may then initiate associations with one or more of these servers, possibly for the sole purpose of discovering more servers, and so on. The information to be supplied for each server would be: name, IP address, port, and a URL pointing to information about the server.
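The discovery process described above is, in effect, a graph traversal: each server's Init response may name further servers, which may name still more. The following is a minimal sketch in Python, assuming a hypothetical `referrals` callable that stands in for opening an association, sending an Init request, and returning whatever server entries the Init response's otherInfo carried. The entry fields are the four listed above; the names, hosts and URLs in the usage lines are made up for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List, NamedTuple


class ServerEntry(NamedTuple):
    """One directory entry, with the four fields proposed for the Z-directory."""
    name: str
    host: str      # IP address
    port: int
    info_url: str  # URL pointing to information about the server


def discover(seeds: Iterable[ServerEntry],
             referrals: Callable[[ServerEntry], List[ServerEntry]],
             max_servers: int = 100) -> List[ServerEntry]:
    """Breadth-first walk of a distributed Z-directory (illustrative sketch).

    `referrals(entry)` is a hypothetical stand-in for: connect to `entry`,
    send an Init request, and return the server entries found in the
    Init response's otherInfo (an empty list if there were none).
    """
    seen = set()
    found: List[ServerEntry] = []
    queue = deque(seeds)
    while queue and len(found) < max_servers:
        entry = queue.popleft()
        key = (entry.host, entry.port)
        if key in seen:
            continue  # contact each server at most once, so referral cycles terminate
        seen.add(key)
        found.append(entry)
        queue.extend(referrals(entry))
    return found


# Hypothetical servers; the dict simulates what each Init response would refer to.
a = ServerEntry("A", "10.0.0.1", 210, "http://a.example/info")
b = ServerEntry("B", "10.0.0.2", 210, "http://b.example/info")
c = ServerEntry("C", "10.0.0.3", 210, "http://c.example/info")
d = ServerEntry("D", "10.0.0.4", 210, "http://d.example/info")
table = {a: [b, c], b: [a], c: [d], d: []}

print([e.name for e in discover([a], lambda e: table.get(e, []))])  # ['A', 'B', 'C', 'D']
```

The `max_servers` cap is a design choice of this sketch, not part of the proposal: since any server can point anywhere, a client probably wants some bound on how far it follows referrals.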
The structure will be defined in XML, not ASN.1 (see discussion below).
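Because the structure is to be defined in XML, an application that does not itself speak Z39.50 could read directory entries with any generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` to write and read such entries. The element names (`zdirectory`, `server`, `name`, `host`, `port`, `infoURL`) are hypothetical placeholders, since no specification has been drafted yet.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (name, IP address, port, info URL); element names below are hypothetical.
Entry = Tuple[str, str, int, str]


def to_xml(entries: List[Entry]) -> str:
    """Serialize directory entries to an XML string."""
    root = ET.Element("zdirectory")
    for name, host, port, url in entries:
        server = ET.SubElement(root, "server")
        ET.SubElement(server, "name").text = name
        ET.SubElement(server, "host").text = host
        ET.SubElement(server, "port").text = str(port)
        ET.SubElement(server, "infoURL").text = url
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> List[Entry]:
    """Parse directory entries back out of the XML string."""
    root = ET.fromstring(text)
    return [
        (s.findtext("name"), s.findtext("host"),
         int(s.findtext("port")), s.findtext("infoURL"))
        for s in root.findall("server")
    ]


entries = [("Example Catalog", "192.0.2.10", 210, "http://www.example.org/z3950.html")]
assert from_xml(to_xml(entries)) == entries  # round trip preserves every field
```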
Whether Z39.50 should operate as a stateful or a stateless protocol in the web environment was deemed to be a dead issue. The merits of statefulness are now appreciated, even by web people. Z39.50 is inherently a stateful protocol, and there is no longer a reason to try to impose statelessness.
There was no discussion of this item.
There was no discussion of this item.
This topic was motivated by an observation (from Dan Brickley, who was not at this meeting) that when trying to "sell" Z39.50 to web people, it would be useful to be able to point to a single profile; instead, people are overwhelmed by the large number of existing profiles. Two profiles in the past have been aimed directly at the web: ZDSR and ZHTTP. Neither, however, was completed.
There was no enthusiasm at the meeting for any discussion of "profiling" Z39.50 for the web. However, the discussion turned to the need for an implementors guide. In order for Z39.50 to become useful and accepted within the web, an implementors guide will be important, perhaps critical. We need to identify people who are able and willing to help develop a guide. The guide would address two audiences, information providers and information seekers; hence, there would be separate guides for client and server.
There was discussion only of the XML query. Mark Needleman (a W3C AC member and member of the XML Query working group) reported that the requirements document would be available soon. The ZIG should comment on the requirements, and try to ensure that the query definition is compatible with the Z39.50 model, particularly the result set model.
There seemed to be strong consensus that grs-1 will continue (in the foreseeable future and possibly indefinitely) to play a strong role as a syntax for structured data, while the use of xml to represent structured data will increase.
In weighing grs-1 against xml, three considerations were discussed: functionality, utility/interoperability, and existing investment in grs.
There are two interesting cases:
We decided that an experimental xml definition for the holdings schema would be developed (by Joe Zeeman and Mark Hinnebusch). This effort is not intended to delay implementation, though we recognize that some vendors, upon learning of this effort, may choose to wait for the xml definition. That is a choice each individual vendor must make; the ZIG takes no position on it. We will note on the grs schema that an experimental xml definition is being developed and that vendors should watch for it.
On the question of whether xml dtds or schemas will be used: The answer isn't clear, but it really doesn't much matter. For now, developing xml schemas is premature, because the xml schema specification isn't stable, so dtds should be developed instead. In the future, dtds may be converted to schemas without any change to encoding.
The question of marc vs. xml is, of course, a much different question than "grs vs. xml". Grs is a home-grown Z39.50 format; marc, though important in the Z39.50 world, is no more a Z39.50 format than xml is (the Z39.50 protocol supports both). Thus, grs vs. xml is an issue where the technical merits must be considered, but not so for marc vs. xml. The key point in this discussion is that there are no browsers that can render marc, so a marc record is going to have to be converted and, if it is to retain its structure, converted to xml. Conversion could occur at the server, gateway, or client.
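To make "converted to xml, retaining its structure" concrete, here is a minimal sketch of the step a gateway (or server, or client) might perform. It assumes the MARC record has already been parsed into data fields (tag, two indicators, ordered subfields); parsing the raw exchange format is out of scope. The element and attribute names (`record`, `field`, `subfield`, `tag`, `ind1`, `ind2`, `code`) are hypothetical and do not follow any particular published schema, and the sample field is made up.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class DataField(NamedTuple):
    tag: str                          # e.g. "245" (title statement)
    indicators: str                   # exactly two characters, e.g. "10"
    subfields: List[Tuple[str, str]]  # (code, value) pairs in record order


def marc_to_xml(fields: List[DataField]) -> str:
    """Render an already-parsed MARC record as XML that keeps tags,
    indicators and subfield codes, so no structure is lost in conversion."""
    record = ET.Element("record")
    for f in fields:
        field = ET.SubElement(record, "field", tag=f.tag,
                              ind1=f.indicators[0], ind2=f.indicators[1])
        for code, value in f.subfields:
            ET.SubElement(field, "subfield", code=code).text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical one-field record, as a gateway might hold it before sending it on.
fields = [DataField("245", "10", [("a", "Z39.50 and the Web :"), ("b", "meeting report.")])]
xml_text = marc_to_xml(fields)
```

A browser still needs a stylesheet or script to display such XML nicely, but unlike raw marc it can at least be parsed by generic web tools.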
There wasn't any discussion of this question. Nobody had anything interesting to say about it.
ASN.1 as an abstract syntax for Z39.50 PDUs will not be abandoned in favor of xml in the foreseeable future. However, it is a near certainty that Z39.50 will never be recast in ASN.1-1994, so if the 1990 version (as is currently used) ever becomes obsolete or no longer usable, then another abstract syntax notation will need to be adopted. Xml would be one possibility; however, there are other possible choices as well.
BER and XER will be used. Aside from that there was no discussion of this question.
This discussion was a digression from the agenda.
How can clients "discover" Z39.50 servers?
In the web environment, should Z39.50 operate as a stateful or a stateless protocol?
What Z39.50 functionality is desired for web access?
What functionality, not currently in Z39.50 (or not easily achieved), is desired? (e.g. ranking, query reformulation)
What profiles are necessary/relevant? Bath? ZHTTP? ZDSR? Should a single, unified profile be developed for Z39.50/web access?
There were four sub-topics:
Will the type-1 query continue to be the predominant Z39.50 query in the web environment, or will xml query begin to play a stronger role? Zsql? RDF Query?
Will GRS-1 continue to be used as the general syntax for structured data, or will xml be used instead? If xml, will grs-1 schemas be recast in xml, and will there be xml dtds or will xml schemas be used instead?
The holdings schema has been fully specified as a grs-1 schema; however, there is little or no investment in implementation yet. On the other hand, vendors are anxious to begin implementation, so re-specifying the schema in xml would cause an unacceptable delay. It is important to note, though, that nobody felt re-specifying holdings in xml would mean that significant effort had been wasted by having already specified it in grs, for two reasons: (1) the significant intellectual effort that went into the schema development was independent of the abstract syntax, that is, it took place before development of the grs abstract record structure began; and (2) the grs-style schema could probably be converted to an xml dtd or schema more easily than creating xml from scratch, so the effort wouldn't really be wasted.
(See discussion above.) This isn't strictly a grs vs. xml question but, more generally, asn.1 vs. xml. We decided that this format would be developed using xml. The primary reason is that these directory infoItems will find their way to applications that don't directly support Z39.50 (though they may communicate with applications that do).
Will MARC continue to be the predominant bibliographic format, or will xml be used also? In the gateway setting, will the gateway pull marc records and convert them to xml to send to the browser? In the browser/client scenario, will the server send marc records or xml?
How will rdf fit into the Z39.50/web picture? Can we reconcile the rdf model with the Z39.50 model?
Will PDUs continue to be described in ASN.1, or will they be recast in xml?
Assuming ASN.1 as the abstract syntax, will BER be used for encoding or XER?
Will Z39.50 continue to run directly over TCP, or will it run over HTTP?
There was no discussion of this (not enough time).
Attendees: