SRU (Search/Retrieval Using URL)

Protocol Transport (SRU Version 1.2 Specifications)

SECTIONS: SRU via HTTP GET | Encoding Issues | SRU via HTTP POST | SRU via HTTP SOAP (formerly SRW)

SRU VIA HTTP GET

The client MAY send an SRU request via the HTTP GET method. A URL is constructed and sent to the server with fixed parameter names with fixed meanings. When Unicode characters need to be encoded, there are some additional constraints, discussed below.

The response MUST be XML conforming to the response schema of the operation. SRU via HTTP GET can thus be described as the simplest case of XML over HTTP.

An example of what might pass over the wire:

GET /voyager?version=1.2&operation=searchRetrieve&query=dinosaur HTTP/1.1
Host: z3950.loc.gov:7090

Syntax

An SRU request (when transported via HTTP GET) is a URI as described in RFC 3986 (see Note 1). Specifically, it is an HTTP URL as described in section 3.3 of RFC 1738 (with some further notes about character encoding below), and it uses the standard '&'-separated key=value encoding for parameters in the query part of the URI.

The parameters for the query section of the URL (the information following the question mark) of the various operations are described in their own sections.
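As a minimal sketch of this syntax, the following Python builds a request URL from a base URL and a set of parameters. The function name build_sru_url is hypothetical, and the host and path are taken from the example above; percent-encoding of values is delegated to the standard library's urllib.parse.quote.

```python
from urllib.parse import quote


def build_sru_url(base_url, params):
    """Append '&'-separated key=value pairs to base_url after a '?'.

    Hypothetical helper for illustration; not part of the SRU specification.
    """
    # safe="" makes quote() encode everything outside the unreserved set.
    pairs = [f"{name}={quote(str(value), safe='')}" for name, value in params.items()]
    return base_url + "?" + "&".join(pairs)


url = build_sru_url(
    "http://z3950.loc.gov:7090/voyager",
    {"version": "1.2", "operation": "searchRetrieve", "query": "dinosaur"},
)
print(url)
```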

ENCODING ISSUES

The following encoding procedure is recommended, in particular, to accommodate Unicode characters (characters from the Universal Character Set, ISO 10646) beyond U+007F, which are not valid in a URI. This is normally relevant only to the query parameter of the searchRetrieve operation and the scanClause parameter of the scan operation.

  1. Convert the value to UTF-8.
  2. Percent-encode characters as necessary within the value. See RFC 3986 section 2.1.
  3. Construct the URI from the parameter names and encoded values.

Note: In step 2, it is recommended to percent-encode every character in a value that is not in the URI unreserved set, that is, all except alphabetic characters, decimal digits, and the following four special characters: dash (-), period (.), underscore (_), tilde (~). By this procedure some characters may be percent-encoded that do not need to be. For example, a '?' occurring in a value does not need to be percent-encoded, but it is safe to do so. If in doubt, percent-encode.

Examples

Consider the following parameter:

query=dc.title =/word kirkegård

The name of the parameter is "query" and the value is "dc.title =/word kirkegård"

Note that the first '=' (following "query") must not be percent-encoded, as it is used as a URI delimiter; it is not part of a parameter name or value. The second '=' (preceding the '/') must be percent-encoded, as it is part of a value.

The following characters must be percent encoded:

  • the second '=', percent encoded as %3D
  • the '/', percent encoded as %2F
  • the spaces, percent encoded as %20
  • the 'å'. Its UTF-8 representation is C3A5, two octets, and correspondingly it is represented in a URI as two percent-encoded octets, %C3%A5.

The resulting parameter to be sent to the server would then be:

query=dc.title%20%3D%2Fword%20kirkeg%C3%A5rd
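The three-step procedure can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the name encode_value is hypothetical, and it follows the recommendation of percent-encoding every octet outside the unreserved set.

```python
# Octets of the URI unreserved set: letters, digits, '-', '.', '_', '~'.
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def encode_value(value: str) -> str:
    """Hypothetical helper: UTF-8 encode, then percent-encode a parameter value."""
    octets = value.encode("utf-8")  # Step 1: convert the value to UTF-8.
    # Step 2: percent-encode every octet that is not unreserved.
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in octets)


# Step 3: construct the parameter from its name and the encoded value.
# The '=' between name and value is a delimiter and is left as-is.
parameter = "query=" + encode_value("dc.title =/word kirkegård")
print(parameter)
```

The standard library call urllib.parse.quote(value, safe="") is expected to give the same result for this input; the manual version is shown only to make each step visible.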

Server Procedure

  1. Parse received request based on '?', '&', and '=' into component parts: the base URL, and parameter names and values.
  2. For each parameter:
    1. Decode all %-escapes.
    2. Treat the result as a UTF-8 string.
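A minimal sketch of the server procedure, assuming the server is handed the raw request target (path plus query) as a string. The function name parse_sru_request is hypothetical, and this sketch decodes parameter names as well as values, which is harmless for the fixed ASCII names SRU uses.

```python
from urllib.parse import unquote


def parse_sru_request(request_target: str):
    """Hypothetical helper: split a request into base URL and decoded parameters."""
    # Step 1: split on '?', then '&', then the first '=' of each pair.
    base, _, query = request_target.partition("?")
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue  # tolerate empty segments such as a trailing '&'
        name, _, raw_value = pair.partition("=")
        # Step 2: decode all %-escapes and treat the result as UTF-8.
        # errors="strict" rejects octet sequences that are not valid UTF-8.
        params[unquote(name, encoding="utf-8", errors="strict")] = unquote(
            raw_value, encoding="utf-8", errors="strict"
        )
    return base, params


base, params = parse_sru_request(
    "/voyager?version=1.2&operation=searchRetrieve"
    "&query=dc.title%20%3D%2Fword%20kirkeg%C3%A5rd"
)
print(base, params["query"])
```

Splitting before decoding matters: an encoded '=' or '&' inside a value (%3D, %26) must not be mistaken for a delimiter.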

Notes:

1. RFC 1738 is obsoleted by RFC 3986. However, RFC 1738 describes the 'http:' URI scheme; RFC 3986 does not, instead indicating that a separate document will be written to do so, but it has not yet been written. So currently there is no valid, normative reference for the 'http:' URI scheme, and so the obsolete RFC 1738 is referenced. When there is a valid, normative reference, it will be listed here.

SRU VIA HTTP POST

Instead of constructing a URL, the parameters may be sent via POST to the server. The Content-type header MUST be set to 'application/x-www-form-urlencoded'. Compare this with 'text/xml' for SRU via SOAP below; the difference can be used to distinguish the two transports at the same end point.

POST has several benefits over GET for transferring the request to the server. Primarily, the issues with character encoding in URLs are removed, and an explicit character set can be submitted in the Content-type HTTP header. Secondly, very long queries might generate a URL for HTTP GET that is not accepted by some web servers or clients. This length restriction can be avoided by using POST.

The response for SRU via POST is identical to that of SRU via GET: an XML document.

An example of what might be passed over the wire in the request:

POST /voyager HTTP/1.1
Host: z3950.loc.gov:7090
Content-type: application/x-www-form-urlencoded; charset=iso-8859-1
Content-length: 51

version=1.1&operation=searchRetrieve&query=dinosaur
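As a sketch of how a client might assemble such a request, the following Python prepares (but does not send) a POST using only the standard library. The URL is assembled from the host and path in the example above, and the choice of a UTF-8 charset here is an assumption made for illustration, not something the specification mandates.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"version": "1.1", "operation": "searchRetrieve", "query": "dinosaur"}

# urlencode() produces '&'-separated key=value pairs in form-urlencoded style.
body = urlencode(params, encoding="utf-8").encode("ascii")

request = Request(
    "http://z3950.loc.gov:7090/voyager",
    data=body,
    headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Content-Length": str(len(body)),
    },
    method="POST",
)

print(request.get_method(), len(body))
print(body.decode("ascii"))
```

Passing the prepared object to urllib.request.urlopen would send it; that step is left out so the sketch runs without network access.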

SRU VIA HTTP SOAP

(Note: "SRU via HTTP SOAP" is the former SRW)

SRU via SOAP is a binding to the SOAP recommendation of the W3C. In this transport, the request is encoded in XML and wrapped in some additional SOAP specific elements. The response is the same XML as SRU via GET or POST, but wrapped in additional SOAP specific elements.

The incremental benefits of SRU via SOAP are the ease of structured extensions, web service facilities such as proxying and request routing, and the potential for better authentication systems.

SOAP Requirements

  • Clients and servers MUST support SOAP version 1.1, and MAY support version 1.2 or higher. This requirement is intended to allow as much flexibility in implementation as possible.
  • The service style is 'document/literal'.
  • Messages MUST be inline with no multirefs.
  • The SOAPAction HTTP header may be present, but should not be required. If present, its value MUST be the empty string. It MUST be expressed as:
    SOAPAction: ""
  • As specified by SOAP, for version 1.1 the Content-type header MUST be 'text/xml'. For version 1.2 the header value MUST be 'application/soap+xml'. End points supporting both versions of SOAP and SRU via POST thus have three content-type headers to consider.
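An end point that supports all three transports on POST could distinguish them by media type along the following lines. This is a minimal sketch: the function name classify_transport and the returned labels are hypothetical.

```python
def classify_transport(content_type: str) -> str:
    """Hypothetical helper: map a Content-type header value to an SRU transport."""
    # Ignore parameters such as '; charset=utf-8' and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    transports = {
        "application/x-www-form-urlencoded": "SRU via POST",
        "text/xml": "SRU via SOAP 1.1",
        "application/soap+xml": "SRU via SOAP 1.2",
    }
    return transports.get(media_type, "unsupported")


print(classify_transport("application/x-www-form-urlencoded; charset=iso-8859-1"))
print(classify_transport("text/xml; charset=utf-8"))
print(classify_transport("application/soap+xml"))
```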

The specification tries to adhere to the Web Services Interoperability recommendations.

Parameter Differences

There are some differences regarding the parameters that can be transported via the SOAP binding.

  • The 'operation' request parameter MUST NOT be sent. The operation is determined by the XML constructions employed.
  • The 'stylesheet' request parameter MUST NOT be sent. SOAP prevents the use of stylesheets to render the response.

Example SOAP request:

<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
   <SRW:searchRetrieveRequest xmlns:SRW="http://www.loc.gov/zing/srw/">
    <SRW:version>1.1</SRW:version>
    <SRW:query>dinosaur</SRW:query>
    <SRW:startRecord>1</SRW:startRecord>
    <SRW:maximumRecords>1</SRW:maximumRecords>
    <SRW:recordSchema>info:srw/schema/1/mods-v3.0</SRW:recordSchema>
   </SRW:searchRetrieveRequest>
  </SOAP:Body>
</SOAP:Envelope>
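A request of this shape could be generated with Python's xml.etree.ElementTree, as in the sketch below. The function name build_search_request is hypothetical; it drops the 'operation' and 'stylesheet' parameters as described under Parameter Differences, and it assumes the caller supplies the remaining parameters in the order the request schema expects.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SRW_NS = "http://www.loc.gov/zing/srw/"

# Parameters that MUST NOT be sent over the SOAP binding.
EXCLUDED = ("operation", "stylesheet")


def build_search_request(params) -> str:
    """Hypothetical helper: wrap searchRetrieve parameters in a SOAP 1.1 envelope."""
    # Use the same prefixes as the example above when serializing.
    ET.register_namespace("SOAP", SOAP_NS)
    ET.register_namespace("SRW", SRW_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    for name, value in params.items():
        if name in EXCLUDED:
            continue  # the operation is implied by the request element itself
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


soap_xml = build_search_request(
    {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": "dinosaur",
        "startRecord": 1,
        "maximumRecords": 1,
        "recordSchema": "info:srw/schema/1/mods-v3.0",
    }
)
print(soap_xml)
```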