SRU (Search/Retrieval Using URL)

CQL: Contextual Query Language (SRU Version 1.2 Specifications)

SECTIONS: Query Syntax | BNF | About Context Sets | Conformance/Base Profile


Note: in version 1.1 CQL stands for "Common Query Language". In version 1.2 it is changed to "Contextual Query Language".

CQL, the Contextual Query Language, is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs and museum collection information. The design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex languages.

Traditionally, query languages have fallen into two camps: powerful, expressive languages that are not easily readable or writable by non-experts (e.g. SQL, PQF, and XQuery); or simple and intuitive languages that are not powerful enough to express complex concepts (e.g. CCL and Google). CQL tries to combine simplicity and intuitiveness of expression for simple, everyday queries with the richness of more expressive languages to accommodate complex concepts when necessary.


Query Syntax

  1. CQL Query
    A CQL query consists of either a single search clause [example 1], or multiple search clauses connected by boolean operators [example 2]. It may have a sort specification at the end, following the 'sortBy' keyword [example 3]. In addition it may include prefix assignments which assign short names to context set identifiers [example 4].

    Examples:
    1. dc.title any fish
    2. dc.title any fish or dc.creator any sanderson
    3. dc.title any fish sortBy dc.date/sort.ascending
    4. > dc = "info:srw/context-sets/1/dc-v1.1" dc.title any fish
  2. Search Clause
    A search clause consists of either an index, relation and a search term [example 1], or a search term by itself [example 2]. If the clause consists of just a term, then the index is treated as 'cql.serverChoice', and the relation is treated as '=' [example 3]. (Treated differently in versions 1.1 and 1.2. See note 1.)

    Examples:
    1. dc.title any fish
    2. fish
    3. cql.serverChoice = fish
  3. Search Term
    Search terms MAY be enclosed in double quotes [example 1], though need not be [example 2]. Search terms MUST be enclosed in double quotes if they contain any of the following characters: < > = / ( ) and whitespace [example 3]. The search term may be an empty string [example 4], but must be present in a search clause. The empty search term has no defined semantics.

    Examples:
    1. "fish"
    2. fish
    3. "squirrels fish"
    4. ""
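
    The quoting rule above can be sketched as a small helper. This is only an illustration: the name quote_term is hypothetical, and escaping an embedded double quote with a backslash is assumed from the charString2 rule in the BNF section.

```python
import re

# Characters that force quoting, per the rule above: < > = / ( ) and whitespace.
_NEEDS_QUOTES = re.compile(r'[<>=/()\s]')


def quote_term(term: str) -> str:
    """Return `term` written as a CQL search term (hypothetical helper).

    Assumption: an embedded double quote is escaped with a backslash.
    Limitation: backslashes already present in `term` are passed through
    unchanged, so a term ending in a backslash is not handled.
    """
    # The empty term must still be present in the clause, so it is written as "".
    if term == "" or '"' in term or _NEEDS_QUOTES.search(term):
        return '"' + term.replace('"', '\\"') + '"'
    return term
```

    With this sketch, a plain word such as fish is returned unchanged, while a term containing a space or one of the listed characters is wrapped in double quotes.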
  4. Index Name
    An index name always includes a base name [example 1] and may also include a prefix [example 2], which determines the context set of which the index is a part. The base name and the prefix are separated by a dot character ('.'). If multiple '.' characters are present, then the first should be treated as the prefix/base name delimiter. If the prefix is not supplied, it is determined by the server.
    Examples:
    1. title any fish
    2. dc.title any fish
  5. Relation
    The relation in a search clause specifies the relationship between the index and search term. It also always includes a base name [example 1] and may also include a prefix providing a context for the relation [example 2]. If a relation does not have a prefix, the context set is 'cql'. If no relation is supplied in a search clause, then = is assumed, which means that the relation is determined by the server. See note 1 regarding version differences.

    Examples:
    1. dc.title any fish
    2. dc.title cql.any fish
  6. Relation Modifiers
    Relations may be modified by one or more relation modifiers. Relation modifiers always include a base name, and may include a prefix for a context set as above [example 1]. If a prefix is not supplied, the context set is 'cql'. Relation modifiers are separated from each other and from the relation by forward slash characters ('/'). Whitespace may be present on either side of a '/' character, but the relation plus modifiers group may not end in a '/' [example 2]. Relation modifiers may also have a comparison symbol and a value. The comparison symbol is any of = < <= > >= <>. The value must obey the same rules for quoting as search terms, above [example 3].
    Examples:
    1. dc.title any/relevant fish
    2. dc.title any/ relevant /cql.string fish
    3. dc.title any/rel.algorithm=cori fish
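
    As a rough illustration of the modifier syntax, the sketch below splits a relation (or boolean) and its modifiers into parts. The function name parse_relation and the tuple layout are hypothetical, and the sketch assumes that quoted modifier values do not themselves contain '/' or a comparison symbol.

```python
import re
from typing import List, Optional, Tuple

# Longer symbols come first so that '<=' is not read as '<' followed by '='.
_COMPARISON = re.compile(r'(<=|>=|<>|=|<|>)')

# (name, comparison symbol or None, value or None)
Modifier = Tuple[str, Optional[str], Optional[str]]


def parse_relation(text: str) -> Tuple[str, List[Modifier]]:
    """Split 'relation/mod1/mod2=value' into the relation and its modifiers."""
    # Whitespace is allowed on either side of '/', so strip each part.
    parts = [p.strip() for p in text.split("/")]
    if any(p == "" for p in parts):
        # Covers a trailing '/', which the syntax does not permit.
        raise ValueError("empty relation or modifier (e.g. trailing '/')")
    relation, raw_modifiers = parts[0], parts[1:]
    modifiers: List[Modifier] = []
    for raw in raw_modifiers:
        pieces = _COMPARISON.split(raw, maxsplit=1)
        if len(pieces) == 1:
            modifiers.append((raw, None, None))
        else:
            name, symbol, value = (p.strip() for p in pieces)
            modifiers.append((name, symbol, value))
    return relation, modifiers
```

    Under these assumptions, parse_relation("any/rel.algorithm=cori") yields the relation 'any' with one modifier whose name is 'rel.algorithm', symbol '=' and value 'cori'.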
  7. Boolean Operators
    Search clauses may be linked by boolean operators. These are: and, or, not and prox [example 1]. Note that not is 'and-not' and must not be used as a unary operator. Boolean operators all have the same precedence; they are evaluated left-to-right. Parentheses may be used to override left-to-right evaluation [example 2].
    Examples:
    1. dc.title any fish or dc.creator any sanderson
    2. dc.title any fish or (dc.creator any sanderson and dc.identifier = "id:1234567")
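
    The effect of equal precedence and left-to-right evaluation can be seen in a small sketch that treats each search clause as a set of matching record ids. The ids and names below are invented for illustration, and prox is left out because plain id sets carry no positional information.

```python
from functools import reduce

# Hypothetical result sets (record ids) for three search clauses.
fish = {1, 2, 3}
sanderson = {3, 4}
ident = {4, 5}

OPS = {
    "and": lambda left, right: left & right,
    "or": lambda left, right: left | right,
    "not": lambda left, right: left - right,  # 'not' means and-not
}


def evaluate(first, rest):
    """Fold (operator, operand) pairs onto `first`, strictly left to right."""
    return reduce(lambda acc, pair: OPS[pair[0]](acc, pair[1]), rest, first)


# fish or sanderson and ident  ->  (fish or sanderson) and ident
flat = evaluate(fish, [("or", sanderson), ("and", ident)])

# fish or (sanderson and ident): parentheses change the grouping.
grouped = fish | (sanderson & ident)
```

    Here flat and grouped differ, which is why parentheses are needed whenever the intended grouping is not the left-to-right one.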
  8. Boolean Modifiers
    Booleans may be modified by one or more boolean modifiers, separated as per relation modifiers with '/' characters. Again, boolean modifiers consist of a base name and may include a prefix determining the modifier's context set [example 1]. If not supplied, then the context set is 'cql'. As per relation modifiers, they may also have a comparison symbol and a value [example 2].
    Examples:
    1. dc.title any fish or/rel.combine=sum dc.creator any sanderson
    2. dc.title any fish prox/unit=word/distance>3 dc.title any squirrel
  9. Proximity Modifiers
    Basic proximity modifiers are defined in the CQL context set. Proximity units 'word', 'sentence', 'paragraph', and 'element' are defined in the CQL context set, and may also be defined in other context sets. Within the CQL set they are explicitly undefined. When defined in another context set they may be assigned specific meaning.

    Thus compare "prox/unit=word" with "prox/xyz.unit=word". In the first, 'unit' is a prox modifier from the CQL set, and as such its values are undefined, so 'word' is subject to interpretation by the server. In the second, 'unit' is a prox modifier defined by the xyz context set, which may assign the unit 'word' a specific meaning.

    The context set xyz may define additional units, for example, 'street':

                              prox/xyz.unit="street"

    Note that this approach, 'prox/xyz.unit="street"', is preferable to 'prox/unit=xyz.street'. In the first case, 'unit' is a modifier defined in the xyz context set, and 'street' is a value defined for that modifier. In the second, 'unit' is a modifier from the cql context set, so its value would have to be one that is defined in the cql context set, yet it is given a value defined in a different set. Pairing a modifier from one set with a value from another is not good practice.
  10. Sorting (See note 2 regarding version differences.)
    Queries may include explicit information on how to sort the result set generated by the search. The sort specification is included at the end, and is separated by a 'sortBy' keyword. The specification consists of an ordered list of indexes, potentially with modifiers, to use as keys on which to sort the result set. If multiple keys are given, then the second and subsequent keys should be used to determine the order of items that would otherwise sort together. Each index used as a sort key has the same semantics as when it is used to search.

    Modifiers may be attached to the index in the same way as to booleans and relations in the main part of the query. These modifiers may be part of any context set, but the CQL context set and the Sort context set are especially important. Whether a modifier may be used in this way should be stated in the description of its semantics, and this is the only time at which modifiers may be attached to indexes. As many types of search also require specification of term order (for example the <, > and within relations), these modifiers are often specified as relation modifiers.

    Examples:
    1. "cat" sortBy dc.title
    2. "dinosaur" sortBy dc.date/sort.descending dc.title/sort.ascending
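
    One way a server might apply an ordered list of sort keys, as in example 2, is sketched below. The records, the field names 'date' and 'title' (standing in for dc.date and dc.title) and the function sort_records are all hypothetical; dates are compared as plain strings.

```python
# Hypothetical records standing in for a result set.
records = [
    {"title": "Stegosaurus", "date": "2001"},
    {"title": "Allosaurus", "date": "2005"},
    {"title": "Triceratops", "date": "2005"},
    {"title": "Brachiosaurus", "date": "2001"},
]


def sort_records(records, keys):
    """Sort by `keys`, a list of (field, descending) pairs, primary key first."""
    result = list(records)
    # Python's sort is stable, so sorting by the last key first and the
    # primary key last lets later keys break ties left by earlier ones.
    for field, descending in reversed(keys):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result


# Corresponds to: sortBy dc.date/sort.descending dc.title/sort.ascending
ordered = sort_records(records, [("date", True), ("title", False)])
titles = [r["title"] for r in ordered]
```

    The second key only matters for records that share a date: the two 2005 records come first, ordered by title, followed by the two 2001 records, also ordered by title.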
  11. Prefix Assignment
    Warning: The use of Prefix Maps is very uncommon.
    A Prefix Map may be used to assign context set names to specific identifiers in order to be sure that the server maps them in a desired fashion. It may occur at any place in the query and applies to anything below the map in the query tree. A prefix assignment is specified by: '>' shortname '=' identifier [example 1]. The shortname and '=' sign may be omitted, in which case it sets a default context set for indexes [example 2].
    Examples:
    1. > dc = "http://deepcustard.org/" dc.custardDepth > 10
    2. > "http://deepcustard.org/" custardDepth > 10
  12. Case Insensitive
    All parts of CQL are case insensitive apart from user supplied search terms, values for modifiers and prefix map identifiers, which may or may not be case sensitive. If any case insensitive part of CQL is specified with both upper and lower case, it is for aesthetic purposes only.
    Examples:
    1. dC.tiTlE any fish
    2. dc.TitlE Any/rEl.algOriThm=cori fish soRtbY Dc.TitlE

Notes:

  1. In version 1.2 the default relation is '=', while in version 1.1, the default relation is 'scr'. In version 1.1 the '=' relation means "adjacency". In version 1.2 the "=" relation from version 1.1 is replaced by new relation 'adj'.
  2. In version 1.1, a sort parameter is included in the searchRetrieve operation. That parameter is dropped in version 1.2 and instead the sort specification becomes part of the CQL query. Additional description of sorting in version 1.1.

BNF

Following is the Backus Naur Form (BNF) definition for CQL. ["::=" represents "is defined as"]

sortedQuery ::= prefixAssignment sortedQuery
| scopedClause ['sortby' sortSpec]
sortSpec ::= sortSpec singleSpec | singleSpec
singleSpec ::= index [modifierList]
Note: The above three assignments are new in version 1.2 to accommodate the sortSpec.
cqlQuery ::= prefixAssignment cqlQuery
| scopedClause
prefixAssignment ::= '>' prefix '=' uri
| '>' uri
scopedClause ::= scopedClause booleanGroup searchClause
| searchClause
booleanGroup ::= boolean [modifierList]
boolean ::= 'and' | 'or' | 'not' | 'prox'
searchClause ::= '(' cqlQuery ')'
| index relation searchTerm
| searchTerm
relation ::= comparitor [modifierList]
comparitor ::= comparitorSymbol | namedComparitor
comparitorSymbol ::= '=' | '>' | '<' | '>=' | '<=' | '<>' | '=='
namedComparitor ::= identifier
modifierList ::= modifierList modifier | modifier
modifier ::= '/' modifierName [comparitorSymbol modifierValue]
prefix, uri, modifierName, modifierValue, searchTerm, index ::= term
term ::= identifier | 'and' | 'or' | 'not' | 'prox' | 'sortby'
identifier ::= charString1 | charString2
charString1 := Any sequence of characters that does not include any of the following:
whitespace
( (open parenthesis)
) (close parenthesis)
=
<
>
'"' (double quote)
/
If the final sequence is a reserved word, that token is returned instead. Note that '.' (period) may be included, and a sequence of digits is also permitted. Reserved words are 'and', 'or', 'not', and 'prox' (case insensitive). When a reserved word is used in a search term, case is preserved.
charString2 := Double quotes enclosing a sequence of any characters except double quote (unless preceded by backslash (\)). Backslash escapes the character following it. The resultant value includes all backslash characters except those releasing a double quote (this allows other systems to interpret the backslash character). The surrounding double quotes are not included.
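
The lexical rules for charString1 and charString2 can be illustrated with a small tokenizer. This is a sketch rather than a reference implementation: the token kinds ('term', 'symbol', 'keyword') and the function name tokenize are invented here, and treating 'sortby' as a keyword alongside the four reserved words is an assumption based on its use in the grammar above.

```python
import re
from typing import List, Tuple

# Assumption: 'sortby' is treated like the reserved words because the
# grammar uses it as a keyword.
RESERVED = {"and", "or", "not", "prox", "sortby"}

_TOKEN = re.compile(r'''
    \s*(?:
        (?P<quoted>"(?:\\.|[^"\\])*")       # charString2: quoted string
      | (?P<symbol><=|>=|<>|==|[=<>()/])    # comparison symbols, parens, slash
      | (?P<bare>[^\s()=<>"/]+)             # charString1: unquoted run
    )
''', re.VERBOSE | re.DOTALL)


def tokenize(query: str) -> List[Tuple[str, str]]:
    """Split a CQL query into (kind, text) tokens."""
    tokens: List[Tuple[str, str]] = []
    query = query.rstrip()
    pos = 0
    while pos < len(query):
        m = _TOKEN.match(query, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        pos = m.end()
        if m.group("quoted") is not None:
            body = m.group("quoted")[1:-1]  # surrounding quotes are dropped
            # Remove only the backslashes that release a double quote.
            tokens.append(("term", body.replace('\\"', '"')))
        elif m.group("symbol") is not None:
            tokens.append(("symbol", m.group("symbol")))
        else:
            word = m.group("bare")
            # Reserved words are matched case-insensitively; case is preserved.
            kind = "keyword" if word.lower() in RESERVED else "term"
            tokens.append((kind, word))
    return tokens
```

Note that a reserved word inside double quotes comes back as an ordinary term, which matches the way quoting is used to search for words such as "and".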


Context Sets

See: List of All Context Sets | CQL Context Set

CQL is so-named ("Contextual Query Language") because it is founded on the concept of searching by semantics or context, rather than by syntax. The same search may be performed in a different way on very different underlying data structures in different servers, but the important thing is that both servers understand the intent behind the query. In order for multiple communities to define their own semantics, CQL uses Context Sets in order to ensure cross-domain interoperability.

Context sets permit CQL users to create their own indexes, relations, relation modifiers and boolean modifiers without fear of choosing the same name as someone else and thereby having an ambiguous query. All of these four aspects of CQL must come from a context set, however there are rules for determining the prevailing default if one is not supplied. Context sets allow CQL to be used by communities in ways which the designers could not have foreseen, while still maintaining the same rules for parsing which allow interoperability.

When defining a new context set, it is necessary to provide a description of the semantics of each item within it. While context sets may contain indexes, relations, relation modifiers and boolean modifiers, there is no requirement that all should be present; in fact it is expected that most context sets will only define indexes.

Each context set has a unique identifier, a URI. When sending the context set in a query, a short form is used. These short names may be sent as a mapping within the query itself, or be published by the recipient of the query in some protocol dependent fashion. The prefix 'cql' is reserved for the base CQL context set, but authors may wish to recommend a short name for use with their set.

An index, relation, or modifier qualified by a context is represented in the form prefix.value, where prefix is a short name for a unique context set identifier.
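
The prefix.value form can be resolved against a table of short names, as in the sketch below. The function resolve and the table PREFIX_MAP are hypothetical; the only URI used is the dc identifier from the query examples, and lowercasing the prefix assumes that short names follow the general case-insensitivity rule.

```python
from typing import Dict, Optional, Tuple

# Short name -> context set identifier. Only the dc entry from the
# examples is listed; a real server would publish its own table.
PREFIX_MAP: Dict[str, str] = {
    "dc": "info:srw/context-sets/1/dc-v1.1",
}


def resolve(name: str,
            prefixes: Dict[str, str],
            default_prefix: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Map 'prefix.value' to (context set URI, value).

    Only the first '.' separates prefix and base name. When no prefix is
    given and no default applies, the URI is None (left to the server).
    """
    prefix, dot, base = name.partition(".")
    if not dot:
        prefix, base = default_prefix, name
    if prefix is None:
        return None, base
    try:
        return prefixes[prefix.lower()], base
    except KeyError:
        raise ValueError(f"unknown context set prefix: {prefix!r}") from None
```

A caller resolving relations or modifiers could pass default_prefix="cql" once the table contains an entry for the cql set, reflecting the default described for those parts of a query.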


Conformance/Base Profile

In order to claim conformance to CQL a server must support one of the following three levels:

Level 0

  1. Must be able to process a term-only query.
    (The term is either a single word or, if it consists of multiple words separated by spaces, the entire search term is quoted. If the term includes quote marks, they must be escaped by preceding them with a backslash, e.g. "raising the \"titanic\"".)
  2. If an unsupported query is supplied, must be able to respond with a diagnostic to say that the query is not supported.

Level 1

  1. Support for Level 0.
  2. Ability to parse both:
    (a) search clauses consisting of 'index relation searchTerm'; and
    (b) queries where search terms are combined with booleans, e.g. "term1 AND term2"
  3. Support for at least one of (a) and (b).

Note that (b) does not necessarily include queries such as:

index relation term1 AND index relation term2

but rather queries where the search clauses are terms-only (do not include index or relation).

Level 2

  1. Support for Level 1.
  2. Ability to parse all of CQL and respond with appropriate diagnostics.

Note that Level 2 does not require support for all of CQL; it requires that the server be able to parse all of CQL (and respond with proper diagnostics for the parts not supported).


Note: Version 1.2 is the current SRU and CQL version. These specifications are for both versions, 1.1 and 1.2, but are oriented to version 1.2 with version 1.1 exceptions annotated. For a full version 1.1 specification see Version 1.1 Archive.