CQL: a Common Query Language
  • What CQL is
  • Motivation
  • Examples and explanation
  • Applications
  • Implementation

CQL: a Common Query Language
Mike Taylor <mike@indexdata.com>

Chapter 1: What CQL is
  • CQL is a query language
    • For humans to type
    • For query forms to generate
    • For translating other languages into
  • The only query language of SRW/SRU
  • Also applicable in other contexts
    • Z39.50 (instead of the Type-1 Query)
    • Query boxes for web searches

Chapter 2: Motivation
  • Most query languages fall into one of two camps:
    • Complex and powerful, but cryptic and hard to learn
      • SQL, Prefix Query Format (PQF), XML Query
    • Easy to learn and use, but lacking in power
      • Google, AltaVista, CCL
  • CQL aims to make simple queries easy, and complex
    queries possible (to paraphrase Larry Wall, of Perl)

Learning curves for query languages
[Figure: axes "Effort in learning query language" and "Power of query that can be expressed"; curves labelled SQL, Google and CQL, added one per slide over three slides.]

Chapter 3: Examples and explanation

CQL features: simple terms
  • Here are some perfectly good CQL queries:
  • fish
  • Churchill
  • dinosaur
  • comp.sources.misc

CQL features: quoting
  • Double-quote marks remove the special meanings of
    special characters like space (which otherwise
    separates tokens) and of keywords such as and and or.
  • "dinosaur"
  • "the complete dinosaur"
  • "ext->u.generic"
  • "and"
  • "the \"nuxi\" problem"
  • (Backslash removes the special meaning of following
    double-quote characters.)
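
The quoting rules above can be sketched as a small tokenizer. This is only an illustration, not code from any CQL implementation: the function name tokenize and the WORD/QUOTED labels are made up for this example, and the sketch handles only whitespace, double quotes and backslash-escaped double quotes.

```python
def tokenize(query):
    """Split a query into (kind, text) tokens, honouring double quotes."""
    tokens = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1                      # whitespace separates tokens
        elif ch == '"':
            i += 1                      # skip the opening quote
            buf = []
            while i < n and query[i] != '"':
                if query[i] == "\\" and i + 1 < n and query[i + 1] == '"':
                    buf.append('"')     # \" is a literal double quote
                    i += 2
                else:
                    buf.append(query[i])
                    i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            i += 1                      # skip the closing quote
            tokens.append(("QUOTED", "".join(buf)))
        else:
            start = i
            while i < n and not query[i].isspace() and query[i] != '"':
                i += 1
            tokens.append(("WORD", query[start:i]))
    return tokens
```

A quoted "and" comes back as a QUOTED token, so a later parsing stage can tell it apart from the boolean keyword and.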

CQL features: booleans
  • The keywords and and or are boolean operators.
  • The keyword not is an and-not binary operator.
  • There is no unary negation operator.
  • Case is not significant, so AND and aNd also work.
  • dinosaur or bird
  • dinosaur not reptile
  • dinosaur and bird and reptile
  • dinosaur and bird or dinobird
  • dinosaur not theropod not ornithischian
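
One way to read these operators is as set operations on the records each operand matches. The sketch below assumes result sets are Python sets of record identifiers; the function name combine and the sample data are hypothetical.

```python
def combine(op, left, right):
    """Combine two result sets with a boolean keyword (case-insensitive)."""
    op = op.lower()
    if op == "and":
        return left & right     # records matching both operands
    if op == "or":
        return left | right     # records matching either operand
    if op == "not":
        return left - right     # and-not: in left but not in right
    raise ValueError(f"unknown boolean: {op}")


# Hypothetical result sets for three terms.
dinosaur = {1, 2, 3, 4}
theropod = {2}
ornithischian = {3}

# dinosaur not theropod not ornithischian, applied left to right
remaining = combine("not", combine("NOT", dinosaur, theropod), ornithischian)
```

Because not is binary, it always needs a left-hand set to subtract from, which is why there is no unary negation.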

CQL features: boolean precedence
  • The and, or and not booleans all have equal
    precedence and are evaluated left-to-right.
  • dinosaur and bird or dinobird
  • MEANS
  • (dinosaur and bird) or dinobird
  • dinosaur or bird and dinobird
  • MEANS
  • (dinosaur or bird) and dinobird
  • NOT
  • dinosaur or (bird and dinobird)

CQL features: parentheses
  • Parentheses may be used to override the default
    left-to-right parsing of boolean operators.
  • dinosaur and (bird or dinobird)
  • dinosaur or (bird and dinobird)
  • (bird or dinosaur) and (feathers or scales)
  • "feathered dinosaur" and (yixian or jehol)
  • (((a and b) or (c not d) not (e or f and g)) and h not i) or j
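
The equal-precedence, left-to-right rule and the parenthesis override can be sketched as a tiny recursive parser. This is a minimal illustration under simplifying assumptions (bare words, quoted strings without escapes, the three booleans and parentheses only); the function name parse and the nested-tuple output format are invented for the example.

```python
import re

BOOLEANS = {"and", "or", "not"}


def parse(query):
    """Parse a query into nested (operator, left, right) tuples."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', query)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression()         # parentheses start a sub-query
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    def expression():
        nonlocal pos
        left = operand()
        # All booleans share one precedence level, so each one is folded
        # onto everything parsed so far, in the order it appears.
        while pos < len(tokens) and tokens[pos].lower() in BOOLEANS:
            op = tokens[pos].lower()
            pos += 1
            left = (op, left, operand())
        return left

    tree = expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree
```

With this sketch, parse("dinosaur or bird and dinobird") groups the or first, while adding parentheses around (bird and dinobird) moves the and into the right-hand operand.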

CQL features: pattern matching
  • There are two pattern-matching characters:
    • * matches any number of characters
    • ? matches any single character
  • A preceding backslash removes their special meaning.
  • dinosaur* matches dinosaurs, dinosauria
  • *sauria matches dinosauria, carnosauria
  • man?raptor matches maniraptor, manuraptor
  • man?raptor* matches the plurals of these
  • "the comp*saur" matches the complete dinosaur
  • char\* matches literal char*
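
A server could implement this masking by translating each term into a regular expression. The sketch below assumes the two CQL masking characters * and ? and backslash escaping; the function name cql_pattern_to_regex is hypothetical, and real servers may well use index-specific matching instead.

```python
import re


def cql_pattern_to_regex(term):
    """Translate a term with * and ? wildcards into a compiled regex."""
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\" and i + 1 < len(term):
            parts.append(re.escape(term[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "*":
            parts.append(".*")          # any number of characters
        elif ch == "?":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


# Usage: the whole value must match, so use fullmatch().
is_match = cql_pattern_to_regex("man?raptor*").fullmatch("manuraptors") is not None
```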

CQL features: word anchoring
  • A word beginning with ^ must occur at the start of its
    field. A word ending with ^ must occur at the end of
    its field.
  • dinosaur matches the complete dinosaur
  • dinosaur^ also matches
  • ^dinosaur does not match
  • the matches the complete dinosaur
  • ^the also matches
  • the^ does not match
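
The anchoring rule can be checked with a few lines of code. This sketch assumes the CQL anchor character ^, single-word terms and whitespace-separated fields; the function name matches_anchored is made up for the illustration.

```python
def matches_anchored(term, field):
    """Check a one-word term, optionally anchored with ^, against a field."""
    words = field.lower().split()
    at_start = term.startswith("^")     # leading ^: must be the first word
    at_end = term.endswith("^")         # trailing ^: must be the last word
    word = term.strip("^").lower()
    hits = [i for i, w in enumerate(words) if w == word]
    if at_start and 0 not in hits:
        return False
    if at_end and len(words) - 1 not in hits:
        return False
    return bool(hits)
```

Against the field "the complete dinosaur", the terms dinosaur, dinosaur^, the and ^the match under this sketch, while ^dinosaur and the^ do not.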

CQL features: indexes
  • A term of the form name=value is a query for the
    specified value occurring within the named index.
  • title=Churchill finds biographies of Churchill
  • author=Churchill finds books written by him
  • title=dinosaur and author=farlow
  • title=(dinosaur and bird)
  • subject=(dinosaur or pterosaur)
  • Index names are case-insensitive, so title is the same
    index as TITLE, Title or tiTLe.

CQL features: prefixes
  • The meaning of an index can be specified more fully
    by a prefix indicating what context set it is from. The
    meaning of title is different in cross-domain searching
    (Dublin Core), bibliographic searching (Bath Profile)
    and heraldry.
  • dc.title="the complete dinosaur"
  • property.title=freehold
  • heraldry.title=(viscount or duke)
  • cql.serverChoice=fruit
  • cql.resultSet=YXJjaGJpc2hvcAp
  • Prefixes are case-insensitive.

CQL features: context sets
A context set is a set of indexes that are
related to a particular area (plus some other
more esoteric stuff that you can ignore). For
example, the Dublin Core context set
contains indexes for searching against the
fifteen DC elements: title, creator, subject,
description, publisher, contributor, date, type,
format, identifier, source, language, relation,
coverage, rights. The prose description of the
context set must define their semantics.

CQL features: some context sets
  • A few core sets created by the SRW editorial board:
    • CQL: for core indexes such as resultSet
    • DC: for metadata searching with Dublin Core
    • Rec: metadata about the record, not the resource
    • Net: network concepts such as hostname and port
  • Also, many application-specific sets:
    • Bath, Zthes, CCG, Music
    • Rel: deep voodoo for relevance matching
    • GILS is in development
  • Where do context sets come from?
    • You can just make them up! No-one can stop you!

A digression on the CQL context set
  • The CQL context set is special. It contains some magic
    indexes:
    • cql.anywhere searches in all the indexes available
    • cql.serverChoice allows the server to choose whatever
      index or indexes are suitable
    • cql.resultSetId finds the records obtained in a previous
      search, e.g. for refinement by combining with other
      query terms.

CQL features: relations
  • Usually = connects an index with its term, but all the
    other obvious numeric relations are supported
  • Height = 13
  • numberOfWheels < 3
  • numberOfPlates = 18
  • lengthOfFemur > 2.4
  • BioMass > 100
  • NumberOfToes <> 3 (inequality)
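
As a rough illustration of how a server might evaluate such relations, the sketch below maps each relation symbol to a comparison function and applies it to a record. The dictionary-based record, the function name satisfies and the sample values are assumptions made for this example; index names are lower-cased because CQL index names are case-insensitive.

```python
import operator

# The six numeric relations, mapped to Python comparison functions.
RELATIONS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "<>": operator.ne,
}


def satisfies(record, index, relation, term):
    """True if the record's value for `index` stands in `relation` to `term`."""
    fields = {name.lower(): value for name, value in record.items()}
    key = index.lower()
    if key not in fields:
        return False                    # record has no such field
    return RELATIONS[relation](fields[key], term)


# Hypothetical record for demonstration.
specimen = {"height": 13, "lengthOfFemur": 2.6, "numberOfToes": 4}
wanted = satisfies(specimen, "NumberOfToes", "<>", 3)
```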

CQL features: special relations
  • The keywords any and all can be used as relations,
    indicating that any one of, or all of, the words specified
    in the term must be found in the index
  • author all "kernighan ritchie"
  • shorthand for
  • author=kernighan and author=ritchie
  • author any "kernighan ritchie thompson"
  • shorthand for
  • author=kernighan or author=ritchie or author=thompson
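
The shorthand expansion described above is mechanical, as this small sketch shows. The function name expand_word_list is hypothetical, and the sketch assumes the term is a plain space-separated list of words with no quoting or wildcards.

```python
def expand_word_list(index, relation, term):
    """Rewrite an any/all relation as an explicit boolean query string."""
    booleans = {"all": "and", "any": "or"}
    relation = relation.lower()
    if relation not in booleans:
        raise ValueError(f"not a word-list relation: {relation}")
    words = term.split()
    if not words:
        raise ValueError("empty term")
    joiner = f" {booleans[relation]} "
    return joiner.join(f"{index}={word}" for word in words)


expanded = expand_word_list("author", "all", "kernighan ritchie")
```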

CQL features: esoterica
"You are not expected to understand this."
  -- comment in the Unix Version 7 source code
The point is that new users are not required to
understand this, and may happily use CQL for many
years, perhaps forever, without needing to.

CQL esoterica: proximity
  • The prox boolean, by default, requires its operands
    to be next to each other, in either order
  • cervical prox vertebra
  • equivalent to
  • "cervical vertebra" or "vertebra cervical"
  • (cervical or dorsal) prox vertebra
  • equivalent to
  • "cervical vertebra" or "dorsal vertebra" or
    "vertebra cervical" or "vertebra dorsal"
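
The equivalences above follow a simple pattern: pair every left-hand word with every right-hand word, in both orders. The sketch below reproduces that expansion for operands that are plain lists of alternative words; the function name expand_prox is made up for the illustration.

```python
from itertools import product


def expand_prox(left_terms, right_terms):
    """Expand a default prox between word alternatives into quoted phrases."""
    pairs = list(product(left_terms, right_terms))
    forward = [f'"{left} {right}"' for left, right in pairs]
    backward = [f'"{right} {left}"' for left, right in pairs]
    return " or ".join(forward + backward)


# (cervical or dorsal) prox vertebra
phrases = expand_prox(["cervical", "dorsal"], ["vertebra"])
```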

CQL esoterica: proximity II
  • Modifiers can generalise the semantics of proximity
  • cervical prox/distance<5/ vertebrae
    • within five words of each other
  • cervical prox/distance=0/unit=sentence vertebrae
    • within the same sentence
  • cervical prox/distance>0/unit=paragraph vertebrae
    • in different paragraphs
  • cervical prox/ordered vertebrae
    • in the specified order: exactly equivalent to
      "cervical vertebrae"
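
To make the distance and ordering modifiers concrete, here is a minimal word-level matcher. It is only a sketch: the function name prox_match is hypothetical, only the word unit is handled, and it assumes that adjacent words are at distance 1, which may not be how a given server counts.

```python
def prox_match(text, left, right, distance=1, ordered=False):
    """True if `left` and `right` occur within `distance` words of each other."""
    words = text.lower().split()
    lefts = [i for i, w in enumerate(words) if w == left.lower()]
    rights = [i for i, w in enumerate(words) if w == right.lower()]
    for i in lefts:
        for j in rights:
            gap = j - i                 # positive when left comes first
            if ordered and gap < 0:
                continue                # ordered: left must precede right
            if 0 < abs(gap) <= distance:
                return True
    return False
```

With the defaults this behaves like the plain prox boolean (adjacent, either order); passing ordered=True rejects "vertebra cervical" for the operands cervical and vertebra.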

CQL esoterica: relation modifiers
  • Modifiers can refine the semantics of relations
  • title =/stem dig
    • finds dig, digging, dug, etc.
  • title any/relevant "dinosaur bird reptile"
    • finds sauropods, avian, crocodile, snake, etc.
  • author =/fuzzy tailor
    • finds Mike Taylor
  • phoneNumber exact/fuzzy "020 8348 6768"
    • finds 020 8348 6769

CQL esoterica: relation modifiers II
  • Relation modifiers can be overloaded to specify extra
    information about the term that the relation joins to the
    index
  • createdDate >/isoDate "2004-03-12 09:45:00"
    • the term is in ISO 8601 format.
  • Location within/geom.polygon "(12,46) (15,52)"
    • the term indicates a polygon of two points (i.e. a
      straight line) rather than the corners of a rectangle.

CQL esoterica: boolean modifiers
  • Modifiers can refine the semantics of boolean operators.
    We've already seen some examples of this in proximity.
  • cervical prox/distance<5/ vertebrae
    • within five words of each other
  • cervical or/exclusive vertebrae
    • one or the other, but not both.
  • denenberg or/rel.mean "information retrieval"
  • denenberg or/rel.sum "information retrieval"
  • denenberg or/rel.max "information retrieval"
    • average, total or maximum relevance of operands

CQL esoterica: prefix mapping
  • So far, we have been free and easy with index prefixes
    such as dc. But how do we know what they mean?
  • Why should dc mean Dublin Core rather than Deep
    Custard?
  • dc.custardDepth < 20
  • Why should bath mean the Bath Profile for bibliographic
    searching instead of plumbing supplies?
  • bath.capacityInGallons > 45

CQL esoterica: prefix mapping II
Prefixes are just convenient, easy-to-type
abbreviations. The real identifier of a context
set is its URI. For example, the Dublin Core
context set is info:srw/cql-context-set/1/dc-v1.1
but we map that URI to a prefix for
convenience. This is exactly like XML
namespaces: they are identified by URIs, but the
URIs do not appear in the names of elements or
attributes; short prefixes are used instead.

CQL esoterica: prefix mapping III
  • In XML, a prefix is associated with a namespace using
  • <element xmlns:prefix="http://example.org/xyz/">
  • In CQL, a prefix is associated with a namespace using
  • >prefix=http://example.org/xyz/
  • and the rest of the query follows.
  • The following queries are exactly equivalent:
  • >dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish
  • >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish
  • Most applications will have established default mappings.

CQL esoterica: prefix mapping IV
  • It is possible to establish the context set from which
    indexes with no explicit prefix are taken by omitting the
    prefix part from the mapping
  • >http://example.org/heraldry/
    title=baron and side=sinister
  • So the following queries are exactly equivalent:
  • >info:srw/cql-context-set/1/dc-v1.1 title=fish
  • >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish

CQL esoterica: prefix mapping V
  • Finally ... Finally! :-)
  • Prefix mappings can be stacked up
  • >dc=info:srw/cql-context-set/1/dc-v1.1
    >bath=http://zing.z3950.org/cql/bath/2.0/
    >rec=info:srw/cql-context-set/2/rec-1.0
    rec.created < 2004-10-09 and
    dc.title=ecology and
    bath.conferenceName=dinosaur
  • (Yes, this is all one query.)
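
The prefix-mapping rules from the last few slides can be sketched in code: peel the leading ">" mappings off the query, then resolve each index name to a (context-set URI, index) pair. This is an illustration only. The function names split_prefix_maps and qualify are invented, and the sketch assumes mappings are written without spaces around "=" and without quoted URIs.

```python
import re


def split_prefix_maps(query):
    """Return (prefix-to-URI dict, default-set URI or None, remaining query)."""
    prefixes = {}
    default_set = None
    rest = query.strip()
    while rest.startswith(">"):
        mapping, _, rest = rest[1:].lstrip().partition(" ")
        rest = rest.lstrip()
        named = re.fullmatch(r"(\w+)=(\S+)", mapping)
        if named:
            prefixes[named.group(1).lower()] = named.group(2)
        else:
            default_set = mapping       # no prefix part: default context set
    return prefixes, default_set, rest


def qualify(index, prefixes, default_set=None):
    """Resolve an index name to a (context-set URI, index name) pair."""
    prefix, dot, name = index.rpartition(".")
    if not dot:
        return (default_set, name.lower())
    return (prefixes[prefix.lower()], name.lower())


DC = "info:srw/cql-context-set/1/dc-v1.1"
maps_dc = split_prefix_maps(">dc=" + DC + " dc.title=fish")
maps_yx = split_prefix_maps(">yx=" + DC + " yx.title=fish")
maps_default = split_prefix_maps(">" + DC + " title=fish")
```

Under these assumptions, qualify gives the same (URI, "title") pair for dc.title, yx.title and the unprefixed title, which is the sense in which the three queries are equivalent.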

CQL esoterica: prefix mapping VI
Don't try this at home.

Chapter 4: Applications
  • CQL has been deployed in many kinds of application:
    • Google-like structureless searching
    • Simple metadata searching with the Dublin Core
    • Bath Profile for bibliographic data
    • Zthes profile for hierarchical thesaurus navigation
    • CCG for collectable card games
    • Music: musicalKey, arranger, duration, etc.
    • GILS (Global Information Locator Service)
    • ... your application goes here!

Chapter 5: Implementations
  • There are good-quality free CQL implementations
    in several important languages:
    • Java (Mike Taylor's CQL-Java package)
    • C/C++ (Adam Dickmeiss in Index Data's YAZ)
    • Python (Rob Sanderson in Cheshire)
    • Perl (Ed Summers' CQL::Parser module)
    • Visual Basic is in development (Thomas Habing)
    • ... your language goes here!

Conclusion: What to take home
  • CQL makes easy queries easy and hard ones possible
  • You can use it well without learning the hard bits
  • It is used in SRW/SRU but also applicable elsewhere
  • It is extensible through context sets
  • Existing context sets support lots of applications
  • There are free implementations in several languages
  • Tutorial on-line at
    http://zing.z3950.org/cql/intro.html

CQL esoterica: relation modifiers II
  • Relation modifiers can be used to define essentially new
    relations. Some hypothetical examples:
  • location </geom.within "(12,46) (15,52)"
    • points within the specified rectangle
  • task >/proj.prerequisite uiDesign
    • tasks that must be performed before the design
      of the user interface
  • location =/geography.sameState "Las Vegas"
    • places in the same state as Las Vegas
