Book:
"The Object Database Standard: ODMG - 93".
Release 1.2.
© 1996 by Morgan Kaufmann Publishers, Inc.

Chapter 4

Object Query Language

Contents

4.1 Introduction

In this chapter, we describe an object query language named OQL, which supports the ODMG data model. It is complete and simple. It deals with complex objects without privileging the set construct and the select-from-where clause.

We first describe the design principles of the language in Section 4.2, then we introduce in the next sections the main features of OQL. We explain the input and result of a query in Section 4.3. Section 4.4 deals with object identity. Section 4.5 presents the path expressions. In Section 4.6, we show how OQL can invoke operations and Section 4.7 describes how polymorphism is managed by OQL. Section 4.8 concludes this part of the presentation of the main concepts by exemplifying the property of operators composition.

Finally, a formal and complete definition of the language is given in Section 4.9. For each feature of the language, we give the syntax, its semantics, and an example. Alternate syntax for some features are described in Section 4.10, which completes OQL in order to accept any syntactical form of SQL. The chapter ends with the formal syntax, which is given in Section 4.11.

4.2 Principles

Our design is based on the following principles and assumptions:

  • OQL relies on the ODMG object model.
  • OQL is a superset of the part of standard SQL that deals with database queries. Thus, any select SQL sentence which runs on relational tables, works with the same syntax and semantics on collections of ODMG objects. Extensions concern object-oriented notions, like complex objects, object identity, path expression, polymorphism, operation invocation, late binding, etc.
  • OQL provides high-level primitives to deal with sets of objects but is not restricted to this collection construct. It also provides primitives to deal with structures, lists, arrays, and treats such constructs with the same efficiency.
  • OQL is a functional language where operators can freely be composed, as long as the operands respect the type system. This is a consequence of the fact that the result of any query has a type which belongs to the ODMG type model, and thus can be queried again.
  • 54
  • OQL is not computationally complete. It is a simple-to-use query language which provides easy access to an ODBMS.
  • Based on the same type system, OQL can be invoked from within programming languages for which an ODMG binding is defined. Conversely, OQL can invoke operations programmed in these languages.
  • OQL does not provide explicit update operators but rather invokes operations defined on objects for that purpose, and thus does not breach the semantics of an ODBMS which, by definition, is managed by the "methods" defined on the objects.
  • OQL provides declarative access to objects. Thus OQL queries can be easily optimized by virtue of this declarative nature.
  • The formal semantics of OQL can easily be defined.

4.3 Query Input and Result

As a stand-alone language, OQL allows querying denotable objects starting from their names, which act as entry points into a database. A name may denote any kind of object, i.e., atomic, structure, collection, or literal.

As an embedded language, OQL allows querying denotable objects which are supported by the native language through expressions yielding atoms, structures, collections, and literals. An OQL query is a function which delivers an object whose type may be inferred from the operator contributing to the query expression. This point is illustrated with two short examples.

The schema defines the types Person and Employee. These types have the extents Persons and Employees respectively. One of these persons is the chairman (and there is an entry-point Chairman to that person). The type Person defines the name, birth-date, and salary as attributes and the operation age. The type Employee, a subtype of Person, defines the relationship subordinates and the operation seniority.

select distinct x.age from Persons x where x.name = "Pat"

This selects the set of ages of all persons named Pat, returning a literal of type set<integer>.

select distinct struct(a:x.age, s: x.sex) from Persons x where x.name = "Pat";
55

This does about the same, but for each person, it builds a structure containing age and sex. It returns a literal of type set<struct>.

select distinct struct(name: x.name, hps: (select y from x.subordinates as y where y.salary >100000)) from Employees x

This is the same type of example, but now we use a more complex function. For each employee we build a structure with the name of the employee and the set of the employee's highly paid subordinates. Notice we have used a select-from-where clause in the select part. For each employee x, to compute hps, we traverse the relationship subordinates and select among this set the employees with a salary superior to $100,000. The result of this query is therefore a literal of the type set<struct>, namely:

set<struct(name: string, hps: bag<Employee>)>
We could also use a select operator in the from part:
select struct (a: x.age, s: x.sex)
from(select y from y in Employees where y.seniority ="10") as x
where x.name = "Pat"
Of course, you do not always have to use a select-from-where clause:
Chairman
retrieves the Chairman object.
Chairman.subordinates
retrieves the set of subordinates of the Chairman.
Persons
gives the set of all persons.

4.4 Dealing with Object Identity

The query language supports both types of objects: mutable (i.e., having an OID) and literal (identity = their value), depending on the way these objects are constructed or selected.

4.4.1 Creating Objects

To create an object with identity a type name constructor is used. For instance, to create a Person defined in the previous example, simply write

Person(name: "Pat", birthdate: "3/28/56", salary: 100,000)
56

The parameters in parentheses allow you to initialize certain properties of the object. Those which are not explicitly initialized are given a default value.

You distinguish such a construction from the construction expressions that yield objects without identity. For instance,

struct (a: 10,b:"Pat")
creates a structure with two valued fields.

If you now return to the example in Section 4.3, instead of computing literals, you can build objects. For example, assuming that these mutable object types are defined:

type vectint: set<integer>;
type stat
attributes
a: integer
s:char
end_type;
type stats: bag<stat>;
you can carry out the following queries:
vectint(select distinct.age from Persons where name = "Pat")
which returns a mutable object of type vectint and
stats(select stat (a: age, s: sex) from Persons where name = "Pat")
which returns a mutable object of type stats.

4.4.2 Selecting Existing Objects

The extraction expressions may return:

  • A collection of objects with identity, e.g., select x from Persons x where x.name ="Pat" returns a collection of persons whose name is Pat.
  • An object with identity, e.g., element(select x from Persons x where x.passport_number=1234567) returns the person whose passport number is 1234567.
  • A collection of literals, e.g., select x.passport_number from Persons x where x.name="Pat" returns a collection of integers giving the passport numbers of people named Pat.
  • A literal, e.g., Chairman.salary.

57

Therefore the result of a query is an object with or without object identity: some objects are generated by the query language interpreter, and others produced from the current database.

4.5 Path Expressions

As explained above, one can enter a database through a named object, but more generally as long as one gets an object (which comes, for instance, from a C++ expression), one needs a way to navigate from it and reach the right data one needs. To do this in OQL, we use the "." (or indifferently "->") notation which enables us to go inside complex objects, as well as to follow simple relationships. For example, we have a Person p and we want to know the name of the city where this person's spouse lives.

Example:
p.spouse.address.city.name
This query starts from a Person, gets his/her spouse, a Person again, goes inside the complex attribute of type Address to get the City object whose name is then accessed.

This example treated 1-1 relationship, let us now look at n-p relationships. Assume we want the names of the children of the person p. We cannot write p.children.name because children is a list of references, so the interpretation of the result of this query would be undefined. Intuitively, the result should be a collection of names, but we need an unambiguous notation to traverse such a multiple relationship and we use the select-from-where clause to handle collections just as in SQL.

Example:
select c.name
from p.children c
The result of this query is a value of type Bag<String>. If we want to get a Set, we simply drop duplicates, like in SQL by using the distinct keyword.

Example:
select distinct c.name
from p.children c

Now we have a means to navigate from an object to any object following any relationship and entering any complex subvalues of an object. For instance, we want the set of addresses of the children of each Person of the database. We know the collection named Persons contains all the persons of the database. We have now to traverse two collections: Persons and Person: :children. Like in SQL, the select-from operator

58
allows us to query more than one collection. These collections then appear in the from part. In OQL, a collection in the from part can be derived from a previous one by following a path which starts from it.
select c.address
from Persons p, p.children c
This query inspects all children of all persons. Its result is a value whose type is Bag<Address>.

4.5.1 Predicate

Of course, the where clause can be used to define any predicate which then serves to select only the data matching the predicate. For example, we want to restrict the previous result to the people living on Main Street, and having at least two children. Moreover we are only interested in the addresses of the children who do not live in the same city as their parents.

Example:
select c.address
from Persons p, p.children c
where p.address.street = "Main Street" and count(p.children) >= 2 and c.address.city != p.address.city

4.5.2 Join

In the from clause, collections which are not directly related can also be declared. As in SQL, this allows computation of joins between these collections. This example selects the people who bear the name of a flower, assuming there exists a set of all flowers called Flowers.

Example:
select p
from Persons p, Flowers f
where p.name = f.name
59

4.6 Method Invoking

OQL allows us to call a method with or without parameters anywhere the result type of the method matches the expected type in the query. The notation for calling a method is exactly the same as for accessing an attribute or traversing a relationship, in the case where the method has no parameter. If it has parameters, these are given between parentheses. This flexible syntax frees the user from knowing whether the property is stored (an attribute) or computed (a method, such as age in the following example). This example returns the age of the oldest child of the person "Paul".

Example:
select max(select c.age from p.children c)
from Persons p
where p.name = "Paul"

Of course, a method can return a complex object or a collection and then its call can be embedded in a complex path expression. For instance, if oldest_child is a method defined on the class Person which returns an object of class Person, the following example computes the set of street names where the oldest children of Parisian people are living.

Example:
select p.oldest_child.address.street
from Persons p
where p.lives_in("Paris")

Although oldest_child is a method we traverse it as if it were a relationship. Moreover, lives_in is a method with one parameter.

4.7 Polymorphism

A major contribution of object orientation is the possibility of manipulating polymorphic collections, and thanks to the late binding mechanism, to carry out generic actions on the elements of these collections. For instance, the set Persons contains objects of class Person, Employee, and Student. So far, all the queries against the Persons extent dealt with the three possible objects of the collection.

If one wants to restrict a query on a subclass of Person, either the schema provides an extent for this subclass which can then be queried directly, or else the superclass extent can be filtered to select only the objects of the subclass, as shown in the class indicator example.

A query is an expression whose operators operate on typed operands. A query is correct if the types of operands match those required by the operators. In this sense, OQL is a typed query language. This is a necessary condition for an efficient query

60
optimizer. When a polymorphic collection is filtered (for instance Persons), its elements are statically known to be of that class (for instance Person). This means that a property of a subclass (attribute or method) cannot be applied to such an element, except in two important cases: late binding to a method or explicit class indication.

4.7.1 Late Binding

Give the activities of each person.

Example:
select p.activities
from Persons p
where activities is a method which has three incarnations. Depending on the kind of person of the current p, the right incarnation is called. If p is an Employee, OQL calls the operation activities defined on this object, or else if p is a Student, OQL calls the operation activities defined for Students, or else, p is a Person and OQL calls the method activities of the type Person.

4.7.2 Class Indicator

To go down the class hierarchy, a user may explicitly declare the class of an object that cannot be inferred statically. The evaluator then has to check at runtime that this object actually belongs to the indicated class (or one of its subclasses). For example, assuming we know that only Students spend their time in following a course of study, we can select those Persons and get their grade. We explicitly indicate in the query that these Persons are in Student:

Example:
select ((Student)p). grade
from Persons p
where "course of study" in p.activities

4.8 Operator Composition

OQL is a purely functional language. All operators can be composed freely as long as the type system is respected. This is why the language is so simple and its manual so short. This philosophy is different from SQL, which is an ad-hoc language whose composition rules are not orthogonal. Adopting a complete orthogonality allows OQL to not restrict the power of expression and makes the language easier to learn without losing the SQL syntax for the simplest queries. However, when very specific SQL syntax does not enter in a pure functional category, OQL accepts these SQL peculiar-ities as possible syntactical variations. This is explained more specifically in Section 4.10.

61

Among the operators offered by OQL but not yet introduced, we can mention the set operators (union, intersect, except), the universal (for all) and existential quantifiers (exists), the sort and group by operators, and the aggregation operators (count, sum, min, max, and avg).

To illustrate this free composition of operators, let us write a rather complex query. We want to know the name of the street where employees live and have the smallest salary on average, compared to employees living in other streets. We proceed by steps and then do it as one query. We use OQL define instruction to evaluate temporary results.

Example:
  1. Build the extent of class Employee (assuming that it is not supported directly by the schema):
    define Employees as
    select (Employee) p
    from Persons p
    where "has a job" in p.activities
  2. Group the employees by street and compute the average salary in each street:
    define salary_map as
    select street, average_salary:avg(select e.salary from partition)
    from Employees e
    group by street: e.address.street
    The result is of type Bag<struct(street: string, average_salary:float)>. The group by operator splits the employees into partitions, according to the criterion (the name of the street where this person lives). The select clause computes, in each partition, the average of the salaries of the employees belonging to the partition.
  3. Sort this set by salary:
    define sorted_salary_map as
    select s
    from salary_mnap s
    order by s.average_salary
    The result is now of type List<struct(street: string, average_salary:float)>.
  4. Now get the smallest salary (the first in the list) and take the corresponding street name. This is the final result.
    first(sortecLsalary_map).street
    Example as a single query:
    first(
    select street, average_salary: avg(select e.salary from partition)
    from (select (Employee) p from Persons p where "has a job" in p.activities ) as e
    group by street: e.address.street
    order by average_salary).street
    62

    4.9 Language Definition

    OQL is an expression language. A query expression is built from typed operands composed recursively by operators. We will use the term expression to designate a valid query in this section.

    The examples are based on the schema described in Chapter 3.

    4.9.1 Queries

    A query consists of a (possibly empty) set of query definition expressions followed by an expression, which is evaluated as the query itself. The set of query definition expressions is nonrecursive (although a query may call an operation which issues a query recursively).

    Example:
    define jones as select distinct x from Students x where x.name = "Jones";
    select distinct student_id from jones

    This defines the set jones of students named Jones and gets the set of their student_ids.

    4.9.2 Named Query Definition

    If q is a query name and e is a query expression, then define q as e is a query definition expression which defines the query with name q.

    Example:
    define Does as select x from Student x where x.name ="Doe"

    This statement defines Does as a query returning a bag containing all the students whose name is Doe.

    define Doe as element(select x from Student x where x.name="Doe")

    This statement defines Doe as a query which returns the student whose name is Doe (if there is only one, otherwise an exception is raised).

    4.9.3 Elementary Expressions

    4.9.3.1 Atomic Literals

    If I is an atomic literal, then I is an expression whose value is the literal itself. Literals have the usual syntax:

    • Object Literal: nil
    • Boolean Literal: false, true
    • Integer Literal: sequence of digits, e.g. 27
    • Float Literal: mantissa/exponent. The exponent is optional, e.g., 3.14 or 314.16e-2
    • Character Literal: character between single quotes, e.g., 'z'
    • String Literal: character string between double quote, e.g.,"a string"
    63

    4.9.3.2 Named Objects

    If e is a named object, then e is an expression. It defines the entity attached to the name.

    Example:
    Students

    This query defines the set of students. We have assumed here that there exists a name Students corresponding to the extent of objects of the class Student.

    4.9.3.3 Iterator Variable

    If x is a variable declared in a from part of a select-from-where, then x is an expression whose value is the current element of the iteration over the corresponding collection.

    4.9.3.4 Named Query

    If define q as e is a query definition expression, then q is an expression.

    Example:
    Doe

    This query returns the student with name Doe. It refers to the query definition expression declared in Section 4.9.2.

    4.9.4 Construction Expressions

    4.9.4.1 Constructing Objects

    If t is a type name, p1, p2, ...,pn are properties of t, and e1, e2, ...,en are expressions, then t(p1: e1..., pn: en) is an expression.

    This defines a new object of type t whose properties p1, p2, ...,pn are initialized with the expression e1, e2, ...,e2. The type of ei must be compatible with the type of pi.

    If t is a type name of a collection and e is a collection literal, then t(e) is a collection object. The type of e must be compatible with t.

    Examples:
    Employee(name: "Peter", boss: Chairman)

    This creates a mutable Employee object.

    vectint(set(1,3,10))

    This creates a mutable set object (see the definition of vectint in Section 4.4.1).

    64

    4.9.4.2 Constructing Structures

    If p1, p2,.... pn are property names, and e1, e2,..., en are expressions, then struct(p1: e1, p2: e2, ..., pn:en) is an expression. It defines the structure taking values e1, e2, ..., en on the properties p1, p2, ..., pn.

    Note that this dynamically creates an instance of the type struct(p1: t1, p2: t2, ..., pn: tn) if ti is the type of ei.

    Example:
    struct(name: "Peter", age: 25);

    This returns a structure with two attributes name and age taking respective values Peter and 25.

    See also abbreviated syntax for some contexts in Section 4.10.1.

    4.9.4.3 Constructing Sets

    If e1, e2, ..., en are expressions, then set(e1, e2,.... en) is an expression. It defines the set containing the elements e1, e2, ..., en. It creates a set instance.

    Example:
    set(1,2,3)

    This returns a set consisting of the three elements 1, 2, and 3.

    4.9.4.4 Constructing Lists

    If e1, e2, ..., en are expressions, then list(e1, e2, ..., en) or simply (e1, e2, ..., en) are expressions. They define the list having elements e1, e2, ..., en. They create a list instance.

    If min, max are two expressions of integer or character types, such that min < max, then list(min .. max) or simply (min .. max) is an expression of value: list(min, min+1,... max-1, max)

    Example:
    list(1,2,2,3)

    This returns a list of four elements.

    Example:
    list(3 .. 5)

    This returns the list(3,4,5)

    65

    4.9.4.5 Constructing Bags

    If e1, e2, ..., en are expressions, then bag(e1, e2, ...,en) is an expression. It defines the bag having elements e1, e2, ..., en. It creates a bag instance.

    Example:
    bag(1,1,2,3,3)

    This returns a bag of five elements.

    4.9.4.6 Constructing Arrays

    If e1, e2, ..., en are expressions, then array(e1, e2, ..., en) is an expression. It defines an array having elements e1, e2,..., en. It creates an array instance.

    Example:
    array(3,4,2,1,1)

    This returns an array of five elements.

    4.9.5 Atomic Type Expressions

    4.9.5.1 Unary Expressions

    If e is an expression and <op> is a unary operation valid for the type of e, then <op> e is an expression. It defines the result of applying <op> to e.
    Arithmetic unary +, -, abs
    Boolean unary operator: not

    Example:
    not true

    This returns false.

    4.9.5.2 Binary Expressions

    If e1 and e2 are expressions and <op> is a binary operation, then e1<op>e2 is an expression. It defines the result of applying <op> to e1 and e2.

    Arithmetic integer binary operators: +, -, *, /, mod (modulo)
    Floating point binary operators: +, -, *, /
    Relational binary operators: =, !=, <, <=, >,>=
    These operators are defined on all atomic types.
    Boolean binary operators: and, or
    66
    Example:
    count(Students) - count(TA)

    This returns the difference between the number of students and the number of TAs.

    4.9.5.3 String Expressions

    If s1 and s2 are expressions of type string, then s1 || s2, and s1 + s2 are equivalent expressions of type string whose value is the concatenation of the two strings.

    If c is an expression of type character, and s an expression of type string, then c in s is an expression of type boolean whose value is true if the character belongs to the string, else false.

    If s is an expression of type string, and i is an expression of type integer, then s[i] is an expression of type character whose value is the i+lth character of the string.

    If s is an expression of type string, and low and up are expressions of type integer, then s[low:up] is an expression of type string whose value is the substring of s from the low+lth character up to the up+lth character.

    If s is an expression of type string, and pattern a string literal which may include the wildcard characters: "?" or "_", meaning any character, and "*" or "%", meaning any substring including an empty substring, then s like pattern is an expression of type boolean whose value is true if s matches the pattern, else false.

    Example:
    "a nice string" like "%nice%str_ng"
    is true.

    4.9.6 Object Expressions

    4.9.6.1 Comparison of Mutable Objects

    If e1 and e2 are expressions which denote mutable objects (objects with identity) of the same type, then e1 = e2 and e1 != e2 are expressions which return a boolean. The second expression is equivalent to not(e1 = e2). Likewise e1 = e2 is true if they designate the same object.

    Example:
    Doe = element(select s from Students s where s.name = "Doe")
    is true.
    67

    4.9.6.2 Comparison of Immutable Objects

    If e1> and e2 are expressions which denote immutable objects (literals) of the same type, then e1 = e2 and e1 != e2 are expressions which return a boolean. The second expression is equivalent to not(e1 = e2). Likewise, e1 = e2 is true if the value e1 is equal to the value e2.

    4.9.6.3 Extracting an Attribute or Traversing a Relationship from an Object

    If e is an expression, if p is a property name, then e->p and e.p are expressions. These are alternate syntax to extract property p of an object e.

    If e happens to designate a deleted or a nonexisting object, i.e., nil, an attempt to access the attribute or to traverse the relationship raises an exception. However, a query may test explicitly if an object is different from nil before accessing a property.

    Example:
    Doe.name

    This returns Doe.

    Example:
    Doe->spouse != nil and Doe->spouse->name = "Carol"

    This returns true, if Doe has a spouse whose name is Carol, or else false.

    4.9.6.4 Applying an Operation to an Object

    If e is an expression and f is an operation name, then e->f and e.f are expressions. These are alternate syntax to apply an operation on an object. The value of the expression is the one returned by the operation or else the object nil, if the operation returns nothing.

    Example:
    jones->number_of_students

    This applies the operation number_of_students to jones.

    4.9.6.5 Applying an Operation with Parameters to an Object

    If e is an expression, if e1, e2, ..., en are expressions and f is an operation name, then e->f(e1, e2, ..., en) and e.f(e1, e2, ..., en) are expressions that apply operation f with parameters e1, e2, ..., en to object e. The value of the expression is the one returned by the operation or else the object nil, if the operation returns nothing.

    68

    In both cases, if e happens to designate a deleted or a nonexisting object, i.e., nil, an attempt to apply an operation to it raises an exception. However, a query may test explicitly if an object is different from nil before applying an operation.

    Example:
    Doe->apply_course("Math", Turing)->number

    This query calls the operation apply_course on class Student for the object Doe. It passes two parameters, a string and an object of class Professor. The operation returns an object of type Course and the query returns the number of this course.

    4.9.6.6 Dereferencing an Object

    If e is an expression which denotes an object with identity (a mutable object), then *e is an expression which delivers the value of the object (a literal). Given two variables of type Person, p1 and p2, the predicate p1 = p2 is true if both variables refer to the same object, while *p1 = *p2 is true if the objects have the same values, even if they are not the same object.

    4.9.7 Collection Expressions

    4.9.7.1 Universal Quantification

    If x is a variable name, e1 and e2 are expressions, e1 denotes a collection, and e2 a predicate, then for all x in e1: e2 is an expression. It returns true if all the elements of collection e1 satisfy e2 and false otherwise.

    Example:
    for all x in Students: x.student_id > 0

    This returns true if all the objects in the Students set have a positive value for their student_id attribute. Otherwise it returns false.

    4.9.7.2 Existential Quantification

    If x is a variable name, if e1 and e2 are expressions, e1 denotes a collection, and e2 a predicate, then exists x in e1: e2 is an expression. It returns true if there is at least one element of collection e1 that satisfies e2 and false otherwise.

    Example:
    exists x in Doe.takes: x.taught_by.name = "Turing"

    This returns true if at least one course Doe takes is taught by someone named Turing.

    69

    If e is a collection expression, then exists(e) and unique(e) are expressions which return a boolean value. The first one returns true if there exists at least one element in the collection, while the second one returns true if there exists only one element in the collection.

    Note that these operators accept the SQL syntax for nested queries like

    select... from col where exists (select... from col1 where predicate)

    The nested query returns a bag to which the operator exists is applied. This is of course the task of an optimizer to recognize that it is useless to compute effectively the intermediate bag result.

    4.9.7.3 Membership Testing

    If e1 and e2 are expressions, e2 is a collection, and e1 has the type of its elements, then e1 in e2 is an expression. It returns true if element e1 belongs to collection e2.

    Example:
    Doe in Does

    This returns true.

    Doe in TA

    This returns true if Doe is a Teaching Assistant.

    4.9.7.4 Aggregate Operators

    If e is an expression which denotes a collection, if <op> is an operator from {min, max, count, sum, avg}, then <op>(e) is an expression.

    Example:
    max (select salary from Professors)

    This returns the maximum salary of the Professors.

    4.9.8 Select From Where

    If e, e', e1, e2, ..., en are expressions which denote collections, and x1, x2,..., xn are variable names if e' is an expression of type boolean, and if projection is an expression or the character *, then

    select projection from e1 as x1, e2 as x2, ..., en as xn where e'

    and

    select distinct projection from e1 as x1, e2 as x2, ..., en as xn where e'
    are expressions.

    The result of the query is a set for a select distinct or a bag for a select.

    70

    If you assume e1, e2, ..., en are all set and bag expressions, then the result is obtained as follows: take the Cartesian product 1 of the sets e1, e2,..., en; filter that product by expression e' (i.e., eliminate from the result all objects that do not satisfy boolean expression e'); apply the projection to each one of the elements of this filtered set and get the result. When the result is a set (distinct case) duplicates are automatically eliminated.

    The situation where one or more of the collections e1, e2,..., en is an indexed collection is a little more complex. The select operator first converts all these collections into bags and applies the previous operation. The result is a set (distinct case) or else a bag. So, in this case, we simply transform each of the ei's into a bag and apply the previous definition.


    1. The Cartesian product between a set and a bag is defined by first converting the set into a bag, and then get the resulting bag, which is the Cartesian product of the two bags.
    Back.

    4.9.8.1 Projection

    Before the projection, the result of the iteration over the from variables is of type

    bag< struct(x1: type_of(e1 elements), ..., xn: type_of(en elements)) >

    The projection is constructed by an expression which can then refer implicitly to the current element of this bag, using the variables xi. If for ei neither explicit nor implicit variable is declared, then xi is given an internal system name (which is not accessible by the query anyway).

    By convention, if the projection is simply "*", then the result of the select is the same as the result of the iteration.

    If the projection is "distinct *", the result of the select is this bag converted into a set. In all other cases, the projection is explicitly computed by the given expression.

    Example:
    select couple(student: x.name, professor: z.name)
    from Students as x,
    x.takes as y,
    y.taught_by as z
    where z.rank = "full professor"
    71

    This returns a bag of objects of type couple giving student names and the names of the full professors from which they take classes.

    Example:
    select *
    from Students as x,
    x.takes as y,
    y.taught_by as z
    where z.rank = "full professor"

    This returns a bag of structures, giving for each student "object", the section object followed by the student and the full professor "object" teaching in this section:

    bag< struct(x: Student, y: Section, z: Professor) >

    4.9.8.2 Iterator Variables

    A variable, xi declared in the from part ranges over the collection ei, and thus has the type of the elements of this collection. Such a variable can be used in any other part of the query to evaluate any other expressions (see the Scope Rules in Section 4.9.15). Syntactical variations are accepted for declaring these variables, exactly as with SQL. The as keyword may be omitted. Moreover, the variable itself can be omitted too, and in this case, the name of the collection itself serves as a variable name to range over it.

    Example:
    select couple(student: Students.name, professor: z.name)
    from Students,
    Students.takes y,
    y.taught_by z
    where z.rank = "full professor"

    4.9.8.3 Predicate

    In a select-from-where query, the where clause can be omitted, with the meaning of a true predicate.

    4.9.9 Group-by Operator

    If select_query is a select-from-where query, partition_attributes is a structure expression and predicate a boolean expression, then

    select_query group by partition_attributes
    is an expression and
    select_query group by partition_attributes having predicate
    is an expression.

    72

    The Cartesian product visited by the select operator is split into partitions. For each element of the Cartesian product, the partition attributes are evaluated. All elements which match the same values according to the given partition attributes, belong to the same partition. Thus the partitioned set, after the grouping operation, is a set of structures: each structure has the valued properties for this partition (the valued partition_attributes), completed by a property which is conventionally called partition and which is the bag of all objects matching this particular valued partition.

    If the partition attributes are att1: e1, att2: e2, ..., attn: en, then the result of the grouping is of type

    set< struct(att1: type_of(e1), att2: type_of(e2), ..., attn: type_of(en), partition: bag< type_of(grouped elements) >)>

    The type of grouped elements is defined as follows.

    If the from clause declares the variables v1 on collection col1, v2 on col2, ..., vn on coln, the grouped elements is a structure with one attribute, vk, for each collection having the type of the elements of the corresponding collection partition:

    bag< struct(v1: type_of(col1 elements), ..., vn: type_of(coln elements))>

    If a collection colk has no variable declared the corresponding attribute has an internal system name.

    This partitioned set may then be filtered by the predicate of a having clause. Finally, the result is computed by evaluating the select clause for this partitioned and filtered set.

    The having clause can thus apply aggregate functions on partition; likewise the select clause can refer to partition to compute the final result. Both clauses can refer also to the partition attributes.

    Example:
    select *
    from Employees e
    group by low: salary < 1000,
    medium: salary >= 1000 and salary < 10000,
    high: salary >= 10000

    This gives a set of three elements, each of which has a property called partition which contains the bag of employees that enter in this category. So the type of the result is

    set<struct(low: boolean, medium: boolean, high: boolean, partition: bag<struct(e: Employee)>)>
    73

    The second form enhances the first one with a having clause which enables you to filter the result using aggregative functions which operate on each partition.

    Example:
    select department, avg_salary: avg(select e.salary from partition)
    from Employees e
    group by department: e.deptno
    having avg(select e.salary from partition) > 30000

    This gives a set of couples: department and average of the salaries of the employees working in this department, when this average is more than 30000. So the type of the result is

    bag<struct(department: integer, avg_salary: float)>

    Note that to compute the average salary, we have used a shortcut notation allowed by the Scope Rules defined in Section 4.9.15. The fully developed notation would read

    avg_salary: avg(select x.e.salary from partition x)

    4.9.10 Order-by Operator

    If select_query is a select-from-where or a select-from-where-group_by query, and if e1, e2,.... en are expressions, then select_query order by e1, e2,..., en is an expression. It returns a list of the selected elements sorted by the functions e1, and inside each subset yielding the same e1, sorted by e2, ..., and the final subsub...set, sorted by en.

    Example:
    select p from Persons p order by p.age, p.name

    This sorts the set of persons on their age, then on their name and puts the sorted objects into the result as a list.

    Each sort expression criterion can be followed by the keyword asc or desc, specifying respectively an ascending or descending order. The default order is that of the previous declaration. For the first expression, the default is ascending.

    Example:
    select * from Persons order by age desc, name asc, department
    74

    4.9.11 Indexed Collection Expressions

    4.9.11.1 Getting the ith Element of an Indexed Collection

    If e1 and e2 are expressions, e1 is a list or an array, e2 is an integer, then e1[e2] is an expression. This extracts the e2+1 element of the indexed collection e1. Notice that the first element has the rank 0.

    Example:
    list(a,b,c,d)[1]

    This returns b.

    Example:
    element (select x
    from Courses x
    where x.name = "Math" and x.number = "101").requires[2]

    This returns the third prerequisite of Math 101.

    4.9.11.2 Extracting a Subcollection of an Indexed Collection

    If e1, e2, and e3 are expressions, e1 is a list or an array, and e2 and e3 are integers, then e1[e2:e3] is an expression. This extracts the subcollection of e1 starting at position e2 and ending at position e3.

    Example:
    list(a,b,c,d)[1:3]

    This returns list(b,c,d).

    Example:
    element (select x
    from Courses x
    where x.name="Math" and x.number="101").requires[0:2]

    This returns the list consisting of the first three prerequisites of Math 101.

    4.9.11.3 Getting the First and Last Elements of an Indexed Collection

    If e is an expression, <op> is an operator from {first, last}, and e is a list or an array, then <op>(e) is an expression. This extracts the first and last element of a collection.

    Example:
    first(element(select x
    from Courses x
    where x.name="Math" and x.number="101").requires)

    This returns the first prerequisite of Math 101.

    75

    4.9.11.4 Concatenating Two Indexed Collections

    If e1 and e2 are expressions and e1 and e2 are both lists or both arrays, then e1+e2 is an expression. This computes the concatenation of e1 and e2.

    Example:
    list(1,2)+list(2,3)
    This query generates list(1,2,2,3).

    4.9.12 Binary Set Expressions

    4.9.12.1 Union, Intersection, Difference

    If e1 and e2 are expressions, if <op> is an operator from {union, except, intersect}, if e1 and e2 are sets or bags, then e1 <op> e2 is an expression. This computes set theoretic operations, union, difference, and intersection on e1 and e2, as defined in Chapter 2.

    When the operand's collection types are different (bag and set), the set is first converted into a bag and the result is a bag.

    Examples:
    Student except TA

    This returns the set of students who are not Teaching Assistants.

    bag(2,2,3,3,3) union bag(2,3,3,3)

    This bag expression returns bag(2,2,3,3,3,2,3,3,3).

    bag(2,2,3,3,3) intersect bag(2,3,3,3)

    The intersection of two bags yields a bag that contains the minimum for each of the multiple values. So the result is bag(2,3,3,3).

    bag(2,2,3,3,3) except bag(2,3,3,3)

    This bag expression returns bag(2).

    4.9.12.2 Inclusion

    If e1 and e2 are expressions which denote sets or bag and if <op> is an operator from {<, <=, >, >=}, then e1 <op> e2 is an expression whose value is a boolean.

    When the operands are different kinds of collections (bag and set), the set is first converted into a bag.

    e1 < e2 is true if e1 is included into e2 but not equal to e2
    e1 <= e2 is true if e1 is included into e2

    Example:
    set(1,2,3) < set(3,4,2,1)

    is true.

    76

    4.9.13 Conversion Expressions

    4.9.13.1 Extracting the Element of a Singleton

    If e is a collection-valued expression, element(e) is an expression. This takes the singleton e and returns its element. If e is not a singleton this raises an exception.

    Example:
    element(select x from Professors x where x.name = "Turing")

    This returns the professor whose name is Turing (if there is only one).

    4.9.13.2 Turning a List into a Set

    If e is a list expression, listtoset(e) is an expression. This converts the list into a set, by forming the set containing all the elements of the list.

    Example:
    listtoset(list( 1,2,3,2))

    This returns the set containing 1, 2, and 3.

    4.9.13.3 Removing Duplicates

    If e is an expression whose value is a collection, then distinct(e) is an expression whose value is the same collection after removing the duplicated elements. If e is a bag, distinct(e) is a set. If e is an ordered collection, the relative ordering of the remaining elements is preserved.

    4.9.13.4 Flattening Collection of Collections

    If e is a collection-valued expression, flatten(e) is an expression. This converts a collection of collections of t into a collection of t. So flattening operates at the first level only.

    Assuming the type of e to be col1<col2<t>>, the result of flatten(e) is:

    • If col2 is a set (resp. a bag), the union of all col2<t> is done and the result is a set<t> (resp. bag<t>).
    • If col2 is a list or an array and col1 is a list or an array, the concatenation of all col2<t> is done following the order in col1 and the result is col2<t>, which is thus a list or an array. Of course duplicates, if any, are maintained by this operation.
    • If col2 is a list or an array and col1 is a set (resp. a bag), the lists or arrays are converted into sets (resp. bags), the union of all these sets (resp. bags) is done and the result is a set<t> (resp. bag<t>).

    Examples:
    flatten(list(set(1,2,3), set(3,4,5,6), set(7)))

    This returns the set containing 1,2,3,4,5,6,7.

    77
    flatten(list(list(1,2), list(1,2,3)))

    This returns list(1,2,1,2,3).

    flatten(set(list(1,2), list(1,2,3)))

    This returns the set containing 1,2,3.

    4.9.13.5 Typing an Expression

    If e is an expression and c is a type name, then (c)e is an expression. This asserts that e is an object of class type c.

    If it turns out that it is not true, an exception is raised at runtime. This is useful to access a property of an object which is statically known to be of a superclass of the specified class.

    Example:
    select ((Employee) s).salary
    from Students s
    where s in (select sec.assistant from Sections sec)

    This returns the set of salaries of all students who are teaching assistants, assuming that Students and Sections are the extents of the classes Student and Section.

    4.9.14 Function Call

    If f is a function name, if e1 , e2, ..., en are expressions, then f() and f(e1, e2, ..., e2) are expressions whose value is the value returned by the function, or the object nil, when the function does not return any value. The first form calls a function without a parameter, while the second one calls a function with the parameters e1, e2, ..., en.

    OQL does not define in which language the body of such a function is written. This allows one to extend the functionality of OQL without changing the language.

    4.9.15 Scope Rules

    The from part of a select-from-where query introduces explicit or implicit variables to range over the filtered collections. An example of an explicit variable is

    select... from Persons p ... while
    an implicit declaration would be
    select... from Persons ...

    The scope of these variables spreads over all the parts of the select-from-where expression including nested sub-expressions.

    78

    The group by part of a select-from-where-group_by query introduces the name partition along with possible explicit attribute names which characterize the partition. These names are visible in the corresponding having and select parts, including nested sub-expressions within these parts.

    Inside a scope, you use these variable names to construct path expressions and reach properties (attributes and operations) when these variables denote complex objects. For instance, in the scope of the first from clause above, you access the age of a person by p.age.

    When the variable is implicit, like in the second from clause, you directly use the name of the collection by Persons.age.

    However, when no ambiguity exists, you can use the property name directly as a shortcut, without using the variable name to open the scope (this is made implicitly), writing simply: age. There is no ambiguity when a property name is defined for one and only one object denoted by a visible variable.

    To summarize, a name appearing in a (nested) query is looked up as follows:

    • a variable in the current scope, or
    • a named query introduced by the define clause, or
    • a named object, i.e., an entry point in the database, or
    • an attribute name or an operation name of a variable in the current scope, when there is no ambiguity, i.e., this property name belongs to only one variable in the scope.

    Example:

    Assuming that in the current schema the names Persons and Cities are defined.

    select scope1
    from Persons,
    Cities c
    where exists(select scope2 from children as child) or
    count (select scope3, (select scope4 from partition)
    from children p,
    scope5 v
    group by age: scope6
    )

    In scope1, we see the names: Persons, c, Cities, all property names of class Person and class City as long as they are not present in both classes, and they are not called "Persons", "c", nor "Cities".

    79

    In scope2, we see the names: child, Persons, c, Cities, the property names of the class City which are not property of the class Person. No attribute of the class Person can be accessed directly since they are ambiguous between "child" and "Persons".

    In scope3, we see the names: age, partition, and the same names from scope1, except "age" and "partition", if they exist.

    In scope4, we see the names: age, partition, p, v, and the same names from scope1, except "age", "partition", "p" and "v", if they exist.

    In scope5, we see the names: p, and the same names from scope1, except "p", if it exists.

    In scope6, we see the names: p, v, Persons, c, Cities, the property names of the class City which are not property of the class Person. No attribute of the class Person can be accessed directly since they are ambiguous between "child" and "Persons".

    4.10 Syntactical Abbreviations

    OQL defines an orthogonal expression language, in the sense that all operators can be composed with each other as long as the types of the operands are correct. To achieve this property, we have defined a functional language with simple operators such as "+" or composite operators such as "select from where", "group_by", and "order_by" which always deliver a result in the same type system and which thus can be recursively operated with other operations in the same query.

    In order to accept the whole DML query part of SQL, as a valid syntax for OQL, we have added ad-hoc constructions each time SQL introduces a syntax which cannot be considered in the category of true operators. This section gives the list of these constructions that we call "abbreviations," since they are completely equivalent to a functional OQL expression. At the same time, we give the semantics of these constructions, since all operators used for this description have been previously defined.

    4.10.1 Structure Construction

    The structure constructor has been introduced in Section 4.9.4.2. An alternate syntax is allowed in two contexts: select clause and group-by clause. In both contexts, the SQL syntax is accepted, along with the one already defined.

    select projection{, projection} ...
    select ... group by projection {, projection}
    where projection is one of the forms:
    1. expression as identifier
    2. identifier: expression
    3. expression
    80

    This is an alternate syntax for

    struct(identifier: expression {, identifier: expression})

    If there is only one projection and the syntax (3) is used in a select clause, then it is not interpreted as a structure construction but rather the expression stands as is. Furthermore, a (3) expression is only valid if it is possible to infer the name of the corresponding attribute (the identifier). This requires that the expression denotes a path expression (possibly of length one) ending by a property whose name is then chosen as the identifier.

    Example:
    select p.name, salary, student_id
    from Professors p, p.teaches

    This query returns a bag of structures:

    bag<struct(name: string, salary: float, studentJd: integer)>

    4.10.2 Aggregate Operators

    These operators have been introduced in Section 4.9.7.4. SQL adopts a notation which is not functional for them. So OQL accepts this syntax, too. If we define aggregate as one of min, max, count, sum and avg.
    select count(*) from ...
    is equivalent to
    count(select * from ...)
    select aggregate(query) from ...
    is equivalent to
    aggregate(select query from ...)
    select aggregate(distinct query) from ...
    is equivalent to
    aggregate (distinct(select query from ...)

    4.10.3 Composite Predicates

    If e1 and e2 are expressions, e2 is a collection, e1 has the type of its elements, if relation is a relational operator (=, !=, <, <=, > , >=), then e1 relation some e2 and e1 relation any e2 and e1 relation all e2 are expressions whose value is a boolean.

    The two first predicates are equivalent to

    exists x in eg: ei relation x
    81

    The last predicate is equivalent to

    for all x in e2: e1 relation x
    Example:
    10 < some (8,15, 7, 22)
    is true.

    4.10.4 String Literal

    OQL accepts single quotes as well to delineate a string (see Section 4.9.3.1), like SQL does. This introduces an ambiguity for a string with one character which then has the same syntax as a character literal. This ambiguity is solved by context.

    4.11 OQL BNF

    The OQL grammar is given using a rather informal BNF notation.

    • { symbol } is a sequence of 0 or n symbol(s).
    • [symbol] is an optional symbol. Do not confuse this with the separators [].
    • keyword is a terminal of the grammar.
    • xxx_name is the syntax of an identifier.
    • xxx_literal is self-explanatory, e.g., "a string" is a string_literal.
    • bind_argument stands for a parameter when embedded in a programming language, e.g., $3i.

    The non-terminal query stands for a valid query expression. The grammar is presented as recursive rules producing valid queries. This explains why most of the time this non­terminal appears on the left side of ::=. Of course, each operator expects its "query" operands to be of the right types. These type constraints have been introduced in the previous sections.

    These rules must be completed by the priority of OQL operators which is given after the grammar. Some syntactical ambiguities are solved semantically from the types of the operands.

    4.11.1 Grammar

    4.11.1.1 Axiom (see 4.9.1, 4.9.2)

    query_program ::= {define_query;} query 
    define_query ::=define identifier as query 
    
    82

    4.11.1.2 Basic (see 4.9.3)

    query ::= nil
    query ::= true
    query ::= false
    query ::= integer_literal
    query ::= float_literal
    query ::= character_literal
    query ::= string_literal
    query ::= entry_name
    query ::= query_name
    query ::= bind_argument2
    query ::= from_variable_name 
    query ::= (query)
    

    2. A bind argument allows one to bind expressions from a programming language to a query when embedded into this language (see chapters on language bindings). Back.

    4.11.1.3 Simple Expression (see 4.9.5)

    query ::= query + query3
    query ::= query - query
    query ::= query * query
    query ::= query / query
    query ::= - query
    query ::= query mod query
    query ::= abs(query)
    query ::= query || query
    

    3. The operator + is also used for list and array concatenation. Back.

    4.11.1.4 Comparison (see 4.9.5)

    query ::= query comparison_operator query
    query ::= query like string_literal
    comparison_operator ::= =
    comparison_operator ::= !=
    comparison_operator ::= >
    comparison_operator ::= <
    comparison_operator ::= >=
    comparison_operator ::= <=
    

    4.11.1.5 Boolean Expression (see 4.9.5)

    query ::= not query
    query ::= query and query
    query ::= query or query
    
    83

    4.11.1.6 Constructor (see 4.9.4)

    query ::= type_name([query])
    query ::= type_name (identifier: query {,identifier: query})
    query ::= struct (identifier: query {,identifier: query})
    query ::= set ([query {, query}])
    query ::= bag ([query {, query}])
    query ::= list ([query {, query}])
    query ::= (query, query {, query})
    query ::= [list](query .. query)
    query ::= array ([query {,query}])
    

    4.11.1.7 Accessor (see 4.9.6, 4.9.11, 4.9.14, 4.9.15)

    query ::= query dot attribute_name
    query ::= query dot relationship_name
    query ::= query dot operation_name(query {,query})
    dot   ::= . | ->
    query ::= * query
    query ::= query [query]
    query ::= query [query:query]
    query ::= first(query)
    query ::= last(query)
    query ::= function_name([query {,query}])
    

    4.11.1.8 Collection Expression (see 4.9.7, 4.10.3)

    query ::= for all identifier in query: query
    query ::= exists identifier in query: query
    query ::= exists(query)
    query ::= unique(query)
    query ::= query in query
    query ::= query comparison_operator quantifier query
    quantifier ::= some
    quantifier ::= any
    quantifier ::= all
    query ::= count(query)
    query ::= count(*)
    query ::= sum(query)
    query ::= min(query)
    query ::= max(query)
    query ::= avg(query)
    
    84

    4.11.1.9 Select Expression (see 4.9.8,4.9.9,4.9.10)

    query ::= 
    select [ distinct] projection_attributes
    from variable_declaration {, variable_declaration}
    [where query]
    [group by partition_atlributes]
    [having query]
    [order by sort_criterion {, sort_criterion}]
    projection_attributes ::= projection {, projection} proiection_attributes ::= * projection ::= query projection ::= identifier: query projection ::= query as identifier variable_declaration ::= query [[ as] identifier] partition_attributes ::= projection {, projection} sort_criterion ::= query [ordering] ordering ::= asc ordering ::= desc

    4.11.1.10 Set Expression (see 4.9.12)

    query ::= query intersect query
    query ::= query union query
    query ::= query except query
    

    4.11.1.11 Conversion (see 4.9.13)

    query ::= listtoset(query)
    query ::= element(query)
    query ::= distinct(e)
    query ::= flatten(query)
    query ::= (class_name) query
    
    85

    4.11.2 Operator Priorities

    The following operators are sorted by decreasing priority. Operators on the same line have the same priority and group left-to-right.

    () [] . ->
    not -(unary) + (unary)
    in
    * / mod intersect
    + - union except ||
    < > <= >= <some <any <all
     = != like
    and exists for all
    or
    .. :
    ,
    (identifier) this is the cast operator
    order
    having
    group by
    where
    from
    select