The Constellation Query Language (CQL) is a language for constructing and querying fact-based information models. It is a textual representation of the graphical Object Role Modeling language, augmented with derivations (queries). CQL is also relationally complete; it implements all capabilities of Codd's Relational Algebra.
CQL has an open vocabulary (almost any linking words or expressions may be used in defining fact types) because it is designed to resemble natural language. Only certain key words and key phrases are restricted. Some keywords are disallowed in any open text, because they always have their special meaning, whereas others are only special in certain contexts. In most cases the individual words of key phrases are not restricted; only the complete phrase is special. CQL allows free use of whitespace and comments, even in the middle of key phrases.
This introduction to CQL uses English language keywords and phrases, though CQL also has variants in other natural languages. Keywords that are special everywhere include the logical operators, such as and, or, maybe, not, none, no, if, and designators like some, that, etc. Other words of key phrases, such as is, identified, kind, of, as, at, etc., may be used anywhere open vocabulary is allowed. Their special meaning applies only in the specific places they appear in the grammar.
CQL statements are parsed by first identifying key expressions and terms (including role names and adjectives) while skipping the linking text. This allows identification of all the object types in each clause. Each clause is then matched to a fact type which has compatible object types by using the linking text and adjectives (whether or not a hyphen has been used), and the whole expression is then analysed. The remaining words of linking text (with the object types) form a reading which designates the fact type.
CQL is case sensitive. Person is not the same thing as person. It's convenient, however, to use a capital letter for the names of all object types (called terms, because the specific sequence of words in an object type name will always designate that object type), and sometimes the lower-cased version of a name will appear in linking text.
White space and comments are similar to C and C++:
/* comment
may span lines */ and
// introduces a comment to end of the current line.
CQL requires lexical support for only a rudimentary set of data types and literals: boolean (true/false), numbers (integer, real, scientific), character strings, and ranges of numbers and strings (ranges may be open at either end).
Data types can be refined through the use of parameters to match concrete types available in typical target data environments and languages, and code generators will assume reasonable defaults even without that. You can also define entirely new data types and their parameters, for use in specialised code generators or where a target language has a data type that is not special in the code generators. The syntax for value literals is near the end of this document.
A CQL file must start with a schema definition:
An import definition imports object type names from another schema, possibly using the alias syntax to rename some terms. In addition, fact type readings from the imported vocabulary may subsequently be included in new definitions which provide translations specific to this vocabulary.
A Value Type is a kind of thing which has a single value that may be written down, that is, a lexical type, like a number, a name, a date, etc. CQL literals may provide more than one way to write the same thing, for example the integer 16 may be written as the hexadecimal 0x10, but the uniqueness of these is determined by the underlying canonical value.
A value type is usually derived from another value type, whether or not that value type has been defined in an imported vocabulary. A top-level value type is merely one whose existence is presumed, or is written as itself in a circular fashion. Many top-level names are detected and handled by code generators that receive CQL, but CQL itself does not do this.
A value type may define named parameters, and may assign or restrict parameter values. For legacy reasons, there are two positional parameters called length and scale.
A numeric literal or a Value Type may be assigned a specific unit of measure.
A value type value constraint can restrict the allowable values from those allowed by the supertype.
each Month Nr is written as an Integer restricted to {1..12};
each Currency Amount is written as a Decimal(Precision: 14, Scale: 2);
each Name is written as a String(64, accepts Encoding as String restricted to {'ASCII', 'UTF-8'});
each Claim Sequence Nr is written as an Integer restricted to {1..999};
each Contact Method is written as a Char(1) restricted to {'B', 'H', 'M'};
each Glass Area is written as an Integer in mm^2;
each Acceleration is written as a Real in m/s^2;
A unit definition defines a new unit identifier in terms of a conversion to and from other more fundamental units. It allows an optional coefficient (real number or integer fraction) multiplied by one or more base units, each raised to an integer power. It's common to define the singular form of a unit, then also define the plural as equivalent.
0.01 m^3 converts to cc;
1000.0 cc converts to liter;
299792458.0 m sec^-1 converts to lightspeed approximately;
0.00000000011125945705385 C^2 N^-1 e^-2 electronmass^-1 hbar^2 m^-2 converts to bohrradius;
Units are defined by declaring conversion formulae involving base units. Both singular and plural names may be given. If a base unit is not otherwise defined, it is assumed to be fundamental. Formulae are limited to a coefficient and an offset.
25.4 millimeters converts to inch/inches;
kelvin + 273.15 converts to celsius;
9/5 celsius + 32 converts to fahrenheit;
acceleration converts to metres second^-2;
g converts to 9.8 acceleration approximately;
0.853 dollarUS converts to dollarAU ephemeral;
The use of units in defining value types and in query literals allows dimensional analysis and automatic conversions. This example shows a derived fact type ("Pane has Area") and a query that uses units conversion to list large panes of glass. Notice that the literal in the last line has an associated unit (this line also has a contracted join, see below).
Dimension is written as Real in millimeters;
Width is written as Dimension;
Height is written as Dimension;
Pane has one Width;
Pane has one Height;
Pane has Area where
    Pane has Width,
    Pane has Height,
    Area = Width * Height;
Pane has Area > 5 foot^2?
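The dimensional analysis behind the last query can be sketched outside CQL. The following Python is only an illustration of the idea, not how any CQL implementation works: the table `MM_PER_UNIT`, the helper `to_mm` and the sample panes are all hypothetical, and the foot is assumed to be 304.8 millimeters because the example above does not define it.

```python
# Hypothetical sketch of coefficient-only unit conversion; not CQL's own code.
# Each entry says how many millimeters make up one of the named unit.
MM_PER_UNIT = {"millimeter": 1.0, "inch": 25.4, "foot": 304.8}


def to_mm(value, unit, power=1):
    """Convert `value` expressed in unit^power into millimeter^power."""
    return value * MM_PER_UNIT[unit] ** power


# Sample population (assumed): pane name -> (Width, Height) in millimeters.
panes = {"A": (600.0, 700.0), "B": (800.0, 900.0)}

# "Pane has Area > 5 foot^2?": convert the literal once, then compare areas.
threshold_mm2 = to_mm(5, "foot", power=2)
large_panes = sorted(
    name
    for name, (width, height) in panes.items()
    if width * height > threshold_mm2
)
print(large_panes)
```

Raising the conversion coefficient to the same power as the unit is what lets a literal in foot^2 be compared with an area derived from two millimeter dimensions.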
Each Entity Type plays roles in at least one fact type, and is identified by the combination of the counterparts of one or more such roles. At least one identifying role must be mandatory.
The simplest form of entity identification scheme is by a single value, similar to the General reference mode in ORM:
each Concept is identified by GUID;
Here, GUID is an existing object type, or will be asserted as a value type, and CQL will create the existential fact type "Concept has one GUID" with the alternate reading "GUID is of at most one Concept". If you don't want to use has/is of, you can provide your own reading, and still get the other.
Popular reference modes from ORM derive a new value type that is a subtype of the specified type, and whose name includes the name of the new entity type. A value type restriction may also be appended:
each Item is identified by its Number(Digits: 14);
each Year is identified by its Number restricted to {1900..};
Here, each Item has one Item Number. If Item Number has not already been declared, it will be defined as a subtype of Number, which will in turn be defined as a new value type if necessary. Again, you can use the default has/is of or can provide your own readings. Note the open-ended range in the definition of Year Number.
You can define identification patterns long-hand, and this is the only way to define multi-part identifiers. Here's the equivalent to the above, and a multi-part definition:
each GUID is written as a GUID;
each Concept is identified by a GUID where
    that Concept has one GUID,
    that GUID is of at most one Concept;
each Number is written as a Number;
each Item Number is written as a Number(Digits: 14);
each Item is identified by an Item Number where
    that Item has one Item Number,
    that Item Number is of at most one Item;
each Policy is identified by a Year and a sequence- Number where
    that Policy was issued in one Year,
    that Policy is assigned one sequence- Number;
Note that "that" is optional, as are the (equivalent) indefinite articles "a"/"an" and even the initial "each". We think it's nicer to use the longer form, however.
Note that in defining single-role identification, both roles must be unique (a 1-to-1 relationship), and the new entity type's role must be mandatory. To use the inline definition of mandatory and uniqueness ("one" in the above), there must be two readings (one in each direction) because inline quantifiers like this may only be placed on the last role in a fact type reading.
In defining a multi-part identifier, the entity type's counterparts must be unique (but not vice versa; we can issue more than one Policy in a Year!), and at least one must be mandatory. Note the hyphenation used with the word sequence. This binds the adjective sequence to Number, which verbalises properly and flows to column names. Adjectives may also be bound following the Term, for use e.g. in French.
In all cases, any identifying role may be played by either a Value Type or an Entity Type.
If the default fact type readings (has/is of) aren't appropriate, you can provide your preferred readings. The required uniqueness and mandatory constraints are still added where needed.
Item is identified by its Number where Item is called Item Number;
Some code generators make use of pragmas to control their behaviour:
An entity type may be declared to be a subtype of one or more other entity types, the supertypes. A subtype instance is also an instance of each supertype, so it must play all their mandatory roles (it may play any others) and thus shares their identifiers. It may also have its own identification scheme (though this is not common; it will be identified by its relationship with its first supertype).
each Asset is identified by Asset ID;
each Vehicle is a kind of Asset identified by VIN;
each Employee is a kind of Person identified by its Number;
each Apple is a kind of Fruit;
each Shelf Life is written as Time in days;
Fruit has at most one Shelf Life;
Fruit has one Price per kg;
Apple Type is a kind of Fruit;
Apple Type 'Jonathon' has Shelf Life 31 and has Price 3.20 per kg;
In these fact types, each fruit (and so each apple type) must have a price and maybe a shelf life.
Declaring a subtype creates subtyping fact types, which is useful when subtyping relationships must be explicitly queried.
An object type that can have instances which play no roles (other than its identifying roles) should be declared independent:
State is independent identified by its Code;
A Fact Type is an expression of possibility consisting of an optional naming expression, one or more fact type readings, optionally followed by a query (for a derived fact type). A Fact Type expresses a possible relationship between two or more objects or a characteristic of a single object. The objects are said to play a role in the instances of this fact type. The same object (or two objects of the same type) may play more than one role in the same fact.
Each reading provides a different verbal expression for the same meaning, so must have the same set of role players, often in a different order. When a named fact type is defined, each role player also has a link fact with the objectifying entity, so readings for these fact types may also be included. The first reading for a new fact type is preferred, and sets a default priority order when sorting fact instances.
If a fact type is named, its instances can play a role in other facts. In some cases, a code generator will require that a fact type is named, especially if it has two roles with no uniqueness constraint, or has more than two roles. This example shows an un-named and a named (objectified) fact type:
Person smokes; /* Unary fact type (characteristic) */
Person was born at one birth-Place;
Directorship is where
    Person directs Company,
    Company is run by Person, /* Two readings */
    Person holds Directorship; /* Link fact reading */
If an object type plays more than one role in the same fact type, the separate roles must be distinguished by either adjectives, subscripts, or a defined role name. See the resolution rules under Resolving roles for more details.
A derived fact type also has a query. This query must contain at least one occurrence of the roles of the new fact type, but may link them together in a logical expression that allows computation of the population of the derived fact type. See the discussion of queries below.
A Reading is made up of noun phrases (involving an object type) and linking words (which can be any non-special text). Some of the optional elements in the discussion below are only usable in queries or in other special contexts, but we need to introduce them here. A Fact literal will for example include one literal for every value role. Queries may include aggregate or objectification_step elements, which will be explained later.
Every role in a fact type must have a unique full name. Where the same object type plays two roles, this can be achieved by adding a numeric subscript to each, adding an explicit role name, or using distinctive leading or trailing adjectives (as in given-Name, family-Name). The adjectives are introduced by proper placement of a hyphen and perhaps white-space, and are detected by scanning the whole definition before attempting to parse it. See the discussion of Terms for more information.
Hyphens used in Readings usually indicate the use of adjectives, which can be either leading or trailing (e.g. in French, trailing adjectives are used). The hyphen is only required once within a declaration, and this associates the adjective with that role player throughout this declaration.
Person is identified by given-Name and family-Name where
    Person is called one given Name,
    given Name is of Person,
    Person has one family Name,
    family Name is of Person;
Hyphens may be used to designate multiple adjectives, but must have a space beside the hyphen, on the side of the existing object type name. Otherwise the pair of (previously unseen) words is treated as a simple hyphenated word (not a term):
suitably- trained Person is allowed to drive semi-trailer;
It is also possible to use hyphenated words in adjectives. See the details under Terms.
Various constraints may be included in a noun phrase of a fact type definition. The simplest are quantifiers, which require that an instance of the object type may play this role at most once, exactly once, or at least once. Only the last role in any reading may have an embedded quantifier. Thus to define a one-to-one fact type using embedded constraints, it is essential to have two readings. The alternative is to verbalise a separate constraint elsewhere, which is much less concise.
In some contexts, there is no need for a quantifier, but another article may be needed in the same place. The keywords some and that may be used to resolve ambiguous references to the same object type, or to make it clear where a name refers to the same instance previously mentioned in the same definition. In a query, the keyword which indicates that the population of this role should be included in the query result.
The keyword no (as a quantifier) or not (as any linking word) indicates that there must be no matching instance of the corresponding fact.
In the definition of some entity types, and in many queries, a fact reading may start or end with a term that starts the next reading or comparison expression. This allows the use of more concise verbal forms called contractions. The three cases are:
Qualifiers are keywords enclosed in square brackets after a clause, which apply some condition to that clause. The most common case is to define a Ring Constraint; see the section on that.
Many queries look like a statement of fact, but end with a question mark. Each term in the query corresponds to a variable. Except for the question mark, the body of a query may also occur in a derived fact type. Some of the variables may be bound to a specific value, while others may be free. Free variables may be preceded by the keyword "which" (indicating that the value is sought) or "some" (indicating that some value must exist, but we don't care to see it). The response to the query includes all sets of identifying values for the free variables which satisfy the conditions of the query. If a query has no free variables, the response is either "yes" or "no", indicating whether the conditions are met.
Country Code 'CH'? /* Does 'CH' exist as a Country Code? */
Person 'Daniel' drives some Car? /* Does Daniel drive any Car? */
Person 'Daniel' drives which Car? /* What Car does Daniel drive? */
Person 'Daniel' drives some Car and speeds? /* Does Daniel drive any Car and does he speed? */
sum of Fine in (Person 'Daniel' received Fine for Driving Offence); /* total fines incurred */
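The difference between "which" and "some" can be illustrated with a small sketch. The Python below is not part of CQL and does not reflect any real query engine; the `drives` population and both function names are hypothetical, chosen only to mirror the example queries above.

```python
# Hypothetical sample population for the fact type "Person drives Car".
drives = {("Daniel", "Volvo"), ("Daniel", "Mini"), ("Erin", "Saab")}


def drives_which_car(person):
    """Person '<person>' drives which Car?  The sought Car values are returned."""
    return sorted(car for driver, car in drives if driver == person)


def drives_some_car(person):
    """Person '<person>' drives some Car?  Only the existence of a Car matters."""
    return "yes" if any(driver == person for driver, _ in drives) else "no"


print(drives_which_car("Daniel"))
print(drives_some_car("Erin"))
```

In both functions Person is bound to a value; only the treatment of Car differs, which is why one returns a list of values and the other a yes/no answer.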
When a query involves a named fact type that has no suitable link fact readings, it may be necessary to use a special expression (an objectification_step) to join the contents of the fact. In this expression, a parenthesised sub-expression follows the term, starting with the text "(in which" and ending with a closing parenthesis. So if we have a named fact type Booking as the objectification of "Person booked Table", we might say Booking (in which some Person booked some Table)....
// Ask which Waiters received which tip Amounts
Service (in which which Waiter served some Meal) earned a tip of which Amount?
Certainty keywords may be used to indicate whether a given fact must be matched (the default), may be matched (outer join semantics) or must not be matched (anti-join). Note that inserting the quantifier no anywhere into a clause also means that fact must not be matched.
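The three certainty modes correspond to familiar join behaviours. The following Python sketch shows the intended result sets under an assumed sample population; the data and the function names `must_match`, `may_match` and `must_not_match` are hypothetical and are not CQL keywords.

```python
# Hypothetical population: three Persons, two of whom drive a Car.
persons = ["Daniel", "Erin", "Fay"]
drives = [("Daniel", "Volvo"), ("Erin", "Saab")]


def must_match():
    """Default certainty: keep a Person only where a fact matches (inner join)."""
    return [(p, car) for p in persons for d, car in drives if d == p]


def may_match():
    """Optional match: keep every Person, with None where no fact matches."""
    rows = []
    for p in persons:
        cars = [car for d, car in drives if d == p]
        rows.extend((p, car) for car in (cars or [None]))
    return rows


def must_not_match():
    """Negated match: keep only Persons for whom no fact matches (anti-join)."""
    drivers = {d for d, _ in drives}
    return [p for p in persons if p not in drivers]


print(must_match())
print(may_match())
print(must_not_match())
```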
The returning clause indicates which other roles in the query expression should be made available to a program that requests a derived fact. The behaviour is transitive; if the query uses other derived fact types with a returning clause, those roles also should be returned. These extra roles do not affect the population or behaviour of the derived fact type, but can be useful in justifying an individual instance.
Normally, when processing a query, only the object instances that play the roles of the derived fact type (and satisfy the query) will be available in the results, and there is no defined ordering in the values. When the returning clause is used, additional object and fact instances may be accessible from the result, which may also be sorted. This extension of the result set is transitive, so that if a derived fact type invokes another derived fact type, the returned instances from the invoked fact type’s returning clause will also be available.
The results now include more than just a simple table of the instances that play the roles of the derived fact type. Instead, each object instance may be associated with additional facts for other roles it plays, and the roles of those facts will be populated by further object instances, and so on. This data structure is hereby defined as a constellation, which is where CQL gets its name. The query has selected certain instances from the entire fact population, much as an astronomer might select stars from the night sky.
The use of returning doesn’t change the contents of
the defined fact type, it’s merely a pragmatic instruction to
the query engine about which additional instances will be useful to
the calling program, and in what order.
Algebraic and aggregate expressions follow simple patterns:
Product may be substituted by alternate-Product in Season [acyclic, intransitive];
Topic belongs to at most one parent-Topic [acyclic];
Girl (as Girlfriend) is going out with at most one Boy (as Boyfriend) [symmetric];
A value constraint may apply to a value type, or to a role played by a value type (or by an entity type ultimately identified by a single value type), and this constrains the allowed values of that value. In addition to fact type definitions, a value constraint may be applied in derivations and to fact instances (one value only!), where it has the obvious effect.
Embedded quantifiers allow the definition of the most common kinds of constraints, the internal mandatory, uniqueness and frequency constraints (collectively, CQL calls these presence constraints). Often there are constraints that cannot be expressed in this form, such as when an object type must play one or at most one of many unrelated roles (an external mandatory constraint, possibly disjunctive).
When a single role player must play one and only one (or at least one) of a set of roles, we can say:
In this example, a Range must have either a minimum Bound or a maximum Bound, or both:
each Range occurs at least one time in
    Range has minimum-Bound,
    Range has maximum-Bound;
In another example, supposing that we were to identify Person instances by given name and family name (not a good idea in a real system!) we need to ensure that the combination given name, family name is unique. We can say:
each family Name, given Name occurs at most one time in Person is known by given-Name, Person has family-Name;
Note that with "each Person occurs at most one time", this syntax is an Exclusion Constraint (disjunction), not a Presence Constraint. With "exactly one" it is also Mandatory (exactly one). In both cases, a more succinct syntax may be available.
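What these presence constraints require of a population can be sketched as a simple occurrence count. The Python below is an illustration only: the helper `occurrence_violations` and all sample data are hypothetical, and a real implementation may check the constraint quite differently.

```python
from collections import Counter


def occurrence_violations(occurrences, candidates, minimum=0, maximum=None):
    """Return the candidates whose occurrence count is outside the bounds."""
    counts = Counter(occurrences)  # missing candidates count as zero
    return sorted(
        c
        for c in candidates
        if counts[c] < minimum or (maximum is not None and counts[c] > maximum)
    )


# "each Range occurs at least one time in Range has minimum-Bound,
#  Range has maximum-Bound": count each Range across both fact types.
ranges = ["R1", "R2", "R3"]
has_minimum = [("R1", 0)]
has_maximum = [("R1", 10), ("R2", 5)]
range_roles = [r for r, _ in has_minimum + has_maximum]
unbounded = occurrence_violations(range_roles, ranges, minimum=1)

# "each family Name, given Name occurs at most one time in ...":
# count each (family Name, given Name) pair across all Persons.
names = {"P1": ("Bloggs", "Fred"), "P2": ("Bloggs", "Fred"), "P3": ("Bloggs", "Mary")}
pairs = list(names.values())
duplicated = occurrence_violations(pairs, set(pairs), maximum=1)

print(unbounded, duplicated)
```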
The only ORM characteristic that cannot be expressed this way is a non-mandatory constraint having a minimum frequency above one; for example a constraint that allows zero, or more than two, occurrences. For example, in a footy tipping competition, it might be the case that if a participant submits no tips this week, they get the tips published by a known tipster, but if they do submit tips, they must submit at least eight. This kind of non-mandatory frequency constraint may be expressed in CQL using the maybe qualifier, which is also used in outer join derivations.
maybe Participant entered at least 8 Tips
either Person abstains from smoking or Person is at risk of cancer;
Set constraints compare two or more sets of one or more roles each.
for each ReceivedItem exactly one of these holds: ReceivedItem is for PurchaseOrderItem, ReceivedItem is for TransferRequest;
for each Unit exactly one of these holds: that Unit is fundamental, that Unit is derived from some base-Unit;
In the case where exactly one of two fact types applies, you can use the more natural form:
either Unit is fundamental or that Unit is derived from some base-Unit but not both;
A subset constraint says that one thing is the case only if some other thing is.
Address has third-StreetLine only if that Address has some second-StreetLine;
Note that this example didn't use the first and second StreetLine, as we assume that the first StreetLine is a mandatory part of the address, so the subset constraint would be redundant.
Equality constraints declare that the populations of two or more roles (or sequences of roles) are the same. They are expressed using ‘if and only if’:
Competition is in Series if and only if Competition has series-Number;
When a fact type includes the same object type more than once, or includes a supertype and its subtype, there’s the possibility of the same instance playing both roles. This is often not desired, but further it introduces a whole class of further situations which can be restricted using ring constraints. The CQL keywords used in fact clause qualifiers for ring constraints are the following:
intransitive, transitive, acyclic and symmetric. Intransitive means that just because “A relates to B”, and “B relates to C”, that doesn’t mean that “A relates to C”. Transitive means the opposite. Acyclic means that no A may relate to itself, or to any B that has that relation to A, and so on. Symmetric means that if A relates to B, B also relates to A (so there is only one fact instance possible between A and B).
This method for defining ring constraints is not fully general, and a new syntax is required for covering complex cases.
By default, a Constraint restricts the instances of objects and facts that are possible in a valid Population. This is referred to as the alethic mode. When it is possible (but not permissible) to violate a constraint, that constraint is treated as deontic. These terms come from modal logic. CQL allows any constraint to be defined as deontic by specifying an enforcement action to be taken when a violation is detected. Maybe the action is ineffectual, but the possibility of this constraint being violated is indicated by the use of a note starting with otherwise. Implementations that process CQL should allow such situations to be asserted, and (if feasible) should support the enforcement actions.
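Two of the ring properties can be checked mechanically on a sample population. The sketch below is hypothetical Python, not CQL: it treats a fact type as a plain set of ordered pairs, and it assumes the usual ORM reading of intransitive, namely that "A relates to B" together with "B relates to C" rules out "A relates to C".

```python
def is_intransitive(pairs):
    """True if no pair (a, d) exists whenever (a, b) and (b, d) both do."""
    return not any(
        (a, d) in pairs for a, b in pairs for c, d in pairs if b == c
    )


def is_acyclic(pairs):
    """True if no object can reach itself by following the relation."""
    successors = {}
    for a, b in pairs:
        successors.setdefault(a, set()).add(b)

    def reaches(start, target, seen):
        # Depth-first search; `seen` stops us looping on unrelated cycles.
        for nxt in successors.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return not any(reaches(node, node, set()) for node in successors)


# Assumed sample facts for "Topic belongs to parent-Topic".
topics = {("Algebra", "Maths"), ("Maths", "Science")}
print(is_acyclic(topics), is_intransitive(topics))
```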
Most of the above constraint types may include embedded queries, which ORM calls join constraints. Any clause in these constraints may be an arbitrary derivation (expression of joined clauses). The constraint compares the population of specific roles that are projected from the derivations. This has the same effect as if the constraint applied over derived fact types, except that those fact types do not need to be defined or given readings. Here is a small example of a subset constraint using derivations:
Diplomat speaks Language;
Country uses Language, Language is spoken in Country;
Diplomat serves in Country;
Diplomat serves in Country only if Country uses Language and Diplomat speaks Language;
This constraint requires that in order to serve in a country, a diplomat must speak at least one language used in that country. The use of a contraction makes this more succinct (and dressed up with some/that):
some Diplomat serves in some Country only if that Diplomat speaks some Language that is spoken in that Country;
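The effect of this join subset constraint on a population can be sketched as a set comparison: the pairs in "Diplomat serves in Country" must be a subset of the pairs projected from the joined clauses. The Python below is illustrative only; the sample facts and the function name `unqualified_postings` are hypothetical.

```python
# Hypothetical sample populations for the three fact types.
speaks = {("Dana", "French"), ("Eli", "German")}
uses = {("France", "French"), ("Kenya", "Swahili"), ("Kenya", "English")}
serves_in = {("Dana", "France"), ("Eli", "Kenya")}


def unqualified_postings(serves_in, speaks, uses):
    """Return (Diplomat, Country) pairs that violate the subset constraint."""
    # Join "Diplomat speaks Language" with "Country uses Language" on Language,
    # then project the (Diplomat, Country) roles.
    allowed = {
        (diplomat, country)
        for diplomat, language in speaks
        for country, used in uses
        if language == used
    }
    return sorted(serves_in - allowed)


print(unqualified_postings(serves_in, speaks, uses))
```

An empty result means the population satisfies the constraint; any pair returned is a diplomat serving in a country without speaking a language used there.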
A value instance is defined by stating the name of the value type followed by the lexical representation of the value. An entity having a single identifying role may be defined exactly the same way, which also defines the required identifying instance (in the third example here, a value). Within a vocabulary, an instance is asserted into the sample population, but in other contexts another population may be the target (the metamodel supports arbitrary named populations).
CompanyName 'Microsoft';
Year 1999;
Company 'Microsoft'; /* Entity identified by a single value */
When a fact reading is invoked with values, a fact instance is created. The simplest is where a declaration is just an object type name followed by a value:
Name 'Fred';
This form is allowed for any value type, or any entity type that’s identified by a single value type (or an entity identified by a single entity identified by a single value type, etc). In more complex cases, it might be necessary or convenient to invoke more than one fact type to define the instance:
Person is called given Name 'Fred', Person has family Name 'Bloggs';
or
given Name 'Fred' is of Person who has family Name 'Bloggs';
The Person instance being defined is a reference to the same instance in each fact type reading; there is an implicit join over the two clauses.
Person has family Name, family Name = 'Bloggs', Person is not called given Name 'Fred', Person is a kind of Employee, Employee is managed by no Manager;
Business context, such as the reasons for certain modeling decisions, may need to be recorded. Although CQL is perhaps not the ideal way to do that, it is supported in order to enable model exchange.
A term also allows zero or more leading adjectives and/or trailing adjectives. Leading adjectives are indicated by a hyphen after the first adjective, and trailing adjectives are indicated by a hyphen before the last adjective. This makes it complicated because (like a linking word) an adjective may be hyphenated. If adjectives are introduced by a hyphenated word, the hyphen is doubled instead of being adjacent. Accordingly, there are very precise rules about where spaces and hyphens are allowed inside terms.
Most other lexical rules follow standard C, UNIX and HTTP conventions.