2. Language Reference

This section gives a systematic overview of the FlowSpec language.

2.1. Lexical matters

2.1.1. Identifiers

Most identifiers in FlowSpec fall into one of two categories, which we will refer to as:

  • Lowercase identifiers, that start with a lowercase character, and must match the regular expression [a-z][a-zA-Z0-9]*.
  • Uppercase identifiers, that start with an uppercase character, and must match the regular expression [A-Z][a-zA-Z0-9]*.

2.1.2. Comments

Comments in FlowSpec follow C-style comments:

  • // ... single line ... for single-line comments
  • /* ... multiple lines ... */ for multi-line comments

Multi-line comments can be nested, and run until the end of the file when the closing */ is omitted.

2.2. Terms and patterns

Terms can be constructed terms from either the abstract syntax tree or a user-defined algebraic data type. Tuples are built-in, as are sets and maps. The latter two have special construction and comprehension syntax.

term = ctor-id "(" {term ","}* ")"
     | "(" {term ","}* ")"
     | "{" {term ","}* "}"
     | "{" term "|" {term ","}* "}"
     | "{" {(term "|->" term) ","}* "}"
     | "{" term "|->" term "|" {(term "|->" term) ","}* "}"

Pattern matching is done with patterns for constructed terms and tuples. Parts can also be bound to a variable or ignored with a wildcard.

pattern = ctor-id “(” {pattern “,”}* “)”
“(” {pattern “,”}* “)”
var-id “@” pattern
“_”
var-id

2.3. Modules

module module-id

  section*

FlowSpec specifications are organized in modules. A module has a name. This name consists of one or more names separated by slashes {name /}+. The names must match the regular expression [a-zA-Z0-9\_][a-zA-Z0-9\_\.\-]*.

Every module is defined in its own file, with the extension .flo. The module name and the file paths must match.

Example. An empty module analysis/flow/control, defined in a file .../analysis/flow/control.flo.

module analysis/flow/control

// TODO: add control-flow rules

Modules consist of sections for imports, control-flow rules, data-flow properties and rules, lattices and functions.

2.3.1. Imports

imports

  module-ref*

  external

    module-ref*

A module can import definitions from other modules by importing the other module. Imports are specified in an imports section, which lists the modules being imported. A module reference can be:

  • A module identifier, which imports a single module with that name.
  • A wildcard, which imports all modules with a given prefix. A wildcard is like a module identifier, but with a dash as the last part, as in {name /}+ /-.

A wildcard import does not work recursively. For example, analysis/- would imports analysis/functions, and analysis/classes, but not analysis/lets/recursive.

External imports allow you to import module of for example Stratego, to import the signatures of the abstract syntax you wish to match on.

Example. A main module importing several submodules.

module liveness

imports
  control

  external
    signatures/-

2.4. Control Flow

control-flow rules

  control-flow-rule*

The first step of analysis in FlowSpec is to define the control-flow through a program. This connection is established with rules that match patterns of abstract syntax and providing the control-flow of that pattern.

2.4.1. Rules

A normal control-flow rule maps an abstract syntax pattern to a list of control-flow edges.

pattern* = {cfg-edges ","}+

These edges can start from the special entry and exit control-flow nodes that are provided to connect the pattern to the wider control-flow graph. Subtrees matched in the abstract syntax pattern are usually used directly at one side of an edge to connect their corresponding sub-control-flow graph. They can also be inserted as direct control-flow nodes using the node keyword. This is rarely used. More likely, you may want to insert the whole matched pattern as a node. The this keyword can be used for that.

cfg-edges = {cfg-edge-end "->"}+

cfg-edge-end = "entry"
             | "exit"
             | variable
             | "node" variable
             | "this"

A common case exists where you merely wish to register a pattern as a control-flow graph node. Rather than write out [pattern] = entry -> this -> exit, you can write node [pattern] for this.

Example. Module that defines control-flow for some expressions

module control

control-flow rules

  node Int(_)
  Add(l, r) = entry -> l -> r -> this -> exit

2.4.2. Root rules

A root of the control-flow defines the start and end nodes of a control-flow graph. You can have multiple control-flow graphs in the same AST, but not nested ones. Each control-flow graph has a unique start and end node. A root control-flow rule introduces the start and end node. In other control-flow rules these nodes can be referred to for abrupt termination.

cfg-edge-end = ...
             | "start"
             | "end"

Example. Module that defines control-flow for a procedure, and the return statement that goes straight to the end of the procedure.

module control

control-flow rules

  root Procedure(args, _, body) = start -> args -> body -> end
  Return(_) = entry -> this -> end

2.5. Data Flow

Data-flow analysis in FlowSpec is based on named properties. Data-flow properties are defined in a property definition section, their rules are defined in a property rules section. Properties have an associated lattices, whose operations take care of merging data at merge points in the control-flow.

properties

  property-definition*
property rules

  property-rule*

2.5.1. Definitions

A property definition consists only of the property name, with a lowercase start and otherwise camelcase for multiple words. The lattice looks like a type expression but uses a lattice name. This lattice instance is used internally for the data-flow computation.

property-definition = name ":" lattice

2.5.2. Rules

A property rule consists of the name of the property, a pattern within round brackets and an expression after the equals sign. The pattern matches a control-flow graph node by its originating AST and another control-flow graph node before or after it by name. The expression describes the effect of the matched control-flow graph node, in terms of a change to the value from the adjacent control-flow graph node matched. All rules of a property need to propagate the information in the same way, either forward or backward.

Each property needs to have at least one rule with the start or end pattern. This is pattern matches the extremal node of a control-flow graph and defines the initial value there. With set-based analyses this is usually the empty set, as usually nothing is known at that point. The extremal node needs to match the direction of the rules, with start for a forward analysis and end of a backward analysis.

property-rule = name "(" prop-pattern ")" "=" expr
prop-pattern = name "->" pattern
             | pattern "->" name
             | pattern "." "start"
             | pattern "." "end"

2.6. Lattices

Lattices definitions are defined in their own section.

lattices

  lattice-definition*

Each lattice definition consists of a name and a number of components: the underlying type, the least-upper-bound or join operation between two lattice values (the less-than-or-equal comparison is derived from this operation), the top value and the bottom value. The type is usually defined by the user in another section as an algebraic data type, and operated on with pattern matching.

name where
  type = type
  lub([name], name) = expr
  top = expr
  bottom = expr

2.7. Types

Algebraic data types can be defined in a types block.

types

  type-definition*

Each type has a name and one or more option preceded by a vertical bar. Each option has a constructor and zero or more arguments in round brackets.

name =
  ("|" ctor-id "(" {type ","}* ")")+

2.8. Expressions

There are many expressions supported to express (abstract) semantics of control-flow nodes.

2.8.1. Integers

Integer literals are as usual: an optional minus sign followed by one or more decimals.

Supported integer operations are
  1. addition [+],
  2. subtraction [-],
  3. multiplication [*],
  4. division [/],
  5. modulo [%],
  6. negate [-],
  7. comparison [<, <=, >, >=, ==, !=].

2.8.2. Booleans

Boolean literals true and false are available as well as the usual boolean operations:

  1. and [&&]
  2. or [||]
  3. not [!]

2.8.3. Sets and Maps

Set and map literals are both denoted with curly braces. A set literal contains a comma-separated list of elements: {elem1, elem2, elem3}. A map literal contains a comma-separated list of bindings of the form key |-> value.

Operations on sets and maps include
  1. union [\/],
  2. intersection [/\],
  3. set/map minus [\],
  4. containment/lookup [in].

There are also comprehensions of the form { new | old <- set, conditions } or { newkey |-> newvalue | oldkey |-> oldvalue <- map, condition }, where new elements or bindings are gathered based on old ones from a set or map, as long as the boolean condition expressions hold. Such a condition expression may also be a match expression without a body for the arms. This is commonly used to filter maps or sets.

2.8.4. Match expressions

Pattern matching can be done with a match expression: match expr with | pattern1 => expr2 | pattern2 => expr2, where expr are expressions and pattern are patterns. Terms and patterns are defined at the start of the reference.

2.8.5. Variables and references

Pattern matching can introduce variables. Other references include values in the lattice, such as MaySet.bottom or MustSet.top.

2.8.6. Function application

User defined functions are available to be called with functionname(arg1, arg2). Lattice operations can also be called with for example MaySet.lub(s1, s2).

2.8.7. Property lookup

Property lookup is similar to a function call, although property lookup only ever has a single argument.

2.9. Functions

Functions are defined in their own section.

functions

  function-definition*

A function definition consists of a name, typed arguments and a function body expression.

name([{(name ":" type) ","}+]) =
  expr