2. Language Reference¶
This section gives a systematic overview of the FlowSpec language.
2.1. Lexical matters¶
2.1.1. Identifiers¶
Most identifiers in FlowSpec fall into one of two categories, which we will refer to as:
- Lowercase identifiers, that start with a lowercase character, and
must match the regular expression
[a-z][a-zA-Z0-9]*
. - Uppercase identifiers, that start with an uppercase character, and
must match the regular expression
[A-Z][a-zA-Z0-9]*
.
2.1.2. Comments¶
Comments in FlowSpec follow C-style comments:
// ... single line ...
for single-line comments/* ... multiple lines ... */
for multi-line comments
Multi-line comments can be nested, and run until the end of the file
when the closing */
is omitted.
2.2. Terms and patterns¶
Terms can be constructed terms from either the abstract syntax tree or a user-defined algebraic data type. Tuples are built-in, as are sets and maps. The latter two have special construction and comprehension syntax.
term = ctor-id "(" {term ","}* ")"
| "(" {term ","}* ")"
| "{" {term ","}* "}"
| "{" term "|" {term ","}* "}"
| "{" {(term "|->" term) ","}* "}"
| "{" term "|->" term "|" {(term "|->" term) ","}* "}"
Pattern matching is done with patterns for constructed terms and tuples. Parts can also be bound to a variable or ignored with a wildcard.
- pattern = ctor-id “(” {pattern “,”}* “)”
“(” {pattern “,”}* “)”var-id “@” pattern“_”var-id
2.3. Modules¶
module module-id
section*
FlowSpec specifications are organized in modules. A module has a name. This name consists of one or more names separated by slashes {name “/”}+
. The names must match the regular expression [a-zA-Z0-9\_][a-zA-Z0-9\_\.\-]*
.
Every module is defined in its own file, with the extension .flo
. The module name and the file paths must match.
Example. An empty module analysis/flow/control
, defined in a file
.../analysis/flow/control.flo
.
module analysis/flow/control
// TODO: add control-flow rules
Modules consist of sections for imports, control-flow rules, data-flow properties and rules, lattices and functions.
2.3.1. Imports¶
imports
module-ref*
external
module-ref*
A module can import definitions from other modules by importing the other module. Imports are specified in an imports
section, which lists the modules being imported. A module reference can be:
- A module identifier, which imports a single module with that name.
- A wildcard, which imports all modules with a given prefix. A
wildcard is like a module identifier, but with a dash as the last
part, as in
{name “/”}+ “/-“
.
A wildcard import does not work recursively. For example,
analysis/-
would imports analysis/functions
, and
analysis/classes
, but not analysis/lets/recursive
.
External imports allow you to import module of for example Stratego, to import the signatures of the abstract syntax you wish to match on.
Example. A main module importing several submodules.
module liveness
imports
control
external
signatures/-
2.4. Control Flow¶
control-flow rules
control-flow-rule*
The first step of analysis in FlowSpec is to define the control-flow through a program. This connection is established with rules that match patterns of abstract syntax and providing the control-flow of that pattern.
2.4.1. Rules¶
A normal control-flow rule maps an abstract syntax pattern to a list of control-flow edges.
pattern* = {cfg-edges ","}+
These edges can start from the special entry
and exit
control-flow nodes that are provided to connect the pattern to the wider control-flow graph. Subtrees matched in the abstract syntax pattern are usually used directly at one side of an edge to connect their corresponding sub-control-flow graph. They can also be inserted as direct control-flow nodes using the node
keyword. This is rarely used. More likely, you may want to insert the whole matched pattern as a node. The this
keyword can be used for that.
cfg-edges = {cfg-edge-end "->"}+
cfg-edge-end = "entry"
| "exit"
| variable
| "node" variable
| "this"
A common case exists where you merely wish to register a pattern as a control-flow graph node. Rather than write out [pattern] = entry -> this -> exit
, you can write node [pattern]
for this.
Example. Module that defines control-flow for some expressions
module control
control-flow rules
node Int(_)
Add(l, r) = entry -> l -> r -> this -> exit
2.4.2. Root rules¶
A root of the control-flow defines the start
and end
nodes of a control-flow graph. You can have multiple control-flow graphs in the same AST, but not nested ones. Each control-flow graph has a unique start
and end
node. A root
control-flow rule introduces the start
and end
node. In other control-flow rules these nodes can be referred to for abrupt termination.
cfg-edge-end = ...
| "start"
| "end"
Example. Module that defines control-flow for a procedure, and the return statement that goes straight to the end of the procedure.
module control
control-flow rules
root Procedure(args, _, body) = start -> args -> body -> end
Return(_) = entry -> this -> end
2.5. Data Flow¶
Data-flow analysis in FlowSpec is based on named properties. Data-flow properties are defined in a property definition section, their rules are defined in a property rules section. Properties have an associated lattices, whose operations take care of merging data at merge points in the control-flow.
properties
property-definition*
property rules
property-rule*
2.5.1. Definitions¶
A property definition consists only of the property name, with a lowercase start and otherwise camelcase for multiple words. The lattice looks like a type expression but uses a lattice name. This lattice instance is used internally for the data-flow computation.
property-definition = name ":" lattice
2.5.2. Rules¶
A property rule consists of the name of the property, a pattern within round brackets and an expression after the equals sign. The pattern matches a control-flow graph node by its originating AST and another control-flow graph node before or after it by name. The expression describes the effect of the matched control-flow graph node, in terms of a change to the value from the adjacent control-flow graph node matched. All rules of a property need to propagate the information in the same way, either forward or backward.
Each property needs to have at least one rule with the start or end pattern. This is pattern matches the extremal node of a control-flow graph and defines the initial value there. With set-based analyses this is usually the empty set, as usually nothing is known at that point. The extremal node needs to match the direction of the rules, with start for a forward analysis and end of a backward analysis.
property-rule = name "(" prop-pattern ")" "=" expr
prop-pattern = name "->" pattern
| pattern "->" name
| pattern "." "start"
| pattern "." "end"
2.6. Lattices¶
Lattices definitions are defined in their own section.
lattices
lattice-definition*
Each lattice definition consists of a name and a number of components: the underlying type, the least-upper-bound or join operation between two lattice values (the less-than-or-equal comparison is derived from this operation), the top value and the bottom value. The type is usually defined by the user in another section as an algebraic data type, and operated on with pattern matching.
name where
type = type
lub([name], name) = expr
top = expr
bottom = expr
2.7. Types¶
Algebraic data types can be defined in a types block.
types
type-definition*
Each type has a name and one or more option preceded by a vertical bar. Each option has a constructor and zero or more arguments in round brackets.
name =
("|" ctor-id "(" {type ","}* ")")+
2.8. Expressions¶
There are many expressions supported to express (abstract) semantics of control-flow nodes.
2.8.1. Integers¶
Integer literals are as usual: an optional minus sign followed by one or more decimals.
- Supported integer operations are
- addition [
+
], - subtraction [
-
], - multiplication [
*
], - division [
/
], - modulo [
%
], - negate [
-
], - comparison [
<
,<=
,>
,>=
,==
,!=
].
- addition [
2.8.2. Booleans¶
Boolean literals true
and false
are available as well as the usual boolean operations:
- and [
&&
]- or [
||
]- not [
!
]
2.8.3. Sets and Maps¶
Set and map literals are both denoted with curly braces. A set literal contains a comma-separated list of elements: {elem1, elem2, elem3}
. A map literal contains a comma-separated list of bindings of the form key |-> value
.
- Operations on sets and maps include
- union [
\/
], - intersection [
/\
], - set/map minus [
\
], - containment/lookup [
in
].
- union [
There are also comprehensions of the form { new | old <- set, conditions }
or { newkey |-> newvalue | oldkey |-> oldvalue <- map, condition }
, where new elements or bindings are gathered based on old ones from a set or map, as long as the boolean condition expressions hold. Such a condition expression may also be a match expression without a body for the arms. This is commonly used to filter maps or sets.
2.8.4. Match expressions¶
Pattern matching can be done with a match expression: match expr with | pattern1 => expr2 | pattern2 => expr2
, where expr
are expressions and pattern
are patterns. Terms and patterns are defined at the start of the reference.
2.8.5. Variables and references¶
Pattern matching can introduce variables. Other references include values in the lattice, such as MaySet.bottom
or MustSet.top
.
2.8.6. Function application¶
User defined functions are available to be called with functionname(arg1, arg2)
. Lattice operations can also be called with for example MaySet.lub(s1, s2)
.
2.8.7. Property lookup¶
Property lookup is similar to a function call, although property lookup only ever has a single argument.
2.9. Functions¶
Functions are defined in their own section.
functions
function-definition*
A function definition consists of a name, typed arguments and a function body expression.
name([{(name ":" type) ","}+]) =
expr