How Do I Write Policies?
OPA is purpose built for reasoning about information represented in structured documents. The data that your service and its users publish can be inspected and transformed using OPA’s native query language Rego.
What is Rego?
Rego was inspired by Datalog, a well-understood, decades-old query language. Rego extends Datalog to support structured document models such as JSON.
Rego queries are assertions on data stored in OPA. These queries can be used to define policies that enumerate instances of data that violate the expected state of the system.
Why use Rego?
Use Rego for defining policy that is easy to read and write.
Rego focuses on providing powerful support for referencing nested documents and ensuring that queries are correct and unambiguous.
Rego is declarative so policy authors can focus on what queries should return rather than how queries should be executed. These queries are simpler and more concise than the equivalent in an imperative language.
Like other applications which support declarative query languages, OPA is able to optimize queries to improve performance.
The Basics
This section introduces the main aspects of Rego.
The simplest rule is a single expression and is defined in terms of a Scalar Value:
pi := 3.14159
Rules define the content of documents. We can query for the content of the pi document generated by the rule above:
pi
3.14159
Rules can also be defined in terms of Composite Values:
rect := {"width": 2, "height": 4}
The result:
rect
{
"height": 4,
"width": 2
}
You can compare two scalar or composite values, and when you do so you are checking if the two values are the same JSON value.
rect == {"height": 4, "width": 2}
true
You can define a new concept using a rule. For example, v below is true if the equality expression is true.
v { "hello" == "world" }
If we evaluate v, the result is undefined because the body of the rule never evaluates to true. As a result, the document generated by the rule is not defined.
undefined decision
Expressions that refer to undefined values are also undefined. This includes comparisons such as !=.
v == true
undefined decision
v != true
undefined decision
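One way to picture undefined is as a missing value that propagates through every expression that touches it. The Python sketch below is only an analogy and not how OPA is implemented; UNDEFINED, rule_v, eq, and neq are hypothetical names introduced for illustration.

```python
# Analogy only: model Rego's "undefined" with a sentinel object.
UNDEFINED = object()

def rule_v():
    # v { "hello" == "world" }: the body never holds, so v has no value.
    return True if "hello" == "world" else UNDEFINED

def eq(a, b):
    # A comparison that refers to an undefined value is itself undefined.
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b

def neq(a, b):
    # The same holds for !=: it does not turn "undefined" into true.
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a != b

v = rule_v()
print(eq(v, True) is UNDEFINED)   # True
print(neq(v, True) is UNDEFINED)  # True
```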
We can define rules in terms of Variables as well:
t { x := 42; y := 41; x > y }
The formal syntax uses the semicolon character ; to separate expressions. Rule bodies can separate expressions with newlines and omit the semicolon:
t2 {
x := 42
y := 41
x > y
}
When evaluating rule bodies, OPA searches for variable bindings that make all of the expressions true. There may be multiple sets of bindings that make the rule body true. The rule body can be understood intuitively as:
expression-1 AND expression-2 AND ... AND expression-N
The rule itself can be understood intuitively as:
rule-name IS value IF body
If the value is omitted, it defaults to true.
When we query for the value of t we see the obvious result:
true
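The "AND" and "IS ... IF" readings can be sketched in Python as a search over candidate bindings. This is an analogy under stated assumptions, not OPA's evaluation strategy: evaluate, candidates, and t_body are hypothetical names, and None stands in for an undefined rule.

```python
from itertools import product

def evaluate(candidates, body, value=True):
    """Read as: rule-name IS value IF body. None models an undefined rule."""
    for env in candidates:
        # expression-1 AND expression-2 AND ... AND expression-N
        if all(expr(env) for expr in body):
            return value
    return None

# A small search space of candidate bindings for x and y (an assumption
# made for illustration; OPA does not enumerate integers like this).
candidates = [{"x": x, "y": y} for x, y in product(range(40, 45), repeat=2)]

# t { x := 42; y := 41; x > y }
t_body = [
    lambda env: env["x"] == 42,
    lambda env: env["y"] == 41,
    lambda env: env["x"] > env["y"],
]

print(evaluate(candidates, t_body))  # True
# Adding an expression that no binding satisfies leaves the rule undefined.
print(evaluate(candidates, t_body + [lambda env: env["x"] < env["y"]]))  # None
```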
The order of expressions in a rule does not affect the document’s content.
s {
x > y
y = 41
x = 42
}
The query result is the same:
true
There’s one exception: if you use assignment := the compiler will check that the variable you are assigning has not already been used.
z {
y := 41
y := 42
43 > y
}
1 error occurred: module.rego:5: rego_compile_error: var y assigned above
Rego References help you refer to nested documents. For example, with:
sites = [{"name": "prod"}, {"name": "smoke1"}, {"name": "dev"}]
And
r { sites[_].name == "prod" }
The rule r above asserts that there exists (at least) one document within sites where the name attribute equals "prod".
The result:
true
We can generalize the example above with a rule that defines a set document instead of a boolean document:
q[name] { name := sites[_].name }
The value of q is a set of names:
[
"prod",
"smoke1",
"dev"
]
We can rewrite the rule r from above to make use of q. We will call the new rule p:
p { q["prod"] }
Querying p will have the same result:
true
As you can see, rules which have arguments can be queried with input values:
q["smoke2"]
undefined decision
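For readers who think in imperative terms, the rules r, q, and p above correspond roughly to the following Python. It is a sketch for intuition only; note that Python answers a failed membership test with False, whereas Rego leaves the result undefined.

```python
# Data copied from the sites example above.
sites = [{"name": "prod"}, {"name": "smoke1"}, {"name": "dev"}]

# r { sites[_].name == "prod" }: true if at least one element matches.
r = any(site["name"] == "prod" for site in sites)

# q[name] { name := sites[_].name }: the set of all names.
q = {site["name"] for site in sites}

# p { q["prod"] }: a membership test against the set.
p = "prod" in q

print(r, p)           # True True
print("smoke2" in q)  # False (Rego reports undefined instead of false)
```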
If you made it this far, congratulations!
This section introduced the main aspects of Rego. The rest of this document walks through each part of the language in more detail.
For a concise reference, see the Language Reference document.
Scalar Values
Scalar values are the simplest type of term in Rego. Scalar values can be strings, numbers, booleans, or null.
Documents can be defined solely in terms of scalar values. This is useful for defining constants that are referenced in multiple places. For example:
greeting := "Hello"
max_height := 42
pi := 3.14159
allowed := true
location := null
These documents can be queried like any other:
[greeting, max_height, pi, allowed, location]
[
"Hello",
42,
3.14159,
true,
null
]
Strings
Rego supports two different types of syntax for declaring strings. The first is likely to be the most familiar: characters surrounded by double quotes. In such strings, certain characters must be escaped to appear in the string, such as double quotes themselves, backslashes, etc. See the Language Reference for a formal definition.
The other type of string declaration is a raw string declaration. These are made of characters surrounded by backticks (`), with the exception that raw strings may not contain backticks themselves. Raw strings are what they sound like: escape sequences are not interpreted, but instead taken as the literal text inside the backticks. For example, the raw string `hello\there` will be the text “hello\there”, not “hello” and “here” separated by a tab. Raw strings are particularly useful when constructing regular expressions for matching, as they eliminate the need to double escape special characters.
A simple example is a regex to match a valid Rego variable. With a regular string, the regex is "[a-zA-Z_]\\w*", but with raw strings, it becomes `[a-zA-Z_]\w*`.
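Python draws the same distinction between escaped and raw string literals, so the difference can be checked there. The snippet below uses Python's r"..." syntax as a stand-in for Rego's backticks; the test identifiers max_height and 2fast are made up for illustration.

```python
import re

escaped = "[a-zA-Z_]\\w*"  # regular string: the backslash must be doubled
raw = r"[a-zA-Z_]\w*"      # raw string: the backslash is taken literally

print(escaped == raw)                         # True: both spell the same regex
print(bool(re.fullmatch(raw, "max_height")))  # True
print(bool(re.fullmatch(raw, "2fast")))       # False: cannot start with a digit
print(len(r"hello\there"))                    # 11: backslash and t stay as two characters
```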
Composite Values
Composite values define collections. In simple cases, composite values can be treated as constants like Scalar Values:
cube := {"width": 3, "height": 4, "depth": 5}
The result:
cube.width
3
Composite values can also be defined in terms of Variables or References. For example:
a := 42
b := false
c := null
d := {"a": a, "x": [b, c]}
+----+-------+------+---------------------------+
| a | b | c | d |
+----+-------+------+---------------------------+
| 42 | false | null | {"a":42,"x":[false,null]} |
+----+-------+------+---------------------------+
By defining composite values in terms of variables and references, rules can define abstractions over raw data and other rules.
Sets
In addition to arrays and objects, Rego supports set values. Sets are unordered collections of unique values. Just like other composite values, sets can be defined in terms of scalars, variables, references, and other composite values. For example:
s := {cube.width, cube.height, cube.depth}
+---------+
| s |
+---------+
| [5,4,3] |
+---------+
Set documents are collections of values without keys. OPA represents set documents as arrays when serializing to JSON or other formats that do not support a set data type. The important distinction between sets and arrays or objects is that sets are unkeyed while arrays and objects are keyed, i.e., you cannot refer to the index of an element within a set.
When comparing sets, the order of elements does not matter:
{1,2,3} == {3,1,2}
true
Because sets are unordered, variables inside sets must be unified with a ground value outside of the set. If the variable is not unified with a ground value outside the set, OPA will complain:
{1,2,3} == {3,x,2}
1 error occurred: 1:1: rego_unsafe_var_error: var x is unsafe
Because sets share curly-brace syntax with objects, and an empty object is defined with {}, an empty set has to be constructed with a different syntax:
count(set())
0
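Python sets happen to behave the same way on these points, including the empty-set syntax, which makes them a convenient mental model. The snippet is an analogy only and reuses the cube values from above.

```python
cube = {"width": 3, "height": 4, "depth": 5}
s = {cube["width"], cube["height"], cube["depth"]}

print(s == {5, 4, 3})          # True: element order does not matter
print({1, 2, 3} == {3, 1, 2})  # True
print(type({}).__name__)       # dict: empty braces mean an empty object
print(len(set()))              # 0: the empty set needs set()
```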
Variables
Variables are another kind of term in Rego. They appear in both the head and body of rules.
Variables appearing in the head of a rule can be thought of as input and output of the rule. Unlike many programming languages, where a variable is either an input or an output, in Rego a variable is simultaneously an input and an output. If a query supplies a value for a variable, that variable is an input, and if the query does not supply a value for a variable, that variable is an output.
For example:
sites := [
{"name": "prod"},
{"name": "smoke1"},
{"name": "dev"}
]
q[name] { name := sites[_].name }
In this case, we evaluate q with a variable x (which is not bound to a value). As a result, the query returns all of the values for x and all of the values for q[x], which are always the same because q is a set.
q[x]
+----------+----------+
| x | q[x] |
+----------+----------+
| "prod" | "prod" |
| "smoke1" | "smoke1" |
| "dev" | "dev" |
+----------+----------+
On the other hand, if we evaluate q with an input value for name we can determine whether name exists in the document defined by q:
q["dev"]
"dev"
Variables appearing in the head of a rule must also appear in a non-negated equality expression within the same rule. This property ensures that if the rule is evaluated and all of the expressions evaluate to true for some set of variable bindings, the variable in the head of the rule will be defined.
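The input/output duality of the variable in q can be imitated in Python with an optional argument. This is a rough sketch: the helper q and the use of None for both "unbound" and "undefined" are assumptions made for illustration.

```python
sites = [{"name": "prod"}, {"name": "smoke1"}, {"name": "dev"}]

def q(x=None):
    """Hypothetical helper: x=None stands for an unbound variable."""
    names = [site["name"] for site in sites]
    if x is None:
        # Unbound: x is an output, so enumerate every binding.
        return names
    # Bound: x is an input, so check membership; None models "undefined".
    return x if x in names else None

print(q())          # ['prod', 'smoke1', 'dev']
print(q("dev"))     # dev
print(q("smoke2"))  # None
```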
References
References are used to access nested documents.
The examples in this section use the data defined in the Examples section.
The simplest reference contains no variables. For example, the following reference returns the hostname of the second server in the first site document from our example data:
sites[0].servers[1].hostname
"helium"
References are typically written using the “dot-access” style. The canonical form does away with . and closely resembles dictionary lookup in a language such as Python:
sites[0]["servers"][1]["hostname"]
"helium"
Both forms are valid; however, the dot-access style is typically more readable. Note that there are four cases where brackets must be used:
- String keys containing characters other than [a-z], [A-Z], [0-9], or _ (underscore).
- Non-string keys such as numbers, booleans, and null.
- Variable keys which are described later.
- Composite keys which are described later.
References are always prefixed with a variable that identifies the root document. In the example above this is sites. The root document may be:
- a local variable inside a rule.
- a rule inside the same package.
- a document stored in OPA.
- a document temporarily provided to OPA as part of a transaction.
Variable Keys
References can include variables as keys. References written this way are used to select a value from every element in a collection.
The following reference will select the hostnames of all the servers in our example data:
sites[i].servers[j].hostname
+---+---+------------------------------+
| i | j | sites[i].servers[j].hostname |
+---+---+------------------------------+
| 0 | 0 | "hydrogen" |
| 0 | 1 | "helium" |
| 0 | 2 | "lithium" |
| 1 | 0 | "beryllium" |
| 1 | 1 | "boron" |
| 1 | 2 | "carbon" |
| 2 | 0 | "nitrogen" |
| 2 | 1 | "oxygen" |
+---+---+------------------------------+
Conceptually, this is the same as the following imperative (Python) code:
def hostnames(sites):
    result = []
    for site in sites:
        for server in site.servers:
            result.append(server.hostname)
    return result
In the reference above, we effectively used variables named i and j to iterate the collections. If the variables are unused outside the reference, we prefer to replace them with an underscore (_) character. The reference above can be rewritten as:
sites[_].servers[_].hostname
+------------------------------+
| sites[_].servers[_].hostname |
+------------------------------+
| "hydrogen" |
| "helium" |
| "lithium" |
| "beryllium" |
| "boron" |
| "carbon" |
| "nitrogen" |
| "oxygen" |
+------------------------------+
The underscore is special because it cannot be referred to by other parts of the rule, e.g., the other side of the expression, another expression, etc. The underscore can be thought of as a special iterator. Each time an underscore is specified, a new iterator is instantiated.
Under the hood, OPA translates the _ character to a unique variable name that does not conflict with variables and rules that are in scope.
Composite Keys
References can include Composite Values as keys if the key is being used to refer into a set. Composite keys may not be used in refs for base data documents; they are only valid for references into virtual documents.
This is useful for checking for the presence of composite values within a set, or extracting all values within a set matching some pattern. For example:
s := {[1, 2], [1, 4], [2, 6]}
s[[1, 2]]
[
1,
2
]
s[[1, x]]
+---+-----------+
| x | s[[1, x]] |
+---+-----------+
| 2 | [1,2] |
| 4 | [1,4] |
+---+-----------+
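A composite key with a variable in it acts like a pattern matched against every element of the set. In Python terms (an analogy; tuples stand in for Rego arrays because Python lists cannot be set members):

```python
# Arrays are not hashable in Python, so tuples stand in for [1, 2] etc.
s = {(1, 2), (1, 4), (2, 6)}

# s[[1, 2]]: a plain membership test.
print((1, 2) in s)  # True

# s[[1, x]]: collect every x for which (1, x) is in the set.
xs = sorted(b for (a, b) in s if a == 1)
print(xs)  # [2, 4]
```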
Multiple Expressions
Rules are often written in terms of multiple expressions that contain references to documents. In the following example, the rule defines a set of arrays where each array contains an application name and a hostname of a server where the application is deployed.
apps_and_hostnames[[name, hostname]] {
some i, j, k
name := apps[i].name
server := apps[i].servers[_]
sites[j].servers[k].name == server
hostname := sites[j].servers[k].hostname
}
The result:
apps_and_hostnames[x]
+----------------------+-----------------------+
| x | apps_and_hostnames[x] |
+----------------------+-----------------------+
| ["web","hydrogen"] | ["web","hydrogen"] |
| ["web","helium"] | ["web","helium"] |
| ["web","beryllium"] | ["web","beryllium"] |
| ["web","boron"] | ["web","boron"] |
| ["web","nitrogen"] | ["web","nitrogen"] |
| ["mysql","lithium"] | ["mysql","lithium"] |
| ["mysql","carbon"] | ["mysql","carbon"] |
| ["mongodb","oxygen"] | ["mongodb","oxygen"] |
+----------------------+-----------------------+
Don’t worry about understanding everything in this example right now. There are just two important points:
- Several variables appear more than once in the body. When a variable is used in multiple locations, OPA will only produce documents for the rule with the variable bound to the same value in all expressions.
- The rule is joining the apps and sites documents implicitly. In Rego (and other languages based on Datalog), joins are implicit.
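Written imperatively, the implicit join becomes explicit nested iteration with a filter. The Python sketch below uses a small made-up data set modeled on the outputs shown above, not the full Examples data, and is meant only to show where the join condition sits.

```python
# Small illustrative data set (an assumption, not the full Examples data).
sites = [
    {"servers": [{"name": "web-0", "hostname": "hydrogen"},
                 {"name": "db-0", "hostname": "lithium"}]},
    {"servers": [{"name": "db-dev", "hostname": "oxygen"}]},
]
apps = [
    {"name": "web", "servers": ["web-0"]},
    {"name": "mysql", "servers": ["db-0"]},
    {"name": "mongodb", "servers": ["db-dev"]},
]

apps_and_hostnames = {
    (app["name"], server["hostname"])
    for app in apps                     # apps[i]
    for server_name in app["servers"]   # apps[i].servers[_]
    for site in sites                   # sites[j]
    for server in site["servers"]       # sites[j].servers[k]
    if server["name"] == server_name    # the join condition
}
print(sorted(apps_and_hostnames))
# [('mongodb', 'oxygen'), ('mysql', 'lithium'), ('web', 'hydrogen')]
```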
Self-Joins
Using a different key on the same array or object provides the equivalent of self-join in SQL. For example, the following rule defines a document containing apps deployed on the same site as "mysql":
same_site[apps[k].name] {
some i, j, k
apps[i].name == "mysql"
server := apps[i].servers[_]
server == sites[j].servers[_].name
other_server := sites[j].servers[_].name
server != other_server
other_server == apps[k].servers[_]
}
The result:
same_site[x]
+-------+--------------+
| x | same_site[x] |
+-------+--------------+
| "web" | "web" |
| "web" | "web" |
| "web" | "web" |
| "web" | "web" |
+-------+--------------+
Comprehensions
Comprehensions provide a concise way of building Composite Values from sub-queries.
Like Rules, comprehensions consist of a head and a body. The body of a comprehension can be understood in exactly the same way as the body of a rule, that is, one or more expressions that must all be true in order for the overall body to be true. When the body evaluates to true, the head of the comprehension is evaluated to produce an element in the result.
The body of a comprehension is able to refer to variables defined in the outer body. For example:
region := "west"
names := [name | sites[i].region == region; name := sites[i].name]
+-----------------+--------+
| names | region |
+-----------------+--------+
| ["smoke","dev"] | "west" |
+-----------------+--------+
In the above query, the second expression contains an Array Comprehension that refers to the region variable. The region variable will be bound in the outer body.
When a comprehension refers to a variable in an outer body, OPA will reorder expressions in the outer body so that variables referred to in the comprehension are bound by the time the comprehension is evaluated.
Comprehensions are similar to the same constructs found in other languages like Python. For example, we could write the above comprehension in Python as follows:
# Python equivalent of Rego comprehension shown above.
names = [site.name for site in sites if site.region == "west"]
Comprehensions are often used to group elements by some key. A common use case for comprehensions is to assist in computing aggregate values (e.g., the number of containers running on a host).
Array Comprehensions
Array Comprehensions build array values out of sub-queries. Array Comprehensions have the form:
[ <term> | <body> ]
For example, the following rule defines an object where the keys are application names and the values are hostnames of servers where the application is deployed. The hostnames of servers are represented as an array.
app_to_hostnames[app_name] = hostnames {
app := apps[_]
app_name := app.name
hostnames := [hostname | name := app.servers[_]
s := sites[_].servers[_]
s.name == name
hostname := s.hostname]
}
The result:
app_to_hostnames[app]
+-----------+------------------------------------------------------+
| app | app_to_hostnames[app] |
+-----------+------------------------------------------------------+
| "web" | ["hydrogen","helium","beryllium","boron","nitrogen"] |
| "mysql" | ["lithium","carbon"] |
| "mongodb" | ["oxygen"] |
+-----------+------------------------------------------------------+
Object Comprehensions
Object Comprehensions build object values out of sub-queries. Object Comprehensions have the form:
{ <key>: <term> | <body> }
We can use Object Comprehensions to write the rule from above as a comprehension instead:
app_to_hostnames := {app.name: hostnames |
app := apps[_]
hostnames := [hostname |
name := app.servers[_]
s := sites[_].servers[_]
s.name == name
hostname := s.hostname]
}
The result is the same:
app_to_hostnames[app]
+-----------+------------------------------------------------------+
| app | app_to_hostnames[app] |
+-----------+------------------------------------------------------+
| "web" | ["hydrogen","helium","beryllium","boron","nitrogen"] |
| "mysql" | ["lithium","carbon"] |
| "mongodb" | ["oxygen"] |
+-----------+------------------------------------------------------+
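The closest Python counterpart is a dict comprehension whose values are list comprehensions. The sketch below runs on a reduced, made-up data set (an assumption, not the full Examples data), so its output is shorter than the table above.

```python
# Reduced illustrative data (an assumption, not the full Examples data).
sites = [
    {"servers": [{"name": "web-0", "hostname": "hydrogen"},
                 {"name": "db-0", "hostname": "lithium"}]},
    {"servers": [{"name": "web-dev", "hostname": "nitrogen"}]},
]
apps = [
    {"name": "web", "servers": ["web-0", "web-dev"]},
    {"name": "mysql", "servers": ["db-0"]},
]

app_to_hostnames = {
    app["name"]: [
        s["hostname"]
        for name in app["servers"]   # name := app.servers[_]
        for site in sites
        for s in site["servers"]     # s := sites[_].servers[_]
        if s["name"] == name         # s.name == name
    ]
    for app in apps                  # app := apps[_]
}
print(app_to_hostnames)
# {'web': ['hydrogen', 'nitrogen'], 'mysql': ['lithium']}
```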
Object comprehensions are not allowed to have conflicting entries, similar to rules:
{"foo": y | z := [1, 2, 3]; y := z[_] }
"foo": eval_conflict_error: object keys must be unique
Set Comprehensions
Set Comprehensions build set values out of sub-queries. Set Comprehensions have the form:
{ <term> | <body> }
For example, to construct a set from an array:
a := [1, 2, 3, 4, 3, 4, 3, 4, 5]
b := {x | x = a[_]}
+---------------------+-------------+
| a | b |
+---------------------+-------------+
| [1,2,3,4,3,4,3,4,5] | [1,2,3,4,5] |
+---------------------+-------------+
Rules
Rules define the content of Virtual Documents in OPA. When OPA evaluates a rule, we say OPA generates the content of the document that is defined by the rule.
The sample code in this section makes use of the data defined in Examples.
Generating Sets
The following rule defines a set containing the hostnames of all servers:
hostnames[name] { name := sites[_].servers[_].hostname }
When we query for the content of hostnames we see the same data as we would if we queried using the sites[_].servers[_].hostname reference directly:
hostnames[name]
+-------------+-----------------+
| name | hostnames[name] |
+-------------+-----------------+
| "hydrogen" | "hydrogen" |
| "helium" | "helium" |
| "lithium" | "lithium" |
| "beryllium" | "beryllium" |
| "boron" | "boron" |
| "carbon" | "carbon" |
| "nitrogen" | "nitrogen" |
| "oxygen" | "oxygen" |
+-------------+-----------------+
This example introduces a few important aspects of Rego.
First, the rule defines a set document where the contents are defined by the variable name. We know this rule defines a set document because the head only includes a key. All rules have the following form (where key, value, and body are all optional):
<name> <key>? <value>? <body>?
For a more formal definition of the rule syntax, see the Language Reference document.
Second, the sites[_].servers[_].hostname fragment selects the hostname attribute from all of the objects in the servers collection. From reading the fragment in isolation we cannot tell whether the fragment refers to arrays or objects. We only know that it refers to a collection of values.
Third, the name := sites[_].servers[_].hostname expression binds the value of the hostname attribute to the variable name, which is also declared in the head of the rule.
Generating Objects
Rules that define objects are very similar to rules that define sets.
apps_by_hostname[hostname] = app {
some i
server := sites[_].servers[_]
hostname := server.hostname
apps[i].servers[_] = server.name
app := apps[i].name
}
The rule above defines an object that maps hostnames to app names. The main difference between this rule and one which defines a set is the rule head: in addition to declaring a key, the rule head also declares a value for the document.
The result:
apps_by_hostname["helium"]
"web"
Incremental Definitions
A rule may be defined multiple times with the same name. When a rule is defined this way, we refer to the rule definition as incremental because each definition is additive. The document produced by incrementally defined rules is the union of the documents produced by each individual rule.
For example, we can write a rule that abstracts over our servers and containers data as instances:
instances[instance] {
server := sites[_].servers[_]
instance := {"address": server.hostname, "name": server.name}
}
instances[instance] {
container := containers[_]
instance := {"address": container.ipaddress, "name": container.name}
}
If the head of the rule is the same, we can chain multiple rule bodies together to obtain the same result. We don’t recommend using this form anymore.
instances[instance] {
server := sites[_].servers[_]
instance := {"address": server.hostname, "name": server.name}
} {
container := containers[_]
instance := {"address": container.ipaddress, "name": container.name}
}
An incrementally defined rule can be intuitively understood as <rule-1> OR <rule-2> OR ... OR <rule-N>.
The result:
instances[x]
+-----------------------------------------------+-----------------------------------------------+
| x | instances[x] |
+-----------------------------------------------+-----------------------------------------------+
| {"address":"hydrogen","name":"web-0"} | {"address":"hydrogen","name":"web-0"} |
| {"address":"helium","name":"web-1"} | {"address":"helium","name":"web-1"} |
| {"address":"lithium","name":"db-0"} | {"address":"lithium","name":"db-0"} |
| {"address":"beryllium","name":"web-1000"} | {"address":"beryllium","name":"web-1000"} |
| {"address":"boron","name":"web-1001"} | {"address":"boron","name":"web-1001"} |
| {"address":"carbon","name":"db-1000"} | {"address":"carbon","name":"db-1000"} |
| {"address":"nitrogen","name":"web-dev"} | {"address":"nitrogen","name":"web-dev"} |
| {"address":"oxygen","name":"db-dev"} | {"address":"oxygen","name":"db-dev"} |
| {"address":"10.0.0.1","name":"big_stallman"} | {"address":"10.0.0.1","name":"big_stallman"} |
| {"address":"10.0.0.2","name":"cranky_euclid"} | {"address":"10.0.0.2","name":"cranky_euclid"} |
+-----------------------------------------------+-----------------------------------------------+
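In set terms, the OR of incremental definitions is a union. A minimal Python sketch, using one server and one container as stand-in data and tuples of (address, name) in place of objects (both are assumptions for illustration):

```python
# Stand-in data: one server and one container.
sites = [{"servers": [{"name": "web-0", "hostname": "hydrogen"}]}]
containers = [{"name": "big_stallman", "ipaddress": "10.0.0.1"}]

def instances_from_servers():
    # First definition of instances: one entry per server.
    return {(s["hostname"], s["name"]) for site in sites for s in site["servers"]}

def instances_from_containers():
    # Second definition of instances: one entry per container.
    return {(c["ipaddress"], c["name"]) for c in containers}

# <rule-1> OR <rule-2>: the document is the union of both definitions.
instances = instances_from_servers() | instances_from_containers()
print(sorted(instances))
# [('10.0.0.1', 'big_stallman'), ('hydrogen', 'web-0')]
```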
Complete Definitions
In addition to rules that partially define sets and objects, Rego also supports so-called complete definitions of any type of document. Rules provide a complete definition by omitting the key in the head. Complete definitions are commonly used for constants:
pi := 3.14159
Rego allows authors to omit the body of rules. If the body is omitted, it defaults to true.
Documents produced by rules with complete definitions can only have one value at a time. If evaluation produces multiple values for the same document, an error will be returned.
For example:
# Define user "bob" for test input.
user := "bob"
# Define two sets of users: power users and restricted users. Accidentally
# include "bob" in both.
power_users := {"alice", "bob", "fred"}
restricted_users := {"bob", "kim"}
# Power users get 32GB memory.
max_memory = 32 { power_users[user] }
# Restricted users get 4GB memory.
max_memory = 4 { restricted_users[user] }
Error:
module.rego:15: eval_conflict_error: complete rules must not produce multiple outputs
OPA returns an error in this case because the rule definitions are in conflict. The value produced by max_memory cannot be 32 and 4 at the same time.
The documents produced by rules with complete definitions may still be undefined:
max_memory with user as "johnson"
undefined decision
In some cases, having an undefined result for a document is not desirable. In those cases, policies can use the Default Keyword to provide a fallback value.
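The three possible outcomes of a complete definition (one value, a conflict, or undefined) can be sketched in Python. The evaluator below is hypothetical and only mirrors the max_memory example: None models an undefined result and ValueError models the conflict error.

```python
power_users = {"alice", "bob", "fred"}
restricted_users = {"bob", "kim"}

def max_memory(user):
    """Hypothetical evaluator: None models an undefined result."""
    values = set()
    if user in power_users:       # max_memory = 32 { power_users[user] }
        values.add(32)
    if user in restricted_users:  # max_memory = 4 { restricted_users[user] }
        values.add(4)
    if len(values) > 1:
        raise ValueError("complete rules must not produce multiple outputs")
    return values.pop() if values else None

print(max_memory("alice"))    # 32
print(max_memory("johnson"))  # None
try:
    max_memory("bob")
except ValueError as err:
    print(err)  # complete rules must not produce multiple outputs
```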
Like variables declared in rules, there can be at most one complete definition name declared with the := operator per package. The compiler checks for redeclaration of complete definitions with the := operator:
package example
pi := 3.14
# some other rules...
pi := 3.14156 # Redeclaration error because 'pi' already declared above.
1 error occurred: module.rego:3: rego_type_error: rule named pi redeclared at module.rego:7
Functions
Rego supports user-defined functions that can be called with the same semantics as Built-in Functions. They have access to both the data Document and the input Document.
For example, the following function will return the result of trimming the spaces from a string and then splitting it by periods.
trim_and_split(s) = x {
t := trim(s, " ")
x := split(t, ".")
}
trim_and_split(" foo.bar ")
[
"foo",
"bar"
]
Functions may have an arbitrary number of inputs, but exactly one output. Function arguments may be any kind of term. For example, suppose we have the following function:
foo([x, {"bar": y}]) = z {
z := {x: y}
}
The following calls would produce the logical mappings given:
Call | x | y |
---|---|---|
z := foo(a) | a[0] | a[1].bar |
z := foo(["5", {"bar": "hello"}]) | "5" | "hello" |
z := foo(["5", {"bar": [1, 2, 3, ["foo", "bar"]]}]) | "5" | [1, 2, 3, ["foo", "bar"]] |
If you need multiple outputs, write your functions so that the output is an array, object or set containing your results. If the output term is omitted, it is equivalent to having the output term be the literal true. That is, the function declarations below are equivalent:
f(x) {
x == "foo"
}
f(x) = true {
x == "foo"
}
The outputs of user functions have some additional limitations, namely that they must resolve to a single value. If you write a function that has multiple possible bindings for an output variable, you will get a conflict error:
p(x) = y {
y := x[_]
}
p([1, 2, 3])
module.rego:3: eval_conflict_error: functions must not produce multiple outputs for same inputs
Functions can be defined incrementally. Defining a function more than once lets a call select the definition whose arguments match:
q(1, x) = y {
y := x
}
q(2, x) = y {
y := x*4
}
q(1, 2)
2
q(2, 2)
8
A given function call will execute all functions that match the signature given. If a call matches multiple functions, they must produce the same output, or else a conflict error will occur:
r(1, x) = y {
y := x
}
r(x, 2) = y {
y := x*4
}
r(1, 2)
module.rego:7: eval_conflict_error: functions must not produce multiple outputs for same inputs
On the other hand, if a call matches no functions, then the result is undefined.
s(x, 2) = y {
y := x*4
}
s(5, 2)
20
s(5, 3)
undefined decision
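The matching behaviour described above (run every definition that matches, require agreement, return undefined when nothing matches) can be modeled with a small dispatcher. This is a sketch under assumptions: call is a hypothetical helper, each definition is a (matches, body) pair of lambdas, and None models undefined.

```python
def call(definitions, *args):
    """Hypothetical dispatcher: run every definition that matches the call."""
    outputs = []
    for matches, body in definitions:
        if matches(*args):
            out = body(*args)
            if out not in outputs:
                outputs.append(out)
    if len(outputs) > 1:
        raise ValueError("functions must not produce multiple outputs for same inputs")
    return outputs[0] if outputs else None

# q(1, x) = y { y := x }  and  q(2, x) = y { y := x*4 }
q = [(lambda a, x: a == 1, lambda a, x: x),
     (lambda a, x: a == 2, lambda a, x: x * 4)]

# r(1, x) = y { y := x }  and  r(x, 2) = y { y := x*4 }
r = [(lambda a, b: a == 1, lambda a, b: b),
     (lambda a, b: b == 2, lambda a, b: a * 4)]

print(call(q, 1, 2))  # 2
print(call(q, 2, 2))  # 8
print(call(q, 3, 2))  # None: no definition matches
try:
    call(r, 1, 2)     # both definitions match and disagree (2 vs 4)
except ValueError as err:
    print(err)
```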
Negation
To generate the content of a Virtual Document, OPA attempts to bind variables in the body of the rule such that all expressions in the rule evaluate to true.
This generates the correct result when the expressions represent assertions about what states should exist in the data stored in OPA. In some cases, you want to express that certain states should not exist in the data stored in OPA. In these cases, negation must be used.
For safety, a variable appearing in a negated expression must also appear in another non-negated equality expression in the rule.
OPA will reorder expressions to ensure that negated expressions are evaluated after other non-negated expressions with the same variables. OPA will reject rules containing negated expressions that do not meet the safety criteria described above.
The simplest use of negation involves only scalar values or variables and is equivalent to complementing the operator:
t {
greeting := "hello"
not greeting == "goodbye"
}
The result:
t
true
Negation is required to check whether some value does not exist in a collection. That is, complementing the operator in an expression such as p[_] == "foo" yields p[_] != "foo". However, this is not equivalent to not p["foo"].
For example, we can write a rule that defines a document containing names of apps not deployed on the "prod" site:
prod_servers[name] {
site := sites[_]
site.name == "prod"
name := site.servers[_].name
}
apps_in_prod[name] {
app := apps[_]
server := app.servers[_]
prod_servers[server]
name := app.name
}
apps_not_in_prod[name] {
name := apps[_].name
not apps_in_prod[name]
}
The result:
apps_not_in_prod[name]
+-----------+------------------------+
| name | apps_not_in_prod[name] |
+-----------+------------------------+
| "mongodb" | "mongodb" |
+-----------+------------------------+
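The same logic reads naturally as set operations in Python. The sketch below uses a reduced, made-up data set (an assumption, not the full Examples data) and also shows why complementing the operator is not the same as negating membership.

```python
# Reduced illustrative data (an assumption, not the full Examples data).
sites = [
    {"name": "prod", "servers": [{"name": "web-0"}, {"name": "db-0"}]},
    {"name": "dev", "servers": [{"name": "web-dev"}, {"name": "db-dev"}]},
]
apps = [
    {"name": "web", "servers": ["web-0", "web-dev"]},
    {"name": "mysql", "servers": ["db-0"]},
    {"name": "mongodb", "servers": ["db-dev"]},
]

prod_servers = {s["name"] for site in sites if site["name"] == "prod"
                for s in site["servers"]}
apps_in_prod = {app["name"] for app in apps
                if any(s in prod_servers for s in app["servers"])}
# not apps_in_prod[name]: keep the names that are absent from the set.
apps_not_in_prod = {app["name"] for app in apps
                    if app["name"] not in apps_in_prod}
print(apps_not_in_prod)  # {'mongodb'}

# p[_] != "foo" versus not p["foo"]:
p = {"foo", "bar"}
print(any(x != "foo" for x in p))  # True: some element differs from "foo"
print("foo" not in p)              # False: "foo" is present
```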
Universal Quantification (FOR ALL)
Like SQL, Rego does not have a direct way to express universal quantification (“FOR ALL”). However, like SQL, you can use other language primitives (e.g., Negation) to express FOR ALL. For example, imagine you want to express a policy that says (in English):
There must be no apps named "bitcoin-miner".
A common mistake is to try encoding the policy with a rule named no_bitcoin_miners like so:
no_bitcoin_miners {
app := apps[_]
app.name != "bitcoin-miner" # THIS IS NOT CORRECT.
}
It becomes clear that this is incorrect when you use the some keyword, because the rule is true whenever there is SOME app that is not a bitcoin-miner:
no_bitcoin_miners {
some i
app := apps[i]
app.name != "bitcoin-miner"
}
You can confirm this by querying the rule:
no_bitcoin_miners with apps as [{"name": "bitcoin-miner"}, {"name": "web"}]
true
The reason the rule is incorrect is that variables in Rego are existentially quantified. This means that rule bodies and queries express FOR ANY and not FOR ALL. To express FOR ALL in Rego, complement the logic in the rule body (e.g., != becomes ==) and then complement the check using negation (e.g., no_bitcoin_miners becomes not any_bitcoin_miners).
For this policy, you define a rule that finds if there exists a bitcoin-mining app (which is easy using the some keyword). And then you use negation to check that there is NO bitcoin-mining app. Technically, you’re using two negations and an existential quantifier, which is logically the same as a universal quantifier.
For example:
no_bitcoin_miners_using_negation {
not any_bitcoin_miners
}
any_bitcoin_miners {
some i
app := apps[i]
app.name == "bitcoin-miner"
}
no_bitcoin_miners_using_negation with apps as [{"name": "web"}]
true
no_bitcoin_miners_using_negation with apps as [{"name": "bitcoin-miner"}, {"name": "web"}]
undefined decision
The undefined result above is expected because we did not define a default value for no_bitcoin_miners_using_negation. Since the body of the rule fails to match, there is no value generated.
Alternatively, we can implement the same kind of logic inside a single rule using Comprehensions.
no_bitcoin_miners_using_comprehension {
bitcoin_miners := {app | app := apps[_]; app.name == "bitcoin-miner"}
count(bitcoin_miners) == 0
}
Whether you use negation or comprehensions to express FOR ALL is up to you. The comprehension version is more concise and does not require a helper rule while the negation version is more verbose but a bit simpler and allows for more complex ORs.
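Python's any() makes the quantifier explicit, which helps show why the first attempt fails. The functions below are illustrative counterparts of the three rules, not translations produced by OPA; they return False where the Rego rules would be undefined.

```python
def no_bitcoin_miners_wrong(apps):
    # Existential: true as soon as SOME app is not a miner.
    return any(app["name"] != "bitcoin-miner" for app in apps)

def no_bitcoin_miners_using_negation(apps):
    # NOT (there exists an app that is a miner) == FOR ALL apps: not a miner.
    return not any(app["name"] == "bitcoin-miner" for app in apps)

def no_bitcoin_miners_using_comprehension(apps):
    miners = [app for app in apps if app["name"] == "bitcoin-miner"]
    return len(miners) == 0

mixed = [{"name": "bitcoin-miner"}, {"name": "web"}]
clean = [{"name": "web"}]

print(no_bitcoin_miners_wrong(mixed))                # True (the bug)
print(no_bitcoin_miners_using_negation(mixed))       # False
print(no_bitcoin_miners_using_comprehension(mixed))  # False
print(no_bitcoin_miners_using_negation(clean))       # True
```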
Modules
In Rego, policies are defined inside modules. Modules consist of exactly one package declaration, zero or more import statements, and zero or more rule definitions.
Modules are typically represented in Unicode text and encoded in UTF-8.
Comments
Comments begin with the #
character and continue until the end of the line.
Packages
Packages group the rules defined in one or more modules into a particular namespace. Because rules are namespaced they can be safely shared across projects.
Modules contributing to the same package do not have to be located in the same directory.
The rules defined in a module are automatically exported. That is, they can be queried under OPA’s Data API provided the appropriate package is given. For example, given the following module:
package opa.examples
pi := 3.14159
The pi
document can be queried via the Data API:
GET https://example.com/v1/data/opa/examples/pi HTTP/1.1
Imports
Import statements declare dependencies that modules have on documents defined outside the package. By importing a document, the identifiers exported by that document can be referenced within the current module.
All modules contain implicit statements which import the data
and input
documents.
Modules use the same syntax to declare dependencies on Base Documents and Virtual Documents.
package opa.examples
import data.servers
http_servers[server] {
server := servers[_]
server.protocols[_] == "http"
}
Similarly, modules can declare dependencies on query arguments by specifying an import path that starts with input.
package opa.examples
import input.user
import input.method
# allow alice to perform any operation.
allow { user == "alice" }
# allow bob to perform read-only operations.
allow {
user == "bob"
method == "GET"
}
# allows users assigned a "dev" role to perform read-only operations.
allow {
method == "GET"
data.roles["dev"][_] == input.user
}
Imports can include an optional as
keyword to handle namespacing issues:
package opa.examples
import data.servers as my_servers
http_servers[server] {
server := my_servers[_]
server.protocols[_] == "http"
}
Some Keyword
The some keyword allows queries to explicitly declare local variables. Use the
some keyword in rules that contain unification statements or references with
variable operands if variables contained in those statements are not
declared using :=.
Statement | Example | Variables |
---|---|---|
Unification | input.a = [["b", x], [y, "c"]] | x and y |
Reference with variable operands | data.foo[i].bar[j] | i and j |
For example, the following rule generates tuples of array indices for servers in the “west” region that contain “db” in their name. The first element in the tuple is the site index and the second element is the server index.
tuples[[i, j]] {
some i, j
sites[i].region == "west"
server := sites[i].servers[j] # note: 'server' is local because it's declared with :=
contains(server.name, "db")
}
If we query for the tuples we get two results:
[
[
1,
2
],
[
2,
1
]
]
Since we have declared i, j, and server to be local, we can introduce
rules in the same package without affecting the result above:
# Define a rule called 'i'
i := 1
If we had not declared i with the some keyword, introducing the i rule
above would have changed the result of tuples because the i symbol in the
body would capture the global value. Try removing some i, j and see what happens!
The some
keyword is not required but it’s recommended to avoid situations like
the one above where introduction of a rule inside a package could change
behaviour of other rules.
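The capture problem described above can be reproduced in a few lines. This is an illustrative sketch only: the package name example_some, the trimmed-down sites data, and the rule names west_declared and west_captured are assumptions, not part of the examples on this page.

```rego
package example_some

# Hypothetical, trimmed-down site list.
sites := [{"region": "east"}, {"region": "west"}, {"region": "west"}]

# A package-level rule that shares its name with a loop variable.
i := 0

# `i` is declared local, so it ranges over every index of `sites`.
west_declared[i] {
    some i
    sites[i].region == "west"
}

# `i` is not declared, so it refers to the rule `i` above (always 0).
west_captured[i] {
    sites[i].region == "west"
}

test_some_keeps_variable_local {
    west_declared == {1, 2}
    west_captured == set()
}
```

Because sites[0] is in the east region, the captured version produces an empty set, while the version that uses some still finds both west indices.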
With Keyword
The with
keyword allows queries to programmatically specify values nested
under the input Document and the data Document.
For example, given the simple authorization policy in the Imports section, we can write a query that checks whether a particular request would be allowed:
allow with input as {"user": "alice", "method": "POST"}
true
allow with input as {"user": "bob", "method": "GET"}
true
not allow with input as {"user": "bob", "method": "DELETE"}
true
allow with input as {"user": "charlie", "method": "GET"} with data.roles as {"dev": ["charlie"]}
true
not allow with input as {"user": "charlie", "method": "GET"} with data.roles as {"dev": ["bob"]}
true
The with
keyword acts as a modifier on expressions. A single expression is
allowed to have zero or more with
modifiers. The with
keyword has the
following syntax:
<expr> with <target-1> as <value-1> [with <target-2> as <value-2> [...]]
The <target>s must be references to values in the input document (or the input
document itself) or data document.
When applied to the data document, the <target> must not attempt to partially define virtual documents. For example, given a virtual document at path data.foo.bar, the compiler will generate an error if the policy attempts to replace data.foo.bar.baz.
The with
keyword only affects the attached expression. Subsequent expressions
will see the unmodified value. The exception to this rule is when multiple
with
keywords are in-scope like below:
inner := [x, y] {
x := input.foo
y := input.bar
}
middle := [a, b] {
a := inner with input.foo as 100
b := input
}
outer := result {
result := middle with input as {"foo": 200, "bar": 300}
}
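To trace how the nested modifiers combine: outer replaces the whole input, middle then overrides only input.foo for its call to inner, and the plain reference to input inside middle still sees the value set by outer. The sketch below restates the three rules and records that expectation as a test. The package name example_with and the test rule name are assumptions made for illustration.

```rego
package example_with

inner := [x, y] {
    x := input.foo
    y := input.bar
}

middle := [a, b] {
    a := inner with input.foo as 100 # inner sees foo = 100 and bar = 300
    b := input                       # still the input provided by outer
}

outer := result {
    result := middle with input as {"foo": 200, "bar": 300}
}

test_nested_with {
    outer == [[100, 300], {"foo": 200, "bar": 300}]
}
```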
Default Keyword
The default
keyword allows policies to define a default value for documents
produced by rules with Complete Definitions. The
default value is used when all of the rules sharing the same name are undefined.
For example:
default allow = false
allow {
input.user == "bob"
input.method == "GET"
}
allow {
input.user == "alice"
}
When the allow document is queried, the return value will be either true or false.
{
"user": "bob",
"method": "POST"
}
false
Without the default definition, the allow
document would simply be undefined for the same input.
When the default
keyword is used, the rule syntax is restricted to:
default <name> = <term>
The term may be any scalar, composite, or comprehension value but it may not be a variable or reference. If the value is a composite then it may not contain variables or references.
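The difference between a false result and an undefined one can be checked with a short test. This is a minimal sketch under stated assumptions: the package name example_default and the rule allow_no_default are hypothetical and exist only to contrast a rule that has a default with one that does not.

```rego
package example_default

default allow = false

allow {
    input.user == "alice"
}

# Same condition as allow, but without a default value.
allow_no_default {
    input.user == "alice"
}

# With a default, a non-matching input yields the value false.
test_default_value_is_used {
    allow == false with input as {"user": "bob"}
}

# Without a default, the document is undefined: neither true nor false.
test_without_default_is_undefined {
    not allow_no_default == false with input as {"user": "bob"}
    not allow_no_default == true with input as {"user": "bob"}
}
```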
Else Keyword
The else
keyword is a basic control flow construct that gives you control
over rule evaluation order.
Rules grouped together with the else
keyword are evaluated until a match is
found. Once a match is found, rule evaluation does not proceed to rules further
in the chain.
The else
keyword is useful if you are porting policies into Rego from an
order-sensitive system like IPTables.
authorize = "allow" {
input.user == "superuser" # allow 'superuser' to perform any operation.
} else = "deny" {
input.path[0] == "admin" # disallow 'admin' operations...
input.source_network == "external" # from external networks.
} # ... more rules
In the example below, evaluation stops immediately after the first rule even though the input matches the second rule as well.
{
"path": [
"admin",
"exec_shell"
],
"source_network": "external",
"user": "superuser"
}
"allow"
In the next example, the input matches the second rule (but not the first) so evaluation continues to the second rule before stopping.
{
"path": [
"admin",
"exec_shell"
],
"source_network": "external",
"user": "alice"
}
"deny"
The else
keyword may be used repeatedly on the same rule and there is no
limit imposed on the number of else
clauses on a rule.
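As a sketch of a chain with more than one else clause, the rule below extends the authorize example with a third outcome. The "review" value, the package name example_else, and the test inputs are assumptions added for illustration only.

```rego
package example_else

authorize = "allow" {
    input.user == "superuser"
} else = "deny" {
    input.path[0] == "admin"
    input.source_network == "external"
} else = "review" {
    # Hypothetical third outcome, not part of the original example.
    input.source_network == "external"
}

# The first clause matches, so later clauses are never consulted.
test_first_match_wins {
    authorize == "allow" with input as {
        "user": "superuser",
        "path": ["admin", "exec_shell"],
        "source_network": "external"
    }
}

# Neither of the first two clauses matches, so evaluation reaches the third.
test_falls_through_to_last_clause {
    authorize == "review" with input as {
        "user": "alice",
        "path": ["public"],
        "source_network": "external"
    }
}
```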
Operators
Equality: Assignment, Comparison, and Unification
Rego supports three kinds of equality: assignment (:=), comparison (==), and unification (=). Both assignment (:=) and comparison (==) are only available inside of rules (and in the REPL), and we recommend using them whenever possible for policies that are easier to read and write.
Assignment :=
The assignment operator (:=) is used to define local variables inside of a rule. Assigned variables are locally scoped to that rule and shadow global variables.
x := 100
p {
x := 1 # declare local variable 'x' and assign value 1
x != 100 # true because 'x' refers to local variable
}
Assigned variables are not allowed to appear before the assignment in the query. For example, the following policy will not compile:
p {
x != 100
x := 1 # error because x appears earlier in the query.
}
q {
x := 1
x := 2 # error because x is assigned twice.
}
2 errors occurred
module.rego:5: rego_compile_error: var x referenced above
module.rego:10: rego_compile_error: var x assigned above
Comparison ==
Comparison checks if two values are equal within a rule. If the left or right hand side contains a variable that has not been assigned a value, the compiler throws an error.
p {
x := 100
x == 100 # true because x refers to the local variable
}
{
"p": true
}
y := 100
q {
y == 100 # true because y refers to the global variable
}
{
"q": true,
"y": 100
}
r {
z == 100 # compiler error because z has not been assigned a value
}
1 error occurred: module.rego:4: rego_unsafe_var_error: var z is unsafe
Unification =
Unification (=) combines assignment and comparison. Rego will assign variables to values that make the comparison true. Unification lets you ask for values for variables that make an expression true.
# Find values for x and y that make the equality true
[x, "world"] = ["hello", y]
+---------+---------+
|    x    |    y    |
+---------+---------+
| "hello" | "world" |
+---------+---------+
sites[i].servers[j].name = apps[k].servers[m]
+---+---+---+---+
| i | j | k | m |
+---+---+---+---+
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 2 | 1 | 0 |
| 1 | 0 | 0 | 2 |
| 1 | 1 | 0 | 3 |
| 1 | 2 | 1 | 1 |
| 2 | 0 | 0 | 4 |
| 2 | 1 | 2 | 0 |
+---+---+---+---+
Best Practices for Equality
Here is a comparison of the three forms of equality.
Equality  Applicable   Compiler Errors            Use Case
--------  -----------  -------------------------  ----------------------
:=        Inside rule  Var already assigned       Assign local variable
==        Inside rule  Var not assigned           Compare values
=         Everywhere   Values cannot be computed  Express query
Best practice is to use assignment :=
and comparison ==
wherever possible. The additional compiler checks help avoid errors when writing policy, and the additional syntax helps make the intent clearer when reading policy.
Under the hood := and == are syntactic sugar for =, local variable creation, and additional compiler checks.
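The three forms can appear side by side in one rule, which makes their different roles easier to see. The following sketch is illustrative; the package name example_equality, the point data, and the rule name sum_of_point are assumptions.

```rego
package example_equality

# Hypothetical data used only for this illustration.
point := [3, 4]

sum_of_point := total {
    [x, y] = point # unification: binds x to 3 and y to 4
    total := x + y # assignment: declares the local variable total
    total == 7     # comparison: checks the value and binds nothing
}

test_sum_of_point {
    sum_of_point == 7
}
```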
Comparison Operators
The following comparison operators are supported:
a == b # `a` is equal to `b`.
a != b # `a` is not equal to `b`.
a < b # `a` is less than `b`.
a <= b # `a` is less than or equal to `b`.
a > b # `a` is greater than `b`.
a >= b # `a` is greater than or equal to `b`.
None of these operators bind variables contained in the expression. As a result, if either operand is a variable, the variable must appear in another expression in the same rule that would cause the variable to be bound, i.e., an equality expression or the target position of a built-in function.
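For instance, in the sketch below the variables name and requested are bound by a reference and an assignment before the > operator compares them. The package name example_comparison, the limits data, and the shape of the input are assumptions chosen for illustration.

```rego
package example_comparison

# Hypothetical resource limits.
limits := {"cpu": 4, "memory": 16}

over_limit[name] {
    some name
    requested := input.requests[name] # binds name and requested
    requested > limits[name]          # only compares, binds nothing
}

test_over_limit {
    over_limit == {"cpu"} with input as {"requests": {"cpu": 8, "memory": 8}}
}
```

If the first expression in the body were removed, requested would never be bound and the compiler would reject the rule as unsafe.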
Built-in Functions
In some cases, rules must perform simple arithmetic, aggregation, and so on. Rego provides a number of built-in functions (or “built-ins”) for performing these tasks.
Built-ins can be easily recognized by their syntax. All built-ins have the following form:
<name>(<arg-1>, <arg-2>, ..., <arg-n>)
Built-ins usually take one or more input values and produce one output value. Unless stated otherwise, all built-ins accept values or variables as output arguments.
If a built-in function is invoked with a variable as input, the variable must be safe, i.e., it must be assigned elsewhere in the query.
Built-ins can include “.” characters in the name. This allows them to be
namespaced. If you are adding custom built-ins to OPA, consider namespacing
them to avoid naming conflicts, e.g., org.example.special_func.
See the Language Reference document for details on each built-in function.
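A brief sketch of built-in calls in practice, using startswith, count, and the namespaced net.cidr_contains. The package name example_builtins, the names data, and the rule names are assumptions made for illustration.

```rego
package example_builtins

# Hypothetical server names.
names := ["web-0", "db-0", "db-1000"]

# startswith takes two strings as input and succeeds on a prefix match.
db_names[n] {
    n := names[_]
    startswith(n, "db-")
}

# count takes a collection and produces its number of elements.
num_db_names := count(db_names)

# A namespaced built-in: the name contains a "." character.
in_private_range {
    net.cidr_contains("10.0.0.0/8", "10.0.0.1")
}

test_builtins {
    db_names == {"db-0", "db-1000"}
    num_db_names == 2
    in_private_range
}
```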
Example Data
The rules below define the content of documents describing a simplistic deployment environment. These documents are referenced in other sections above.
sites := [
{
"region": "east",
"name": "prod",
"servers": [
{
"name": "web-0",
"hostname": "hydrogen"
},
{
"name": "web-1",
"hostname": "helium"
},
{
"name": "db-0",
"hostname": "lithium"
}
]
},
{
"region": "west",
"name": "smoke",
"servers": [
{
"name": "web-1000",
"hostname": "beryllium"
},
{
"name": "web-1001",
"hostname": "boron"
},
{
"name": "db-1000",
"hostname": "carbon"
}
]
},
{
"region": "west",
"name": "dev",
"servers": [
{
"name": "web-dev",
"hostname": "nitrogen"
},
{
"name": "db-dev",
"hostname": "oxygen"
}
]
}
]
apps := [
{
"name": "web",
"servers": ["web-0", "web-1", "web-1000", "web-1001", "web-dev"]
},
{
"name": "mysql",
"servers": ["db-0", "db-1000"]
},
{
"name": "mongodb",
"servers": ["db-dev"]
}
]
containers := [
{
"image": "redis",
"ipaddress": "10.0.0.1",
"name": "big_stallman"
},
{
"image": "nginx",
"ipaddress": "10.0.0.2",
"name": "cranky_euclid"
}
]