Constraints
Constraints are a feature of many data platforms. When specified, the platform will perform additional validation on data as it is being populated in a new table or inserted into a preexisting table. If the validation fails, the table creation or update fails, the operation is rolled back, and you will see a clear error message.
When enforced, a constraint guarantees that you will never see invalid data in the table materialized by your model. Enforcement varies significantly by data platform.
Constraints require the declaration and enforcement of a model contract.
Constraints are never applied on ephemeral models or those materialized as views. Only table and incremental models support applying and enforcing constraints.
Platform-specific support
In transactional databases, it is possible to define "constraints" on the allowed values of certain columns, stricter than just the data type of those values. For example, Postgres supports and enforces all the constraints in the ANSI SQL standard (not null, unique, primary key, foreign key), plus a flexible row-level check constraint that evaluates to a boolean expression.
Most analytical data platforms support and enforce a not null constraint, but they either do not support or do not enforce the rest. It is sometimes still desirable to add an "informational" constraint, knowing it is not enforced, for the purpose of integrating with legacy data catalog or entity-relationship diagram tools (dbt-core#3295).
To that end, there are two optional fields you can specify on any constraint:
- warn_unenforced: False to skip warning on constraints that are supported, but not enforced, by this data platform. The constraint will be included in templated DDL.
- warn_unsupported: False to skip warning on constraints that aren't supported by this data platform, and therefore won't be included in templated DDL.
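For example, a sketch of suppressing both warnings on a platform that treats these constraints as metadata only (model and column names are illustrative):

models:
  - name: dim_customers
    columns:
      - name: id
        data_type: int
        constraints:
          - type: primary_key
            warn_unenforced: false  # supported but not enforced: no warning, still in DDL
          - type: check
            expression: "id > 0"
            warn_unsupported: false  # not supported: no warning, omitted from DDL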
Postgres
For more details, refer to the PostgreSQL documentation on constraints.
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key
          - type: check
            expression: "id > 0"
      - name: customer_name
        data_type: text
      - name: first_transaction_date
        data_type: date
Expected DDL to enforce constraints:
create table "database_name"."schema_name"."constraints_example__dbt_tmp"
(
    id integer not null primary key check (id > 0),
    customer_name text,
    first_transaction_date date
)
;
insert into "database_name"."schema_name"."constraints_example__dbt_tmp"
(
    id,
    customer_name,
    first_transaction_date
)
(
    select
        1 as id,
        'My Favorite Customer' as customer_name,
        cast('2019-01-01' as date) as first_transaction_date
);
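If the model's query returned a row that violated one of these constraints, Postgres would reject the insert and roll back the transaction. As a sketch (the auto-generated constraint name will differ in practice):

insert into "database_name"."schema_name"."constraints_example__dbt_tmp"
values (0, 'Bad Customer', '2019-01-01');
-- ERROR:  new row for relation "constraints_example__dbt_tmp" violates check constraint "constraints_example__dbt_tmp_id_check"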
Redshift

Redshift currently only enforces not null constraints; all other constraints are metadata only. Additionally, Redshift does not allow column checks at the time of table creation. For more details, see the Redshift documentation on defining table constraints.
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: integer
        constraints:
          - type: not_null
          - type: primary_key # not enforced -- will warn & include
          - type: check # not supported -- will warn & skip
            expression: "id > 0"
        tests:
          - unique # primary_key constraint is not enforced
      - name: customer_name
        data_type: varchar
      - name: first_transaction_date
        data_type: date
Note that Redshift limits the maximum length of varchar values to 256 characters by default (or when specified without a length). This means that any string data exceeding 256 characters might get truncated or return a "value too long for character type" error. To allow the maximum length, use varchar(max). For example, data_type: varchar(max).
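A sketch of the corresponding column entry in the model's YAML:

columns:
  - name: customer_name
    data_type: varchar(max)  # avoids the 256-character default limit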
Expected DDL to enforce constraints:
create table "database_name"."schema_name"."constraints_example__dbt_tmp"
(
    id integer not null,
    customer_name varchar,
    first_transaction_date date,
    primary key(id)
)
;
insert into "database_name"."schema_name"."constraints_example__dbt_tmp"
(
    select
        1 as id,
        'My Favorite Customer' as customer_name,
        cast('2019-01-01' as date) as first_transaction_date
);
Snowflake

Snowflake supports four types of constraints: unique, not null, primary key, and foreign key.

It is important to note that only not null (and the not null property of primary key) is actually checked at present. The rest of the constraints are purely metadata, not verified when inserting data.

Currently, Snowflake doesn't support the check syntax, and dbt will skip the check config and raise a warning message if it is set on some models in the dbt project.
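Where a check expression is still needed on Snowflake, one workaround (a sketch, assuming the dbt-utils package is installed) is an equivalent model-level test:

models:
  - name: dim_customers
    tests:
      - dbt_utils.expression_is_true:
          expression: "id > 0"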
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: integer
        description: hello
        constraints:
          - type: not_null
          - type: primary_key # not enforced -- will warn & include
          - type: check # not supported -- will warn & skip
            expression: "id > 0"
        tests:
          - unique # need this test because primary_key constraint is not enforced
      - name: customer_name
        data_type: text
      - name: first_transaction_date
        data_type: date
Expected DDL to enforce constraints:
create or replace transient table <database>.<schema>.constraints_model
(
    id integer not null primary key,
    customer_name text,
    first_transaction_date date
)
as
(
    select
        1 as id,
        'My Favorite Customer' as customer_name,
        cast('2019-01-01' as date) as first_transaction_date
);
BigQuery

BigQuery allows defining and enforcing not null constraints, and defining (but not enforcing) primary key and foreign key constraints (which can be used for query optimization). BigQuery does not support defining or enforcing other constraints. For more information, refer to Platform constraint support.

Documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language
Data types: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
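As a sketch of what an unenforced key looks like in BigQuery DDL (project, dataset, table, and column names are illustrative):

create table `project`.`dataset`.`example`
(
  id int64 not null,             -- defined and enforced
  primary key (id) not enforced  -- metadata only; may help query optimization
);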
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key # not enforced -- will warn & include
          - type: check # not supported -- will warn & skip
            expression: "id > 0"
        tests:
          - unique # primary_key constraint is not enforced
      - name: customer_name
        data_type: string
      - name: first_transaction_date
        data_type: date
Column-level constraint on nested column:
{{
  config(
    materialized = "table"
  )
}}

select
  'string' as a,
  struct(
    1 as id,
    'name' as name,
    struct(2 as id, struct('test' as again, 2 as even_more) as another) as double_nested
  ) as b
version: 2

models:
  - name: nested_column_constraints_example
    config:
      contract:
        enforced: true
    columns:
      - name: a
        data_type: string
      - name: b.id
        data_type: integer
        constraints:
          - type: not_null
      - name: b.name
        description: test description
        data_type: string
      - name: b.double_nested.id
        data_type: integer
      - name: b.double_nested.another.again
        data_type: string
      - name: b.double_nested.another.even_more
        data_type: integer
        constraints:
          - type: not_null
Expected DDL to enforce constraints (for the dim_customers example above):
create or replace table `<project>`.`<dataset>`.`constraints_model`
(
    id integer not null,
    customer_name string,
    first_transaction_date date
)
as
(
    select
        1 as id,
        'My Favorite Customer' as customer_name,
        cast('2019-01-01' as date) as first_transaction_date
);
Databricks

Databricks allows you to define:

- a not null constraint
- and/or additional check constraints, with conditional expressions including one or more columns

As Databricks does not support transactions, nor allows using create or replace table with a column schema, the table is first created without a schema, and alter statements are then executed to add the different constraints.

This means that:

- The names and order of columns are checked, but not their types
- If the constraints and/or constraint_check fails, the table with the failing data will still exist in the Warehouse

See the Databricks documentation for more details about the support of constraints on Databricks.
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key # not enforced -- will warn & include
          - type: check # supported -- added via an alter statement (see DDL below)
            expression: "id > 0"
        tests:
          - unique # primary_key constraint is not enforced
      - name: customer_name
        data_type: text
      - name: first_transaction_date
        data_type: date
Expected DDL to enforce constraints:
create or replace table schema_name.my_model
using delta
as
select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date

Followed by the statements:

alter table schema_name.my_model change column id set not null;
alter table schema_name.my_model add constraint 472394792387497234 check (id > 0);
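To confirm which constraints were applied, one option (a sketch; Delta stores check constraints as table properties) is to inspect the table's metadata:

describe table extended schema_name.my_model;
-- check constraints appear under Table Properties, e.g. delta.constraints.<name>
show tblproperties schema_name.my_model;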