Data Map
The Data Map is an inventory of your data locations that allows you to establish a short name, called a data label, for each data location (like a table, collection, or S3 bucket) that you want to protect. Optionally, you can use tags to group or categorize your data labels.
When you write a policy rule, you use data labels and tags, rather than specific table and column names, to specify which data the rule protects.
Because a single data label or tag can refer to many locations in many repositories (databases, S3 storage locations, and so on), the Data Map gives you the ability to write a policy that treats your data consistently, even when that data is spread across many data repositories.
Once you've created a data label or tag in your Data Map, you can use it in the governedData section of a global policy.
Structure
The Data Map follows this structure:
{ LABEL }:
  attributes: [{ ATTRIBUTE_LOCATION }, ...]
The fields are defined as follows:
{LABEL}: A data label given to the data specified in the corresponding list. The label name is case sensitive; when you write your policy, take care to write it exactly as it is written here.
attributes: A list of data locations in this repository that will be included in this data label. Specify each attribute using the format {SCHEMA}.{TABLE}.{FIELD}, where:
- SCHEMA is the name of the schema, database, or Dremio space, if any;
- TABLE is the name of the database table, MongoDB collection, or AWS bucket;
- FIELD is the database column name or AWS object key. Omit the AWS object key name if you want to protect the whole bucket.
note
The SCHEMA, TABLE, and FIELD names describing your data location are case insensitive. For example, if you include a table name, orders, in your label, and you have tables called orders and Orders, then both will be covered by the label.
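The difference between case-sensitive label names and case-insensitive location names can be illustrated with a minimal Python sketch. It assumes the Data Map has been loaded into a plain dictionary that mirrors the YAML structure; the function name labels_for and the sample labels are hypothetical, and the comparison is only an approximation of the matching described above, not Cyral's implementation.

```python
def labels_for(data_map, location):
    """Return the labels whose attributes match `location`, ignoring case.

    Location names (schema, table, field) are compared case-insensitively;
    label names are returned exactly as written, since they are case sensitive.
    """
    wanted = location.casefold()
    return sorted(
        label
        for label, entry in data_map.items()
        if any(attr.casefold() == wanted for attr in entry.get("attributes", []))
    )


# Hypothetical Data Map with two distinct labels that differ only by case.
data_map = {
    "ORDER_TOTAL": {"attributes": ["sales.orders.total"]},
    "order_total": {"attributes": ["archive.orders.total"]},
}

print(labels_for(data_map, "Sales.Orders.Total"))  # ['ORDER_TOTAL']
```

Here ORDER_TOTAL and order_total remain two separate labels, while sales.orders.total and Sales.Orders.Total are treated as the same location.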
info
Optionally, you can use tags to group your data labels.
tip
Dremio users: When you refer to data in a Dremio repository, include the complete location, with each nested Dremio space separated by a . (dot). For example, an attribute my_attr contained by table my_tbl within space inner_space within space outer_space would be referenced as outer_space.inner_space.my_tbl.my_attr.
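As a small illustration of the Dremio tip above, the following Python sketch joins nested spaces, a table, and an attribute into one dotted location. The helper name dremio_attribute is hypothetical and exists only for this example.

```python
def dremio_attribute(spaces, table, field):
    """Build a dotted location from nested Dremio spaces (outermost first)."""
    return ".".join([*spaces, table, field])


print(dremio_attribute(["outer_space", "inner_space"], "my_tbl", "my_attr"))
# outer_space.inner_space.my_tbl.my_attr
```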
Data Map example for a database
In the example below, we assign data labels to data in two databases, claims and loans. The data label CCN is assigned to the attribute bank_card in the table customers in the finance schema of the claims database, as well as to the attribute credit_card_number in the table borrowers in the applications schema of the loans database. The data labels EMAIL and SSN are assigned to the email and social security number data, respectively, in each database, following the same pattern.
In summary, the data label assignments will be:
Database | Schema | Table | Column | Data label applied |
---|---|---|---|---|
claims | finance | customers | bank_card | CCN |
claims | finance | customers | email | EMAIL |
claims | finance | customers | ssn | SSN |
loans | applications | borrowers | credit_card_number | CCN |
loans | applications | borrowers | email | EMAIL |
loans | applications | borrowers | social_security_number | SSN |
Example Data Map for the claims database:
CCN:
  attributes:
    - finance.customers.bank_card
EMAIL:
  attributes:
    - finance.customers.email
SSN:
  attributes:
    - finance.customers.ssn
Example Data Map for the loans database:
CCN:
  attributes:
    - applications.borrowers.credit_card_number
EMAIL:
  attributes:
    - applications.borrowers.email
SSN:
  attributes:
    - applications.borrowers.social_security_number
The example global policy shows how these data labels are used in a policy. It sets access rules for each of these data labels (CCN, EMAIL, and SSN), and it applies to all repositories included in the Data Map.
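To make concrete how one label spans several repositories, here is a minimal Python sketch that collects every location a label refers to across the two example Data Maps. The dictionaries mirror the YAML above; the function name locations_for and the maps_by_repo structure are assumptions made for illustration and are not part of any Cyral API.

```python
def locations_for(label, maps_by_repo):
    """List (repository, attribute) pairs covered by `label` across all repos."""
    return [
        (repo, attribute)
        for repo, data_map in maps_by_repo.items()
        for attribute in data_map.get(label, {}).get("attributes", [])
    ]


# The two example Data Maps, keyed by repository (database) name.
maps_by_repo = {
    "claims": {
        "CCN": {"attributes": ["finance.customers.bank_card"]},
        "EMAIL": {"attributes": ["finance.customers.email"]},
        "SSN": {"attributes": ["finance.customers.ssn"]},
    },
    "loans": {
        "CCN": {"attributes": ["applications.borrowers.credit_card_number"]},
        "EMAIL": {"attributes": ["applications.borrowers.email"]},
        "SSN": {"attributes": ["applications.borrowers.social_security_number"]},
    },
}

for repo, attribute in locations_for("CCN", maps_by_repo):
    print(repo, attribute)
# claims finance.customers.bank_card
# loans applications.borrowers.credit_card_number
```

A policy rule written against CCN therefore covers both columns, even though they have different names and live in different databases.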
Data Map example for S3
Prerequisite: Before you set up your Data Map entries for S3, make sure you have tracked your S3 locations in Cyral.
Example Data Map for S3:
FUNDING_BUCKET_ALL:
  attributes:
    - finance-funding
FUNDING_2023_EVENTS:
  attributes:
    - finance-funding.2023.event
In the above example, for an S3 repository that you've tracked in Cyral:
- The data label FUNDING_BUCKET_ALL can be used to write policies that govern access to an entire S3 bucket, meaning it will cover all keys (files and folders) inside the designated bucket. In this example, that's the finance-funding bucket.
- The data label FUNDING_2023_EVENTS can be used to write policies that govern access to a specific S3 key, which could be a single file or a folder. In this example, 2023.event designates a specific folder in the finance-funding bucket.
Handling S3 bucket names and object key names that contain special characters
If your S3 bucket names or object key names contain forward slashes (/) or dots (.), follow these rules when you define labels:
- If your S3 object key uses forward slashes (/) to represent a folder structure, convert each slash to a dot (.).
- If any S3 bucket name or object key name itself contains a dot (.) character, you must surround that bucket name or object key name in double quotes.
caution
If you manage your Data Map in the Data Labels ➡️ View as YAML window of the Cyral control plane UI, then, for any S3 resource label declaration that contains double quotes, you must surround the entire resource string in single quotes.
Below, we show some examples.
S3 folders with forward-slash characters in their names
If you wish to protect the contents of the S3 folder 2022/participants in the financefunding bucket, the S3 resource label declaration in your Data Map would look like this:
SAMPLE_LABEL_1:
  attributes:
    - financefunding.2022.participants
To protect just the file disclosures.pdf, which resides in the same S3 bucket and folder, the label declaration would be:
SAMPLE_LABEL_2:
  attributes:
    - 'financefunding.2022.participants."disclosures.pdf"'
S3 locations with dots in their names
To protect an S3 bucket called financefunding.euro (in this case, the dot is part of the bucket name), the label declaration would be:
SAMPLE_LABEL_3:
  attributes:
    - '"financefunding.euro"'
To protect a sample object key 2023.event (as in the previous example, the dot is part of the object key name) inside the bucket financefunding.euro, the label declaration would be:
SAMPLE_LABEL_4:
  attributes:
    - '"financefunding.euro"."2023.event"'
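The quoting rules and the four sample labels above can be summarized in a short Python sketch. It assumes you start from a bucket name and an optional object key that uses forward slashes as folder separators; the function name s3_attribute and its for_ui_yaml parameter are hypothetical helpers for this illustration, not part of Cyral's tooling, and names that themselves contain double quotes are not handled.

```python
def s3_attribute(bucket, key=None, for_ui_yaml=False):
    """Build an S3 attribute string for a Data Map entry.

    - Each "/" in the object key becomes a "." separator.
    - Any bucket name or key segment that itself contains a "." is wrapped
      in double quotes.
    - With for_ui_yaml=True, a result containing double quotes is wrapped
      in single quotes, as required in the View as YAML window.
    """
    def quote_if_dotted(name):
        return f'"{name}"' if "." in name else name

    parts = [quote_if_dotted(bucket)]
    if key:
        # Skip empty segments so leading or trailing slashes are ignored.
        parts += [quote_if_dotted(seg) for seg in key.split("/") if seg]

    attribute = ".".join(parts)
    if for_ui_yaml and '"' in attribute:
        attribute = f"'{attribute}'"
    return attribute


print(s3_attribute("financefunding", "2022/participants"))
# financefunding.2022.participants
print(s3_attribute("financefunding", "2022/participants/disclosures.pdf", for_ui_yaml=True))
# 'financefunding.2022.participants."disclosures.pdf"'
print(s3_attribute("financefunding.euro", for_ui_yaml=True))
# '"financefunding.euro"'
print(s3_attribute("financefunding.euro", "2023.event", for_ui_yaml=True))
# '"financefunding.euro"."2023.event"'
```

The four calls reproduce the attribute strings of SAMPLE_LABEL_1 through SAMPLE_LABEL_4.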
Tags for data labels
You can use tags to group or categorize your data labels. Tags are optional. Once you've established a tag, you can use the tag in your policy in a way that's analogous to labels within the governedData section of a global policy.
When you write a policy rule for a tag, that rule applies to all data labels that are associated with the tag.
In the example Data Map, the tag PII applies to the labels CCN, EMAIL, and SSN. If you write a policy rule for the tag PII, then that rule will apply to all three labels. In addition, the example below also applies the tag PCI to the CCN label.
Example Data Map with tags:
CCN:
  tags:
    - PCI
    - PII
  attributes:
    - applications.borrowers.credit_card_number
EMAIL:
  tags:
    - PII
  attributes:
    - applications.borrowers.email
SSN:
  tags:
    - PII
  attributes:
    - applications.borrowers.social_security_number
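To show how a rule written for a tag fans out to labels, the following Python sketch resolves a tag to the labels that carry it. The dictionary mirrors the tagged Data Map above; the function name labels_with_tag is a hypothetical helper, and the lookup only approximates how a policy engine might expand a tag.

```python
def labels_with_tag(data_map, tag):
    """Return the labels associated with `tag`, in Data Map order."""
    return [label for label, entry in data_map.items() if tag in entry.get("tags", [])]


# The example Data Map with tags, as a dictionary.
data_map = {
    "CCN": {
        "tags": ["PCI", "PII"],
        "attributes": ["applications.borrowers.credit_card_number"],
    },
    "EMAIL": {
        "tags": ["PII"],
        "attributes": ["applications.borrowers.email"],
    },
    "SSN": {
        "tags": ["PII"],
        "attributes": ["applications.borrowers.social_security_number"],
    },
}

print(labels_with_tag(data_map, "PII"))  # ['CCN', 'EMAIL', 'SSN']
print(labels_with_tag(data_map, "PCI"))  # ['CCN']
```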