Version: v4.18

Data Map

The Data Map is an inventory of your data locations that allows you to establish a short name, called a data label, for each data location (like a table, collection, or S3 bucket) that you want to protect. Optionally, you can use tags to group or categorize your data labels.

When you write a policy rule, you use data labels and tags, rather than specific table and column names, to specify which data the rule protects.

Because a single data label or tag can refer to many locations in many repositories (databases, S3 storage locations, and so on), the Data Map gives you the ability to write a policy that treats your data consistently, even when that data is spread across many data repositories.

Once you've created a data label or tag in your Data Map, you can use it in the governedData section of a global policy.

Structure

The Data Map follows this structure:

{LABEL}:
  attributes: [{ATTRIBUTE_LOCATION}, ...]

The fields are defined as follows:

  • {LABEL}: A data label given to the data specified in the corresponding list. The label name is case sensitive; when you write your policy, take care to write it exactly as it is written here.
  • attributes: A list of data locations in this repository that are included in this data label. Specify each attribute in the format {SCHEMA}.{TABLE}.{FIELD}, where:
    • SCHEMA is the name of the schema, database, or Dremio space, if any;
    • TABLE is the name of the database table, MongoDB collection, or AWS bucket;
    • FIELD is the database column name or AWS object key. Omit the AWS object key name if you want to protect the whole bucket.
      note

      The SCHEMA, TABLE, and FIELD names describing your data location are case insensitive. For example, if you include a table name, orders, in your label, and you have tables called orders and Orders, then both will be covered by the label.

tip

Dremio users: When you refer to data in a Dremio repository, please include the complete location, with each nested Dremio space separated by a . (dot). For example, an attribute my_attr contained by table my_tbl within space inner_space within space outer_space would be referenced as outer_space.inner_space.my_tbl.my_attr.
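To make the case rules concrete, here is a minimal Python sketch of how a data location might be compared against an attribute entry. It is an illustration only, not Cyral's implementation: the function name covers and the sample names sales and total are hypothetical, and quoted S3 names are not handled.

```python
def covers(attribute: str, location: str) -> bool:
    """Return True if a Data Map attribute entry names the given location.

    Assumption: SCHEMA, TABLE, and FIELD names are compared without regard
    to case, as the note above describes.
    """
    return attribute.casefold() == location.casefold()


# One entry covers both a table called "orders" and one called "Orders".
entry = "sales.orders.total"  # hypothetical schema and column names
print(covers(entry, "sales.orders.total"))    # True
print(covers(entry, "sales.Orders.total"))    # True
print(covers(entry, "sales.invoices.total"))  # False
```

Data labels behave differently: a label name such as CCN must be written with exactly the same case everywhere it is used.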

Data Map example for a database

In the example below, we assign data labels to data in two databases, claims and loans. The data label CCN is assigned to the bank_card column of the customers table in the finance schema of the claims database, and to the credit_card_number column of the borrowers table in the applications schema of the loans database. The data labels EMAIL and SSN are assigned to the email and social security number columns in each database, following the same pattern.

In summary, the data label assignments will be:

Database | Schema       | Table     | Column                 | Data label applied
claims   | finance      | customers | bank_card              | CCN
claims   | finance      | customers | email                  | EMAIL
claims   | finance      | customers | ssn                    | SSN
loans    | applications | borrowers | credit_card_number     | CCN
loans    | applications | borrowers | email                  | EMAIL
loans    | applications | borrowers | social_security_number | SSN

Example Data Map for the claims database:

CCN:
  attributes:
    - finance.customers.bank_card
EMAIL:
  attributes:
    - finance.customers.email
SSN:
  attributes:
    - finance.customers.ssn

Example Data Map for the loans database:

CCN:
  attributes:
    - applications.borrowers.credit_card_number
EMAIL:
  attributes:
    - applications.borrowers.email
SSN:
  attributes:
    - applications.borrowers.social_security_number

The example global policy shows how these data labels are used in a policy: it sets access rules for each of the labels (CCN, EMAIL, and SSN). The policy applies to all repositories included in the Data Map.
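As an illustration of how one label can span repositories, the following Python sketch holds the two example Data Maps as plain dictionaries and collects every location that carries a given label. The dictionary layout and the function name locations_for_label are assumptions made for this sketch; they are not a Cyral API.

```python
# The two example Data Maps, keyed by repository name (layout is an assumption).
DATA_MAPS = {
    "claims": {
        "CCN": {"attributes": ["finance.customers.bank_card"]},
        "EMAIL": {"attributes": ["finance.customers.email"]},
        "SSN": {"attributes": ["finance.customers.ssn"]},
    },
    "loans": {
        "CCN": {"attributes": ["applications.borrowers.credit_card_number"]},
        "EMAIL": {"attributes": ["applications.borrowers.email"]},
        "SSN": {"attributes": ["applications.borrowers.social_security_number"]},
    },
}


def locations_for_label(data_maps, label):
    """List (repository, attribute) pairs that carry the given label.

    Label names are matched exactly, because data labels are case sensitive.
    """
    found = []
    for repo, data_map in data_maps.items():
        for attribute in data_map.get(label, {}).get("attributes", []):
            found.append((repo, attribute))
    return found


for repo, attribute in locations_for_label(DATA_MAPS, "CCN"):
    print(repo, attribute)
# claims finance.customers.bank_card
# loans applications.borrowers.credit_card_number
```

A rule written once for CCN therefore reaches both columns, even though they have different names and live in different databases.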

Data Map example for S3

Prerequisite: Before you set up your Data Map entries for S3, make sure you have tracked your S3 locations in Cyral.

Example Data Map for S3:

FUNDING_BUCKET_ALL:
  attributes:
    - finance-funding

FUNDING_2023_EVENTS:
  attributes:
    - finance-funding.2023.event

In the above example, for an S3 repository that you've tracked in Cyral:

  • The data label FUNDING_BUCKET_ALL can be used to write policies that govern access to an entire S3 bucket, meaning it will cover all keys (files and folders) inside the designated bucket. In this example, that's the finance-funding bucket.
  • The data label FUNDING_2023_EVENTS can be used to write policies that govern access to a specific S3 key, which could be a single file or a folder. In this example, 2023.event designates a specific folder, 2023/event, in the finance-funding bucket, because an unquoted dot stands for a forward slash in the object key.

Handling S3 bucket names and object key names that contain special characters

If your S3 buckets or object key names contain forward slashes (/) or dots (.), follow these rules when you define labels:

  1. If your S3 object key uses forward slashes (/) to represent a folder structure, convert each slash to a dot (.).

  2. If any S3 bucket name or object key name itself contains a dot (.) character, you must surround that bucket name or object key name in double quotes.

    caution

    If you manage your Data Map in the Data Labels ➡️ View as YAML window of the Cyral control plane UI, then, for any S3 resource label declaration that contains double quotes, you must surround the entire resource string in single quotes.

Below, we show some examples.

S3 folders with forward-slash characters in their names

If you wish to protect the contents of the S3 folder 2022/participants in the financefunding bucket, the S3 resource label declaration in your Data Map would look like this:

SAMPLE_LABEL_1:
  attributes:
    - financefunding.2022.participants

To protect just the file, disclosures.pdf, which resides in the same S3 bucket and folder, the label declaration would be:

SAMPLE_LABEL_2:
  attributes:
    - 'financefunding.2022.participants."disclosures.pdf"'

S3 locations with dots in their names

To protect an S3 bucket called financefunding.euro (in this case, the dot is part of the bucket name), the label declaration would be:

SAMPLE_LABEL_3:
  attributes:
    - '"financefunding.euro"'

To protect a sample object key 2023.event (similar to the previous example, the dot is part of the object key name) inside the bucket financefunding.euro, the label declaration would be:

SAMPLE_LABEL_4:
  attributes:
    - '"financefunding.euro"."2023.event"'
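The slash and dot rules in this section can be sketched as a small Python helper that builds the attribute string from a bucket name and an optional object key. This is a minimal sketch under the rules stated above, not a Cyral utility: the function names s3_attribute and as_yaml_item are hypothetical, and skipping empty key segments is an assumption.

```python
from typing import Optional


def _quote_if_dotted(name: str) -> str:
    """Wrap a bucket name or key segment in double quotes if it contains a dot."""
    return f'"{name}"' if "." in name else name


def s3_attribute(bucket: str, key: Optional[str] = None) -> str:
    """Build a Data Map attribute string for an S3 bucket or object key.

    Rule 1: each forward slash in the key becomes a dot.
    Rule 2: any name that itself contains a dot is double-quoted.
    """
    parts = [_quote_if_dotted(bucket)]
    if key:
        # Assumption: empty segments from leading or trailing slashes are skipped.
        parts += [_quote_if_dotted(seg) for seg in key.split("/") if seg]
    return ".".join(parts)


def as_yaml_item(attribute: str) -> str:
    """Format a list item, single-quoting the string if it contains double quotes."""
    return f"- '{attribute}'" if '"' in attribute else f"- {attribute}"


print(as_yaml_item(s3_attribute("financefunding", "2022/participants")))
# - financefunding.2022.participants
print(as_yaml_item(s3_attribute("financefunding", "2022/participants/disclosures.pdf")))
# - 'financefunding.2022.participants."disclosures.pdf"'
print(as_yaml_item(s3_attribute("financefunding.euro", "2023.event")))
# - '"financefunding.euro"."2023.event"'
```

The printed lines match the attribute entries of SAMPLE_LABEL_1, SAMPLE_LABEL_2, and SAMPLE_LABEL_4 above.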

Tags for data labels

You can use tags to group or categorize your data labels. Tags are optional. Once you've established a tag, you can use it in the governedData section of a global policy, in the same way you use a data label.

When you write a policy rule for a tag, that rule applies to all data labels that are associated with the tag.

In the example Data Map below, the tag PII applies to the labels CCN, EMAIL, and SSN. If you write a policy rule for the tag PII, that rule applies to all three labels. The example also applies the tag PCI to the CCN label.

Example Data Map with tags:

CCN:
  tags:
    - PCI
    - PII
  attributes:
    - applications.borrowers.credit_card_number
EMAIL:
  tags:
    - PII
  attributes:
    - applications.borrowers.email
SSN:
  tags:
    - PII
  attributes:
    - applications.borrowers.social_security_number
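To show how a rule written for a tag fans out to labels, here is a minimal Python sketch that represents the tagged Data Map above as a dictionary and lists the labels associated with a tag. The dictionary layout and the function name labels_for_tag are assumptions for illustration, as is the exact (case-sensitive) tag comparison.

```python
# The tagged example Data Map as a plain dictionary (layout is an assumption).
DATA_MAP = {
    "CCN": {
        "tags": ["PCI", "PII"],
        "attributes": ["applications.borrowers.credit_card_number"],
    },
    "EMAIL": {
        "tags": ["PII"],
        "attributes": ["applications.borrowers.email"],
    },
    "SSN": {
        "tags": ["PII"],
        "attributes": ["applications.borrowers.social_security_number"],
    },
}


def labels_for_tag(data_map, tag):
    """Return the data labels associated with a tag, in Data Map order."""
    return [
        label
        for label, entry in data_map.items()
        if tag in entry.get("tags", [])  # labels without tags are skipped
    ]


print(labels_for_tag(DATA_MAP, "PII"))  # ['CCN', 'EMAIL', 'SSN']
print(labels_for_tag(DATA_MAP, "PCI"))  # ['CCN']
```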