Data Map for S3
The Data Map is an inventory of your data locations. It lets you
assign a short name, called a LABEL, to each data location (such as a
table, collection, or S3 bucket) that you want to protect. When you
write a policy rule, you use LABELs rather than specific table and
column names to specify which data the rule protects.
Each LABEL maps to one or more specific locations (for example, a
specific column in a specific database). Because a single LABEL can
refer to many locations in many repositories, the Data Map lets you
write a policy that treats your data consistently, even when that data
is spread across many data repositories.
Structure
The Data Map follows this structure:
    { LABEL }:
      attributes: [{ ATTRIBUTE_LOCATION }, ...]
The fields are defined as follows:
- {LABEL} (string): the label given to the data locations listed under it. The value assigned to a label is an object with the following field:
  - attributes ([string]): the specific locations of the data within the repository, following the pattern {BUCKET}.{KEY}, where KEY is optional.
For example, the following Data Map entries are valid for S3:
    EXAMPLE_BUCKET:
      attributes: [my_bucket_name]
    EXAMPLE_KEY:
      attributes: [my_bucket_name.key]
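To make the {BUCKET}.{KEY} pattern concrete, here is a minimal Python sketch that splits an attribute into its bucket and optional key. The helper name parse_attribute and the S3Location type are hypothetical and are not part of Cyral. The sketch assumes that neither the bucket name nor the key needs quoting.

```python
from typing import NamedTuple, Optional


class S3Location(NamedTuple):
    bucket: str
    key: Optional[str]  # None means the attribute covers the whole bucket


def parse_attribute(attribute: str) -> S3Location:
    """Split a '{BUCKET}.{KEY}' attribute at the first dot (hypothetical helper)."""
    bucket, _, key = attribute.partition(".")
    # An empty key means the attribute names only a bucket.
    return S3Location(bucket=bucket, key=key or None)


print(parse_attribute("my_bucket_name"))      # bucket only, key is None
print(parse_attribute("my_bucket_name.key"))  # bucket plus key
```

Splitting at the first dot only works while names contain no dots themselves; quoted names would need a more careful parser.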
Example
    FUNDING_BUCKET_ALL:
      attributes: [finance-funding]
    FUNDING_2022_EVENTS:
      attributes: [finance-funding.2022.event]
In the above example, for an S3 repository that you've tracked in Cyral:
- The label FUNDING_BUCKET_ALL can be used to write policies that govern access to an entire S3 bucket, meaning it covers all keys (files and folders) inside the designated bucket.
- The label FUNDING_2022_EVENTS can be used to write policies that govern access to a specific S3 key, which could be a single file or a folder. In this example, 2022.event designates a specific folder in the finance-funding bucket.
Policy examples
The following policy examples use the two labels we established above.
Case 1: No file access
The user should not be able to read any file from the finance-funding bucket:
    data:
      - FUNDING_BUCKET_ALL
    rules:
      - identities:
          users:
            - frank.hardy@hhiu.us
By adding FUNDING_BUCKET_ALL to the top-level data field, we instruct
the sidecars that this label is associated with sensitive data that
needs to be governed by this policy. Since the rules block contains no
rule declaring access permissions for this label, user Frank has no
access.
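The reasoning above can be modeled in a few lines of Python. This is only an illustrative sketch, not Cyral's actual evaluation logic. The function name can_read is hypothetical, and the policy is assumed to be loaded into a plain dict that mirrors the YAML.

```python
from typing import Any, Dict

# Case 1 policy, written as a Python dict that mirrors the YAML form.
POLICY: Dict[str, Any] = {
    "data": ["FUNDING_BUCKET_ALL"],
    "rules": [
        {"identities": {"users": ["frank.hardy@hhiu.us"]}},
    ],
}


def can_read(policy: Dict[str, Any], user: str, label: str) -> bool:
    """Hypothetical check: does some reads rule grant `user` access to `label`?"""
    if label not in policy["data"]:
        # Labels outside the policy's data list are out of scope for this sketch.
        raise ValueError(f"{label} is not governed by this policy")
    for rule in policy.get("rules", []):
        users = rule.get("identities", {}).get("users", [])
        if user not in users:
            continue
        for read in rule.get("reads", []):
            if label in read.get("data", []):
                return True
    return False  # governed label with no matching reads rule: no access


print(can_read(POLICY, "frank.hardy@hhiu.us", "FUNDING_BUCKET_ALL"))  # False
```

The check is deny-by-default: a label listed under data is readable only when a reads rule for the requesting user names it explicitly.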
Case 2: The right to read files only
The user should be able to read any file from the
finance-funding.2022.event folder, but should not be able to list
other folders or read files from any other folder inside that bucket.
    data:
      - FUNDING_BUCKET_ALL
      - FUNDING_2022_EVENTS
    rules:
      - identities:
          users:
            - frank.hardy@hhiu.us
        reads:
          - data:
              - FUNDING_2022_EVENTS
            rows: any
            severity: low
By adding FUNDING_BUCKET_ALL to the top-level data field, we instruct
the sidecars that this label is associated with sensitive data that
needs to be governed by this policy. Since the rules block contains no
rule providing access permissions for this label, user Frank has no
access to the bucket as a whole.
By adding FUNDING_2022_EVENTS to the top-level data field, we instruct
the sidecars that this label is associated with sensitive data that
needs to be governed by this policy. This label also appears in the
rules.reads.data entry, meaning that read access is governed by that
specific rule.
Within this policy, we have two labels covering the same data:
- FUNDING_2022_EVENTS: covers only the folder finance-funding.2022.event.
- FUNDING_BUCKET_ALL: covers all folders in this bucket, including finance-funding.2022.event.
When Cyral encounters a case like this, the most specific label is used to evaluate policies.
In this example, this means that even though FUNDING_BUCKET_ALL would
prohibit Frank from reading data from finance-funding.2022.event, the
more specific label, FUNDING_2022_EVENTS, overrides the broader label
and allows the read to proceed.
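One way to picture "most specific label wins" is as a longest-prefix match over dot-delimited segments. The Python sketch below illustrates that idea only; it is not Cyral's implementation. The function name most_specific_label is hypothetical, and the sketch assumes no name contains a quoted dot.

```python
from typing import Dict, List, Optional

# The Data Map from the example, as a plain dict of label -> attributes.
DATA_MAP: Dict[str, List[str]] = {
    "FUNDING_BUCKET_ALL": ["finance-funding"],
    "FUNDING_2022_EVENTS": ["finance-funding.2022.event"],
}


def most_specific_label(location: str, data_map: Dict[str, List[str]]) -> Optional[str]:
    """Return the label whose attribute is the longest segment-wise prefix of `location`."""
    target = location.split(".")
    best_label: Optional[str] = None
    best_len = 0
    for label, attributes in data_map.items():
        for attribute in attributes:
            prefix = attribute.split(".")
            # Compare whole segments so that "2022" never matches "20221".
            if target[: len(prefix)] == prefix and len(prefix) > best_len:
                best_label, best_len = label, len(prefix)
    return best_label


print(most_specific_label("finance-funding.2022.event.report", DATA_MAP))  # FUNDING_2022_EVENTS
print(most_specific_label("finance-funding.2021", DATA_MAP))               # FUNDING_BUCKET_ALL
```

Under this model, a read inside the 2022.event folder resolves to FUNDING_2022_EVENTS, while anything else in the bucket falls back to FUNDING_BUCKET_ALL.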
Based on the policy above, Frank's attempt to list the contents of
the bucket will fail because the policy does not contain a reads rule
for the FUNDING_BUCKET_ALL label. At the command line, Frank would see this:
    aws s3 ls s3://finance-funding
    Using S3 proxy: http://edge-sidecar-a01.example.cyral.com:453
    An error occurred (Forbidden) when calling the ListObjectsV2 operation: Request blocked as user [frank.hardy@hhiu.us] does not have permission to access the required resource
On the other hand, Frank can successfully download a file from the
finance-funding.2022.event folder because the policy contains a reads
rule for the FUNDING_2022_EVENTS label. Here's what Frank will see:
    aws s3 cp s3://finance-funding/2022/event/output.txt /tmp
    Using S3 proxy: http://edge-sidecar-a01.example.cyral.com:453
    download: s3://finance-funding/2022/event/output.txt to ../../../../tmp/output.txt
S3 object key names containing dots
The handling of dots described in this section is a characteristic of the Data Map itself, not of S3, so it also applies to any other data repository.
When an S3 object key name contains dots
Let's use the file downloaded in the previous use case as an example.
This file resides in the S3 bucket finance-funding under the following
S3 object key name:
    finance-funding/2022/event/output.txt
When adding this location to the Cyral Data Map, the administrator
needs to convert it to the format used by Cyral, which consists of
converting the delimiters from slashes (/) to dots (.).
When naively doing this conversion, we might end up with the following attribute entry:
    SAMPLE_LABEL:
      attributes:
        - finance-funding.2022.event.output.txt
The above entry will be wrongly interpreted by the sidecar, because the dot in output.txt is read as another delimiter. To avoid this, names containing dots must be wrapped in double quotes. The correct way to write the above object key name in the Cyral Data Map is:
    SAMPLE_LABEL:
      attributes:
        - finance-funding.2022.event."output.txt"
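The conversion can be scripted so that dotted names are never missed. The following Python sketch is a hypothetical helper, not a Cyral tool. It takes a bucket name and a slash-delimited object key, and wraps every segment that contains a dot in double quotes before joining the segments with dots.

```python
def quote_segment(segment: str) -> str:
    """Wrap a path segment in double quotes when it contains a dot."""
    return f'"{segment}"' if "." in segment else segment


def to_attribute(bucket: str, key: str = "") -> str:
    """Build a Data Map attribute from a bucket and a slash-delimited key (hypothetical helper)."""
    # Drop empty parts so leading or trailing slashes in the key are harmless.
    segments = [bucket] + [part for part in key.split("/") if part]
    return ".".join(quote_segment(segment) for segment in segments)


print(to_attribute("finance-funding", "2022/event/output.txt"))
# finance-funding.2022.event."output.txt"
```

The result is the attribute string only. Any additional quoting required by YAML still has to be applied when the string is written into the Data Map.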
When an S3 bucket name contains dots
If your S3 bucket name contains dots (.), you must:
- first, wrap the bucket name in double quotes
- then, wrap the entire bucket and object name string in single quotes.
For a sample S3 bucket called financefunding.euro, this would look like:
    SAMPLE_LABEL:
      attributes:
        - '"financefunding.euro"'
For a sample object key 2022.event inside the bucket financefunding.euro, this would look like:
    SAMPLE_LABEL:
      attributes:
        - '"financefunding.euro".2022.event'
Note: Why is this needed? Wrapping bucket names in both double quotes
and single quotes overcomes a YAML limitation. In the Cyral management
console UI, Data Maps are managed through YAML files, and this
introduces complications when strings start with quotation marks.
By double-wrapping the bucket name, we preserve the double quotes
around the bucket name, even when it's used with a dot-delimited S3
object key name like '"financefunding.euro".2022.event'.
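As a rough illustration of the double-wrapping rule, the Python sketch below renders an attribute as a YAML list item and adds the outer single quotes only when the attribute starts with a double quote. The helper name yaml_list_item is hypothetical. The sketch covers only this one quoting case and is not a general YAML serializer.

```python
def yaml_list_item(attribute: str) -> str:
    """Render a Data Map attribute as a YAML list item (hypothetical helper)."""
    if attribute.startswith('"'):
        # Inside a YAML single-quoted scalar, a literal single quote is written twice.
        escaped = attribute.replace("'", "''")
        return f"- '{escaped}'"
    return f"- {attribute}"


print(yaml_list_item('"financefunding.euro".2022.event'))
# - '"financefunding.euro".2022.event'
print(yaml_list_item('finance-funding.2022.event."output.txt"'))
# - finance-funding.2022.event."output.txt"
```

The second call shows that a double quote in the middle of the string needs no outer single quotes; only a leading double quote triggers the extra wrapping.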
Next steps
- To learn more about protecting S3, see Track an S3 storage location
- To learn about policies in Cyral, see Policy framework