Basic walk-through
blackline helps you manage personally identifiable information (PII) in your data stores. It gives your organization confidence that PII is stored only as long as it should be and is de-identified correctly.
To begin using blackline, you will need:
- A data store with sample data. If you are running Docker, you can use docker-compose to set up a local database with sample data.
- Basic understanding of command line tooling.
- Basic understanding of relational databases.
Installation
Installation is what you would expect from a Python package:
pip install blackline-core
For more details about installing blackline-core
Adapters
blackline integrates with a number of data platforms. Each platform has its own package, blackline-<adapter name>. blackline-core only includes the ability to connect to SQLite.
CLI
The CLI is the main interface to a blackline project.
blackline --help
Usage: blackline [OPTIONS] COMMAND [ARGS]...
Blackline CLI version 0.1.2
Options:
--debug / --no-debug Debug mode
--version Show the version and exit.
--help Show this message and exit.
Commands:
debug Test data store connections.
init Initialize a project.
report Report the defined project.
run Run project.
sample Create a sample project.
Initial project
You can initialize an empty project, with a folder structure and a blackline_project.yml file, using blackline init.
blackline init --help
Usage: blackline init [OPTIONS]
Initialize a project.
Options:
-p, --project-dir PATH Project directory, where blackline_project.yml is
located [default: /home/runner/work/blackline-
core/blackline-core]
-n, --name TEXT Project name [default: blackline]
--default-profile TEXT Default profile to use [default: default]
--catalogue-path PATH Path to the catalogue folder [default: catalogue]
--adapters-path PATH Path to the adapters folder [default: adapters]
--help Show this message and exit.
blackline init
Initialized blackline project at: blackline_sample
tree blackline_sample
blackline_sample
├── adapters
│ └── organization
│ └── system
│ └── resource
│ └── dataset.yaml
├── blackline_project.yml
└── catalogue
└── organization
├── organization.yaml
└── system
├── resource
│ ├── dataset
│ │ └── dataset.yaml
│ └── resource.yaml
└── system.yaml
9 directories, 6 files
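In this layout each data store definition lives under adapters/, and the catalogue mirrors the same organization/system/resource/dataset path. The debug output later in this guide names the dataset organization.system.resource.dataset, and the report lists the store as dataset, which suggests both names are derived from the file path. The following minimal sketch assumes exactly that convention; find_stores is a hypothetical helper, not part of blackline.

```python
from pathlib import Path


def find_stores(project_root: Path, adapters_dir: str = "adapters") -> dict[str, str]:
    """Map dotted store keys to store names for every YAML file under adapters/.

    Hypothetical helper. Assumes the store name is the YAML file's stem and the
    dotted key mirrors the folder path (e.g. organization.system.resource.dataset).
    """
    root = project_root / adapters_dir
    if not root.is_dir():
        return {}
    stores: dict[str, str] = {}
    for path in sorted(root.rglob("*.yaml")):
        # Path relative to adapters/, with the .yaml suffix dropped.
        parts = path.relative_to(root).with_suffix("").parts
        stores[".".join(parts)] = path.stem
    return stores
```

Run against the sample project, find_stores(Path("blackline_sample")) would map organization.system.resource.dataset to dataset under this assumption.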
Sample project
You can create a simple blackline project with fake data in a SQLite database using the blackline sample command.
blackline sample --help
Usage: blackline sample [OPTIONS]
Create a sample project.
Options:
-p, --project-dir PATH Project directory, where blackline_project.yml
is located [default:
/home/runner/work/blackline-core/blackline-
core/blackline_sample]
-n, --name TEXT Project name [default: blackline_sample]
--overwrite / --no-overwrite Overwrite existing project [default: no-
overwrite]
--default-profile TEXT Default profile to use [default: default]
--data-only / --no-data-only Only create a sample sqlite database
[default: no-data-only]
--help Show this message and exit.
blackline sample
Created sample project at: blackline_sample
tree blackline_sample
blackline_sample
├── adapters
│ └── organization
│ └── system
│ └── resource
│ └── dataset.yaml
├── blackline_project.yml
├── blackline_sample.db
└── catalogue
└── organization
├── organization.yaml
└── system
├── resource
│ ├── dataset
│ │ └── dataset.yaml
│ └── resource.yaml
└── system.yaml
9 directories, 7 files
The examples in this getting-started guide are taken from the blackline sample project.
Define Data Store Connections
Data stores are defined as simple YAML files that detail the connection parameters of an adapter. A few simple concepts apply:
- The data stores are a collection of YAML files located in the adapters folder. This location can be changed in the blackline_project.yml file.
- The name of the YAML file is the reference name of the store.
- Each file can include multiple store profiles, which allow you to group the stores de-identified during a run job.
- The data store YAML file is defined as:
# /blackline_sample/adapters/organization/system/resource/dataset.yaml
# See details docs at https://docs.getblackline.com/
profiles:
  default:
    type: sqlite
    config:
      connection:
        database: blackline_sample.db
        uri: true
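The report later in this guide expands this connection block into keyword arguments such as timeout, detect_types, isolation_level and check_same_thread, which are the parameters of Python's sqlite3.connect. A minimal sketch, assuming the connection mapping is passed straight through to sqlite3.connect (an assumption about the SQLite adapter, not documented behavior). The dict stands in for the parsed YAML so only the standard library is needed, and connect_from_profile is a hypothetical name.

```python
import sqlite3
from typing import Any

# The "default" profile from dataset.yaml, written as an already-parsed dict.
PROFILES: dict[str, Any] = {
    "default": {
        "type": "sqlite",
        "config": {
            "connection": {"database": "blackline_sample.db", "uri": True},
        },
    },
}


def connect_from_profile(profiles: dict[str, Any], name: str) -> sqlite3.Connection:
    """Open a SQLite connection from a named profile (hypothetical helper)."""
    profile = profiles[name]
    if profile["type"] != "sqlite":
        raise ValueError(f"unsupported adapter type: {profile['type']!r}")
    # Assumption: every key under config.connection is a sqlite3.connect() argument.
    return sqlite3.connect(**profile["config"]["connection"])
```

In this sketch the relative database path resolves against the current working directory, so connect_from_profile(PROFILES, "default") should be called from inside the project folder; SQLite creates an empty file if none exists there.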
Details related to the individual adapters are found in the Adapters section.
Debug
Once your data stores are configured, you can test the connections with blackline debug.
blackline debug --help
Usage: blackline debug [OPTIONS]
Test data store connections.
Options:
--profile TEXT Data stores profile to use [required]
-p, --project-dir PATH Project directory, where blackline_project.yml is
located [default: /home/runner/work/blackline-
core/blackline-core]
--help Show this message and exit.
blackline debug --profile default
Testing connections for profile: default
dataset: good
Validating dataset definitions for profile: <function option.<locals>.decorator at 0x7fbc39e2fbe0>
Dataset: organization.system.resource.dataset:
Collection: user:
Collection found: True
Field: email:
Field found: True
Invalid field constraint: False
Field: ip:
Field found: True
Invalid field constraint: False
Field: name:
Field found: True
Invalid field constraint: False
Collection: shipment:
Collection found: True
Field: street:
Field found: True
Invalid field constraint: False
Exceptions:
None!
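For each collection and field in the catalogue, the debug output reports whether it was found in the store. A rough SQLite equivalent, assuming collections map to tables and fields to columns, can be built on PRAGMA table_info. check_collection is a hypothetical helper, not blackline's implementation.

```python
import sqlite3


def check_collection(conn: sqlite3.Connection, table: str, fields: list[str]) -> dict[str, object]:
    """Report whether a table and each of its catalogued columns exist."""
    # PRAGMA arguments cannot be bound parameters, so quote the name as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # A missing table yields no rows at all.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")}
    return {
        "collection_found": bool(columns),
        "fields": {field: field in columns for field in fields},
    }
```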
Define Catalogue of PII Data
Place a folder for each data store in the catalogue directory. The name of the folder must be the same as the name of the data store. Add a YAML file that defines, for each table, the columns containing PII, their de-identification method, and their respective retention periods. The period format is described in the pydantic datetime documentation under timedelta.
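The sample catalogue uses two period spellings: ISO 8601 durations such as P365D and P185D, and 280 00. Pydantic v1 parses timedeltas from ISO 8601 durations and from a [-][DD ][HH:MM]SS form, in which 280 00 reads as 280 days and 0 seconds. The sketch below is a simplified, standard-library stand-in covering only those two shapes (no years or months); parse_period is a hypothetical name, and pydantic remains the authority on the accepted formats.

```python
import re
from datetime import timedelta

# ISO 8601 duration limited to days, hours, minutes and seconds.
_ISO = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)
# Simplified "[DD ][HH:][MM:]SS" form, e.g. "280 00" means 280 days, 0 seconds.
_PLAIN = re.compile(
    r"^(?:(?P<days>\d+) )?(?:(?P<hours>\d+):(?=\d+:))?(?:(?P<minutes>\d+):)?(?P<seconds>\d+)$"
)


def parse_period(text: str) -> timedelta:
    """Parse a retention period into a timedelta (simplified sketch)."""
    match = _ISO.match(text) or _PLAIN.match(text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None} if match else {}
    if not parts:
        # Covers non-matching text and empty durations such as "P" or "PT".
        raise ValueError(f"unrecognised period: {text!r}")
    return timedelta(**parts)
```

With this sketch, parse_period("P365D") gives timedelta(days=365) and parse_period("280 00") gives timedelta(days=280).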
The sample project creates the following catalogue:
Sample Resource
# See details docs at https://docs.getblackline.com/
resource:
  - key: resource_demo
    resource_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.contact
          - user.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: identified_data
Sample Dataset
dataset:
  - key: demo_db
    name: Demo Database
    description: Demo database for Blackline
    collections:
      user:
        name: user
        description: User collection
        datetime_field:
          name: created_at
        fields:
          - name: name
            description: Name of user
            deidentifier:
              type: redact
            period: P365D
          - name: email
            deidentifier:
              type: replace
              value: fake@email.com
            period: P365D
          - name: ip
            deidentifier:
              type: mask
              value: '#'
            period: 280 00
      shipment:
        name: shipment
        datetime_field:
          name: order_date
        fields:
          - name: street
            deidentifier:
              type: redact
            period: P185D
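Combined with the start date given to blackline run, each period implies a cutoff. The before-and-after tables at the end of this guide (start date 2023-01-01) are consistent with this rule: a field is de-identified when the row's datetime_field is older than the start date minus the field's period. The sketch below computes those cutoffs for the sample catalogue under that assumption; FIELDS is a hand-copied summary of the YAML above, with 280 00 read as 280 days.

```python
from datetime import datetime, timedelta

# (collection, datetime_field, field, period) copied from the sample catalogue.
FIELDS = [
    ("user", "created_at", "name", timedelta(days=365)),        # P365D
    ("user", "created_at", "email", timedelta(days=365)),       # P365D
    ("user", "created_at", "ip", timedelta(days=280)),          # 280 00
    ("shipment", "order_date", "street", timedelta(days=185)),  # P185D
]


def cutoffs(start: datetime) -> dict[str, datetime]:
    """Rows whose datetime_field is older than the cutoff are due for de-identification."""
    return {f"{coll}.{field}": start - period for coll, _dt_field, field, period in FIELDS}


for key, cutoff in cutoffs(datetime(2023, 1, 1)).items():
    print(f"{key}: de-identify rows before {cutoff:%Y-%m-%d}")
```

This prints 2022-01-01 for user.name and user.email, 2022-03-27 for user.ip, and 2022-06-30 for shipment.street, which is why user rows 00 and 02 are masked while row 01 (created 2022-06-01) is not.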
Report
Create a report of the data stores and catalogue values using blackline report.
blackline report --help
Usage: blackline report [OPTIONS]
Report the defined project.
Options:
-p, --project-dir PATH Project directory, where blackline_project.yml is
located [default: /home/runner/work/blackline-
core/blackline-core]
--help Show this message and exit.
blackline report
================================================================================
Project Settings:
Project name: blackline_sample
Project Root: blackline_sample
Adapters path: blackline_sample/adapters
Catalogue path: blackline_sample/catalogue
Default profile: default
Data Stores:
Data Store: dataset
Profiles:
default
Type: sqlite
Adapter: <blackline.adapters.sqlite.sqlite.SQLiteAdapter object at 0x7fc23adfeda0>
Config:
Connection:
database: blackline_sample.db
timeout: 5.0
detect_types: 0
isolation_level: DEFERRED
check_same_thread: True
factory: <class 'sqlite3.Connection'>
cached_statements: 100
uri: True
Catalogue:
...
De-identify
With the data stores and catalogue defined, you can run blackline to de-identify your data.
Starting with the sample data created by blackline sample, there are two tables in the SQLite database blackline_sample.db.
Existing Data
user table:
+----+------+-----------------+-------------+----------+---------------------+
| id | name | email           | ip          | verified | created_at          |
+----+------+-----------------+-------------+----------+---------------------+
| 00 | Bar  | bar@example.com | 555.444.3.2 | 1        | 2021-02-01 00:00:00 |
| 01 | Biz  | biz@example.com | 555.444.3.3 | 1        | 2022-06-01 00:00:00 |
| 02 | Baz  | baz@example.com | 555.444.3.4 | 0        | 2022-02-01 00:00:00 |
| 03 | Cat  | cat@example.com | 555.444.3.5 | 1        | 2023-01-01 00:00:00 |
| 04 | Dog  | dog@example.com | 555.444.3.6 | 0        | 2023-01-01 00:00:00 |
+----+------+-----------------+-------------+----------+---------------------+
shipment table:
+----+---------+---------------------+------------------+----------+-----------+-----------+
| id | user_id | order_date          | street           | postcode | city      | status    |
+----+---------+---------------------+------------------+----------+-----------+-----------+
| 00 | 01      | 2022-06-01 00:00:00 | Ceintuurbaan 282 | 1072 GK  | Amsterdam | delivered |
| 01 | 02      | 2022-03-01 00:00:00 | Singel 542       | 1017 AZ  | Amsterdam | delivered |
| 02 | 02      | 2022-04-15 00:00:00 | Singel 542       | 1017 AZ  | Amsterdam | delivered |
| 03 | 03      | 2023-01-05 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
| 04 | 03      | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | returned  |
| 05 | 03      | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
+----+---------+---------------------+------------------+----------+-----------+-----------+
Run de-identification with blackline run
blackline run --help
Usage: blackline run [OPTIONS]
Run project.
Options:
--profile TEXT Data stores profile to use [required]
-p, --project-dir PATH Project directory, where
blackline_project.yml is located [default:
/home/runner/work/blackline-core/blackline-
core]
--start-date [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]
Start date for deidentification [default:
2023-07-20]
--help Show this message and exit.
blackline run --profile default --start-date 2023-01-01
Running project: /home/runner/work/blackline-core/blackline-core/blackline_sample
Running profile: default
Running start date: 2023-01-01 00:00:00
Finished project: /home/runner/work/blackline-core/blackline-core/blackline_sample
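The --start-date option accepts the three formats listed in the help text, which are also the default formats of Click's DateTime parameter type, and the run output echoes the parsed value as 2023-01-01 00:00:00. A standard-library sketch of that parsing, trying each format in order; parse_start_date is a hypothetical helper.

```python
from datetime import datetime

# Formats listed by `blackline run --help` for --start-date.
FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def parse_start_date(value: str) -> datetime:
    """Return the first successful parse, trying each format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} does not match any of {FORMATS}")


print(parse_start_date("2023-01-01"))  # 2023-01-01 00:00:00
```

A date-only value becomes midnight, which matches the "Running start date" line above.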
De-identified data
user table:
+----+------+-----------------+-------------+----------+---------------------+
| id | name | email           | ip          | verified | created_at          |
+----+------+-----------------+-------------+----------+---------------------+
| 00 | None | fake@email.com  | ###.###.#.# | 1        | 2021-02-01 00:00:00 |
| 01 | Biz  | biz@example.com | 555.444.3.3 | 1        | 2022-06-01 00:00:00 |
| 02 | Baz  | baz@example.com | ###.###.#.# | 0        | 2022-02-01 00:00:00 |
| 03 | Cat  | cat@example.com | 555.444.3.5 | 1        | 2023-01-01 00:00:00 |
| 04 | Dog  | dog@example.com | 555.444.3.6 | 0        | 2023-01-01 00:00:00 |
+----+------+-----------------+-------------+----------+---------------------+
shipment table:
+----+---------+---------------------+------------------+----------+-----------+-----------+
| id | user_id | order_date          | street           | postcode | city      | status    |
+----+---------+---------------------+------------------+----------+-----------+-----------+
| 00 | 01      | 2022-06-01 00:00:00 | None             | 1072 GK  | Amsterdam | delivered |
| 01 | 02      | 2022-03-01 00:00:00 | None             | 1017 AZ  | Amsterdam | delivered |
| 02 | 02      | 2022-04-15 00:00:00 | None             | 1017 AZ  | Amsterdam | delivered |
| 03 | 03      | 2023-01-05 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
| 04 | 03      | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | returned  |
| 05 | 03      | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
+----+---------+---------------------+------------------+----------+-----------+-----------+
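To make the retention rule concrete, the sketch below rebuilds both sample tables in an in-memory SQLite database and applies the catalogue with plain SQL UPDATE statements. It assumes that redact sets the value to NULL (shown as None), replace writes the configured value, mask turns every digit into the mask character (an assumption that reproduces ###.###.#.#), and a row qualifies when its datetime field is strictly older than the start date minus the period. This is not blackline's implementation; mask_digits and RULES are hypothetical names used only to show that these rules yield the tables above.

```python
import re
import sqlite3
from datetime import datetime, timedelta

START = datetime(2023, 1, 1)  # --start-date 2023-01-01

conn = sqlite3.connect(":memory:")
# Assumed mask rule: every digit becomes the mask character, other characters stay.
conn.create_function("mask_digits", 2, lambda v, ch: re.sub(r"\d", ch, v) if v else v)
conn.executescript("""
    CREATE TABLE user (id TEXT, name TEXT, email TEXT, ip TEXT,
                       verified INTEGER, created_at TEXT);
    CREATE TABLE shipment (id TEXT, user_id TEXT, order_date TEXT, street TEXT,
                           postcode TEXT, city TEXT, status TEXT);
""")
conn.executemany("INSERT INTO user VALUES (?, ?, ?, ?, ?, ?)", [
    ("00", "Bar", "bar@example.com", "555.444.3.2", 1, "2021-02-01 00:00:00"),
    ("01", "Biz", "biz@example.com", "555.444.3.3", 1, "2022-06-01 00:00:00"),
    ("02", "Baz", "baz@example.com", "555.444.3.4", 0, "2022-02-01 00:00:00"),
    ("03", "Cat", "cat@example.com", "555.444.3.5", 1, "2023-01-01 00:00:00"),
    ("04", "Dog", "dog@example.com", "555.444.3.6", 0, "2023-01-01 00:00:00"),
])
conn.executemany("INSERT INTO shipment VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("00", "01", "2022-06-01 00:00:00", "Ceintuurbaan 282", "1072 GK", "Amsterdam", "delivered"),
    ("01", "02", "2022-03-01 00:00:00", "Singel 542", "1017 AZ", "Amsterdam", "delivered"),
    ("02", "02", "2022-04-15 00:00:00", "Singel 542", "1017 AZ", "Amsterdam", "delivered"),
    ("03", "03", "2023-01-05 00:00:00", "Wibautstraat 150", "1091 GR", "Amsterdam", "delivered"),
    ("04", "03", "2023-01-06 00:00:00", "Wibautstraat 150", "1091 GR", "Amsterdam", "returned"),
    ("05", "03", "2023-01-06 00:00:00", "Wibautstraat 150", "1091 GR", "Amsterdam", "delivered"),
])

# (table, datetime_field, column, SQL for the new value, period) from the catalogue.
RULES = [
    ("user", "created_at", "name", "NULL", timedelta(days=365)),               # redact
    ("user", "created_at", "email", "'fake@email.com'", timedelta(days=365)),  # replace
    ("user", "created_at", "ip", "mask_digits(ip, '#')", timedelta(days=280)), # mask
    ("shipment", "order_date", "street", "NULL", timedelta(days=185)),         # redact
]
for table, dt_field, column, new_value, period in RULES:
    # ISO-formatted timestamps compare correctly as text.
    cutoff = (START - period).strftime("%Y-%m-%d %H:%M:%S")
    # Identifiers come from the fixed RULES list; only the cutoff is a bound parameter.
    conn.execute(f"UPDATE {table} SET {column} = {new_value} WHERE {dt_field} < ?", (cutoff,))

for row in conn.execute("SELECT id, name, email, ip FROM user ORDER BY id"):
    print(row)
for row in conn.execute("SELECT id, street FROM shipment ORDER BY id"):
    print(row)
```

The printed rows match the de-identified user and shipment tables above.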