Basic walk-through

blackline helps you manage PII in your data stores. It gives your organization confidence that PII is stored only as long as it should be and is de-identified correctly.

To begin using blackline, you will need:

  • A data store with sample data. If you are running Docker, you can use docker-compose to set up a local database with sample data.
  • Basic understanding of command line tooling.
  • Basic understanding of relational databases.

Installation

Install it as you would any Python package:

pip install blackline-core

For more details, see the documentation on installing blackline-core.

Adapters

blackline integrates with a number of data platforms. Each of these platforms has its own package, blackline-<adapter name>. blackline-core only includes the ability to connect to SQLite.

CLI

The CLI is the main interface to a blackline project.

blackline --help
Usage: blackline [OPTIONS] COMMAND [ARGS]...

  Blackline CLI version 0.1.2

Options:
  --debug / --no-debug  Debug mode
  --version             Show the version and exit.
  --help                Show this message and exit.

Commands:
  debug   Test data store connections.
  init    Initialize a project.
  report  Report the defined project.
  run     Run project.
  sample  Create a sample project.

Initial project

You can initialize an empty project with a folder structure and a blackline_project.yml file.

blackline init --help
Usage: blackline init [OPTIONS]

  Initialize a project.

Options:
  -p, --project-dir PATH  Project directory, where blackline_project.yml is
                          located  [default: /home/runner/work/blackline-
                          core/blackline-core]
  -n, --name TEXT         Project name  [default: blackline]
  --default-profile TEXT  Default profile to use  [default: default]
  --catalogue-path PATH   Path to the catalogue folder  [default: catalogue]
  --adapters-path PATH    Path to the adapters folder  [default: adapters]
  --help                  Show this message and exit.
blackline init
Initialized blackline project at: blackline_sample

tree blackline_sample
blackline_sample
├── adapters
│   └── organization
│       └── system
│           └── resource
│               └── dataset.yaml
├── blackline_project.yml
└── catalogue
    └── organization
        ├── organization.yaml
        └── system
            ├── resource
            │   ├── dataset
            │   │   └── dataset.yaml
            │   └── resource.yaml
            └── system.yaml

9 directories, 6 files

Sample project

You can create a simple blackline project with fake data in a SQLite database using the blackline sample command.

blackline sample --help
Usage: blackline sample [OPTIONS]

  Create a sample project.

Options:
  -p, --project-dir PATH        Project directory, where blackline_project.yml
                                is located  [default:
                                /home/runner/work/blackline-core/blackline-
                                core/blackline_sample]
  -n, --name TEXT               Project name  [default: blackline_sample]
  --overwrite / --no-overwrite  Overwrite existing project  [default: no-
                                overwrite]
  --default-profile TEXT        Default profile to use  [default: default]
  --data-only / --no-data-only  Only create a sample sqlite database
                                [default: no-data-only]
  --help                        Show this message and exit.


blackline sample
Created sample project at: blackline_sample

tree blackline_sample
blackline_sample
├── adapters
│   └── organization
│       └── system
│           └── resource
│               └── dataset.yaml
├── blackline_project.yml
├── blackline_sample.db
└── catalogue
    └── organization
        ├── organization.yaml
        └── system
            ├── resource
            │   ├── dataset
            │   │   └── dataset.yaml
            │   └── resource.yaml
            └── system.yaml

9 directories, 7 files

The examples in this getting-started guide are taken from the blackline sample project.

Define Data Store Connections

Stores are defined in simple YAML files that detail the connection parameters of an adapter. There are a few simple concepts:

  1. The data stores are a collection of YAML files located in the adapters folder. This location can be changed in the blackline_project.yml file.
  2. The name of the YAML file is the reference name of the store.
  3. Each file can include multiple store profiles, which allow you to group the stores that are de-identified during a run job.
  4. The data store YAML file is defined as:
# /blackline_sample/adapters/organization/system/resource/dataset.yaml
# See details docs at https://docs.getblackline.com/
profiles:
  default:
    type: sqlite
    config:
      connection:
        database: blackline_sample.db
        uri: true

Details related to the individual adapters are found in the Adapters section.
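
The keys under connection in the sample profile look like the arguments of Python's sqlite3.connect (the report output later in this guide also lists timeout, isolation_level and others). As a minimal sketch under that assumption, and not blackline's actual implementation, a profile could be turned into a connection like this. The connect helper and the store dict are hypothetical names used only for illustration, and the profile is written as a plain dict so no YAML parser is needed.

```python
import sqlite3

# The sample profile from dataset.yaml, written out as a plain dict.
store = {
    "profiles": {
        "default": {
            "type": "sqlite",
            "config": {
                "connection": {"database": "blackline_sample.db", "uri": True}
            },
        }
    }
}


def connect(store: dict, profile: str) -> sqlite3.Connection:
    """Open a connection for one profile (hypothetical helper, not blackline API)."""
    entry = store["profiles"][profile]  # KeyError if the profile is not defined
    if entry["type"] != "sqlite":
        raise ValueError(f"this sketch only handles sqlite, got {entry['type']!r}")
    # Assumption: the remaining keys map one-to-one onto sqlite3.connect arguments.
    params = dict(entry["config"]["connection"])
    database = params.pop("database")
    return sqlite3.connect(database, **params)
```

Calling connect(store, "default") from the project folder would open blackline_sample.db. Asking for a profile that the file does not define fails with a KeyError in this sketch.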

Debug

Once your data stores are configured you can debug the connections with blackline.

blackline debug --help
Usage: blackline debug [OPTIONS]

  Test data store connections.

Options:
  --profile TEXT          Data stores profile to use  [required]
  -p, --project-dir PATH  Project directory, where blackline_project.yml is
                          located  [default: /home/runner/work/blackline-
                          core/blackline-core]
  --help                  Show this message and exit.

blackline debug --profile default
Testing connections for profile: default
  dataset: good
Validating dataset definitions for profile: <function option.<locals>.decorator at 0x7fbc39e2fbe0>
Dataset: organization.system.resource.dataset:
  Collection: user:
    Collection found: True
    Field: email:
       Field found: True
       Invalid field constraint: False
    Field: ip:
       Field found: True
       Invalid field constraint: False
    Field: name:
       Field found: True
       Invalid field constraint: False
  Collection: shipment:
    Collection found: True
    Field: street:
       Field found: True
       Invalid field constraint: False
  Exceptions:
      None!
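
The "Collection found" and "Field found" lines above can be reproduced by hand for a SQLite store. The following is a rough sketch of such a check, not the code blackline runs: check_fields is a hypothetical helper, and the in-memory table is a cut-down stand-in for the sample database. The shipment table is left out on purpose to show what a failed lookup looks like.

```python
import sqlite3


def check_fields(conn: sqlite3.Connection, expected: dict) -> dict:
    """Return {collection: {"found": bool, "fields": {field: bool}}}."""
    result = {}
    for table, fields in expected.items():
        # PRAGMA table_info yields one row per column (the name is at index 1);
        # an unknown table yields no rows at all.
        rows = conn.execute(f'PRAGMA table_info("{table}")')
        columns = {row[1] for row in rows}
        result[table] = {
            "found": bool(columns),
            "fields": {field: field in columns for field in fields},
        }
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER, name TEXT, email TEXT, ip TEXT)")

report = check_fields(conn, {"user": ["name", "email", "ip"], "shipment": ["street"]})
for collection, status in report.items():
    print(collection, status)
```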

Define Catalogue of PII Data

Place a folder for each data store in the catalogue directory. The name of the folder must be the same as the data store. For each table, add a YAML file that defines the columns containing PII, the de-identification method, and the retention period of each column. The period format is detailed in the pydantic documentation under datetime types (timedelta).

The sample catalogue created:

Sample Organization

# See details docs at https://docs.getblackline.com/
organization:
  - key: organization_demo

Sample System

# See details docs at https://docs.getblackline.com/
system:
  - key: system_demo

Sample Resource

# See details docs at https://docs.getblackline.com/
resource:
  - key: resource_demo
    resource_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.contact
          - user.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: identified_data

Sample Dataset

dataset:
  - key: demo_db
    name: Demo Database
    description: Demo database for Blackline
    collections:
      user:
        name: user
        description: User collection
        datetime_field:
          name: created_at
        fields:
          - name: name
            description: Name of user
            deidentifier:
              type: redact
            period: P365D
          - name: email
            deidentifier:
              type: replace
              value: fake@email.com
            period: P365D
          - name: ip
            deidentifier:
              type: mask
              value: '#'
            period: 280 00
      shipment:
        name: shipment
        datetime_field:
          name: order_date
        fields:
          - name: street
            deidentifier:
              type: redact
            period: P185D
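
To make the period values concrete, here is a small sketch of how a period and a datetime_field could decide whether a value is due for de-identification. It rests on assumptions that are not stated in this guide: that P365D is an ISO 8601 duration of 365 days, that 280 00 is read as 280 days and 0 seconds (the "days seconds" form pydantic accepts for timedelta), and that a value expires when its timestamp is strictly older than the start date minus the period. The helpers parse_period and is_expired are hypothetical and cover only the two spellings used in the sample.

```python
import re
from datetime import datetime, timedelta


def parse_period(value: str) -> timedelta:
    """Parse the two period spellings used in the sample catalogue."""
    iso = re.fullmatch(r"P(\d+)D", value)  # e.g. "P365D"
    if iso:
        return timedelta(days=int(iso.group(1)))
    plain = re.fullmatch(r"(\d+) (\d+)", value)  # e.g. "280 00" = days, seconds
    if plain:
        return timedelta(days=int(plain.group(1)), seconds=int(plain.group(2)))
    raise ValueError(f"unsupported period in this sketch: {value!r}")


def is_expired(record_time: datetime, period: timedelta, start: datetime) -> bool:
    # Assumption: strict comparison against (start date - retention period).
    return record_time < start - period


start = datetime(2023, 1, 1)
created_at = datetime(2022, 2, 1)  # a user row created 334 days before the start date

print(is_expired(created_at, parse_period("P365D"), start))   # name field: False
print(is_expired(created_at, parse_period("280 00"), start))  # ip field: True
```

Under these assumptions a row created on 2022-02-01 keeps its name (334 days is within 365) but has its ip masked (334 days exceeds 280).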

Report

Create a report of the data stores and the catalogue values using the report command.

blackline report --help
Usage: blackline report [OPTIONS]

  Report the defined project.

Options:
  -p, --project-dir PATH  Project directory, where blackline_project.yml is
                          located  [default: /home/runner/work/blackline-
                          core/blackline-core]
  --help                  Show this message and exit.



blackline report

================================================================================
Project Settings:
Project name: blackline_sample
Project Root: blackline_sample
Adapters path: blackline_sample/adapters
Catalogue path: blackline_sample/catalogue
Default profile: default

Data Stores:
Data Store: dataset
Profiles:
  default
    Type: sqlite
    Adapter: <blackline.adapters.sqlite.sqlite.SQLiteAdapter object at 0x7fc23adfeda0>
    Config:
      Connection:
        database: blackline_sample.db
        timeout: 5.0
        detect_types: 0
        isolation_level: DEFERRED
        check_same_thread: True
        factory: <class 'sqlite3.Connection'>
        cached_statements: 100
        uri: True

Catalogue:
...

De-identify

With the data stores and catalogue defined, you can run blackline to de-identify your data.

Beginning with the sample data created by blackline sample, we have two tables that are part of the SQLite database blackline_sample.db.

Existing Data

user table:
+----+------+-----------------+-------------+----------+---------------------+
| id | name |      email      |      ip     | verified |      created_at     |
+----+------+-----------------+-------------+----------+---------------------+
| 00 | Bar  | bar@example.com | 555.444.3.2 |    1     | 2021-02-01 00:00:00 |
| 01 | Biz  | biz@example.com | 555.444.3.3 |    1     | 2022-06-01 00:00:00 |
| 02 | Baz  | baz@example.com | 555.444.3.4 |    0     | 2022-02-01 00:00:00 |
| 03 | Cat  | cat@example.com | 555.444.3.5 |    1     | 2023-01-01 00:00:00 |
| 04 | Dog  | dog@example.com | 555.444.3.6 |    0     | 2023-01-01 00:00:00 |
+----+------+-----------------+-------------+----------+---------------------+

shipment table:
+----+---------+---------------------+------------------+----------+-----------+-----------+
| id | user_id |      order_date     |      street      | postcode |    city   |   status  |
+----+---------+---------------------+------------------+----------+-----------+-----------+
| 00 |    01   | 2022-06-01 00:00:00 | Ceintuurbaan 282 | 1072 GK  | Amsterdam | delivered |
| 01 |    02   | 2022-03-01 00:00:00 |    Singel 542    | 1017 AZ  | Amsterdam | delivered |
| 02 |    02   | 2022-04-15 00:00:00 |    Singel 542    | 1017 AZ  | Amsterdam | delivered |
| 03 |    03   | 2023-01-05 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
| 04 |    03   | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam |  returned |
| 05 |    03   | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
+----+---------+---------------------+------------------+----------+-----------+-----------+

Run de-identification with blackline run

blackline run --help
Usage: blackline run [OPTIONS]

  Run project.

Options:
  --profile TEXT                  Data stores profile to use  [required]
  -p, --project-dir PATH          Project directory, where
                                  blackline_project.yml is located  [default:
                                  /home/runner/work/blackline-core/blackline-
                                  core]
  --start-date [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S]
                                  Start date for deidentification  [default:
                                  2023-07-20]
  --help                          Show this message and exit.

blackline run --profile default --start-date 2023-01-01

Running project: /home/runner/work/blackline-core/blackline-core/blackline_sample
Running profile: default
Running start date: 2023-01-01 00:00:00
Finished project: /home/runner/work/blackline-core/blackline-core/blackline_sample

De-identified data

user table:
+----+------+-----------------+-------------+----------+---------------------+
| id | name |      email      |      ip     | verified |      created_at     |
+----+------+-----------------+-------------+----------+---------------------+
| 00 | None |  fake@email.com | ###.###.#.# |    1     | 2021-02-01 00:00:00 |
| 01 | Biz  | biz@example.com | 555.444.3.3 |    1     | 2022-06-01 00:00:00 |
| 02 | Baz  | baz@example.com | ###.###.#.# |    0     | 2022-02-01 00:00:00 |
| 03 | Cat  | cat@example.com | 555.444.3.5 |    1     | 2023-01-01 00:00:00 |
| 04 | Dog  | dog@example.com | 555.444.3.6 |    0     | 2023-01-01 00:00:00 |
+----+------+-----------------+-------------+----------+---------------------+

shipment table:
+----+---------+---------------------+------------------+----------+-----------+-----------+
| id | user_id |      order_date     |      street      | postcode |    city   |   status  |
+----+---------+---------------------+------------------+----------+-----------+-----------+
| 00 |    01   | 2022-06-01 00:00:00 |       None       | 1072 GK  | Amsterdam | delivered |
| 01 |    02   | 2022-03-01 00:00:00 |       None       | 1017 AZ  | Amsterdam | delivered |
| 02 |    02   | 2022-04-15 00:00:00 |       None       | 1017 AZ  | Amsterdam | delivered |
| 03 |    03   | 2023-01-05 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
| 04 |    03   | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam |  returned |
| 05 |    03   | 2023-01-06 00:00:00 | Wibautstraat 150 | 1091 GR  | Amsterdam | delivered |
+----+---------+---------------------+------------------+----------+-----------+-----------+
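
The tables above suggest how the three de-identifier types in the sample catalogue behave: redact leaves None, replace writes the configured value, and mask swaps characters for the configured symbol while keeping the dots of the IP address. The sketch below imitates that behaviour for a single row. It is inferred from the sample output only, so the rule that mask replaces every letter and digit is an assumption, and DEIDENTIFIERS and deidentify are hypothetical names rather than blackline API.

```python
# One function per deidentifier type; each takes the current value and the
# deidentifier mapping from the catalogue.
DEIDENTIFIERS = {
    "redact": lambda value, spec: None,
    "replace": lambda value, spec: spec["value"],
    # Assumption: letters and digits are masked, punctuation is kept.
    "mask": lambda value, spec: "".join(
        spec["value"] if char.isalnum() else char for char in value
    ),
}


def deidentify(row: dict, field_specs: dict) -> dict:
    """Return a copy of row with every listed field de-identified."""
    result = dict(row)
    for name, spec in field_specs.items():
        result[name] = DEIDENTIFIERS[spec["type"]](row[name], spec)
    return result


# Field definitions copied from the sample user collection.
user_fields = {
    "name": {"type": "redact"},
    "email": {"type": "replace", "value": "fake@email.com"},
    "ip": {"type": "mask", "value": "#"},
}

row = {"id": 0, "name": "Bar", "email": "bar@example.com", "ip": "555.444.3.2"}
print(deidentify(row, user_fields))
```

This sketch applies every de-identifier unconditionally. In the run above, each field is only changed once its own retention period has passed, which is why some rows are only partly de-identified.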