Dataset
A dataset is a grouping of data collections. A dataset could be a database, a storage bucket, or a BigQuery dataset.
A dataset is defined in the `catalogue/<organization name>/<system name>/<resource name>/<resource name>/dataset.yml` file. The `dataset.yml` file accepts all arguments defined in the Dataset model with the exception of:
- The `key`, which is created using the catalogue folder structure.
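
As a rough illustration of how a key could be derived from the folder structure, the sketch below joins the folder names between the catalogue root and `dataset.yml` with dots. The function name `key_from_path`, the dotted format, and the example folder names are assumptions made for illustration; they are not taken from the blackline source.

```python
from pathlib import Path


def key_from_path(dataset_file: Path, catalogue_root: Path) -> str:
    """Build a dotted key from the folders between the root and the file.

    Hypothetical helper: the real key format used by blackline may differ.
    """
    relative = dataset_file.parent.relative_to(catalogue_root)
    return ".".join(relative.parts)


root = Path("catalogue")
dataset_file = root / "acme" / "billing" / "postgres" / "dataset.yml"
print(key_from_path(dataset_file, root))  # acme.billing.postgres
```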
Example
    # dataset.yml
    dataset:
      key: dataset_key
      tags:
        - foo
        - bar
      name: user
      description: <resource description>
      meta:
        last_updated: '2021-01-01'
        version: 1.0.0
      data_categories:
        - user.contact
      data_qualifier: identified
      joint_controller:
        name: Dave
        address: Museumplein 10, 1071 DJ Amsterdam, Netherlands
        email: dave@organization.com
        phone: 020 573 2911
      third_country_transfers:
        - USA
        - CAN
      collections:
        user:
          name: user
          description: user data
          data_categories:
            - user.contact
          data_qualifier: identified
          fields:
            - name: email
              description: user email
              data_categories:
                - user.contact.email
              data_qualifier: identified
              deidentifier:
                type: replace
                value: fake@email.com
              period: P365D
            - name: name
              description: user name
              data_categories:
                - user.name
              data_qualifier: identified
              deidentifier:
                type: redact
              period: P365D
          datetime_field:
            name: created_at
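
The `period` values in the example (`P365D`) look like ISO 8601 durations, and `datetime_field` names the column used for retention calculations. The sketch below shows one way such a period could be turned into a cutoff timestamp. It handles only day-based periods of the form `P<n>D`, and the helper names `parse_days_period` and `retention_cutoff` are hypothetical, not part of blackline.

```python
import re
from datetime import datetime, timedelta

# Matches day-only ISO 8601 durations such as "P365D" (assumed subset).
_DAYS_ONLY = re.compile(r"^P(\d+)D$")


def parse_days_period(period: str) -> timedelta:
    """Convert a day-only period such as 'P365D' into a timedelta."""
    match = _DAYS_ONLY.match(period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    return timedelta(days=int(match.group(1)))


def retention_cutoff(now: datetime, period: str) -> datetime:
    """Rows older than the returned timestamp fall outside the period."""
    return now - parse_days_period(period)


print(retention_cutoff(datetime(2022, 1, 1), "P365D"))  # 2021-01-01 00:00:00
```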
Models
blackline.models.catalogue.Dataset
Bases: BlacklineModel
The Dataset resource model.
Todo: This breaks the Liskov substitution principle because it restricts the BlacklineModel rather than extending it. This model has no children.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `meta` | `Optional[dict[str, str]]` | | `None` |
| `data_categories` | `Optional[list[Key]]` | Array of Data Category resources identified by `key`, that apply to all collections in the Dataset. | `None` |
| `data_qualifier` | `Key` | | required |
| `joint_controller` | `Optional[ContactDetails]` | | `None` |
| `third_country_transfers` | `Optional[list[str]]` | An optional array to identify any third countries where data is transited to. For consistency purposes, these fields are required to follow the Alpha-3 code set in [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3). | `None` |
| `_check_valid_country_code` | `classmethod` | | required |
| `_alias` | | | required |
| `children` | `Optional[dict[str, DatasetCollection]]` | | `None` |
| `stem` | | | required |
| `children_stem` | | | required |
| `children_cls` | | | required |
| `collections` | `Optional[dict[str, DatasetCollection]]` | | required |

Source code in `blackline/models/catalogue.py`
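
To illustrate the kind of check that `_check_valid_country_code` suggests for `third_country_transfers`, here is a minimal sketch of an alpha-3 validation. The function name `check_country_codes` and the `KNOWN_ALPHA3` set are hypothetical. The set is a small illustrative subset of ISO 3166-1 alpha-3, and the actual blackline validator may behave differently.

```python
from typing import List, Optional

# Illustrative subset only; the full ISO 3166-1 alpha-3 list is much longer.
KNOWN_ALPHA3 = frozenset({"USA", "CAN", "NLD", "DEU", "FRA", "GBR"})


def check_country_codes(codes: Optional[List[str]]) -> Optional[List[str]]:
    """Return the codes unchanged, or raise if any is not a known alpha-3 code."""
    if codes is None:
        return None
    invalid = [code for code in codes if code not in KNOWN_ALPHA3]
    if invalid:
        raise ValueError(f"invalid ISO 3166-1 alpha-3 codes: {invalid}")
    return codes


print(check_country_codes(["USA", "CAN"]))  # ['USA', 'CAN']
```

Passing a two-letter code such as `"US"` raises a `ValueError` in this sketch, because only three-letter codes from the set are accepted.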
blackline.models.catalogue.BlacklineModel
Bases: BaseModel
The base model for all Resources.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `Key` | A unique key used to identify this resource. | required |
| `tags` | `Optional[list[str]]` | A list of tags for this resource. | `None` |
| `name` | `Optional[str]` | | `None` |
| `description` | `Optional[str]` | | `None` |
| `children` | `Optional[dict[str, Type[BlacklineModel]]]` | The children resources. | `None` |
| `stem` | `str` | The stem of the resource. | required |
| `children_stem` | `Optional[str]` | | required |
| `children_cls` | `Optional[type[BlacklineModel]]` | | required |

Source code in `blackline/models/catalogue.py`
Config
parse_children(path, key_prefix=None)
classmethod
Parse a directory of YAML files into a dictionary of Dataset objects.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | The path to the directory of YAML files. | required |
| `key_prefix` | | | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Type[BlacklineModel]]` | A dictionary of Dataset objects. |

Source code in `blackline/models/catalogue.py`
parse_dir(path, key_prefix=None)
classmethod
Parse a directory of YAML files into a dictionary of Dataset objects.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | The path to the directory of YAML files. | required |
| `key_prefix` | | | `None` |

Returns:

| Type | Description |
|---|---|
| | A dictionary of Dataset objects. |

Source code in `blackline/models/catalogue.py`
parse_yaml(path, key, children={})
classmethod
Parse a YAML file into a `children_cls` object.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path location of the YAML file. | required |
| `key` | `str` | Key to identify the dataset. | required |

Returns:

| Type | Description |
|---|---|
| | Dataset object. |

Source code in `blackline/models/catalogue.py`
blackline.models.catalogue.ContactDetails
Bases: BaseModel
The contact details information model.
Used to capture contact information for controllers, used as part of exporting a data map / ROPA.
This model is nested under an Organization and potentially under a system/dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `Optional[str]` | An individual name used as part of publishing contact information. | `None` |
| `address` | `Optional[str]` | An individual address used as part of publishing contact information. | `None` |
| `email` | `Optional[str]` | An individual email used as part of publishing contact information. | `None` |
| `phone` | `Optional[str]` | An individual phone number used as part of publishing contact information. | `None` |

Source code in `blackline/models/catalogue.py`
blackline.models.catalogue.DatasetCollection
Bases: BlacklineModel
The DatasetCollection resource model.
This resource is nested within a Dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the collection. | required |
| `datetime_field` | `Optional[DatetimeField]` | The datetime field to use for the retention limit calculations. | `None` |
| `where` | `Optional[str]` | An additional where clause to append to the existing: `WHERE {{ datetime_column }} < %(cutoff)s`. | `None` |
| `fields` | `Optional[list[DatasetField]]` | An array of objects that describe the collection's fields. | `None` |
| `data_categories` | `Optional[list[Key]]` | Array of Data Category resources identified by `key`, that apply to all fields in the collection. | `None` |
| `data_qualifier` | `Key` | | required |
| `_sort_fields` | `classmethod` | | required |
| `dependencies` | `Optional[list[str]]` | The collection dependencies. | `None` |

Source code in `blackline/models/catalogue.py`
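
The `where` parameter is described as an extra clause appended to the retention filter. The sketch below shows one plausible way the two could be combined. The function name `build_where` and the use of `AND` with parentheses are assumptions for illustration; the clause blackline actually generates may be assembled differently.

```python
from typing import Optional


def build_where(datetime_column: str, where: Optional[str] = None) -> str:
    """Compose the retention filter, optionally narrowed by an extra clause.

    Hypothetical helper: joining with AND is an assumption, not blackline's API.
    """
    clause = f"WHERE {datetime_column} < %(cutoff)s"
    if where:
        # Parentheses keep any OR inside the extra clause from widening the filter.
        clause += f" AND ({where})"
    return clause


print(build_where("created_at"))
# WHERE created_at < %(cutoff)s
print(build_where("created_at", "deactivated = TRUE"))
# WHERE created_at < %(cutoff)s AND (deactivated = TRUE)
```

The `%(cutoff)s` placeholder is left in place so that the database driver binds the cutoff value as a query parameter rather than having it interpolated into the SQL string.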
blackline.models.catalogue.DatasetField
Bases: DatasetFieldBase
The DatasetField resource model.
This resource is nested within a DatasetCollection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `fields` | `Optional[list[DatasetField]]` | An optional array of objects that describe hierarchical/nested fields (typically found in NoSQL databases). | `None` |
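
To make the nested `fields` idea concrete, the following sketch models a field that may contain child fields and walks the hierarchy to list dotted paths to the leaf fields. The `Field` dataclass and `iter_leaf_paths` function are hypothetical stand-ins written for this example; they are not the blackline `DatasetField` model.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Field:
    """Simplified stand-in for a field that can nest other fields."""
    name: str
    fields: Optional[List["Field"]] = None


def iter_leaf_paths(fields: List[Field], prefix: str = "") -> Iterator[str]:
    """Yield a dotted path for every field that has no nested fields."""
    for item in fields:
        path = f"{prefix}.{item.name}" if prefix else item.name
        if item.fields:
            # Recurse into nested fields, as found in document-style stores.
            yield from iter_leaf_paths(item.fields, path)
        else:
            yield path


user_fields = [
    Field("email"),
    Field("address", [Field("city"), Field("zip")]),
]
print(list(iter_leaf_paths(user_fields)))
# ['email', 'address.city', 'address.zip']
```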