
[PROPOSAL] External masks library #240

@adrienaury

Definitions

A masking definition contains the following parts:

  • the generator: describes the process used to generate a new value
  • the coherence context: describes the level of coherence expected for the new value (consistency with other current values or with previous values)
  • the location: where the value will be written in the JSON data

The generator is usually defined by the mask part of the masking.yml, except for the "hash" and "hashInUri" masks, which also contain a coherence element.

The coherence is usually defined by properties added to the mask: seed, cache, or the hash part of the "hash" and "hashInUri" masks.

The location is defined by the selector part.

Only the generator part needs to be stored in a masking library. When it is applied in a given context, we can choose where to apply it (selector) and how to handle consistency (cache, seed, hash, and which source field is used).

Note: we can allow coherence information in some dedicated masks.
Note: we can allow selector information in case of multiple fields output.

Examples

This generator :

- randomChoiceInUri: "pimo://nameFR"

Can be used in different contexts:

# synthesize new data :
- selector:
    jsonpath: "name1"
  masks:
    - add: ""
    - randomChoiceInUri: "pimo://nameFR"

# synthesize new data consistently with another field:
- selector:
    jsonpath: "name2"
  masks:
    - add: ""
    - randomChoiceInUri: "pimo://nameFR"
  seed:
    field: "id"

# pseudonymize consistently with another field:
- selector:
    jsonpath: "name3"
  mask:
    randomChoiceInUri: "pimo://nameFR"
  seed:
    field: "id"

...
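
To make the coherence part concrete, here is a minimal Python sketch of one generator running with or without a seed field. It is not PIMO's implementation: the NAMES list is a made-up stand-in for "pimo://nameFR", and deriving the seed from a SHA-256 digest of the field value is an assumption made only for illustration.

```python
import hashlib
import random

# Hypothetical stand-in for the "pimo://nameFR" resource.
NAMES = ["Martin", "Bernard", "Dubois", "Thomas", "Robert"]


def random_choice(values, seed_value=None):
    """Pick one value; the same seed value always yields the same pick."""
    if seed_value is None:
        rng = random.Random()  # unseeded: a fresh value on every call
    else:
        # Assumption: derive a stable integer seed from the field value.
        digest = hashlib.sha256(str(seed_value).encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.choice(values)


def mask_record(record, target, seed_field=None):
    """Write a generated name into `target`, optionally seeded by another field."""
    seed_value = record.get(seed_field) if seed_field else None
    masked = dict(record)
    masked[target] = random_choice(NAMES, seed_value)
    return masked
```

Calling mask_record({"id": 42}, "name2", seed_field="id") twice gives the same name both times, while omitting seed_field gives an independent draw on each call. The generator itself is identical in both cases; only the context differs.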

How to define a mask library

The library should expose a variety of data types:

  • how to generate a French family name (locale fr_FR)
  • how to generate a French SIRET
  • how to generate a birth date
  • etc.

This can be done by storing a single file for each data type, containing the list of masks to apply.

filename: person_name_fr_FR.yml

version: "1"
masking:
- selector:
    jsonpath: "."
  mask:
    randomChoiceInUri: "pimo://nameFR"

This is similar to a normal masking file, except for the "." jsonpath, which writes to the current location in the JSON stream (where the mask is applied).

Some generators can take parameters:

filename: nir.yml

masking:
  - selector:
      jsonpath: "gender"         # if present, gender is used as a parameter
    masks:
      - add: true                # add the parameter if not present
      - randomChoice: [1, 2]
    preserve: "value"            # preserve the parameter value if present
# other parameters ...
  - selector:
      jsonpath: "nir"
    masks:
      - add: true  #in this example, the result will be created in a new subfield
      - template: '{{if eq .gender "M" }}1{{else}}2{{end}}{{.birth_date | substr 8 10}}{{.birth_date | substr 3 5}}{{.department_code | printf "%02d"}}{{.city_code | printf "%03d"}}{{.order | printf "%03d"}}'
      - template: '{{ sub 97 (mod (int64 .nir_start)  97)}}'
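
The last template computes the control key as 97 minus the numeric prefix modulo 97. The same arithmetic as a small Python sketch, assuming .nir_start holds the 13 digits built by the preceding template (the proposal does not say where that field comes from, and the function name is hypothetical):

```python
def nir_key(nir_start: str) -> str:
    """Return the two-digit control key for a 13-digit NIR prefix."""
    # Same arithmetic as the template: sub 97 (mod (int64 .nir_start) 97)
    key = 97 - int(nir_start) % 97
    return f"{key:02d}"


def full_nir(nir_start: str) -> str:
    """Append the control key to the 13-digit prefix."""
    return nir_start + nir_key(nir_start)
```

By construction the key lies between 01 and 97, and the prefix plus the key is always a multiple of 97, which is a convenient property to check in a test of the library mask.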

How to use masks library

The library can be a folder, a git repository, a website, ...

A new property needs to be created in the masking.yml to load the library:

version: "1"
libraries:
- "http://domain.org/mylibrary"
- "pimo://internal-library"
- "https+git://github.com/repo/library.git@v0.1.0"
- "file://mylocalibrary"
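
The four URI styles above imply that the loader dispatches on the scheme. A hedged Python sketch of that dispatch follows; the function name, the returned dictionary shape, and the reading of "@v0.1.0" as a git ref are all assumptions, since the proposal does not specify them.

```python
from urllib.parse import urlsplit


def describe_library(uri):
    """Classify a library URI by scheme (illustrative only)."""
    parts = urlsplit(uri)
    scheme = parts.scheme
    location = parts.netloc + parts.path
    version = None

    if scheme.endswith("+git"):
        # Assumption: a trailing "@ref" selects a tag or branch.
        # This sketch does not handle a "user@host" part in the URI.
        transport = scheme[: -len("+git")]
        repo, sep, ref = location.rpartition("@")
        if sep:
            location, version = repo, ref
        return {"kind": "git", "location": f"{transport}://{location}", "version": version}
    if scheme in ("http", "https"):
        return {"kind": "http", "location": uri, "version": None}
    if scheme == "file":
        return {"kind": "folder", "location": location, "version": None}
    if scheme == "pimo":
        return {"kind": "builtin", "location": location, "version": None}
    raise ValueError(f"unsupported library scheme: {scheme!r}")
```

For example, the git entry above would be split into the repository URL "https://github.com/repo/library.git" and the ref "v0.1.0", which keeps version pinning separate from the transport.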

A mask from the library can then be used through a new type of mask:

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir" # name of the yaml file in the library
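
The key idea is that the library file's "." selector is rebound to the caller's selector. A minimal Python sketch of that behaviour, with a hypothetical in-memory LIBRARY standing in for the loaded YAML files and a flat dictionary standing in for the JSON document:

```python
import random

# Hypothetical in-memory library: file name (without .yml) -> generator.
LIBRARY = {
    "person_name_fr_FR": lambda rng: rng.choice(["Martin", "Bernard", "Dubois"]),
}


def generate(record, jsonpath, using, rng=None):
    """Run the library generator `using` and write its result at `jsonpath`."""
    rng = rng or random.Random()
    masked = dict(record)
    # The library mask targets "."; here "." means the caller's jsonpath.
    masked[jsonpath] = LIBRARY[using](rng)
    return masked
```

With this shape, the library entry never needs to know the field name it will be written to, which is what allows the same file to be reused across masking.yml files.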

Passing parameters: option 1

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir" # name of the yaml file in the library
      with:
        gender: "M"

Or, if we want to use an existing field as a parameter:

- selector:
    jsonpath: "nir"
  mask:
    generate:
      using: "nir"
      with:
        gender: { from: "gender" }
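
Option 1 mixes literal values and field references in the same "with" block. One possible way to resolve it is sketched below in Python; the function name and the rule that a mapping with a "from" key is a field reference are assumptions drawn from the two examples above, not a specified behaviour.

```python
def resolve_params(with_block, record):
    """Turn a `with` block into concrete parameter values for one record."""
    resolved = {}
    for name, spec in with_block.items():
        if isinstance(spec, dict) and "from" in spec:
            # Field reference: copy the value from the current record.
            resolved[name] = record[spec["from"]]
        else:
            # Literal value: pass it through unchanged.
            resolved[name] = spec
    return resolved
```

Given the record {"gender": "F"}, the block {"gender": {"from": "gender"}} resolves to {"gender": "F"}, while {"gender": "M"} resolves to {"gender": "M"} regardless of the record.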

Passing parameters: option 2

# precreate a param with a value
- selector:
    jsonpath: "gender"
  mask:
    constant: "M"
# call mask on the current document (selector: ".")
- selector:
    jsonpath: "."
  mask:
    generate:
      using: "nir" # name of the yaml file in the library
