[WG-InfoSharing] W3C Data Privacy Vocabulary - Consent Receipt Inputs

Andrew Hughes andrewhughes3000 at gmail.com
Sun Jun 23 19:45:35 UTC 2019

This is great, thank you! In-line responses >>

On Sun, Jun 23, 2019 at 1:04 AM Harshvardhan J. Pandit <me at harshp.com>
wrote:

> Hi Andrew, replying inline.
> On 21 Jun 2019, 15:32 +0100, Andrew Hughes <andrewhughes3000 at gmail.com>,
> wrote:
> A) can we treat the list of terms in the vocabulary as exactly that: a
> controlled word list?
> In essence, yes. However, this word list also contains an expandable
> hierarchy - which you or others can expand for your own use-cases. For
> example, if Org A says they collect “common name” and Org B says they
> collect “given name”, they are both collecting specific categories of a
> more abstract category - “name”. It is impossible to have a vocabulary that
> will contain EVERY term required as there will always be new use-cases and
> types of personal data - so the DPV instead chose to create a top-level
> hierarchy of terms, under which all terms can be coalesced. As an analogy,
> this is how beings are structured in science, with top-level categories for
> animals, birds, etc. and whenever a new species is found or required, they
> select the closest applicable category and create a new ‘term’ for the
> species as part of it, such as “Labrador” under “dog”. Additionally, the
> DPV also provides a way to express relationships, and (one possible) way to
> model personal data handling. So I’d argue it is much more than just a
> controlled word list.

ACH>> OK - I think I better understand how DPV might be used by different
roles. If a policy designer uses DPV, they will dig into the hierarchy and
extensibility aspects - to ensure that the categories, sub-categories and
instances sufficiently capture the intended scenarios. If a system designer
wants to use the DPV, then they will take the output of the policy designer
and treat it as a controlled vocabulary/word list.
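
To make the two roles concrete, here is a minimal sketch of how I picture it,
assuming a toy hierarchy rather than the real DPV terms. The names
BASE_HIERARCHY, extend, ancestors and word_list are all hypothetical, as are
the term labels, which are borrowed from the "name" example above.

```python
# Hypothetical mini-hierarchy: child term -> parent term (None marks a top-level term).
BASE_HIERARCHY = {
    "Name": None,
}


def extend(hierarchy, term, parent):
    """Policy-designer step: add a new term under an existing parent."""
    if parent not in hierarchy:
        raise KeyError(f"unknown parent term: {parent}")
    if term in hierarchy:
        raise ValueError(f"term already defined: {term}")
    extended = dict(hierarchy)  # leave the base vocabulary untouched
    extended[term] = parent
    return extended


def ancestors(hierarchy, term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    parent = hierarchy[term]
    while parent is not None:
        chain.append(parent)
        parent = hierarchy[parent]
    return chain


def word_list(hierarchy):
    """System-designer view: the hierarchy flattened to a controlled word list."""
    return sorted(hierarchy)


policy = extend(BASE_HIERARCHY, "CommonName", "Name")  # Org A's term
policy = extend(policy, "GivenName", "Name")           # Org B's term

print(word_list(policy))
print(ancestors(policy, "GivenName"))
```

The policy designer works with extend and ancestors; the system designer only
ever needs the output of word_list.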

> B) what is supposed to happen when a word has more than one definition? Or
> is the vocabulary not about definitions but rather about "list of words”?
> Do you mean when there are differing uses of a word? I’m trying to think
> of where this would occur, but am lost at the moment to come up with
> something. Do you have an example?
> I think each term comes with its ‘definition’ or ‘description’, so if
> there is another definition that is completely separate, then I don’t think
> the two terms would be compatible. For example, I cannot have the term
> ‘dog’ mean the canine animal, and also a ‘bad person’ in the same
> vocabulary. This is what separates it from a dictionary/thesaurus - each
> term has an intended meaning, and semantics. There are instances where this
> meaning changes over time (generations) - this is called semantic drift,
> but this is not something that is intended or can be planned for.
> For example, when we say “messages”, it could have meant post in the 19th
> century, telephone/telegraph in the 20th, SMS/VoIP/internet messages in the
> 21st.

ACH>> OK - that clarifies very well. In a 'dictionary' many definitions
can exist per word/term. In the DPV each term has a unique
description/definition, so there is no possibility of direct
misinterpretation. There might be some confusion if the same term is
defined in different branches of the vocabulary, but those can be
disambiguated by following the paths back to a common node.
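
A rough sketch of what I mean by following the paths back to a common node,
assuming two nodes that happen to share a label. The node ids, the labels and
the NODES table are invented for illustration and are not taken from the DPV.

```python
# Hypothetical nodes: id -> (label, parent id). Two nodes share the label "Location".
NODES = {
    "root": ("Vocabulary", None),
    "pd": ("PersonalDataCategory", "root"),
    "tm": ("TechnicalMeasure", "root"),
    "pd-loc": ("Location", "pd"),
    "tm-loc": ("Location", "tm"),
}


def path_to_root(node_id):
    """Labels from the given node up to the root, inclusive."""
    labels = []
    while node_id is not None:
        label, parent = NODES[node_id]
        labels.append(label)
        node_id = parent
    return labels


def common_node(a, b):
    """Label of the nearest node shared by both paths, or None if there is none."""
    seen = set()
    node = a
    while node is not None:
        seen.add(node)
        node = NODES[node][1]
    node = b
    while node is not None and node not in seen:
        node = NODES[node][1]
    return None if node is None else NODES[node][0]


print(path_to_root("pd-loc"))
print(path_to_root("tm-loc"))
print(common_node("pd-loc", "tm-loc"))
```

The two "Location" nodes have the same label but different paths, and the
paths only meet at the shared node, which is what tells them apart.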

> C) regarding the RDF - if one were to use, for example, JSON-LD and refer
> to schema.org context and also this RDF - should it work? (Recognizing
> that this question is really stretching the limits of my knowledge on
> semantic web-ish topics - so please rephrase the question if needed)
> JSON-LD is a serialisation of RDF using JSON. So JSON-LD *is* a way to
> write RDF. Similarly, schema.org is also RDF (or can be expressed in
> RDF), so it is easily and readily possible to use the DPV in JSON-LD and with
> schema.org.

ACH>> Good point!
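
For my own understanding, a minimal sketch of a JSON-LD document that mixes
the two vocabularies, using only the Python standard library. The "dpv"
namespace IRI is an assumption inferred from the URL of the DPV document, the
term names dpv:hasPurpose and dpv:ResearchAndDevelopment are assumptions, and
the receipt identifier and the expand helper are hypothetical. A real JSON-LD
processor does much more than this prefix expansion.

```python
import json

# Assumed prefix-to-namespace mappings; check the published namespace IRIs before use.
CONTEXT = {
    "schema": "https://schema.org/",
    "dpv": "https://www.w3.org/ns/dpv#",
}

receipt = {
    "@context": CONTEXT,
    "@id": "https://example.org/receipts/1",  # hypothetical identifier
    "@type": "dpv:PersonalDataHandling",
    "schema:name": "Example consent receipt",
    "dpv:hasPurpose": {"@id": "dpv:ResearchAndDevelopment"},  # assumed term names
}


def expand(compact_iri, context):
    """Expand a 'prefix:local' name using the context; leave anything else unchanged."""
    prefix, sep, local = compact_iri.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return compact_iri


serialised = json.dumps(receipt, indent=2)
print(serialised)
print(expand(receipt["@type"], CONTEXT))
print(expand("schema:name", CONTEXT))
```

Both prefixes sit side by side in one "@context", which is how I read the
statement that the DPV and schema.org can be used together.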

> In the most simplistic scenario, does this usage sound right:
> - I am a Data Controller designing my Consent Receipt data structure
> - in this scenario, I have only one processing purpose
> - in order to choose which Purpose for Data Processing to include in the
> design, I choose the appropriate Purpose word from the DPV document.
> - therefore I have confidence that other Data Controllers and Data
> Processors who also use the DPV will know what that specific Purpose word
> means when they see it in the Consent Receipt output file and can act
> accordingly
> Yes : )
> I’ll expand the use-case to show how the RDF thing fits in this picture.
> Let’s say you have a novel purpose specific to your processing, called
> PurposeX - which is a special type of R&D you do. So the closest term in
> the DPV to match your purpose is R&D. You have two options:
> a) You declare your purpose as R&D in the DPV, and it’s exactly as you
> described
> b) You declare your purpose as PurposeX, which is a special type of the
> term R&D defined in the DPV
> In both cases, any organisation can see that you’re doing R&D as defined
> in the DPV. However, there might be a consortium or partners who also know
> or are doing PurposeX. For them, it is beneficial to know you are doing
> PurposeX rather than just generic R&D. This can also be part of a shared
> vocabulary organisations develop in a particular domain - say medical or
> insurance.
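
Option (b) above can be sketched with plain tuples standing in for RDF
triples. This is only an illustration under assumptions: the "ex:" prefix and
ex:PurposeX are hypothetical, dpv:ResearchAndDevelopment and dpv:Purpose are
assumed term names, and superclasses and understood_as are helper names made
up for this sketch.

```python
SUBCLASS_OF = "rdfs:subClassOf"

# Triples as (subject, predicate, object). "ex:PurposeX" is hypothetical;
# the two "dpv:" names are assumed term names.
TRIPLES = {
    ("dpv:ResearchAndDevelopment", SUBCLASS_OF, "dpv:Purpose"),
    ("ex:PurposeX", SUBCLASS_OF, "dpv:ResearchAndDevelopment"),
}


def superclasses(term, triples):
    """All classes reachable from term via rdfs:subClassOf (transitive)."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def understood_as(declared, known_terms, triples):
    """Terms a recipient that only recognises known_terms can read from a declared purpose."""
    candidates = {declared} | superclasses(declared, triples)
    return sorted(candidates & known_terms)


dpv_only = {"dpv:Purpose", "dpv:ResearchAndDevelopment"}
consortium = dpv_only | {"ex:PurposeX"}

print(understood_as("ex:PurposeX", dpv_only, TRIPLES))
print(understood_as("ex:PurposeX", consortium, TRIPLES))
```

An organisation that knows only the DPV still sees R&D, while a consortium
member that also knows PurposeX sees the more specific term as well.
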
ACH>>
After re-reading the DPV doc (i.e. Section 1!) more carefully, I now
understand that the DPV is all about annotating a defined/established
Personal Data Handling (PDH) activity. A Data Controller envisions a PDH
activity and uses the DPV to define the organizational policy for that PDH
activity.
The part that tripped me up on first reading is that it was/is not obvious,
when reading the document sequentially, that dpv:PersonalDataHandling is the
top-level class. It took a couple of readings to understand that, possibly
because dpv:PersonalDataHandling appears in the table of contents at the
same level as the other terms in the Base Vocabulary. When I look at
https://www.w3.org/ns/dpv , Figure 1 is missing, and I suspect that it might
show the thing I failed to understand.

I'm having trouble parsing the Description of dpv:PersonalDataHandling
<https://www.w3.org/ns/dpv#dpv:PersonalDataHandling> - is it a noun? A
verb? Does it describe a business process that could be executed? Or does
it describe a business process that is currently being executed? Or one
that was executed in the past?
While the language used seems to indicate that dpv:PersonalDataHandling is
about the currently-executing business process, there are no temporal
Properties to describe when the event occurs/occurred/will occur (should
there be?). In that respect, it leads me to think that the
dpv:PersonalDataHandling is a description of an abstract instance as
characterized by the combination of data controller, data subject, legal
basis, category, purpose, etc. And when the Data Controller invokes that
dpv:PersonalDataHandling process, the Data Controller could record
transactional metadata including timestamps, serial numbers, transaction
codes, etc.

The paragraph above explains why I'm having difficulty parsing Section 8:
dpv:Consent <https://www.w3.org/ns/dpv#dpv:Consent> Classes and Properties:
dpv:Consent does include temporal properties and can only be instantiated at
the time the data subject agrees to personal data handling, or at some point
afterwards.

I tend to think of this stuff in maybe 3 phases: 'setup'; 'transaction
execution'; 'historical/forensic'. The 'setup' phase is more the realm of
defined practices and defined policy. There are no temporal metadata. The
'transaction execution' phase is where temporal metadata are captured. The
'historical/forensic' phase is concerned with looking back into the event
logs to see the metadata captured during the 'transaction execution' phase.
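
To illustrate the three phases, here is a minimal sketch assuming a simple
in-memory event log. The class names HandlingPolicy and HandlingEvent, their
fields, and the helper functions are hypothetical and are not DPV terms.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HandlingPolicy:
    """'Setup' phase: describes what may happen; carries no temporal metadata."""
    policy_id: str
    controller: str
    purpose: str
    legal_basis: str


@dataclass(frozen=True)
class HandlingEvent:
    """'Transaction execution' phase: one invocation of a policy, with a timestamp."""
    policy_id: str
    data_subject: str
    occurred_at: datetime


def record_event(log, policy, data_subject, when=None):
    """Append an event for the given policy; defaults to the current UTC time."""
    event = HandlingEvent(policy.policy_id, data_subject, when or datetime.now(timezone.utc))
    log.append(event)
    return event


def events_between(log, start, end):
    """'Historical/forensic' phase: look back through the event log."""
    return [e for e in log if start <= e.occurred_at < end]


policy = HandlingPolicy("policy-1", "Org A", "R&D", "Consent")
log = []
record_event(log, policy, "subject-1", datetime(2019, 6, 21, 14, 32, tzinfo=timezone.utc))
record_event(log, policy, "subject-2", datetime(2019, 6, 23, 19, 45, tzinfo=timezone.utc))

june_22 = datetime(2019, 6, 22, tzinfo=timezone.utc)
june_24 = datetime(2019, 6, 24, tzinfo=timezone.utc)
print(events_between(log, june_22, june_24))
```

The policy object never changes and has no timestamps; only the events that
reference it do, which is the split between policy and transaction metadata I
have in mind.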

Am I confusing myself? Given this email, can you spot where I'm
misunderstanding the document?

Also, is dpv:Consent a SubClass of dpv:LegalBasis? And if yes, then is the
dpv:Consent entry missing "is SubClass of: dpv:LegalBasis"?

And also, if dpv:Consent is a SubClass of dpv:LegalBasis, an obvious future
extension could be a subclass for other legal bases, correct?

Apologies for being dense (and thank you for taking the time to respond!)

> Harsh