Title: Prototyping geneconsent
Author: Taco van Dijk
# Prototyping geneconsent
## Introduction
This article is about what we learned during the development of the GeneConsent demonstrator and what might apply to future Datacommons projects at Waag.
The GeneConsent prototype interactively demonstrates a flavour of [dynamic informed consent](https://waag.org/nl/article/waag-op-dutch-design-week-2020-een-donorcodicil-voor-je-data) in the context of genetic data. We worked on a very specific use case: eye melanoma research. To illustrate the complexity of what we were dealing with, the following pictures show many of the things that happen with our data in just this eye melanoma case:
<img src="eye_melanoma_1.png" alt="drawing" width="300" style="display:block;margin-left: auto;margin-right:auto;"/>
<img src="eye_melanoma_2.png" alt="drawing" width="500" style="display:block;margin-left: auto;margin-right:auto;"/>
2\. The donor sends a bio sample (genetic material) to the Researcher, and fills out a form with the phenotypical information (things like gender, age, lifestyle, etc.) that is needed to conduct the research. The researcher is responsible for saving and guarding this sensitive data.
<img src="eye_melanoma_3.png" alt="drawing" width="500" style="display:block;margin-left: auto;margin-right:auto;"/>
3\. The researcher sends the bio sample to a third-party lab that conducts a generic microarray test. A microarray is:
> A tool used to determine whether the DNA from a particular individual contains a mutation in genes.
>
> very large numbers of 'features' can be put on microarray chips, representing a very large portion of the human genome
>
[source](https://www.genome.gov/about-genomics/fact-sheets/DNA-Microarray-Technology).
The array test lab stores the bio sample in a bio vault (think refrigerator). It conducts the test, and the result is a binary data file that can be read by a computer. A copy of this file is returned to the Researcher.
<img src="eye_melanoma_4.png" alt="drawing" width="500" style="display:block;margin-left: auto;margin-right:auto;"/>
4\. The researcher analyses the array data with software in a "Research Cloud" service, specifically a [Copy Number Variation](https://www.genome.gov/genetics-glossary/Copy-Number-Variation) (CNV) analysis, which estimates how many copies of specific segments of DNA are present (a small illustrative sketch of such an estimate follows the quote below). The result is saved as a binary file, together with a human-readable report based on the statistical analysis and the phenotypical data. Both files are saved in a data vault, which can be a database or a filesystem with access protections. If the researcher wants to make this data available for future research, some form of research metadata is extracted and saved in a catalog. The picture above can be repeated for other kinds of research, such as biomarker analysis:
> Biological measurements that can be used to predict risk of disease, to enable early detection of disease, to improve treatment selection and to monitor the outcome of therapeutic interventions.
>
[source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3377087/)
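To make the copy number idea a bit more concrete, here is a minimal, purely illustrative Python sketch (not the analysis pipeline used in the project). It assumes made-up microarray measurements expressed as log2 ratios against a diploid reference and turns them into a rough copy-number estimate per DNA segment:

```python
import statistics

# Hypothetical probe measurements per DNA segment: log2 ratios of the
# donor's signal against a normal (diploid, two-copy) reference sample.
segments = {
    "chr3:10,000,000-20,000,000": [-0.95, -1.10, -1.02, -0.97],    # looks like a loss
    "chr8:40,000,000-45,000,000": [0.02, -0.05, 0.04, 0.01],       # looks normal
    "chr8:117,000,000-127,000,000": [0.55, 0.61, 0.58, 0.60],      # looks like a gain
}

def estimated_copy_number(log2_ratios):
    """Estimate the copy number of a segment from its mean log2 ratio.

    A diploid segment has 2 copies and a log2 ratio of roughly 0,
    so the copy number is roughly 2 * 2 ** mean(log2 ratio).
    """
    mean_ratio = statistics.mean(log2_ratios)
    return 2 * 2 ** mean_ratio

for segment, ratios in segments.items():
    copies = estimated_copy_number(ratios)
    if copies < 1.5:
        call = "loss"
    elif copies > 2.5:
        call = "gain"
    else:
        call = "normal"
    print(f"{segment}: ~{copies:.1f} copies ({call})")
```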
<img src="eye_melanoma_5.png" alt="drawing" width="300" style="display:block;margin-left: auto;margin-right:auto;"/>
5\. The researcher sends the outcome of the research to the Donor as a report, either by normal mail, email or other digital channels.
Now, generally speaking, the donor is happy when they get the report in step 5. But what exactly happens with sensitive data during these steps? Is the agreement itself personal data? (1) How does the researcher store contact information? Is the bio sample tagged with the name of the donor? How is the phenotype information stored? (2) Does the bio sample get destroyed after processing? Does the processor keep a copy of the array data? If so, what are they going to use it for? (3) On what servers is the research service hosted? How is the array data transferred to the research service? Is the array data removed from the research service after the analysis? What metadata is extracted for the catalog? How is this information aggregated? What happens to the copies of the CNV file and the report? How are they transferred? Who will be able to make predictions based on my biomarkers in the future? (4) And finally, what happens with the contact information of the donor when the research is concluded? (5)
It's easy to immediately think of all the things that can go wrong with sharing sensitive information from a privacy perspective. And it is clear that the necessary protections need to be put in place to address these concerns from an individual point of view.
On the other side of the coin, you get half of your DNA from each of your parents (and they from theirs), share most of it with any siblings, and you would pass on half of your DNA to any children you might have, so this data is also very communal by nature.
This paradox illustrates something valuable about data ownership. At least in this case of personal data the idea of ownership seems contested.
So I think we might have to think about it a bit differently. One of the candidate terms that came up in discussions during the project was data stewardship instead of ownership, but there was no consensus within the team. Another approach for future research could be to design a mechanism where next of kin are made part of (discussions during) the consent process.
I changed my mental model to view 'data ownership' along the following heuristic: the more data says something about you, the more you should have a say in how it is handled, who gets access to it and under what terms that sharing happens.
## Dynamic informed consent
The GeneConsent prototype is a demonstration of a system that tries to facilitate dynamic informed consent, but since this is a pretty young concept, how do we understand the idea after doing this project?
First we need to understand 'informed consent'. The [iConsent guidelines](https://i-consentproject.eu/project-guidelines-now-available/) provide a great checklist about informed consent in a clinical context like the eye melanoma use case:
<img src="Screenshot2022-04-04at153239.png" alt="drawing" width="450" style="display:block;margin-left: auto;margin-right:auto;"/>
To paraphrase the checklist, it seems wise to view informed consent as a co-creation process, where the consent procedure is designed together with a group of people who are representative of those being asked for consent, so that they understand everything involved to a degree where the consent can truly be considered informed. We think that in the case of a data commons, 'informed' also means being complete and clear about what the consequences of a consent are in terms of data storage, access, handling, sharing and so forth.
Adding the word dynamic gives us 'dynamic-informed' consent as described in [Dynamic-informed consent: A potential solution for ethical dilemmas in population sequencing initiatives](https://www.sciencedirect.com/science/article/pii/S2001037019304969).
It is a process of keeping participants informed before, during and after the research is conducted. The linked article gives an overview of possible requirements that can help design such a process, classified in three categories (dynamic permissions, dynamic education and dynamic preferences).
The parts that we researched during the prototype were mainly focused on enabling 'dynamic permissions' in the consent service. We always had in the back of our minds a mechanism that enables responsible sharing and reuse of data by keeping the subjects in the loop after they give their consent.
Researchers can discover data that was already collected in earlier studies, but the mechanism can facilitate a recurring informed consent before providing access to it.
<img src="dynamic_informed_consent.png" alt="drawing" width="500" style="display:block;margin-left: auto;margin-right:auto;"/>
With this mechanism someone does not have to consider all possible future consequences of donating 'their' data to the public cause through a blanket consent, but only the case at hand in the here and now, one request at a time.
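To make the 'dynamic permissions' part of this mechanism more tangible, the following Python sketch shows the flow in its simplest form. None of this is the actual GeneConsent code; the class names, fields and purposes are invented. A researcher requests access to previously collected data, the service checks whether an existing consent already covers that purpose, and otherwise a re-consent request is created for the donor before any access is granted:

```python
from dataclasses import dataclass, field

@dataclass
class Consent:
    subject_id: str                  # pseudonymous identifier, not a name
    granted_purposes: set[str]       # e.g. {"eye_melanoma_cnv"}
    revoked: bool = False

@dataclass
class ConsentService:
    consents: dict[str, Consent] = field(default_factory=dict)
    pending_requests: list[tuple[str, str]] = field(default_factory=list)

    def request_access(self, subject_id: str, purpose: str) -> str:
        """Decide whether data access for `purpose` is covered by consent."""
        consent = self.consents.get(subject_id)
        if consent is None or consent.revoked:
            return "denied"
        if purpose in consent.granted_purposes:
            return "granted"
        # Not covered yet: keep the donor in the loop by asking again.
        self.pending_requests.append((subject_id, purpose))
        return "re-consent requested"

# Example: an earlier consent covers the CNV analysis, a new biomarker study does not.
service = ConsentService({"donor-7f3a": Consent("donor-7f3a", {"eye_melanoma_cnv"})})
print(service.request_access("donor-7f3a", "eye_melanoma_cnv"))    # granted
print(service.request_access("donor-7f3a", "biomarker_followup"))  # re-consent requested
```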
We also did some work in the category 'dynamic preferences' by trying to design a consent form that allows for more granular options of consent, and providing an explanation and purpose for each of the research options (dynamic education).
And we integrated a summary in the consent form that lists all the consequences of checked options, which can be seen as a form of dynamic assessment.
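A rough sketch of how such a form could combine granular options, explanations and a consequence summary is shown below. The option names and texts are invented for illustration; they are not the ones used in the demonstrator:

```python
# Each research option carries its own explanation (dynamic education)
# and the concrete consequences of ticking it (dynamic assessment).
CONSENT_OPTIONS = {
    "cnv_analysis": {
        "explanation": "Count copy number variations relevant to eye melanoma.",
        "consequences": ["Your array data is processed in the research cloud.",
                         "A CNV result file is stored in the researcher's data vault."],
    },
    "future_reuse": {
        "explanation": "Allow metadata about your data to be listed in a catalog.",
        "consequences": ["Researchers can discover that your data exists.",
                         "You will be asked for consent again before any reuse."],
    },
}

def consequence_summary(checked_options):
    """Build the human-readable summary shown at the bottom of the form."""
    lines = ["By giving consent you agree to the following:"]
    for option in checked_options:
        for consequence in CONSENT_OPTIONS[option]["consequences"]:
            lines.append(f"- {consequence}")
    return "\n".join(lines)

print(consequence_summary(["cnv_analysis", "future_reuse"]))
```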
Dynamic informed consent has the potential to make sharing of sensitive information safer, and to give people back control over what happens with data about them. Hopefully it will also become more attractive to share sensitive information for public causes, because knowing that you will be informed and asked for consent again in the future creates a higher level of trust.
## Datacommons for genetic data
What we do want is to create a distributed network of datasources where all parties involved can responsibly share, create, find and reuse data, knowing that they will stay appropriately informed every step of the way. We see a consent service as one of the components in an ecosystem where responsible sharing of (research) data according to the FAIR principles is enabled. FAIR stands for Findability, Accessibility, Interoperability and Reuse. For more information read the [FAIR principles](https://www.go-fair.org/fair-principles).
Other future components in this network could be personal/cooperative data vaults, specialised in the storage of sensitive data, and on-demand research clouds or services, where data can be analysed, manipulated and derived in trusted computing environments. These environments can, for example, exist only for the duration of the operation on the data, so a copy of the data never has to leave the environment. Another future component is a catalog where indexed, aggregated and/or anonymized data may be discovered for responsible reuse.
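The idea of an environment that only exists for the duration of an operation can be pictured with a small sketch like the one below. It is a strong simplification: a real trusted computing environment would rely on hardware isolation and attestation rather than a temporary directory, and the analysis function is hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def ephemeral_workspace(input_file: Path):
    """Copy the data into a temporary workspace and wipe it afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix="research-"))
    try:
        local_copy = workdir / input_file.name
        shutil.copy(input_file, local_copy)
        yield local_copy           # the analysis only ever sees this local copy
    finally:
        shutil.rmtree(workdir)     # nothing survives the analysis

# Usage: only the derived result leaves the workspace, never the raw data.
# with ephemeral_workspace(Path("array_data.cel")) as data:
#     result = run_cnv_analysis(data)   # hypothetical analysis function
```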
The consent service is functionally isolated from these other places where data is stored, manipulated and can be discovered. The reasons for this are three-fold.
The first is that the service doesn't need to know anything about the donor itself, minimising the data that we collect; this way the service complies with the GDPR by design.
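To illustrate what this data minimisation could mean in practice, a stored consent record in such a service could be limited to something like the following (an invented schema, not the demonstrator's actual data model): only a pseudonymous identifier, a reference to the signed consent and the granted options, and never a name, address or the genetic data itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredConsentRecord:
    subject_pseudonym: str        # random identifier chosen on the donor's side
    consent_document_hash: str    # fingerprint of the signed consent document
    granted_purposes: tuple       # the ticked options, nothing else
    granted_at: str               # timestamp, ISO 8601

record = StoredConsentRecord(
    subject_pseudonym="d41c0b1e",
    consent_document_hash="sha256:...",   # placeholder digest
    granted_purposes=("cnv_analysis",),
    granted_at="2022-04-04T15:32:00+02:00",
)
print(record)
```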
As mentioned earlier, the end goal is to create a distributed network of datasources where all parties involved can responsibly share, create, find and reuse data according to the FAIR principles.
The demonstrator doesn't yet show any facilities for finding or discovering (new) data, but we do have ideas about this that we want to develop further. We want to design and develop a decentralized, distributed catalog using semantic web technology, similar to what we did with the Verifiable Credentials for the consent. This way the data can stay in all its different shapes, forms and places, but it is necessary to syndicate uniform metadata about everything that is available from these sources.
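As an impression of what such uniform, syndicated metadata could look like, here is a small sketch using the rdflib library and the W3C DCAT vocabulary. The dataset URI, theme and descriptions are made up, and the actual vocabulary for the catalog is still an open design question:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# A hypothetical catalog entry: only metadata is syndicated,
# the sensitive data itself stays in its own vault.
dataset = URIRef("https://example.org/datasets/eye-melanoma-cnv-2022")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Aggregated CNV results, eye melanoma study")))
g.add((dataset, DCTERMS.description,
       Literal("Anonymised copy number variation results; access requires renewed consent.")))
g.add((dataset, DCAT.theme, URIRef("https://example.org/themes/oncogenetics")))

print(g.serialize(format="turtle"))
```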
And perhaps the most challenging technical part is to find technology to use for the decentralized, distributed data storage, especially when it comes to 'raw' genetic data, because it needs the highest levels of protection.
## Acknowledgements