Pseudonymisation in clinical trials: dissociation vs encryption

Discover the advantages and disadvantages of these two pseudonymisation methods used in clinical research.

I am sure you know this already:

Data from participants in clinical studies is considered sensitive personal data. Sponsors and sites participating in such studies take on legal obligations when collecting and managing that personal information.

Some of the best-known regulations will probably ring a bell: the GDPR in the European Union or HIPAA in the United States. There are many others, and although they differ, they also agree on many issues.

Let me make clear that I do not intend to go deeply into the laws and legal issues surrounding these regulations; there are plenty of websites you can visit for that.

My goal here is to give a practical explanation of the two security methods applied in clinical studies to identifiable personal data, that is, the data that allow the clinical data to be linked to the person.

Let’s explain and compare the two different ways of separating the identifiable data from the clinical data.

What do we mean by pseudonymisation?

We have “borrowed” the definition of this term from the GDPR of the EU:

“the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;” (Article 4.5)

Put simply, and considering that we are talking about clinical trials, it means:

“to accomplish pseudonymisation, the clinical data of a person cannot be stored with the data that identifies the participant”

Even simpler: the official identification number of a participant (or any other identifiable data: first name, surname, social security number, etc.) cannot be stored together with the clinical data of the participant.

In clinical trials and studies, pseudonymisation can be accomplished in two different ways. We explain both below, together with their pros and cons.

Dissociation: separation of the identifiable information of a person.
Encryption: encryption of the identifiable information of the person.

Dissociation: Separation of the identifiable information of a person

This is by far the most widely used pseudonymisation method in clinical trials, partly because it is technically the easiest to implement and it offers a high degree of security.

The method of dissociation is based on separating clinical data from the identifiable data of the participant.

The goal of this method is to prevent the person to whom the clinical data belongs from being identified in the EDC (the electronic data capture system). A very simple example (really very simple ;-)) of a dissociated database would look something like this:

The ID EDC field is generated automatically when a new participant record is created, so with this data alone it is impossible to identify the person to whom the clinical data belongs.
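The idea can be sketched in Python. This is a minimal illustration under assumed names: the classes `ClinicalRecord` and `DissociatedEDC` and the clinical fields `systolic_bp` and `diagnosis` are hypothetical, and only the `site-patient` ID format (`01-001`) comes from this article. The EDC assigns the ID itself and the record holds no name or national ID.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ClinicalRecord:
    """A participant's record in the EDC: clinical data only, no name or national ID."""
    id_edc: str
    systolic_bp: int  # hypothetical clinical variable
    diagnosis: str    # hypothetical clinical variable


class DissociatedEDC:
    """Toy EDC that generates the participant ID itself when a record is created."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._sequence = count(1)
        self.records: dict[str, ClinicalRecord] = {}

    def create_record(self, systolic_bp: int, diagnosis: str) -> str:
        # Auto-generated ID: site number plus a sequential patient number, e.g. "01-001".
        id_edc = f"{self.site}-{next(self._sequence):03d}"
        self.records[id_edc] = ClinicalRecord(id_edc, systolic_bp, diagnosis)
        return id_edc


edc = DissociatedEDC(site="01")
new_id = edc.create_record(systolic_bp=128, diagnosis="hypertension")
print(new_id)  # 01-001
```

Nothing in `ClinicalRecord` points back to the person; the link exists only if someone keeps it in another system.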

In most clinical studies, however, participants must be identified within the EDC for tasks such as entering new information at future visits or verifying source data during monitoring.

But, how is this identification possible when there is no identifiable data?

To achieve this, a system is used in which the identifiable data of the person is kept separate from the clinical data.

To that end, each site has an additional database (in a system different from the study's clinical database) where the identification of each participant is stored together with the internal ID associated with that participant. The “ID EDC” column is the key element that links both systems, making it possible to identify the person in the EDC.

The external system can take different forms: paper, a spreadsheet, or even the patient’s medical record at each site. Ultimately, what matters is that the personal information (ID number, Social Security number) and the internal identifier in the EDC are stored somewhere outside the EDC.

Continuing the previous example, we would have another database, separate from the EDC, with these data:

In this way, the clinical data is kept separate from the identifiable data of the person. If we want to access the clinical data of a specific participant, we only need to consult the external system to find the associated ID EDC. Once we have the value of this field, we can go to the clinical database and access the data for this participant.

In the previous example, “John Mars” is patient number 01-001 in the EDC. With this information, we can search the EDC for participant 01-001 and locate John Mars’s clinical information.
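The two-step lookup can be sketched in Python. Both stores are plain dictionaries standing in for the site’s external system and the EDC; the personal ID value and the clinical fields are made up for the example, and only the pairing of John Mars with 01-001 comes from the text.

```python
# External system kept at the site, outside the EDC (paper, spreadsheet, medical record...).
# The personal ID value is hypothetical.
site_registry = [
    {"name": "John Mars", "personal_id": "12345678A", "id_edc": "01-001"},
]

# The EDC: clinical data indexed only by ID EDC (hypothetical fields).
edc = {
    "01-001": {"systolic_bp": 128, "diagnosis": "hypertension"},
}


def lookup_id_edc(name: str) -> str:
    """Step 1: consult the external system to find the participant's ID EDC."""
    for entry in site_registry:
        if entry["name"] == name:
            return entry["id_edc"]
    raise KeyError(f"{name!r} is not registered at this site")


def clinical_data_for(name: str) -> dict:
    """Step 2: use the ID EDC to open the participant's record in the EDC."""
    return edc[lookup_id_edc(name)]


print(lookup_id_edc("John Mars"))      # 01-001
print(clinical_data_for("John Mars"))  # {'systolic_bp': 128, 'diagnosis': 'hypertension'}
```

Someone who obtains only `edc` sees `01-001` and a blood-pressure value, with no name to attach them to. Note also that the second step trusts whatever ID the first step returns: if the wrong ID is carried across, it opens another participant’s record without complaint.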

Advantages

Easy to implement: we only need to store the information that identifies the participant in a separate system.
In the event of unauthorised access to the EDC, patients cannot be identified without the information held in the external system.

Disadvantages

Two databases need to be managed.
Finding a patient in the EDC is slower, since an external system must be consulted first to obtain the patient’s ID EDC.
Greater risk of error: because the EDC is searched using the ID found in the external system, mixing up IDs can result in another patient’s data being modified.

Encryption of the identifiable personal data

There is an alternative way of keeping the data secure without suffering the disadvantages of a dissociated system.

The solution is to keep all the data in the EDC, including personal data, so that everything is managed in a single system.

As you have probably noticed, this is exactly what the previous method was designed to avoid by keeping personal data separate from clinical data.

So?

The difference lies in encrypting the identifiable data.

What does it mean to encrypt the identifiable data of a person, and what is it good for?

It means that the EDC can convert the identifiable data into a string of unrecognisable characters. To do so, an encryption key is used in both the encryption and the decryption processes.

It is important that the encryption key is not stored with the EDC data, but in a separate system external to the EDC. This is how pseudonymisation is achieved: linking the clinical data to a person requires the encryption key, which is stored in a different system.

Continuing with the previous example, we would have to encrypt the fields “Name” and “Personal ID” that identify the person associated with the clinical data:

Once the fields that allow the identification of the person (marked in red) are encrypted in the database, it is no longer possible to identify the person without the encryption key.
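As an illustration, here is a Python sketch that encrypts the `name` and `personal_id` fields before a record is stored, with the key held outside the EDC. To stay dependency-free it builds a toy stream cipher from the standard library (HMAC-SHA256 in counter mode with a random nonce). This is for illustration only and provides no integrity protection; a real EDC should use a vetted scheme such as AES-GCM from an established cryptography library. The field names, record values and helper names (`encrypt_field`, `encrypt_record`) are assumptions.

```python
import hashlib
import hmac
import secrets

ENCRYPTED_FIELDS = ("name", "personal_id")  # fields that identify the person
NONCE_SIZE = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: concatenated HMAC-SHA256(key, nonce || counter) blocks."""
    blocks = []
    for counter in range((length + 31) // 32):
        msg = nonce + counter.to_bytes(8, "big")
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def encrypt_field(key: bytes, plaintext: str) -> str:
    nonce = secrets.token_bytes(NONCE_SIZE)  # fresh random nonce for every value
    data = plaintext.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return (nonce + cipher).hex()  # the nonce is stored alongside the ciphertext


def decrypt_field(key: bytes, token: str) -> str:
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    data = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return data.decode("utf-8")


def encrypt_record(key: bytes, record: dict) -> dict:
    """Return the version of the record that the EDC database actually stores."""
    return {
        field: encrypt_field(key, value) if field in ENCRYPTED_FIELDS else value
        for field, value in record.items()
    }


# In practice the key lives in a separate key store, never in the EDC database.
key = secrets.token_bytes(32)

record = {"id_edc": "01-001", "name": "John Mars", "personal_id": "12345678A",
          "systolic_bp": 128}
stored = encrypt_record(key, record)
print(stored["name"])                      # unreadable hex; changes on every run
print(decrypt_field(key, stored["name"]))  # John Mars
```

Because each value gets a fresh random nonce, encrypting the same name twice gives different ciphertexts, so identical names cannot be spotted by comparing stored values.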

Advantages

Only one database is needed for all the information.
Easy to use and time-efficient, since a person can be searched for directly in the EDC.
Fewer mistakes, since the ID EDC is not needed to find a person.

Disadvantages

The EDC must include a secure, built-in field encryption system, and the encryption key must not be stored in the same system as the clinical data.
Data protection is weaker if the EDC allows identifiable data to be exported. It is therefore important that the EDC can exclude these fields from exports and/or restrict access to them to authorised users.
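The export safeguard can be sketched in Python with the standard `csv` module: the identifiable fields are listed once in a configuration and dropped from every exported row. The field names, the `export_csv` helper and the `"<encrypted>"` placeholders are hypothetical; a real EDC would configure this per study.

```python
import csv
import io

# Hypothetical field configuration: which columns identify the participant.
IDENTIFIABLE_FIELDS = frozenset({"name", "personal_id"})


def export_csv(records: list[dict], exclude: frozenset[str] = IDENTIFIABLE_FIELDS) -> str:
    """Write the records as CSV, leaving out every excluded (identifiable) field."""
    if not records:
        return ""
    columns = [c for c in records[0] if c not in exclude]
    buffer = io.StringIO()
    # extrasaction="ignore" makes DictWriter skip keys that are not in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"id_edc": "01-001", "name": "<encrypted>", "personal_id": "<encrypted>",
     "systolic_bp": 128},
]
print(export_csv(records))
```

Excluding the columns at export time means that even encrypted values never leave the EDC, so an exported file cannot be attacked offline.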

Conclusions

Dissociation is generally a safer method than encryption, but encryption allows for a faster, easier-to-use system that helps minimise errors in clinical studies.

It is worth noting, however, that encryption can reach levels of security similar to those of dissociation, provided the system is properly designed and configured.

Encryption can be applied not only to directly identifying fields but also to indirectly identifying ones, that is, fields that could allow participants to be identified indirectly (a date of birth, for example).

ShareCRF supports both pseudonymisation methods in clinical studies and allows the encryption method to be used securely: any desired field can be excluded from exports, and the roles and users with access to identifiable fields can be defined.

With ShareCRF, a study can be configured so that identifiable fields are encrypted, excluded from exports and accessible only to staff at the site the patient belongs to. The security of this setup would be slightly lower than that of a dissociated system, but in return it avoids the disadvantages of dissociation: two separate systems, a slow process to link a patient to their clinical data, and a higher risk of errors during data entry.
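A site-restricted access rule of this kind can be sketched generically in Python: a user sees the decrypted identifiable fields only if they belong to the participant’s site; anyone else sees them masked. This is not ShareCRF’s configuration interface. The `User` and `view_record` names are hypothetical, the site is assumed to be the prefix of the ID EDC (as in `01-001`), the field list includes date of birth as an example of an indirect identifier, and decryption is passed in as a stand-in function to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable

# Direct identifiers plus an indirect one (date of birth).
IDENTIFIABLE_FIELDS = frozenset({"name", "personal_id", "date_of_birth"})


@dataclass(frozen=True)
class User:
    username: str
    site: str  # site the user works at, e.g. "01"


def site_of(id_edc: str) -> str:
    """Assumes the ID EDC starts with the site number, as in "01-001"."""
    return id_edc.split("-")[0]


def view_record(user: User, record: dict, decrypt: Callable[[str], str]) -> dict:
    """Decrypt identifiable fields for staff of the participant's site; mask them otherwise."""
    same_site = user.site == site_of(record["id_edc"])
    view = {}
    for field, value in record.items():
        if field not in IDENTIFIABLE_FIELDS:
            view[field] = value
        elif same_site:
            view[field] = decrypt(value)
        else:
            view[field] = "***"
    return view


def demo_decrypt(token: str) -> str:
    # Stand-in for real decryption with the externally stored key (hypothetical tokens).
    return {"enc-name": "John Mars", "enc-pid": "12345678A"}[token]


record = {"id_edc": "01-001", "name": "enc-name", "personal_id": "enc-pid",
          "systolic_bp": 128}
site_user = User("investigator01", site="01")
other_site_user = User("investigator02", site="02")

print(view_record(site_user, record, demo_decrypt)["name"])        # John Mars
print(view_record(other_site_user, record, demo_decrypt)["name"])  # ***
```

Clinical fields stay visible to everyone with access to the study; only the identifying fields depend on the user’s site.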

If you need an EDC for your clinical study or trial, request a free demo.