Complying with GDPR

In this article we discuss methods of complying with GDPR when designing new software.

As we saw in the previous article on GDPR, there are some main elements to GDPR we need to be aware of.

Transparency for data subjects.

Users should be aware of what data you are collecting, why you are collecting it, who has access to the data, and how long you will keep the data for.

You should also be able to demonstrate you have a right to collect and process this information.

Active User Consent

Users should actively agree to any storage and processing of PII.

Privacy by design / default

This means that the software should make every effort to be designed in a secure way (obviously, we cannot account for every situation; by their nature, things like zero-day exploits cannot be anticipated). However, you should be making use of technology like encryption, and having appropriate levels of access, to avoid unauthorised users accessing the data.

Identifying and Storing Personally Identifiable Information

The core of GDPR is Personally Identifiable Information (PII) about people. The legislation requires that you take reasonable steps to protect any PII the system stores.

The first stage is to identify any PII that your application may store.

There are two classes of PII:

Data that can be used to uniquely identify a person (such as an email address or National Insurance number), and information directly connected to it (such as a user's purchase history). In terms of technology, it also includes IP addresses and geolocation.
Protected (or sensitive) information, such as medical data, religion, or data collected from those below the age of consent (13 in the UK). Note that this age may vary between member states; for example, Ireland sets the age of consent at 16. There are stronger rules on how this data can be processed.

If your program stores any data related to a user or their actions, it's likely to be PII, and you should consider how you both store and process this data. As we saw from the categories of PII above, it's not just things like credit card details we need to protect, but even details like the email addresses used for sign-up.

Our next question is "Do I need to store this information?"

The concept of data minimisation runs throughout GDPR. It's good sense to only collect information that is needed to complete the task. While the use of email addresses may be unavoidable for authentication purposes, other information may be unnecessary for the system to function. Pay special attention to protected data: does your messaging application need to collect demographic information like date of birth?

Example

While it might be useful to keep details of the device types used to access a system, is it strictly necessary to link them to a specific user? Aggregated or anonymised data on the number of devices of a specific type accessing your service provides much the same information without needing PII.
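
As a rough sketch of that idea, the snippet below counts accesses per device type and throws the user identifier away. The log format and the names `access_log` and `aggregate_device_types` are made up for illustration.

```python
from collections import Counter

# Hypothetical access-log records: (user_id, device_type).
access_log = [
    ("u1", "mobile"),
    ("u2", "desktop"),
    ("u1", "mobile"),
    ("u3", "tablet"),
    ("u2", "mobile"),
]


def aggregate_device_types(records):
    """Count accesses per device type, discarding the user identifier."""
    return Counter(device for _user_id, device in records)


device_counts = aggregate_device_types(access_log)
print(device_counts)
```

Ideally the aggregation happens at the point of collection, so the link between user and device is never stored in the first place.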

Another consideration is limiting access to PII to those that need it for their job. (We will cover this more in Limiting Access below). The vast majority of a program can function without needing to access the actual customer data.

TL/DR

When it comes to storing PII:

Store the minimum amount of data possible
Document WHAT the data is used for
Document WHO will be using the data
Consider how you could get similar results without the personal data.

Storing PII

We also need to consider where the data is stored, especially in the case of cloud services, where processing may happen outside of the EU. Not all countries have equivalent rules for data storage and processing, and you are required to demonstrate the GDPR compliance of the third party.

This doesn't mean that using cloud services to store the data is a bad idea. Providers like AWS don't mess around when it comes to the security of their infrastructure. However, it does mean that you would need to do appropriate due diligence on the cloud provider, to ensure they do comply with GDPR requirements.

When it comes to storing data, there are a couple of technical approaches that can be taken to limit the effects if a breach happens.

Pseudonymisation

Where PII is replaced with artificial identifiers to conceal the data subject it belongs to. Essentially we store PII in a separate database, and refer to that if we need the information.
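
A minimal sketch of this, assuming two in-memory dictionaries stand in for the two databases. The function names and fields are illustrative only, and in a real system the two stores would have separate access controls.

```python
import uuid

# Two separate stores (plain Python structures here; in practice these
# would be separate databases). All names are illustrative.
identity_store = {}   # pseudonym -> PII
activity_store = []   # records that carry only the pseudonym


def register_user(email, name):
    """Store PII under a random artificial identifier and return it."""
    pseudonym = uuid.uuid4().hex
    identity_store[pseudonym] = {"email": email, "name": name}
    return pseudonym


def record_purchase(pseudonym, item):
    """Record activity without any direct identifier."""
    activity_store.append({"user": pseudonym, "item": item})


def lookup_identity(pseudonym):
    """Re-identify a user; only code that really needs PII should call this."""
    return identity_store[pseudonym]


alice = register_user("alice@example.com", "Alice")
record_purchase(alice, "book")
```

Anything that only reads `activity_store` never sees an email address or name; the identifier is random, so it cannot be reversed without access to `identity_store`.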

A lot of this is good sense (and sensible database design); using foreign keys in normalised tables to link information is nothing new. Breaking the link between PII and other data also makes sense from an efficiency standpoint.

However, there are limits to this approach. Firstly, it can be possible to identify an individual by combining a collection of sources, for example by gaining access to age band, ethnicity, gender and postcode (similar to the "anonymised" NHS data the government released). Given that a postcode covers a small number of properties, it's not difficult to reduce the number of people the data may relate to. (There is a really nice example of researchers identifying Netflix customers via their IMDb ratings¹.)
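
One rough way to spot this risk is to count how many records share each combination of these indirect identifiers (this is the idea behind k-anonymity). The sketch below uses invented records and postcodes; the field names and the threshold `k` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical "anonymised" records: no names, but several indirect identifiers.
records = [
    {"age_band": "30-39", "gender": "F", "postcode": "CV1 5FB"},
    {"age_band": "30-39", "gender": "F", "postcode": "CV1 5FB"},
    {"age_band": "40-49", "gender": "M", "postcode": "CV1 5FB"},
    {"age_band": "30-39", "gender": "M", "postcode": "CV3 2AA"},
]

QUASI_IDENTIFIERS = ("age_band", "gender", "postcode")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def risky_groups(rows, k=2, keys=QUASI_IDENTIFIERS):
    """Return the combinations shared by fewer than k rows."""
    return [combo for combo, n in group_sizes(rows, keys).items() if n < k]


unique_combos = risky_groups(records)
print(unique_combos)
```

Any combination that appears only once points at a single person, even though no name or email is stored.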

Additionally, it would rely on keeping the data separate. If someone has access to your entire database, then they are going to be able to reconstruct the data in the same way your systems do.

Encryption

Encryption is where we encode data using an algorithm, and a "key" (more on this in a couple of weeks). If done correctly, this means that only people with the correct key to the data can access it.

Data that is transmitted between systems should be encrypted. This can reduce the impact of eavesdropping on data in transit. (I can't think of a situation where this is not a good idea.)
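
For data in transit, a small sketch using Python's standard `ssl` module is shown below. The helper name `make_client_tls_context` is our own, and the choice of TLS 1.2 as the minimum version is an assumption rather than a GDPR requirement.

```python
import ssl


def make_client_tls_context():
    """Build a TLS context that verifies the server's certificate and hostname."""
    # create_default_context() turns on certificate and hostname checks.
    context = ssl.create_default_context()
    # Assumption: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


tls_context = make_client_tls_context()
```

The context can then be passed to standard-library clients, for example `urllib.request.urlopen(url, context=tls_context)`.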

When it comes to storing data, GDPR does not make encryption mandatory, although Article 32 lists it as an example of an appropriate security measure.

However, encrypting data can also help in the case of a data breach. Without the correct key to decrypt the data, the information is useless.

Note

Under Article 4 of GDPR, processing is defined as "any operation or set of operations which is performed on personal data". In my book this would cover encryption.

(I still haven't worked out what this means. Do I have to consent to have my information stored in a safe way? How does that fit with the whole "No to any processing"? Tell me your views on Aula.)

TL/DR

When it comes to storing data:

Where will the data be stored?
- Do you need to check the location has equivalent data protection rights?
- If you are using cloud storage, what SLA do you have around data protection?
If you are transmitting data, encrypt it.
If you are storing data, consider encrypting it.

Supporting User Rights

Under GDPR, users have a set of rights:

right of access
right of rectification
right to erasure
right to restrict/object to processing
right to data portability
right to be notified of data breaches

How many of these rights we have to implement directly will depend on our legal basis. (For example, if we are using consent, then the user has the right to ask us to stop processing the data.)

Our first step comes back to being clear with the user. Regardless of your legal basis, you need to explain what data you collect and why you are collecting it.

If your basis is consent, have a mechanism that allows the user to consent to any data processing and storage.
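
As a minimal sketch of such a mechanism, the record below defaults to "no consent", and keeps a timestamped history so we can demonstrate when consent was given or withdrawn. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    user_id: str
    purpose: str
    granted: bool = False                        # default is "no consent"
    history: list = field(default_factory=list)  # (action, timestamp) pairs

    def _log(self, action):
        self.history.append((action, datetime.now(timezone.utc)))

    def grant(self):
        """Record an explicit opt-in by the user."""
        self.granted = True
        self._log("granted")

    def withdraw(self):
        """Record that the user has withdrawn consent."""
        self.granted = False
        self._log("withdrawn")


def may_process(record):
    """Processing for this purpose is only allowed while consent is granted."""
    return record.granted


consent = ConsentRecord(user_id="u1", purpose="marketing-email")
```

Keeping one record per purpose means a user can agree to one kind of processing without agreeing to everything.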

However, for the rest of the elements, GDPR does not require that these are "real-time". You have one month (roughly 30 days) to respond to any request. It's also worth noting that the "Right to Erasure" is not absolute. Depending on the legal basis for collection and processing, there are different requirements for data retention.
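
Since requests do not need to be handled in real time, it can be enough to log each request and track when the response is due. The sketch below assumes a 30-day window as an approximation of the deadline; the constant and function names are our own.

```python
from datetime import date, timedelta

# Assumption: 30 days approximates the response window described above.
RESPONSE_WINDOW = timedelta(days=30)


def response_due(received_on, window=RESPONSE_WINDOW):
    """Return the date by which a request received on `received_on` needs a reply."""
    return received_on + window


def is_overdue(received_on, today, window=RESPONSE_WINDOW):
    """True once `today` is past the due date."""
    return today > response_due(received_on, window)


due = response_due(date(2024, 3, 1))
print(due)
```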

There is an excellent article with examples of designing for informed consent.

TL/DR

Have clear information for the user explaining what information is collected and why
If you are using Consent as your legal basis, the user must actively consent to collection and processing of data
Have a mechanism for accessing and editing stored information. Even if you don't allow the user to view or edit their own data, you may need to do this as part of a request.
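
A rough sketch of what that mechanism might look like is given below, with one method per right. `UserDataStore` is a hypothetical in-memory stand-in for wherever the records actually live, and JSON is just one possible export format.

```python
import json


class UserDataStore:
    """In-memory stand-in for wherever user records actually live."""

    def __init__(self):
        self._records = {}

    def add(self, user_id, **fields):
        self._records[user_id] = dict(fields)

    def access(self, user_id):
        """Right of access: return a copy of everything held on the user."""
        return dict(self._records[user_id])

    def rectify(self, user_id, **changes):
        """Right of rectification: correct individual fields."""
        self._records[user_id].update(changes)

    def export(self, user_id):
        """Right to data portability: a machine-readable copy (JSON here)."""
        return json.dumps(self.access(user_id), sort_keys=True)

    def erase(self, user_id):
        """Right to erasure: the caller should check retention duties first."""
        self._records.pop(user_id, None)


store = UserDataStore()
store.add("u1", email="old@example.com", name="Sam")
store.rectify("u1", email="new@example.com")
exported = store.export("u1")
```

These methods could sit behind an admin-only screen rather than a self-service page, as long as someone can run them when a request arrives.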

Limiting Access

Another thing you need to implement is limiting access to data. Here we need to consider the principles of authentication and authorisation.

In this case we need to implement a robust authentication system to ensure that users are who they claim to be.

In terms of authorisation, we need to support different levels of access for users. This means that we can ensure that users can only access data they have a need to access. Here it is sensible to default to providing users the least privileges possible.

Personal information should only be available to those who need it to perform their task. For example, while it makes sense that people processing financial transactions (payroll) have access to bank details, there is no reason for anyone else to have them. It is a similar case with addresses, ethnicity etc. On the whole, it's likely that the vast majority of any system can be developed without needing any PII.

Warning

This is where it gets interesting.
Aside from the usual problems with setting up accounts ("Temp given access to all data"), a high number of data breaches start internally, either through accident or malicious intent (for example, a mass mailing revealing PII).

Additionally, we need to consider what happens in testing. If the test data set makes use of real (non-anonymised) data, then the people testing the system are also handling PII, and the same access rules apply to them.

So how do we implement this? As with so many things in this section, it's down to the documentation. We should have a clear idea of who needs access to what data; from this we can start to create roles that limit users to this access level.

It then becomes a reasonably simple process to add access levels to the software, based on these roles.
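
As an illustration, the sketch below maps each role to the PII fields it needs and filters a record accordingly. The role names, field names, and sample record are all hypothetical.

```python
# Hypothetical roles mapped to the PII fields each one needs.
ROLE_FIELDS = {
    "payroll": {"name", "bank_details"},
    "support": {"name", "email"},
    "developer": set(),   # least privilege: no PII by default
}


def visible_fields(role):
    """Return the fields a role may see; unknown roles get nothing."""
    return ROLE_FIELDS.get(role, set())


def view_record(record, role):
    """Return only the parts of a record that the role is allowed to see."""
    allowed = visible_fields(role)
    return {key: value for key, value in record.items() if key in allowed}


employee = {
    "name": "Sam",
    "email": "sam@example.com",
    "bank_details": "12-34-56 00012345",
}
print(view_record(employee, "support"))
```

Defaulting to an empty set means a new or misspelt role sees no PII until someone deliberately grants it.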

Note

We will come across the principle of least privilege next week when we talk about infrastructure security. It's much the same thing: limiting people's access to data and systems to the minimum level they need to do their job.

TL/DR

Look at the PII that is required for someone to achieve their job.
Build sets of "Roles" with these access levels
Keep access to PII to a minimum.

Dealing with Third Party Software

Most applications are built using third-party libraries. This gives us access to functionality that we don't have the time or expertise to put together ourselves (and it is good programming practice to avoid the "Not Invented Here" problem).

Note

Libraries are a constant source of problems, for example keeping them up to date, or the dependency cascade issues that have affected Node.js.

However, it's difficult to be efficient without using them.

The best we can do is to keep track of the libraries we use (and their dependencies), make sure that we keep them up to date, and be aware of any data they use.
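
For a Python project, a starting point for that inventory could be the standard `importlib.metadata` module, which lists what is installed in the current environment. The helper name `installed_packages` is our own, and this only records names and versions, not what data each library touches.

```python
from importlib import metadata


def installed_packages():
    """Map each installed distribution name to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:                      # skip broken or partial installs
            packages[name] = dist.version
    return packages


inventory = installed_packages()
for name in sorted(inventory, key=str.lower):
    print(f"{name}=={inventory[name]}")
```

Saving this output alongside each release gives a simple record to check against when a vulnerability in a dependency is announced.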

However, while most stand-alone libraries may not be a problem when it comes to GDPR, anything that performs cloud based processing of the data could be an issue.

As data controllers we are responsible for any processing of PII, whether we do it or a third-party partner does it. Again, it depends on the types of data you are dealing with; if it's not PII then there is no problem.

But if you are using cloud services to deal with data, you will need to take the same steps as you would for your own data processing.

Have a valid legal basis for processing
Ensure the user can actively consent to this collection and processing
If consent is the legal basis, have a mechanism for informing all third parties of changes in user consent.
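
One way to sketch the last point is a small notifier that fans out every consent change to each registered third party. The class `ConsentNotifier` and the fake partner below are hypothetical; a real handler would call the partner's own API.

```python
class ConsentNotifier:
    """Fan out consent changes to every registered third-party handler."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        """Add a callable taking (user_id, purpose, granted)."""
        self._handlers.append(handler)

    def consent_changed(self, user_id, purpose, granted):
        """Tell every registered third party about the change."""
        for handler in self._handlers:
            handler(user_id, purpose, granted)


# Stand-in for a real integration, which would call the partner's API.
received = []


def fake_analytics_partner(user_id, purpose, granted):
    received.append((user_id, purpose, granted))


notifier = ConsentNotifier()
notifier.register(fake_analytics_partner)
notifier.consent_changed("u1", "analytics", False)
```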

Google Analytics

It's hard to think of a situation where analytics won't be used in a web app. It gives a huge number of benefits for maintenance and future development of your site.

Understanding the types and number of visits can help plan for expansion, and the types of browser used can help plan future development effort. However, by default the types of data collected (IP address, location, browser) could be enough to identify an individual.

So we end up with three choices.

Stop using analytics, and lose the insight it gives us
Use analytics, but ask for user consent before we do.
Use a version of analytics that removes any possible PII

None of these is ideal, but personally, I would go for option 3. It gives the best balance between effort on our side, and useful information.
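
As an example of what removing possible PII can mean in practice, a common approach is to truncate IP addresses before they are stored. The sketch below uses the standard `ipaddress` module; the function name and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions, not values taken from any particular analytics product.

```python
import ipaddress


def anonymise_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an IP address so it no longer points at one device."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymise_ip("203.0.113.77"))
print(anonymise_ip("2001:db8:abcd:1234::1"))
```

The truncated address still gives a rough location for reporting, but many devices now share the same stored value.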

Advice on Analytics from Google

Summary

In this article we have looked at how we can start to implement GDPR-compliant software.

Understanding the PII data that is collected, and minimising processing and access relies on good documentation.

¹ Getting user data from anonymised sources: Robust De-anonymization of Large Sparse Datasets