PROTECTING DATA IN THE INFORMATION AGE: DECIPHERING PSEUDONYMISATION AND ANONYMISATION

In a digital age where the daily collection and management of vast amounts of personal data go hand in hand with growing, and justified, privacy and security concerns, pseudonymisation and anonymisation emerge as key tools. Properly adopted, these techniques not only safeguard data but, above all, protect fundamental rights such as the privacy and security of the individuals involved, while maintaining the delicate balance between the legitimate use of data and the principle of integrity. Although distinct in application and impact, the two techniques share the common goal of protecting personal information from misuse and unauthorized access.

Pseudonymisation, defined in the European Union’s GDPR, involves adopting technical solutions that replace direct identifiers so that, without additional information, the individual can no longer be directly identified. When applied correctly, these solutions offer an effective balance between the need to protect personal data and the possibility of using it for legitimate purposes, such as analysis and research.

On the other hand, anonymisation permanently and irreversibly removes any identifying trace from the data, turning it into non-personal information. This process, if done correctly, can offer greater freedom in the processing and sharing of data.

In this article, we will explore these two techniques, analyzing them mainly from a design and technology integration point of view. Through examples, best practices and ethical considerations, we will see how pseudonymisation and anonymisation, when effectively designed and implemented, can contribute significantly to the correct and responsible use of data in the information age.

Insight into Pseudonymisation

Pseudonymisation is a technical process that aims to reduce the risks associated with the processing of personal data by masking identifiers that could lead back to a specific individual. This technique differs from anonymisation in that, with additional tools and information, it may still be possible to link the data back to the original person. In pseudonymisation, clear identifiers such as names, addresses or identification numbers are replaced with codes or pseudonyms.

There are several methodologies for implementing pseudonymisation, some of which are illustrated in the code sketch after this list. For example:

  • Cryptographic hashing transforms data into a seemingly random string of text that is constant for the same input, making it difficult, but not impossible (especially for low-entropy inputs such as phone numbers), to trace back to the original information unless a secret key or salt is also used.
  • Encryption applies algorithms to transform data into a form readable only by those who possess the decryption key.
  • Tokenization, another popular technique, replaces sensitive data with tokens that cannot be directly traced back to the original information, but can be mapped to it via a secure token management system.
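
To make these techniques more concrete, here is a purely illustrative Python sketch (field names and data are hypothetical) of keyed cryptographic hashing (HMAC) and a toy tokenization vault. In a real system, the secret key and the token vault would live in separately secured storage, which is precisely the "additional information" the GDPR requires to be kept apart:

    import hmac
    import hashlib
    import secrets

    # Secret key for keyed hashing. It must be stored separately from the
    # pseudonymised data, since whoever holds it can re-link pseudonyms.
    SECRET_KEY = secrets.token_bytes(32)

    def pseudonymise_hash(identifier: str) -> str:
        # The same input always yields the same pseudonym, so records stay
        # linkable for analysis; reversing the mapping needs the secret key.
        return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    # Toy token vault: in production this would be a secured, audited store.
    token_vault: dict[str, str] = {}

    def tokenize(identifier: str) -> str:
        token = secrets.token_hex(16)  # random, so no pattern leaks through
        token_vault[token] = identifier
        return token

    record = {"name": "Mario Rossi", "diagnosis": "hypertension"}
    record["name"] = pseudonymise_hash(record["name"])
    print(record)  # the diagnosis remains usable; the name does not

Note that hashing low-entropy identifiers (phone numbers, tax codes) without a secret key would be trivially reversible by brute force, which is why the keyed variant is sketched here.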

The main advantage of pseudonymisation is its ability to maintain a certain usefulness of the data for analysis and research, while protecting the identity of individuals. This is particularly relevant in contexts such as healthcare or market research, where data must be used for specific purposes, but the privacy of individuals must be strictly safeguarded.

Pseudonymisation, however, is not without its challenges. The main one is ensuring that the process is robust enough to prevent re-identification, especially in the presence of other data that, if combined, could reveal an individual’s identity, as the sketch below illustrates. The choice of techniques and their implementation must therefore be carefully evaluated and monitored to ensure an adequate level of data security and compliance with applicable regulations.
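
To see why combining datasets is the main re-identification threat, consider this deliberately simplified sketch (all data hypothetical): a pseudonymised record is re-identified by joining it with a public register on the quasi-identifiers it still contains.

    # Hypothetical pseudonymised dataset: direct identifiers removed,
    # but quasi-identifiers (postcode, birth year) retained.
    pseudonymised = [
        {"pseudonym": "a91f", "postcode": "20121",
         "birth_year": 1980, "diagnosis": "asthma"},
    ]

    # Hypothetical auxiliary dataset, e.g. a public register.
    public_register = [
        {"name": "Mario Rossi", "postcode": "20121", "birth_year": 1980},
    ]

    # Linkage attack: match records on the shared quasi-identifiers.
    for rec in pseudonymised:
        for aux in public_register:
            if (rec["postcode"], rec["birth_year"]) == (
                    aux["postcode"], aux["birth_year"]):
                print(f"{rec['pseudonym']} is probably {aux['name']}")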

Insight into Anonymisation

Anonymisation is the process that renders personal data completely unrecognizable, eliminating any possible link with an individual’s identity. Unlike pseudonymisation, anonymisation is an irreversible process: once the data has been anonymised, it is no longer possible to trace the identity of the individual. 

Technically, anonymisation can be implemented through various methodologies, each with its own specific approach and level of effectiveness; a minimal sketch combining all three follows the list:

  • Removal of Identifying Information: removal of data that can be directly linked to the individual, such as names, addresses, telephone numbers or identification numbers. This is the most direct method, but it requires care to avoid leaving behind data that could be combined to identify the individual.
  • Statistical perturbation: slight modification of the data to prevent direct association with an individual. Used in statistical analyses where absolute precision is not critical.
  • Randomisation: introduction of an element of randomness into the data. It helps to mask patterns that could lead to identification.
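
As a purely illustrative sketch (field names hypothetical, and no substitute for a formal risk analysis such as k-anonymity or differential privacy), the three approaches above might be combined as follows:

    import random

    DIRECT_IDENTIFIERS = {"name", "address", "phone"}  # hypothetical fields

    def anonymise(records: list[dict]) -> list[dict]:
        out = []
        for rec in records:
            # 1. Removal of identifying information
            clean = {k: v for k, v in rec.items()
                     if k not in DIRECT_IDENTIFIERS}
            # 2. Statistical perturbation: small noise on a numeric field,
            #    so aggregates stay roughly valid but exact values are lost
            if "age" in clean:
                clean["age"] += random.randint(-2, 2)
            out.append(clean)
        # 3. Randomisation: shuffle record order to mask patterns
        #    tied to the original input order
        random.shuffle(out)
        return out

    patients = [
        {"name": "Anna Bianchi", "address": "Via Roma 1",
         "age": 42, "diagnosis": "asthma"},
        {"name": "Luca Verdi", "address": "Via Milano 9",
         "age": 57, "diagnosis": "diabetes"},
    ]
    print(anonymise(patients))

Whether the result is truly anonymous still depends, as Recital 26 of the GDPR reminds us, on what other data a recipient could reasonably combine with it.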

Anonymisation is particularly important in areas where privacy is of utmost importance, such as in medical research, where patient data must be used for study purposes without compromising their identity in any way. An effective anonymisation process ensures that data can be used safely and responsibly, significantly reducing the risks of privacy and data security breaches.

Comparison and contexts of use

The choice between pseudonymisation and anonymisation depends on the specific context and objectives of data use. 

Pseudonymisation is preferable when a balance is required between privacy and the usefulness of the data: situations where the data must be used for analytical or research purposes while maintaining a certain degree of traceability for verification or updating purposes.

Anonymisation, on the other hand, is the ideal choice in contexts where it is not necessary or desirable to maintain any link between the data and the individual, such as in the case of data publications for public use or large-scale studies requiring maximum privacy protection. 

This decision implies a careful assessment of the risks, needs and applicable regulations.

Best Practices

Effectively designing and implementing pseudonymisation and anonymisation techniques requires not only the adoption of best practices that are constantly evolving with the technological state of the art, but also consideration and awareness of the relevant ethical and regulatory aspects. 

It is essential to conduct regular risk assessments to identify potential vulnerabilities. The choice of appropriate techniques must be based on the sensitivity of the data and the context of use. In addition, a continuous review of data security strategies is essential to respond to new threats and technological developments. 

Organizations must ensure that consent for the use of data is respected and that the rights and privacy of individuals are constantly safeguarded. Transparency in data management policies and accountability to stakeholders are crucial to build trust and ensure compliance with ethical standards and principles.

Real-world applications

Case studies in the field of healthcare offer concrete and easier-to-understand examples of the importance of pseudonymisation and anonymisation. 

For instance, patient data used in clinical research are often pseudonymised to protect patients’ identities, while still allowing the analysis of treatments and health trends.

In academia, on the other hand, researchers use anonymised data for large-scale studies, ensuring that personal information cannot be linked back to individuals.

This possibility of a dualistic approach demonstrates how personal data protection can be effectively integrated into practices that are important for sustainable social and scientific progress, protecting the privacy of the individuals involved while maintaining the usefulness of the data for legitimate purposes.

Final considerations

In conclusion, whether conscious of it or not, we are all living through what many call the fourth industrial revolution, characterized by unprecedented advances in data management and analysis: an era in which techniques such as pseudonymisation and anonymisation are emerging as indispensable tools.

These techniques not only help comply with privacy regulations but also promote a data security conscious culture, which is essential for building public trust in the responsible use of personal information. 

A thorough understanding and proper application of these strategies are indispensable for any organization handling personal data, in order to ensure effective protection of that data and respect for the rights of the individuals concerned, together with the ethical principles of confidentiality and integrity.

Pseudonymisation and anonymisation: the blurred line between personal and non-personal data

In the context of the General Data Protection Regulation (GDPR), Article 4(5) defines pseudonymisation as the processing of personal data in such a way that it can no longer be attributed to a specific data subject without the use of additional information. It is essential to note that this additional information must be stored separately and subject to technical and organizational measures to ensure that such personal data is not attributed to an identified or identifiable natural person.

Contrary to a common perception, pseudonymisation should not be regarded solely as a technological matter, but rather as an operational and organizational strategy. In fact, the GDPR, in Recital 29, recognises that pseudonymisation measures should, whilst allowing general analysis, be possible within the same controller, provided that the necessary technical and organisational measures are taken and that the additional information for attributing the personal data to a specific data subject is kept separately.

Conceptual and Legal Foundations of Pseudonymisation and Anonymisation

The conceptual elaboration reveals that pseudonymisation is not an isolated concept, but rather an integral part of a coordinated set of measures aimed, on the one hand, at protecting the data subject’s data and, on the other, at facilitating the circulation of data while safeguarding data controllers’ compliance with data protection obligations.

In this context, discerning between pseudonymisation and anonymisation is of crucial importance. In short, while pseudonymisation allows the information to be reconstructed, anonymisation makes reconstruction impossible. This principle is clearly stated in Recital 26, which excludes anonymous information from the application of data protection principles, i.e. information that does not relate to an identified or identifiable natural person, or personal data rendered sufficiently anonymous that the data subject is not or no longer identifiable.

But how do we determine whether a piece of data is pseudonymous or anonymous? Here again, we are helped by Recital 26 of the GDPR, which states that, to establish whether a person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person, to identify that natural person directly or indirectly.

Judgment T-557/20 of the EU General Court on Pseudonymisation and Anonymisation of Data

The recent judgment delivered by the EU General Court on 26 April 2023, in Case T-557/20, represents a significant milestone in the legal understanding of anonymisation and pseudonymisation practices. Moving away from the previous orientation of the Article 29 Working Party (since replaced by the European Data Protection Board), which postulated a more restrictive approach, the General Court adopted a more nuanced and relativist perspective.

The Court’s decision emphasized the need to consider the specific circumstances carefully when assessing the identifiability of data. In the case at hand, concerning the transmission of shareholder and creditor comments by the Single Resolution Board (SRB) to third parties, the General Court rejected the idea that the mere possibility of re-identification automatically qualifies the data as personal. In particular, it concluded that, even though the SRB had access to additional data enabling identification, the transmitted comments and alphanumeric codes had to be qualified as anonymous data, consistently applying the principle, contained in Recital 26 of the GDPR and Recital 16 of Regulation (EU) 2018/1725, that if personal data have been rendered sufficiently anonymous that the data subject cannot or can no longer be identified, data protection principles do not apply.

This change of course represents a significant departure from previous restrictive interpretations, emphasising the need to carefully assess the actual identifiability of data in specific contexts. The European Court’s ruling has significantly influenced the legal landscape with regard to anonymisation and pseudonymisation techniques, raising crucial questions about the practical application of these concepts in the current regulatory context.

Conclusions and Key Role of Pseudonymisation and Anonymisation Techniques

In conclusion, the proper implementation of pseudonymisation and anonymisation techniques is imperative to ensure user privacy, especially in sensitive sectors such as health and finance. The technologies used must comply with legal principles, and the choice between pseudonymisation and anonymisation should be guided by specific needs and the required reversibility. A thorough understanding of these concepts and their accurate implementation are crucial to address the legal and regulatory challenges related to the protection of personal data.

In this context, the EU General Court’s ruling not only provides a crucial clarification of the distinction between anonymous and pseudonymous data, but also prompts important reflections on the future of data protection practices. The decision emphasizes the importance of taking a contextual, circumstance-based approach when assessing the anonymisation and pseudonymisation of data. It establishes that, in order to determine whether information constitutes personal data, one must place oneself in the perspective of the recipient, assessing whether combining the information transmitted with any additional information held by the third party is a means reasonably likely to be used to identify the data subjects.

This new orientation of the Luxembourg courts may influence the way organizations implement data protection measures. A careful analysis of the specific circumstances therefore becomes crucial to determine whether data can indeed be considered anonymous, even when they are associated with alphanumeric codes or other identifiers.

Compensation for Damages for Unlawful Processing of Personal Data

The judgment of the Court of Cassation, Cass. civ., Sec. I, Ord. 12-05-2023, No. 13073, addresses a case in which a municipality was ordered to compensate damages caused to an employee as a result of unlawful processing of her personal data. This judgment raises important questions regarding compensation for damages resulting from breaches of data protection regulations, in particular Regulation (EU) 2016/679, known as GDPR.

The Case

In the case at hand, the municipality had accidentally published on its institutional website a determination regarding the garnishment for a certain amount of a municipal employee’s salary, thus violating the data protection rules of the GDPR. Upon discovering the error, the municipality had admitted that the disclosure of the data had occurred accidentally, and promptly took steps to remove the data in little more than 24 hours.

Nevertheless, the Court of First Instance had found that the municipality was liable and ordered it to pay damages. The Court of Appeal upheld that judgment, which, in turn, was appealed by the municipality before the Supreme Court.

The Supreme Court’s ruling, rejecting the municipality’s petition, emphasised that the non-pecuniary damage that can be compensated in cases of personal data breaches stems from the infringement of the fundamental right to the protection of personal data, enshrined both in the Constitution and in the GDPR, recalling that Article 82 of the GDPR states that anyone who suffers material or immaterial damage caused by a breach of the regulation has the right to obtain compensation from the data controller or processor.

The Legal Change

Prior to the entry into force of Regulation (EU) 2016/679, in the Italian legal system the issue of civil liability arising from the unlawful processing of personal data was governed by Article 15 of Legislative Decree No. 196 of 30 June 2003 (Personal Data Protection Code). This stipulated that anyone who caused damage to others through the processing of personal data had to pay compensation pursuant to Article 2050 of the Civil Code. Non-pecuniary damage was also compensable in the event of a breach of Article 11 of the Code.

With the entry into force of the GDPR, the legislation has changed, introducing more uniform rules for liability in case of unlawful processing of personal data. The new legislation stipulates that anyone who suffers material or immaterial damage caused by a breach of the regulation has the right to obtain compensation from the data controller or processor. However, these entities may be exempted from liability if they prove that the damaging event is not attributable to them “in any way.”

The Liability of the Controller vs. the Processor

The liability of the controller and that of the processor arise from different facts. The data controller is the one who determines the purposes and means of the processing and is liable for the damage caused by processing that violates the regulation. Moreover, according to the Supreme Court’s maxim, ‘the data controller is always obliged to compensate for the damage caused to a person by a processing that does not comply with the regulation itself, and may be exonerated from liability not simply if he has taken action (as is his duty) to remove the unlawfully exposed data, but only if he proves that the damaging event is in no way attributable to him’.

The data processor, on the other hand, processes personal data on behalf of the controller and is liable only if it has not fulfilled the obligations of the regulation specifically addressed to processors, or has acted contrary to the controller’s instructions.

The Seriousness of the Damage

As regards compensation for non-pecuniary damage resulting from an infringement of the fundamental right to the protection of personal data, two conditions must be met: the injury must be serious and the damage must not be trivial. The violation of data protection requirements is compensable only if it has appreciably offended the substance of the right itself. Therefore, the mere violation of formal prescriptions on the processing of data may not give rise to damage, whereas a violation that concretely offends the actual substance of the right to privacy always leads to compensation.

The burden of proof for proving non-pecuniary damage is on the injured party, while the data controller must prove that it has taken adequate measures to avoid the damage.

The Principle of Accountability

The entry into force of the GDPR introduced the principle of accountability, which requires the data controller to take responsibility, with full autonomy of judgement, for striking a balance between opposing interests. Accountability requires the controller to adapt the concrete implementation of the principles set out in the abstract by the legislation, and to document how it has implemented the regulatory provisions.

In conclusion, Regulation (EU) 2016/679 has redefined the legal framework for the processing of personal data, introducing more uniform rules on responsibility and accountability. These regulations place significant emphasis on the protection of personal data and compensation for damages in case of breaches. The Supreme Court’s ruling reinforces the importance of these rules and the need for organisations to comply with them in order to avoid litigation and damages. The protection of personal data is a crucial issue in today’s digital society and requires attention and compliance from all actors involved.

The Italian Data Protection Authority sanctions web scraping: the case of the portal Trovanumeri.com

The Garante Privacy (Italian Data Protection Authority) recently moved against web scraping, sanctioning the portal Trovanumeri.com for harvesting users’ personal data online in order to compile directories. The violations involved as many as 26 million users, causing great concern for the protection of personal data. These concerns culminated in a measure issued on 17 May by the Garante, which prohibited the website operator from creating and disseminating a telephone directory obtained through web scraping, a technique that consists of extracting data from one or more websites using special software programmes.

THE ISSUE

In this particular case, numerous reports were submitted to the Garante Privacy concerning the unauthorised publication of names, addresses and telephone numbers of individuals without their consent. Moreover, according to the reports, in some cases, the publication also concerned personal data of persons who had special confidentiality requirements concerning their telephone number and home address: some complainants had in fact represented that they were holders of confidential telephone numbers, i.e. not published in the general telephone directory.

Finally, several data subjects complained that neither the website nor the brief privacy policy published on it gave any indication of the site’s owner (not even the information required by law), making it impossible to identify the data controller.

THE VIOLATIONS 

Dissemination of personal data in the absence of an appropriate legal basis and processing in breach of the law

The processing consisting in the de facto creation of a telephone directory was deemed by the Data Protection Authority to be in breach of the law, resulting in the dissemination of personal data on the Internet in the absence of a suitable legal basis. It is important to emphasise that it is not legitimate to form a telephone directory, whether online or on paper, with data that are not taken from authorised sources, such as telephone operators’ databases. Only such a source can guarantee the correctness and up-to-dateness of the data, as well as document the willingness of those concerned to make them public.

Investigations revealed that the trovanumeri.com website also offered a reverse search function, but did not allow users to give free and specific consent to it: the consent checkbox was pre-selected and could not be changed, in violation of the requirements of the law in force.

It is also important to emphasise that the owner of the site had stated that the data on its websites had been collected through autonomous user input or through web scraping, i.e. an automated process of searching for personal data on the web. This technique, however, had already been deemed unlawful by the Data Protection Authority in an earlier ruling, which found the use of data collected through web scraping for purposes incompatible with the original purpose to be unlawful. Data acquired and processed without the consent of the data subjects and without a valid legal basis therefore constitute a breach of privacy law.

Failure to respect data subjects’ rights, inadequate information and absence of safeguards

The reports received highlighted not only the unauthorised dissemination of data, but also the impossibility for data subjects to exercise their right to erasure and, potentially, other data protection rights. In fact, the website did not contain any information on the data controller and no contact channels with the data controller were available. 

Non-compliance with the processing injunction

Finally, despite the prohibition ordered by the Garante Privacy, the Trovanumeri.com portal continued to operate and make available online numerous personal data. This non-compliance with the ban was further challenged as a breach of the provisions of the regulator.

CONCLUSIONS AND CORRECTIVE MEASURES TAKEN

The processing of personal data by Trovanumeri.com was found to be unlawful on numerous grounds. Even if some of the violations can be corrected, the main one, the absence of an appropriate legal basis, is sufficient to invalidate the entire processing. The corrective measures taken must therefore address the underlying issue and ensure that personal data are processed in compliance with privacy legislation.

In conclusion, the Trovanumeri.com portal case highlighted the importance of personal data protection and the negative consequences of unauthorised web scraping. The Garante Privacy has adopted sanctioning measures to ensure that users’ rights are respected and that data are processed in compliance with the law. This case is a reminder to companies and websites that process personal data, underlining the importance of regulatory compliance and respect for users’ privacy.

The block (and unblocking) of ChatGPT in Italy: causes, changes and solutions adopted

ChatGPT is a language model developed by OpenAI based on the GPT-4 architecture. It is designed to understand and generate text in a similar way to humans, making it possible to create smooth and coherent conversations. However, on 30 March 2023, the use of ChatGPT was blocked in Italy due to concerns about user privacy and data protection. In this article, we will explore the reasons for the block, the changes requested by the Garante Privacy to OpenAI and the solutions that have been implemented to solve the problem and protect the privacy of Italian citizens.

The ChatGPT block in Italy

The block on ChatGPT in Italy, self-imposed by OpenAI, was triggered by a measure of the Garante (Italian Data Protection Authority) ordering the platform to temporarily restrict the processing of Italian users’ data until it complied with Italian and European privacy regulations. In an emergency measure, the Garante had found that the use of ChatGPT could violate privacy rules, such as the European Union’s General Data Protection Regulation (GDPR), which provides for strict protection of individuals’ personal data.

The reasons for the block

In its decision of 30 March, the Garante per la Protezione dei Dati Personali had identified several reasons for concern regarding the use of ChatGPT in the country. Among these, the main ones were:

  • the lack of information provided to users and to all the data subjects whose data are collected by OpenAI;
  • the absence of a legal basis justifying the massive collection and storage of personal data for the purpose of ‘training’ the algorithms underlying the operation of the platform;
  • the inaccurate processing of personal data, since the information provided by ChatGPT does not always correspond to the real data;
  • the absence of any filter for verifying the age of users, which exposed minors to answers wholly unsuited to their level of development and self-awareness.

Required changes to OpenAI

To address these concerns, the Garante requested OpenAI to make a number of changes and interventions to the platform on which ChatGPT operates in order to ensure greater protection of users’ privacy. Among the main changes, the Garante requested to:

  1. Set up an information notice on the site to explain data processing and the rights of data subjects, including non-users of ChatGPT.
  2. Provide a tool to exercise the right to object to the processing of data for algorithm training.
  3. Allow the correction or deletion of inaccurate personal data through a tool on the site.
  4. Insert a link to the information during registration, visible before completing the process.
  5. Change the legal basis of data processing for algorithm training from contract to consent or legitimate interest.
  6. Provide a means to exercise the right to object to the processing of data for algorithm training, if based on legitimate interest.
  7. Implement an age gate for Italian users, excluding minors.
  8. Submit a plan to the Garante for the adoption of age verification tools by 31 May 2023, with implementation by 30 September 2023.
  9. Promote an information campaign by 15 May 2023, agreed with the Garante, to inform about data collection and the tools available to delete personal data.

Changes implemented by OpenAI

In response to the Garante’s requests, OpenAI implemented a number of changes to ChatGPT to ensure greater privacy protection for Italian users. Among the main changes adopted are:

  1. The provision of information accessible to both European and non-European users and non-users concerning the processing of personal data for algorithm training and the right to object to such processing.
  2. The expansion of the data processing information for users, making it accessible in the registration form before a user signs up for the service.
  3. The possibility for non-users resident in Europe to exercise the right to object to the processing of their personal data for algorithm training, through an easily accessible online form.
  4. The introduction of a welcome screen when ChatGPT is reactivated in Italy, with references to the new privacy policy and how personal data are processed for algorithm training.
  5. Giving data subjects the possibility of having information they consider incorrect deleted, although OpenAI declared itself, for the time being, technically unable to correct the errors.
  6. Explaining, in the user information, the legal basis for the processing of personal data for algorithm training and the proper functioning of the service.
  7. The implementation of a form allowing all European users to exercise their right to object to the processing of their personal data, and thus to exclude their conversations and chat history from the training of the algorithms.
  8. The inclusion, in the welcome screen shown to already-registered Italian users, of a button through which, in order to regain access to the service, they must declare that they are of age, or that they are over 13 and have parental consent.
  9. Inclusion of a date-of-birth request in the service registration form, blocking registration for users under 13 and requiring confirmation of parental consent for users aged between 13 and 18 (a minimal sketch of such an age-gate check follows this list).
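
As a purely illustrative sketch (hypothetical logic, not OpenAI’s actual implementation), an age gate of the kind described in points 8 and 9 might look like this:

    from datetime import date

    def may_register(birth_date: date, parental_consent: bool = False) -> bool:
        # Compute completed years of age as of today.
        today = date.today()
        age = today.year - birth_date.year - (
            (today.month, today.day) < (birth_date.month, birth_date.day))
        if age < 13:
            return False              # registration blocked outright
        if age < 18:
            return parental_consent   # 13-17: only with parental consent
        return True                   # adults may register

    print(may_register(date(2012, 6, 1)))   # a minor without consent is blocked
    print(may_register(date(2008, 6, 1), parental_consent=True))  # allowed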

The above actions were welcomed by the Garante, which suspended the restriction order on the processing of personal data against OpenAI; at the same time, the platform was reopened to Italian users.