Entries by Piercosma Bisconti

The ethical and privacy issues of data augmentation in the medical field

The ethical issues arising from the use of data augmentation, or synthetic data generation, in the field of medicine are increasingly evident. This is a process in which artificial data are created in order to enrich a starting dataset or to overcome its limitations. The technique is particularly used when AI models have to be trained for the recognition of rare diseases, for which there is little data available for training. By means of data augmentation, further data can be artificially added, while still remaining representative of the starting sample.

From a technical point of view, data augmentation is performed using algorithms that modify existing data or generate new data based on existing data. For example, in the context of image processing, original images can be modified by rotating them, blurring them, adding noise or changing the contrast. In this way, different variants of an original image are obtained that can be used to train artificial intelligence models. This technology makes AI increasingly effective at recognising diseases, such as certain types of rare cancers.
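The transformations described above can be sketched in a few lines of code. The following is a minimal illustration using NumPy on a small stand-in array rather than a real medical image; the function and variable names are illustrative and do not come from any specific augmentation library.

```python
import numpy as np

def augment(image, rng):
    """Return simple augmented variants of a grayscale image array."""
    return {
        # 90-degree rotation
        "rotated": np.rot90(image),
        # additive Gaussian noise, clipped back to the valid pixel range
        "noisy": np.clip(image + rng.normal(0, 10, image.shape), 0, 255),
        # contrast stretch around the mean intensity
        "contrast": np.clip((image - image.mean()) * 1.5 + image.mean(), 0, 255),
        # horizontal flip
        "flipped": np.fliplr(image),
    }

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8)).astype(float)  # stand-in for a scan
variants = augment(image, rng)
for name, v in variants.items():
    print(name, v.shape)
```

Each variant preserves the content of the original image while presenting it differently to the model, which is exactly what makes augmented data useful for training on small samples.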

However, there are several ethical issues that arise from the use of data augmentation in medicine. One of the main concerns relates to the quality of the data generated. If the source data are not representative of the population, or if they contain errors or biases, the application of data augmentation could amplify these issues. For example, if the original dataset concerns only white males, there is a risk that the data augmentation result will have a bias towards these individuals, transferring the inequalities present in the original data to the generated data.
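A toy example makes this concrete: if augmentation simply derives new records from existing ones, the demographic composition of the source data carries over unchanged into the enlarged dataset. The numbers below are synthetic and purely illustrative, not real patient data.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical skewed source dataset: 90 records from group "A", 10 from "B"
groups = np.array(["A"] * 90 + ["B"] * 10)
features = rng.normal(size=100)

# naive augmentation: jitter every record slightly, doubling the dataset
aug_features = np.concatenate([features, features + rng.normal(0, 0.01, 100)])
aug_groups = np.concatenate([groups, groups])

# the minority group's share is exactly what it was before augmentation
share_B = (aug_groups == "B").mean()
print(f"group B share after augmentation: {share_B:.0%}")  # still 10%
```

More data has been produced, but the under-representation has not improved at all; a model trained on the augmented set inherits the same skew.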

Replication bias is certainly the most critical issue with regard to data augmentation. If the artificial intelligence model is trained on unrepresentatively generated data or data with inherent biases, the model itself may perpetuate these biases during the decision-making process. For this reason, in synthetic data generation, the quality of the source dataset is an even more critical issue than in artificial intelligence in general.

Data privacy is another issue to consider. The use of data augmentation requires access to sensitive patient data, which might include personal or confidential information. It is crucial to ensure that this data is adequately protected and only used for specific purposes. To address these concerns, solutions such as federated learning and multiparty computation have been proposed. These approaches make it possible to train artificial intelligence models without having to transfer sensitive data to a single location, thus protecting patients’ privacy.

Federated learning is an innovative approach to training artificial intelligence models that addresses data privacy issues. Instead of transferring sensitive data from individual users or devices to a central server, federated learning allows models to be trained directly on users’ devices.

The federated learning process works as follows: initially, a global model is created and distributed to all participating users’ devices. Subsequently, these devices train the model using their own local data without sharing it with the central server. During local training, the models on the devices are constantly updated and improved.

Then, instead of sending the raw data to the central server, only the updated model parameters are sent and aggregated into a new global model. This aggregation takes place in a secure and private manner, ensuring that personal data is not exposed or compromised.
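The process described above can be sketched as federated averaging over a simple linear model. Everything here is a simplified assumption for illustration: the "clients" stand in for, say, hospitals holding their own patient records, the model is a bare linear regressor, and real deployments add secure aggregation and many further engineering layers.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    """One round of local training: a gradient step of a linear model
    on the client's own data; the raw data never leaves this function."""
    X, y = local_data
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)
    return global_weights - lr * grad

def federated_round(global_weights, clients):
    """The server aggregates only the updated parameters, never the data."""
    updates = [local_update(global_weights, data) for data in clients]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# three hypothetical hospitals, each holding its own local dataset
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)
print(w)  # approaches true_w without any client ever sharing raw records
```

The key property is visible in the structure of the code: `federated_round` only ever sees parameter vectors, so sensitive records stay on the clients throughout training.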

Finally, it is important to note that there are many other ethical issues related to the use of data augmentation in medicine. For instance, there is a risk that synthetic data generation may lead to oversimplification of complex medical problems, ignoring the complexity of real-life situations. In the context of the future AI Act, and the European Commission’s ‘Ethics Guidelines for Trustworthy AI’, the analysis of technologies as complex, and with such a broad impact, as AI systems in support of medical decision-making is becoming increasingly crucial.

Reinforcement Learning (RL): how robots learn from their environment

Reinforcement Learning (RL) has been increasingly applied in recent years in the world of autonomous robotics, especially in the development of what have been called ‘curious robots’, i.e. robots programmed to mimic human curiosity about the external environment.

Indeed, one of the fundamental problems of autonomous robots concerns their ability to autonomously generate strategies to solve a problem, or to autonomously explore an environment. RL makes it possible to improve the robot’s performance in both these areas. Reinforcement learning is one of the three basic paradigms of machine learning, together with supervised learning and unsupervised learning.

In the field of ‘open-ended robotics’, RL is used to allow the robot to explore and learn from an environment even in the absence of an explicit goal. Briefly, RL works in this context as follows: the robot starts to explore a part of the environment with its sensors and actuators, such as mechanical arms. As soon as that part of the environment is known beyond a certain threshold, the RL algorithm decreases the reward, i.e. the positive ‘reinforcement’ (hence the name reinforcement learning), for exploring it, and pushes the robot towards a new portion. In this way, the robot is driven, autonomously, by a curiosity-like principle.

One of the major advantages of using reinforcement learning in the development of ‘curious robots’ is that it allows these robots to learn from their environment in a more natural way. Traditional programming techniques require engineers to specify every step a robot must perform to complete a task, which can be time-consuming and inefficient, especially if the robot operates in unpredictable and changing environments. Reinforcement learning, on the other hand, allows robots to learn autonomously from their environment and develop the best interaction strategies. These techniques can also be used to let the robot discover, through trial and error, the shortest way out of a maze. In general, RL works very well for exploratory objectives and for interaction with highly unpredictable environments, where conventional programming techniques would certainly fail.
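A minimal sketch of this curiosity-like principle is count-based exploration: the intrinsic reward for a location decays as it becomes familiar, pushing the agent towards less-visited areas. Everything below (the grid world, the reward shape, the greedy policy) is a deliberately simplified assumption for illustration, not a real robotics implementation.

```python
import random

def curiosity_reward(visits, cell):
    """Intrinsic reward that decays as a cell becomes familiar."""
    return 1.0 / (1.0 + visits.get(cell, 0))

def explore(grid_size=5, steps=200, seed=0):
    """Count-based curiosity exploration on a toy grid world."""
    rng = random.Random(seed)
    visits = {}
    pos = (0, 0)
    visits[pos] = 1
    for _ in range(steps):
        x, y = pos
        neighbours = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < grid_size and 0 <= y + dy < grid_size]
        # greedy w.r.t. intrinsic reward: prefer the least-visited neighbour,
        # breaking ties at random
        rewards = [curiosity_reward(visits, n) for n in neighbours]
        best = max(rewards)
        pos = rng.choice([n for n, r in zip(neighbours, rewards) if r == best])
        visits[pos] = visits.get(pos, 0) + 1
    return visits

visits = explore()
print(f"visited {len(visits)} of 25 cells")
```

Because familiar cells yield ever-smaller rewards, the agent spreads itself over the grid instead of circling the starting corner, which is the essence of the ‘curious robot’ behaviour described above.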
The evolution of this approach could lead in the coming years to robots capable of exploring vast portions of the environment, for long periods of time, without the need for any human supervision. Such technology has applications in multiple fields, both civil and military.

Despite these advantages, there are also some potential risks associated with the use of reinforcement learning in curious robots. One of the main concerns is that reinforcement learning algorithms can be difficult to interpret, which makes it complex to understand how a robot makes decisions and to predict how it will behave in a given situation. Furthermore, reinforcement learning algorithms carry the risk that a robot will learn to perform sub-optimal or even harmful actions if the interpretation of environmental feedback is ineffective.

Overall, although there are certainly risks associated with the use of reinforcement learning in robotics, the advantages of this technique can be significant. By enabling robots to learn complex tasks and adapt more easily to new environments, reinforcement learning can help make robots more versatile and efficient. As long as these algorithms are used carefully and with proper supervision, they can be a powerful tool for improving performance and advancing the field of robotics.