European data
data.europa.eu
The official portal for European data

Can we trust synthetic data? Key insights from the June data.europa academy webinar

Exploring how AI‑generated and synthetic data are reshaping research, ethics, and trust

Can we trust data that was never directly collected from the real world? This was the central question explored during the data.europa academy webinar on Friday, 5 June 2026, titled ‘Responsible use of AI-generated and synthetic data in research’. The session, part of the series “Open data, academia, and ethics”, introduced participants to the growing role of AI-generated and synthetic data in research. For those less familiar with the topic, synthetic data is data that is artificially created, often to overcome challenges such as limited availability or privacy concerns, while still aiming to resemble real-world patterns.

The webinar provided both a technical and a critical perspective on this evolving field. In the first presentation, Thomas Lampert explained how machine learning models are trained on data, ranging from images to text, and how these models can generate new, synthetic outputs. Through simple examples, such as distinguishing between apples and bananas, he illustrated how algorithms learn patterns from existing data and reproduce the results of human judgement rather than the underlying reasoning process. The session also explored how generative models, including systems like ChatGPT, can create entirely new content based on learned patterns, raising important questions about how such data should be interpreted and used.
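The idea of learning patterns from existing data and then generating new data from them can be illustrated with a minimal sketch. It is not taken from the webinar: the measurements, the feature choice (length and width in centimetres), and the function names `fit`, `classify`, and `generate` are all hypothetical, and the simple per-feature Gaussian model stands in for the far more complex models discussed in the session.

```python
import random
import statistics

# Hypothetical toy measurements, invented for illustration: (length_cm, width_cm).
REAL_DATA = {
    "apple": [(7.0, 7.2), (7.5, 7.4), (6.8, 7.0), (7.2, 7.1)],
    "banana": [(18.0, 3.5), (19.5, 3.8), (17.2, 3.4), (20.1, 3.6)],
}


def fit(data):
    """Learn a (mean, standard deviation) pair per feature for each label."""
    model = {}
    for label, rows in data.items():
        columns = list(zip(*rows))  # group values by feature
        model[label] = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return model


def classify(model, sample):
    """Return the label whose learned feature means are closest to the sample."""
    def squared_distance(label):
        return sum((x - mean) ** 2 for x, (mean, _) in zip(sample, model[label]))
    return min(model, key=squared_distance)


def generate(model, label, n, rng):
    """Draw n synthetic samples from the learned per-feature Gaussians."""
    return [
        tuple(rng.gauss(mean, std) for mean, std in model[label])
        for _ in range(n)
    ]


model = fit(REAL_DATA)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic_bananas = generate(model, "banana", 3, rng)

print(classify(model, (19.0, 3.7)))  # a long, narrow fruit
print(synthetic_bananas)             # values that were never measured
```

The model only stores statistical regularities of the measurements it was given. It can label a new fruit and produce plausible-looking "bananas", but it holds no notion of why a banana is a banana, which is why the provenance of such generated values needs to be documented.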

The second presentation shifted the focus to the broader implications for research and society. Ella Hafermalz highlighted that data has always been “constructed” to some extent, encouraging participants to reflect on where the boundary lies between real and synthetic data. She introduced the idea of a ‘synthetic threshold’, the point at which data can no longer be considered purely real, and discussed three forms of synthetic data use: legitimate and transparent applications, fraudulent or misleading uses, and cases where human and AI-generated data become intertwined. This perspective underscored the importance of transparency, clear documentation, and ethical responsibility when working with AI-generated content.

Ultimately, the webinar demonstrated that while synthetic and AI-generated data open up new possibilities for innovation in fields such as medical imaging, they also come with risks, including bias, misinterpretation, and reduced trust if not handled carefully. As research practices continue to evolve, ensuring responsible and transparent use of these technologies will be key to maintaining credibility and public trust.

A full recording of the webinar is available. Interested in exploring more topics like this? The data.europa academy organises monthly webinars covering topics such as open data, AI, and digital transformation. Visit the events page to stay up to date with upcoming sessions and access past recordings.
