Auditing ChatGPT – Part II

Grégoire Martinon, Aymen Mejri, Hadrien Strichard, Alex Marandon, Hao Li
Jan 12, 2024
capgemini-invent

A Survival Issue for LLMs in Europe

Large Language Models (LLMs) have been one of the most dominant trends of 2023. ChatGPT and DALL-E have been adopted worldwide to improve efficiency and tap into previously unexplored solutions. But as is often the case, technological developments come with an equal share of opportunities and risks.  

In the first part of our LLM analysis, we provided a comprehensive definition of LLMs, examined their technological evolution, discussed their meteoric popularity, and highlighted some of their applications.

In this second part, we will answer the following questions:

Are LLMs dangerous?

The short answer is: sometimes. With Large Language Models having such a diverse range of applications, the potential risks are numerous. It is worth pointing out that there is no standard list of these risks, but a selection is presented below.1, 2

Figure 1: A breakdown of risks posed by LLMs3

Some of these dangers are linked to the model itself (or to the company developing it). The data in the model could contain all sorts of biases, the results might not be traceable, or user data or copyrights could have been used illegally, etc.  

Other dangers are linked to the use of these models. Users seek to bypass the models' security measures and use them for malicious purposes, such as generating hateful or propagandistic texts.

Additionally, Large Language Models have social, environmental, and cultural consequences that can be harmful. They require enormous amounts of storage and energy. Moreover, their arrival has weakened employee power in many industries; for example, striking writers in Hollywood have complained about the use of LLMs.3 Finally, LLMs are challenging the boundaries of literary art, just as DALL-E did with graphic art.4

How can you deal with these risks?

It often takes a while before the risks of an emerging technology are fully understood. The same is true of the strategies for dealing with them. However, we are already beginning to see early strategies being deployed.

LLM developers invest in safeguards

OpenAI invested six months of research into establishing safeguards and securing the use of its Generative Pre-trained Transformer (GPT) models. As a result, ChatGPT now refuses to respond to most risky requests, and its responses score better on benchmarks measuring veracity and toxicity. Furthermore, unlike previous models, ChatGPT has continued to improve since it was deployed.1

However, it is possible to circumvent these safeguards, with examples of such prompts being freely available on the Internet (Do Anything Now prompts, or DANs). These DANs often capitalize on ChatGPT's human-centric nature: the model seeks to satisfy the user, even if this means overstepping its ethical framework or creating a confirmation bias. Furthermore, the opacity of the model and its data creates copyright problems and uncontrolled bias. As for benchmark successes, suspicions of contamination of the training data undermine their objective value. Finally, despite announced efforts to reduce their size, OpenAI models consume a lot of resources.5

Some LLMs now claim to be more ethical or safer, but this is sometimes to the detriment of performance. None of the models are faultless, and there is currently no clear and reliable evaluation method on the subject.

GPT-4 safety in five steps

To go into more detail about implementing guardrails, let's look at the five steps OpenAI implemented for its GPT models, as shown in Figure 2.1

  1. Adversarial testing: Experts from various fields have been hired to test the limits of GPT-4 and find its flaws.
  2. Supervised policy: After training, annotators show the model examples of the desired responses to fine-tune it.
  3. Rule-based Reward Model (RBRM) classifiers: The role of these classifiers is to decide whether a prompt and/or its response are “valid” (e.g., a classifier that invalidates toxic requests).
  4. Reward model: Human annotators train a reward model by ranking four possible model responses from best to least aligned (a minimal sketch of this step follows Figure 2).
  5. Reinforcement learning: Using reinforcement learning techniques, the model takes user feedback into account.
Figure 2: GPT-4 Safety Pipeline1, 3
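
To make step 4 more concrete, here is a minimal sketch, in Python, of the pairwise ranking loss commonly used to train a reward model from human rankings in RLHF-style pipelines. It illustrates the general technique, not OpenAI's actual implementation, and the reward scores below are hypothetical.

# Illustrative sketch (not OpenAI's actual code): pairwise ranking loss
# used to train a reward model from human rankings, in the spirit of
# RLHF pipelines. The scores below are hypothetical.
import itertools
import math

def pairwise_ranking_loss(scores_ranked_best_to_worst):
    """Average -log(sigmoid(r_better - r_worse)) over all ordered pairs.

    scores_ranked_best_to_worst are reward-model outputs for candidate
    responses, already sorted by human preference (best first).
    """
    pairs = list(itertools.combinations(scores_ranked_best_to_worst, 2))
    losses = []
    for r_better, r_worse in pairs:
        # Sigmoid of the score margin: probability the reward model assigns
        # to the human-preferred response being the better one.
        p_prefer = 1.0 / (1.0 + math.exp(-(r_better - r_worse)))
        losses.append(-math.log(p_prefer))
    return sum(losses) / len(losses)

# Four candidate responses ranked by annotators (step 4 of the pipeline),
# with hypothetical scalar rewards produced by the reward model.
rewards = [2.1, 1.4, 0.3, -0.8]  # best ... worst
print(f"ranking loss: {pairwise_ranking_loss(rewards):.4f}")

Minimizing this loss pushes the reward model to score human-preferred responses higher, which is what makes it usable as a training signal in the reinforcement learning step.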

Governments and institutions worry about LLMs

Several countries have decided to ban ChatGPT (see Figure 3).6 Most of them (Russia, North Korea, Iran, etc.) have done so for reasons of data protection, information control, or concerns around their rivalry with the USA. Some Western countries, such as Italy, have banned it and then reauthorized it, while others are now considering a ban. For the latter, the reasons cited are cybersecurity, the protection of minors, and compliance with current laws (e.g., GDPR). 

Figure 3: Map of countries that have banned ChatGPT3

Many tech companies (Apple, Amazon, Samsung, etc.) and financial institutions (J.P. Morgan, Bank of America, Deutsche Bank, etc.) have banned or restricted ChatGPT. They are all concerned about the protection of their data (e.g., a data leak occurred at Samsung).7 

Scientific institutions, such as academic publishers, forbid it for reasons of trust, given the risk of articles being written surreptitiously by machines. Finally, some institutions are concerned about the possibility of cheating with such tools.8

European regulation changes

Many articles in Quantmetry's blog have already mentioned the future EU AI Act, which will regulate artificial intelligence as soon as 2025.9 However, we should add here that this legislation has been amended following the rapid adoption of ChatGPT, and the consequences of this amendment are summarized in Figure 4. The European Union now defines the concept of General Purpose AI (GPAI).10 This is an AI system that can be used and adapted to a wide range of applications for which it was not specifically designed. The regulations on GPAIs therefore cover LLMs as well as all other types of Generative AI.

GPAIs are affected by a whole range of restrictions, summarized here in three parts:

  • Documentary transparency and administrative registration, which should not be complicated to implement.
  • Risk management and setting up evaluation protocols. These aspects are more complicated to implement but feasible for LLM providers, as outlined by OpenAI with ChatGPT.1
  • Data governance (GDPR compliance and ethics) and respect for copyright. LLM providers are far from being able to guarantee these for now.

The European Union will therefore consider LLMs to be high-risk AIs, and LLM providers still have a lot of work to do before they reach the future compliance threshold. Nevertheless, some believe that this future law is, in some respects, too impractical and easy to circumvent. 

Figure 4: Impact of the EU AI Act on LLMs3

Assessing model compliance is one of Quantmetry's core competencies, particularly in relation to the EU AI Act. Regarding LLMs specifically, Stanford researchers published a blog post evaluating the compliance of 10 LLMs with the future European law.11 The results are shown in Figure 5. To establish a compliance score, the researchers extracted 12 requirements from the draft legislation and developed a rating framework. Annotators were then tasked with conducting an evaluation based on publicly available information. The article identifies copyright, data and risk management, and the lack of evaluation standards as the main current issues, aligning with our analysis above. The researchers estimate that 90% compliance is a realistic goal for LLM providers (the top performer currently achieves 75%, with an average of 42% across the 10 evaluated LLMs).

Figure 5: Results of the compliance evaluation made by Stanford researchers11
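
As an illustration of how such a rubric can be turned into a single score, here is a short sketch: each requirement is rated on a point scale and the percentage of points obtained gives the compliance score. The requirement names, the 0-to-4 scale, and the ratings below are our own assumptions for illustration, not the Stanford researchers' actual rubric or data.

# Hypothetical illustration of aggregating rubric ratings into a single
# compliance percentage. Requirement names, scale, and ratings are invented
# for illustration; they are not the Stanford researchers' data.
MAX_POINTS_PER_REQUIREMENT = 4  # assumed scale: 0 (absent) .. 4 (fully documented)

ratings = {
    "copyright disclosure": 1,
    "training data description": 2,
    "energy reporting": 0,
    "risk mitigation documentation": 3,
    "evaluation results": 2,
    # ... the remaining requirements extracted from the draft law would follow
}

def compliance_score(ratings, max_points=MAX_POINTS_PER_REQUIREMENT):
    """Percentage of points obtained across all rated requirements."""
    obtained = sum(ratings.values())
    possible = max_points * len(ratings)
    return 100.0 * obtained / possible

print(f"compliance: {compliance_score(ratings):.0f}%")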

A few tips

Faced with all these risks, it would be wise to take a few key precautions. Learning a few prompt engineering techniques to ensure that prompts yield reliable and high-quality responses is a good place to start.12 It is also worth guarding against data leaks via free chatbots (e.g., the free version of ChatGPT); in principle, the paid version does not store your data. Finally, Figure 6 illustrates how to use tools like ChatGPT with care.

Figure 6: Diagram for using ChatGPT with care13
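
These precautions can also be baked into day-to-day tooling. The sketch below shows, under our own assumptions, a prompt template that asks the model to cite sources and admit uncertainty, plus a rough redaction step that strips obvious personal data (emails, phone numbers) before text is sent to a public chatbot. The template wording and regular expressions are illustrative, not an official guideline.

# Minimal sketch of two precautions discussed above. The template wording
# and redaction patterns are our own illustrative assumptions.
import re

PROMPT_TEMPLATE = (
    "You are a careful assistant.\n"
    "Question: {question}\n"
    "Instructions: answer only if you are confident, cite your sources, "
    "and say 'I don't know' rather than guessing."
)

# Very rough redaction of obvious personal data before sending text to a
# public chatbot (email addresses and phone-like numbers only).
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d \-.]{7,}\d"), "[PHONE]"),
]

def redact(text: str) -> str:
    for pattern, placeholder in REDACTION_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

question = redact("Can you summarise the contract signed by jane.doe@example.com?")
print(PROMPT_TEMPLATE.format(question=question))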

How do you audit such models?

There are three complementary approaches to auditing an LLM, summarized in Figure 9.14

Organizational audit

An organizational audit can be carried out to check if the company developing the LLM is working responsibly, along with ensuring, for example, that its processes and management systems are compliant.   

We can carry out this type of audit for clients who are not LLM providers but who wish to further specialize existing models, to ensure that the models are used appropriately.

Audit of the foundation model

Auditing the foundation model is the current focus of scientific research. Such an audit would involve exploring the training dataset (which is inaccessible in practice), running test benches on recognized benchmarks and datasets (while facing the problem of contamination), and implementing adversarial strategies to detect the limits of the model. In more detail, there is a multitude of possible tests for evaluating the following aspects of the model:15

  • Responsibility: Understanding how risks materialize and finding the limits of the model (typically with adversarial strategies).
  • Performance: This involves using datasets, test benches, or Turing tests to assess the quality of the language, the skills and knowledge of the model, and the veracity of its statements (see Figures 7 and 8).
  • Robustness: The aim here is to assess the reliability of responses by means of calibration or stability measurements in the face of prompt engineering strategies.
  • Fairness: Several methods exist to identify and quantify bias (even without access to the dataset), but they remain limited. One example is counting biased word associations (man = survival, woman = pretty); see the sketch after Figure 8.
  • Frugality: Some inference measurements can be made to estimate the environmental impact of the model, but they are also limited without access to supplier infrastructures.
Figure 7: Performance of GPT-4 on TruthfulQA1
Figure 8: Performance of GPT-4 on human examinations1
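
To illustrate the word-association method mentioned in the fairness bullet, here is a toy sketch that counts how often career-related versus appearance-related words appear in completions of male and female prompts. The lexicons and example completions are invented for illustration; a real audit would use model-generated text and much larger curated word lists.

# Toy illustration of the word-association bias check mentioned above.
# Lexicons and example completions are invented; a real audit would use
# model-generated completions and curated word lists.
from collections import Counter

CAREER_WORDS = {"engineer", "leader", "expert", "survival"}
APPEARANCE_WORDS = {"pretty", "beautiful", "gentle", "delicate"}

completions = {
    "The man worked as": ["an engineer and a leader", "an expert in survival"],
    "The woman worked as": ["a pretty assistant", "a gentle and delicate nurse"],
}

def association_counts(texts):
    """Count how often each lexicon appears in a list of completions."""
    words = Counter(w.strip(".,").lower() for t in texts for w in t.split())
    return {
        "career": sum(words[w] for w in CAREER_WORDS),
        "appearance": sum(words[w] for w in APPEARANCE_WORDS),
    }

for prompt, texts in completions.items():
    print(prompt, association_counts(texts))

A large asymmetry between the two prompts in such counts is one (limited) signal of bias that can be collected without access to the training data.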

Theoretically, an LLM can be assessed on five of the eight dimensions of Trustworthy AI defined by Quantmetry.16 On the explainability dimension, the previously mentioned approach of having the chatbot cite its sources addresses the problem to a certain degree.

Use case audit

Quantmetry and Capgemini Invent are currently working together to define a framework that enables our clients to audit their AI systems based on LLMs. The primary aim of this audit is to check that the impact of the system on the user is controlled. To do this, a series of tests check compliance with regulations and the customer’s needs. We are currently developing methods for diagnosing the long-term social and environmental impact of their use within a company. Finally, we will create systems that can assess risks and biases, as well as operational, managerial, and feedback processes. The methods used are often inspired by, but adapted from, those used to audit the foundation model.  

Figure 9: Three approaches to auditing an LLM3, 14

How can Capgemini Invent and Quantmetry help you capitalize on LLMs?

Amidst the media excitement surrounding the widespread adoption of ChatGPT, harnessing the full potential of Generative AI and LLMs while mitigating risks lies at the heart of an increasing number of our clients’ strategic agendas. Our clients must move quickly along a complex and risky path, and the direct connection between the technology and end-users makes any errors immediately visible – with direct impacts on user engagement and brand reputation.  

Drawing upon our experience in facilitating major transformations and our specific expertise in artificial intelligence, our ambition is to support our clients at every stage of their journey, from awareness to development and scalable deployment of measured-value use cases. Beyond our role in defining overall strategy and designing and implementing use cases, we also offer our clients the opportunity to benefit from our expertise in Trustworthy AI. We assist them in understanding, measuring, and mitigating the risks associated with this technology – ensuring safety and compliance with European regulations.  

In this regard, our teams are currently working on specific auditing methods categorized by use cases, drawing inspiration from the academic community’s model of auditing methods. We are committed to advancing concrete solutions in this field.  

Authors

Hadrien Strichard

Data Scientist Intern at Capgemini Invent
Hadrien joined Capgemini Invent for his gap year internship in the “Data Science for Business” master’s program (X – HEC). His taste for literature and language led him to make LLMs the main focus of his internship. More specifically, he wants to help make these AIs more ethical and secure.

Alex Marandon

Vice President & Global Head of Generative AI Accelerator, Capgemini Invent
Alex brings over 20 years of experience in the tech and data space. He started his career as a CTO in startups, later leading data science and engineering in the travel sector. Eight years ago, he joined Capgemini Invent, where he has been at the forefront of driving digital innovation and transformation for his clients. He has a strong track record in designing large-scale data ecosystems, especially in the industrial sector. In his current role, Alex crafts Gen AI go-to-market strategies, develops assets, upskills teams, and assists clients in scaling AI and Gen AI solutions from proof of concept to value generation.

Hao Li

Data Scientist Manager at Capgemini Invent
Hao is a Lead Data Scientist and the reference point for NLP topics, covering strategy, acculturation, methodology, business development, R&D, and training on Generative AI. He leads innovative solutions that combine Generative AI, traditional AI, and data.