Using generative AI
Last updated: 8 September 2023
Chatbots that use generative Artificial Intelligence (AI) have exploded in public popularity since the release of ChatGPT by OpenAI. These services use large language models (LLMs), a form of machine learning, to train a chatbot to respond to questions in a natural, human-like manner.
Many of the current popular chatbots use the GPT-3 language model (Generative Pre-trained Transformer, released in 2020), which is specifically designed to generate human-like text for chatbot applications. This, and more recent machine learning and artificial intelligence, is becoming baked into services we use on a daily basis. For example:
- ChatGPT
- Google Bard
- StableDiffusion
- GitHub Copilot
- Amazon Lex
- Bing Search AI
The capability of LLMs and generative AI is predicated on storing and feeding models with large amounts of real-world data, which might include the questions asked of a service since it went live. As the functionality of the models is largely opaque, and the confidentiality of submitted data cannot always be assured, use of these services potentially introduces risks to the governance of Home Office data, information and processes.
In addition, some of these services can generate or examine code to provide suggested solutions. While there is a potential benefit to AI-assisted development, care needs to be taken over what code is provided to the service, and over how the assistance is quality assured.
Solution
With this in mind, there are some guidelines that engineers in the Home Office must follow when dealing with, asking questions of, or sending data to any of these services. This should be considered alongside Cabinet Office guidance on using generative AI.
- You MUST NEVER enter sensitive data, information, processes or secrets into a third-party hosted generative AI service
- Do not ask questions related to unreleased policy
- Keep in mind that these sites could host their data anywhere
- Never put in any code unless that code is already open sourced
- Never send code that is related to anti-forgery or security related processes
When using generative AI services to generate code:
- Take care to not expose internal business processes, technology, configuration or secrets in prompts - if necessary, abstract problems sufficiently to avoid specifics needing to be used
- Tag generated code blocks with a consistent tagging scheme (for example, a comment marking the block as AI-generated and noting the tool used) so that reviewers can identify them - see the sketch after this list
- Be aware of the potential for data poisoning. Do not explicitly trust generated code - it must be subject to regular peer review as with any other written code
- You are responsible for all code you commit, regardless of how it has been created or generated
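As an illustration of one possible tagging scheme (a hypothetical convention for this page, not a mandated Home Office standard), a comment header could record that a block was AI-generated, which tool produced it, and who reviewed it:

```python
# --- AI-GENERATED CODE: BEGIN ---
# Tool: GitHub Copilot (hypothetical example)
# Prompt summary: "parse an ISO 8601 date string"
# Reviewed by: <engineer name>, <date>
from datetime import datetime

def parse_iso_date(value: str) -> datetime:
    """Parse an ISO 8601 formatted date string into a datetime object."""
    return datetime.fromisoformat(value)
# --- AI-GENERATED CODE: END ---
```

Marking generated blocks in this way makes it easier to apply the peer review expected of all committed code, and keeps a record of which code originated from an AI assistant.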