Peters & Peters

AI and the SFO: What could possibly go wrong?

The digital age has permanently changed the way in which criminal investigations are conducted. As people and businesses make greater use of technology, so grows the volume of material the police and prosecution bodies receive when an investigation begins. This is an important issue for the Serious Fraud Office (SFO), which often receives upwards of 10 million documents in relation to a case. The mishandling of such volumes can have catastrophic consequences, as the collapse of the Serco trial showed.

Lisa Osofsky, Director of the SFO, stated during a speech at the Cambridge Symposium that one of the “biggest challenges” the SFO faces is disclosure. In a move to free up staff to focus on analysis and decision-making, Ms Osofsky proposed making greater use of technology in disclosure processes, including artificial intelligence (AI) and machine learning. This was foreshadowed by the SFO’s response to Parliament’s questions in September 2021, in which the organisation confirmed it “intends to invest in such technologies in the next financial year”. That time has seemingly come.

Although this is an exciting step forward for the SFO, an organisation often criticised as slow to modernise, some risks remain unaddressed. The drive towards speed and efficiency cannot overshadow the need to comply with current legislative disclosure obligations, especially the requirement to explain how relevant documents have been identified. Staff must also be well trained and overseen by those competent to explain the technology’s use. The question must also be asked: if an error occurs – which it will – and another trial collapses, who will be to blame? And how would the organisation ‘learn’ from, say, a glitch in the software? These questions are considered below.

Efficiency benefits

The most obvious benefit of using AI in disclosure is efficiency, reducing not only the time spent on review but also the number of staff required, and the consequent costs. In a few minutes, the technology can strip out duplicate documents and piece together email chains in chronological order from the ingested data. When keyword searches are run, the technology will return every document containing those words. Some will even give each document a score from 0 to 100; the higher the score, the more likely it is relevant. Usefully, the technology can also use document metadata to create timelines or intelligence maps, bringing the data alive. The SFO has made use of these features since 2018, but Ms Osofsky is proposing to go further, and this is where the risks lie.
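For readers curious about the mechanics, the three features described above – deduplication, keyword search and relevance scoring – can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the SFO’s actual tooling; the documents, keywords and scoring method are invented for the example.

```python
import hashlib

# Hypothetical ingested material.
documents = [
    {"id": 1, "text": "Invoice approved. Payment to agent confirmed."},
    {"id": 2, "text": "Invoice approved. Payment to agent confirmed."},  # exact duplicate
    {"id": 3, "text": "Board minutes: routine matters only."},
]
keywords = ["payment", "agent", "invoice"]

# 1. Strip out exact duplicates using a content hash.
seen, unique_docs = set(), []
for doc in documents:
    digest = hashlib.sha256(doc["text"].encode()).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique_docs.append(doc)

# 2. Keyword search: return every document containing any search term.
hits = [d for d in unique_docs
        if any(k in d["text"].lower() for k in keywords)]

# 3. A crude 0-100 relevance score: the proportion of keywords present.
#    Real review platforms use far more sophisticated models.
for d in unique_docs:
    matched = sum(k in d["text"].lower() for k in keywords)
    d["score"] = round(100 * matched / len(keywords))
```

Even this toy version shows why the approach is fast: each step is a mechanical, repeatable calculation over the whole dataset, which is exactly the kind of “unambiguous calculation” the Attorney General’s guidance contemplates.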

Legal obligations

Were machine learning to be introduced to disclosure, the case team would define initial criteria to tell the technology what to look for: for example, documents containing evidence of bribery or money laundering. Once results are returned, the case team would tell the technology which documents are relevant, and it would “learn” from this. The AI then begins to make its own decisions, based on what it has learnt. This is where AI and the SFO’s legislative disclosure obligations butt heads.
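The feedback loop described above can be illustrated with a minimal sketch. Again, this is an invented example, not any real review platform: the seed terms, the labelled documents and the word-counting “model” are all hypothetical, chosen only to show how reviewer decisions reshape what the technology treats as relevant.

```python
from collections import Counter

# Step 1: hypothetical seed criteria set by the case team.
seed_terms = {"bribe", "commission", "facilitation"}

def looks_relevant(text):
    return any(t in text.lower() for t in seed_terms)

# Step 2: the seed search surfaces candidates; reviewers label each
# one relevant (True) or irrelevant (False).
reviewed = [
    ("Commission paid to the agent for the contract award", True),
    ("Facilitation payment discussed with the intermediary", True),
    ("Commission structure for the sales team bonus scheme", False),
]
assert all(looks_relevant(text) for text, _ in reviewed)

# Step 3: "learning" - count how often each word appears in relevant
# versus irrelevant documents, and weight words accordingly.
relevant_words, irrelevant_words = Counter(), Counter()
for text, label in reviewed:
    target = relevant_words if label else irrelevant_words
    target.update(text.lower().split())

def score(text):
    # Positive means the document resembles the relevant material
    # more than the irrelevant material, per the reviewers' labels.
    return sum(relevant_words[w] - irrelevant_words[w]
               for w in text.lower().split())

# Step 4: the model makes its own call on an unseen document -
# one containing none of the original seed terms.
new_doc = "Payment to the intermediary for the contract"
decision = score(new_doc) > 0
```

Note what has happened: the unseen document contains none of the case team’s seed terms, yet the model flags it because of the reviewers’ earlier labels. In a toy example the chain of reasoning is inspectable; in a genuine machine-learning system it typically is not, which is the “black box” problem discussed below.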

As the Attorney General’s Guidance on Disclosure states, in cases involving large quantities of data, the “digital strategy must be set out in an IMD (investigation management document) and subsequently a DMD (disclosure management document). This should include the details of any sampling techniques used (including key word searches) and how the material identified as a result was examined”. The guidance also states that “technology that takes the form of search tools which use unambiguous calculations to perform problem-solving operations, such as algorithms or predictive coding, are an acceptable method of examining and reviewing material for disclosure purposes.”

By its nature, machine learning uses ambiguous, unexplainable algorithms. This is the “black box” problem, also commonly discussed in relation to driverless cars. The SFO will be able to explain its “input” into the software and the “output” from it, but the part in between – where the AI has learnt and made decisions – will be unknown.

The SFO touched on this point in its response to Parliament’s questions, acknowledging that in court “it will be necessary to explain how technologies were applied in a way that led to the specified outcome”. By saying this, however, the SFO is admitting that it can explain an outcome only based on its input; decisions made by the AI itself will remain unexplained.

Given the SFO will not be able to explain exactly how relevant documents have been identified, it will be unable to do the same for those the AI decided were irrelevant. This is an issue, both for the court and the defence, because how can the sufficiency or reasonableness of search terms be explored if the prosecution cannot explain the full string of search terms applied? Arguably, the prosecution would not meet the AG’s disclosure standards if they cannot explain why a certain search was run, even if they know the original input.

Of potential help to the SFO is a caveat in the AG’s guidance in relation to search term recording: “where it is impracticable to record each word or term it will usually be sufficient to record each broad category of search”. The question is whether the SFO’s input to the AI technology alone will amount to a broad category of search.

Ms Osofsky may see the current legislative disclosure framework as outdated but, if and until we see reform, the SFO will need to ensure its use of AI complies with its overriding disclosure obligations.


Skilled staff and training

To be effective, technology must be operated by experts. Given that personal liberty is so often at stake in an SFO trial, the consequences of exculpatory documents being missed make expert oversight crucial. It is questionable whether the SFO has enough staff with the required skillsets. As it acknowledged to Parliament, “people with relevant skillsets can command extremely high salaries in the private sector, which will limit the public sector’s ability to hire the required number of staff, as the application of new technologies becomes more commonplace”; it is clear that hiring and retaining skilled staff is a struggle.

Given the length of SFO investigations, rarely shorter than a few years, it is important that the organisation retains the staff responsible for the input to, and application of, AI in a case’s disclosure exercise. A high turnover of those who can account for how and why the technology was applied will undermine its effectiveness; written records are always subject to interpretation and will not capture the entirety of the decisions made. Further, if an issue were raised in court, the SFO would want someone who had oversight of the entire case to account for the disclosure approach.

Perhaps the SFO will prioritise technology training for its staff, but the reality is that the organisation cannot compete with private sector salaries and attrition rates remain high.

Further, as the SFO admitted to Parliament, training is a necessary safeguard to ensure the “proportionate and lawful” application of technology. This includes guarding against the potential for AI to develop prejudicial algorithms, such as that reported by Amazon. Although the SFO is clearly aware of this risk, it will be important for it to demonstrate how its use of AI guards against the danger, and training will be crucial to that.


What if the technology makes a mistake?

Let’s assume the defence claim a relevant document has been missed – one that, when found, is key evidence in support of their case. The SFO’s AI has tagged it as non-relevant. The document is clearly relevant and contains keywords that should have been picked up. What then?

The SFO will be able to show its “input”, but the “black box” will prevent the court from understanding how the document was missed. This is different from a human error. If, for example, a particular document reviewer mis-tagged a document then, depending on the severity of the fault, it could be proposed that the entirety of their work be re-reviewed. In this scenario, by contrast, how can the court trust that the AI has correctly applied the SFO’s input and the section 3 CPIA 1996 test of relevance?

Indeed, the SFO told Parliament of its extensive quality assurance processes. For example: “sample testing of initial results; updating the programme with results from the sampling then subsequently re-testing; the use of multi-disciplinary teams with the right skillsets; recording processes and protocol; and early communication with the other side to the litigation to agree protocol”. This will go a long way towards ensuring errors do not occur, but risks remain.

If the court held that it could not rely on the disclosure exercise performed and the trial collapses, who is to blame? If this is down to a glitch in the software that had gone unnoticed, might the software provider or the skilled staff overseeing its use throughout the case be responsible? This would, of course, be a subjective matter.

Further, how would the SFO learn from such a glitch? In the aftermath of Serco, the organisation appraised how and why the disclosure effort went wrong, enabling it to make identifiable changes to prevent the issue from recurring. Bouncing back from a software glitch, however, would be a different exercise.

Treading a careful path

The SFO’s dive into AI should accelerate its investigations, which will reduce the time those accused spend awaiting a potential charge. This is undoubtedly positive, but the risks cannot be ignored. As the SFO grapples with the technology, our lawyers are equipped to consider its use in disclosure and make representations when needed. If the SFO wants to avoid another Serco, it must carefully consider the use of AI in disclosure and be prepared to answer tricky questions.