The UK Safety Institute, the UK's recently established AI safety body, has released a toolset designed to “strengthen the safety of AI” by making it easier for industry, research organizations and academia to develop AI evaluations.
The toolset, called Inspect, is available under an open source license, specifically the MIT license, and is designed to assess certain capabilities of AI models, such as a model's core knowledge and ability to reason, and to generate a score based on the results.
In a press release announcing the news on Friday, the Safety Institute claimed that Inspect is the first AI safety testing platform spearheaded by a state-backed body to be released for broader use.
A look at Inspect's dashboard.
“Successful collaboration on AI safety testing means having a shared, accessible approach to evaluations, and we hope Inspect can be a building block of that,” Safety Institute chair Ian Hogarth said in a statement. “We hope to see the global AI community using Inspect not only to carry out safety tests on their own models, but also to adapt and build on the open source platform so that high-quality evaluations can be produced across the board.”
As I've written before, AI benchmarking is hard, not least because today's most sophisticated AI models are black boxes whose infrastructure, training data and other key details are kept secret by the companies that create them. So how does Inspect address this challenge? Primarily by being extensible and adaptable to new testing methodologies.
Inspect consists of three basic components: datasets, solvers and scorers. Datasets provide the samples for evaluation tests. Solvers do the work of carrying out the tests. Scorers then evaluate the solvers' work and aggregate the test scores into metrics.
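To make those pieces concrete, here is a minimal sketch of what an evaluation built from a dataset, a solver and a scorer might look like in Python, based on the patterns in Inspect's public documentation. The task name and sample questions are illustrative assumptions, not code from the Safety Institute, and parameter names may differ between versions of the package.

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate
from inspect_ai.scorer import match


@task
def basic_knowledge():
    # Dataset: samples pairing an input prompt with a target answer.
    # (These two questions are invented for illustration.)
    samples = [
        Sample(input="What is the capital of France?", target="Paris"),
        Sample(input="What is 12 * 12?", target="144"),
    ]
    return Task(
        dataset=samples,
        # Solver: queries the model under test with each sample's input.
        solver=generate(),
        # Scorer: compares the model's output to the target and feeds
        # the results into aggregate metrics such as accuracy.
        scorer=match(),
    )
```

Per the documentation, a task defined this way can then be run against a chosen model with Inspect's `inspect eval` command, with the framework handling the model calls and the aggregation of scores into metrics.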
Inspect's built-in components can be extended via third-party packages written in Python.
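As a rough sketch of what such an extension might look like, the snippet below defines a hypothetical custom scorer. The `keyword_scorer` name and its substring-matching logic are invented for illustration, while the `@scorer` decorator and `Score` type follow the extension pattern described in Inspect's documentation.

```python
from inspect_ai.scorer import (
    CORRECT,
    INCORRECT,
    Score,
    Target,
    accuracy,
    scorer,
)
from inspect_ai.solver import TaskState


@scorer(metrics=[accuracy()])
def keyword_scorer():
    # Hypothetical scorer: marks an answer correct if the target text
    # appears anywhere in the model's completion (case-insensitive).
    async def score(state: TaskState, target: Target) -> Score:
        answer = state.output.completion
        found = target.text.lower() in answer.lower()
        return Score(value=CORRECT if found else INCORRECT, answer=answer)

    return score
```

A third-party package that ships components like this can be installed alongside Inspect and referenced from task definitions such as the one above.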
In a post on social media, Clément Delangue, CEO of AI startup Hugging Face, floated the idea of integrating Inspect with Hugging Face's model library or creating a public leaderboard with the results of the toolset's evaluations.
Inspect's release comes after NIST, the US government standards agency, launched NIST GenAI, a program to assess a range of generative AI technologies, including text-generating and image-generating AI. NIST GenAI plans to release benchmarks, help create content authenticity detection systems and encourage the development of software that spots fake or misleading AI-generated information.
Following commitments announced at the UK's AI Safety Summit at Bletchley Park last November, the US and UK announced a partnership in April to jointly develop advanced AI model testing. As part of that collaboration, the US plans to launch its own AI safety institute, which will be broadly charged with assessing risks from AI and generative AI.