Benchmarks To Evaluate AI Models Are Not Sufficient: Report


Despite the growing demand for AI safety and accountability, the tests and benchmarks currently used to evaluate AI systems may not be sufficient. Generative AI models in particular, those that analyze and process images, music, text, and video, tend to behave unpredictably and make mistakes.

According to a study by the Ada Lovelace Institute, a UK-based AI research organization, existing evaluations, while useful, are non-exhaustive and insufficient to predict how models will behave in the real world; they can also be easily gamed.

The specific version of the model being evaluated also matters. Addressing ways to mitigate unpredictable behaviour, the report warns that small changes to AI applications built on a foundation model can alter the model's behaviour and may override built-in safety features.

The report states: “Assessing the safety of a model requires considering the wider context, including the users, the design of its interface, what tools the model might have access to, or how the model will affect the environment it operates within.”

It adds: “Current evaluations appear to be designed to meet corporate needs or academic curiosity rather than public or regulatory interests.”

“To improve evaluation methods, it will be necessary to develop context-specific evaluations of AI systems that respond to the needs of specific regulators,” the report notes, allowing regulators to assess and investigate the safety of foundation model applications efficiently.

Proposing ways to increase the impact of evaluations “for scrutinizing foundation models and their impacts,” the report suggests that regulators and policymakers “must clearly articulate the insights they seek from evaluations,” while remaining transparent about existing limitations and the potential for future advances. They should also “keep the details of some evaluation and related datasets confidential.” Mandating greater public participation in the development of evaluations, including in determining their consequences, could further help reduce unpredictability in models. In addition, the government should develop and implement supportive measures to create an ecosystem for third-party evaluations, “including certification schemes and initiatives to ensure assessors have the necessary access to the model, dataset and organizational information to conduct an evaluation.”


DIGITAL TERMINAL
digitalterminal.in