The Rise of AI Tools in Evaluation – Challenges and Opportunities


By Martha Bicket (September 2024)

The past few years have seen an unprecedented rise in the capability and availability of AI. What new opportunities and advantages might AI tools be able to offer evaluators and decision-makers? What are the potential challenges and risks?

This blog summarises key points discussed by delegates in the session “How might the rise of AI tools affect future approaches to evaluation?” as part of the CECAN Conference held at the Royal Academy of Engineering on 4 June 2024.


Opportunities

One of the most anticipated advantages of AI in evaluation is the promise of greater speed and efficiency, particularly in data analysis and reporting.

AI lends itself well to certain analytical tasks, such as quantitative analysis and text analysis, and can process large amounts of data far more quickly than traditional methods. It may even uncover insights that would otherwise have been overlooked.

In terms of reporting, AI may save evaluators time and effort by assisting with note-taking, simple writing tasks, summarising and editing.
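As a concrete illustration of the kind of task delegates had in mind, the short sketch below shows one way an evaluator might draft a summary of open-ended survey responses using a large language model. This is a minimal sketch only, assuming access to OpenAI's Python client (pip install openai) and an API key in the OPENAI_API_KEY environment variable; the model name, prompt and example responses are purely illustrative, not a recommendation.

# A minimal, illustrative sketch: drafting a summary of qualitative
# evaluation data with a large language model. Assumes the OpenAI
# Python client and an API key in OPENAI_API_KEY; the model, prompt
# and data below are hypothetical examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical open-ended survey responses from an evaluation.
responses = [
    "The new service cut our waiting times, but staff seemed stretched.",
    "Hard to access outside office hours; otherwise a clear improvement.",
    "No noticeable change for rural users like us.",
]

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": (
            "You are assisting an evaluator. Summarise the main themes "
            "in these survey responses, noting positives and negatives."
        )},
        {"role": "user", "content": "\n".join(responses)},
    ],
)

# The output is a draft for human review, not a finished result.
print(completion.choices[0].message.content)

Any draft produced this way would still need to be checked by a skilled evaluator, a point returned to under 'Challenges' below.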

In complex and rapidly developing policy areas, where evaluation results are needed yesterday to understand changes and inform decision-making, the prospect of accelerating the speed at which evaluation can provide results is attractive and has the potential to be highly impactful.


Challenges

“Bias, bias, bias, ethics” (session participant)

However, any gains will be short-lived if outputs are unreliable, or if the measures needed to manage reliability issues and verify outputs are too time-consuming. AI is liable to ‘hallucinate’, that is, to generate inaccurate or nonsensical information. Inaccuracies and biases may arise from gaps or biases in training data, or from incorrect assumptions made by the model.

Managing bias in AI and its training data is a particular concern. How do we ensure that AI is trained on a good enough understanding of the real world, avoiding bias as far as possible? And how do we evaluate the quality of an AI model and its outputs?

Other concerns include how to protect sensitive data and maintain confidentiality when AI tools are hosted externally, as is often the case, and how to deal with service unreliability, such as loss of connection and unexpected downtime, especially as systems become more heavily integrated with, and reliant on, AI.


Practical considerations and looking ahead

“It is not a question of whether an industry and its professions will be affected. It is a question of how.” (Nielsen, 2023)

Delegates made several suggestions for how to approach integrating AI into evaluation practice. They stressed the importance of a gradual, considered approach, emphasising the need for proper training and organisational guidance to support AI uptake and good practice in the evaluation community. Training needs to cover the limitations of AI and approaches for assessing models and validating outputs. Delegates also highlighted the importance of human expertise: skilled evaluators can guide model development and validate outputs, which may help to improve the legitimacy of AI use in evaluation.

Looking ahead, the evaluation community faces questions like: ‘how can we ensure AI tools are trained on representative data with minimal bias?’ and ‘what are the implications of using external AI tools when dealing with sensitive personal data?’

The rise of AI offers exciting opportunities for evaluation. However, it also presents practical challenges, for example around reliability, bias and data protection, and demands a thoughtful, ethical approach to implementation. As the field of AI continues to evolve, the evaluation community must stay informed about developments and actively work to shape best practice on AI use in evaluation.


The session “How might the rise of AI tools affect future approaches to evaluation?” at the 2024 CECAN Conference was led by Brian Castellani (University of Durham) and Martha Bicket (University of Surrey). This summary of proceedings was written by Martha Bicket with help from Claude.ai. The text has been checked for accuracy; any remaining errors are Martha’s, not Claude’s.


References and further reading

Head, C. B., Jasper, P., McConnachie, M., Raftree, L., & Higdon, G. (2023). Large language model applications for evaluation: Opportunities and ethical implications. New Directions for Evaluation, 2023, 33–46. https://doi.org/10.1002/ev.20556

Nielsen, S. B. (2023). Disrupting evaluation? Emerging technologies and their implications for the evaluation industry. New Directions for Evaluation, 2023, 47–57. https://doi.org/10.1002/ev.20558

Wirjo, A., Calizo, S. Jr., Vasquez, G., & Andres, E. (2022). Artificial intelligence in economic policymaking. APEC Policy Support Unit Policy Brief No. 52. https://www.apec.org/docs/default-source/publications/2022/11/artificial-intelligence-in-economic-policymaking/222_psu_artificial-intelligence-in-economic-policymaking.pdf

World Food Programme Evaluation (2023). 5 ways AI is about to transform evaluation. Medium. https://wfp-evaluation.medium.com/5-ways-ai-is-about-to-transform-evaluation-28e499194647
