Traditionally, evaluation has focused on understanding whether a program is making progress against pre-determined indicators. In this context, the quality of the evaluation is often measured in part by the “rigor” of the methods and scientific inquiry. Experimental and quasi-experimental methods are highly valued and seen as the most rigorous designs, even when they may hamper a program’s ability to adapt and respond to its environment.
Evaluations of complex systems change strategies or adaptive, innovative programs cannot use this same yardstick to measure quality. An experimental design is hard to apply when a strategy’s success is not fully defined upfront and depends on being responsive to the environment. As recognition of the need for such programs grows, and with it the number of complex programs, so does the need for a new yardstick. To meet this need, we proposed a new definition of rigor at the 2015 American Evaluation Association annual conference, one that broadens how we think about quality in evaluation to encompass what is critical when the target of the evaluation is complex, adaptive, and emergent.
We propose that rigor be redefined to include a balance between four criteria:
- Quality of the Thinking: The extent to which the evaluation, in both design and implementation, engages in deep analysis that focuses on patterns, themes, and values (drawing on systems thinking); seeks alternative explanations and interpretations; is grounded in the research literature; and looks for outliers that offer different perspectives.
- Credibility and Legitimacy of the Claims: The extent to which the data is trustworthy, including confidence in the findings; the transferability of findings to other contexts; the consistency and repeatability of the findings; and the extent to which the findings are shaped by respondents rather than by evaluator bias, motivation, or interests.
- Cultural Responsiveness and Context: The extent to which the evaluation questions, methods, and analysis respect and reflect the stakeholders’ values and context, their definitions of success, their experiences and perceptions, and their insights about what is happening.
- Quality and Value of the Learning Process: The extent to which the learning process engages the people who most need the information, in a way that allows for reflection, dialogue, testing assumptions, and asking new questions, directly contributing to making decisions that help improve the process and outcomes.
The concept of balancing the four criteria is at the heart of this redefinition of rigor. Regardless of its other strengths, an evaluation of a complex, adaptive program that fails to take systems thinking into account will not be responsive to that program’s needs. Similarly, an evaluation that fails to provide timely information for decision-making lacks rigor even if the quality of the thinking and the legitimacy of the claims are high.
There are many implications of this redefinition:
- From an evaluator’s point of view, it provides a new checklist of considerations when designing and implementing an evaluation. It suggests that specific, upfront work will be needed to understand the cultural context, the potential users of the evaluation and the decisions they need to make, and the level of complexity in the environment and in the program itself. At the same time, it preserves the traditional definition’s long-standing focus on building on lessons from previous research and seeking consistent, repeatable findings. Ultimately, it asks the evaluator to balance the desire for the highest-quality methods and design with the need for the evaluation to have value for the end user and to be contextually appropriate.
- From an evaluation purchaser’s point of view, it provides criteria for considering the value of potential evaluators, evaluation plans, and reports. It can be a way of articulating upfront expectations or comparing the quality of different approaches to an evaluation.
- From a programmatic point of view, it provides a yardstick for assessing not only the evaluators themselves but also the usefulness and value of their evaluation results. It can help program leaders and staff gain confidence in the evaluation findings, or give them a way to articulate their concerns as they review the results.
Across evaluators, evaluation purchasers, and users of evaluation, this redefinition of rigor provides a new way of articulating expectations for evaluation and elevating the quality and value of evaluations. It is our hope that this balanced approach helps evaluators, evaluation purchasers, and evaluation users share ownership of the concept of rigor and find the right balance of the criteria for their evaluations.