5.1 Selecting the appropriate evaluation framework
5.2 Selecting the appropriate evaluation design
5.3 Collecting and analyzing data
This section highlights the importance of using appropriate methods for an Equity-focused evaluation, to ensure that the equity dimensions of the intervention will be identified and analyzed during the evaluation process.
5.1 Selecting the appropriate evaluation framework
Below are two frameworks, and a number of designs and tools, which can be taken into consideration when planning an Equity-focused evaluation. Many other frameworks, designs and tools relevant and suitable for Equity-focused evaluations exist. The final decision on what framework, design and tools should be used has to be based on the purpose and scope of the evaluation, the evaluation questions, and also the nature and the context of the intervention to be evaluated.
A. Theory-based Equity-focused evaluation
While the programme’s theory of change is an important component of most programme evaluations, a well-articulated theory of change is particularly critical for Equity-focused evaluations. Equity interventions achieve their objectives through the promotion of behavioral changes that cannot be defined and assessed through conventional pre-test/post-test comparison group designs comparing a set of indicators before and after the intervention. The process of implementation, and the context within which implementation takes place, have a significant impact on the accessibility of the health, education and child protection public systems for worst-off groups. It is also important to understand how effectively public policies and service delivery systems have been able to adapt to the special challenges of reaching worst-off groups. For all of these reasons it is important to base the evaluation on a theory of change that can describe and assess the complex reality within which Equity-focused interventions operate.
A well-articulated programme theory of change can:
- Define the nature of the problem the policy or programme is intended to address.
- Incorporate lessons from the literature and experiences with similar programmes.
- Identify the causes of the problem being addressed, and the proposed solutions.
- Explain why the programme is needed.
- Identify the intended outcomes and impacts.
- Present a step-by-step description of how outcomes are to be achieved.
- Define the key assumptions on which the programme design is based.
- Identify the key hypotheses to be tested.
- Identify the contextual factors likely to affect implementation and outcomes.
- Identify the main risks and reasons why the programme may not achieve its objectives.
The programme theory is also a valuable tool in the interpretation of the evaluation findings. If intended outcomes are not achieved, the programme theory can help trace back through the steps of the results chain to identify where actual implementation experience deviated from the original plan. It also provides a framework for identifying unanticipated outcomes (both positive and negative). If implementation experience conforms reasonably closely to the design, and if outcomes are achieved as planned, this provides prima facie evidence to attribute the changes to the results of the programme. However, it is possible that there are other plausible explanations for the changes, so a well-designed programme theory should be able to define and test rival hypotheses. The theory must be defined precisely enough that it can be “disproved”. One of the major criticisms of many programme theories is that they are stated in such a general and vague way that they can never be proved wrong. For a theory to be open to disproof, it requires:
- a time-line over which outcomes and impacts are to be achieved;
- measurable indicators of outputs, outcomes and impacts;
- measurable indicators of contextual factors, and a clear definition of how their effect on policy implementation and outcomes can be analyzed.
Ideally the programme’s theory of change will be developed during the policy design. However, often no theory of change was developed, so the evaluation team must work with stakeholders to “reconstruct” the implicit theory on which the policy is based (see section 7 on Real World evaluation). Ideally the evaluation team will be involved at a sufficiently early stage of the design to be able to assist in the development of the programme’s theory of change, so as to ensure that it provides sufficient detail for the evaluation.
Theories of Change and Theories of Action
According to Funnell and Rogers (2011)1, program theories can be articulated through two components:
· A theory of change and
· A theory of action.
A theory of change:
· Explains how program outcomes are expected to contribute to the situation (problem) that gave rise to the program
· Identifies baseline data that should be collected to measure change in outcomes
· Situates the program within a broader context
· Defines an outcome chain (also called a results chain) which should include outcomes beyond the direct influence of the program but that are critical to success.
A theory of action:
· A detailed statement about each of the outcomes in the outcome (results) chain indicating
o Who is to be affected
o What choices were made about the design of the service/activity
· Assumptions about how the program needs to operate [numbers of staff, quality of service delivery etc]
· Assumptions about external factors that could affect the program
· What the program has chosen to do about external factors
Basic components of a programme theory of change
Programme theories of change are often represented graphically through a logic model. The model can be used either to describe a stand-alone programme targeted at worst-off groups (for example female sexual partners of injecting drug users), or to describe equity-focused strategies that are integrated into a universal programme. An example of the latter would be a programme designed to increase overall school enrolment through separate toilets for boys and girls, renovated buildings, new school textbooks and teacher training programmes. A special scholarship programme and transport vouchers might be targeted specifically at girls from low-income households to provide a further incentive for them to enroll. In this case the evaluation would assess the overall impacts of the programme on school enrolment as well as the effectiveness of the scholarships and transport vouchers on increased enrolment for low-income girls. If resources permit the evaluation might use the programme theory as a framework to compare enrolment rates for low-income girls in schools that only offered the general improvement programmes with those that also included the targeted programmes. This would permit an analysis of the value-added of the targeted programmes.
The model includes two main components:
- The seven stages of the project cycle (design, inputs, implementation, outputs, outcomes, impact and sustainability) – defining the special equity-focused elements at each stage.
- The contextual factors (political, economic, institutional, natural environment and socio-cultural characteristics of the affected populations) that can affect the implementation and outcomes of equity-focused interventions.
There are a number of refinements that can be incorporated in the basic logic model that are important for the description and evaluation of equity-focused interventions:
- The contextual framework: analysis of the economic, political, socio-cultural, environmental, legal, institutional and other factors, that affect how programmes are implemented and how they achieve their outcomes. All of these factors can constrain the effective implementation of equity-focused interventions. In cases where there is little social or political support for the integration of worst-off groups, many of these factors can present major challenges. While contextual factors are often analyzed descriptively, it is also possible to incorporate these variables into the statistical analysis by converting them into dummy variables.
- Process analysis: examining how the programme is actually implemented, how this compares with the intended design, and how any deviations from the design affect the accessibility of the programme for different sectors of the target population.
- Results chain analysis (also called outcomes chain): a step-by-step explanation of how the programme is expected to operate and how it will achieve its objectives.
- Trajectory analysis: defining the time horizons over which different outcomes are expected to be achieved.
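The conversion of contextual factors into dummy variables, mentioned under the contextual framework above, can be sketched as follows. This is an illustration only: the field name `local_support` and its categories are hypothetical, and the first category in alphabetical order is arbitrarily taken as the reference category.

```python
def to_dummies(records, field):
    """Replace a categorical field with 0/1 dummy variables.

    The first category (alphabetically) is dropped and serves as the
    reference category, as is usual when dummies enter a regression.
    """
    categories = sorted({r[field] for r in records})
    reference, others = categories[0], categories[1:]
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field}
        for c in others:
            row[f"{field}_{c}"] = 1 if r[field] == c else 0
        encoded.append(row)
    return encoded, reference


# Hypothetical district-level rating of political support for the intervention.
districts = [
    {"district": "A", "local_support": "high"},
    {"district": "B", "local_support": "low"},
    {"district": "C", "local_support": "medium"},
]
encoded, reference = to_dummies(districts, "local_support")
for row in encoded:
    print(row)
```

The resulting 0/1 columns can then be included alongside the programme variables in whatever statistical model the evaluation uses.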
B. The bottleneck analysis framework
Bottleneck supply and demand analysis has been used successfully to evaluate service delivery systems, especially in health systems. It provides a framework for the description and analysis of the major factors affecting the access of worst-off groups to public services, and it has the potential to be an integrated tool that can identify the strengths and weaknesses of different service delivery systems. However, it is important to note that this framework has important limitations when evaluating interventions dealing with acts of commission rather than omission, notably in the field of child protection and violence against children and women.
The framework has four components:
Use of services by worst-off groups
Defining the worst-off groups to be targeted by the intervention. A first step is to identify the worst-off groups intended to benefit from the intervention. The groups can be defined geographically (for example, living in a particular district or in all rural areas) as well as by the nature of the inequity (gender, ethnicity, etc.).
Assessing the adequacy of service utilization by worst-off groups. The following measures should be combined, as appropriate, to assess effectiveness in delivering quality services to the target worst-off groups. Performance indicators include:
- The proportion of each worst-off group who utilize the service.
- The adequacy level for utilization of each service.2
- A comparison of the proportion of the total population utilizing the service with the proportion of the worst-off group who utilize it.
- A comparison of the adequacy of utilization by worst-off and by other groups.
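The coverage comparisons listed above reduce to simple proportions. The sketch below assumes that head-counts are available for the total population and for the worst-off group; the function name and the figures are invented for illustration. Adequacy of utilization is not computed here because its definition depends on the service concerned.

```python
def utilization_indicators(total_pop, total_users, worst_off_pop, worst_off_users):
    """Compare service coverage in the total population and in the worst-off group."""
    if total_pop <= 0 or worst_off_pop <= 0:
        raise ValueError("population sizes must be positive")
    overall = total_users / total_pop
    worst_off = worst_off_users / worst_off_pop
    return {
        "overall_coverage": overall,
        "worst_off_coverage": worst_off,
        # Positive gap: the worst-off group is reached less than the population at large.
        "coverage_gap": overall - worst_off,
        # Ratio below 1 means the worst-off group is under-served; None if nobody uses the service.
        "coverage_ratio": worst_off / overall if overall > 0 else None,
    }


# Illustrative figures only.
indicators = utilization_indicators(
    total_pop=10_000, total_users=6_000, worst_off_pop=2_000, worst_off_users=500
)
print(indicators)
```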
Assessing sustainability. Many interventions operate well whilst donor agencies are actively involved or whilst special programme funding is available, but the quality or volume of services often declines when these special incentives end. It is therefore important to continue to monitor programme operations over time in order to assess long-term sustainability.
Identifying different scenarios for access to worst-off populations. The indicators can be used to identify different scenarios, each of which has different policy and operational implications. For example:
- A programme reaching a high proportion of the total population but a low proportion of the worst-off groups. This indicates that there are some specific problems in reaching the worst-off groups.
- A programme reaching only a low proportion of both populations. This suggests that the overall programme performance needs to be improved before greater access to the worst-off groups can be expected.
- The adequacy of utilization by worst-off groups is lower than for other groups. This suggests that there are some specific delivery issues to be addressed.
- Only a small proportion of the worst-off group use the service but the adequacy of utilization is high for this group. This suggests the programme design can potentially benefit the worst-off group but that there are problems in ensuring access.
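The four scenarios above can be expressed as a small set of rules. The sketch below is illustrative: the cut-off values for “high” and “low” (`high=0.6`, `low=0.3`) are arbitrary assumptions that an evaluation team would need to set for its own context, and coverage and adequacy are assumed to be expressed as proportions between 0 and 1.

```python
def access_scenarios(overall_coverage, worst_off_coverage,
                     worst_off_adequacy, other_adequacy,
                     high=0.6, low=0.3):
    """Return the scenarios (from the list above) that match the indicators.

    `high` and `low` are assumed thresholds, not values taken from the text.
    """
    findings = []
    if overall_coverage >= high and worst_off_coverage <= low:
        findings.append("specific problems in reaching the worst-off groups")
    if overall_coverage <= low and worst_off_coverage <= low:
        findings.append("overall programme performance needs to be improved")
    if worst_off_adequacy < other_adequacy:
        findings.append("specific delivery issues for the worst-off groups")
    if worst_off_coverage <= low and worst_off_adequacy >= high:
        findings.append("design can benefit the worst-off groups but access is a problem")
    return findings


# Illustrative values: broad overall coverage, low coverage of the worst-off
# group, and equally adequate utilization among those who are reached.
findings = access_scenarios(0.7, 0.2, 0.8, 0.8)
print(findings)
```

More than one scenario can apply at once, which is why the function returns a list rather than a single label.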
Cost-effectiveness analysis. Cost will often be a critical factor when budgets are limited or when the equity interventions do not enjoy broad political support. Consequently the analysis of costs, and how they can be reduced, will often be a critical determinant of the success and sustainability of the intervention.
Supply side factors
The following supply-side factors are assessed:
- Budgets and resources such as staff, buildings, transport, school supplies.
- Overall efficiency of the service organization and delivery.
- Adequate targeting mechanisms. How well does the programme identify the worst-off groups? How adequate are the administrative and other mechanisms for reaching them?
- Culturally acceptable services. Are the services designed in a way that is acceptable to the worst-off groups? For example, many indigenous cultures do not accept the way that western medicine is delivered. Men may not allow their wives or daughters to visit health centres.
- Culturally sensitive staff. Are staff familiar with the characteristics of the worst-off groups and do they understand the special issues involved in working with these groups? Do they have a positive attitude to working with these groups? Are there staff members who speak the local languages?
- Do worst-off groups have “ownership” of the programme? Were they consulted on how it was designed? Are they involved in management, monitoring and evaluation?
Please note that supply-side issues will be different for special stand-alone programmes targeted exclusively at worst-off groups and for universal service delivery systems adapted to reach worst-off groups.
Demand side factors
The achievement of equity outcomes usually involves processes of behavioral change for different actors. Even when there is a demand for services, and when they are designed in a culturally appropriate way, there are a number of logistical and cultural factors affecting access:
- Distance to the service.
- Time, cost and availability of transport.
- Acceptability of the transport to worst-off groups and their being allowed to use it.
- Costs of services.
- Time constraints.
- Cultural constraints.
The accessibility of services to worst-off groups can be affected by a wide range of local, regional and national contextual factors including:
- Political factors. The attitude of different political groups to providing services to the worst-off (e.g. are worst-off groups considered a security threat; a nuisance; a financial burden; a potential base of political support; a moral obligation; etc.)
- Economic factors. The state of the local and national economy can affect the availability of resources. Also, when the local economy is weak, worst-off groups may not have the money to pay for services (including travel), whereas in a growing economy families may have more incentive to send their children (particularly girls) to school if they are more likely to find employment when they leave.
- Institutional and organizational factors. How well do different agencies work together to coordinate services?
- Legal and administrative. Do worst-off groups have all of the documents required to access services? Are they registered with the appropriate agencies? Are there legal constraints on providing services to, for example, families who do not own the title of the land they farm or on which they live? Can the ministry of education build schools on land without title?
- Environmental. How is programme delivery or sustainability affected by environmental factors such as soil erosion or salinization; flooding; deforestation; water contamination; air quality; or the proximity of urban waste?
5.2 Selecting the appropriate evaluation design
Irrespective of the size and nature of the intervention, an evaluation design which applies a mixed-method approach will usually be the most appropriate to generate an accurate and comprehensive picture of how equity is integrated into an intervention. Mixing qualitative and quantitative approaches, while ensuring the inclusion of different stakeholders (including the worst-off groups), will offer a wide variety of perspectives and a more reliable picture of reality.
A. Mixed-methods designs
Mixed-method designs combine the strengths of quantitative (QUANT) methods (permitting unbiased generalizations to the total population; precise estimates of the distribution of sample characteristics and breakdown into sub-groups; and testing for statistically significant differences between groups) with the ability of qualitative (QUAL) methods to describe in depth the lived-through experiences of individual subjects, groups or communities. QUAL methods can also examine complex relationships and explain how programmes and participants are affected by the context in which the programme operates.
These benefits are particularly important for Equity-focused evaluations where it is necessary to obtain QUANT estimates of the numbers and distribution of each type of inequity but where it is equally important to be able to conduct QUAL analysis to understand the lived-through experience of worst-off groups and the mechanisms and processes of exclusion to which they are subjected. QUAL analysis is also important to assess factors affecting demand for services and to observe the social, cultural and psychological barriers to participation.
One of the key strengths of mixed-methods is that the sample design permits the cases or small samples used for in-depth analysis to be selected in such a way that statistically representative generalizations can be made to the wider populations from which the cases are drawn. This is critical because typically when QUANT researchers commission case studies to illustrate and help understand the characteristics of the sample populations, little attention is given to how the cases are selected and how representative they are. Very often the case studies, because of their ability to dig more deeply, will uncover issues or weaknesses in service delivery (such as sexual harassment, lack of sensitivity to different ethnic groups, lack of respect shown to poorer and less educated groups, or corruption). When these findings are reported it is difficult to know how representative they are, and consequently it is easy for agencies to dismiss negative findings as not being typical. Mixed-method samples can ensure that the case studies are selected in a representative manner.
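One way to select case studies in a representative manner is to draw them at random from within the strata of the QUANT survey sample. The sketch below illustrates this under stated assumptions: the respondent records, the `group` field and the choice of two cases per stratum are hypothetical, and a fixed seed is used only so that the selection can be reproduced and audited.

```python
import random
from collections import defaultdict


def select_cases(respondents, stratum_key, per_stratum, seed=0):
    """Randomly draw case-study candidates from each stratum of a survey sample."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for r in respondents:
        strata[r[stratum_key]].append(r)
    selected = []
    for name in sorted(strata):
        members = strata[name]
        # Never ask for more cases than the stratum contains.
        selected.extend(rng.sample(members, min(per_stratum, len(members))))
    return selected


# Hypothetical survey sample with three strata of ten respondents each.
respondents = [
    {"id": f"{group}-{i}", "group": group}
    for group in ("rural poor", "urban poor", "other")
    for i in range(10)
]
cases = select_cases(respondents, "group", per_stratum=2)
print([c["id"] for c in cases])
```

Because every respondent in a stratum has a known chance of selection, findings from the cases can be related back to the survey population, which is much harder to argue when cases are chosen by convenience.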
However, it is important to highlight the fact that mixed-method designs involve much more than commissioning a few case studies or focus groups to complement a quantitative sample survey. It is an integrated evaluation approach, applying its own unique methods at each stage of the evaluation.
Mixed-methods for data collection combine QUANT methods such as surveys, aptitude and behavioral tests, and anthropometric measures with QUAL data collection methods such as observation, in-depth interviews and the analysis of artifacts3. QUAL methods can also be used for process analysis (observing how the project is actually implemented and how these processes affect the participation of different groups within the vulnerable population). The following are examples of how mixed-methods combine different QUANT and QUAL data collection methods:
- Combining QUANT questionnaires with QUAL in-depth follow-up interviews or focus groups.
- Combining QUANT observation methods with QUAL in-depth follow-up interviews.
- Combining QUANT unobtrusive measures with QUAL in-depth interviews or case studies.
In addition, mixed-methods can combine QUANT and QUAL data analysis methods in the following ways:
- Parallel QUANT and QUAL data analysis: QUAL and QUANT data are analyzed separately using conventional analysis methods.
- Conversion of QUAL data into a numerical format or vice versa.4
- Sequential analysis: QUANT analysis followed by QUAL analysis or vice versa.
- Multi-level analysis.
- Fully integrated mixed-method analysis.
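As an illustration of converting QUAL data into a numerical format, the sketch below counts how often coded themes appear across a set of interviews. It assumes that the interviews have already been coded by the evaluation team; the theme labels and data are invented for illustration.

```python
from collections import Counter


def theme_frequencies(coded_interviews):
    """Share of interviews in which each coded theme appears at least once."""
    counts = Counter()
    for themes in coded_interviews:
        counts.update(set(themes))  # count a theme once per interview
    n = len(coded_interviews)
    return {theme: counts[theme] / n for theme in counts}


# Hypothetical codes assigned to four in-depth interviews.
coded = [
    ["transport cost", "staff attitude"],
    ["transport cost"],
    ["language barrier", "transport cost"],
    ["staff attitude"],
]
frequencies = theme_frequencies(coded)
print(frequencies)
```

Such frequencies can be set alongside survey results, although they describe only the interviews that were coded and should be interpreted with the selection of those interviews in mind.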
Types of mixed-method designs
Most mixed-method designs are used by researchers who have either a QUANT orientation and recognize the need to build in a QUAL component, or a QUAL orientation and recognize the need to build in a QUANT component. Very few mixed-method designs give equal weight to both approaches. Mixed-method designs (Figure 3) can be considered as a continuum with completely QUANT design at one end and completely QUAL at the other. In between are designs that are mainly QUANT with a small QUAL component, designs that are completely integrated with equal weight given to QUANT and QUAL, and designs that are mainly QUAL with only a small QUANT component.
There are three main kinds of mixed-method design:
- Sequential: The evaluation either begins with QUANT data collection and analysis followed by a QUAL data collection and analysis or vice versa. Designs can also be classified according to whether the QUANT or QUAL components of the overall design are dominant. Figure 4 gives an example of a sequential mixed-method evaluation of the adoption of new seed varieties by different types of farmer. The evaluation begins with a QUANT survey to construct a typology of farmers and this is followed by QUAL data collection (observation, in-depth interviews) and the preparation of case studies. The analysis is conducted qualitatively. This would be classified as a sequential mixed-method design where the QUAL approach is dominant.
- Parallel: The QUANT and QUAL components are conducted at the same time. The multi-level evaluation of a school feeding programme illustrated in Figure 5 might also include some parallel components. For example, QUANT observation checklists of student behavior in classrooms might be applied at the same time as QUAL in-depth interviews are being conducted with teachers.
- Multi-level: The evaluation is conducted on various levels at the same time, as illustrated by the multi-level evaluation of the effects of a school feeding programme on school enrolment and attendance (Figure 5). The evaluation is conducted at the level of the school district, the school, classrooms and teachers, students and families. At each level both QUANT and QUAL methods of data collection are used. Multi-level designs are particularly useful for studying the delivery of public services such as education, health, and agricultural extension, where it is necessary to study both how the programme operates at each level and also the interactions between levels.
Triangulation: a powerful tool for assessing validity and for deepening understanding
Triangulation is a very powerful element of the mixed-method approach. It involves using two or more independent sources to assess the validity of data that has been collected and to obtain different interpretations of what actually happened during project implementation and what the effects were on different sectors of the population. Triangulation should be an integral component of the Equity-focused evaluation design and should be used to check the validity of the key indicators of processes, outputs and outcomes that are collected. Triangulation can involve:
- Comparing information collected by different interviewers.
- Comparing information collected at different times (of day, week, or year) or in different locations.
- Comparing information obtained using different data collection methods.
Triangulation can also be used to obtain different perspectives on what actually happened during project implementation and what effects the project had on different groups. This can be done through interviews with individuals, focus groups, review of project documents or participant observation.
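A simple way to apply triangulation to a key indicator is to compare the estimates obtained from independent sources and flag the indicator when they diverge. The sketch below is illustrative: the source names and values are invented, and the 10 per cent tolerance is an arbitrary assumption rather than a recommended standard.

```python
def triangulate(estimates, tolerance=0.10):
    """Compare estimates of one indicator obtained from independent sources.

    The indicator is treated as consistent when the spread between the highest
    and lowest estimate is within `tolerance` of their mean (assumed threshold).
    """
    if len(estimates) < 2:
        raise ValueError("triangulation needs at least two independent sources")
    values = list(estimates.values())
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    relative_spread = spread / mean if mean else 0.0
    return {
        "mean": mean,
        "relative_spread": relative_spread,
        "consistent": relative_spread <= tolerance,
    }


# Hypothetical estimates of the share of households using a clinic.
result = triangulate({
    "household survey": 0.62,
    "clinic records": 0.80,
    "focus groups": 0.65,
})
print(result)
```

An inconsistent result does not show which source is wrong; it indicates where follow-up (for example, further interviews or a review of records) is needed.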
B. Attribution, contribution and the importance of the counterfactual
When evaluating the effects of development interventions it is important to distinguish between: changes that have taken place in the target population over the lifetime of the intervention, and impacts that can reasonably be attributed to the effect of the intervention. Statistical impact evaluations estimate the size of the change in the project population (the effect size5), and the statistical probability that the change is due to the intervention and not to external factors. Many evaluations, particularly those conducted under budget and time constraints, only measure changes in the target population and results are often discussed as if they prove causality. It is important to appreciate that change does not equal causality. Interventions operate in a dynamic environment where many economic, social, political, demographic and environmental changes are taking place and where other agencies (government, donors, NGOs) are providing complementary or competing services, or introducing policies, that might affect the target population.
The assessment of impacts or causality requires an estimate of what would have been the condition of the target population if the intervention had not taken place. In order to control for the influence of other factors that might contribute to the observed changes, it is necessary to define a counterfactual. In statistical evaluation designs (experimental and quasi-experimental), the counterfactual is estimated through a comparison group that matches the target population. If the comparison group is well matched, and if the level of change between this and the target group is sufficiently large to be statistically significant, then it is assumed that the difference is due, at least in part, to the effect of the intervention.
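One common way to use a comparison group as the counterfactual is a double-difference (difference-in-differences) estimate: the change in the comparison group is subtracted from the change in the target group. The text above does not prescribe this particular estimator; the sketch below is offered as one illustration, with invented figures, and it omits the test of statistical significance that a full analysis would require.

```python
from statistics import mean


def difference_in_differences(target_before, target_after,
                              comparison_before, comparison_after):
    """Change in the target group minus change in the comparison group."""
    change_target = mean(target_after) - mean(target_before)
    change_comparison = mean(comparison_after) - mean(comparison_before)
    return change_target - change_comparison


# Illustrative enrolment rates (%) in two target and two comparison communities.
effect = difference_in_differences(
    target_before=[40, 50], target_after=[60, 70],
    comparison_before=[42, 48], comparison_after=[50, 60],
)
print(effect)
```

In this example the target communities improved by 20 points, but the comparison communities improved by 10 points without the intervention, so only about 10 points could plausibly be attributed to it, and only if the two groups are well matched.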
In the real world, it has only proved possible to use statistical comparison groups in a small proportion of interventions, so evaluators have had to use their creativity to define alternative counterfactuals. This is particularly the case for policy interventions and other multi-component programmes, where it is rarely possible to use a statistical comparison group. In addition, a weakness of many statistical evaluation designs is that when expected outcomes are not achieved, it is difficult to know whether this is due to weaknesses in the underlying programme theory and how it is translated into project design (design failure), or whether it is due to problems with how the project was implemented (implementation failure). Economists often call this the “black box” problem, because project implementation is a mysterious black box that is not analyzed and whose effects are not understood. The “black box” has many practical implications, because many clients assume that if an evaluation does not detect any statistically significant project impacts the project should be terminated, whereas often the recommendation should have been to repeat the project with more attention to how it is implemented.
Therefore, one of the main challenges for Equity-focused evaluations is how to define a credible counterfactual to answer the question “what would have been the situation of the worst-off groups if the intervention had not taken place?” Based on the above, one of the best ways to define a credible counterfactual in Equity-focused evaluations is through contribution analysis.
Contribution analysis is used in contexts where two or more donor agencies, as well as one or more national partners, are collaborating on a programme or broad policy reform, and where it is not possible to directly assess the effects of a particular donor on overall outcomes and impacts. Sometimes contribution analysis for a particular donor will be complemented by attribution analysis, assessing the overall outcomes and impacts of the collaborative programmes (for example a poverty reduction strategy), but in most cases no estimates will be made of overall programme outcomes. The purpose of contribution analysis is to assess the contribution that a particular international agency has made to achieving the overall programme objectives.
The simplest form of contribution analysis is to define each stage of the programme (consultation; planning; design; implementation; achievement of outputs and outcomes; dissemination of findings; and sustainability) and to assess the agency’s contribution to each stage. The assessment combines a review of project reports and other documents6 with interviews with other international and national agencies and key informants. Interviews are often open or semi-structured but for large programmes rating scales may be used to assess performance on each component, as well as to assess the agency on dimensions such as collaboration, flexibility (for example with respect to use of funds), promoting broader participation etc. Agencies can also be rated on what they consider to be their areas of comparative advantage, such as knowledge of the national or local context, ability to work with a broader range of actors, or technical expertise.
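Where rating scales are used, the ratings given by different informants can be summarized by programme stage. The sketch below assumes a hypothetical 1 to 5 scale and invented ratings; it simply averages the scores given for each stage.

```python
from statistics import mean


def stage_scores(ratings):
    """Average the ratings of an agency's contribution for each programme stage."""
    by_stage = {}
    for r in ratings:
        by_stage.setdefault(r["stage"], []).append(r["score"])
    return {stage: mean(scores) for stage, scores in by_stage.items()}


# Hypothetical ratings (1 = no contribution, 5 = decisive contribution).
ratings = [
    {"informant": "national partner", "stage": "planning", "score": 4},
    {"informant": "other donor", "stage": "planning", "score": 2},
    {"informant": "national partner", "stage": "implementation", "score": 5},
]
scores = stage_scores(ratings)
print(scores)
```

Averages of this kind hide disagreement between informants, so it is usually worth reporting the individual ratings as well and following up on large differences in the interviews.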
John Mayne (2008)7 proposes a theory-based approach to contribution analysis that includes the following steps:
- Set out the cause and effect issues to be addressed in the analysis.
- Develop the assumed theory of change and assess the risks to the achievement of the proposed changes.
- Gather the existing evidence relevant to the theory of change.
- Assemble and assess the contribution story (what changes took place, why did they take place and what were the contributions of the agency) as perceived by the agency being studied and by other partners. Identify and assess challenges to this story (for example some stakeholders or informants may not accept the claims made by the agency about their role in the changes).
- Seek out additional information that both supports, and if necessary, challenges the contribution story.
- Revise and strengthen the contribution story.
- In complex settings, assemble and assess the complex contribution story.
When using the analysis to assess contributions to the achievement of equity objectives, each stage of the analysis must focus on equity-issues, using the kinds of questions discussed earlier.
C. Equity-focused evaluation at the policy level
The design of an equity-focused evaluation will depend on the nature of the interventions to be evaluated: national policy, programme or project.
While designing a project-level evaluation does not pose particular challenges, it becomes more difficult to evaluate complicated equity-focused programmes using conventional evaluation designs. Sometimes conventional evaluation designs are applied to individual components of the programme and the overall programme performance is assessed by combining findings from the different components with other broader assessments of management, accessibility to the target population etc. When there is a systematic design for determining which individuals or organizations (schools, clinics etc.) receive which services, it may be possible to use a multivariate design that assesses overall outcomes and then assesses the contribution of each main component.
As described below, the evaluation of complex equity-focused policies requires the use of more creative and less quantitatively oriented evaluation methodologies than those used in “simple” project-level Equity-focused evaluations.
This section presents selected approaches to evaluate equity-focused interventions at policy level.
Systems approaches to evaluation7
Most development agencies, including UNICEF, are seeking to improve the welfare of the worst-off groups of society through finding the most effective way to deliver services to these groups, or to improve the performance of national policy, planning and service delivery agencies in reaching and benefiting these groups. All of the development interventions operate in, and often attempt to change, public and private service delivery systems and national governance and policy systems. All of these systems involve many actors and stakeholders, and often involve interventions with many stages. In addition, they operate through, and are affected by, other parts of the system. Interventions are also introduced into systems that have historical traditions (including perceptions about what will and will not work) and traditional ways of doing things. The interventions are also influenced by a wide range of economic, political, organizational, legal, socio-cultural and environmental factors. Finally, many programmes also involve the value systems of different actors concerning the target populations and what programmes and approaches should and should not be introduced.
Most conventional approaches to evaluation tend to address development programmes as largely stand-alone interventions, sometimes including contextual variables as factors affecting, but not really part of, the programme delivery system. Systems approaches have been developed to analyze these kinds of complexity and they offer potentially valuable ways to understand how a particular intervention is affected by, and in turn can influence, the public and private service delivery systems within which programme implementation takes place. Systems approaches can be particularly helpful for evaluating equity-focused policies as many of these operate within, and seek to change, systems which are often resistant to (or are actively opposed to) accepting the proposed focus on the worst-off groups in society.
Systems thinking introduces some radically different ways of thinking about evaluation, all of which are potentially important for Equity-focused evaluation. Some of the ideas that can be drawn from systems approaches include:
- Programmes, policies and other kinds of development interventions are normally embedded in an existing social system that has its own historical traditions, linkages among different customers (clients/beneficiaries), actors and owners. The intervention must adapt to the existing system and will often be changed by the system.
- Different actors may have very different perspectives on the new intervention, and these perspectives will affect how it operates and even whether it is accepted at all.
- Systems have boundaries (which may be open or closed), which will often affect how widely the new intervention will be felt.
- New interventions create contradictions and often conflicts and the programme outcomes will be determined by how these conflicts are resolved.
It is not possible to summarize the many different systems thinking methods in the present document, but the following three approaches illustrate those that could potentially be applied to Equity-focused evaluations. While systems theory often has the image of being incredibly complex, the goal of many approaches, including those described below, is in fact to simplify complex systems down to their essential elements and processes. We illustrate how each of the three approaches could be applied to the evaluation of pro-equity service delivery systems.
The System Dynamics approach
This approach focuses on a particular problem or change that affects the ability of the system to achieve its objectives. It examines the effect of feedback and delays, how the system addresses the problem, how the different variables in the system interact with each other, and how the effects vary over time. The focus is on system dynamics, adaptation and change rather than on a descriptive snapshot of the system at a particular point in time.
Applying this approach to evaluating the delivery of equity-focused services (e.g., adapting a current programme aiming to provide pre- and post-natal services to overcome resistance to extending services to vulnerable mothers and their children). The System Dynamics approach would study the way in which the new services were delivered; the reactions of targeted mothers (feedback); how this affected the way the services were delivered; and the effects of delays in implementing the new services on the attitude and behavior of different actors and on the effectiveness of reaching the target population. It would also examine how the introduction of the new service delivery mechanism affected the overall operation of the service delivery system.
Soft Systems Methodology
Soft Systems Methodology focuses on the multiple perspectives of a particular situation. The first step is to provide a “rich picture” of the situation and then to provide a “root definition” (the essential elements) of the situation in terms of:
- the beneficiaries;
- other actors;
- the transformation process (of inputs into outputs);
- the world-views of the main actors;
- the system owners (who have veto power over the system); and,
- environmental constraints.
Once the root definition has been defined a cultural analysis is conducted of the norms, values and politics relevant to the definition.
One or more system models are then defined using only the elements in the root definition (an example of how the systems approach seeks to simplify the system). A key element of the approach is that a new root definition can then be defined based on the perspectives and values of a different customer, actor or owner.
Applying this approach to evaluating the delivery of equity-focused services. A “rich picture” (detailed description) of the programme would be developed covering the 6 elements of the root definition. The service delivery system would be examined from the perspective of different elements of the worst-off groups, the different actors and owners. Areas of consensus as well as disagreement or conflict would be examined. Particular attention would be given to the attitude of the different “owners” who have the power to veto the new service delivery systems.
Cultural-Historical Activity Theory
The key elements of the Cultural-Historical Activity Theory approach are that:
- systems have a defined purpose;
- they are multi-voiced (different actors have different perspectives);
- systems are historical and draw strongly from the past;
- changes in a system are produced largely by contradictions which generate tensions and often conflict; and,
- contradictions provide the primary means by which actors learn and changes take place. The changes can produce further contradictions so processes of change are often cyclical.
Applying this approach to evaluating the delivery of equity-focused services. Actors have different perspectives on whether and how services should be extended to vulnerable groups. These different perspectives, combined with the contradictions that can be created by the changes required to address the needs of worst-off groups, and the way these contradictions are resolved, will determine how effectively the services reach worst-off groups. The Cultural-Historical Activity Theory approach also stresses that change is cyclical: changed procedures will often be revised further, so short-term success in reaching vulnerable groups should not be assumed to be permanent.
Unpacking complex policies
Many complex policies and other national-level interventions have a number of different components each with different objectives and organized in different ways. Many agencies conclude that most of these interventions are too complicated for a rigorous evaluation to be conducted, or to use any of the conventional comparison group designs. Also, as the interventions are defined at the national level and are intended to operate throughout the country, it is assumed that it is not possible to find a comparison group that is not affected. However, it is often possible to “unpack” the policy into a number of distinct components, making it possible to design a more rigorous evaluation:
- Complex policies can often be broken down into different components, each with clearly defined structures and objectives.
- While policies are formulated at the national level, in many cases they will be implemented and will have measurable outcomes at provincial and local levels.
- Even though policies are intended to cover the whole country, they tend to be implemented in phases, or for different reasons do not reach all areas at the same time. Consequently it is often possible to use pipeline designs (see below) to identify comparison areas that have not yet been affected by the intervention.
Pipeline designs take advantage of the fact that some policy and national-level interventions are implemented in phases (either intentionally or due to unanticipated problems). Consequently the areas, districts or provinces where the intervention has not yet started (but that are scheduled to be covered by future phases) can be used as a comparison group. While there are many situations in which policies are implemented in phases and where pipeline designs can be used, it is important to determine why certain regions have not yet been included and to assess how similar they are to regions already covered. When there is a systematic plan to incorporate different provinces or districts in phases, the pipeline design may work well, but when certain regions have been unintentionally excluded due to problems (administrative or political) the use of the pipeline design may be more problematic8.
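The pipeline logic can be sketched in a few lines of Python. This is a minimal illustration only: the district records, field names and figures are hypothetical, and the function name `pipeline_comparison` is an assumption introduced here. It compares the change in an outcome indicator in areas already covered with the change in areas scheduled for a later phase, and reports the baseline gap so that the similarity of the two groups can be checked before the comparison is trusted.

```python
from statistics import mean

# Hypothetical district records (illustrative figures only): rollout phase,
# and baseline / follow-up values of an outcome indicator such as the
# percentage of children enrolled in school.
districts = [
    {"name": "A", "phase": 1, "baseline": 52.0, "followup": 61.0},
    {"name": "B", "phase": 1, "baseline": 48.0, "followup": 58.0},
    {"name": "C", "phase": 2, "baseline": 50.0, "followup": 53.0},
    {"name": "D", "phase": 2, "baseline": 47.0, "followup": 49.0},
]


def pipeline_comparison(records, current_phase=1):
    """Use areas not yet reached by the rollout as the comparison group."""
    covered = [r for r in records if r["phase"] <= current_phase]
    pipeline = [r for r in records if r["phase"] > current_phase]
    if not covered or not pipeline:
        raise ValueError("need both covered and not-yet-covered areas")

    # How similar were the two groups before the intervention started?
    baseline_gap = (mean(r["baseline"] for r in covered)
                    - mean(r["baseline"] for r in pipeline))
    # Average change in each group between baseline and follow-up.
    change_covered = mean(r["followup"] - r["baseline"] for r in covered)
    change_pipeline = mean(r["followup"] - r["baseline"] for r in pipeline)

    return {
        "baseline_gap": baseline_gap,
        "change_covered": change_covered,
        "change_pipeline": change_pipeline,
        "estimated_effect": change_covered - change_pipeline,
    }


result = pipeline_comparison(districts)
print(result)
```

A large baseline gap would be a warning that the not-yet-covered areas differ systematically from those already covered, which is the situation in which the pipeline design becomes problematic.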
Policy gap analysis
Policy gap analysis is a term used to describe analytical approaches that identify key policy priorities and target groups and assess how adequately current and planned policies address these priorities. It reviews the whole spectrum of public sector policies to identify both limitations of individual policies and also problems arising from a lack of coordination between different policies. This analysis is particularly important for equity issues because inequities have multiple causes and require a coordinated public sector approach, and often the worst-off groups fall through gaps in the social safety net. In Central and Eastern Europe, for example, UNICEF adopts a systemic approach to the assessment of the adequacy with which countries address issues of vulnerability as they affect children and their families9.
The analysis is normally conducted at the national level although it can also be applied in a particular region or sector. The analysis normally relies on secondary data from surveys and agency records. Techniques such as quintile analysis are used to identify the worst-off groups and to compare them with other groups through indicators such as school enrolment or use of health services10. If available, studies such as Citizen Report Cards can provide additional useful information.
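As a rough illustration of quintile analysis, the following Python sketch ranks households by a wealth index, splits them into five equal groups and compares school enrolment across the groups. The household records, the field names `wealth` and `enrolled`, and the function name `quintile_rates` are all hypothetical; a real analysis would normally use survey weights and a much larger sample.

```python
# Hypothetical survey records: a wealth index and whether the household's
# school-age child is enrolled (1) or not (0). Illustrative values only.
households = [
    {"wealth": w, "enrolled": e}
    for w, e in [(7, 1), (2, 0), (9, 1), (4, 1), (1, 0),
                 (6, 0), (10, 1), (3, 0), (8, 1), (5, 1)]
]


def quintile_rates(records):
    """Return the enrolment rate for each wealth quintile (1 = poorest)."""
    if len(records) < 5:
        raise ValueError("need at least five households")
    ranked = sorted(records, key=lambda h: h["wealth"])
    n = len(ranked)
    rates = {}
    for q in range(5):
        # Slice boundaries divide the ranked list into five near-equal parts.
        group = ranked[q * n // 5:(q + 1) * n // 5]
        rates[q + 1] = sum(h["enrolled"] for h in group) / len(group)
    return rates


rates = quintile_rates(households)
equity_gap = rates[5] - rates[1]  # richest minus poorest quintile
print(rates, equity_gap)
```

The gap between the richest and poorest quintiles is only one possible summary; the same grouping can be applied to any indicator available in the survey, such as use of health services.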
Often these secondary data sets do not include all of the required data (for example they may not cover both supply and demand-side factors), in which case they may be complemented by other data sources such as records from public service agencies. Techniques such as Bottleneck Analysis or Knowledge, Attitude and Practice (KAP) studies could make a major contribution to the data requirements for policy gap analysis. It is sometimes possible to develop a special module that can be incorporated into an ongoing or planned survey to fill in some of the information gaps. These data sources will normally be complemented by desk reviews, consultation with key informants, focus groups and possibly visits to ministries or service delivery centres.
Using other countries or sectors as the comparison group
For policies that are implemented country-wide or that cover all of the activities of a ministry, one option is to use other countries as a comparator. One or more countries can be selected in the same region. In these cases, it is difficult to use a statistical comparison and the analysis will normally be descriptive, drawing on whatever kinds of comparative data are available. As each country is unique a great deal of interpretation and judgment will be required.
A second option is to draw on the increasingly rich international databases now available. Extensive comparative data is available for most of the MDGs, household socio-economic and demographic conditions, human development indicators, and access to public services. Over the past few years databases have also become available on governance and participatory development topics such as corruption and political and community participation11. These databases permit the selection of a large sample of countries with similar socio-economic and other relevant characteristics. Changes in key outcome indicators for the target countries are then compared with other similar countries that have and have not introduced reforms. It is, however, more difficult to find data relating to worst-off groups, and where data is available it will normally apply to income comparisons and will not address other dimensions of inequity.
Sometimes, when a policy is being launched in different ministries or agencies, it may be possible to use as the comparison the ministries where the programme has not yet started. Policy areas where this type of comparison could be considered include: anti-corruption and other kinds of administrative reform, decentralization and financial management. However, these comparisons are difficult to apply as every agency has unique characteristics. Also, it is difficult to obtain baseline data on the situation before the reforms began as information on outcome indicators tends to be limited, not very reliable and difficult to compare with the extensive and more rigorous indicators that the reform programmes tend to generate.
Concept mapping
Concept mapping uses interviews with stakeholders or experts to obtain an approximate estimate of policy effectiveness, outcomes or impacts. It is well suited as a tool for Equity-focused evaluation as it allows experts to use their experience and judgment to help define the equity dimensions that should be used to evaluate policies, and then to rate policies on these dimensions. This is particularly useful for the many kinds of equity-focused policies where objective quantitative indicators are difficult to apply. A comparison of the average ratings for areas receiving different levels of intervention, combined with a comparison of ratings before and after the intervention, can provide a counterfactual. A similar approach could be applied to evaluate a wide range of equity-focused policies that seek to increase access by worst-off groups to public services, to provide them with equal treatment under the law, or that protect them from violence and other sources of insecurity.
Portfolio analysis
Many complex equity-focused policies, particularly when supported by several different stakeholders, can include large numbers of different interventions. An equity-focused example would be a gender mainstreaming and women’s empowerment programme that might include components from large numbers of different policy and programme interventions, including many where gender mainstreaming was only one of several objectives. Portfolio analysis is an approach that is commonly used in these cases. All interventions are identified (which can in itself be a challenge) and then classified into performance areas. A desk review is then conducted to check the kind of information that is available on these projects such as: the existence of a logic model; monitoring data on inputs and outputs; ratings of quality at entry; quality of implementation; quality at completion; and other kinds of evaluation reports. Often there is no clear delineation of the projects to be included in the analysis, and boundary analysis may be required to define criteria for determining which projects should and should not be included.
If the information on each project is sufficiently complete, which is often not the case, projects will be rated on each dimension and summary indicators will be produced for all of the projects in each performance area. For example, quality at entry or during implementation may be assessed in terms of: quality of design; quality of planning; the design and use of the M&E system; and the internal and external efficiency. Where data permits, average ratings will be computed for each of these dimensions and an overall assessment will be produced for quality at entry or implementation. The ratings for the different components (quality at entry etc.) are then combined to obtain an overall assessment for each performance area. Many agencies use the OECD/DAC evaluation criteria for these overall assessments. Additional criteria relevant to humanitarian settings, such as coherence, connectedness and coverage, may also be used.
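The rating step described above might be sketched as follows in Python. The projects, performance areas, dimension names and the 1 to 5 scale are hypothetical, and the simple unweighted average used to combine dimensions is an assumption; agencies may weight dimensions differently. Missing ratings are recorded as `None` and skipped, reflecting the fact that averages can only be computed where data permits.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical project ratings on a 1-5 scale; None marks missing data.
projects = [
    {"area": "education",
     "ratings": {"design": 4, "planning": 3, "m_and_e": None, "efficiency": 4}},
    {"area": "education",
     "ratings": {"design": 2, "planning": 3, "m_and_e": 3, "efficiency": None}},
    {"area": "health",
     "ratings": {"design": 5, "planning": 4, "m_and_e": 4, "efficiency": 3}},
]


def summarize_portfolio(records):
    """Average each dimension within a performance area, then combine."""
    by_area = defaultdict(lambda: defaultdict(list))
    for project in records:
        for dimension, score in project["ratings"].items():
            if score is not None:  # skip dimensions with no rating
                by_area[project["area"]][dimension].append(score)

    summary = {}
    for area, dimensions in by_area.items():
        dimension_means = {d: mean(scores) for d, scores in dimensions.items()}
        summary[area] = {
            "dimensions": dimension_means,
            # Assumption: equal weight for every dimension.
            "overall": mean(list(dimension_means.values())),
        }
    return summary


summary = summarize_portfolio(projects)
print(summary)
```

Reporting the number of projects behind each average alongside the score would help readers judge how much weight a summary indicator can bear when many ratings are missing.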
If resources permit, a sample of projects from each performance area will be selected for field studies that compare the data from these secondary sources with experience on the ground. The findings will then be reviewed by a group of experts and stakeholders, and where there are discrepancies between the draft reports and the feedback from this group, further analysis will be conducted to reconcile or explain the reasons for the discrepancies. In some cases the kinds of concept mapping techniques described earlier in this chapter may be used as part of the assessment. In cases where field studies are conducted, concept mapping can also be used to help select the countries or projects to be covered.
D. Equity-focused evaluation at the project and programme levels
Conventional quantitative impact evaluation designs
Project-level impact evaluation designs estimate the contribution of an intervention (project) to the observed changes in an outcome indicator (the change the project seeks to produce). This is done by identifying a comparison group with similar characteristics to the project population, but that has no access to the intervention. The comparison group serves as the control for changes due to external factors unrelated to the project. Figure 7 represents a pre-test/post-test comparison group design. P1 and P2 represent the measurements (surveys, aptitude tests, etc.) taken on the project (treatment) group before and after the project (treatment) has been implemented. C1 and C2 represent the same measurements on the comparison group at the same two points in time. If there is a statistically significant difference in the change that occurs in the project group, compared to the change in the comparison group, and if the two groups are well matched, then this is taken as evidence of a potential project effect. The strength of the statistical analysis is influenced by how closely the project and comparison groups are matched, as well as the size of the sample and the size of the change being estimated (effect size). A careful evaluator will use triangulation (obtaining independent estimates on the causes of the changes from secondary data, key informants, direct observation or other sources) to check the estimates. Ideally the impact evaluation should be repeated several times on similar projects (as in laboratory research) but this is rarely possible in the real world.
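The comparison of changes in the two groups is often summarized as a double difference, (P2 - P1) - (C2 - C1). The Python sketch below computes this estimate from hypothetical before and after scores for the same individuals in each group, together with an approximate standard error for the difference between the two mean changes. The data, the function name `double_difference` and the assumption that the two groups are independent and well matched are illustrative only; a real evaluation would normally use a regression framework and a formal significance test.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores for the same individuals before and after the project.
p1 = [40, 45, 50, 55]  # project group, before (P1)
p2 = [50, 56, 59, 67]  # project group, after (P2)
c1 = [42, 44, 51, 53]  # comparison group, before (C1)
c2 = [45, 46, 55, 56]  # comparison group, after (C2)


def double_difference(p1, p2, c1, c2):
    """Return (estimate, standard_error) for (P2 - P1) - (C2 - C1)."""
    project_change = [after - before for before, after in zip(p1, p2)]
    comparison_change = [after - before for before, after in zip(c1, c2)]
    estimate = mean(project_change) - mean(comparison_change)
    # Standard error of the difference between two independent group means,
    # based on the sample variance of the individual change scores.
    standard_error = sqrt(
        variance(project_change) / len(project_change)
        + variance(comparison_change) / len(comparison_change)
    )
    return estimate, standard_error


estimate, standard_error = double_difference(p1, p2, c1, c2)
print(estimate, standard_error)
```

The smaller the standard error relative to the estimate, the stronger the statistical evidence of a project effect, although this says nothing about how well the groups were matched in the first place.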
The statistical validity of the estimate of project effect (impact) is affected by how well the project and comparison groups are matched. The three main methods for matching, in descending order of statistical precision, are:
- Randomized controlled trials in which subjects are randomly assigned to the project and control groups.
- Quasi-experimental designs12 in which secondary data permits the comparison group to be statistically matched with the project group.
- Quasi-experimental designs in which judgmental matching is used to select the comparison group.
There are a large number of design options that can be considered, but the range of viable options is often limited by budget, availability of secondary data, and when the evaluation began. In the Equity-focused evaluation resource centre a list of 7 basic impact evaluation designs is presented based on: when the evaluation began (start, middle or end of the project); whether there was a comparison group, and whether baseline data was collected on the project and/or the comparison group.
An expanded list with 20 evaluation design options is also presented. This builds on the 7 basic designs but also takes into consideration two sets of factors. Firstly, whether the comparison group (counterfactual) was selected randomly, using a quasi-experimental design with statistical matching or judgmental matching, or whether the counterfactual was based on a qualitative design. Secondly, how the baseline condition of the project and comparison groups was estimated: by conducting a baseline survey at the start of the project; by “reconstructing” the baseline condition when the evaluation is not commissioned until late in the project cycle; by using qualitative methods to estimate baseline conditions; or not at all, where no information is collected on the baseline condition of the project and comparison groups.
Estimating project impacts using non-experimental designs
Non-experimental designs do not include a matched comparison group (statistical counterfactual), so it is not possible to control statistically for the influence of other factors that might have produced the changes in the outcome indicators. It is useful to distinguish between situations where a non-experimental design is used as the default option, because time and resource constraints do not permit the use of a comparison group, and situations where, in the judgment of the evaluators, a non-experimental design is the methodologically strongest evaluation that can be used. Situations where a non-experimental design might be considered the best design include:
- When the project involves complex processes of behavioral change that are difficult to quantify.
- When the outcomes are not known in advance, as they will either depend on the decisions of project participants or on interactions with the other actors.
- When many of the outcomes are qualitative and difficult to measure.
- When each project operates in a different local setting and where elements of this setting are likely to affect outcomes.
- Where there is more interest in understanding the implementation process than in measuring outcomes.
- Where the project is expected to evolve slowly over a relatively long period of time.
Potentially strong non-experimental designs
Some of the potentially strong non-experimental designs that could be considered include:
- Single case analysis. This is a pre-test/post-test comparison of a single case (such as a child suffering from behavioral problems in a classroom). The baseline observation, before the treatment, is taken as the counterfactual. The treatment is applied at least three times, and if a significant change is observed on each occasion (usually based on the observation ratings of experts) then the treatment is considered to have been effective. The experiment would then be conducted again in a slightly different setting to gradually build up data on when and why it works.
- Longitudinal designs. The subject group, community or organization is observed continuously, or periodically over a long period of time, to describe the process of change and how this is affected by the contextual factors in the local setting. One option is to select a small sample of individuals, households or communities who are visited constantly over a long period of time (panel study). This approach is useful for understanding behavioral change, for example in relations between spouses as a result of a programme to promote women’s economic empowerment. It has been used successfully to evaluate, for example, the effects of microcredit programmes on women’s empowerment. A second option is to observe the group or community over a long period of time, to monitor, for example, changes in the level of gender-based violence in the community or market.
- Interrupted time series. The design can be used when a series of observations at regular intervals is available over a long period of time, starting well before the intervention takes place and continuing after the intervention. The analysis examines whether there is a break in the intercept or the slope at the point where the intervention took place. This method has been widely used to evaluate, for example, the impact of new anti-drinking legislation on the number of road accidents.
- Case study designs. A sample of case studies is selected to represent the different categories or typologies of interest to the evaluation. The typologies may be defined on the basis of quantitative analysis of survey data or they may be defined from the qualitative diagnostic study. The cases describe how different groups respond to the project intervention and this provides an estimate of project impacts.
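To make the interrupted time series design in the list above more concrete, the following Python sketch fits one straight line to the observations before the intervention and another to those after it, and then reports the break in level and the change in slope at the intervention point. The monthly accident counts are hypothetical, and `fit_line` and `interrupted_time_series` are names introduced here for illustration. The sketch ignores seasonality and autocorrelation, both of which a real analysis would need to address.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interrupted_time_series(series, start):
    """Compare trends before and after the intervention.

    series: observations taken at regular intervals.
    start:  index of the first observation after the intervention.
    """
    pre_times = list(range(start))
    post_times = list(range(start, len(series)))
    if len(pre_times) < 2 or len(post_times) < 2:
        raise ValueError("need at least two observations on each side")

    pre_slope, pre_intercept = fit_line(pre_times, series[:start])
    post_slope, post_intercept = fit_line(post_times, series[start:])

    # Gap between the two fitted lines at the intervention point.
    level_change = ((post_intercept + post_slope * start)
                    - (pre_intercept + pre_slope * start))
    return {"level_change": level_change,
            "slope_change": post_slope - pre_slope}


# Hypothetical monthly road accident counts: six months before and six
# months after new legislation comes into force.
accidents = [100, 102, 104, 106, 108, 110, 90, 89, 88, 87, 86, 85]
its = interrupted_time_series(accidents, start=6)
print(its)
```

A negative level change indicates an immediate drop at the point of intervention, while a negative slope change indicates that the trend itself has turned downward.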
E. Feasibility analysis
Once the evaluation design has been proposed it is important to assess its feasibility. This involves questions such as: Can the data be collected? Will it be collected within the budget and time constraints? Can the design address the key evaluation questions? Will the evidence be considered credible by key stakeholders? The feasibility analysis must also assess the credibility of the proposed counterfactual – particularly when non-experimental designs are used.
An important issue that weakens the validity of the findings of many evaluation designs is the point in the project cycle at which the evaluation is conducted. Due to administrative requirements and pressure to show that the project is achieving its intended outcomes, many impact evaluations are commissioned when it is still too early in the project cycle to assess outcomes. For example, it may require several years before a girls’ secondary education project can have an effect on age at marriage or teenage pregnancies; but due to donor or government pressure the evaluation may be conducted at the end of the first year, before the programme has had an effect.
The evaluation manager must ensure that fieldwork meets evaluation method standards for gathering evidence to support findings and recommendations on the intervention’s contribution to equity. Defining the tools for data collection and analysis is the first part of implementing a successful evaluation process. The next section describes some of the tools appropriate for Equity-focused evaluations. In addition to being robust and generating reliable data, the tools selected should maximize the participation of stakeholders identified in the stakeholder analysis, allowing for active, free, meaningful participation by all.
Collecting data and analyzing contextual factors
When designing an Equity-focused evaluation it is important to understand the context within which the intervention has been implemented, and the factors that affected implementation and accessibility to the different worst-off groups. It is also important to understand the perceptions and attitudes of implementing agencies and society towards the different worst-off groups.
In most situations it will be useful to conduct a rapid diagnostic study to understand the intervention and its context. The type of study will be determined by the size and complexity of the intervention and by how familiar UNICEF and its partners are with this type of intervention and with the locations where it will be implemented. For a small intervention implemented in only a few locations, it may be possible to conduct the diagnostic study in a few weeks; for a large and widely dispersed intervention significantly more time may be required.
The following are some of the kinds of information that the diagnostic study will usually cover:
- How are the problems that the intervention is designed to address currently being addressed? Do other agencies provide these services? Are there traditional approaches for addressing the problems?
- What are the opinions of different sectors of the community concerning these services? Who uses them and who does not?
- Have similar projects been tried earlier? How did they work out? Why were they discontinued?
- Which groups are most affected by the problems to be addressed? Would they be considered as worst off, and if so in which category would they be classified?
- What are the reasons for lack of access of different groups to the services? How would these be categorized in the bottleneck framework?
- Are there any cultural attitudes or practices that affect access to the planned services and how they are used – particularly by worst-off groups?
Diagnostic studies will normally use one or more of the following data collection methods:
- Participant observation13. One or more researchers live in the community or become involved in the group or organization as participating members or as people who are known and trusted. The goal is to live the experience of the project and of living in the community in the same way as other residents, rather than simply observing as an outsider. It is important to be aware of the ethical implications in cases where the researcher does not fully explain who s/he is and why s/he is living in the community or participating in the group.
- Non-participant observation. Many kinds of observation are possible without having to become accepted as a member of the community. For example: it is possible to observe how water and fuel are collected, transported and used and the kinds of conflicts and problems that this causes; and the use and maintenance of social infrastructure such as community centres, schools, drainage channels, and children’s playgrounds. A lot can be learned by watching people entering health centres, village banks and schools14. It is important to recognize that the presence of an outsider in the community will change behavior however inconspicuous they try to be. For example, drug dealers may move elsewhere and residents may not engage in informal business activities not permitted by the housing authority.
- Rapid household surveys. If the number of questions is kept small, it is often possible to conduct a large number of interviews in a relatively short period of time. For collecting information on hard-to-reach groups or excluded groups, it is generally better to use interviewers from the community or from local organizations. It is of course necessary to ensure that local interviewers have the necessary experience and credibility.
- Key informants. Key informants are a valuable source of information on all of the questions mentioned above and for understanding relations within the community and between the community and outside agencies (government, private sector and NGOs). Key informants are not only government officials, academics, religious leaders and donor agencies, but also representatives from worst-off groups and in general, anyone who has extensive knowledge on the questions being studied. Teenagers will be a principal source of information on why teenagers do, and do not, attend school. Key informants always present information from a particular perspective, so it is important to select a sample of informants who are likely to have different points of view to counterbalance each other. The use of triangulation is important when attempting to reconcile information obtained from different informants.
- Local experts. These are people who are likely to have more extensive and credible knowledge on topics such as health statistics, availability of public services, crime, school attendance and overall economic conditions. However, many experts may have their own perspectives and biases which must be taken into consideration. For example, the local police chief may be anxious to prove that crime rates have gone down since s/he was appointed (or that crime has increased if s/he is seeking support for a budget increase!).
- Focus groups. Groups of 5-8 people are selected to cover all the main groups of interest to a particular study. For example, each group might represent a particular kind of farmer or small business owner; poorer and better off women with children in primary school; men and women of different ages, and perhaps economic levels, who use public transport. While most focus groups select participants who come from the same category of interest group as the study, another strategy is to combine different kinds of people in the same group15.
Collecting and analyzing information to understand knowledge, attitude and practices
Knowledge, attitude and practices information on public services should be collected in relation to different groups:
- Worst-off groups. Information is needed on their understanding of the nature of health and other problems and the actions they must take to address these problems. Worst-off groups suffer from multiple problems so that taking actions, such as coming to a clinic or detox centre, or acquiring and using contraceptives, can be difficult and in some cases dangerous.
- Service delivery agencies. Information needs relate to their attitudes and how they interact with worst-off groups. A wide range of knowledge gaps and ingrained attitudes (including fear and distrust, and feelings of superiority) affect their adoption of open and supportive behavior.
- Policy-makers and planners. They often make assumptions about worst-off groups, the causes of their problems and how they will respond to the provision of services. Often attitudes are based on “factoids”: assumptions and bits of knowledge that are widely believed to be true, but which are often false or only partially true.
The information on attitudes and beliefs, and the evaluation of the effectiveness of different interventions in changing them, can be collected through Knowledge, Attitude and Practice (KAP) studies, using the following questions:
- Knowledge: was information about the intervention disseminated? Did it reach all groups of the target population, including worst-off groups, and was it understood?
- Attitudes: what did people, including worst-off groups, think about the new programmes or information? Did they agree with it or not?
- Behavior (Practice): did they change their behavior? If they agreed with the information/programme did they adopt it? Was it properly implemented? If they did not adopt it, why was this: was it due to lack of access, to the attitudes or behavior of other household members, or to contextual factors?
Figure 8 illustrates the framework of a KAP study assessing the effectiveness of a campaign to introduce bed-nets to reduce malaria among pregnant women. The questions about practice are similar to the questions in the bottleneck analysis about demand for services and effective utilization.
The five steps in designing a KAP study are the following:
Step 1: Domain identification: defining the intervention, the knowledge to be communicated, the attitudes to be measured and the indicators of acceptance and use.
Step 2: Identifying the target audience: in the example of bed-nets it would be necessary to decide whether the campaign is just targeted at pregnant women, or also at other family members, or other members of the community (such as traditional birth attendants), and perhaps local health professionals.
Step 3: Defining the sampling methods:
- Defining the population to be sampled: the geographical areas and the target populations.
- Defining the sample selection procedures: often the population will be broken down into sub-groups, each with special interests or issues. If worst-off groups who are difficult to identify and to reach are targeted, special sampling procedures might be required, such as snowball sampling; quota sampling; multi-stage sampling; requesting assistance from key informants or group leaders; sociometric techniques; and identifying people in locations known to be frequented by the targeted worst-off groups.
Step 4: Defining the data collection procedures: ideally KAP studies should use a mixed-method data collection strategy combining the following types of quantitative and qualitative data collection methods:
- Sample surveys.
- Observation (participant or non-participant).
- Key informant interviews.
- Focus groups.
- Inclusion of questions in an omnibus survey questionnaire already planned.
- Project records.
- Secondary data sources such as reports and records from other agencies and previously conducted surveys.
Step 5: Analysis and reporting: this follows standard practices for survey analysis and reporting on focus groups, key informants and observation studies.
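As an illustration of the analysis step, the three KAP questions can be read as a cascade: of those surveyed, how many knew about the intervention; of those who knew, how many agreed with it; and of those who agreed, how many adopted it. The sketch below is a minimal example under stated assumptions: the respondent records, the field names (`knows`, `agrees`, `uses`) and the two group labels are hypothetical and do not come from any actual KAP survey.

```python
from collections import defaultdict

# Hypothetical survey records (group, knows, agrees, uses); all values are illustrative.
respondents = [
    {"group": g, "knows": k, "agrees": a, "uses": u}
    for g, k, a, u in [
        ("worst-off", True, True, False),
        ("worst-off", True, False, False),
        ("worst-off", False, False, False),
        ("worst-off", False, False, False),
        ("other", True, True, True),
        ("other", True, True, False),
        ("other", True, False, False),
        ("other", False, False, False),
    ]
]


def rate(numerator, denominator):
    """Return a proportion, or None when there is nobody in the denominator."""
    return numerator / denominator if denominator else None


def kap_cascade(rows):
    """Compute knowledge, attitude and practice rates for each group."""
    counts = defaultdict(lambda: {"n": 0, "knows": 0, "agrees": 0, "uses": 0})
    for r in rows:
        c = counts[r["group"]]
        c["n"] += 1
        if r["knows"]:
            c["knows"] += 1
            if r["agrees"]:
                c["agrees"] += 1
                if r["uses"]:
                    c["uses"] += 1
    return {
        group: {
            "knowledge": rate(c["knows"], c["n"]),      # share of all respondents
            "attitude": rate(c["agrees"], c["knows"]),  # share of those who knew
            "practice": rate(c["uses"], c["agrees"]),   # share of those who agreed
        }
        for group, c in counts.items()
    }


cascade = kap_cascade(respondents)
for group, rates in cascade.items():
    print(group, rates)
```

Comparing the cascade across groups shows at which stage worst-off groups drop out, which indicates whether the follow-up questions should focus on dissemination, attitudes or barriers to adoption.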
Collecting and analyzing information on the quality of services delivered and the satisfaction of citizens
Citizen report cards17
Citizen Report Cards are based on large surveys that typically cover a major urban area (the first study was conducted in Bangalore, India). The survey asks households which public service agencies (education, health, police, transport, water etc.) they have had to contact within the last 12 months to address a particular problem. For each agency they are asked: were they able to resolve their problem; how many visits were required; how were they treated by agency staff; did they have to pay bribes (if so how many and how much). Average ratings are calculated for each agency on each dimension. The surveys may be repeated (usually 2-3 years later) to measure changes in performance. Samples can be designed to over-sample worst-off populations (for example the Bangalore study included a separate stratum for slum dwellers). Studies can either cover all of the main public service agencies or they can just focus on a particular sector such as health or education.
Experience shows that the credibility and independence of the research agency is critical as a typical reaction of agencies is to claim that the findings are not representative and to challenge the professional competence or motives of the research agency. For the same reason it is important to have a sufficiently large sample to be able to disaggregate the data by different worst-off groups.
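The calculation of average ratings per agency, and their disaggregation by stratum, can be sketched as follows. This is a minimal illustration rather than the method used in the Bangalore study: the records, the field names and the 1-5 staff rating scale are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one row per household contact with an agency.
# Field names and the 1-5 staff rating scale are illustrative assumptions.
responses = [
    {"agency": "water", "stratum": "slum", "resolved": 0, "visits": 4, "staff_rating": 2, "paid_bribe": 1},
    {"agency": "water", "stratum": "non-slum", "resolved": 1, "visits": 2, "staff_rating": 4, "paid_bribe": 0},
    {"agency": "health", "stratum": "slum", "resolved": 1, "visits": 3, "staff_rating": 3, "paid_bribe": 1},
    {"agency": "health", "stratum": "non-slum", "resolved": 1, "visits": 1, "staff_rating": 5, "paid_bribe": 0},
]

DIMENSIONS = ("resolved", "visits", "staff_rating", "paid_bribe")


def report_card(rows, by=("agency",)):
    """Average each dimension within the groups defined by the `by` fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[field] for field in by)].append(row)
    return {
        key: {dim: mean(r[dim] for r in members) for dim in DIMENSIONS}
        for key, members in groups.items()
    }


overall = report_card(responses)
by_stratum = report_card(responses, by=("agency", "stratum"))

for key, scores in by_stratum.items():
    print(key, scores)
```

With real data each agency-by-stratum cell would need enough respondents for the averages to be meaningful, which is why the sample size and the over-sampling of worst-off populations matter.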
Carrying out cost-effectiveness studies to compare costs and results of alternative interventions
Cost-effectiveness analysis is a method for comparing both the costs and the results of different options for addressing particular goals. For the comparison to be valid, the criteria for measuring effectiveness must be similar across the options. Effectiveness estimates are based on the usual experimental, quasi-experimental or statistical designs. Cost estimates are based on a careful specification of required resources and their market values. Selecting the options with the greatest effectiveness per unit of cost will generally provide the largest overall impact for a given resource constraint. In performing a cost-effectiveness analysis, adequate scrutiny must be given to both the cost measurement and the estimation of outcomes (Levin 2005).
Cost-effectiveness analysis is used for comparing different services or delivery systems. It may involve a comparison between average costs of service delivery and costs for reaching special groups (e.g. worst-off groups), or it may involve comparisons between different delivery systems for reaching special groups. Some of the key elements in cost-effectiveness analysis include:
- Ensuring that the services to be compared are equivalent. For example, it is not possible to compare directly the cost-estimates for a malaria control programme run by an NGO and involving orientation sessions and follow-up home visits, in addition to the malaria treatment, with a government programme that only involves the handing out of bed-nets and tablets with no orientation or follow-up.
- Identifying all of the costs of the programmes being compared and ensuring that they are measured in an equivalent way, and that any hidden subsidies are identified and monetized. For example, some NGOs may obtain free services from volunteer doctors whereas the government programme includes the full cost of doctors. On the other hand an NGO may have to pay rent for their clinic whereas the government programme may be provided with the space in the local health centre at no charge.
- Ensuring that standard definitions are used to record the number of users. This is critical because the average (unit) cost is calculated by dividing the total cost by the number of people treated. So it is important to clarify, for example, whether a mother who brings her child for a check-up and is given malaria treatment by the nurse (even though she had not come for this purpose), is counted as a person who was treated for malaria prevention. In multi-service clinics, how this is defined can have a major effect on the average cost estimates.
- A final issue concerns the question of scaling-up. Many programmes start on a small scale and if they are considered successful it will then often be recommended that they should be replicated on a larger scale. However, it is difficult to estimate how scale-up will affect costs. While there may be economies of scale from working with a larger number of patients/clients, the larger organizational effort will require additional administrative staff, and perhaps more expensive computer systems. So care must be taken before assuming that, because a small programme is relatively inexpensive, the same will be true if the programme is replicated on a larger scale.
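The arithmetic behind these points can be sketched in a few lines. The example below is illustrative only: the two programmes, all figures, and the outcome measure (`cases_averted`) are hypothetical, and it assumes that the services compared have already been judged equivalent and that users are counted with the same definition in both programmes.

```python
from dataclasses import dataclass


@dataclass
class Programme:
    name: str
    recorded_cost: float     # costs that appear in the programme accounts
    hidden_subsidies: float  # monetized value of donated inputs (volunteer time, free space)
    people_treated: int      # counted with the same definition for every programme
    cases_averted: float     # effectiveness estimate taken from the evaluation design

    @property
    def total_cost(self) -> float:
        return self.recorded_cost + self.hidden_subsidies

    @property
    def unit_cost(self) -> float:
        """Average cost per person treated."""
        return self.total_cost / self.people_treated

    @property
    def cost_per_case_averted(self) -> float:
        """Cost-effectiveness ratio: lower means more effect per unit of cost."""
        return self.total_cost / self.cases_averted


# Hypothetical figures for two programmes delivering an equivalent service.
ngo = Programme("NGO programme", 40_000, 10_000, 1_000, 250)
gov = Programme("Government programme", 60_000, 0, 2_000, 240)

ranked = sorted([ngo, gov], key=lambda p: p.cost_per_case_averted)
for p in ranked:
    print(p.name, p.unit_cost, p.cost_per_case_averted)
```

In this made-up case the government programme has the lower unit cost, but the NGO programme has the lower cost per case averted once its hidden subsidies are included, which shows why both the cost measurement and the outcome estimate need scrutiny.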
Public expenditure tracking studies19
Public expenditure tracking studies (PETS) track the percentage of budget funds approved for front-line service delivery agencies, such as schools and health clinics, that actually reach these agencies. The studies are important because in some cases it has been found that less than 20% of approved budget funds actually reach the schools or clinics. The studies have proved to be an effective advocacy tool, mobilizing the media, public opinion and intended beneficiaries to pressure government to improve the delivery of funds. If data are available, it is also possible to track the proportion of funds that reach the programmes targeted at worst-off groups.
The studies involve a very careful review of disbursement procedures, combined with interviews with agency staff to track the flow of funds, to note the delay in transfer from one level to another, and the proportion of funds that get lost at each stage.
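A minimal sketch of the tracking calculation is shown below. The disbursement levels and the amounts are hypothetical; an actual study would take them from disbursement records and interviews with agency staff.

```python
# Hypothetical amounts recorded at each level of the disbursement chain.
flow = [
    ("approved budget", 1_000_000),
    ("released by ministry", 800_000),
    ("received by district office", 500_000),
    ("received by schools", 200_000),
]


def leakage_by_stage(stages):
    """For each level after the first, return a tuple of
    (level, share of the approved budget received, share lost since the previous level)."""
    approved = stages[0][1]
    previous = approved
    rows = []
    for name, amount in stages[1:]:
        rows.append((name, amount / approved, (previous - amount) / previous))
        previous = amount
    return rows


rows = leakage_by_stage(flow)
share_reaching_front_line = rows[-1][1]

for name, share_received, share_lost in rows:
    print(f"{name}: {share_received:.0%} of approved budget, {share_lost:.1%} lost at this stage")
```

Reporting the loss at each stage, rather than only the final share, indicates where in the chain the funds go missing. The same calculation could be repeated for the budget lines targeted at worst-off groups if they can be identified in the records.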
Public expenditure Benefit Incidence Analysis20
Benefit Incidence Analysis (BIA) estimates the effectiveness with which public expenditure in sectors such as health and education reaches worst-off groups. Normally the analysis focuses on access to services by income quintile as data is more readily available on these groups, and the analysis is rarely able to examine other dimensions of inequity (such as female-headed households, and families with physically or mentally disabled children). The analysis requires three types of data:
- Government spending on a service (net of any cost recovery fees, out of pocket expenses by users of the service or user fees);
- Public utilization of the service; and
- The socioeconomic characteristics of the population using the service.
The analysis can either be used at one point in time or it can be repeated to assess the effects of new legislation or external factors, such as a financial crisis, on expenditure incidence. BIA has been used extensively in the preparation of national Poverty Reduction Strategy Papers (PRSPs) but it could have other applications and is a potentially useful tool for Equity-focused evaluations. Ideally BIA should be considered as one of several tools used for Equity-focused evaluations, with its weaknesses in data on aspects such as quality and utilization by different household members being complemented with techniques such as bottleneck analysis or KAP studies.
BIA assesses the proportion of health or education expenditure that benefits particular groups of users, such as households in each income quintile. Two limitations of BIA for Equity-focused evaluation, and particularly for focusing on children, are that data is normally not available on the quality of services, and that data is normally only available at the level of the household, so that it is not possible to examine access by different household members. This is critical for Equity-focused evaluation as there will often be differences in the frequency with which boy and girl children are taken to the health clinic, or the frequency with which women and men use services.
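The basic calculation combining the three types of data can be sketched as follows. The figures are hypothetical, and the sketch makes a simplifying assumption that is not stated in the text: every user of the service is taken to receive the same unit subsidy, regardless of quintile or service quality.

```python
# Hypothetical inputs for a single service (for example, primary health care).
net_public_spending = 1_000_000  # government spending net of cost recovery
users_by_quintile = {
    "Q1 (poorest)": 100,
    "Q2": 150,
    "Q3": 200,
    "Q4": 250,
    "Q5 (richest)": 300,
}


def benefit_incidence(net_spending, users):
    """Allocate net spending to quintiles in proportion to their use of the service.

    Simplifying assumption: the unit subsidy is identical for every user.
    """
    total_users = sum(users.values())
    unit_subsidy = net_spending / total_users
    benefits = {q: n * unit_subsidy for q, n in users.items()}
    shares = {q: b / net_spending for q, b in benefits.items()}
    return unit_subsidy, benefits, shares


unit_subsidy, benefits, shares = benefit_incidence(net_public_spending, users_by_quintile)
for quintile, share in shares.items():
    print(f"{quintile}: {share:.0%} of net public spending")
```

In this made-up example the poorest quintile receives a tenth of the spending and the richest three tenths. Because the inputs are household-level utilization counts, the calculation says nothing about which household members used the service or about its quality, which are the two limitations noted above.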
5.1 Selecting the appropriate evaluation framework
A. Theory-based evaluation
- Using Programme Theory to Evaluate Complicated and Complex Aspects of Interventions (http://www.rismes.it/pdf/rogers_complex.pdf)
B. The bottleneck analysis framework
- Marginal Budgeting for Bottlenecks
- Health service coverage and its evaluation by T. Tanahashi
- Highlights on Marginal Budgeting for Bottlenecks (MBB), UNICEF
5.2 Selecting the appropriate evaluation design
A. Mixed methods designs
- QUANT and QUAL Approaches to Different Stages of the Evaluation Process, by Bamberger
- Examples of evaluation designs at each point on the QUANT-Mixed-QUAL continuum, by Bamberger
- How QUANT and QUAL approaches complement each other at different stages of an evaluation, by Bamberger
C. Equity-focused evaluation at the policy level
- Thinking Systemically: Seeing from Simple to Complex in Impact Evaluation (http://www.3ieimpact.org/userfiles/file/5_1 Rogers Cairo Final.ppt)
- Beyond Logframe; Using Systems Concepts in Evaluation
- Concept Mapping Resource Guide on the Internet
- Using Concept Mapping in Evaluation
- Using concept mapping to evaluate equity-focused policy interventions, by Bamberger
- Case study illustrating the evaluation of a complex program : Evaluation of World Bank support for gender and development 2002-08, by Bamberger
- “Unpacking” complex policy interventions, by Bamberger
D. Equity-focused evaluations at the project and program levels
- Comparing Probability, Purposive and Mixed Methods Sampling Strategies, by Bamberger
- Expanded list of experimental, quasi-experimental and non-experimental evaluation design options, by Bamberger
- Non-experimental designs (NEDs), by Bamberger
- Seven Basic Impact Evaluation Design Frameworks, by Bamberger
5.3 Collecting and analyzing data
Collecting and analysing qualitative information to understand Knowledge, Attitude and Practices (KAP)
- A Guide to Developing Knowledge, Attitude And Practice Surveys (http://www.stoptb.org/assets/documents/resources/publications/acsm/ACSM_KAP%20GUIDE.pdf)
Collecting and analyzing information on the quality of services delivered and satisfaction of citizens
Citizen report cards
- Citizens’ Report Cards On Public Services: Bangalore, India (http://siteresources.worldbank.org/INTEMPOWERMENT/Resources/14832_Bangalore-web.pdf)
- Improving Local Governance and Service Delivery: Citizen Report Card Learning Tool Kit (http://www.citizenreportcard.com/crc/pdf/manual.pdf)
- Improving Local Governance and Pro-Poor Service Delivery – Citizen Report Card Learning Toolkit (http://www.citizenreportcard.com)
- An assessment of the impact of Bangalore citizen report cards on the performance of public agencies (http://go.worldbank.org/8BWVVNK7N0)
Carrying out cost-effectiveness studies to compare costs and results of alternative interventions
- Using Cost-Effectiveness Analysis for Setting Health Priorities (http://dcp2.org/file/150/DCPP-CostEffectiveness.pdf)
- Guide to Analyzing the Cost-Effectiveness of Community Public Health Prevention Approaches (http://aspe.hhs.gov/health/reports/06/cphpa/report.pdf)
- WHO Guide to Cost-Effectiveness Analysis (http://www.who.int/choice/publications/p_2003_generalised_cea.pdf)
Public expenditure tracking (PETS) studies
- Public Expenditure Tracking Surveys – Application in Uganda, Tanzania, Ghana and Honduras (http://siteresources.worldbank.org/INTEMPOWERMENT/Resources/15109_PETS_Case_Study.pdf)
- Public Expenditure Tracking and Facility Surveys: A General Note on Methodology (http://siteresources.worldbank.org/INTPCENG/1143380-1116506243290/20511062/exptrack.pdf)
- Public Expenditure Tracking Survey in Education (http://www.unesco.org/iiep/PDF/pubs/Reinikka.pdf)
- Public Expenditure Performance in Rwanda: Evidence from a Public Expenditure Tracking Study in the Health and Education Sectors (http://www.worldbank.org/afr/wps/wp45.htm)
- PETS Education Tanzania (http://www.ncg.no/index.asp?id=34843)
Public expenditure Benefit Incidence Analysis (BIA)
- Impact of Government Budgets on Poverty and Gender Equality (http://data.undp.org.in/hdrc/GndrInitv/wrkgppr/Impact of goverment budgets on poverty and gender equality.pdf)
- Integrating Gender into Benefit Incidence and Demand Analysis (http://www.cfnpp.cornell.edu/images/wp167.pdf)
- Benefit incidence analysis in developing countries (http://ideas.repec.org/p/wbk/wbrwps/1015.html)
- Marginal Benefit Incidence Analysis of Public Spending in Nigeria (http://www.pep-net.org/fileadmin/medias/pdf/files_events/8th-PEPmeeting2010-Dakar/papers/Reuben_Adelou_Alabi.pdf)
1 Sue Funnell and Patricia Rogers (2011) Purposeful Program Theory. Jossey-Bass Publications.
2 Examples of indicators of adequacy of utilization include: does the pregnant mother sleep under the bed-net provided through the programme? Does the under-nourished child receive the entire intended nutritional supplement or is it shared with siblings? Do women and children receive the medical services free of charge (as intended) or do they have to pay out-of-pocket to health centre staff?
3 Examples of artifacts are: photographs and religious symbols in houses, clothing styles, posters and graffiti and different kinds of written documents.
4 An example of conversion of a QUAL indicator into a QUANT variable would be when a contextual analysis describes the status of the local economy. This may be described in words. This can be converted into a dummy variable (Localeconomy) where: Economy is growing = 1; economy is not growing = 0.
5 The effect size is the average difference in the change score in the outcome indicator between the treatment and comparison groups. In pre-test/post-test comparison group design the change score is the difference in the mean pre-test/post-test scores for the project and comparison groups. In a post-test comparison group design the change score is the difference between the means of the two groups. If a single group design is being used in which the project group is compared with an estimate for the total population, the change score is the difference between the mean of the project group and that of the total population. Ideally the difference should be divided by the standard deviation of the change to estimate the size of the standardized effect.
6 Publications, planning documents, and meeting minutes of national planning agencies or line ministries can provide a useful source of information on how these agencies perceive the contribution of different partners. For example, if a donor believes it had a major influence on policy reform it can be interesting to see whether there are references to this in documents such as the Five Year Plan.
7 John Mayne 2008. Contribution analysis: An approach to exploring cause and effect. Rome: Institutional Learning and Change Initiative, ILAC Brief No. 16. May 2008. http://www.cgiar-ilac.org/files/publications/briefs/ILAC_Brief16_contrib...
8 This section is based on Williams, B. (2005), Systems and Systems Thinking in Mathison, S. (editor) Sage Encyclopedia of Evaluation pp. 405-412. For examples of how these approaches are applied in practice see Williams, B. and Imam, I. (Eds.), (2007), Systems Concepts in Evaluation: An expert anthology. American Evaluation Association.
9 Very often the excluded regions or areas are poorer, or government agencies have more limited administrative capacity (often due to more limited resources), so there will often be systematic differences between them and the areas where the policies are being implemented – limiting their validity as a comparison area.
10 See Albania: Evaluating the impact of social assistance on reducing child poverty and social exclusion
11 When the quality of secondary data permits, it is possible to use techniques such as social exclusion analysis and multidimensional poverty analysis to identify vulnerability in terms of a much wider range of indicators.
12 Wilkinson and Pickett (2009), The Spirit Level; Hills, Le Grand and Piachaud (2001), Understanding social exclusion; and the UNDP Human Development Index provide examples of the wide range of secondary data sources that are now available for assessing the causes and consequences of vulnerability.
13 A quasi-experimental design (QED) is an evaluation design in which project beneficiaries are either (a) self-selected (only people who know about the project and choose to apply participate) or (b) selected by the project agency or a local government agency in the location where the project will be implemented. In either case the project beneficiaries are not an unbiased sample of the total target population and in most cases the people who enter are likely to have a higher probability of success than the typical target population member. The evaluator then tries to select a comparison group sample that matches the project beneficiaries as closely as possible.
14 For a detailed description of the participant observation approach see Salmen, L (1987) Listen to the People: Evaluation of Development Projects. Salmen lived for 6 months in low-income urban communities in Bolivia and Ecuador to experience the first World Bank low-cost urban housing programmes in the same way as community residents. By living in the community and winning the trust of residents he was able to discover many critical facts that previous evaluation studies had failed to capture. For example, he found that there was a very large undocumented renter population who became worse off as a result of the project, but neither previous researchers nor project management were aware of their existence as they hid whenever outsiders came to the community as they were afraid they would be evicted.
15 In one assessment of women’s access to rural health centres it was observed that women using traditional dress seemed to be treated less well than women in western dress.
16 In a study of the reasons why female college students did not use public transport in Lima, Peru, some focus groups were conducted with homogeneous groups, such as college-age girls, teenage boys, mothers etc., while others mixed teenage boys and girls and adult men and women. Attitudes towards condoning sexual harassment on buses were very different in single-sex and mixed groups.
17 For an example of a citizen report card study see Bamberger, MacKay and Ooi (2005), Influential Evaluations: detailed case studies. Case study No. 3 Using Citizen Report Cards to Hold the State to Account in Bangalore, India. Operations Evaluation Department. The World Bank. Available at: www.worldbank.org/oed/ecd
18 Much of the work on cost-effectiveness has been conducted in the areas of education and health. For a good overview of cost-effectiveness methods see Levin, H and McEwan, P. (2011), Cost-Effectiveness Analysis: Methods and Applications. Second Edition. Most of the examples are drawn from education but it provides a good introduction to the general principles. For an introduction to the application of cost-effectiveness in health see Muennig, P. (2008), Cost-effectiveness analysis in health: A practical approach. Wiley Publications.
19 For an example of a PETS study applied to education in Uganda see Bamberger and Ooi eds., (2005), Influential Evaluations: Detailed case studies. Case 7 Improving the delivery of primary education services in Uganda through public expenditure tracking surveys. Independent Evaluation Group. World Bank. Available at www.worldbank.org/oed/ecd
20 For an introduction to BIA see Davoodi, Tiongson and Asawanuchit (2003), How useful are benefit incidence analyses of public education and health spending?