Program evaluation is the systematic application of social science research tools to the analysis of the impact of public sector programs. Canada was the second country to get on the program evaluation bandwagon in the mid-1960s, following the lead of the federal government in the United States, which had embarked on the same trend a few years earlier. Social scientists working for Canada’s federal government evaluated selected government programs from the mid-1960s to the late 1970s, but it was not until around 1980 that new computer technology made data analysis for the systematic evaluation of programs widely available. In 1981, the Canadian Evaluation Society was formed, and before long it had nearly two thousand members, most of whom were involved in conducting or sponsoring evaluations of government programs.
Program evaluation promised major improvements in the effectiveness and efficiency of government programs, but by about 1990, the bloom was off the rose and a general skepticism set in about the usefulness of program evaluation techniques. Today, there is a new realism in the public sector about program evaluation. There is recognition that given the proper conditions, program evaluation can be an important addition to the arsenal of accountability mechanisms, along with the traditional audit, scrutiny by parliamentary committees, and critical probing by the opposition parties.
During the past seven years, it appears that there has been an explosion of interest in the utilization of program evaluation techniques in Europe and other parts of the world, as reflected by the birth of new evaluation societies. The European Evaluation Society was formed in 1994, and in the same year, the United Kingdom Evaluation Society appeared. Evaluation societies were started in Australia in 1991, France in 1996, Germany, Italy and Switzerland in 1997, and Malaysia in 1999. Recently, evaluation societies have been formed in Russia, Belgium, Ethiopia, Finland, Ghana, Israel and Kenya. No doubt, their members will have the same innocent enthusiasm that Canadian and American evaluators started out with, but it is hoped that these new societies can take advantage of some of the lessons learned by their North American counterparts.
I will begin by summarizing the development of program evaluation in Canada, and then will review some of the problems that the movement has encountered. I will conclude with some recommendations that I hope might lead to a more steady success rate for program evaluation in the future.
Origins of Program Evaluation in Canada
In 1867, the year of Canada’s creation, the Treasury Board was established as a committee of cabinet responsible for scrutinizing the expenditure budget prior to its approval by Parliament. Deputy Ministers could be grilled at Treasury Board meetings, or later on by officials in the Treasury Board Secretariat, about the efficiency and effectiveness of their operations. In the 1870s, the office of the Auditor General was created, and until the 1960s it relied on traditional audit techniques such as the financial audit and compliance audit to promote accountability and prevent fraud and waste. The quality of the civil service began to be upgraded during the first two decades of the 20th century through the creation and refinement of the Public Service Commission. However, it was not until the 1960s that dissatisfaction with the limits of the traditional audit techniques surfaced in a serious way.
There were two reasons for this change in attitude. First, it was becoming evident that the phenomenal growth of the public service after the Second World War was making it difficult to ensure the accountability of new and expensive government programs, and so there was a demand both from politicians and senior administrators for a review of the effectiveness of these programs using the newly-developed social science techniques of investigation. Second, from 1960 until 1973, Canada had an outspoken Auditor General, Maxwell Henderson, who began to comment publicly on the effectiveness -- or lack thereof -- of government programs in his annual reports. Thanks to Henderson’s style, the Auditor General’s reports, for the first time, became newsworthy items, and sometimes they proved embarrassing to the government. In 1962, the Auditor General’s office announced that it would go beyond the traditional audit approach, and use new techniques to try to measure the effectiveness of programs in a broader way. In 1969, the government of Prime Minister Pierre Trudeau tried to rein in the new initiatives of the Auditor General, but the legislation intended to do this was shelved because of public opposition.
Another significant development in the early 1960s was the report of a Royal Commission on Government Organization, which emphasized the need for better accountability mechanisms. In 1966, the Treasury Board directed government agencies to monitor and evaluate their activities, and in 1968, Planning and Programming Budgeting (PPB) was mandated, two years after its debut in the U.S. PPB made it much easier to identify programs as distinct entities and to know most of the costs associated with them. This change opened up more possibilities for program evaluation because it became easier to identify the purpose, scope and cost of particular programs. Similarly, PPB opened up more possibilities for the use of cost-benefit analysis as an evaluation tool.
During the 1970s, the emphasis on the need for evaluation of programs continued, spurred on by the institution of an operational performance measurement system across the federal public service in 1973, and the creation of the Office of the Comptroller General (OCG) in 1978. Within the OCG was a program evaluation branch intended to ensure a more rational expenditure budget process. In 1976, the federal government mandated all departments to implement performance measurement, to undertake periodic evaluations of their programs, and to put a process in place to accomplish this within four years. A year later, the Treasury Board issued its first formal program evaluation policy, setting out the expectations of evaluations and mandating all departments and agencies to establish three to five year cycles for evaluating all of their programs.
A Royal Commission on Financial Management and Accountability, which reported in 1979, recommended the more widespread institutionalization of program evaluation, and the improvement of program evaluation techniques. Because of the appearance of a new generation of mainframe and mini-computers around the same time, and the creation of new computer programs that could handle data entered with a keyboard and screen rather than punch cards, the opportunities for sophisticated and relatively inexpensive program evaluations increased enormously. As well, new versions of computer programs like SPSS and SAS that could provide inexpensive and more user-friendly access to data analysis made it possible for smaller agencies to adopt evaluation techniques, and for consultants to conduct more comprehensive evaluations at less cost. The general institutionalization of program evaluation was somewhat slow to materialize, however. The Treasury Board Secretariat discovered in 1981 that out of 58 federal agencies and departments, only 12 had adequate program evaluation procedures in place.
During the early 1960s, the newly-developed technique of cost-benefit analysis came to the attention of officials in the Treasury Board, and this approach was strongly recommended as a key tool for program evaluators. In 1976, the Treasury Board published a comprehensive Benefit-Cost Analysis Guide, including how to choose the appropriate discount rate and how to present a net present value profile, and this guide was made widely available to program evaluators and the academic community.
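The two elements of the Guide mentioned above, the discount rate and the net present value profile, can be illustrated with a short sketch. It is not taken from the Treasury Board Guide; the function names, the cash flows and the candidate discount rates are all hypothetical, chosen only to show how the net present value of the same program changes as the discount rate changes.

```python
def net_present_value(net_benefits, discount_rate):
    """Discount a stream of yearly net benefits (benefits minus costs) back to year 0.

    net_benefits[0] is the amount in year 0, net_benefits[1] in year 1, and so on.
    """
    return sum(b / (1 + discount_rate) ** t for t, b in enumerate(net_benefits))


def npv_profile(net_benefits, rates):
    """Return (rate, NPV) pairs: the net present value at each candidate rate."""
    return [(r, net_present_value(net_benefits, r)) for r in rates]


# Hypothetical program: 1,000 spent up front, then 300 of net benefit per year
# for four years. All figures are illustrative assumptions.
flows = [-1000, 300, 300, 300, 300]

for rate, npv in npv_profile(flows, [0.0, 0.05, 0.10]):
    print(f"discount rate {rate:.0%}: NPV = {npv:.2f}")
```

With these illustrative figures the program shows a positive net present value at a 5 per cent discount rate and a negative one at 10 per cent, which is why the choice of discount rate receives so much attention in cost-benefit work.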
Program evaluation began to be used in the provincial and municipal public services as well as the federal public service, but these other orders of government did not institutionalize the same systematic demands for the evaluation of all programs that the federal government had. The frequency of use of program evaluation by these governments was therefore much more dependent upon the judgment of individual managers about whether the conduct of program evaluations would be beneficial. Often, this judgment depended on a manager’s knowledge of evaluation techniques, which among the older generations of managers was minimal.
Along with the increased demand for program evaluations in the early 1980s came the development of textbooks, journals, and academic courses. The new literature on program evaluation advocated the greater use of standardized approaches than had occurred in the past. For example, the experience with program evaluations in the 1970s indicated that a successful evaluation depends upon a clear and generally agreed-upon set of program goals and objectives. Because so many government programs are the result of input from diverse sources and political compromise, the goals and objectives of programs are sometimes not clear enough for an effective evaluation to take place. As well, even where program goals are clear, some programs have not been in existence long enough to have had a fair chance to achieve their goals. Therefore, it became standard to begin most evaluations by conducting an “evaluability assessment” to determine whether a program was evaluable, prior to the conduct of a full-scale evaluation.
The literature suggested that full-scale evaluations ought to be classified as either summative (i.e. designed to determine whether a program should continue or be scrapped) or formative (i.e. designed primarily to recommend improvements to a program), and highlighted a variety of standard types of research designs which could be selected to fit the evaluation of particular programs. The literature generally supported the involvement of stakeholders in formulating the evaluation research design, and contracting with the program sponsors in order to ensure that they had bought into the process and that the process was clear. Because of early advances made in program evaluations in education, health and psychology, much of the literature of the early 1980s recommended procedures that were developed in these disciplines.
The growth of the practice of program evaluation led to the establishment of a number of program evaluation consulting firms, as well as a plethora of individual consulting practices (often started by former public servants), and the development of program evaluation services in the consulting wings of the major public audit firms. Government expenditures on program evaluation contracts increased significantly during the early 1980s.
The first half of the 1980s therefore represented the first golden age of program evaluation in Canada. The creation of the Canadian Evaluation Society in 1981 and its remarkable growth in membership testified to the fact that program evaluation, often referred to as an exciting new social science, had come of age.
Problems Encountered by the Program Evaluation Movement
In 1985, the year-old federal Conservative Government, which was bent on reducing the cost of government, created a Task Force on Program Review known as the Nielsen Task Force. It was expected that the results of all the program evaluations conducted during the early 1980s would play a major role in the recommendations of the Task Force. I attended a meeting in Ottawa in 1986 where deputy ministers were reporting on the procedures followed by the Nielsen Task Force and the process of cutting budgets that followed it. I asked how the results of program evaluations had impacted budget-cutting decisions. I was told that in the end, not much. The cabinet had decided that programs needed to be cut more or less equitably across the regions, also taking into account the relative strength of the Conservative party in each region. Therefore, some programs shown by evaluations to be excellent and effective in Ontario were cut, and some programs evaluated as failures in the West, Quebec and the Atlantic Region, where the Conservatives happened to have most of their support, were continued. Clearly, program evaluation was not working as the theory intended it to.
But the Nielsen Task Force marked the beginning of a new skepticism about program evaluation in more ways than one. The Task Force’s consultations across federal departments indicated a general distrust of program evaluation. The report of the Task Force stated bluntly that “many study teams reported that routine government program evaluations were generally useless and inadequate.” The Task Force concluded that program evaluations tended to be self-serving because they were usually sponsored by deputy ministers who had a vested interest in the outcome, and because they tended to be formative rather than summative. The criticism of program evaluation was coming not only from the conservative side of the political spectrum. David Zussman, who supervised the transition from the Conservative to the Liberal government in 1993, had written that evaluations have tended to result in changes to official forms and program rationales, but have not generally resulted in improving the programs themselves.
Several years ago, I asked a former Deputy Minister of Social Services in Alberta to address my program evaluation class at York University. He was a psychologist by profession, and had been involved in nearly 200 program evaluations at various points in his career, either as a member of the group being evaluated, as the evaluator, or as the sponsor of an evaluation. He knew a great deal about program evaluation, and was a proponent of effective evaluation. However, of the nearly 200 evaluations that he had had something to do with, he said that only a handful had been successful in that they had resulted in significant improvements to programs. Two or three more, he said, were partially successful. The rest, in his opinion, had been a waste of time and money. Why did he think that program evaluations tended to be so ineffective?
Most importantly, he cited the failure of most evaluators to ask the right questions, in other words, to ask questions that addressed the key problems in the program being evaluated. This failure is often related to the lack of knowledge that evaluators have about the programs they are evaluating. He said that good evaluators need to spend up to 80 per cent of their time checking and re-checking with the program sponsors to ensure that they are addressing the most important issues in an effective way.
Second, he cited political interference. This was not necessarily partisan interference, but rather interference by managers with a vested interest in the outcome of the evaluation. They may try to influence the results of the evaluation while it is being conducted, or may try to discredit the results if they do not agree with them.
Third, there is the resistance to evaluation that evaluators often encounter from those working in a program. Evaluators are often suspected of having an agenda that is hostile to the program being evaluated. As well, at times the lack of knowledge of evaluators about the program they are evaluating destroys their credibility in the early stages, and this interferes with the ability of the evaluators to obtain the cooperation they need to conduct the evaluation.
Finally, even if a good evaluation is done, its recommendations are often relegated to the back burner of the relevant manager’s agenda unless there has been prior agreement to implement the recommendations of the evaluation, and to report on steps taken toward implementation. Had there been adequate follow-through, the former deputy minister thought that perhaps another 30 to 40 evaluations might have been successful.
Müller-Clemm and Barnes, an evaluation consultant and a psychologist, conducted a comprehensive review of federal program evaluation experience in 1997. They suggested similar reasons for the failure of program evaluations to live up to the high expectations placed on them in the early 1980s, including general resistance to evaluation activities among program staff, and failure to address the right questions. In addition, they cited the lack of a generally accepted theory about the purpose and practice of evaluation, and the failure of many evaluators to recognize that programs do not serve homogeneous groups, thus requiring different tools of analysis for the different target groups of particular programs.
From the mid-1980s to the early 1990s, cost-benefit analysis also lost its gloss as an evaluation tool. Cost-benefit analysis had become too technical, and therefore cost-benefit reports tended to be incomprehensible except to the economists who specialized in the art. Thus, both program managers and officials in the Treasury Board tended to distrust a cost-benefit analysis unless a second cost-benefit analysis, conducted independently from the first, produced the same results. As a result, cost-benefit analysis became excessively expensive and time-consuming. As well, it became evident that cost-benefit analysis tended to over-simplify program issues, and to leave out of the calculation important benefits that could not be measured in monetary terms. In the early 1990s, the World Bank conducted a retrospective study of cost-benefit studies it had commissioned, and discovered that only about a third had come close to predicting future costs and benefits accurately.
The history of public administration is based on the assumption that there are solutions to every problem, and we experiment with potential solutions to problems until we find those that work in an acceptable way. Assuming that program evaluation is a potentially useful technique, therefore, it is a question of which potential solutions to the problems of evaluation outlined earlier to try first.
But a prior question is whether program evaluation is, indeed, a useful enough technique that attempts to revive it are justified. After all, we could return to an era of seat-of-the-pants judgments about the utility of government programs, which are at least kept free from fraud and unacceptable levels of waste through the more traditional accounting techniques. My view is that program evaluation is worth reviving because in a democracy, citizens have a right to expect not just absence of fraud and waste, but also quality government services. My preferred definition of accountability is “the ability to demonstrate quality.” Program evaluation, where applied intelligently and judiciously, can significantly increase the quality of public programs.
Following are some possible solutions to the current shortcomings of program evaluation. One of the most effective solutions I know of to the problem of asking the wrong questions, as well as the problem of the resistance that evaluators encounter, is to encourage one or more soft evaluations of a program prior to a hard evaluation. A “soft” evaluation is one that is developed and conducted by those working in the delivery of a program, whereas a “hard” evaluation is one that is carried out by independent persons who have no vested interest in the program, in order to enhance the objectivity of the evaluation. Soft evaluations are often frowned upon by methodological purists because of their possible lack of objectivity, but they can nevertheless serve a useful function. They can introduce program workers and other stakeholders in a program to the techniques of program evaluation, and this involvement can go far to overcoming suspicions about the evaluation process. As a result of my experience in the Alberta public service, I learned that soft evaluations can often result in program workers learning that some parts of their programs are working better than they had expected. The soft evaluations allow them to improve other parts of their programs, and the result is often a feeling of pride in the program. Sometimes the confidence and pride among program staff generated by the findings of soft evaluations can even result in program staff requesting a hard evaluation. (At times, program staff anticipate that the results of a hard evaluation of their program are likely to be so positive that their program will stand a better chance of obtaining the financial resources needed for adequate program delivery.) 
As well, the soft evaluations can make the stakeholders aware of the information that will be needed for evaluation purposes, and therefore when a hard evaluation eventually takes place, the raw data necessary for it will already exist, and there will be less resistance to making it available.
Another way of encouraging a greater degree of trust in hard evaluations is to encourage the development of a more stable pool of evaluators so that after a period of time, they can become quite familiar with the programs they are evaluating, just as financial auditors become familiar with the inner workings of their clients after being engaged by them for several years in succession. Another benefit of this familiarity would be that the evaluators would be more likely to recognize the multifaceted nature of the target groups of most programs, and therefore less likely to treat them as an undifferentiated, homogeneous whole.
During the 1990s, those conducting cost-benefit analyses have attempted to make the procedure more user-friendly and useful. For example, they have made greater efforts to write their reports in plain language, and to include different sets of probabilities about the costs or benefits of various factors so that the users of cost-benefit studies are able to apply their own best judgments to the results. As well, they are attempting to treat benefits that cannot be measured in monetary form with more respect. The Canadian Treasury Board released a new Benefit-Cost Analysis Guide four years ago that reflected these changes, and cost-benefit analysis appears to be experiencing a revival in its credibility.
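The idea of presenting different sets of probabilities, so that readers can apply their own judgment, might look something like the following sketch. It is an illustration only, not a method drawn from the Treasury Board Guide: the three scenarios, their cash flows, the two sets of probability weights and the 5 per cent discount rate are all hypothetical.

```python
def npv(net_benefits, discount_rate):
    """Net present value of yearly net benefits, with year 0 first."""
    return sum(b / (1 + discount_rate) ** t for t, b in enumerate(net_benefits))


def expected_npv(scenarios, weights, discount_rate):
    """Probability-weighted NPV across scenarios; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(weights[name] * npv(flows, discount_rate)
               for name, flows in scenarios.items())


# Hypothetical scenarios for one program: same up-front cost, different benefits.
scenarios = {
    "pessimistic": [-1000, 200, 200, 200, 200],
    "expected":    [-1000, 300, 300, 300, 300],
    "optimistic":  [-1000, 400, 400, 400, 400],
}

# Two alternative sets of probabilities that a reader of the report might hold.
cautious = {"pessimistic": 0.5, "expected": 0.4, "optimistic": 0.1}
confident = {"pessimistic": 0.1, "expected": 0.4, "optimistic": 0.5}

for label, weights in [("cautious", cautious), ("confident", confident)]:
    print(f"{label}: expected NPV = {expected_npv(scenarios, weights, 0.05):.2f}")
```

Reporting the result under more than one set of weights lets a program manager see how far the conclusion depends on the analyst's assumptions, rather than receiving a single figure to accept or reject.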
Since the mid-1980s, leading program evaluators in Canada have advocated written contracts between the program evaluation sponsors and evaluators which require the continued involvement of the evaluators for a year or two after an evaluation. This continued involvement is to ensure that the recommendations of the evaluation will be taken seriously, and that appropriate remedial steps are taken by the program where appropriate. Private evaluation firms of repute that are in high demand are in a position not to take on any clients who will not agree to including such a follow-up clause in the contract. However, those who work in program evaluation within departments, or in the Treasury Board Secretariat, may not be in a position to demand such follow-up unless it is stipulated by Treasury Board policy.
Although there will always be attempts by those with vested interests to skew the results of evaluations, political interference in the process can be minimized in part through stressing the need for objectivity in individual evaluation contracts, and in part through stipulating in the various public service codes of conduct the importance of objectivity in evaluation.
Very possibly, the problem of the lack of a generally accepted theory about the purpose and practice of evaluation might best be tackled by encouraging the development of a self-regulating evaluation profession. For example, an institute for program evaluation might be created, and such an institute could not only ensure that its members have the requisite knowledge of program evaluation techniques, but it could also develop a comprehensive set of generally accepted evaluation principles somewhat analogous to the standards created by associations of internal auditors. As well, a credible institute might be able to use its clout to protect its members from political interference in the evaluation process, and to help to enforce the need for follow-up to ensure that the recommendations of evaluations are taken seriously and are implemented where appropriate.
Finally, we need better educated managers in government who understand both the potential usefulness and the potential problems in program evaluation. Unfortunately, the new public management movement, which claims that good public sector managers require an education in business administration but not public administration, has resulted in there being senior managers in federal and provincial government departments who do not have a basic knowledge of program evaluation. Without that basic knowledge, these managers are not in a position to require appropriate evaluations, or to use the results of evaluations intelligently. Fortunately, leading proponents of public administration in Canada, such as Ian Macdonald, have been working tirelessly to try to ensure that MBA graduates who go on to work for government will obtain a sound basic education in program evaluation and public administration before they leave the university.
Clearly, program evaluation has a number of challenges to overcome before it can become generally accepted as a useful accountability mechanism. Because good public sector managers thrive on meeting challenges, program evaluation may achieve an increased profile as a valuable accountability mechanism -- depending on the extent to which governments are able to recruit and retain good managers.