Wednesday, October 8, 2014

The role of evaluation in higher education

This post is the second in a trilogy of reflections on my own teaching activity at the University of São Paulo (USP). Here I want to discuss evaluation, in the sense of the evaluation we perform on our students to "measure" their learning.

The reader should be aware that in my country evaluation is practically a synonym for (written) tests and grades. At most levels of education we adopt a numerical scale that runs from 0 to 10. At USP, the student must attain 5.0 in a discipline to be considered "approved" (otherwise he, or she, is considered "reproved" and has to attend the lectures again in another semester).

This grade is typically assessed by written tests applied during the semester, which, at least in mathematics, physics and engineering, consist of a set of (traditionally 3 or 4) problems that the student has to solve, normally within 100 minutes and without consulting colleagues, written material or personal notes. Each of these problems has a standard "value". The professor corrects each problem, attributing this value or a partial one (if the student didn't succeed completely), and the "total value" of the test is the sum of the values attributed to the problems. The final grade is calculated as some average of the individual tests. A student who doesn't attain 5.0 in a given discipline has, in most cases, to repeat it, meaning attending the lectures and taking the tests a second time, a third time, until he gets his 5.0.
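To make the arithmetic concrete, here is a minimal sketch of the grading scheme just described. The function names, the problem values and the choice of a simple arithmetic mean for the final grade are my assumptions for illustration; each professor is free to choose the averaging rule.

```python
from statistics import mean

PASSING_GRADE = 5.0  # approval threshold at USP, as stated above


def score_test(awarded, values):
    """Total value of one test: the sum of the points awarded per problem.

    `awarded` holds the (possibly partial) points given to each problem and
    `values` the full value of each problem. Both lists are hypothetical inputs.
    """
    if len(awarded) != len(values):
        raise ValueError("one awarded score is needed per problem")
    for points, full in zip(awarded, values):
        if not 0.0 <= points <= full:
            raise ValueError("awarded points must lie between 0 and the problem value")
    return sum(awarded)


def final_grade(test_grades):
    """Final grade as a simple arithmetic mean (an assumption, not a USP rule)."""
    return mean(test_grades)


def is_approved(grade):
    """A student is approved when the final grade reaches the threshold."""
    return grade >= PASSING_GRADE


# Example: a test with four problems worth 2.5 points each.
first_test = score_test([2.5, 1.0, 2.5, 0.0], [2.5, 2.5, 2.5, 2.5])  # 6.0
grade = final_grade([first_test, 4.0, 5.0])                          # 5.0
```

In this example the student fails one of the three tests and is still approved, with exactly 5.0.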

I do not like this system. In my opinion it mostly assesses the student's capacity to remember something, or his (or her) capacity to work under stress. Of course, both things are worth assessing, but not every time. In some conditions, however, tests are unavoidable. We have disciplines with up to 70 students at USP, and a personalized subjective evaluation is very difficult in this context.

The numerical system is pernicious. Its effect goes beyond the individual disciplines. For example, at USP we calculate two kinds of "average grades", called "clean" and "dirty". They are weighted averages of, respectively, the approved disciplines only or all the disciplines the student attended (the weight of each discipline is its number of credits, meaning lecture hours). These averages are used, for example, to select students for our international exchange programs.
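The difference between the two averages is easy to see in a small sketch. The record layout, the discipline names and the example grades below are invented for illustration, and I assume that every attempt at a discipline counts as a separate record in the "dirty" average.

```python
from dataclasses import dataclass

PASSING_GRADE = 5.0  # approval threshold at USP


@dataclass
class Record:
    discipline: str  # hypothetical discipline name
    credits: int     # weight factor (lecture hours)
    grade: float     # final grade on the 0 to 10 scale


def weighted_average(records):
    """Credit-weighted mean of the grades, or None if there are no credits."""
    total_credits = sum(r.credits for r in records)
    if total_credits == 0:
        return None
    return sum(r.credits * r.grade for r in records) / total_credits


def clean_average(records):
    """Weighted average over the approved disciplines only."""
    return weighted_average([r for r in records if r.grade >= PASSING_GRADE])


def dirty_average(records):
    """Weighted average over every discipline attended, failures included."""
    return weighted_average(records)


# Invented transcript: one failed discipline out of three.
transcript = [
    Record("Calculus", credits=6, grade=8.0),
    Record("Physics", credits=4, grade=3.0),
    Record("Materials", credits=2, grade=6.0),
]

print(clean_average(transcript))  # 7.5
print(dirty_average(transcript))  # 6.0
```

A single failure in a discipline with many credits pulls the "dirty" average down by a point and a half here, which is why the choice between the two matters when students are ranked.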

I try to be fair in the tests I write. I believe my tests are constructed so that a student who has reached the minimal goals of the discipline gets a 5.0. I try to cover all the contents with my questions, but in many cases the student solves only part of the exercises, "aiming" at 5.0. Of course, I get disappointed and they usually fail. My tests are not made to be only partially solved. I detected, however, a problem at the other end of the spectrum: the excellent student. It is hellishly difficult to reach 10.0 in an individual test, let alone in three or four. I can count on my fingers the students who got a 10 in my tests, and more than 400 students have passed through my classes in these 13 years.

One could ask why this system is so prevalent. My answer is: because it is convenient.

It is convenient to the professor, in two ways. First, it may be easy to correct. Particularly in the natural sciences, the problems have a numerical answer. If the student obtained that value, you can consider the problem correctly solved. Second, the correction is objective: either the student succeeded or not. The professor does not need to deal with the (messy) task of subjectively assessing the student's performance. It is also convenient to the student, since he knows what to expect from the evaluation and can prepare for it. For example, the student memorizes everything the day before the test, so that the subject is still in his memory when he needs it.

I try to reduce the impact of the written tests in my disciplines, or even to remove them completely. In one discipline, which usually gets more students per class, I use the (balanced) test, but I grant up to 2.0 points for what I call the "participation grade". I evaluate it subjectively based on several criteria but, as I always tell my students, one of the criteria is whether I can remember the face of the student at the end of the semester. I will not show the hubris of assuming I always give the fairest evaluation, but at least I typically get one or two students each year (out of 40) with a grade of 10.

In a second discipline, which typically receives fewer students each year (it is an "optional" discipline, meaning the student chose to be there and was not forced), I evaluate the students in two ways: with online tests, in which they have access to written material and personal notes, and with group activities, in which I evaluate the participation of the student and not the learned content. I realized that the content comes automatically with the group activities, so participating in them is enough to learn it.

Sometimes I also run into trouble with the bureaucracy because of this. Once I had to renew the accreditation of an advanced discipline with the post-graduate review board. I stated in the form that I was practicing "continuous evaluation", meaning that I apply no special evaluation method: I evaluate the students on site, in real time, during the lectures. They are highly motivated students, mostly experienced researchers or engineers who return to the university to get a master's or a doctoral degree, so I have no need to apply any evaluation method. It is amazing how much they learn after I tell them this. The review board returned my forms, saying this was forbidden and that I had to introduce some formal evaluation method. I returned the form stating I would apply a test. I just didn't explain that my test starts in the first minute of the first lecture and ends in the last minute of the last lecture.

To conclude, I would like to share with you what I learned from my disciplines:


  1. Written tests should be reduced to a minimum or, if possible, removed altogether. If they are unavoidable, they have to be fair according to the faculty rules.
  2. Qualitative evaluation should not be feared. The professor will be wrong from time to time, but on questions like "should the student be approved or not", he will probably be right all the time.
  3. One should not fight bureaucracy. If your faculty requires you to give a numerical evaluation of your students, do it, but be sure you are truly evaluating the student's performance according to the rules of the faculty (in particular, in my faculty a student who learned the minimal contents should be rewarded with a 5.0, and a student who reached 100% of the content, and they do exist, should receive a 10).
  4. A professor should not be afraid to be qualitative in the evaluation. He may not be fair all the time, but he will get it right most of the time, and the students will respond by learning what he has to teach and not looking only at the grades.