POD 2009 Innovation Award Application

The item below is a nice synthesis of our thinking on the last 18 months of work in the Harvesting Gradebook. It was developed as a Professional and Organizational Development Network in Higher Education (POD) Award application for the 2009 competition.

Author Contact Information
Gary Brown, Theron DesRosier, Jayme Jacobson, Corinna Lo, Nils Peterson
Center for Teaching Learning and Technology
Washington State University
Pullman, WA 99164

Innovation Award Description
Harvesting Gradebook

Category of Innovation Award
Teaching and Learning

A grade book is traditionally a one-way reporting mechanism: it reports to students their performance as assessed by the instructor. This model assumes and implies that students learn primarily from the professor and the professor’s grade.

The Center for Teaching, Learning, & Technology (CTLT) at WSU has developed, implemented, and assessed an enriched gradebook that affords multiple stakeholders the opportunity to assess student work and to provide quantitative and qualitative feedback to students and faculty.

Project Description

Nationally, efforts to integrate active learning and critical thinking, and to assess their outcomes, have proven elusive. Faculty experience and resources for providing students with feedback that contains multiple perspectives have been limited.

In our work with faculty at a land-grant research institution, faculty tend to assume that undergraduates do not have the wherewithal to engage in constructive peer and self-assessment. Many doubt that rich feedback about authentic problems can be offered in more than a few select courses.

This innovation counters these assumptions.  At heart is the WSU Guide to Critical and Integrative Thinking (CITR), an internationally recognized instrument and assessment process.  We have previously shown that students appreciate and can provide rich peer assessments with the CITR, and that those assessments mirror judgments by faculty. We have further shown that online tools help facilitate the process.

In this phase of the innovation we extend the process to a distributed audience of students, peers, faculty in the program, and industry professionals around the globe. The resulting feedback and ratings from each of these groups provide invaluable insight and a rich resource that breaks down the barrier between educational practice and the “real world.”  For instance, reviewers provided insights into:
1. their perception of the value of the rubric’s dimensions
2. the changes in the employability of the students based on the work

Scope and Results
We have piloted this Harvesting Gradebook technique in an undergraduate market forecasting class and a similar Honors class. In the forecasting class, student teams received ratings and textual feedback from peers, faculty in the program, and industry professionals at mid-term and again at end of term.

We observed that:

CTLT is presently scaling up its capacity to offer this grading/feedback approach University-wide, and we are generalizing the idea of Harvesting Feedback to apply it to the Learning Outcomes aspects of WSU’s University Accreditation activities.

This process is an effective and efficient way to gather rich feedback for students working on authentic problems. Students can work in any space on the Internet and post in their workspace a link to an online rubric for their purposes.

They can also send a request for feedback through email. Faculty can easily recruit industry participants, and because the rating process is fast, professionals can contribute without significant time cost. Results can be centrally harvested and reported to students and faculty in real time.

Crowd sourcing to support learning at all levels of the university

We are developing a response here to the Chronicle article “Duke Professor Uses ‘Crowdsourcing’ to Grade” by Erica Hendry.

In our response (which for some reason did not appear as a comment to the Chronicle article but is reproduced in Cathy’s blog) we offered a survey that implements Gary Brown’s Harvesting Gradebook concept. Erica’s article is the object of the review; Cathy’s criteria are the basis of the instrument.

The demonstration we whipped up is a variant of an earlier demonstration of harvesting feedback for programmatic assessment that we did in a webinar hosted by the TLT Group. The link is to David Eubanks’s insightful review of the demo.

The basic concept is to link a survey to an object on the Internet and invite feedback from viewers, with criteria more elaborate than “it’s hot / it’s not.” The student gets a visualization of the quantitative data and a listing of the qualitative comments.
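As a rough sketch of how such harvested feedback might be aggregated (the data shapes, dimension names, and reviewer groups below are illustrative, not the actual survey tool’s), the quantitative ratings can be summarized per rubric dimension and per reviewer group, while the qualitative comments are simply collected:

```python
from statistics import mean

# Illustrative responses to an online rubric. Each reviewer supplies a
# rating per rubric dimension (a 1-6 scale is assumed here) plus a comment.
responses = [
    {"group": "peer",         "ratings": {"problem_identification": 4, "evidence": 3},
     "comment": "Cite your sources."},
    {"group": "faculty",      "ratings": {"problem_identification": 5, "evidence": 4},
     "comment": "Strong framing."},
    {"group": "professional", "ratings": {"problem_identification": 3, "evidence": 4},
     "comment": "Consider the market context."},
]

def harvest(responses):
    """Summarize quantitative ratings by dimension and reviewer group,
    and collect the qualitative comments."""
    groups = {r["group"] for r in responses}
    dims = {d for r in responses for d in r["ratings"]}
    summary = {
        d: {g: mean(r["ratings"][d] for r in responses if r["group"] == g)
            for g in groups}
        for d in dims
    }
    comments = [(r["group"], r["comment"]) for r in responses]
    return summary, comments

summary, comments = harvest(responses)
```

A real implementation would also track which artifact each response reviews; the point is only that per-dimension, per-group means and a comment list are cheap to compute once responses are centrally harvested.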

If you have not tried it already, read Erica’s article and review it here. The end of the review will take you to a page with the results (it’s not real time; we’ll update it periodically).

Some of the angst in the comments to the Chronicle article seems to come from the energy surrounding high-stakes grading activities, and perhaps a recognition that grading does not advance the student’s learning (nor the faculty member’s).

A grade book is traditionally a one-way reporting mechanism: it reports to students their performance as assessed by the instructor who designed the activity. Learning from grades in this impoverished but pervasive model is largely one way: the student learns, presumably, from the professor’s grade. What does a student really learn from a letter or number grade? What does the faculty member learn from this transaction that will help him or her improve? What does a program or institution learn? We are exploring ways to do better.

We are exploring ways to learn from grading by transforming the gradebook, and part of that transformation is to allow others into the process. For example, Alex Halavais rates his experiment with crowdsourced grading as “revise and resubmit.” In Halavais’s example, students gamed the system, competing for points. The approach we are exploring has a couple of key differences. First, the scales we advocate (such as WSU’s Critical Thinking Rubric) are absolute (we expect that graduating seniors have not reached the top level; faculty map the rubric scale to grades in age-appropriate ways). Second, we imagine outsiders providing feedback, not just peers. When we ran a pilot in an Apparel Merchandising capstone course in the Fall of 2008, a group of industry professionals, faculty from the program, and students were all involved in the assessment. The results were rich in formative feedback and let the students see how their work would be judged by other faculty in the program and by the people who might hire them.

Further, we have called this a “transformative” approach, because the data can be used by the student, by the instructor, by the program, and by the university, each for purposes of improving on practice.

Getting started with transformative assessment university-wide

Moving a university from a compliance mode of accreditation assessment to a transformative mode is a complex task, yet it is the task brought on by the changing requirements of accrediting bodies. To get there, the university (viewed as a collection of learners) needs some scaffolding and an easy place to get started.

Background to the problem

Washington State University is accredited by the Northwest Commission on Colleges and Universities (NWCCU). The NWCCU is engaged in a process to review its standards. The process includes drafting new standards, converting the review from a decennial calendar to a septennial review schedule, and adding a new catalog of report types that institutions must produce.

Regarding these new standards, NWCCU says:

“Standard Three requires evidence of strategic institutional planning that guides planning for the institution as a whole as well as planning for its core themes. Much like the current accreditation model, Standard Four requires assessment of effectiveness and use of results for improvement. However, unlike the current accreditation model, assessments are conducted with respect to the institution’s core themes, rather than its major functions.” [emphasis added]

It goes on to say:

“Goals and intended outcomes, with assessable indicators of achievement, are identified and published for [the institution’s] mission and for each of [its] core themes…

“A core theme is a major concentration, focus, or area of emphasis within the institution’s mission. Examples include, but are not limited to: Developmental education; workforce preparation; lower division transfer education; baccalaureate education; graduate preparation for professional practice; graduate preparation for scholarship and research; service; spiritual formation; student life; preservation of values and culture; personal enrichment; continuing education; academic scholarship; and research to discover, expand, or apply knowledge.”

We can assume that WSU will develop several core themes related to student learning, such as “undergraduate preparation for advanced study and professional careers” and “graduate and professional preparation for scholarship and research.”

NWCCU’s calendar would appear to require the University to provide a Year 1 report in 2011 that answers these points:

Section II: Core Themes

For each Core Theme: [Maximum of three (3) pages per theme]

a. Descriptive Title
b. Goals or Intended Outcomes for the Core Theme
c. Indicators of Achievement of the Core Theme’s Goals or Intended Outcomes
d. Rationale as to Why the Indicators are Assessable and Meaningful Measures of Achievement of the Core Theme’s Goals or Intended Outcomes

For the WSU core themes tied to student learning outcomes, the university will need assessments conducted with respect to the goals of its themes – implemented in ways that can be “rolled up” from program to college to university levels. WSU may elect to use something like its Six Learning Goals as the intended outcomes for its core themes that deal with student learning.

The Problem Statement

The challenge is to help programs move toward having and using indicators of achievement of WSU’s chosen outcomes, and to do so in a way that helps programs and colleges use the data to improve, rather than just performing a compliance activity (that is, developing a transformative approach to their assessment). The further challenge is to accomplish this in a resource-constrained environment.

Theron and I have been looking for a place where all ~100 undergraduate programs at WSU could start working toward meeting these requirements in time for the 2011 delivery date of a Year 1 report.

What we describe here is based around WSU’s goals, but the concept will likely work equally well for another campus with a different set of institutional learning goals.

The Strategy

Figure 1 is our whiteboard of the concept. It starts with the idea of collecting sample assignments from each program (along with some metadata about each assignment) in order to provide feedback about the assignments to instructors and the program from their community of “critical friends.”  This process is intended to provide us with baseline information, in two forms: an assessment practice that programs can build on, and data about the university’s teaching practices, such as the types of assignments used in programs and the kinds of feedback (beyond grades) that the assignments can provide to learners, along with some demographics about the assignments. (If programs want to do something else, or something more, they can; see below.)

Figure 1. A brainstormed diagram of the kinds of metadata that would need to be collected with each assignment and the graphical analysis that could be done with the data.

Here are some benefits that we see:

  • A simple message to communicate.

“Give us 3 assignments; we’ll help you get feedback to improve the assignments AND to meet NWCCU requirements.” (Provost to deans, deans to chairs, chairs to faculty, CTLT staff to the WSU community.) A simple message is less likely to get confused and become a “telephone game” nightmare.

  • A manageable and understandable process.

Every program has assignments. They are easily collected and can be assessed online using Harvesting techniques. It’s not everything that might be included in program-level outcomes assessment, but it’s a starting point.

  • Feedback from communities.

Specifically, communities important to faculty and instructors.

  • A common reference point.

With most programs doing the same thing, WSU can develop opportunities for shared models, resources and interdisciplinary partnerships.

  • Feedback from a broad range of stakeholders.

Assignments are artifacts that impact many people (downstream faculty, community, industry).  Those stakeholders can answer questions such as, “Does this assignment prepare students for your course? For your workplace? For life outside of the university? How would you improve it to meet your context?”

WSU’s Center for Teaching, Learning and Technology has previously done work to map assignments to WSU’s Six Learning Goals using this proposed form (developed with the WSU Honors College), and in this case study with an academic program.

Or the mapping to WSU’s goals could be implemented more indirectly by scoring the assignment with WSU’s Critical and Integrative Thinking Rubric and then mapping the rubric to WSU’s six goals. If the program already has a rubric that it has been using, that rubric could be incorporated into the process and mapped to the WSU goals (see Figure 2).
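A minimal sketch of such a mapping, with entirely hypothetical dimension and goal names (not WSU’s or UI’s actual goals): each rubric dimension points to the institutional goals it evidences, so if a university revises its goals, only the map changes, not the rubric or the accumulated scores.

```python
# Hypothetical maps from a program rubric's dimensions to two different
# sets of institutional goals. One dimension may feed several goals.
rubric_to_wsu = {
    "critical_thinking": ["wsu_goal_1", "wsu_goal_3"],
    "communication":     ["wsu_goal_2"],
}
rubric_to_ui = {
    "critical_thinking": ["ui_goal_1"],
    "communication":     ["ui_goal_4", "ui_goal_5"],
}

def map_scores(rubric_scores, goal_map):
    """Average rubric-dimension scores into each institutional goal
    that the dimension maps to."""
    totals, counts = {}, {}
    for dim, score in rubric_scores.items():
        for goal in goal_map.get(dim, []):
            totals[goal] = totals.get(goal, 0.0) + score
            counts[goal] = counts.get(goal, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

scores = {"critical_thinking": 4.0, "communication": 5.0}
print(map_scores(scores, rubric_to_wsu))
# {'wsu_goal_1': 4.0, 'wsu_goal_3': 4.0, 'wsu_goal_2': 5.0}
```

The same `scores` run through `rubric_to_ui` yield the UI-goal view, which is how one joint program could report against both universities’ goals simultaneously.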

Figure 2. Diagram of the mapping process from the Food Science rubric to the WSU Six Goals and the University of Idaho (UI) five goals. If the University changes its goals, the mapping can be readily changed, as illustrated by this joint WSU-UI program, which is mapping its rubric to WSU's 6 goals and UI's 5 goals simultaneously.

The rating itself could be managed in a manner similar to the one we demonstrated for harvesting feedback on an assignment (Figure 3). Unlike that demonstration, in the pilot year of this proposed assessment plan, programs might only be asked to provide sample assignments and not the associated student work. In subsequent years, programs may elect to use the full harvesting model, or may elect to use other assessments to triangulate this approach.

Figure 3. Harvesting Feedback process gives results to the instructor (about the assignment) and to the program (about the assignments in aggregate and about the utility of the rubric used).

Summary of the Process

1. Assessment for improving, not just proving:

  1. Academic programs deposit 3 sample assignments and fill in a form describing the courses where the assignments are given.
  2. The assignments are scored with a rubric (based on the WSU Goals, the Critical Thinking Rubric, or a rubric that the program already uses to assess student work; the program can choose the rubric). The program can help recruit the assessors from communities that have an interest in the program’s success.
  3. The rubric is mapped to WSU’s chosen goals so the results can be rolled up from Program to College to University levels.
  4. Programs examine the feedback from the assessments, engage in conversations about it and choose action plans for the coming year; instructors can get specific feedback about their assignment to engage in SOTL or other process improvement.
  5. The whole process is assessed with a transformative assessment rubric to judge the quality of the assessment activity and suggest refinements in the assessment practice.

2. Or, if a program already has an assessment procedure (perhaps required by another accrediting body):

  1. The results of that assessment can be submitted, along with a mapping of the relevant data to WSU’s Goals.
  2. The assessment process is assessed, as above, to judge the quality of the assessment activity and to benchmark the program against all other WSU programs.

By using the Harvesting process to implement both the assessment of the assignment and the assessment of the assessments, it should be fairly simple to gather input about the program from a community that is relevant to, and interested in, the academic program’s success and to document the success of the assessment efforts.
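The “rolled up from Program to College to University” idea above can be sketched as a simple aggregation; all of the program names, college names, and scores below are hypothetical:

```python
from statistics import mean

# Hypothetical goal-level scores reported by programs, grouped by college.
programs = {
    "Food Science":   {"college": "CAHNRS", "goals": {"goal_1": 4.2, "goal_2": 3.8}},
    "Apparel Merch.": {"college": "CAHNRS", "goals": {"goal_1": 3.9, "goal_2": 4.1}},
    "History":        {"college": "CAS",    "goals": {"goal_1": 4.5, "goal_2": 4.0}},
}

def roll_up(programs):
    """Average each goal across the programs in a college, then across
    colleges for a university-level view of the same indicators."""
    by_college = {}
    for info in programs.values():
        by_college.setdefault(info["college"], []).append(info["goals"])
    college = {
        c: {g: mean(p[g] for p in plist) for g in plist[0]}
        for c, plist in by_college.items()
    }
    university = {
        g: mean(scores[g] for scores in college.values())
        for g in next(iter(college.values()))
    }
    return college, university

college, university = roll_up(programs)
```

Each level sees the same indicators at its own grain, which is what lets a program-chosen rubric, once mapped to the university’s goals, serve the college and university reports without extra data collection.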

Once programs have begun to engage in the discussions that we think these activities will trigger, they may elect to proceed in many directions, including: broadening the scope of their assessment activities, refining the assignments or the criteria for assessing them, expanding the communities giving feedback, or assessing student work along with the assignments to gauge how effective the assignments really are in impacting student learning.


We have proposed that programs get started toward the new accreditation standards by harvesting assessments on their assignments, but this is not the only place to put a toe in the water. Syllabi or student work would be other places to start. Ultimately, we think programs should have direct measures of both student learning and faculty learning, and be able to talk about action plans related to improving that learning and/or improving the assessment of that learning.

Crowd-sourcing feedback

David Eubanks commented on our recent Harvesting Feedback demo. I’ll save replying about inter-rater reliability and focus now on his suggestion of using Mechanical Turk and his very insightful comment about the end of “enclosed garden” portfolios.

I think David correctly infers that Mechanical Turk is a potential mechanism to crowd-source the Harvesting Feedback process we are demonstrating. It’s an Amazon marketplace to broker human expertise. The tasks, “HITs” (Human Intelligence Tasks), are ones that are not well suited to machine intelligence; in fact, the site bills itself as “artificial artificial intelligence.”

To explore Mechanical Turk, I joined as a “Worker” and discovered that “Requesters” (sources of HITs) can pre-qualify Workers with competency exams. I’m now qualified as a “‘Headshot’ Image Qualifier,” a skill for identifying images that meet certain specific criteria important to requester Fred Graver. I also learned that workers earn (or maintain) a HIT approval rate, which is a measure of how well the worker has performed on past tasks. One might think of this as how well the worker is normed with the criteria of the task, though the criteria in this case are not explicit (which is a weakness in our view). Being qualified for a task might be analogous to initiation into a community of practice; but one would then need to practice “in community,” which Mechanical Turk does not seem to support.

We’ve also been exploring a couple of other crowd-sourced feedback sites that help flesh out the character of this landscape: Slashdot and Leaf Trombone (website and video). Slashdot is a technology-related news website that features user-submitted, editor-evaluated current-affairs news with a “nerdy” slant. Leaf Trombone is a game that lets you use your iPhone to play a slide trombone for a world audience.

The three systems are summarized in this table:

| | Mechanical Turk | Leaf Trombone | Slashdot |
| --- | --- | --- | --- |
| Goal of site / developer’s reason for using reputation | Distributed processing of non-computable tasks / sort for suitable workers | Selling an iPhone app / use ego to encourage players | Building a reliable source of information / screen for editors who can take on high-level tasks |
| Type of reputation / participant’s purpose for having a good reputation | Private reputation / to secure future employment and earn more income | Public reputation / status in the community as player and judge; ongoing participation | Public reputation / enhanced opportunity to contribute to the common good (as opposed to being seen as a clever fellow) |
| Type of reward / motivation for participant | Money / personal gain | Personal access to perform on a world stage / learning and fun | “Karma” to enable future roles in the community / improve the information in the community |
| Performance space / durability of the performance | Private space (enclosed garden) / durability is unknown; access to the performance is available only to the Requester | Public stage, synchronous / a new playback feature makes performances durable, but private to the artist | Public stage, asynchronous / permanent performance visible to a public audience |
| Kind of feedback to the participant / durability of the feedback | Binary (yes/no) per piece of work completed / assessments are accumulated as a lifetime “approval rate” score | Rating scale and text comment per performance / assessments are stored for the performer | Rating scale per posting / assessments are durable and public for individual items and are accumulated into the user’s “Karma” level |
| Assessment to improve the system | Could be implemented by an individual Requester if desired | ? | High-“Karma” users engage in meta-assessments of the assessors |
| Kind of learning represented | Traditional: an employer authority sets a task and is the arbiter of success; the goal is to weed out unsuccessful workers | Synchronous, collaborative individual learning: judge as learner, performer as learner | Asynchronous, collaborative community learning |
| Type of crowd-sourcing | Industrial model applied to a crowd of workers | Ad hoc judges gathered as needed for a performance | Content and expertise are openly shared |

The three systems represent an interesting spectrum, and each might be applied to our challenge of crowd-sourcing feedback. But, given their different models, each would have a very different impact on the process. I believe that only Slashdot’s model could be sustained by a community over an extended period of time, because it is the only one that has the potential to inform the community and build capital for all the participants.

The table above got me thinking about another table we made, diagramming four models for delivering higher education. On one side of the chart is the industrial, closed, traditional institution. It progresses through MIT’s OpenCourseWare and Western Governors University’s student-collected content and credit for work experience to the other end of the chart, which we called Community-based Learning.

Three rows in our chart addressed the nature of access to expertise, the assessment criteria, and what happens to student work. The table above informs my thinking on those dimensions. As I’ve charted it, in the Slashdot model expertise is open and assessment is open (while the assessment criteria are obscure, the meta-assessment helps the community maintain a collective sense of the criteria), and the contributor’s (learner’s) work remains permanently as a contribution to the community. This is what I think David is referring to when he applauds the demise of the “enclosed garden” portfolio.

A reason to work in public is to take advantage of an open-source / crowd-wisdom strategy. David illustrated the power of “we smarter than me” when he called our attention to Mechanical Turk.

Another reason is the low cost to implement the model. Recently the UN Global Alliance for Information and Communication Technology and Development (GAID) announced the newly formed University of the People, a non-profit institution offering higher education to the masses. In the press briefing, University of the People founder Shai Reshef said that “this University opened the gate to these [economically disenfranchised] people to continue their studies from home and at minimal cost by using open-source technology, open course materials, e-learning methods and peer-to-peer teaching.” [emphasis added]

We propose that to be successful the University of the People must implement its peer-to-peer teaching as community-based learning and include a community-centric, non-monetary mechanism to crowd-source both assessment and credentialing.