Once upon a time, I taught a MOOC. Actually it was twice, and really it was only four years ago. It was a course called Metadata: Organizing and Discovering Information, created for the University of North Carolina at Chapel Hill, and delivered through Coursera. Since I taught this course last, however, Coursera announced the closure of their old platform: as of 30 June 2016, courses on the old platform were no longer available — including mine. Coursera suggested downloading materials from courses in which one had enrolled; some third parties developed guidelines for batch downloading Coursera courses. Coursera had been developing a new platform for courses since 2014; any new courses launched in this new platform, or old courses migrated over, remain available.
The possibility that hundreds or even thousands of courses have now disappeared should be enough to cause anyone concern over the fate of online course content. My MOOC went offline because the instructor (me) was no longer available to teach it (as I have since left Carolina), and the institution did not migrate it. “Traditional” courses may be orphaned when there is no instructor available to teach them; it seems likely that MOOCs are orphaned just as often. Because let’s face it: instructors leave institutions, they retire, they die, they move into administration. There are any number of reasons why an instructor may no longer be available to teach an online course that he or she created. In the long run, we will all no longer be available to teach the courses we created. What should become of these courses?
Historically, at the end of a semester or a quarter, what became of a course was…nothing. Most courses simply vanished. In the benighted days before learning management systems were invented, very little of a course would persist. To be sure, there are examples of course lectures surviving long after a course was concluded: The Feynman Lectures on Physics at Caltech, Nabokov’s lectures on literature at Cornell and Harvard, and Foucault’s lectures at the Collège de France leap to mind. These, however, are rare exceptions, made even more exceptional by the knowledge on the part of both the lecturers and the institutions that these should be captured. Those of us less famous than Feynman, Nabokov, and Foucault generally have our lecture notes, but those are rarely, if ever, made public. Likewise with notes taken in class by the students, handouts distributed in class, and most other artifacts of the course.
Nowadays, nearly all institutions of higher education have deployed learning management systems. Many courses in higher education are blended courses, to some extent or other. There are several models of blended learning, but all make use of technology platforms of some type: LMSs, repositories of course content, online assessments and exercises, etc. The online components of a course are artifacts, and as such can persist after the end of the course. Most course artifacts can easily be reused by instructors from one semester to the next. While these artifacts persist, however, they are still rarely made public. Many institutions turn off or unpublish courses in their LMS after the end of a semester. Students may have downloaded and kept artifacts from a course, instructors may copy artifacts from one course site to another, but the course itself disappears from the web. And often courses in LMSs were never really on the open web anyway, requiring institutional affiliation and credentials to access even when they were available.
A MOOC does not require institutional affiliation, which of course is part of what makes it open; enrollment in a MOOC is open to anyone, anywhere (at least in theory). A MOOC does, however, require a login to access course content, like a course in an LMS. Also like courses in an LMS, many MOOCs are unpublished after their conclusion. Artifacts of a MOOC may persist, and components may be reused from one offering to the next. But for the most part, like “traditional” courses, a single iteration of a MOOC typically disappears from the web when it is over — and this is especially true for those from the major MOOC providers.
The availability of a MOOC in the short term is nearly a non-issue: like a “traditional” course in an LMS, it either vanishes entirely or is offered again. If it is to be offered again, its availability becomes a middle-term problem: sustainability. Sustainability is the bugbear of most, if not all, technology initiatives in higher ed. Even those whose job involves thinking about this problem apparently don’t have any idea what a sustainability model for MOOCs might look like.
Coursera and edX have emerged as the major MOOC providers. While these two organizations have quite different business models — edX is a nonprofit, for starters, while Coursera is for-profit and backed by venture capital — their policies and practices around what happens when a course is completed are fairly similar. Both platforms allow courses to be offered as self-paced: as of this writing, 1,009 edX courses are listed as self-paced. (There seems to be no way to filter Coursera’s offerings for only self-paced courses.) These courses often have minimal instructor involvement. Some institutions disable the discussion forums, for lack of an instructor to moderate them. Some institutions keep the discussion forums open, but enlist what Coursera calls “Mentors” to moderate them: students who completed the course previously and are recruited to be essentially TAs, but managed by Coursera instead of the institution. What resources the institution expends on self-paced courses may directly affect students’ experience of the course. This is an issue that both edX and Coursera explicitly leave up to the institution.
Both edX and Coursera also explicitly leave ownership of the intellectual property that is the course itself up to the institution. As already discussed, it is an affordance of MOOCs (and of online courses generally) that all components of a course are artifacts. Whether or not those artifacts are preserved after the end of the course is up to the instructor and the institution. But as artifacts, these course materials are subject to intellectual property law, in a way that the more ephemeral components of a “traditional” course (such as discussions) are not. Yet discussions, too, become artifacts in an online course. Who owns this material?
Institutions could assert intellectual property rights over the course materials created by faculty: under U.S. copyright law, course materials fall under the definition of a work for hire, as “a work prepared by an employee within the scope of his or her employment.” Institutions could assert these rights, but they rarely do. A “teacher’s exception” has long been customary, even if not actually written into U.S. copyright law, allowing for the free sharing of syllabi and other course materials (even if not explicitly licensed under a Creative Commons or other license). An institution could choose to ignore the teacher’s exception, of course, but it would be a spectacular waste of political capital for a provost to take such a position, and the faculty senate would likely have a conniption. Furthermore, an institution that asserts IP rights runs the risk of chilling this sharing of course materials, which would make it difficult to standardize the teaching of a course across different sections, even within the same department. Many institutions therefore adopt institutional copyright policies specifying that faculty own the copyrights to their own course materials.
These institutional copyright policies cover materials created by the faculty member: syllabi, assignment descriptions, exams, handouts, and the like. They do not, however, cover the emergent components of a course, such as discussions…primarily because such things have not traditionally been subject to copyright. Indeed, such things have not traditionally been things at all; in a “traditional” face-to-face course, discussions are ephemeral. But in a MOOC (as in any online course), discussions are artifacts, and therefore are subject to copyright. Who owns this material?
Both edX and Coursera have similar copyright policies in their Terms of Service, which specify that posting content grants to edX or to Coursera the right to use and make derivative works from this content, but that the user retains all other copyrights to this content. In other words, instructors own their course materials, and students own their assignment deliverables and discussion posts. The student-produced content for a MOOC (as for any course, online or face-to-face) is of course generally tied to the one specific iteration of the course in which the student is enrolled. Thus, student-produced content generally disappears from the web when a single iteration of a course is over. Whether the institution backs up this content and stores it over the long term is up to the institution and the institution’s IT unit. For self-paced courses where the discussion forums are open, however, student-produced content may continue to be available on the web more or less indefinitely. What IP rights an institution may exert on this student-produced content — archived or live — is an issue that has not yet been addressed publicly.
A MOOC offered as self-study or on-demand requires support and resources, which implies an institutional commitment to sustaining the MOOCs that the institution has developed. Assuming an institution is willing to commit some level of resources, and the institution’s agreement with the provider allows it, a MOOC could be offered as a self-paced course forever. Just because it could, however, doesn’t mean it will. As organizations go, universities are extremely long-lived…but technology companies not so much. It is unclear what the longevity will be of organizations like MOOC providers that partake of both. The partner institutions of both edX and Coursera own their own intellectual property, which very sensibly separates the content from the dissemination platform. But once we start questioning the longevity of the platform, we must also question the long-term availability of the content disseminated on it, as the recent Coursera announcement makes plain.
Archivists have been thinking about the issue of long-term availability of online content since the advent of the web. In order to get everyone else to think about it too, the deliberately rather alarming phrase “Digital Dark Ages” was coined: the possibility that the obsolescence of file formats, the ease of deleting digital files, the extensive use of dynamic content online, and simply the vanishing of online services, may lead to a gap in the historical record about the current moment. This is the long-term problem of the availability of a MOOC. Here, it’s possible to look to a large body of work on preservation of digital resources generally.
Digital preservation is “the subject of managing digital resources over time and the issues in sustaining access to them.” In other words: There is no such thing as benign neglect. Simply leaving MOOC content in place — simply leaving any online content in place — without further action ensures that it will eventually become unusable. While it’s true that even an entire MOOC requires a fairly trivial amount of disk space (mine, for example, requires a sum total of less than half a gigabyte of storage), that disk will fail eventually. All IT organizations recognize this, and have backup systems in place to manage this risk.
Yet even if an institution’s IT infrastructure were so robust that it could guarantee the integrity of digital files for ever and ever amen, another issue would eventually present itself: technological change. All file formats will eventually be superseded. It’s not a question of if the video formats used in my MOOC will become obsolete, it’s a question of when. The Internet Archive has very successfully addressed this type of digital preservation, via emulation of the environment necessary to run a piece of software (as, for example, in the Console Living Room). The necessity of digital preservation is thus a strong argument for making both the platform and the course content open source. To date, Open edX is the only open source platform for developing and hosting MOOCs, thus likely ensuring that while edX courses may disappear from the web, they will not disappear from history. Using open source software and open standards is the surest way of ensuring the preservation of digital content.
In 1983, Atari dumped thousands of unsold video game cartridges in a landfill. By 2013, the Internet Archive was treating these same video games as an historical collection. How long will it be before my MOOC is so dated that I’ll want to dump it in a landfill? And how long after that will it become an historical artifact?
Noam Chomsky has said: “If you’re teaching today what you were teaching five years ago, either the field is dead or you are.” You may disagree with Chomsky on any number of other subjects, but you have to admit he has a point here. If an instructor has their course in the can, so to speak, and teaches it the same way they have for the past five years, chances are that the content is going to be pretty dated. If an online course has been preserved unchanged for five years, chances are that it’s going to be even more dated.
This is especially true in rapidly moving fields such as my field of information science. I spent some time in my MOOC talking about the NSA’s program of bulk collection of phone metadata, which was new news when I was developing the course. Since then, however, that story has evolved, a great deal of research has been conducted on the kinds of information that can be extracted from phone metadata, and several relevant legal rulings have been issued. Consequently, my treatment of the topic looks at best like a period piece, at worst under-informed. I don’t know how long it will be before this or other parts of my MOOC start to look not just under-informed but downright embarrassing, but five years seems like a good estimate.
While my MOOC may be embarrassing in five years, in 30 years it will be a cultural artifact. How was metadata taught in the immediate wake of the Snowden revelations? How was the subject of use metadata and data exhaust addressed during the historical moment when public discussions of big data and privacy were just beginning? Similar questions could of course be posed about any MOOC, on any subject.
Vint Cerf has proposed that every piece of software should be preserved, along with data about the emulation environment necessary to run it. This is critical, Cerf argues, for future historians to make sense of the current historical moment. My MOOC by itself may not be of great historical significance, but collectively, the many thousands of MOOCs currently in existence are. They may say something profound about pedagogy in the early 21st century, the way that textbooks from the mid-19th century tell us about the philosophy of the American Common School Period. They may be informative about online discussions, the way that studying ancient graffiti is informative about people’s speech patterns. They may say something about the state of video technology, the way that educational films from the 1950s have a distinctive look and production quality. Who knows? It’s not for us to say, really; it’s for future historians to figure out. Our job is to ensure that future historians have the raw materials to do their work.
Coursera closed their old platform over a year ago now; it is already too late to batch download courses. This loss of course materials — and the fact that we do not know the precise scope of that loss — clearly illustrates the hazard of the Digital Dark Ages. It is probably not reasonable to expect Coursera (or any technology company) to preserve the content disseminated on their platform; instead, it falls to the content creator to preserve that content. Everyone involved in developing online course content — whether those courses are made available through Coursera, edX, or any other platform — must therefore participate in the preservation of these materials.
The Internet Archive has a great deal of experience with and infrastructure devoted to digital preservation, and I have nothing but praise for their work. But the Internet Archive is only one organization. I suggest that there is a better place for archived MOOC content to live: at the institution that created it in the first place. Specifically, in an institutional repository: a digital collection of the intellectual output of an institution of higher ed.
Course content has not historically been considered part of the intellectual output of an institution of higher ed, but this is largely because, as discussed above, there have historically been few artifacts from courses that could be considered output. With the advent of online courses, however, that has changed. It was once the case that the data on which scholarly publications were based was not made available, and in many cases not preserved at all. Now, however, it is recognized that data is as much a product of scholarship as publications, and scholars increasingly believe that it should be preserved. The same is certainly true for the teaching of that scholarship.
Ultimately, this is a call to action to three groups on campus: faculty, librarians, and the IT staff involved in campus MOOC initiatives. IT units excel at developing backup functionality, but this tends to be for middle-term operational uses; IT staff must approach librarians to bring in expertise in long-term planning for digital preservation. Librarians have plenty of technical expertise, but rely on content owners to contribute material to institutional repositories; librarians must approach IT staff and faculty to bring in online course content. And faculty must be willing to contribute their course materials to their institutional repository, and to insist that the IT unit make it happen. As Coursera’s platform closure shows, the threat of the Digital Dark Ages is real.