
Lessons the Federal Courts Might Learn from Westlaw’s Prolonged Data Processing Error

The Thomson Reuters Errata Notice

On April 15, 2016, Thomson Reuters notified subscribers to its online and print case law services that a significant number of U.S. decisions it had published since November 2014 contained errors.


Here and there words had been dropped.  The company explained that the errors had been introduced by software run on the electronic texts it collected from the authoring courts.  Thomson posted a list of the affected cases.  The initial list contained some 600 cases.  A week later it had grown to over 2,500 through the addition of cases loaded on Westlaw but not published in the National Reporter System (NRS).  Two weeks out the list included links to corrected versions of the affected cases with the restored language highlighted.  The process of making the corrections led Thomson to revise the number of casualties downward, but only slightly (see, for example, the list’s entry for U.S. v. Ganias).

Thomson Reuters sought to minimize the importance of this event, asserting that none of the errors “changed the meaning of the law in the case.”  Commendably, Thomson apologized, acknowledging and detailing the errata.  It spun its handling of the processing error’s discovery as a demonstration of the company’s commitment to transparency.  On closer analysis the episode reveals major defects in the current system for disseminating federal case law (and the case law of those states that, like the lower federal courts, leave key elements of the process to Thomson Reuters).

Failure to View Case Law Publication as a Public Function

Neither the U.S. Courts of Appeals nor the U.S. District Courts have an “official publisher.”  No reporter’s office or similar public agency produces and stamps its seal on consistently formatted, final, citable versions of the judicial opinions rendered by those courts in the way the Reporter of Decisions of the U.S. Supreme Court does for the nation’s highest court.  By default, cemented in by over a century of market dominance and professional practice, that job has fallen to a single commercial firm (originally the West Publishing Company, now by acquisition and merger Thomson Reuters) to gather and publish the decisions of those courts in canonical form.  Although that situation arose during the years in which print was the sole or principal medium of distribution, it has carried over into the digital era.  Failure of the federal judiciary to adopt and implement a system of non-proprietary, medium-neutral citation has allowed it to happen.

With varying degrees of effectiveness, individual court web sites do as they were mandated by Congress in the E-Government Act of 2002.  They provide electronic access to the court’s decisions as they are released.  The online decision files, spread across over one hundred sites, present opinion texts in a diversity of formats.  Crucially, all lack the citation data needed by any legal professional wishing to refer to a particular opinion or passage within it.  Nearly twenty years ago the American Bar Association called upon the nation’s courts to assume the task of assigning citations.  By now the judiciaries in close to one-third of the states have done so.  The federal courts have not.

Major Failings of the Federal Courts’ Existing Approach

Delivery of Decisions with PDF Pagination to Systems that Must Remove It

Several states, including a number that produce large volumes of appellate decisions, placed no cases on the Thomson Reuters errata list.  Conspicuous by their absence, for example, are decisions from the courts of California and New York.  The company’s identification of the software bug combined with inspection of the corrected documents explains why.  According to Thomson, it all began with an “upgrade to our PDF conversion process.”

The lower federal courts, like those of many states, release their decisions to Thomson Reuters, other redistributors, and the public as PDF files.  The page breaks in these “slip opinion” PDFs have absolutely no enduring value.  Thomson (like Lexis, Bloomberg Law, Casemaker, Fastcase, Google Scholar, Ravel Law, and the rest) must remove opinion texts from this electronic delivery package and pull together paragraphs and footnotes that straddle PDF pages.  All the words dropped by Thomson’s “PDF conversion process” were proximate to slip opinion page breaks.  Why are there no California and New York cases on the list?  Those states release appellate decisions in less rigid document formats.  California decisions are available in Microsoft Word format as well as PDF.  The New York Law Reporting Bureau releases decisions in HTML.  So does Oklahoma; no Oklahoma decisions appear on the Thomson errata list.
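The stitching step described above can be sketched in a few lines of Python.  This is a minimal illustration, not Thomson’s or any other publisher’s actual process.  It assumes each PDF page has already been extracted to plain text (the `pages` list), that paragraphs within a page are separated by blank lines, and that a page whose last paragraph lacks terminal punctuation continues onto the next page.  The function name `join_pages` and the punctuation heuristic are hypothetical.

```python
import re

# Assumed heuristic: a paragraph is complete if it ends in terminal
# punctuation, optionally followed by closing quotes or brackets.
TERMINAL = re.compile(r'[.?!:]["”’)\]]*$')


def join_pages(pages):
    """Join per-page text, merging a paragraph that straddles a page break."""
    paragraphs = []
    for page in pages:
        page_paras = [p.strip() for p in page.split("\n\n") if p.strip()]
        for i, para in enumerate(page_paras):
            # Only the first paragraph on a page can continue the previous page.
            straddles = (
                i == 0
                and bool(paragraphs)
                and not TERMINAL.search(paragraphs[-1])
            )
            if straddles:
                paragraphs[-1] = paragraphs[-1] + " " + para
            else:
                paragraphs.append(para)
    return "\n\n".join(paragraphs)


# Invented sample text, split across two "pages" in mid-sentence.
pages = [
    "First paragraph.\n\nThe defendant was charged in a",
    "three-count indictment.\n\nSo ordered.",
]
print(join_pages(pages))
```

Real slip opinions also carry page numbers and footnotes at each break, which a production converter must strip or relocate; the sketch ignores them.  Every such rule operates exactly where the dropped words were found, next to a page break.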

Failure to Employ One Consistent Format

The lower federal courts compound the PDF extraction challenge by employing no single consistent format.  Leaving individual judges of the ninety-four district courts to one side, the U.S. Courts of Appeals inflict a range of remarkably different styles on those commercial entities and non-profits that must process their decisions so that they will scroll and present text, footnotes, and interior divisions on the screens of computers, tablets, and phones with reasonable efficiency and consistency.  The Second Circuit’s format features double-spaced texts, numbered lines, and bifurcated footnotes; the Seventh Circuit’s has single-spaced lines, unnumbered, with very few footnotes (none in opinions by Judge Posner).

In contrast the decisions released by the Michigan Supreme Court, although embedded in PDF, reflect a cleanly consistent template.  The same is true of those coming from the supreme courts of Florida, Texas, and Wisconsin.  Decisions from these states do not appear on the Thomson list.

Lack of a Readily Accessible, Authenticated Archive of the Official Version

By its own account it took Thomson Reuters over a year to discover this data processing problem.  With human proofreaders it would not have taken so long.  Patently, they are no longer part of the company’s publication process.  Some of the omitted words would have been invisible to anyone or any software not performing a word-for-word comparison between the decision released by the court and the Westlaw/National Reporter System version.  Dropping “So ordered” from the end of an opinion or the word “Plaintiff” prior to the party’s name at its beginning falls in this category.  However, the vast majority of the omissions rendered the affected sentence or sentences unintelligible.  At least one removed part of a web site URL.  Others dropped citations.  In the case of a number of state courts, a reader perplexed by a commercial service’s version of a decision can readily retrieve an official copy of the opinion text from a public site and compare its language.  That is true, for example, in Illinois.  Anyone reading the 2015 Illinois Supreme Court decision in People v. Smith on Westlaw and puzzled by the sentence “¶ 3 The defendant, Mickey D. Smith, was charged in a three-count indictment lawful justification and with intent to cause great bodily harm, shot White in the back with a handgun thereby causing his death.” could have pulled the original, official opinion from the judiciary web site simply by employing a Google search and the decision’s court-attached citation (2015 IL 116572), scrolled directly to paragraph 3, and discovered the Westlaw error.  The same holds for the other six published Illinois decisions on the Thomson list.  Since New Mexico also posts final, official versions of its decisions outfitted with public domain citations, it, too, provides a straightforward way for users of Westlaw or any other commercial service to check the accuracy of dubious case data.
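The word-for-word comparison mentioned above is easy to automate once an official text is available.  The following sketch uses Python’s standard `difflib` module to list runs of words present in the court’s version but missing from a redistributor’s version.  The function name `dropped_words` and the sample sentences are invented for illustration (only the phrase “So ordered” comes from the episode itself), and nothing here is meant to describe how any publisher actually checks its data.

```python
import difflib


def dropped_words(official, vendor):
    """Return (word_index, text) for each run of words missing from vendor."""
    a = official.split()
    b = vendor.split()
    # autojunk=False keeps frequent words such as "the" from being ignored.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    dropped = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":  # words in the official text with no counterpart
            dropped.append((i1, " ".join(a[i1:i2])))
    return dropped


official = "The judgment is affirmed. So ordered."
vendor = "The judgment is affirmed."
print(dropped_words(official, vendor))
```

The sketch reports pure deletions only.  A drop that also alters a neighboring word would surface under the `replace` tag, which a fuller check would need to examine as well.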

The growing digital repository of federal court decisions on the GPO’s FDsys site falls short of the standard set by these state examples.  To begin, it is seriously incomplete.  Over fifty of the entries on the Thomson Reuters list are decisions from the Southern District of New York, a court not yet included in FDsys.  Moreover, since the federal courts employ no system of court-applied citation, there is no simple way to retrieve a specific decision from FDsys or to move directly to a puzzling passage within it.  With an unusual party name or docket number the FDsys search utility may prove effective, but with a case name like “U.S. v. White” retrieval is a challenge.  A unique citation would make the process far less cumbersome.  However, since the lower federal courts rely on Thomson Reuters to attach enduring citations to their cases (in the form of volume and page numbers in its commercial publications), the texts flow into FDsys without them.
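To make concrete what a court-applied citation offers a retrieval system, the sketch below parses a citation of the simple year, court, number form, with an optional paragraph pinpoint, into fields that could serve as a lookup key.  The pattern is an assumption modeled on the Illinois example cited above (2015 IL 116572); it is not a statement of any court’s citation rule, and the names `NeutralCitation` and `parse_citation` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class NeutralCitation(NamedTuple):
    year: int
    court: str
    number: str
    paragraph: Optional[int]  # pinpoint paragraph, if one was given


# Assumed shape: "<year> <court abbreviation> <number>[, ¶ <paragraph>]"
PATTERN = re.compile(
    r"^(?P<year>\d{4})\s+(?P<court>[A-Z][A-Za-z ]*?)\s+(?P<number>\d+)"
    r"(?:,\s*¶\s*(?P<para>\d+))?$"
)


def parse_citation(text):
    """Parse a simple medium-neutral citation; return None if it does not fit."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    para = match.group("para")
    return NeutralCitation(
        year=int(match.group("year")),
        court=match.group("court"),
        number=match.group("number"),
        paragraph=int(para) if para is not None else None,
    )


print(parse_citation("2015 IL 116572, ¶ 3"))
```

Unlike a case name such as “U.S. v. White,” the three fields identify exactly one decision, and the paragraph number identifies one passage regardless of how any publisher paginates the text.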

The Ripple of the Thomson Reuters Errors into Other Database Systems

Because the federal courts have allowed the citation data assigned by Thomson Reuters, including the location of interior page breaks, to remain the de facto citation standard for U.S. lawyers and judges, all other publishers are compelled in some degree to draw upon the National Reporter System.  They cannot simply work from the texts released by their deciding courts, but must, once a case has received Thomson editorial treatment and citation assignment, secure at least some of what Thomson has added.  That introduces both unnecessary expense and a second point of data vulnerability to case law dissemination.  Possible approaches range from: (a) extracting only the volume and pagination from the Thomson reports (print or electronic) and inserting that data in the version of the decision released by the court to (b) replacing the court’s original version with a full digital copy of the NRS version.  Whether the other publisher acquires the Thomson Reuters data in electronic form under license or by redigitizing the NRS print reports, the second approach will inevitably pick up errors injected by Thomson Reuters editors and software.  For that reason the recent episode illuminates how the various online research services assemble case data.

Services Unaffected by the Thomson Reuters Glitch

Lexis was not affected by the Thomson Reuters errors because it does not draw decision texts from the National Reporter System.  (That is not to say that Lexis is not capable of committing similar processing errors of its own.  See the first paragraph in the Lexis version of U.S. v. Ravensberg, 776 F.3d 587 (7th Cir. 2015).)  So that Lexis subscribers can cite opinions using the volume and page numbers assigned by Thomson, Lexis extracts them from the NRS reports and inserts them in the original text.  In other respects, however, it does not conform decision data to that found in Westlaw.  As explained elsewhere, its approach is revealed in how the service treats cases that contain internal cross-references.  In the federal courts and other jurisdictions still using print-based citation, a dissenting judge referring to a portion of the majority opinion must use “slip opinion” pagination.  Later, when published by Thomson Reuters, these “ante at” references are converted by the company’s editors, software, or some combination of the two to the pagination of the volume in which the case appears.  Search recent U.S. Court of Appeals decisions on Lexis for the phrase “ante at” and you will discover that in its system they remain in their original “slip opinion” form.  For a single example, compare Judge Garza’s dissenting opinion in In re Deepwater Horizon, 739 F.3d 790 (5th Cir. 2014) as it appears on Lexis with the version on Westlaw or in the pages of the Federal Reporter.

Bloomberg Law appears to draw more extensively on the NRS version of a decision.  Its version of the Garza dissent in In re Deepwater Horizon expresses the cross-references in Federal Reporter pagination.  However, like Lexis it does not replace the original “slip opinions” with the versions appearing in the pages of the Federal Reporter.  Examination of a sample of the cases Thomson Reuters has identified as flawed finds that Bloomberg Law, like Lexis, retains the language that Westlaw dropped.  Casemaker does as well.

Services that Copy Directly from Thomson’s Reports, Errors and All

In contrast, Fastcase, Google Scholar, and Ravel Law all appear to replace “slip opinions” with digitized texts drawn from the National Reporter System.  As a consequence, when Thomson Reuters drops words or makes other changes in an original opinion text, so do they.  The Westlaw errors are still to be found in the case data of these other services.

Might FDsys Provide a Solution?


Since 2011, decisions from a growing number of federal courts have been collected, authenticated, and digitally stored in their original format as part of the GPO’s FDsys program.  As noted earlier, that data gathering is still seriously incomplete.  Furthermore, the GPO role is currently limited to authenticating decision files and adding a very modest set of metadata.  Adding decision identifiers designed to facilitate retrieval of individual cases, ideally designations consistent with emerging norms of medium-neutral citation, would be an enormously useful extension of that role.  So would be the assignment of paragraph numbers throughout decision texts, but regrettably that task properly belongs at the source.  It is time for the Judicial Conference of the United States to revisit vendor- and medium-neutral citation.


 
