Case law is set free; what next? (googlescholar.blogspot.com)
111 points by anigbrowl on Oct 21, 2014 | 24 comments


Google deserves plenty of credit for Scholar. It's the best public legal research system out there.

Let's not pretend, though, that Google Scholar has "set free" court opinions or offers "open access." You're trapped in Google's ecosystem. You can't download opinions in bulk for any reason, whether it's legal practice or academic analysis. You certainly can't build your own legal research system.

The best "open access" resource I've found, oddly enough, is bulk XML from a Stanford grad student. It includes over 10 million documents in a fairly standard format.

http://webpolicy.org/2012/12/28/advancing-empirical-legal-sc...

http://webpolicy.org/2013/12/29/advancing-empirical-legal-sc...

http://webpolicy.org/2013/05/03/advancing-empirical-legal-sc...

Also, it appears to be replicated on Carl Malamud's public.resource.org.

https://law.resource.org/pub/us/case/federal/

https://law.resource.org/pub/us/case/state/


Here are some suggestions for what next:

- Free access to all of PACER: http://www.plainsite.org/dockets/29himg3wm/california-northe...

- Vendor-agnostic legal citations: http://www.plainsite.org/articles/20140115/a-modest-proposal...

- Standardized digital legal opinions, motions, and other documents: http://www.plainsite.org/articles/20140116/a-notsomodest-pro...

- Better free access to state court systems: http://www.plainsite.org/articles/20140114/xerox-strongly-ur...

- Coming soon: the Bad Lawyer Database (BLDB), a compendium of lawyers who have been formally sanctioned, disciplined, and themselves involved in litigation concerning their conduct


Please, oh please, make this a reality: "By classifying regulations using the same system that science librarians use to organize papers in agriculture, we can determine which scientific papers may form the rationale for particular regulations, and link the regulations to the papers that explain the underlying science."

I cannot begin to describe how incredibly useful this would be. Regulators rely on the information provided by regulatory attorneys to craft their policies, so it's critical that the attorneys have a deep understanding of the issues. And paid legal research tools just don't/can't/won't provide that sort of information. In this vein, it would be wonderful to include social sciences - topics like economics and finance. Perhaps SSRN is a good option.


I work in a related field in the technical sense (i.e., NLP, knowledge engineering, etc.), but not in anything related to law or legal services.

Is this a real problem? Is it really as simple as some ontology linking?


It's a problem I face almost every day.

The role of a regulatory attorney is to explain to the regulatory agency why they should do something (think: PPM in power plant emissions, high-frequency trading controls, restrictions on flight paths and requirements for airport construction, you name it, it's regulated). But the attorney doing the explaining is trained in law, not in whatever the technical subject matter is. So the attorney relies on his client's experts in the field for information. But if an attorney doesn't have a basic understanding of the technical aspects, he won't know what questions to ask to get the right details, and he won't be able to make meaningful strategy decisions. In turn, most regulatory agencies are required by law to make decisions based only on the documents and information provided to them in the hearing/filing process. And all those documents are prepared by attorneys. If the attorneys miss a detail, the regulatory agency misses it too.

I would envision this sort of tool as providing background that will allow the attorney to ask the right questions, rather than a complete education on the technical subject matter.


Thanks. I appreciate you taking the time to write an answer.


Carl Malamud [0] has been a big advocate for making not just case law [1] but also public safety codes [2], tax records of non-profits [3], and other supposedly "free" information free and easy to access. His efforts include digitizing documents that are only available in print, often at considerable expense, through scanning [4], and he even ran a (failed) Kickstarter to make the safety codes of the world accessible [5].

[0] http://en.wikipedia.org/wiki/Carl_Malamud

[1] https://law.resource.org

[2] https://law.resource.org/pub/us/code/safety.html

[3] http://philanthropy.com/article/Open-Records-Activist-Shuts/...

[4] https://yeswescan.org

[5] https://www.kickstarter.com/projects/publicresource/public-s...


Some people might be curious how lawyers were able to do legal research efficiently and effectively before computers and online legal databases. Here is a brief overview. I'll write in the present tense for convenience, but keep in mind that the "present" for this is before the age of computer law. Circa 1960, for instance.

First, let's look at statutes. The output of Congress is a stream of laws. The first law that comes out of Congress #X and is signed by the President is Public Law #X-1. The second one is P.L. #X-2, and so on. These public laws are collected and published in a series of books called the "Statutes at Large".

The Statutes at Large is not a convenient tool for legal research, since it is just a sequential listing of the laws passed by Congress and signed by the President. There's no organization by subject, so in theory you would have to look at everything starting at page 1 of the first volume up to the last page of the present volume (and then look at laws that have been passed since the last volume was printed...) and note which are relevant to the problem you are researching.

To make it easier to find law, private publishers took the Public Laws and organized them into a code. A code is basically a statement of the law, organized by subject rather than chronologically. These privately published codes had no official status. The official statement of the law remained the Statutes at Large.

In 1874 Congress made an official code of the US laws, and they updated it in 1878. These were authoritative, by which I mean if the code said one thing and the underlying Public Law said something else, the code version won. This meant that if you were researching a topic in, say, 1890, you would start at the 1878 code, and then only had to look at the Statutes at Large from between then and 1890, instead of having to go back to the very beginning.

Congress got its act together in the '20s, and started producing an official United States Code, and updating it every six years, with annual supplements. It's important to note that the USC is not automatically the official authoritative statement of the law. The law codified in the USC only supersedes the Statutes at Large when Congress explicitly says so. Congress does so on a title by title basis, by passing a law that basically says that title X of the USC is now the complete statement of a particular area of law, superseding all prior Public Laws in that area.

In addition to the official USC, published by the government, there are unofficial versions. The most important is the United States Code Annotated, published by a private company, West Publishing. It consists of the text of the USC with, as you've probably guessed from the name, annotations supplied by West. The annotations give, for each provision in the USC, a list of appellate cases that have cited or construed that part of the code, along with a summary of what each case said. They also give legislative history information. The USCA is immensely useful. Suppose you have a question about fair use in copyright. You can read what the statute says in the USC or in USCA. But in USCA you will also see summaries of hundreds of court cases that have interpreted that statute--organized by West in a logical system based on what aspect of fair use they were construing.

Case law is similar. Courts issue opinions, and those opinions are officially collected and published in chronological order in a series of volumes. As with the similar Statutes at Large, this is not the best form for research.

Private publishers stepped in to address this. West Publishing is again one of the top players. They produced an outline of the law...a giant tree that breaks the law down into multiple levels of categories and sub-categories and assigns an identifier to each leaf. As court decisions come out, West takes them, identifies which areas of the law they touch on, and writes a short summary of what the court said about that area of law. This is published in a series of volumes, which is indexed by the leaf identifiers. There are other index volumes published that index this collection by time. The net result is that if you are interested in some particular area of law, you can find it in West's outline, get the relevant identifiers, and then go to West's index books and get pointed to the relevant cases. You can then look those up in West's books and read the notes to find out which cases you need to look at in detail.
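To make the outline-plus-index idea concrete, here is a minimal sketch. The outline contents, the dotted identifier format, and the names `OUTLINE`, `assign_leaf_ids`, and `build_digest` are all made up for illustration; West's real classification scheme and identifiers look different.

```python
from collections import defaultdict

# Hypothetical slice of an outline of the law: nested topics, empty dict = leaf.
OUTLINE = {
    "Copyright": {
        "Infringement": {"Fair use": {}},
        "Ownership": {},
    },
    "Contracts": {"Formation": {}},
}

def assign_leaf_ids(tree, numbers=(), names=()):
    """Walk the outline and give every leaf a dotted identifier such as '1.1.1'."""
    ids = {}
    for position, (name, children) in enumerate(tree.items(), start=1):
        number_path = numbers + (position,)
        name_path = names + (name,)
        if children:
            ids.update(assign_leaf_ids(children, number_path, name_path))
        else:
            ids[name_path] = ".".join(str(n) for n in number_path)
    return ids

def build_digest(summaries):
    """Index (case, leaf_id, note) triples by leaf identifier."""
    digest = defaultdict(list)
    for case, leaf_id, note in summaries:
        digest[leaf_id].append((case, note))
    return dict(digest)

leaf_ids = assign_leaf_ids(OUTLINE)
fair_use = leaf_ids[("Copyright", "Infringement", "Fair use")]
digest = build_digest([
    ("Case A", fair_use, "Held that the quoted excerpt was fair use."),
    ("Case B", fair_use, "Held that wholesale copying was not fair use."),
])
```

Looking up `digest[fair_use]` then plays the role of going from the outline to the index volumes and getting pointed at the relevant cases and their notes.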

That same outline and identifier system in West's case reporters is also used in West's USCA, so it all fits together.

Once you have found a relevant case, you need to find out if it is still good law. You don't want to build your argument around an appellate court decision that was later overturned by the Supreme Court. That is very embarrassing for a lawyer.

For this, you turn to a series of books from the Frank Shepard Company. These books list cases, and then tell you what other cases cite them and how they cite them (e.g., agreed with them, overturned them, distinguished them, and so on).
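As a rough sketch of what such a citator lookup amounts to: each case maps to the cases that cite it, each tagged with how it was treated. The case names, the treatment labels, and the choice of which treatments count as negative are assumptions for illustration, not the actual Shepard's vocabulary.

```python
# Hypothetical citator entries: cited case -> list of (citing case, treatment).
CITATOR = {
    "Case A": [("Case B", "followed"), ("Case C", "distinguished")],
    "Case D": [("Case E", "overturned")],
}

# Treatments assumed here to mean the cited case can no longer be relied on.
NEGATIVE_TREATMENTS = {"overturned", "reversed"}

def negative_history(case, citator):
    """Return the citing cases whose treatment undermines `case`."""
    return [citing for citing, treatment in citator.get(case, [])
            if treatment in NEGATIVE_TREATMENTS]

def is_still_good_law(case, citator):
    """True when no recorded citing case treats `case` negatively."""
    return not negative_history(case, citator)
```

A case with no entries at all comes back as good law here, which only means nothing negative has been recorded, not that anyone has checked.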

It sounds a bit awkward, but it actually works very well. With a good law library, you can reasonably answer any legal question concerning the law in your jurisdiction without too much flailing about. There are frequent supplements to the books (each book has a slip in the cover to insert a supplement), and cumulative indexes every so often so that you do not have to go through all the index volumes sequentially to find things.

If you need to research something outside your jurisdiction, it can be harder. A small county in Idaho, for instance, might not have Florida state court decisions in its library, so if for some reason Florida case law is relevant the lawyers or their researchers might need to go to a bigger library.


Have these systems been completely superseded by digital ones, or do many lawyers continue to use paper-based systems for their research?


The digital systems are direct continuations of the paper ones. For example, federal court opinions are published on a court's website in PDF form. West Publishing collects them and prints bound volumes of the Federal Reporter. In parallel, it pulls the text of the opinions into the Westlaw online service, and incorporates the pagination of the printed versions by embedding starred page numbers into the text.

Also, West still categorizes the digital system using the index developed for paper cases. Using this index is usually faster than using search queries. In fact, search really doesn't work very well for legal research, and Google-like free form search works particularly badly (it tends to take you to highly-cited cases rather than highly-relevant ones). I generally start my research by looking up the topic in the digital version of a printed treatise or legal encyclopedia and then doing a reverse citation search of the listed cases. Failing that, I'll do low-level search queries (e.g., "foo" within N words of "bar").
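For anyone unfamiliar with that kind of low-level query, here is a minimal sketch of a "within N words" check. It assumes "within N words" means the two terms' word positions differ by at most N and that matching is case-insensitive on whole words; the commercial services define their proximity operators in their own ways, and `within_n_words` is a made-up name.

```python
import re

def within_n_words(text, first, second, n):
    """True if `first` and `second` occur at word positions at most `n` apart."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= n
               for i in first_positions
               for j in second_positions)

sentence = "The fair and reasonable use of the work"
print(within_n_words(sentence, "fair", "use", 3))  # "fair" is word 1, "use" is word 4
print(within_n_words(sentence, "fair", "use", 2))
```

The point of the operator is that it ignores how often a document is cited and looks only at whether the terms actually sit near each other in the text.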

But the only time I ever actually used a book was when I was working for a judge. Every now and then West screws up and the pagination differs between the digital and printed versions of the Federal Reporter. So we'd do our final cite-check using the printed volumes.


The West system works because law is retrospective. Cases cite older cases. Amendments to laws cite older laws. Much of the "annotation" involves inverting those backlinks, so you can look up legislation and find all the cases which reference it. That process is now, of course, automated.
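A minimal sketch of that inversion, assuming the forward citations have already been extracted (which is the hard part). The case and statute labels and the function name `invert_citations` are hypothetical.

```python
from collections import defaultdict

def invert_citations(cites):
    """Turn 'document -> things it cites' into 'thing -> documents citing it'."""
    cited_by = defaultdict(list)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].append(source)
    return dict(cited_by)

# Hypothetical forward citations: each newer document lists what it cites.
forward = {
    "Case B": ["Case A", "Statute S"],
    "Case C": ["Case A"],
    "Case D": ["Statute S", "Case B"],
}

backlinks = invert_citations(forward)
print(backlinks["Statute S"])  # the cases that reference Statute S
```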


Law librarians have been complaining for years about the expense of the old paper systems, especially since the companies who produce them keep increasing the price for updates every year. Almost everyone I know now uses equivalent digital systems, although where I work many older lawyers often ask for paper copies of things after you've found the most relevant cases/laws/rules.


As Rayiner says, the same systems have digital equivalents now. I'm not a lawyer nor a law student any more (I quit during the 1L year for personal reasons) but I'm still very much into it intellectually. When I'm not in a hurry or just trying to get a grasp of some issue, I much prefer doing it with the books or specialist journals, plus I take a lot of handwritten notes. Walking the stacks and the time/volume limitations of handwriting make for a more enjoyable learning experience. I'm not sure it's necessarily more effective, but I really enjoy the peace and quiet of a library. The less popular the paper books become the more I like it.

The downside though is that many law firms just don't have a good reason to keep up their (expensive) paper-based collections any more, to the point that some actually give the whole lot away to anyone who's willing to box them up and take them. In a decade you'll have a hard time finding a paper library if you don't live near a university, as they're expensive to maintain and fewer towns/counties are bothering to keep them. I've thought about taking up one of these offers before but I don't want to end up on a TV show about hoarders :-/


They've been virtually entirely replaced by digital systems. I graduated from law school a few years ago and we were not taught to use a book-based system to do research. Some much older attorneys use books, but they hardly ever do research anyway, since they have young guns to do it.


I have a dumb question: If the United States Code is the authoritative law, what stops someone from finding some esoteric statute they want to change, and just publishing it differently than what the original legislation says?

It's not as if Congress goes back through it to make sure that no one slipped in a doozy.

And I know Congress could, if they caught wind of such trickery, pass yet more legislation to undo that, but given how onerous and slow-moving that process is, it could be years before it was fixed.

These dumb assholes need to start using git.


Only half of the US Code is authoritative (so called 'positive law titles'). For the rest, the original statutes are authoritative.

For the authoritative part, Congress isn't the only stakeholder who wants to see the law published correctly, fortunately. Businesses, lawyers, federal agencies, and others are looking over the US Code as it is published. In many ways it's like the open source model: many eyes and many copies (e.g. in libraries) keep it safe. The law is 200+ years old, though, so we're still stuck with old-timey ways of doing it for now.

Congress is very quickly modernizing now. They recently converted the US Code into very good XML: http://uscode.house.gov/download/download.shtml. Anyone could copy it into github or run it through diff to see changes.
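As a rough illustration of the diff idea, here is a sketch using Python's standard `difflib`. It compares two plain-text versions line by line; the sample text and labels are invented, and a real comparison of the downloaded XML would first need to be parsed or normalized, which this does not attempt.

```python
import difflib

def diff_versions(old_text, new_text, old_label="old", new_label="new"):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    ))

# Invented sample text standing in for two published versions of a section.
old = "Section 1.\nThe fee shall be ten dollars.\nSection 2."
new = "Section 1.\nThe fee shall be twenty dollars.\nSection 2."

for line in diff_versions(old, new, "release A", "release B"):
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added, so a silent change to an esoteric provision would show up as one such pair.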


Providing open access to caselaw (and constitutions, statutes, and periodicals, and ...) is wonderful, but there's a lot of room left in the realm of legal-oriented computing and software. The big boys (Westlaw and LexisNexis) and the other guys (FastCase, Bloomberg, etc.) provide a lot more than case text and search capability. Signals/Shepardization provides a quick way to understand treatment and subsequent history. Headnotes provide expert analysis and highlights for a given document.

Other functionality includes citation formatting, which, as this article briefly touches on, can be devilishly complex. The ability to generate a formatted pin cite to a case (or other document) in Bluebook or another format could be a huge time saver.

Non-US legal systems are a whole other can of worms and a huge opportunity.

These are just a few quick points; again, there's a huge amount of room to innovate in legal-oriented software.


I've wondered in the past how well an expert system like Watson would do with legal databases. There's a lot of speculation on how automation is going to reduce factory jobs, but very little discussion about how Watson could wipe out much of the human time needed for legal research. Has the hurdle mostly been that the data is locked up in difficult-to-access databases?


There isn't a ton of human time needed for legal research as it is. The time is in the fact finding, analysis, and arguments, not finding the law.

I'm a junior associate in a law firm, which means the legal research falls squarely on my shoulders (partners don't do legal research). And even then it's not a huge part of my job. Probably less than 10%.

Better index searching could be of help, but it'll be hard to actually make it better. Legal research databases are a huge industry and they spend a ton on improving search capabilities, but it is hard.

I think the search capability will be evolutionary not really revolutionary. But maybe I'm underestimating Watson.


I imagine the hurdle is simply the will to undertake the work. As the article notes, there are and have been various sources for caselaw. Parsing is of course painful, though XML is now available.


This is true. Sadly there's practically no funding from VCs for it.


The legal vertical market is too small.


I don't see that it includes codes or statutes (not for Washington state anyway). That would seem like something to do next.


Replying as breadcrumbs to an example of a High-Quality HN thread.



