Mullen on Law 2.0+


Got 42 billion legal documents?

February 8, 2009

I didn’t think so.

So, I totally agree that we can do away with the Bates system for identifying unique documents in litigation and move towards hashing them instead.

Here’s why:

Question: you’re ITM and you’ve been sued 1,000 times. Probably not an exaggeration.

How often did you get a discovery request for your organizational chart? Probably about 500 times.

Which means that in upwards of 500 cases, 5 people on YOUR side [ 2 lawyers, 1 paralegal and 2 document reviewers ] touched that document, meaning 2,500 external touches for one document. Even at $1.00 per touch (HA!) that’s $2,500.00 per document over its litigation life.

Add in the internal touches, court touches and appellate level touches and I sure hope it was full of pretty colors, because if you’re ITM and you LOST that lawsuit, you’ve just paid double.

This is a technical problem, not a legal one, although the impact upon the legal/litigation community could be severe. What this means is that software developers MUST figure out a better way. Because we can.

This is a great playground to be in, especially given the economy…

So, in looking at TinyUrl, I wondered why they only used 6 slots for their hash.

For purposes of the guess, a “close enough” estimate of how the algorithm works would be to look at the possible items that go in each slot [a-z] and [0-9] and see how many variations were available, mathematically speaking.

How a hash is generated isn’t important, because there are several well-defined ways of doing so. The only thing that really matters is that the generator interface check that a hash has not already been used and generate another one in the teeny number of exceptions when a double is created.
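As a rough illustration of that "check, and regenerate on a double" idea, here is a minimal Python sketch. It is not TinyUrl's actual algorithm, which this post only guesses at; the names `ALPHABET` and `new_code` are made up for the example, and the in-memory set stands in for whatever store a real generator would consult.

```python
import secrets
import string

# Assumption: each slot holds one of a-z or 0-9, as guessed in the post.
ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols


def new_code(used, slots=6):
    """Return a code that is not already in `used`.

    Draws random codes until one is unused, so the rare duplicate
    simply triggers another draw. The chosen code is recorded in `used`.
    """
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(slots))
        if code not in used:
            used.add(code)
            return code


if __name__ == "__main__":
    issued = set()
    print(new_code(issued))           # a 6-slot code
    print(new_code(issued, slots=7))  # the "greedy" 7-slot version
```

The random draw here allows a character to repeat within a code; how the code is produced matters less than the uniqueness check wrapped around it.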

The math of it is fairly simple, so I’ll spare you the link clicking to Chemical-Ecology and simply give it to you:

a to z is 26 letters plus 10 digits for a total of 36 items in each slot. There are 6 slots, so, counting only arrangements in which no character repeats, the formula for this is:

36P6 = 36! / (36-6)! = 1,402,410,240

Then, I wondered how the total would change if I got greedy and added oooone more slot.

The formula becomes:

36P7 = 36! / (36-7)! = 42,072,307,200

That is a LOT of documents. Which means that IBM might be able to fill the bucket, but very few other companies will. In fact, the Forbes top 200 might want their own buckets, which would STILL be better than the current Bates system.

[ mathematical corrections are always welcome! ]
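In that spirit, the figures are easy to double-check. The short Python sketch below recomputes both permutation counts and, for comparison, the count when a character is allowed to repeat within a code (36 to the power of the number of slots), which is larger still. The function names are just labels for this example.

```python
from math import factorial


def without_repeats(symbols, slots):
    """Permutations: ordered codes in which no character is reused."""
    return factorial(symbols) // factorial(symbols - slots)


def with_repeats(symbols, slots):
    """Ordered codes in which any character may appear in any slot."""
    return symbols ** slots


if __name__ == "__main__":
    for slots in (6, 7):
        print(slots, without_repeats(36, slots), with_repeats(36, slots))
    # 6 1402410240 2176782336
    # 7 42072307200 78364164096
```

So the 42 billion figure is, if anything, conservative: allowing repeated characters gives roughly 78 billion codes in 7 slots.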

Now, according to my Outlook text files, they send the potential variations through the roof with a text version format of [ 76193731-000000DGC.eml ]

Soooo, if we then ask how likely it is that the courts will be deluged with documents on the order of 42 BILLION, then I think it’s safe to say that it makes sense to pile all documents into one bucket and assign them unique litigation numbers rather than have each party Bates stamp their own.

The potential ROI for companies that are repeatedly sued (Forbes top 1000?) is impressive. Can you imagine how much money clients would save, if 90% of discovery documents were found to have already been assigned a number??

It’s a simple matter to ensure that the numbers are truly unique, so the next step would be to tag them appropriately. In fact, companies could keep repositories forever and provide an API to specific discovery requests, rather than actually delivering discovery.
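To make the "already been assigned a number" idea concrete, here is a minimal sketch of a shared bucket keyed on a hash of each document's contents. Everything in it is an assumption for illustration: the `DocumentRegistry` class, the choice of SHA-256, and the simple counter used as the litigation number are hypothetical, not a description of any existing system.

```python
import hashlib


class DocumentRegistry:
    """Hypothetical shared bucket: one litigation number per unique document."""

    def __init__(self):
        self._number_by_digest = {}  # content digest -> litigation number
        self._next_number = 1

    def register(self, content: bytes):
        """Return (number, is_new) for the given document bytes.

        Identical bytes always map to the same number, so a document
        produced in an earlier case is recognized instead of renumbered.
        """
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._number_by_digest:
            return self._number_by_digest[digest], False
        number = self._next_number
        self._next_number += 1
        self._number_by_digest[digest] = number
        return number, True


if __name__ == "__main__":
    registry = DocumentRegistry()
    print(registry.register(b"org chart, 2009 edition"))  # (1, True)
    print(registry.register(b"org chart, 2009 edition"))  # (1, False)
```

In this sketch the second request for the same org chart comes back with the existing number and `is_new` set to False, which is the case where nobody needs to touch the document again. Note that it only matches byte-identical files; a rescanned or reformatted copy would hash differently.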

Something to think about in these days of cloud computing!

Categories: Components · Law 2.0+ · Ralph says · Theory