Cloning vs. Crawling in E-Discovery Processing

Article Posted: December 05, 2008

Today’s litigation requires companies to comb comprehensively through electronic archives during discovery to preserve and produce relevant data. Unfortunately, cases are sometimes won or lost on how well the parties perform these tasks rather than on the merits. To keep a case from being needlessly sidetracked by discovery disputes, organizations are evaluating the best way to preserve and produce information stored on electronic devices. As a result of that evaluation, organizations are moving from a traditional “crawl” method of finding evidence to a more efficient, comprehensive, and less expensive “cloning” process.

Learning to Crawl
As computerization was incorporated into the litigation process, organizations would send the paper documents gathered by key players to their attorneys. The attorneys, in turn, would send the documents to a litigation support company to be scanned and processed with optical character recognition (OCR). This process converted the information on the paper documents into a searchable electronic index, which greatly improved the efficiency of reviewing and organizing the information. Once the index was created, lawyers, associates, and paralegals would crawl through the data by searching it to find useful information.
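The idea of a searchable index built from OCR text can be illustrated with a minimal sketch. This is not how any particular litigation support product works; the function names (`build_index`, `search`), the document IDs, and the sample text are all hypothetical, and the sketch assumes the OCR step has already produced plain text for each document.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical OCR output, keyed by an illustrative document ID.
ocr_text = {
    "DOC-001": "Memo regarding the supply contract renewal",
    "DOC-002": "Invoice for shipping services",
    "DOC-003": "Contract amendment signed by both parties",
}
index = build_index(ocr_text)
print(search(index, "contract"))
```

A reviewer "crawling" the data is, in effect, running many such queries and reading whatever comes back, which is why the approach depends so heavily on the searcher choosing the right terms.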

This “crawl” process is the foundation for the traditional e-discovery firm, which processes electronic data and places it into an index so that attorneys can sort (crawl) through it to find evidence. Crawl-based processing is limited to non-deleted data that can be read by the software the attorneys use to crawl through the data. Generally, this process converts all data into a standard format (such as a TIFF or PDF file), which is then linked to the metadata related to that file. Some e-discovery vendors offer to host the data and provide data review tools. Pricing for the crawl process is generally based on the amount of data sent to the vendor (gigabytes in) and the amount of data sent from the vendor to the attorney for review (gigabytes out).
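The volume-based pricing model described above amounts to simple arithmetic, sketched below. The function name `crawl_processing_cost` and the per-gigabyte rates in the usage line are placeholders chosen only for illustration; they are not actual vendor prices, and real contracts may add hosting or review-tool fees that this sketch ignores.

```python
def crawl_processing_cost(gb_in, gb_out, rate_in, rate_out):
    """Estimate a crawl-style processing bill from data volumes.

    gb_in    -- gigabytes sent to the vendor for processing
    gb_out   -- gigabytes returned to the attorney for review
    rate_in  -- price per gigabyte in (hypothetical)
    rate_out -- price per gigabyte out (hypothetical)
    """
    if min(gb_in, gb_out, rate_in, rate_out) < 0:
        raise ValueError("volumes and rates must be non-negative")
    return gb_in * rate_in + gb_out * rate_out


# Placeholder figures only: 100 GB in, 20 GB out, made-up rates.
estimate = crawl_processing_cost(100, 20, rate_in=10.0, rate_out=50.0)
print(estimate)
```

Because both terms grow with the amount of data, the bill scales directly with how much material is collected, whether or not it turns out to be relevant.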

The crawl process has some key benefits: a) the identification and extraction of responsive data remains in the hands of the producing party, b) the process normalizes the information into one standard format, eliminating the need for the software required to read each type of file in its native application, and c) redaction of data for privilege can be accomplished electronically.

The crawl process, however, has certain disadvantages:

Because it is priced by the amount of data, the process is costly to use as a tool to preserve and review data.

The process is inefficient because it relies upon attorneys and paralegals to crawl through the data to find evidence. For example, many crawl vendors convert all data into a single format because attorneys like to Bates stamp a sequential number onto each page of each document. This conversion is expensive, but it is usually done because it appears to allay attorneys’ fear that documents can be altered by a party opponent. (Of course, a party opponent committed to such a forgery could easily modify the document, Bates stamp the modification, photocopy the resulting document, and substitute it for the original.)
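Bates stamping, as mentioned above, assigns one running sequence of numbers across every page of every document. A minimal sketch of the numbering follows; the function name `bates_labels`, the `ABC` prefix, the six-digit width, and the file names are assumptions made for illustration, not a standard or any vendor's scheme.

```python
def bates_labels(page_counts, prefix="ABC", start=1, width=6):
    """Assign sequential Bates numbers across a set of documents.

    page_counts is a list of (document_id, number_of_pages) pairs.
    Returns {document_id: (first_label, last_label)}.
    """
    labels = {}
    next_number = start
    for doc_id, pages in page_counts:
        if pages < 1:
            raise ValueError(f"{doc_id} must have at least one page")
        first = next_number
        last = next_number + pages - 1
        labels[doc_id] = (
            f"{prefix}{first:0{width}d}",
            f"{prefix}{last:0{width}d}",
        )
        next_number = last + 1  # numbering continues into the next document
    return labels


# Hypothetical production set: a 3-page memo followed by a 1-page invoice.
ranges = bates_labels([("memo.pdf", 3), ("invoice.pdf", 1)])
print(ranges)
```

The numbering itself is trivial; the expense lies in first converting every file into a paginated format so that there are pages to number.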
