ESI Collection Challenges
Exchange server repositories, including Exchange databases (EDBs), STMs, and log journals, are large and complex, and performing ESI collection from them poses many challenges. The first is that Exchange is in constant use. In large organizations, obtaining access to a live mail server is next to impossible, as most IT organizations protect it vigilantly.
Another challenge in dealing with Exchange is the sheer size of the database. Most organizations have traditionally let user mailboxes grow quite large, easily 10-15 GB per mailbox. Each gigabyte contains approximately 10,000 emails. Multiplied by the number of users in the organization, it is easy to see that Exchange can contain millions or even billions of objects. Finding specific emails and files in an Exchange repository requires sorting and culling these objects, a task that is cost-prohibitive.
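To make the scale concrete, the arithmetic above can be sketched in a few lines of Python. The 10,000 emails per gigabyte and the 10-15 GB mailbox sizes come from the text; the 5,000-user organization is a hypothetical figure chosen only for illustration.

```python
# Rough estimate of how many emails an Exchange repository may hold.
EMAILS_PER_GB = 10_000  # approximate figure quoted in the text


def estimate_email_count(users: int, gb_per_mailbox: float) -> int:
    """Return the approximate number of emails across all mailboxes."""
    return round(users * gb_per_mailbox * EMAILS_PER_GB)


# Hypothetical organization of 5,000 users (an assumption, not from the text).
low = estimate_email_count(5_000, 10)   # 10 GB mailboxes
high = estimate_email_count(5_000, 15)  # 15 GB mailboxes
print(f"Estimated emails: {low:,} to {high:,}")
```

Under these assumptions the estimate lands between 500 million and 750 million emails, before counting attachments or other objects.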
Additionally, Exchange repositories are not designed for streamlined ESI collection. The user mailbox is simply a starting point. Other containers, such as other users' mailboxes, deleted items folders, the dumpster, and the logs (or journals), also contain valuable information. No single user mailbox or folder will give you everything you need.
ESI Collection Today
To make collection from an Exchange repository feasible, the discovery process is usually narrowed in scope. A typical request for custodian data focuses only on the email residing in the custodian's mailbox or folder. ExMerge is commonly used to extract mailboxes from Exchange. This tool was originally designed to copy messages from an Exchange mailbox into an Outlook PST file. If a request is made for a dozen user mailboxes, ExMerge will extract these mailboxes from the server and isolate all the content so that ESI discovery can begin. Collection speed is a typical concern. In real-world scenarios, ExMerge collects approximately 1 GB of email per hour. If there are 100 GB of email to collect, this one step alone takes about 100 hours, or more than four days of continuous running.
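The collection-time estimate above is simple division, sketched here in Python. The 1 GB per hour rate is the approximate figure from the text; the function name and its parameters are illustrative only.

```python
# Estimate how long a mailbox extraction takes at a fixed throughput.
def collection_time_hours(total_gb: float, gb_per_hour: float = 1.0) -> float:
    """Return the hours needed to collect total_gb at gb_per_hour."""
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return total_gb / gb_per_hour


hours = collection_time_hours(100)  # 100 GB at roughly 1 GB per hour
days = hours / 24                   # assumes the job runs around the clock
print(f"{hours:.0f} hours, or about {days:.1f} days")
```

The result assumes uninterrupted, around-the-clock extraction; any restarts or throttling on the server would lengthen it.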
ExMerge can only create a PST file compatible with Outlook 2000. This creates critical limitations for ESI discovery. First, an Outlook 2000 PST is limited to 2 GB. If a mailbox is larger than 2 GB, ExMerge will truncate the collection without any obvious indication that the collection was partial. Truncation can be identified by examining the detailed logs, but reviewing these logs then becomes a crucial part of the email collection process. Additionally, the PST files ExMerge creates can only contain ANSI-based emails. Unicode emails are corrupted during collection.
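Because truncation is silent, one precaution is to compare mailbox sizes against the 2 GB limit before extraction begins. The sketch below assumes the mailbox sizes have already been gathered from the server by some other means; the mailbox names and sizes are hypothetical, and the code does not read or parse ExMerge logs.

```python
# Flag mailboxes too large to fit in an Outlook 2000 (ANSI) PST file.
PST_LIMIT_GB = 2.0  # size limit stated in the text


def at_risk_of_truncation(mailbox_sizes_gb: dict) -> list:
    """Return the names of mailboxes larger than the PST limit, sorted."""
    return sorted(
        name for name, size_gb in mailbox_sizes_gb.items()
        if size_gb > PST_LIMIT_GB
    )


# Hypothetical mailbox sizes in GB, for illustration only.
sizes = {"alice": 1.4, "bob": 12.5, "carol": 2.0, "dave": 3.1}
print(at_risk_of_truncation(sizes))
```

Mailboxes flagged this way are the ones whose extraction logs deserve the closest review.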
An alternative to using ExMerge is to work with recent backups of the Exchange server. The IT organization performs backups on a regular basis. These backups are often stored on offline tapes and kept offsite for safekeeping. Getting access to these tapes is easier than getting access to the live mail server. Therefore, many collection efforts are performed on data from backup tapes.

