ECA Meets Analytics

For trial lawyers, Early Case Assessment (ECA) has always been the process of quickly synthesizing information from multiple sources to craft an initial case strategy. This process typically involves working closely with the client to identify and interview key witnesses, to review important documents, and to develop preliminary discovery and litigation plans.

The explosion of electronic data in the 1990s had many eDiscovery software companies clamoring to develop ECA tools to better manage the data. These tools allowed legal teams to cull data using keywords, dates, and other file characteristics in an attempt to reduce or prioritize the files that required review. These preliminary instruments even provided some insight into the potential discovery costs of litigating. But their utility was limited compared to what the industry needed.

This ‘early data assessment’, while valuable, often fell short of helping lawyers access and analyze the information they needed. Which data was truly useful, and which was not? Which documents would play an essential role in formulating case strategies? Which were simply irrelevant or redundant?

Keywords proved to be an inefficient means of organizing data. They offered a glimmer of insight, but did not deliver a dynamic way for attorneys to understand a case, or the means to help evolve that understanding. Moreover, keyword search alone proved to be a flawed method of locating potentially relevant files, in terms of both recall (the share of relevant documents it finds) and precision (the share of its hits that are relevant). It tended to overlook too many important documents, and “hit” on far too many irrelevant ones. Too wide, too shallow.
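To make recall and precision concrete, here is a minimal Python sketch. The document IDs and counts are invented for illustration only; they are not drawn from any real matter.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: five documents are truly relevant, and a keyword
# search returns eight documents, only two of which are relevant.
relevant = {"D01", "D02", "D03", "D04", "D05"}
keyword_hits = {"D01", "D02", "D10", "D11", "D12", "D13", "D14", "D15"}

precision, recall = precision_recall(keyword_hits, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case, three of the five relevant documents are missed (low recall) and six of the eight hits are noise (low precision), which is the "too wide, too shallow" problem in numbers.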

Analytics, fortunately, offer the potential to bridge the gap. They enhance a lawyer’s assessment of cases through scientific analysis of the data. ‘Scientific’ being the operative word.

Today, metrics, correlations, associations, occurrences, and algorithms have come to the forefront. (Well, maybe behind the scenes.) When deployed at the ECA stage, analytics can not only inform the development of early case strategy, but also provide a more sophisticated means of culling data for review. This makes them well suited to estimating and reducing overall discovery costs, as well.

What can Analytics do for you?

Standard features found in most eDiscovery analytics tools offer these functions:

1) Conceptual Clustering – Documents are analyzed based on their text, and then complex algorithms group documents together based on their conceptual similarity. Now, related items and topics start to cluster together, for easier observation. Even though the words within the document may be different, clustering will still group documents together, if they are conceptually similar.

Practical Uses of Conceptual Clustering:

    • Find important documents quickly. Use the ‘Concept Wheel’ as a preview index to explore documents. Whether you’re trying to wrap your mind around the data or to look for a ‘smoking gun’, the Concept Clusters will give you a leg up from the very start.
    • Prioritize documents for review. Use of Concept Clusters prioritizes the most important documents and de-prioritizes the less important ones prior to review. In other words, reviewers will get assigned the most important documents first. Likely irrelevant docs fall to the bottom of the pecking order.
    • Assign reviewers with clusters in mind. Some documents may require subject-matter expertise to make proper coding decisions. Using conceptual clusters, technical documents can be assigned to the right reviewers straight off the bat.
    • Batch by Cluster. Clustering will band together conceptually similar documents (also near duplicates) when batching. This way, reviewers will receive similar documents in their batches. Having a batch full of the same type of documents (e.g., 500 emails about fantasy football) will lead to increased efficiency (speed) and more consistent (accurate) coding decisions by reviewers. Clusters deliver similarity, speed, and accuracy.
    • Find similar. Once you have located a key document, this function uses the analytics to find other documents that are conceptually similar to the one you’re reviewing.
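The “Find similar” idea can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: it compares documents by plain word overlap (cosine similarity of word counts), whereas a conceptual analytics engine would also match documents that use different words for the same idea. The sample documents and function names are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase word counts for one document."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_similar(key_id, docs, top_n=3):
    """Rank the other documents by similarity to the key document."""
    key_vec = term_vector(docs[key_id])
    scored = [(cosine(key_vec, term_vector(text)), doc_id)
              for doc_id, text in docs.items() if doc_id != key_id]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top_n]]

# Hypothetical mini-collection.
docs = {
    "A": "fantasy football draft picks for the league this weekend",
    "B": "league fantasy football trade deadline and draft order",
    "C": "quarterly revenue forecast and pricing for the merger",
    "D": "pricing terms for the merger agreement",
}

print(find_similar("A", docs, top_n=1))
```

The same similarity scores could also drive batching, by handing each reviewer the documents that score highest against one another.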


2) Key-Term Expansion – This tool first identifies conceptually related terms found in your content, and then ranks them in order of relevance. The user dictates the status, grade and order of subjects.

Practical Use:

Start with a keyword. The tool provides a list of similar, or very related, terms. The results allow reviewers to expand the search to include documents containing other near or related terms.

For example, a search for “President Roosevelt” might produce a list such as: Theodore Roosevelt, Teddy Roosevelt, Theodore Roosevelt Jr., Franklin Delano Roosevelt, FDR, Commander-in-Chief, Vice President Roosevelt, Senator Roosevelt, Assemblyman Roosevelt, Eleanor Roosevelt, the Oval Office, Office of the President, POTUS, etc.

When using key-term expansion, a reviewer searching for important documents based on keywords can conduct a much more comprehensive and defensible search. This expansion of terms will produce more meaningful and trustworthy results.
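As a rough sketch of how an expanded term list widens a search, consider the Python below. The related-terms table is hand-written and hypothetical; an analytics tool would derive and rank such terms from the document collection itself. The matching is simple case-insensitive substring matching, which a production search would refine.

```python
# Hypothetical related-terms table (an assumption for illustration).
RELATED_TERMS = {
    "president roosevelt": ["theodore roosevelt", "teddy roosevelt", "fdr", "potus"],
}

def expand(term, accepted=None):
    """Return the seed term plus the related terms the user chose to keep."""
    candidates = RELATED_TERMS.get(term.lower(), [])
    if accepted is not None:
        candidates = [t for t in candidates if t in accepted]
    return [term.lower()] + candidates

def search(terms, docs):
    """Return sorted IDs of documents containing any of the terms."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(t in text.lower() for t in terms))

# Hypothetical documents.
docs = {
    "E1": "Memo re: President Roosevelt's remarks",
    "E2": "FDR signed the order on Tuesday",
    "E3": "Teddy Roosevelt biography excerpt",
    "E4": "Lunch menu for Friday",
}

print(search(["president roosevelt"], docs))          # keyword alone
print(search(expand("President Roosevelt"), docs))    # expanded search
```

The `accepted` argument reflects the point that the user stays in control: only the related terms the reviewer approves are added to the search.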

3) Conceptual Search – The tool finds documents conceptually related to a known term or phrase. Comparable documents get grouped together by their correlated concept.

Practical Use:

Imagine you’ve located a key phrase or paragraph. Now you want to find similar ones that correspond to it. Concept searching will hunt for and assemble conceptually similar documents – even if they don’t contain the exact terms used in the initial search. These are documents that keyword searching would miss. At the same time, concept searching reduces both the misses caused by synonyms (different words, same meaning) and the false positives caused by polysemes (same word, different meanings). An attorney can quickly zero in on top-priority documents for immediate review.

4) Email Threading – Email threading identifies emails that were once part of the same email thread (or conversation).

Practical Uses:

    • Smart Batching. Assign documents for review by email thread; this way, when a reviewer starts looking at documents they’ll see the original email, then the response, then the next email, etc. to more quickly and accurately understand the content of the conversation. This also helps reduce coding conflicts that can be created when emails from the same thread are spread across multiple reviewers.
    • Inclusive Email Identification – If an email thread goes back and forth 15 times, do you really need to read all 15? Or could you simply look at the last email, start at the bottom, and read up? This is the idea of “Inclusive” or “Unique” emails. Analytics will identify the most comprehensive emails in the thread and suppress the redundant emails. This can easily cull out 30% of the email from a data set and reduce review time & cost.
    • Quality Control – Email threads can be used to spot conflicting coding decisions. For example, if two documents are in the same email thread conversation, how is it that one is marked as Responsive and the other is Non-Responsive? A quick search will identify these conflicts.
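The inclusive-email idea above can be sketched as follows. This is a simplified model, not how any particular threading engine works: it assumes each email has already been broken down into the set of messages whose text it contains (its own message plus everything quoted beneath it). Real tools must first work that out from headers and body text, and may also treat emails with unique attachments as inclusive.

```python
def inclusive_emails(thread):
    """Return IDs of emails whose content is not fully contained in another.

    `thread` maps an email ID to the set of message IDs whose text it
    contains (hypothetical, pre-computed structure).
    """
    inclusive = []
    for email_id, contents in thread.items():
        contained_elsewhere = any(
            contents < other_contents          # strict subset test
            for other_id, other_contents in thread.items()
            if other_id != email_id
        )
        if not contained_elsewhere:
            inclusive.append(email_id)
    return sorted(inclusive)

# Hypothetical four-email thread.
thread = {
    "M1": {"M1"},
    "M2": {"M1", "M2"},
    "M3": {"M1", "M2", "M3"},
    "M4": {"M1", "M4"},   # a side reply to the first message
}

print(inclusive_emails(thread))
```

Here M1 and M2 are fully quoted inside M3, so only M3 and the side reply M4 need to be read to see the whole conversation.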


5) Near-Duplicate Identification – Deduplication removes documents that are 100% duplicative, but what happens when they’re only 99% similar? Near Dupe (ND) detection will identify documents that have nearly the same words, in nearly the same order, and group them together. This has nothing to do with conceptual similarity – it’s a literal approach to similarity. So, those emails you get every morning from Yahoo Finance that have almost exactly the same text but with a few slight differences … they’ll be grouped together.

Practical Uses:

    • Sample the Data – There are times when you don’t need to look at every document, especially when they’re all very similar. Using ND groups, you can select a “representative” document to stand in for other, similar documents. In other words, just look at one document from each ND group, not every single one, to get an idea of its importance.
    • Smart Batching – As described above, it makes sense to assign similar documents to a single reviewer. One way to do this is to make sure all members of a ND group are given to a single reviewer; they’ll be in a better position to spot the differences between documents and to make quick coding decisions.
    • Quality Control – ND groups can be used to spot conflicting coding decisions. For example, if two documents are 99% similar, how is it that one is marked as Responsive and the other is Non-Responsive? A quick search will identify these conflicts.
    • Remove Near Dupes – “Argh,” says the Reviewer. “Almost all of the documents are the same and we’re wasting time looking through them all. Can you remove all of the Near Dupes?” The answer is yes, we can, but you need to be careful. If we remove everything that is 95% similar, who’s to say that something important isn’t included in the 5% that’s different? Bottom line: it’s risky to remove near dupes, so proceed with caution.
    • Propagate Coding Decisions – It may be possible, but risky for the same reasons stated above, to only review the “representative” documents. If the representative is Responsive, then the other documents in the same ND group should be responsive.
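To illustrate literal (non-conceptual) similarity and the quality-control check described above, here is a minimal Python sketch. It uses overlapping three-word “shingles” and Jaccard similarity, which is one common way to measure textual overlap; it is an assumption for illustration, not a description of how any specific ND engine scores documents, and its scores are not comparable to a vendor’s percentage figures. The documents, coding values, and the 0.6 threshold are hypothetical.

```python
import re
from itertools import combinations

def shingles(text, k=3):
    """Set of overlapping k-word sequences, so word order matters."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def coding_conflicts(docs, coding, threshold=0.6):
    """Pairs of near-duplicate documents that were coded differently."""
    return [(x, y) for x, y in combinations(sorted(docs), 2)
            if similarity(docs[x], docs[y]) >= threshold
            and coding[x] != coding[y]]

# Hypothetical documents and reviewer coding.
docs = {
    "N1": "Markets closed higher today as tech stocks led the rally on strong earnings",
    "N2": "Markets closed higher today as tech stocks led the rally on weak earnings",
    "N3": "Please review the attached merger agreement before the call",
}
coding = {"N1": "Responsive", "N2": "Non-Responsive", "N3": "Responsive"}

print(coding_conflicts(docs, coding))
```

Note that N1 and N2 differ by a single word, yet that word reverses the meaning. This is the same reason removing near dupes or propagating coding across an ND group calls for caution.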


6) Computer Assisted Review (“Predictive Coding” or “Technology Assisted Review”) – The goal of computer assisted review is to train the analytics tool to make consistent, reliable responsiveness decisions on large sets of data. This can vastly reduce the volume of documents that require human review before production.
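The train-then-predict loop behind computer assisted review can be sketched with a tiny text classifier. The example below uses a Naive Bayes model purely as a stand-in; the article does not say which algorithm any given tool uses, and production systems rely on far larger training sets, validation sampling, and often active learning. All documents and labels are hypothetical.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase alphabetic words in a document."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_docs):
    """Count words per label from (text, label) pairs coded by reviewers."""
    word_counts = {}          # label -> Counter of words
    doc_counts = Counter()    # label -> number of training documents
    for text, label in labeled_docs:
        word_counts.setdefault(label, Counter()).update(tokens(text))
        doc_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the most likely label using add-one smoothed log probabilities."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokens(text):
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical seed set coded by human reviewers.
seed_set = [
    ("pricing terms for the merger agreement", "Responsive"),
    ("draft merger agreement attached for review", "Responsive"),
    ("fantasy football league draft this weekend", "Non-Responsive"),
    ("lunch menu and football tickets", "Non-Responsive"),
]
model = train(seed_set)

print(predict(model, "revised merger pricing attached"))
```

In practice, the model’s predictions would be checked against additional human-coded samples before anyone relies on them to set documents aside.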

The Harbor Difference

Harbor’s ECA workflow leverages a processing engine that’s fully integrated into our Relativity environment. It reduces the time it takes to get access to the documents, and it provides those documents in a familiar review format. Once our system ingests data, the reviewer has access to a host of traditional features such as keyword search, reporting, and powerful culling strategies that include deduplication and de-NISTing. This workflow also offers advanced options like data visualization, near-duplication detection, data pivoting, sampling, email threading, clustering and conceptual searching.

Brainspace powers Harbor’s analytics offering and enables a truly unique analytics experience. It dynamically links multiple views of data that encompass: Overview Dashboard, transparent concept search, timeline, document clusters, communication analysis, and structured data facets.

Visual Analytics

Robust tools reveal the story inside your data by using powerful, interactive visualizations–even with the largest datasets. Our Dashboard, Focus Wheel, and Communication Network Graph all link together dynamically to provide multiple perspectives on any data set, or sub-data set.

Transparent Concept Search

Truly transparent concept search gives reviewers in ECA complete control over the power of analytics, while helping them maintain a clearer understanding. It takes the guesswork out of concept expansion, and delivers a versatile, defensible platform for attorneys.

Communication Analysis

State-of-the-art social network visualization enables users to effortlessly navigate the social media graph. It reveals the content and context of conversations and posts, shows the direction of information flow (including CC and BCC), and provides simple, powerful alias consolidation.

Document Classification

Our unique approach to document classification incorporates multiple active learning methods to accelerate system training, depth and recall for planning and cost analysis, and delivers best-in-class matching results. Review less and decrease costs.

Contact Harbor Litigation today to see how our customizable ECA workflows can accelerate case understanding, defensibly reduce data sets, and significantly lower review costs.