ECA Meets Analytics

For trial lawyers, Early Case Assessment (ECA) has always been the process of quickly synthesizing information from multiple sources to craft an initial case strategy. This process typically involves working closely with the client to identify and interview key witnesses, to review important documents, and to develop preliminary discovery and litigation plans.

The explosion of electronic data in the 1990s had many eDiscovery software companies clamoring to develop ECA tools to better manage the data. These tools allowed legal teams to cull data using keywords, dates, and other file characteristics in an attempt to reduce and/or prioritize the files that require review. These preliminary instruments even provided some insight into potential discovery costs associated with litigating. But their utility was limited, compared to what the industry needed.

This ‘early data assessment’, while valuable, often didn’t fully help lawyers to access and analyze information. What data was truly useful, what was not? Which documents would play an essential role in formulating case strategies? Which were simply irrelevant or redundant?

Keywords proved to be an inefficient means of organizing data. They offered a glimmer of insight, but did not deliver a dynamic way for attorneys to understand a case, or the means to help evolve that understanding. Moreover, keyword search alone proved to be a flawed method of locating potentially relevant files, in terms of both recall (the share of relevant documents found) and precision (the share of hits that are relevant). It tended to overlook too many important documents, and “hit” on far too many irrelevant ones. Too wide, too shallow.
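To make recall and precision concrete, here is a minimal sketch in Python. The document IDs and the sets of relevant and retrieved documents are invented for illustration only; they are not drawn from any real matter.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a search result.

    recall    = share of the truly relevant documents that the search found
    precision = share of the retrieved documents that are truly relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 10 documents are actually relevant (IDs 1-10).
relevant_docs = set(range(1, 11))
# A keyword search returns 20 documents, only 4 of which are relevant.
keyword_hits = {1, 2, 3, 4} | set(range(101, 117))

recall, precision = recall_and_precision(keyword_hits, relevant_docs)
print(f"recall={recall:.0%} precision={precision:.0%}")
```

In this made-up case the search misses six of the ten relevant documents (low recall) while four out of every five hits are noise (low precision), which is the pattern described above.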

Analytics, fortunately, offer the potential to bridge the gap. They enhance a lawyer’s assessment of cases through scientific analysis of the data. ‘Scientific’ being the operative word.

Today, metrics, correlations, associations, occurrences, and algorithms have come to the forefront. (Well, maybe behind the scenes.) When deployed at the ECA stage, analytics can not only inform the development of early case strategy but also provide a more sophisticated means of culling data for review. That makes analytics well suited to estimating and reducing overall discovery costs as well.

What can Analytics do for you?

Standard features found in most eDiscovery analytics tools offer these functions:

1) Conceptual Clustering – Documents are analyzed based on their text, and then complex algorithms group documents together based on their conceptual similarity. Now, related items and topics start to cluster together, for easier observation. Even though the words within the document may be different, clustering will still group documents together, if they are conceptually similar.

Practical Uses of Conceptual Clustering:

    • Find important documents quickly. Use the ‘Concept Wheel’ as a preview index to explore documents. Whether you’re trying to wrap your mind around the data or to look for a ‘smoking gun’, the Concept Clusters will give you a leg up from the very start.
    • Prioritize documents for review. Use of Concept Clusters prioritizes the most important documents and de-prioritizes the less important ones prior to review. In other words, reviewers will get assigned the most important documents first. Likely irrelevant docs fall to the bottom of the pecking order.
    • Assign reviewers with clusters in mind. Some documents may require subject-matter expertise to make proper coding decisions. Using conceptual clusters, technical documents can be assigned to the right reviewers straight off the bat.
    • Batch by Cluster. Clustering will band together conceptually similar documents (also near duplicates) when batching. This way, reviewers will receive similar documents in their batches. Having a batch full of the same type of documents (e.g., 500 emails about fantasy football) will lead to increased efficiency (speed) and more consistent (accurate) coding decisions by reviewers. Clusters deliver similarity, speed, and accuracy.
    • Find similar. Once you have located a key document, this function uses the analytics to find other documents that are conceptually similar to it.
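As a rough illustration of the "Batch by Cluster" idea, the sketch below groups documents by a cluster label and then cuts each cluster into review batches. It assumes the analytics tool has already assigned a cluster to every document; the document IDs, cluster labels, and the function name `batch_by_cluster` are hypothetical.

```python
from collections import defaultdict


def batch_by_cluster(doc_clusters, batch_size):
    """Build review batches so each batch holds documents from one cluster.

    doc_clusters: dict mapping document ID -> cluster label, as it might be
                  exported from an analytics tool (labels are hypothetical).
    batch_size:   maximum number of documents per batch.
    """
    by_cluster = defaultdict(list)
    for doc_id, cluster in doc_clusters.items():
        by_cluster[cluster].append(doc_id)

    batches = []
    for cluster in sorted(by_cluster):
        docs = sorted(by_cluster[cluster])
        # Split a large cluster into consecutive batches of batch_size.
        for start in range(0, len(docs), batch_size):
            batches.append((cluster, docs[start:start + batch_size]))
    return batches


doc_clusters = {
    "DOC-001": "fantasy football",
    "DOC-002": "pricing",
    "DOC-003": "fantasy football",
    "DOC-004": "pricing",
    "DOC-005": "fantasy football",
}
batches = batch_by_cluster(doc_clusters, batch_size=2)
for cluster, docs in batches:
    print(cluster, docs)
```

No batch mixes clusters, so a reviewer who opens a batch sees one kind of document from start to finish.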

2) Key-Term Expansion – This tool first identifies conceptually related terms found in your content, and then ranks them in order of relevance. The user dictates the status, grade and order of subjects.

Practical Use:

Start with a keyword. The tool provides a list of similar, or very related, terms. The results allow reviewers to expand the search to include documents containing other near or related terms.

For example, a search for “President Roosevelt” might produce a list such as: Theodore Roosevelt, Teddy Roosevelt, Theodore Roosevelt Jr., Franklin Delano Roosevelt, FDR, Commander-in-Chief, Vice President Roosevelt, Senator Roosevelt, Assemblyman Roosevelt, Eleanor Roosevelt, the Oval Office, Office of the President, POTUS, etc.

When using key-term expansion, a reviewer searching for important documents based on keywords can conduct a much more comprehensive and defensible search. This expansion of terms will produce more meaningful and trustworthy results.
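The effect of expansion on a search can be sketched in a few lines of Python. This is only an illustration: the four documents are invented, the related terms are typed in by hand (a real tool would suggest and rank them), and matching is a simple case-insensitive substring test.

```python
def search(documents, terms):
    """Return IDs of documents containing any of the terms (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in lowered)
    )


# Hypothetical mini-corpus.
documents = {
    "DOC-1": "President Roosevelt signed the bill this morning.",
    "DOC-2": "FDR will address the nation by radio tonight.",
    "DOC-3": "Teddy Roosevelt led the expedition.",
    "DOC-4": "Quarterly sales figures are attached.",
}

seed = ["President Roosevelt"]
# Related terms as an expansion tool might suggest them (hand-picked here).
expanded = seed + ["Teddy Roosevelt", "FDR", "POTUS"]

seed_hits = search(documents, seed)
expanded_hits = search(documents, expanded)
print("seed only:", seed_hits)
print("expanded: ", expanded_hits)
```

The seed term alone finds one document; the expanded list also picks up the two that refer to the same people by other names. Substring matching is a simplification, since a short term such as "FDR" could also match inside an unrelated word.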

3) Conceptual Search – The tool finds documents conceptually related to a known term or phrase. Comparable documents get grouped together by their correlated concept.

Practical Use:

Imagine you’ve located a key phrase or paragraph. Now you want to find similar ones that correspond to it. Concept searching will hunt for and assemble conceptually similar documents – even if they don’t contain the exact term(s) used in the initial search. These are documents that would not be found with keyword searching. At the same time, concept searching helps reduce the false positives that keyword searches pick up from polysemes (words with more than one meaning) and the misses caused by synonyms. An attorney can quickly zero in on top priority documents for immediate review.

4) Email Threading – Email threading identifies emails that were once part of the same email thread (or conversation).

Practical Uses:

    • Smart Batching. Assign documents for review by email thread; this way, when a reviewer starts looking at documents they’ll see the original email, then the response, then the next email, etc. to more quickly and accurately understand the content of the conversation. This also helps reduce coding conflicts that can be created when emails from the same thread are spread across multiple reviewers.
    • Inclusive Email Identification – If an email thread goes back and forth 15 times, do you really need to read all 15? Or could you simply look at the last email, start at the bottom, and read up? This is the idea of “Inclusive” or “Unique” emails. Analytics will identify the most comprehensive emails in the thread and suppress the redundant emails. This can easily cull out 30% of the email from a data set and reduce review time & cost.
    • Quality Control – Email threads can be used to spot conflicting coding decisions. For example, if two documents are in the same email thread conversation, how is it that one is marked as Responsive and the other is Non-Responsive? A quick search will identify these conflicts.
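The "inclusive email" idea can be sketched as follows. This is a simplified model, not how any particular product works: each email is represented by the ordered list of message segments it contains (oldest first), and an email is treated as redundant when another email in the thread starts with exactly the same segments and then adds more. Attachments and edited quotes, which real threading engines must handle, are ignored here. All IDs are hypothetical.

```python
def inclusive_emails(thread):
    """Return IDs of emails whose content is not fully contained in another.

    thread: dict mapping email ID -> tuple of message segment IDs, oldest
            first. An email that quotes the earlier conversation carries
            the earlier segments followed by its own new segment.
    """
    inclusive = []
    for email_id, segments in thread.items():
        contained_elsewhere = any(
            other_id != email_id
            and len(other) > len(segments)
            and other[:len(segments)] == segments
            for other_id, other in thread.items()
        )
        if not contained_elsewhere:
            inclusive.append(email_id)
    return sorted(inclusive)


# Hypothetical thread: E1 is the original, E2 replies, and then the
# conversation forks into two separate replies (E3 and E4).
thread = {
    "E1": ("m1",),
    "E2": ("m1", "m2"),
    "E3": ("m1", "m2", "m3"),
    "E4": ("m1", "m2", "m4"),
}
print(inclusive_emails(thread))
```

Reading E3 and E4 covers everything said in E1 and E2, so under this model only those two emails need review.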

5) Near-Duplicate Identification – Deduplication removes documents that are 100% duplicative, but what happens when they’re only 99% similar? Near Dupe (ND) detection will identify documents that have nearly the same words, in nearly the same order, and group them together. This has nothing to do with conceptual similarity – it’s a literal approach to similarity. So, those emails you get every morning from Yahoo Finance that have almost exactly the same text but with a few slight differences … they’ll be grouped together.
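One common textbook way to measure this kind of literal similarity is to compare overlapping word sequences ("shingles") with a Jaccard score. The sketch below uses that technique as a stand-in; it is an assumption for illustration and not a description of how any specific ND engine computes its percentages. The sample texts are invented.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    return len(a & b) / len(a | b)


monday = "Markets closed higher on Monday as tech stocks led the rally"
tuesday = "Markets closed higher on Tuesday as tech stocks led the rally"
unrelated = "Please find the signed lease agreement attached"

print(similarity(monday, tuesday))    # one word differs
print(similarity(monday, unrelated))  # no shared word sequences
```

Because shingles preserve word order, two texts score high only when they share the same words in the same sequence. In a sentence this short, a single changed word touches three of the nine shingles; in a full-length document the same change would move the score far less.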

Practical Uses:

    • Sample the Data – There are times when you don’t need to look at every document, especially when they’re all very similar. Using ND groups, you can select a “representative” document to represent other, similar documents. In other words, just look at one document from each ND group, not every single one, to get an idea of its importance.
    • Smart Batching – As described above, it makes sense to assign similar documents to a single reviewer. One way to do this is to make sure all members of a ND group are given to a single reviewer; they’ll be in a better position to spot the differences between documents and to make quick coding decisions.
    • Quality Control – ND groups can be used to spot conflicting coding decisions. For example, if two documents are 99% similar, how is it that one is marked as Responsive and the other is Non-Responsive? A quick search will identify these conflicts.
    • Remove Near Dupes – “Argh,” says the Reviewer. “Almost all of the documents are the same and we’re wasting time looking through them all. Can you remove all of the Near Dupes?” The answer is yes, we can, but you need to be careful. If we remove everything that is 95% similar, who’s to say that something important isn’t included in the 5% that’s different? Bottom line: it’s risky to remove near dupes, so proceed with caution.
    • Propagate Coding Decisions – It may be possible, but risky for the same reasons stated above, to only review the “representative” documents. If the representative is Responsive, then the other documents in the same ND group should be responsive.
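The quality-control check described above (for ND groups here, and for email threads earlier) comes down to finding groups whose members carry more than one coding value. A minimal sketch, assuming coding decisions and group membership are available as simple lookups; the field values and IDs are hypothetical.

```python
from collections import defaultdict


def coding_conflicts(coding, groups):
    """Return group IDs whose members carry more than one coding value.

    coding: dict mapping document ID -> coding decision.
    groups: dict mapping document ID -> group ID (an ND group or an
            email thread).
    """
    decisions = defaultdict(set)
    for doc_id, group_id in groups.items():
        if doc_id in coding:  # skip documents not yet reviewed
            decisions[group_id].add(coding[doc_id])
    return sorted(g for g, values in decisions.items() if len(values) > 1)


groups = {"D1": "ND-1", "D2": "ND-1", "D3": "ND-2", "D4": "ND-2", "D5": "ND-3"}
coding = {
    "D1": "Responsive",
    "D2": "Non-Responsive",
    "D3": "Responsive",
    "D4": "Responsive",
}
print(coding_conflicts(coding, groups))
```

A flagged group is a prompt for a second look, not proof of an error: the small difference between two near duplicates may be exactly what justifies coding them differently.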

6) Computer Assisted Review (“Predictive Coding” or “Technology Assisted Review”) – The goal of computer assisted review is to train the analytics tool to make consistent, reliable responsiveness decisions on large sets of data. This can vastly reduce the volume of documents that require human review before production.

The Harbor Difference

Harbor’s ECA workflow leverages a processing engine that’s fully integrated into our Relativity environment. It reduces the time it takes to get access to the documents, and it provides those documents in a familiar review format. Once our system ingests data, the reviewer has access to a host of traditional features such as keyword search, reporting, and powerful culling strategies that include deduplication and de-NISTing. This workflow also offers advanced options like data visualization, near-duplication detection, data pivoting, sampling, email threading, clustering and conceptual searching.
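For readers new to the two culling steps named above, here is a minimal sketch of how hash-based deduplication and de-NISTing are commonly done. It is a generic illustration, not a description of Harbor’s processing engine. The choice of SHA-1, the file names, and the `known_system_hashes` set (standing in for a reference list of known software files such as the NIST NSRL) are assumptions made for the example.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Return the SHA-1 hex digest used here as a file fingerprint."""
    return hashlib.sha1(content).hexdigest()


def cull(files, known_system_hashes):
    """Drop known system files (de-NIST) and exact duplicates (dedupe).

    files: dict mapping file name -> file content as bytes.
    known_system_hashes: set of hex digests for known software files,
        standing in for a published reference list.
    """
    kept, seen = [], set()
    for name in sorted(files):
        digest = file_hash(files[name])
        if digest in known_system_hashes:
            continue  # de-NIST: a known application or system file
        if digest in seen:
            continue  # dedupe: identical content already kept
        seen.add(digest)
        kept.append(name)
    return kept


# Hypothetical collection: one memo, an exact copy of it, some notes,
# and a system library that appears on the known-file list.
files = {
    "a_memo.txt": b"Q3 pricing memo",
    "b_memo_copy.txt": b"Q3 pricing memo",
    "c_notes.txt": b"Call notes",
    "d_system.dll": b"\x00system library bytes",
}
known_system_hashes = {file_hash(b"\x00system library bytes")}

print(cull(files, known_system_hashes))
```

Both steps rely on exact hash matches, which is why a document that differs by even one character survives deduplication and is left for near-duplicate detection.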

Brainspace powers Harbor’s analytics offering and enables a truly unique analytics experience. It dynamically links multiple views of data that encompass: Overview Dashboard, transparent concept search, timeline, document clusters, communication analysis, and structured data facets.

Visual Analytics

Robust tools reveal the story inside your data by using powerful, interactive visualizations – even with the largest datasets. Our Dashboard, Focus Wheel, and Communication Network Graph all link together dynamically to provide multiple perspectives on any data set, or sub-data set.

Transparent Concept Search

Truly transparent Concept search gives reviewers in ECA complete control over the power of analytics, while helping them maintain a clearer understanding. It takes the guesswork out of concept expansion, and delivers a versatile, defensible platform for attorneys.

Communication Analysis

State-of-the-art social network visualization enables users to effortlessly navigate the social media graph. It reveals the content and context of conversations, posts, direction of information flow, CC, BCC, and powerful, simple, alias consolidation.

Document Classification

Our unique approach to document classification incorporates multiple active learning methods to accelerate system training, depth and recall for planning and cost analysis, and delivers best-in-class matching results. Review less and decrease costs.

Contact Harbor Litigation, today, to see how our customizable ECA workflows can accelerate case understanding, defensibly reduce data sets, and significantly lower review costs.

Managed eDiscovery Services in the Cloud: The Future Is Now

Managing eDiscovery in the cloud is in the future for many organizations; but for others it’s the present. Managed eDiscovery offers law firms, corporations, and government entities the tools to control both costs and processes throughout the eDiscovery lifecycle.

There are four major components to eDiscovery operations: people, processes, software, and hardware. Managed eDiscovery allows your people to implement your processes, utilizing vendor software and hardware to run your operations. When the need arises, you have access to the vendor’s expertise. In some cases, you can license software yourself, and install it on the vendor’s hardware for your use.

In a nutshell, managed eDiscovery gives you your own customized eDiscovery solution without the capital outlays, maintenance, upgrades and personnel commitments required to build it yourself.

The Evolution of Managed Services

In the recent past, most legal departments made a choice between vendor-reliance and building in-house eDiscovery capabilities. When in-house capacity was insufficient, the legal department outsourced overflow to vendors.

Many companies found vendor-reliance unacceptable. Cost predictions were often futile: pricing models, compressed data, and lack of communication frequently led to invoices that far exceeded estimates. Vendor workflows didn’t always mesh with in-house processes, and “black-box” vendor services caused uncertainty and frustration in setting and meeting expectations.

In response, some legal departments sought to build their own internal eDiscovery capability. This approach had the advantage of process and workflow control. In addition, companies were able to realize cost savings, and some law firms managed to create profit centers from their eDiscovery services.

However, the required investment in technology and expertise made in-house eDiscovery too expensive for the majority of companies and firms. Others made business decisions not to go the in-house route to limit risk exposure or to focus on core offerings. Yet companies and firms without robust litigation support departments found themselves at a competitive disadvantage, and largely powerless to exercise any control over escalating eDiscovery costs.

Market Realities are Changing In-House eDiscovery

Even the companies who did build eDiscovery departments are revisiting their in-house model because of certain market realities:

    • More complexity. The complexity of some eDiscovery processes has increased with a growing diversity of file types, and the increased diligence expected by courts.
    • Fast-growing unstructured data. The volume of unstructured data continues to grow, and much of it is potentially subject to discovery. Some firms have declined altogether to take on the custodial challenges of big data.
    • Rapid technology changes. eDiscovery technology has undergone rapid change. Fast changes require larger and more frequent ongoing investments in updated technology, along with more personnel and training.
    • Security challenges. Recent highly publicized security breaches have increased the focus on cyber-security and caused some firms to look at ways to mitigate risks.
    • Rapid scaling challenges. When a matter grows larger than expected, the eDiscovery team may find it hard to get capital expenditure approvals for rapid scaling. It may be entirely unfeasible to go through the normal channels to purchase additional hardware and software.
    • Wider attorney acceptance of the cloud. More attorneys are accepting cloud-based solutions, along with more mature offerings and enhanced infrastructure to support them.

How Managed eDiscovery Meets Challenges and Opportunities

Managed eDiscovery presents an alternative “hybrid” option for companies who outsource to vendors, as well as for companies with in-house capability. Companies with in-house litigation support departments lose nothing by adding managed services. They still leverage their experience and knowledge on future matters, maintain their existing workflows, and exercise control over their data. And they gain much lower costs without capital investments, the advantage of rapid scaling, and the ability to outsource services if and when they want to.

How Does Managed eDiscovery Work?

Managed eDiscovery is a combination of cloud computing and support services. Cloud computing is a collection of technologies that allow access to computing power through the internet, instead of an organization’s server room. Managed eDiscovery takes primary advantage of two cloud computing technologies: Software as a Service (SaaS) and Infrastructure as a Service (IaaS).

Software as a Service (SaaS)

Any software application accessible as a web page is considered SaaS. SaaS is commonly used in the legal industry for hosted review. In a pure SaaS model, the software is licensed by the vendor, who also takes responsibility for all maintenance, including upgrades, patches, security and redundancy. If you need your storage to quickly spike up, your SaaS vendor can ramp up your storage allocation, usually without interrupting existing processes. You pay for the additional storage only for as long as you need it.

Infrastructure as a Service (IaaS)

IaaS grants customers access to servers, routers, storage, and other computing infrastructure over the internet. These services allow companies to utilize the internet for scalable storage and processing cycles. The infrastructure is similar to co-locating equipment at an offsite data center, except you don’t have to buy the equipment. Instead, you only pay for what you use, and the environment can be scaled up or down to match the uneven workflow common in eDiscovery.


Typically, an organization uses in-house resources to handle eDiscovery phases through (or up to) collection. After collection, data is uploaded to the service provider’s data center. Some vendors offer high-speed FTP (or FTP-like) transfer options, while large data sets are often shipped directly to the data center. Your in-house technicians can take over from there and handle any or all phases from processing through production, including the setup and project management of hosted review databases.

With Managed eDiscovery, your technicians and project managers can log into software hosted in a secure data center and perform as much, or as little, of the actual data manipulation and project management as you choose. The service provider fills in the gaps and provides technical assistance. The software can be licensed by the vendor or by you.


For corporations, Managed eDiscovery allows attorneys to push all matters through the company’s workflows in a centralized location, collaborating with outside counsel wherever they’re physically located. Data can easily be harvested once and then used in multiple matters, replicating privilege and redaction calls where appropriate.

Additionally, organizations may find it easier to budget for Managed eDiscovery, as capital expenditures typically require more layers of approval and more advance notice than an expense budget. It’s also easier to manage and predict costs and return on investment with the monthly billing of Managed eDiscovery, instead of the startup costs, depreciation, and labor associated with buying and maintaining your own hardware and software.

Perhaps most importantly, Managed eDiscovery reduces stress on your internal systems and the people who maintain them.

Harbor Litigation Solutions Managed eDiscovery

  • Complete control by your in-house operations team
  • Greater than 57% savings over building and maintaining the hardware and software yourself
  • Ability to ramp-up quickly with no capital outlays
  • Ability to scale-down when matters come offline
  • Bank-level data security
  • Done-for-you software upgrades and patches
  • Assistance when you need it
  • Not volume based – i.e. no per-GB fees
  • Easily re-use attorney work product across matters
  • Focus on core competencies, not hardware and software