How to Set up and Organize an Efficient Smart Document Review

There are many methodologies that can be employed in document review, and no single strategy is right for every set of documents. As new technologies emerge, it is important to think critically about how existing strategies can be updated. For example, traditional linear review (document by document in ingested order) doesn’t have to be a “dumb” review. Coupling a traditional review methodology with technology or even simple grouping methods can dramatically improve consistency and quality, reducing the overall cost of the review.

Documents reviewed in the traditional fashion are naturally grouped in the order in which they were ingested into the database (following the original structure in which the data was collected and processed). This standard grouping, while workable, does not account for the fact that similar documents may have been ingested at different times or may exist within other custodians’ data. Because related documents end up in separate batches, they are likely to be assigned to different reviewers. As with any subjective process, different reviewers may reasonably make different coding decisions on similar documents.

When documents are coded inconsistently, it can lead to larger problems in discovery, including giving opposing counsel grounds to argue that the process used to identify relevant documents was insufficient.

The simplest way to group documents further is by the metadata associated with them. Grouping by date, email domain, or even subject line pulls similar documents together. When each group is batched together, one reviewer can be assigned all of the similarly themed documents. This increases speed, because the reviewer sees the same or similar documents repeatedly, and it makes those documents less likely to be coded inconsistently with one another.
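A minimal Python sketch of metadata grouping and batching, assuming each document exposes a sender address and a subject line. The `Doc` record, its field names, and the choice of sender domain plus normalized subject as the grouping key are illustrative assumptions, not features of any particular review platform; a date range could be added to the key the same way.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical review record; real platforms expose their own metadata fields."""
    doc_id: str
    sender: str
    subject: str

# Matches one or more leading reply/forward prefixes such as "RE:", "Fw:", "FWD:".
_PREFIXES = re.compile(r"^\s*(?:(?:re|fwd|fw)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    """Drop reply/forward prefixes and case so "RE: Budget" matches "Budget"."""
    return _PREFIXES.sub("", subject).strip().lower()

def sender_domain(sender: str) -> str:
    return sender.rsplit("@", 1)[-1].lower()

def group_by_metadata(docs):
    """Group documents that share a sender domain and a normalized subject line."""
    groups = defaultdict(list)
    for d in docs:
        groups[(sender_domain(d.sender), normalize_subject(d.subject))].append(d)
    return dict(groups)

def make_batches(groups, max_batch_size):
    """Pack whole groups into batches so no group is split across reviewers.
    A group larger than max_batch_size ends up in a batch of its own."""
    batches, current = [], []
    for key in sorted(groups):
        members = groups[key]
        if current and len(current) + len(members) > max_batch_size:
            batches.append(current)
            current = []
        current.extend(members)
    if current:
        batches.append(current)
    return batches

docs = [
    Doc("1", "ann@acme.com", "Budget"),
    Doc("2", "bob@acme.com", "RE: Budget"),
    Doc("3", "cy@other.org", "Invoice"),
]
groups = group_by_metadata(docs)
batches = make_batches(groups, max_batch_size=2)
print([[d.doc_id for d in b] for b in batches])  # [['1', '2'], ['3']]
```

Keeping whole groups together, rather than filling batches to an exact size, is the design choice that lets one reviewer see every document in a group.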


Near Deduplication

Near deduplication is not true deduplication based on hash value. Instead, it is generally a process in which documents are indexed and their indexed text is compared for similarity, thereby identifying near-duplicates based on text content rather than metadata. The user can set the level of similarity required for one document to be grouped with another. Because only the text is compared, documents of different file types, from different date ranges, or from different custodians can still be grouped together. As with metadata grouping, batching on these groups yields the benefits described above.
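The sketch below illustrates one common way to measure text similarity: break each document's text into overlapping three-word shingles and compare documents by Jaccard similarity (shared shingles divided by total distinct shingles). It is a Python illustration under that assumption; commercial near-deduplication engines use their own, usually much faster, algorithms. The `threshold` parameter plays the role of the user-set similarity level, and the file names are hypothetical.

```python
import re
from itertools import combinations

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in lowercased text, ignoring punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by total distinct shingles (1.0 = identical text)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicate_groups(texts: dict, threshold: float = 0.8, k: int = 3):
    """Group document IDs whose text similarity meets the user-chosen threshold.

    Union-find merges chains (A~B and B~C puts A, B, C in one group).
    Every pair is compared (O(n^2)), which is fine for a sketch but not for
    collections of millions of documents.
    """
    parent = {doc_id: doc_id for doc_id in texts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    sh = {doc_id: shingles(text, k) for doc_id, text in texts.items()}
    for a, b in combinations(texts, 2):
        if jaccard(sh[a], sh[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for doc_id in texts:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(g) for g in groups.values())

# Only the text matters, so an email and a scanned PDF can land in the same group.
texts = {
    "email_001.msg": "Please review the attached supply agreement before Friday.",
    "scan_044.pdf": "Please review the attached supply agreement before Monday.",
    "memo_007.docx": "Quarterly headcount plan for the Denver office.",
}
print(near_duplicate_groups(texts, threshold=0.7))
# [['email_001.msg', 'scan_044.pdf'], ['memo_007.docx']]
```

With a threshold of 0.7, the two agreement texts group together despite differing in one word (5 of 7 distinct shingles shared, about 0.71), while at 0.8 they stay in separate groups.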


Email Threading

Email threading is the process of pulling email conversations together. It ties an original email to all subsequent replies and forwards of that email, so the reviewer can read the related messages in order as a single conversation.
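A minimal Python sketch of header-based threading, assuming each email carries the standard `Message-ID` and `In-Reply-To` header values defined in RFC 5322. The `Message` record and `thread_messages` function are illustrative, not any review platform's implementation. Production threading tools typically also use the `References` header and often compare message text, since forwards and some replies may not carry `In-Reply-To`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    message_id: str             # Message-ID header value
    in_reply_to: Optional[str]  # In-Reply-To header value, if present
    sent: str                   # ISO 8601 timestamp, so string order is time order
    subject: str

def thread_messages(messages):
    """Return {root Message-ID: [messages in that conversation, oldest first]}."""
    by_id = {m.message_id: m for m in messages}

    def root_of(m):
        seen = set()
        # Follow In-Reply-To upward until the parent is not in the collection.
        # `seen` stops the walk if malformed headers form a loop.
        while m.in_reply_to in by_id and m.message_id not in seen:
            seen.add(m.message_id)
            m = by_id[m.in_reply_to]
        return m.message_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m)].append(m)
    for conversation in threads.values():
        conversation.sort(key=lambda msg: msg.sent)
    return dict(threads)

# Messages arrive out of order, as they might after ingestion.
msgs = [
    Message("<3@acme.com>", "<2@acme.com>", "2021-03-02T08:30", "RE: RE: Budget"),
    Message("<1@acme.com>", None, "2021-03-01T09:00", "Budget"),
    Message("<4@other.org>", None, "2021-03-01T11:00", "Invoice"),
    Message("<2@acme.com>", "<1@acme.com>", "2021-03-01T10:15", "RE: Budget"),
]
threads = thread_messages(msgs)
for root, conversation in threads.items():
    print(root, [m.message_id for m in conversation])
```

Each thread is keyed by its earliest ancestor found in the collection, so a reviewer assigned that key receives the whole conversation in chronological order.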


Other Grouping

While the grouping methods discussed above might work for many datasets, it is important to consider the data being reviewed to determine if these or other methodologies may work better. For example, a review with mixed file types like email and video files might need to be split and organized in multiple ways.
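One way to handle a mixed collection is to split it into separate review tracks by file type before applying any grouping. The sketch below, in Python, assumes file type can be inferred from the file extension; the `REVIEW_TRACKS` mapping, the track names, and the paths are hypothetical and would be tailored to each matter (production processing tools often identify file type from file signatures rather than extensions).

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to a review track; adjust per matter.
REVIEW_TRACKS = {
    ".msg": "email", ".eml": "email",
    ".mp4": "video", ".mov": "video", ".avi": "video",
    ".docx": "documents", ".pdf": "documents", ".xlsx": "documents",
}

def split_by_track(paths):
    """Assign each file to a review track by extension; unknown types go to "other"."""
    tracks = defaultdict(list)
    for p in paths:
        ext = PurePath(p).suffix.lower()
        tracks[REVIEW_TRACKS.get(ext, "other")].append(p)
    return dict(tracks)

tracks = split_by_track([
    "evidence/Inbox/0001.msg",
    "evidence/deposition.MP4",
    "evidence/plan.docx",
    "evidence/archive.zip",
])
print(tracks)
```

Each track could then be grouped and batched on its own terms, for example threading the email track while organizing the video track by custodian or date.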


Regardless of the dataset or budget, clever grouping, or “smart batching,” of datasets can dramatically increase both the speed and quality of a traditional review as well as reduce overall costs.
Contact Harbor today to see how our customizable Document Review Workflows can efficiently organize your reviews, producing more consistent coding decisions and significantly lowering your document review costs.
