How to Set Up and Organize an Efficient Smart Document Review

There are many methodologies that can be employed in document review, and no single strategy is right for every set of documents. As new technologies emerge, it is important to think critically about how existing strategies can be updated. For example, traditional linear review (document by document in ingested order) doesn’t have to be a “dumb” review. Coupling a traditional review methodology with technology or even simple grouping methods can dramatically improve consistency and quality, reducing the overall cost of the review.

Documents reviewed in the traditional fashion are naturally grouped in the order in which they were ingested into the database (following the original structure in which the data was collected and processed). This default grouping, while sufficient, doesn’t account for the fact that similar documents may have been ingested at different times or may exist within other custodians’ data. Because related documents are batched apart from one another, it is unlikely that the same reviewer will be assigned all of the similar documents. As with any subjective process, different reviewers may reasonably make different coding decisions on those similar documents.

When documents are coded inconsistently, it can lead to larger problems in discovery, including the potential for opposing counsel to argue that the process used to identify relevant documents was insufficient.


Metadata Grouping

The simplest method for further grouping documents is to use the metadata associated with those documents. Grouping by date, domain, or even subject line pulls similar documents together. By batching those groupings together, one reviewer can be assigned similarly themed documents. Not only does this increase speed, because the reviewer sees the same or similar documents repeatedly, but it also makes it less likely that these documents will be coded inconsistently.
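
As a rough illustration of metadata-based batching, the Python sketch below groups documents by sender domain and normalized subject line, then cuts review batches from the grouped order. The field names (`id`, `sender`, `subject`) and the helper functions are hypothetical, chosen only for this example; a real review platform will expose its own fields and batching tools.

```python
import re
from collections import defaultdict


def normalize_subject(subject):
    """Strip leading RE:/FW:/FWD: prefixes, lowercase, and collapse whitespace."""
    cleaned = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(cleaned.lower().split())


def group_by_metadata(documents):
    """Group document ids by (sender domain, normalized subject).

    Each document is a dict with hypothetical keys: id, sender, subject.
    """
    groups = defaultdict(list)
    for doc in documents:
        domain = doc["sender"].split("@")[-1].lower()
        groups[(domain, normalize_subject(doc["subject"]))].append(doc["id"])
    return dict(groups)


def build_batches(groups, batch_size):
    """Cut batches from the grouped order so related documents stay adjacent.

    A large group may still span more than one batch.
    """
    ordered = [doc_id for key in sorted(groups) for doc_id in groups[key]]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    sample = [
        {"id": "D1", "sender": "ann@acme.com", "subject": "Q3 pricing"},
        {"id": "D2", "sender": "bob@other.org", "subject": "Lunch"},
        {"id": "D3", "sender": "Carl@ACME.com", "subject": "RE: FW: Q3 Pricing"},
    ]
    print(build_batches(group_by_metadata(sample), batch_size=2))
```

Sorting by group key before batching is what keeps similarly themed documents together, so a single reviewer is more likely to see them in the same batch.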


Near Deduplication

Near deduplication is not true deduplication, which relies on matching hash values. Instead, it is generally a process in which documents are indexed and the indexed text is compared for similarities, thereby identifying near-duplicates based on text content rather than metadata. The user can set the level of similarity required for a document to be grouped with another document. Since only the text is being compared, documents of different file types, from different date ranges, or from different custodians can still be grouped together. Again, batching based on these groupings can deliver all the benefits described above.
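
To make the idea of a user-set similarity level concrete, here is a minimal Python sketch that compares documents by overlapping word sequences (shingles) and groups those whose Jaccard similarity meets a threshold. This is a simple stand-in technique chosen for illustration, not the algorithm used by any particular review platform; the function names and default threshold are assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(texts, threshold=0.8):
    """Group document ids whose text is similar to a group's first document.

    `texts` maps a document id to its extracted text. Only text is compared,
    so file type, date, and custodian play no part in the grouping.
    """
    groups = []  # each entry: (shingles of the group's first document, member ids)
    for doc_id, text in texts.items():
        doc_shingles = shingles(text)
        for pivot, members in groups:
            if jaccard(pivot, doc_shingles) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((doc_shingles, [doc_id]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = {
        "A": "the quarterly pricing report is attached for your review today",
        "B": "the quarterly pricing report is attached for your review now",
        "C": "lunch on friday at noon works for me",
    }
    print(group_near_duplicates(sample, threshold=0.7))
```

Lowering the threshold pulls more loosely related documents into the same group, while raising it restricts groups to documents that are nearly identical. This pairwise comparison is fine for a small example but would need indexing or hashing techniques to scale to a full review set.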


Email Threading

Email threading is the process of pulling email conversations together. The process ties an original email to all the subsequent replies and forwards pertaining to that original email. This grouping allows the reviewer to see all the related documents in order as one conversation.
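
One common way to approximate threading is to follow each email's reply header back to the message that started the conversation. The Python sketch below assumes each email record carries hypothetical `message_id`, `in_reply_to`, and `sent` fields (mirroring the standard Message-ID and In-Reply-To headers); commercial threading tools typically also use subject lines and body text, which this example does not attempt.

```python
from collections import defaultdict
from datetime import datetime


def find_root(message_id, parent_of):
    """Follow reply links upward until a message with no known parent is reached."""
    seen = set()  # guards against endless loops on malformed headers
    while parent_of.get(message_id) is not None and message_id not in seen:
        seen.add(message_id)
        message_id = parent_of[message_id]
    return message_id


def build_threads(emails):
    """Map each thread's root id to its message ids in sent order.

    Each email is a dict with hypothetical keys: message_id, in_reply_to, sent.
    """
    parent_of = {e["message_id"]: e.get("in_reply_to") for e in emails}
    threads = defaultdict(list)
    for email in emails:
        threads[find_root(email["message_id"], parent_of)].append(email)
    return {
        root: [m["message_id"] for m in sorted(msgs, key=lambda m: m["sent"])]
        for root, msgs in threads.items()
    }


if __name__ == "__main__":
    sample = [
        {"message_id": "m3", "in_reply_to": "m2", "sent": datetime(2024, 1, 1, 11)},
        {"message_id": "m1", "in_reply_to": None, "sent": datetime(2024, 1, 1, 9)},
        {"message_id": "m4", "in_reply_to": None, "sent": datetime(2024, 1, 2, 9)},
        {"message_id": "m2", "in_reply_to": "m1", "sent": datetime(2024, 1, 1, 10)},
    ]
    print(build_threads(sample))
```

If an original email is missing from the collection, its replies are still grouped together under the missing message's id, so the conversation can be batched to one reviewer.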


Other Grouping

While the grouping methods discussed above work for many datasets, it is important to consider the data being reviewed to determine whether these or other methodologies may work better. For example, a review with mixed file types, such as email and video files, might need to be split and organized in multiple ways.


Regardless of the dataset or budget, clever grouping, or “smart batching,” of datasets can dramatically increase both the speed and quality of a traditional review as well as reduce overall costs.
Contact Harbor today to see how our customizable Document Review Workflows can efficiently organize your reviews, so more consistent coding decisions can be made, significantly lowering your document review costs.