
Literature Lab™, from Acumenta Biotech, is a unique platform for functional analysis of genetic data
through direct statistical mining of PubMed.
“Whether lists are 10 or 2000 genes in length, Literature Lab™ consistently identifies validating
associations and emerging relationships in the published literature. Beyond this, Literature Lab™ often brings to light novel and unanticipated associations that may help expand the research in new and exciting directions. This is exemplified in a recent application note in which a gene list derived from high-resolution mapping in a genome-wide association study of Multiple Sclerosis (MS) susceptibility uncovered viral disease associations, several of which have not been previously linked with MS in the literature”, explains Damon Anderson, PhD, Vice President of Business Development.
Literature Lab™ works by first quantifying the strength of co-occurrence of each gene in a submitted
gene list with any or all of 96,000 terms in the literature, based on MeSH ontology and proprietary
domain designations. It then establishes the significance of these associations by comparing the submitted gene list with 1000 random gene lists. Heuristics are applied to establish the strength of the associations, for instance, by rewarding the contribution of multiple genes in a list and penalizing associations driven by sparse data. The results can then be visualized in tabular, clustering, and heatmap formats – with the ability to click and connect directly with the PubMed abstracts underlying the data.
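The comparison against random gene lists can be illustrated with a minimal Python sketch. It is not Literature Lab™'s actual method, whose scoring and heuristics are proprietary and not described here. It assumes a simple summed co-occurrence score and an empirical p-value; the names `list_score`, `empirical_p_value`, `cooccurrence`, and `universe` are hypothetical and chosen only for illustration.

```python
import random


def list_score(genes, cooccurrence, term):
    """Sum the gene-term co-occurrence counts over a gene list.

    `cooccurrence` is a hypothetical dict mapping (gene, term) pairs to
    the number of abstracts mentioning both; missing pairs count as 0.
    """
    return sum(cooccurrence.get((gene, term), 0) for gene in genes)


def empirical_p_value(gene_list, universe, cooccurrence, term,
                      n_random=1000, seed=0):
    """Estimate how often a random gene list scores at least as high
    as the submitted list for one term.

    `universe` is the list of all genes that random lists are drawn from.
    Random lists have the same length as the submitted list.
    """
    rng = random.Random(seed)
    observed = list_score(gene_list, cooccurrence, term)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(universe, len(gene_list))
        if list_score(sample, cooccurrence, term) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)
```

In this sketch a small p-value means that few random lists of the same size match the submitted list's association with the term. Heuristics such as rewarding contributions from multiple genes or penalizing sparse data would be layered on top of the score and are not modeled here.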
“One of the most powerful strengths of the Literature Lab™ platform is that it addresses two big data
issues. First, modern technologies are producing genetic data at unprecedented rates, often resulting in a backlog of experimental questions with no real actionable answers. Second, the PubMed literature record is expanding at an ever-increasing rate, resulting in a huge resource of critical information that remains relatively untapped”, explains Dr. Anderson.
The number of abstracts in PubMed in the period mined by Literature Lab™ is 16,674,480. A record
1,081,927 were added to PubMed in 2015, a 29% increase over 2010 and a 118% increase over 2000.
“This rate will only continue to climb in the coming years, making it increasingly difficult to identify actionable associations from genetic data. Finding that needle in the haystack is a virtual impossibility using traditional Boolean search methods. A Literature Lab™ analysis of a 25-gene set is the equivalent of 1 billion manual queries of PubMed”, says Paul Martinez, President and CEO. “That’s a truly powerful advantage of the Literature Lab™ technology.”