The 30th Annual European Symposium on Algorithms awarded its 2021 Test-of-Time Award to Sorin Istrail, professor of computational and mathematical sciences and professor of computer science, and his collaborators for their paper “SNPs Problems, Complexity and Algorithms.” The award recognizes “excellent papers in algorithms research” published 19 to 21 years ago that are “still influential and stimulating for the field today,” according to ESA’s website.
The paper focused on computational problems and algorithm formulation of haplotype phasing — a process of reconstructing the maternal and paternal combinations of genetic variants found on chromosomes inherited from both parents, according to Istrail.
A single person has two haplotypes — one from each parent — and understanding the interaction between these haplotypes can help scientists make predictions about a person’s susceptibility to diseases as well as the effectiveness of certain drugs, he added.
The researchers also proved that some problems in mining large amounts of data were NP-complete — a computer science classifier for problems that have an efficient algorithm for validating a given solution but may or may not have an efficient algorithm for finding a solution. An example of a NP-complete is a Sudoku puzzle — while it would take a lot of time to complete the puzzle, it would be much faster to check it over once completed.
Through the paper and its surrounding work, Istrail and colleagues made use of one of the pillars of computer science in genomics — recognizing that a genome can be modeled as a computational artifact and that, in doing so, one is able to access a powerful set of tools for analyzing it, according to Russell Schwartz, co-author of the paper and head of the computational biology department at Carnegie Mellon University.
The technique for solving these problems is “finding abstractions — a simplification of the problem that you can reason with mathematically,” Schwartz said. Besides computer science complexity classes — classifiers for computational problems that take similar time or memory to solve — other computer science concepts applied in the paper include graph theory and linear programming.
Istrail and colleagues published the paper in 2001 while working at Celera Genomics, a private research company focused on genomic sequencing, Istrail. Istrail was serving as senior director of informatics research at Celera at the time.
Leading up to the paper’s publication, Celera made large genomic datasets available for the first time as part of its effort to sequence the human genome, according to Istrail. The company hired a cohort of computational scientists, including the authors of the paper, to develop algorithms for extracting useful information from these datasets, he added.
The paper developed at Celera continues to impact genomics research 20 years after its publication. Shilpa Garg, an assistant professor at the University of Copenhagen who was not involved in the research, cited Istrail’s paper in a review article last year. Her article presented computational methods for known problems in genomics, such as a graph-based approach to efficiently integrating novel data types.
Garg’s research built on Istrail’s paper by examining new technologies for sequencing large strands of DNA. Most organisms have genomes that are too long to be read at once, so researchers instead read short segments of their genome and combine them.
In 2001, when Istrail and colleagues were working on the paper, the latest technology available for gathering large genomic sequencing data was short-read sequencing, but now long-read sequencing is becoming a viable option, according to Garg.
“The (2001) paper is actually one of the foundational papers in the computational haplotyping field,” Garg said. “Now, over two decades (later), there have been advancements (such that) we can produce these chromosome-level genomes … with minimal gaps.”
Aside from Istrail’s work at Celera and Brown, he is also a co-founder of RECOMB — an academic conference for the field of computational biology. The conference, which is approaching its 27th year, brings together leading researchers to discuss the latest advancements in genomics, according to Istrail.
As a researcher, Istrail continues to formulate ideas that advance the field while ensuring that his approaches are sound mathematically and biologically, according to Bjarni Halldorsson, associate professor in the department of biomedical engineering at Reykjavík University and head of sequence analysis at deCODE Genetics, a biopharmaceutical research company. Halldorsson completed his postdoctoral fellowship at Celera, where he reported to Istrail, and the two have collaborated on several papers since.
In addition to continuing his work in haplotype phasing, Istrail is also working on regulatory genomics, an area focused on the region at the beginning of every genome that controls gene expression. He is also working on algorithms to predict protein folding.
In his current research, Istrail continues to build on findings from the 2001 paper. “Genomics today is a world of … haplotypes, and the active research themes are all about understanding the role of (them) in health and disease,” he said.