New computational tool elucidates how deep neural networks interpret genomic data

Introducing SQUID: A New Tool for Interpreting AI Models in Genomics

Scientists may now be one step closer to understanding the internal logic of artificial intelligence (AI) models used for genomics, thanks to a new tool developed at the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory (CSHL). In a recent publication in Nature Machine Intelligence, researchers describe a computational tool called Surrogate Quantitative Interpretability for Deepnets (SQUID). Rather than asking a deep neural network (DNN) to explain itself, SQUID fits a simpler, interpretable surrogate model to the DNN's predictions to show how the network analyzes the genome.

The Purpose of SQUID

The primary goal of SQUID is to address the complex challenge of interpreting AI models in genomics. As AI becomes increasingly integral to genomic research, decoding the decision-making processes of these models is crucial. SQUID aims to provide a clearer understanding of how AI models identify and interpret genetic variants, regulatory elements, and other genomic features. This understanding can lead to more accurate predictions and insights into genomic functions and disease mechanisms.

How SQUID Works

SQUID operates by generating a large in silico library of variant DNA sequences around a sequence of interest and scoring each variant with the DNN. It then trains a surrogate model, known as a latent phenotype model, on these sequences and their predicted scores using a program called Multiplex Assays of Variant Effects Neural Network (MAVE-NN). The tool visualizes and interprets the parameters of the surrogate model, in effect letting scientists run thousands of virtual experiments simultaneously. This process helps uncover the logic behind a DNN's predictions for different genetic variants.
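The workflow described above can be illustrated with a minimal sketch. This is not SQUID's or MAVE-NN's actual API: the black-box `toy_dnn`, the motif, the wild-type sequence, and every function name below are hypothetical, and an ordinary least-squares additive fit is swapped in for the latent phenotype model that MAVE-NN would train. The sketch only shows the general idea of mutagenizing a sequence, scoring the variants with a model, and reading per-position effects off a simple surrogate.

```python
import numpy as np

ALPHABET = "ACGT"
MOTIF, MOTIF_START = "TGACTCA", 3  # hypothetical motif the toy model responds to

def toy_dnn(seq):
    """Hypothetical black box standing in for a trained genomic DNN."""
    window = seq[MOTIF_START:MOTIF_START + len(MOTIF)]
    matches = sum(a == b for a, b in zip(window, MOTIF))
    return 1.0 / (1.0 + np.exp(-(matches - 4)))  # nonlinear in match count

def one_hot(seq):
    """Encode a DNA string as a flat one-hot vector of length 4 * len(seq)."""
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [ALPHABET.index(c) for c in seq]] = 1.0
    return x.ravel()

def mutagenize(seq, n, rate, rng):
    """Build an in silico library: n variants, each base mutated with prob. rate."""
    variants = []
    for _ in range(n):
        chars = list(seq)
        for i, c in enumerate(chars):
            if rng.random() < rate:
                chars[i] = str(rng.choice([b for b in ALPHABET if b != c]))
        variants.append("".join(chars))
    return variants

def fit_additive_surrogate(seqs, scores):
    """Least-squares additive surrogate; returns an (L, 4) matrix of effects."""
    X = np.stack([one_hot(s) for s in seqs])
    X = np.hstack([np.ones((len(seqs), 1)), X])  # intercept column
    theta, *_ = np.linalg.lstsq(X, np.asarray(scores), rcond=None)
    effects = theta[1:].reshape(-1, 4)
    # Center each position so effects are comparable across positions.
    return effects - effects.mean(axis=1, keepdims=True)

def run_demo(n=5000, rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    wild_type = "CCG" + MOTIF + "GATCCA"  # hypothetical 16-nt sequence of interest
    library = mutagenize(wild_type, n, rate, rng)
    scores = [toy_dnn(s) for s in library]  # query the black box on every variant
    return wild_type, fit_additive_surrogate(library, scores)

wild_type, effects = run_demo()
importance = effects.max(axis=1) - effects.min(axis=1)
for pos, (base, imp) in enumerate(zip(wild_type, importance)):
    print(pos, base, round(float(imp), 3))
```

In this toy setting the positions with the largest spread of effects should coincide with the motif, which is the kind of pattern a surrogate model's parameters are meant to expose. A real analysis would replace `toy_dnn` with an actual trained network and the least-squares fit with a model that also accounts for nonlinearity and noise.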

According to the developers, SQUID uses “simple models with interpretable parameters” to approximate the DNN function within localized regions of sequence space. This method helps to “remove the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation.”

Advantages Over Other Methods

One of the key advantages of SQUID is its ability to leverage decades of quantitative genetics knowledge. Existing tools for interpreting what genomic DNNs have learned have often been adapted from fields like computer vision or natural language processing. While useful, these tools are not specifically optimized for genomics. SQUID, on the other hand, is designed with the unique challenges of genomic data in mind.

Peter Koo, an assistant professor at CSHL and senior author of the study, explains, “The tools that people use to understand these patterns have mostly come from other fields like computer vision or natural language processing. While they can be useful, they are not optimal for genomics. What we did with SQUID was leverage decades of quantitative genetics knowledge to help us understand what these deep neural networks are learning.”

Potential Applications in Genomics Research

SQUID’s ability to interpret AI models has significant implications for genomics research. By providing more accurate and interpretable models, SQUID can aid in identifying binding motifs of transcription factors, reducing noise in attribution maps, and improving variant effect predictions. This capability is particularly valuable for studying epistatic interactions at cis-regulatory elements, where SQUID has demonstrated superior performance compared to other methods.
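To make the notion of an epistatic interaction concrete, here is a small sketch using the standard double-mutant definition from quantitative genetics: the epistasis between two mutations is the double mutant's score minus both single-mutant scores plus the wild-type score. This is a simpler calculation than fitting a pairwise surrogate model and is not taken from SQUID itself; `toy_model`, the motif, and the helper names are hypothetical.

```python
import math

MOTIF = "TGACTCA"  # hypothetical motif the toy model responds to

def toy_model(seq):
    """Hypothetical stand-in for a DNN: sigmoid of the number of motif matches."""
    matches = sum(a == b for a, b in zip(seq, MOTIF))
    return 1.0 / (1.0 + math.exp(-(matches - 4)))

def mutate(seq, pos, base):
    """Return seq with the base at index pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]

def pairwise_epistasis(f, wild_type, mut_a, mut_b):
    """eps = f(double) - f(single_a) - f(single_b) + f(wild_type)."""
    (i, a), (j, b) = mut_a, mut_b
    single_a = mutate(wild_type, i, a)
    single_b = mutate(wild_type, j, b)
    double = mutate(single_a, j, b)
    return f(double) - f(single_a) - f(single_b) + f(wild_type)

eps = pairwise_epistasis(toy_model, MOTIF, (0, "A"), (1, "C"))
print("epistasis between the two mutations:", eps)
```

A value of zero would mean the two mutations act additively on the model's output; a nonzero value means the effect of one mutation depends on the presence of the other. Because the toy model passes an additive match count through a sigmoid, it shows nonzero epistasis even though its underlying logic is additive, which illustrates why accounting for such nonlinearities matters when interpreting interactions.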

Although SQUID is more computationally demanding than other interpretation methods, it is especially beneficial for researchers conducting in-depth analyses of specific genomic sequences, such as disease-associated loci. Virtual experiments with SQUID can generate hypotheses about how particular regions of the genome function or how specific mutations might have clinically relevant effects.

Justin Kinney, a CSHL associate professor and co-author of the study, notes, “While virtual experiments can’t exactly replace lab tests, they can be very informative in helping scientists generate hypotheses about how a particular region of the genome works or how a mutation might have a clinically relevant effect.”

Conclusion

The development of SQUID gives genomics researchers a powerful new tool for interpreting AI models. By clarifying how DNNs arrive at their predictions, SQUID has the potential to drive new discoveries and improve our ability to predict and understand genetic variants and their effects. As researchers continue to explore its applications, SQUID could become a valuable asset for studying how the genome works.

For more detailed insights into this tool, refer to the full study published in Nature Machine Intelligence.