Machine learning reveals how disordered protein regions contribute to cancer-causing condensates

Fusion oncoproteins arise when two genes fuse and the resulting protein acquires new abilities. Such abilities can include the formation of biomolecular condensates, "droplets" of concentrated proteins, DNA, or RNA.

The abnormal molecular condensates formed by fusion oncoproteins can disrupt cellular functions and drive cancer development, but the specific protein features behind this process remain unclear.

Scientists at St. Jude Children’s Research Hospital studied intrinsically disordered regions, unstructured protein segments that are often involved in condensate formation, to determine whether they drive fusion oncoproteins to form condensates. They trained a machine learning model, called IDR-Puncta ML, on experimental data from intrinsically disordered regions in fusion oncoproteins to predict the behavior of other such regions.

The model found that only about 12% of all human intrinsically disordered regions form condensates, and that these regions sit within proteins with strong links to RNA-related functions.
