59 Functions

To summarize the features captured in the large-scale transcriptomic dataset generated in this study and to facilitate its use by other researchers, we trained multiple predictive models on this dataset using a range of algorithms. To evaluate model performance and guide the selection of a classifier for robust subtype prediction, we integrated a set of independent external samples with our cohort and annotated them using the same analytical pipeline. These external samples were then supplied to each model as input. By comparing the model predictions with the annotations obtained from the standard analytical pipeline, we identified the best-performing model, which achieved high accuracy.
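The selection step described above can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this study: the model names, the subtype labels, and the use of plain accuracy as the only selection criterion are assumptions made for the example.

```python
from typing import Dict, List


def accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of samples whose predicted subtype matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one sample is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def select_best_model(predictions: Dict[str, List[str]], reference: List[str]) -> str:
    """Return the name of the model with the highest accuracy.

    Ties go to the model that appears first in the dictionary.
    """
    scores = {name: accuracy(pred, reference) for name, pred in predictions.items()}
    return max(scores, key=scores.get)


# Hypothetical subtype labels assigned by the standard pipeline
# to four external samples (placeholders, not real subtypes).
reference_labels = ["A", "B", "A", "C"]

# Hypothetical predictions from two candidate classifiers.
model_predictions = {
    "model_1": ["A", "B", "B", "C"],
    "model_2": ["A", "B", "A", "C"],
}

best_model = select_best_model(model_predictions, reference_labels)
```

In practice, a metric that accounts for class imbalance between subtypes may be preferable to plain accuracy; the sketch only shows the comparison of predictions against pipeline-derived labels.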

Building on the findings of this study, we then developed a suite of methods for the molecular characterization of multiple myeloma (MM) samples and implemented them in an R package, MMyeloMap, designed for comprehensive single-sample prediction. The tool takes transcript quantification results generated by the Salmon pipeline as input and can optionally produce a comprehensive web-based report. The framework integrates multiple predictive models trained on multiple cohorts (see Methods) and provides an overall subtype assignment for the queried sample. The output includes figures illustrating transcriptomic clustering, dimensionality reduction visualizations, and other relevant features. When raw FASTQ files are supplied as input, the tool can also infer the patient’s immunoglobulin (Ig) isotype and identify potential driver alterations.
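To make the input and the idea of an overall assignment concrete, the sketch below reads a Salmon `quant.sf` table (tab-separated, with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`) and combines per-model calls by majority vote. It is written in Python for illustration only and does not use or reproduce the MMyeloMap API. The transcript names, the subtype labels, and the majority-vote rule are assumptions; the text does not specify how the package derives its overall assignment.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def read_salmon_tpm(handle: TextIO) -> Dict[str, float]:
    """Parse a Salmon quant.sf table into a {transcript name: TPM} mapping."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def consensus_subtype(calls: List[str]) -> str:
    """Majority vote over per-model calls (assumed rule, for illustration).

    Ties are resolved in favor of the call encountered first.
    """
    if not calls:
        raise ValueError("at least one model call is required")
    return Counter(calls).most_common(1)[0][0]


# A tiny in-memory stand-in for a quant.sf file; transcript names are hypothetical.
quant_sf = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1320.5\t12.5\t230\n"
    "tx2\t900\t720.0\t0.0\t0\n"
)
tpm_by_transcript = read_salmon_tpm(quant_sf)

# Hypothetical subtype calls from three models for one queried sample.
overall = consensus_subtype(["subtype_X", "subtype_Y", "subtype_X"])
```

With a real sample, the `io.StringIO` object would be replaced by an open file handle on the `quant.sf` produced by Salmon.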

For details, see https://github.com/JhuangLab/MMyeloMap/ and https://jhuanglab.github.io/MMyeloMap.