EpiGe-App: a machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping
Collection of R scripts used to perform the DNA methylation data analysis presented in EpiGe-App: a machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping.
Preprocessing
preprocessing_qpcr_data.R
Reads the qPCR data from Data/deltaRn_dataframe.csv, filters out non-medulloblastoma samples, averages the replicates, and log-transforms the data. Finally, the methylation variable is binarized. The results are stored in ./Results.
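The steps above can be sketched in base R on a toy data frame. This is only an illustration, not the script itself: every column name (`sample`, `tumor`, `cytosine`, `replicate`, `delta_rn`, `methylation`) and the `"M"`/`"U"` coding are assumptions, the values are made up, and the natural logarithm is used because the README does not state the base.

```r
# Toy stand-in for Data/deltaRn_dataframe.csv.
# All column names and values are hypothetical, for illustration only.
qpcr <- data.frame(
  sample      = c("S1", "S1", "S2", "S2", "S3", "S3"),
  tumor       = c("MB", "MB", "MB", "MB", "other", "other"),
  cytosine    = "cg_a",
  replicate   = c(1, 2, 1, 2, 1, 2),
  delta_rn    = c(2.0, 4.0, 0.5, 0.7, 1.0, 1.2),
  methylation = c("M", "M", "U", "U", "M", "M")
)

# 1. Keep only medulloblastoma samples.
mb <- qpcr[qpcr$tumor == "MB", ]

# 2. Average the replicates of each sample/cytosine pair.
avg <- aggregate(delta_rn ~ sample + cytosine + methylation,
                 data = mb, FUN = mean)

# 3. Log-transform the averaged signal (natural log is an assumption).
avg$log_delta_rn <- log(avg$delta_rn)

# 4. Binarize the methylation variable: 1 = methylated, 0 = unmethylated
#    (the "M"/"U" coding is an assumption).
avg$methylated <- as.integer(avg$methylation == "M")

print(avg)
```

In the real script the result would then be written to ./Results, for example with `write.csv()`.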
Training logistic model
trainning_logistic_model.R
Reads the preprocessed data and trains the model using a Leave One Patient Out Cross-Validation (LOPOCV) loop. It then calculates the AUC and the confusion matrix of the model using the optimal cut-off point obtained from the ROC curve. Finally, the data, including the newly created variables, and the logistic model are saved in ./Results.
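A minimal base-R sketch of a LOPOCV loop around a logistic model follows. It runs on simulated data with hypothetical column names (`patient`, `methylated`, `log_delta_rn`). Two choices are assumptions, since the README does not specify them: the AUC is computed with the rank-based (Mann-Whitney) formula, and the optimal cut-off is chosen by maximizing Youden's J.

```r
set.seed(1)

# Simulated data: 8 hypothetical patients with 6 measurements each.
n_pat <- 8
toy <- data.frame(
  patient    = rep(paste0("P", 1:n_pat), each = 6),
  methylated = rep(c(0, 1), times = n_pat * 3)
)
toy$log_delta_rn <- rnorm(nrow(toy),
                          mean = ifelse(toy$methylated == 1, 1, -1),
                          sd = 1)

# LOPOCV: hold out all rows of one patient, fit on the rest, predict the held-out rows.
toy$pred <- NA_real_
for (p in unique(toy$patient)) {
  held_out <- toy$patient == p
  fit <- glm(methylated ~ log_delta_rn,
             data = toy[!held_out, ], family = binomial)
  toy$pred[held_out] <- predict(fit, newdata = toy[held_out, ],
                                type = "response")
}

# AUC from the out-of-fold predictions (rank-based formula).
y  <- toy$methylated
r  <- rank(toy$pred)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Cut-off maximizing Youden's J = sensitivity + specificity - 1 (assumed criterion).
cuts <- sort(unique(toy$pred))
youden <- sapply(cuts, function(k) {
  pred_pos <- toy$pred >= k
  mean(pred_pos[y == 1]) + mean(!pred_pos[y == 0]) - 1
})
best_cut <- cuts[which.max(youden)]

# Confusion matrix at the chosen cut-off.
toy$pred_class <- as.integer(toy$pred >= best_cut)
conf_mat <- table(observed = y, predicted = toy$pred_class)

# Final model refitted on all data; it could be saved with saveRDS().
final_model <- glm(methylated ~ log_delta_rn, data = toy, family = binomial)

print(auc)
print(conf_mat)
```

Holding out whole patients rather than single rows keeps measurements from the same patient out of both the training and test sets in any given fold.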
Medulloblastoma subgroup assignment
distance_classifier.R
Reads the data from the logistic model training step and obtains the molecular subgroup of each sample from the Data/IDAT_dataframe.csv file. Binary codes are created from the methylation state of each cytosine. For each molecular subgroup, the distances between each methylation code and the 3 reference codes are calculated, and each sample is assigned the molecular subgroup of the closest reference. Finally, tables summarizing the assignment and the confusion matrix are displayed.
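The nearest-reference assignment can be illustrated with a small base-R sketch. Everything here is hypothetical: the reference codes, the subgroup labels (`group_A`, `group_B`, `group_C`), the sample codes, and the true labels are made up. The Hamming distance is also an assumption, since the README does not name the distance measure.

```r
# Hypothetical reference codes, one row per subgroup, one column per cytosine.
reference <- rbind(
  group_A = c(1, 1, 0, 0, 0, 0),
  group_B = c(0, 0, 1, 1, 0, 0),
  group_C = c(0, 0, 0, 0, 1, 1)
)

# Hypothetical binary methylation codes for three samples.
samples <- rbind(
  S1 = c(1, 1, 0, 0, 0, 0),
  S2 = c(0, 0, 1, 0, 0, 0),
  S3 = c(0, 1, 0, 0, 1, 1)
)

# Hamming distance: number of positions at which two codes differ (assumed metric).
hamming <- function(a, b) sum(a != b)

# Distance matrix: rows are samples, columns are reference subgroups.
dist_mat <- t(apply(samples, 1, function(code) {
  apply(reference, 1, hamming, b = code)
}))

# Assign each sample to the closest reference.
# which.min() returns the first minimum, so ties go to the earlier column.
assigned <- colnames(dist_mat)[apply(dist_mat, 1, which.min)]
names(assigned) <- rownames(dist_mat)

# Confusion matrix against hypothetical true subgroups.
truth <- c(S1 = "group_A", S2 = "group_B", S3 = "group_C")
conf_mat <- table(reference = truth, assigned = assigned)

print(dist_mat)
print(conf_mat)
```

A real classifier would need an explicit rule for ties, which this sketch resolves only by column order.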