EpiGe-App: a machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping

Collection of R scripts used to perform the DNA-methylation data analysis presented in EpiGe-App: a machine-learning strategy for rapid classification of medulloblastoma using PCR-based methyl-genotyping.

Preprocessing

preprocessing_qpcr_data.R

Reads the qPCR data from Data/deltaRn_dataframe.csv, filters out non-medulloblastoma data, averages the replicates, and log-transforms the data. Finally, the methylation variable is binarized. The results are stored in ./Results.
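
The steps above can be sketched in base R as follows. This is a minimal illustration on a toy table, not the code in the script: the column names (`sample`, `diagnosis`, `cytosine`, `methylation`, `delta_rn`), the `"MB"` label, and the text encoding of the methylation variable are all assumptions, since the layout of the CSV is not described here.

```r
# Toy long-format table. All column names and labels are hypothetical.
qpcr <- data.frame(
  sample      = c("S1", "S1", "S2", "S2", "S3", "S3"),
  diagnosis   = c("MB", "MB", "MB", "MB", "other", "other"),
  cytosine    = "cg_a",
  methylation = c("methylated", "methylated",
                  "unmethylated", "unmethylated",
                  "methylated", "methylated"),
  delta_rn    = c(2.0, 4.0, 0.2, 0.4, 1.0, 1.0)
)

# 1. Keep medulloblastoma rows only (assumed label "MB").
mb <- qpcr[qpcr$diagnosis == "MB", ]

# 2. Average the replicates of each sample/cytosine pair.
avg <- aggregate(delta_rn ~ sample + cytosine + methylation,
                 data = mb, FUN = mean)

# 3. Log-transform the averaged signal.
avg$log_delta_rn <- log(avg$delta_rn)

# 4. Binarize the methylation variable (assumed to be a text label).
avg$methylated <- as.integer(avg$methylation == "methylated")

print(avg)
```

In a real run the toy table would be replaced by something like `read.csv("Data/deltaRn_dataframe.csv")`, with the column names adapted to the actual file.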

Training the logistic model

trainning_logistic_model.R

Reads the preprocessed data, trains the model in a Leave-One-Patient-Out Cross-Validation (LOPOCV) loop, and calculates the AUC and the confusion matrix of the model using the optimal cut-off point obtained from the ROC curve. Finally, the data are stored with the newly created variables, and the logistic model is saved in ./Results.
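
A LOPOCV loop of this kind could look like the sketch below, written in base R on simulated data. The column names (`patient`, `log_delta_rn`, `methylated`) are hypothetical, and the cut-off rule (maximizing Youden's J) is an assumption, because the criterion used in the script is not stated here.

```r
set.seed(1)

# Simulated data: 20 patients, 3 measurements each (hypothetical columns).
dat <- data.frame(patient = rep(1:20, each = 3))
dat$methylated   <- rbinom(nrow(dat), 1, 0.5)
dat$log_delta_rn <- rnorm(nrow(dat), mean = 2 * dat$methylated, sd = 1)

# Leave one patient out: fit on the others, predict the held-out rows.
dat$prob <- NA_real_
for (p in unique(dat$patient)) {
  held_out <- dat$patient == p
  fit <- glm(methylated ~ log_delta_rn,
             data = dat[!held_out, ], family = binomial)
  dat$prob[held_out] <- predict(fit, newdata = dat[held_out, ],
                                type = "response")
}

# AUC from the rank-sum (Mann-Whitney) identity.
pos <- dat$methylated == 1
r   <- rank(dat$prob)
auc <- (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) /
  (sum(pos) * sum(!pos))

# Cut-off maximizing Youden's J = sensitivity + specificity - 1 (assumption).
cuts   <- sort(unique(dat$prob))
youden <- sapply(cuts, function(k) {
  pred <- dat$prob >= k
  mean(pred[pos]) - mean(pred[!pos])  # sensitivity minus false-positive rate
})
cutoff <- cuts[which.max(youden)]

# Confusion matrix at the chosen cut-off.
dat$predicted <- as.integer(dat$prob >= cutoff)
conf_mat <- table(observed = dat$methylated, predicted = dat$predicted)
print(auc)
print(conf_mat)
```

Holding out all rows of a patient at once keeps measurements from the same patient out of both the training and the test fold. A final model fitted on all rows could then be written to disk with `saveRDS()`.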

Medulloblastoma subgroup assignment

distance_classifier.R

Reads the logistic model training data and obtains the molecular subgroup of each sample from the Data/IDAT_dataframe.csv file. A binary code is created for each sample from the methylation state of each cytosine. The distance between each methylation code and the 3 reference codes is calculated for each molecular subgroup, and each sample is assigned the molecular subgroup of the closest reference code. Finally, tables summarizing the assignment and the confusion matrix are shown.
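
The assignment step can be illustrated with the base R sketch below. It assumes one reference code per subgroup and uses the Hamming distance (number of differing cytosines); the actual metric, the reference codes, and the subgroup and cytosine names are not given here, so every value in the example is hypothetical.

```r
# Hypothetical reference codes: one binary methylation pattern per subgroup.
reference <- rbind(
  subgroup_A = c(1, 1, 0, 0, 0, 0),
  subgroup_B = c(0, 0, 1, 1, 0, 0),
  subgroup_C = c(0, 0, 0, 0, 1, 1)
)
colnames(reference) <- paste0("cg_", 1:6)

# Hypothetical binary methylation codes, one row per sample.
samples <- rbind(
  S1 = c(1, 1, 0, 0, 0, 0),
  S2 = c(0, 1, 1, 1, 0, 0),
  S3 = c(0, 0, 0, 1, 1, 1)
)
true_subgroup <- c("subgroup_A", "subgroup_B", "subgroup_C")

# Hamming distance from one code to every reference code.
hamming <- function(code, refs) {
  rowSums(refs != matrix(code, nrow(refs), length(code), byrow = TRUE))
}

# Distance matrix: samples in rows, subgroups in columns.
dist_mat <- t(apply(samples, 1, hamming, refs = reference))

# Assign the subgroup with the smallest distance (first one wins on ties).
assigned <- rownames(reference)[apply(dist_mat, 1, which.min)]

print(dist_mat)
print(table(observed = true_subgroup, assigned = assigned))
```

Note that `which.min` silently resolves ties in favor of the first subgroup, so a real classifier may want to flag samples that are equidistant from two reference codes.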