##################################################################
###Estimating TP/TN prediction rates and AUC for mixed proteins###
##################################################################

#######################
###Files description###
#######################

TP_full.txt - true positives validation set. Contains CADD/PP2/SIFT scores and fixed predictions for all new HGMD (professional version, experimentally validated DM category) missense mutations that were not used to generate the MSC.
FP_full.txt - false positives validation set. Contains CADD/PP2/SIFT scores and fixed predictions for all private missense variants in the WES data of 97 patients whose disease-causing mutation had previously been experimentally validated and published. Those disease-causing mutations were removed from the set.
match*.py - Python scripts that extract TP/TN predictions for the different methods and protein sets.
FP_MSC*.txt - FP input for the ROC curves and prediction rate estimates, generated by match_FP*.py.
TP_MSC*.txt - TP input for the ROC curves and prediction rate estimates, generated by match_TP*.py.
mixed_genes.txt - all "mixed proteins" genes in the validation set.
04-06_genes.txt - all "mixed proteins with 04-06 ratio" genes in the validation set.
ROC_CADD2_mixed_all.r - generates ROC curves and AUC estimates from the FP_MSC*.txt and TP_MSC*.txt files for "mixed proteins".
ROC_CADD2_mixed_04-06.r - generates ROC curves and AUC estimates from the FP_MSC*.txt and TP_MSC*.txt files for "mixed proteins with 04-06 ratio".
Prediction_rates_mixed_full - summary of TP/TN prediction rates for "mixed proteins".
Prediction_rates_mixed_04-06 - summary of TP/TN prediction rates for "mixed proteins with 04-06 ratio".
AUC_mixed_all - summary of AUC for "mixed proteins".
AUC_mixed_all - summary of AUC for "mixed proteins with 04-06 ratio".
ROC_mixed_all.pdf - ROC curve figure for "mixed proteins".
ROC_mixed_04-06.pdf - ROC curve figure for "mixed proteins with 04-06 ratio".
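The following is a minimal, illustrative sketch of the three steps these files support. First, it restricts the TP and FP validation sets to a gene list such as mixed_genes.txt. Next, it computes the TP rate (share of true disease-causing variants predicted damaging) and the TN rate (share of FP-set variants predicted benign). Finally, it estimates AUC from the raw scores. It makes several assumptions. The `Variant` record layout (gene, score, boolean "damaging" prediction) is hypothetical, since the column layout of TP_MSC*.txt / FP_MSC*.txt is not described here. It assumes a higher score means "more likely damaging", as with CADD. It is not the code in match*.py or ROC_CADD2_*.r.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """One scored variant. Hypothetical layout, not the actual *_MSC*.txt format."""
    gene: str
    score: float     # assumed: higher = more likely damaging (as for CADD)
    damaging: bool   # assumed: the method's fixed prediction as a boolean


def restrict_to_genes(variants: Iterable[Variant], genes: Set[str]) -> List[Variant]:
    """Keep only variants in the given gene set (e.g. the 'mixed proteins' genes)."""
    return [v for v in variants if v.gene in genes]


def prediction_rates(tp_set: List[Variant], fp_set: List[Variant]) -> Tuple[float, float]:
    """Return (TP rate, TN rate).

    TP rate: fraction of true disease-causing variants predicted damaging.
    TN rate: fraction of FP-set (presumed benign) variants predicted benign.
    """
    tp_rate = sum(v.damaging for v in tp_set) / len(tp_set)
    tn_rate = sum(not v.damaging for v in fp_set) / len(fp_set)
    return tp_rate, tn_rate


def auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """ROC AUC as the Mann-Whitney probability P(pos score > neg score), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    genes = {"GENE1", "GENE2"}  # hypothetical gene names standing in for mixed_genes.txt
    tp = [Variant("GENE1", 30.0, True), Variant("GENE2", 12.0, False), Variant("GENE3", 25.0, True)]
    fp = [Variant("GENE1", 5.0, False), Variant("GENE2", 15.0, True)]
    tp_m = restrict_to_genes(tp, genes)
    fp_m = restrict_to_genes(fp, genes)
    print(prediction_rates(tp_m, fp_m))  # (0.5, 0.5)
    print(auc([v.score for v in tp_m], [v.score for v in fp_m]))  # 0.75
```

The pairwise AUC loop is O(n*m) and is kept only for clarity. For large validation sets, a rank-based computation or an established ROC package (such as those the R scripts presumably rely on) would be more practical.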