Deconvolution
Cell fraction deconvolution by DeSide or Scaden.
- class deside.decon_cf.DeSide(model_dir: str, log_file_path: str | None = None, model_name: str = 'DeSide')
DeSide model for predicting cell proportions in bulk RNA-seq data
- Parameters:
model_dir – the directory where the trained model is saved
log_file_path – the path of the log file
model_name – only for naming
- get_model()
Load the pre-trained model with keras.models.load_model if it exists.
- Returns:
pre-trained model
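A minimal sketch of constructing the class and loading a saved model, using only the constructor and get_model() signatures listed above. The helper name load_deside_model is hypothetical, and what get_model() returns when no saved model exists is not stated here.

```python
def load_deside_model(model_dir, log_file_path=None):
    """Construct a DeSide object and load its saved model (illustrative sketch)."""
    # Imported lazily so this snippet can be defined without deside installed.
    from deside.decon_cf import DeSide

    # model_dir should point at the directory that holds the trained model.
    deside_obj = DeSide(model_dir=model_dir, log_file_path=log_file_path)
    # get_model() is documented to return the pre-trained model if it exists.
    model = deside_obj.get_model()
    return deside_obj, model
```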
- get_x_before_predict(input_file, exp_type, transpose: bool = False, print_info: bool = True, scaling_by_sample: bool = False, scaling_by_constant: bool = True, pathway_mask: DataFrame | None = None, method_adding_pathway: str = 'add_to_end')
- Parameters:
input_file – input file path
exp_type – ‘log_space’ or ‘raw_space’
transpose – if True, transpose the input dataframe
print_info – if True, print info
scaling_by_sample – if True, scale the expression values of each sample
scaling_by_constant – if True, scale the expression values by dividing by a constant
pathway_mask – if not None, use pathway mask to get pathway profiles
method_adding_pathway – ‘add_to_end’ or ‘convert’
- Returns:
x
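The pathway_mask and method_adding_pathway options can be pictured with a small pure-Python sketch. It rests on assumptions not confirmed by this page: a pathway profile is taken to be the mean expression of the pathway's member genes, 'add_to_end' is taken to append pathway features after the gene features, and 'convert' is taken to replace gene features with pathway features. The function names are hypothetical, and the real implementation works on DataFrames.

```python
def pathway_profiles(expression, mask):
    """Assumed definition: mean expression over each pathway's member genes.

    expression: {gene: value} for one sample
    mask: {gene: {pathway: 0 or 1}}, i.e. genes by pathways
    """
    pathways = sorted({p for row in mask.values() for p in row})
    profiles = {}
    for pathway in pathways:
        members = [g for g in expression if mask.get(g, {}).get(pathway, 0) == 1]
        profiles[pathway] = (
            sum(expression[g] for g in members) / len(members) if members else 0.0
        )
    return profiles


def build_input(expression, mask, method_adding_pathway="add_to_end"):
    """Combine gene and pathway features (assumed meaning of the two methods)."""
    profiles = pathway_profiles(expression, mask)
    if method_adding_pathway == "add_to_end":
        # Gene features first, pathway features appended after them.
        return {**expression, **profiles}
    if method_adding_pathway == "convert":
        # Pathway features only.
        return profiles
    raise ValueError("method_adding_pathway must be 'add_to_end' or 'convert'")


sample = {"A": 0.25, "B": 0.5, "C": 0.75}
mask = {
    "A": {"P1": 1, "P2": 0},
    "B": {"P1": 1, "P2": 1},
    "C": {"P1": 0, "P2": 1},
}
appended = build_input(sample, mask, "add_to_end")
converted = build_input(sample, mask, "convert")
```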
- predict(input_file, exp_type, output_file_path: str | None = None, transpose: bool = False, print_info: bool = True, add_cell_type: bool = False, scaling_by_constant=False, scaling_by_sample=True, one_minus_alpha: bool = False, pathway_mask: DataFrame | None = None, method_adding_pathway: str = 'add_to_end', hyper_params: dict | None = None)
Predict cell proportions using a pre-trained model.
- Parameters:
input_file – the input file path (.csv / .h5ad) or a pd.DataFrame; simulated (or TCGA) bulk expression profiles, samples by genes, as log2(TPM + 1) or TPM
output_file_path – the file path to save prediction result
exp_type – log_space or TPM, log_space means log2(TPM + 1)
transpose – transpose if the input is arranged as genes (index) by samples (columns)
print_info – print information during prediction
add_cell_type – only True when predicting cell types using classification model
scaling_by_constant – scale log2(TPM + 1) by dividing by 20
scaling_by_sample – scaling by sample, same as Scaden
one_minus_alpha – use 1 - alpha for all cell types if True
pathway_mask – if not None, use pathway mask to get pathway profiles
method_adding_pathway – ‘add_to_end’ or ‘convert’
hyper_params – hyper parameters for DNN model
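The two scaling options can be illustrated on a single sample with plain Python. This is a sketch rather than the library's code: the function names are hypothetical, the constant 20 and the min-max rule come from the parameter descriptions on this page, and how constant samples are handled is an assumption.

```python
import math


def log2_tpm(tpm_values):
    """Convert TPM to log space as log2(TPM + 1)."""
    return [math.log2(v + 1.0) for v in tpm_values]


def scale_by_constant(log_values, constant=20.0):
    """Divide log2(TPM + 1) values by a constant so they fall in [0, 1)."""
    return [v / constant for v in log_values]


def scale_by_sample(values):
    """Min-max scale one sample's values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant sample maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


tpm = [0.0, 1.0, 3.0, 1023.0]        # one sample, four genes
logged = log2_tpm(tpm)               # log2(TPM + 1) per gene
by_constant = scale_by_constant(logged)
by_sample = scale_by_sample(logged)
```

Dividing by a constant keeps values comparable across samples, while min-max scaling stretches each sample to the full [0, 1] range independently.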
- train_model(training_set_file_path: str | list, hyper_params: dict, cell_types: list | None = None, scaling_by_sample: bool = True, callback: bool = True, n_epoch: int = 10000, metrics: str = 'mse', n_patience: int = 100, scaling_by_constant=False, remove_cancer_cell=False, fine_tune=False, one_minus_alpha: bool = False, verbose=1, pathway_mask=None, method_adding_pathway='add_to_end', input_gene_list: str | None = None, filtered_gene_list: list | None = None, group_cell_types: dict | None = None)
Train the DeSide model.
- Parameters:
training_set_file_path – the file path of training set, .h5ad file, log2cpm1p format, samples by genes
hyper_params – pre-determined hyperparameters for DeSide model
cell_types – specify a list of cell types instead of using all cell types in the training set
scaling_by_sample – whether to scale the expression values of each sample to [0, 1] by ‘min_max’
callback – whether to use callback function when training model
n_epoch – the max number of epochs to train
metrics – mse (regression model) / accuracy (classifier)
n_patience – patience in early_stopping_callback
remove_cancer_cell – remove cancer cell from y if True, using “1-others”
fine_tune – fine-tune a pre-trained model
scaling_by_constant – if True, scale the GEP in log space by dividing by a constant (default value is 20) so that all expression values are in [0, 1)
one_minus_alpha – use 1 - alpha for all cell types if True
verbose – whether to print progress during training, 0: silent, 1: progress bar, 2: one line per epoch
pathway_mask – the mask of pathway genes, 1: pathway gene, 0: non-pathway gene, genes by pathways
method_adding_pathway – the method to use pathway profiles, ‘add_to_end’ or ‘convert’
input_gene_list – the gene list used as input for pathway profiles, if None: use all genes in training set; if “intersection_with_pathway_genes”: use the intersection of genes in training set and genes in pathways; if “filtered_genes”: use the genes in filtered_gene_list.
filtered_gene_list – the list of genes used as input, if None, use all genes in training set
group_cell_types – group cell types into a list of cell types, e.g. {‘group1’: [‘cell_type1’, ‘cell_type2’]}
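As a rough illustration of group_cell_types and the "1-others" idea behind remove_cancer_cell, the sketch below merges per-cell-type fractions into groups by summation and derives a cancer-cell fraction as one minus the sum of the other fractions. Both rules are assumptions drawn from the parameter descriptions, not confirmed behavior, and the function names are hypothetical.

```python
def group_fractions(fractions, group_cell_types):
    """Sum the fractions of member cell types into their group (assumed rule).

    fractions: {cell_type: fraction} for one sample
    group_cell_types: {group_name: [cell_type, ...]}
    """
    member_to_group = {
        member: group
        for group, members in group_cell_types.items()
        for member in members
    }
    grouped = {}
    for cell_type, value in fractions.items():
        # Cell types that belong to no group keep their own name.
        key = member_to_group.get(cell_type, cell_type)
        grouped[key] = grouped.get(key, 0.0) + value
    return grouped


def cancer_fraction_from_others(other_fractions):
    """Assumed reading of '1-others': cancer fraction = 1 - sum of the rest."""
    return 1.0 - sum(other_fractions.values())


y = {"cell_type1": 0.125, "cell_type2": 0.25, "cell_type3": 0.5}
grouped = group_fractions(y, {"group1": ["cell_type1", "cell_type2"]})
cancer = cancer_fraction_from_others(grouped)
```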