Deconvolution

Cell fraction deconvolution by DeSide or Scaden.

class deside.decon_cf.DeSide(model_dir: str, log_file_path: str | None = None, model_name: str = 'DeSide')

DeSide model for predicting cell proportions in bulk RNA-seq data

Parameters:
  • model_dir – the directory where the trained model is saved

  • log_file_path – the path to the log file

  • model_name – the model name, used only for naming

get_model()

Load the pre-trained model with keras.models.load_model if it exists.

Returns:

pre-trained model

get_parameters() → dict

Get key parameters of the model.

get_x_before_predict(input_file, exp_type, transpose: bool = False, print_info: bool = True, scaling_by_sample: bool = False, scaling_by_constant: bool = True, pathway_mask: DataFrame | None = None, method_adding_pathway: str = 'add_to_end')

Parameters:
  • input_file – input file path

  • exp_type – ‘log_space’ or ‘raw_space’

  • transpose – if True, transpose the input dataframe

  • print_info – if True, print info

  • scaling_by_sample – if True, scale by sample

  • scaling_by_constant – if True, scale by a constant

  • pathway_mask – if not None, use pathway mask to get pathway profiles

  • method_adding_pathway – ‘add_to_end’ or ‘convert’

Returns:

x
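
The two scaling flags are described elsewhere in this reference as a min-max scaling of each sample to [0, 1] (scaling_by_sample) and a division of log2(TPM + 1) values by a constant of 20 (scaling_by_constant). The following is a minimal sketch of those two operations on plain Python lists. The helper names scale_by_constant and scale_by_sample are hypothetical and are not part of DeSide, whose internal implementation may differ.

```python
def scale_by_constant(log_expr, constant=20.0):
    """Divide every log2(TPM + 1) value by a constant (20 per the docs)."""
    return [[value / constant for value in sample] for sample in log_expr]


def scale_by_sample(expr):
    """Min-max scale each sample (row) to [0, 1] independently."""
    scaled = []
    for sample in expr:
        low, high = min(sample), max(sample)
        span = high - low
        # Assumption: a constant row has no spread, so map it to zeros
        # instead of dividing by zero.
        scaled.append([(value - low) / span if span else 0.0 for value in sample])
    return scaled


# Two samples (rows) by three genes (columns), already in log2(TPM + 1).
log_expr = [[0.0, 5.0, 10.0], [2.0, 4.0, 6.0]]
by_constant = scale_by_constant(log_expr)
by_sample = scale_by_sample(log_expr)
```

Dividing by a constant keeps values comparable across samples, whereas min-max scaling normalizes each sample on its own range.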

predict(input_file, exp_type, output_file_path: str | None = None, transpose: bool = False, print_info: bool = True, add_cell_type: bool = False, scaling_by_constant=False, scaling_by_sample=True, one_minus_alpha: bool = False, pathway_mask: DataFrame | None = None, method_adding_pathway: str = 'add_to_end', hyper_params: dict | None = None)

Predict cell proportions using a pre-trained model.

Parameters:
  • input_file – the path to the input file (.csv / .h5ad) or a pd.DataFrame; simulated (or TCGA) bulk expression profiles, samples by genes, in log2(TPM + 1) or TPM

  • output_file_path – the file path for saving the prediction results

  • exp_type – log_space or TPM; log_space means log2(TPM + 1)

  • transpose – transpose the input if it is formatted as genes (index) by samples (columns)

  • print_info – print information during prediction

  • add_cell_type – only True when predicting cell types using classification model

  • scaling_by_constant – scale log2(TPM + 1) values by dividing by 20

  • scaling_by_sample – scaling by sample, same as Scaden

  • one_minus_alpha – use 1 - alpha for all cell types if True

  • pathway_mask – if not None, use pathway mask to get pathway profiles

  • method_adding_pathway – ‘add_to_end’ or ‘convert’

  • hyper_params – hyper parameters for DNN model
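
The parameter descriptions above imply two input conventions: expression in log2(TPM + 1) and a samples-by-genes layout. The following is a minimal sketch of preparing a genes-by-samples TPM matrix accordingly. The helpers to_log_space and transpose are hypothetical stand-ins written for illustration, not DeSide functions; predict is documented to handle both steps itself through exp_type and transpose.

```python
import math


def to_log_space(tpm):
    """Convert TPM values to log2(TPM + 1)."""
    return [[math.log2(value + 1.0) for value in row] for row in tpm]


def transpose(matrix):
    """Swap rows and columns (genes by samples <-> samples by genes)."""
    return [list(column) for column in zip(*matrix)]


# Three genes (rows) by two samples (columns), in TPM.
genes_by_samples = [[0.0, 3.0], [1.0, 7.0], [15.0, 0.0]]

# Reorient to samples (rows) by genes (columns), then move to log space.
samples_by_genes = transpose(genes_by_samples)
log_expr = to_log_space(samples_by_genes)
```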

train_model(training_set_file_path: str | list, hyper_params: dict, cell_types: list | None = None, scaling_by_sample: bool = True, callback: bool = True, n_epoch: int = 10000, metrics: str = 'mse', n_patience: int = 100, scaling_by_constant=False, remove_cancer_cell=False, fine_tune=False, one_minus_alpha: bool = False, verbose=1, pathway_mask=None, method_adding_pathway='add_to_end', input_gene_list: str | None = None, filtered_gene_list: list | None = None, group_cell_types: dict | None = None)

Train the DeSide model.

Parameters:
  • training_set_file_path – the file path of training set, .h5ad file, log2cpm1p format, samples by genes

  • hyper_params – pre-determined hyperparameters for DeSide model

  • cell_types – specify a list of cell types instead of using all cell types in the training set

  • scaling_by_sample – whether to scale the expression values of each sample to [0, 1] by ‘min_max’

  • callback – whether to use a callback function when training the model

  • n_epoch – the maximum number of epochs to train

  • metrics – mse (regression model) / accuracy (classifier)

  • n_patience – patience in early_stopping_callback

  • remove_cancer_cell – remove cancer cell from y if True, using “1-others”

  • fine_tune – fine-tune a pre-trained model

  • scaling_by_constant – if True, scale GEPs by dividing by a constant in log space (default value is 20) so that all expression values fall in [0, 1)

  • one_minus_alpha – use 1 - alpha for all cell types if True

  • verbose – whether to print progress during training, 0: silent, 1: progress bar, 2: one line per epoch

  • pathway_mask – the mask of pathway genes, 1: pathway gene, 0: non-pathway gene, genes by pathways

  • method_adding_pathway – the method to use pathway profiles, ‘add_to_end’ or ‘convert’

  • input_gene_list – the gene list used as input for pathway profiles, if None: use all genes in training set; if “intersection_with_pathway_genes”: use the intersection of genes in training set and genes in pathways; if “filtered_genes”: use the genes in filtered_gene_list.

  • filtered_gene_list – the list of genes used as input, if None, use all genes in training set

  • group_cell_types – group cell types into a list of cell types, e.g. {‘group1’: [‘cell_type1’, ‘cell_type2’]}
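
The group_cell_types and remove_cancer_cell options both reshape the training labels (y). The following is a minimal sketch of one plausible reading of those descriptions. It assumes that grouping sums the fractions of the member cell types and that “1-others” means the cancer cell fraction is recovered as one minus the sum of all other fractions. Both functions are hypothetical illustrations rather than DeSide code, and the library's actual behavior may differ.

```python
def group_fractions(fractions, group_cell_types):
    """Merge cell-type fractions into groups by summing members (assumed)."""
    grouped = {}
    members_in_groups = set()
    for group, members in group_cell_types.items():
        grouped[group] = sum(fractions.get(member, 0.0) for member in members)
        members_in_groups.update(members)
    # Cell types that belong to no group are carried over unchanged.
    for cell_type, value in fractions.items():
        if cell_type not in members_in_groups:
            grouped[cell_type] = value
    return grouped


def cancer_fraction_from_others(non_cancer_fractions):
    """Recover the cancer cell fraction as 1 - sum(others) (assumed)."""
    return 1.0 - sum(non_cancer_fractions.values())


# Non-cancer fractions for one sample; the remainder is attributed to cancer cells.
fractions = {"cell_type1": 0.25, "cell_type2": 0.25, "cell_type3": 0.125}
grouped = group_fractions(fractions, {"group1": ["cell_type1", "cell_type2"]})
cancer = cancer_fraction_from_others(fractions)
```

Under these assumptions, grouping leaves the total of the non-cancer fractions unchanged, so the recovered cancer fraction is the same before and after grouping.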