uniport.Run

uniport.Run(adatas=None, adata_cm=None, mode='h', lambda_s=0.5, lambda_recon=1.0, lambda_kl=0.5, lambda_ot=1.0, iteration=30000, ref_id=None, save_OT=False, use_rep=['X', 'X'], out='latent', label_weight=None, reg=0.1, reg_m=1.0, batch_size=256, lr=0.0002, enc=None, gpu=0, prior=None, loss_type='BCE', outdir='output/', input_id=0, pred_id=1, seed=124, num_workers=4, patience=30, batch_key='domain_id', source_name='source', model_info=False, verbose=False)

Run data integration

Parameters:
  • adatas – List of AnnData objects, e.g. [adata1, adata2]. Default: None

  • adata_cm – AnnData object containing the common genes. Default: None

  • mode – Integration mode, one of ['h', 'v', 'd']. If 'h', integrate data with common genes (horizontal integration). If 'v', integrate data profiled from the same cells (vertical integration). If 'd', integrate data without common genes (diagonal integration). Default: 'h'

  • lambda_s – Balancing parameter for common and specific genes. Default: 0.5

  • lambda_recon – Balancing parameter for the reconstruction term. Default: 1.0

  • lambda_kl – Balancing parameter for the KL divergence term. Default: 0.5

  • lambda_ot – Balancing parameter for the OT term. Default: 1.0

  • iteration – Maximum number of training iterations. One iteration trains on one batch of batch_size samples. Default: 30000

  • ref_id – Id of reference dataset. Default: None

  • save_OT – If True, output a global OT plan. This requires more memory. Default: False

  • use_rep – Use '.X' or '.obsm'. For mode='d' only. If use_rep=['X', 'X'], use adatas[0].X and adatas[1].X for integration. If use_rep=['X', 'X_lsi'], use adatas[0].X and adatas[1].obsm['X_lsi'] for integration. If use_rep=['X_pca', 'X_lsi'], use adatas[0].obsm['X_pca'] and adatas[1].obsm['X_lsi'] for integration. Default: ['X', 'X']

  • out – Output of uniPort. Choose from ['latent', 'project', 'predict']. If out=='latent', train the network and output cell embeddings. If out=='project', project data into the latent space and output cell embeddings. If out=='predict', project data into the latent space and output cell embeddings through a specified decoder. Default: 'latent'

  • label_weight – Prior-guided weighted vectors. Default: None

  • reg – Entropy regularization parameter in OT. Default: 0.1

  • reg_m – Unbalanced OT parameter. Larger values mean more balanced OT. Default: 1.0

  • batch_size – Number of samples per batch to load. Default: 256

  • lr – Learning rate. Default: 2e-4

  • enc – Structure of the encoder. Default: None

  • gpu – Index of GPU to use if GPU is available. Default: 0

  • prior – Prior correspondence matrix. Default: None

  • loss_type – Type of loss: 'BCE', 'MSE' or 'L1'. Default: 'BCE'

  • outdir – Output directory. Default: 'output/'

  • input_id – Only used when mode=='d' and out=='predict', to choose an encoder to project data. Default: 0

  • pred_id – Only used when out=='predict', to choose a decoder to predict data. Default: 1

  • seed – Random seed for torch and numpy. Default: 124

  • num_workers – Number of worker processes used for data loading. Default: 4

  • patience – Early stopping patience. Default: 30

  • batch_key – Name of batch in AnnData. Default: 'domain_id'

  • source_name – Name of source in AnnData. Default: 'source'

  • rep_celltype – Names of cell-type annotation in AnnData. Default: 'cell_type'

  • umap – If True, perform UMAP for visualization. Default: False

  • model_info – If True, show the structures of the encoder and decoders. Default: False

  • verbose – Verbosity, True or False. Default: False

  • assess – If True, calculate the entropy_batch_mixing score and silhouette score to evaluate integration results. Default: False

  • show – If True, show the UMAP visualization of latent space. Default: False
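
The constraints in the parameter list above (allowed values for mode, out and loss_type, and use_rep applying to mode='d' only) can be checked before a long training run starts. The following is a minimal sketch, not part of uniPort: build_run_kwargs and run_integration are hypothetical helper names, and raising an error when use_rep is passed outside mode='d' is a design choice of this sketch rather than documented uniPort behaviour.

```python
VALID_MODES = {"h", "v", "d"}
VALID_OUTS = {"latent", "project", "predict"}
VALID_LOSSES = {"BCE", "MSE", "L1"}


def build_run_kwargs(mode="h", out="latent", loss_type="BCE", use_rep=None, **extra):
    """Assemble keyword arguments for uniport.Run.

    Hypothetical helper: the checks mirror the documented parameter list
    and are not performed by uniPort itself.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if out not in VALID_OUTS:
        raise ValueError(f"out must be one of {sorted(VALID_OUTS)}, got {out!r}")
    if loss_type not in VALID_LOSSES:
        raise ValueError(f"loss_type must be one of {sorted(VALID_LOSSES)}, got {loss_type!r}")
    if use_rep is not None and mode != "d":
        # Assumption of this sketch: treat use_rep outside mode='d' as a mistake.
        raise ValueError("use_rep is documented for mode='d' only")

    kwargs = {"mode": mode, "out": out, "loss_type": loss_type}
    if use_rep is not None:
        kwargs["use_rep"] = list(use_rep)
    kwargs.update(extra)  # e.g. lambda_ot, reg, reg_m, outdir
    return kwargs


def run_integration(adatas, adata_cm=None, **options):
    """Validate the options, then call uniport.Run (requires uniPort)."""
    import uniport as up  # lazy import so build_run_kwargs works without uniPort

    return up.Run(adatas=adatas, adata_cm=adata_cm, **build_run_kwargs(**options))
```

For a diagonal integration this might be called as run_integration([adata_rna, adata_atac], mode="d", use_rep=["X", "X_lsi"]), where adata_rna and adata_atac are placeholder AnnData objects prepared beforehand.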

Returns:

  • adata.h5ad – The AnnData object after integration. The representation of the data is stored in adata.obsm['latent'], adata.obsm['project'] or adata.obsm['predict'].

  • checkpoint – model.pt contains the variables of the model and config.pt contains the parameters of the model.

  • log.txt – Records model parameters.

  • umap.pdf – UMAP plot for visualization if umap=True.
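
After a run, it can be useful to confirm which of the files listed above were actually written. The sketch below searches outdir recursively because the exact directory layout (for example, whether model.pt sits in a subdirectory) is not stated here; find_outputs is a hypothetical helper, not a uniPort function.

```python
from pathlib import Path

# File names taken from the Returns list above.
EXPECTED_OUTPUTS = ("adata.h5ad", "model.pt", "config.pt", "log.txt", "umap.pdf")


def find_outputs(outdir="output/"):
    """Map each documented output file name to its path under outdir.

    A file that is not found maps to None. The search is recursive, so no
    particular subdirectory layout is assumed.
    """
    root = Path(outdir)
    found = {}
    for name in EXPECTED_OUTPUTS:
        matches = sorted(root.rglob(name)) if root.is_dir() else []
        found[name] = matches[0] if matches else None
    return found
```

If adata.h5ad is found, it can be loaded with anndata.read_h5ad, and the embedding read from the obsm key matching the out argument, e.g. adata.obsm['latent'].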