uniport.Run

uniport.Run(adatas=None, adata_cm=None, mode='h', lambda_s=0.5, lambda_recon=1.0, lambda_kl=0.5, lambda_ot=1.0, iteration=30000, ref_id=None, save_OT=False, use_rep=['X', 'X'], out='latent', label_weight=None, reg=0.1, reg_m=1.0, batch_size=256, lr=0.0002, enc=None, gpu=0, prior=None, loss_type='BCE', outdir='output/', input_id=0, pred_id=1, seed=124, num_workers=4, patience=30, batch_key='domain_id', source_name='source', model_info=False, verbose=False)

Run data integration

Parameters:

adatas – List of AnnData matrices, e.g. [adata1, adata2].
adata_cm – AnnData matrices containing common genes.
mode – Choose from [‘h’, ‘v’, ‘d’]. If ‘h’, integrate data with common genes (horizontal integration). If ‘v’, integrate data profiled from the same cells (vertical integration). If ‘d’, integrate data without common genes (diagonal integration). Default: ‘h’.
lambda_s – Balancing parameter for common and specific genes. Default: 0.5
lambda_recon – Balancing parameter for the reconstruction term. Default: 1.0
lambda_kl – Balancing parameter for the KL divergence term. Default: 0.5
lambda_ot – Balancing parameter for the OT term. Default: 1.0
iteration – Maximum number of training iterations. One iteration trains on one batch of batch_size samples. Default: 30000
ref_id – Id of reference dataset. Default: None
save_OT – If True, output a global OT plan. Requires more memory. Default: False
use_rep – Use ‘.X’ or ‘.obsm’. For mode=‘d’ only. If use_rep=[‘X’, ‘X’], use adatas[0].X and adatas[1].X for integration. If use_rep=[‘X’, ‘X_lsi’], use adatas[0].X and adatas[1].obsm[‘X_lsi’] for integration. If use_rep=[‘X_pca’, ‘X_lsi’], use adatas[0].obsm[‘X_pca’] and adatas[1].obsm[‘X_lsi’] for integration. Default: [‘X’, ‘X’]
out – Output of uniPort. Choose from [‘latent’, ‘project’, ‘predict’]. If out==‘latent’, train the network and output cell embeddings. If out==‘project’, project data into the latent space and output cell embeddings. If out==‘predict’, project data into the latent space and output cell embeddings through a specified decoder. Default: ‘latent’.
label_weight – Prior-guided weighted vectors. Default: None
reg – Entropy regularization parameter in OT. Default: 0.1
reg_m – Unbalanced OT parameter. Larger values mean more balanced OT. Default: 1.0
batch_size – Number of samples per batch to load. Default: 256
lr – Learning rate. Default: 2e-4
enc – Structure of the encoder. Default: None
gpu – Index of GPU to use if GPU is available. Default: 0
prior – Prior correspondence matrix. Default: None
loss_type – Type of loss. Choose from ‘BCE’, ‘MSE’ or ‘L1’. Default: ‘BCE’
outdir – Output directory. Default: ‘output/’
input_id – Only used when mode==‘d’ and out==‘predict’ to choose an encoder to project data. Default: 0
pred_id – Only used when out==‘predict’ to choose a decoder to predict data. Default: 1
seed – Random seed for torch and numpy. Default: 124
patience – Early stopping patience. Default: 30
batch_key – Name of batch in AnnData. Default: ‘domain_id’
source_name – Name of source in AnnData. Default: ‘source’
rep_celltype – Names of cell-type annotation in AnnData. Default: ‘cell_type’
umap – If True, perform UMAP for visualization. Default: False
model_info – If True, show the structures of the encoder and decoders. Default: False
verbose – Verbosity, True or False. Default: False
assess – If True, calculate the entropy_batch_mixing score and silhouette score to evaluate integration results. Default: False
show – If True, show the UMAP visualization of latent space. Default: False
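
The use_rep rules above can be sketched as a small lookup. This is only an illustration of the documented behavior, not uniPort's internal code: select_rep is a hypothetical helper, and SimpleNamespace objects stand in for AnnData (assumed to expose .X and .obsm) so that the snippet runs without extra dependencies.

```python
from types import SimpleNamespace


def select_rep(adata, rep):
    """Return adata.X when rep == 'X', otherwise adata.obsm[rep].

    Hypothetical helper that mirrors the documented use_rep rule.
    """
    if rep == "X":
        return adata.X
    if rep not in adata.obsm:
        raise KeyError(f"{rep!r} not found in adata.obsm")
    return adata.obsm[rep]


# Stand-ins for AnnData objects; the strings are placeholders for matrices.
rna = SimpleNamespace(X="rna_counts", obsm={"X_pca": "rna_pca"})
atac = SimpleNamespace(X="atac_counts", obsm={"X_lsi": "atac_lsi"})

# use_rep=['X', 'X_lsi'] pairs adatas[0].X with adatas[1].obsm['X_lsi'].
use_rep = ["X", "X_lsi"]
inputs = [select_rep(a, r) for a, r in zip([rna, atac], use_rep)]
print(inputs)
```

Each entry of use_rep is matched by position to the dataset at the same index in adatas, so the list should have one entry per dataset.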

Returns:

adata.h5ad – The AnnData matrix after integration. The representation of the data is stored at adata.obsm[‘latent’], adata.obsm[‘project’] or adata.obsm[‘predict’].
checkpoint – model.pt contains the variables of the model and config.pt contains the parameters of the model.
log.txt – Records model parameters.
umap.pdf – UMAP plot for visualization if umap=True.
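
A call for horizontal integration might look like the sketch below. It passes only keyword arguments that appear in the signature above. The names horizontal_kwargs and run_horizontal are hypothetical helpers, the import alias up is an assumption, and the function is assumed to return the integrated AnnData described under Returns.

```python
def horizontal_kwargs(adatas, adata_cm, outdir="output/"):
    """Collect keyword arguments for a horizontal-integration run.

    Every key is a parameter from the uniport.Run signature; values other
    than the arguments passed in are the documented defaults.
    """
    return dict(
        adatas=adatas,        # list of AnnData objects, e.g. [adata1, adata2]
        adata_cm=adata_cm,    # AnnData containing the common genes
        mode="h",             # horizontal integration
        out="latent",         # train the network and output cell embeddings
        lambda_kl=0.5,
        lambda_ot=1.0,
        batch_size=256,
        outdir=outdir,
        seed=124,
    )


def run_horizontal(adatas, adata_cm, outdir="output/"):
    """Run the integration and return the embeddings (needs uniPort installed)."""
    import uniport as up  # assumption: the package is importable as `uniport`

    adata = up.Run(**horizontal_kwargs(adatas, adata_cm, outdir))
    # Per the Returns section, out='latent' stores the embeddings here.
    return adata.obsm["latent"]
```

Keeping the keyword arguments in a separate function makes it easy to inspect or log the configuration before starting a long training run.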