Error in run_ulm: No Sources with More than min_n=2 Targets Despite Matrix-Network Compatibility #158

Open
victorsanchezarevalo opened this issue Oct 21, 2024 · 5 comments

@victorsanchezarevalo

Describe the bug
I am encountering an error when using Decoupler's run_ulm function. The error states that no sources have more than min_n=2 targets, even though I have reduced the min_n parameter, and my dataset should have sufficient shared targets between the matrix (mat) and the network (collectri).

Error:

File ~/miniforge3/envs/decoupler/lib/python3.10/site-packages/spyder_kernels/customize/utils.py:209 in exec_encapsulate_locals
  exec_fun(compile(code_ast, filename, "exec"), globals)

File ~/Documentos/Mis_analisis/Ester_Martin/12_decoupler.py:225
  tf_acts, tf_pvals = dc.run_ulm(mat=mat, net=collectri, verbose=True, min_n=2)

File ~/miniforge3/envs/decoupler/lib/python3.10/site-packages/decoupler/method_ulm.py:108 in run_ulm
  net = filt_min_n(c, net, min_n=min_n)

File ~/miniforge3/envs/decoupler/lib/python3.10/site-packages/decoupler/pre.py:146 in filt_min_n
  raise ValueError("""No sources with more than min_n={0} targets. Make sure mat and net have shared target features or
ValueError: No sources with more than min_n=2 targets. Make sure mat and net have shared target features or
  reduce the number assigned to min_n

To Reproduce
Steps to reproduce the behavior:

  1. Install decoupler in a clean Python environment.
  2. Use the run_ulm function with the following setup:
    • Input matrix (mat): Gene expression matrix with n_genes x n_samples.
    • Regulatory network (collectri): A list of transcription factors and their target genes.
  3. Set min_n=2 to ensure that there are at least 2 targets per transcription factor.
  4. Run the analysis and observe the error when filt_min_n fails to find sufficient shared targets between the matrix and network.

If needed, I can provide a subset of the data that triggers the error for testing purposes.

Expected behavior
I expected the run_ulm function to return transcription factor activities and p-values when using the provided gene expression matrix (mat) and regulatory network (collectri), as there should be sufficient shared target genes between the two.

System

  • OS: Fedora 40
  • Python version: 3.10

Additional context
I have verified that my matrix contains valid gene names and is compatible with the regulatory network. Despite lowering min_n to 1, the issue persists. This error seems to indicate a mismatch between the features in the matrix and the network, but these have been checked for consistency.
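
One way to check the overlap directly is to count, for each source, how many of its targets appear among the matrix's column labels. The sketch below is a plain-Python illustration of that idea, not decoupler's actual implementation: the helper name `sources_passing_min_n`, the toy network, and the "at least `min_n`" threshold are all assumptions made for the example.

```python
from collections import Counter


def sources_passing_min_n(features, edges, min_n=2):
    """Return {source: n_shared_targets} for sources with at least min_n shared targets.

    Hypothetical helper for illustration; it is not part of decoupler.
    """
    feature_set = set(features)
    counts = Counter(source for source, target in edges if target in feature_set)
    return {source: n for source, n in counts.items() if n >= min_n}


# Hypothetical toy network of (source, target) pairs.
edges = [("Tf1", "GeneA"), ("Tf1", "GeneB"), ("Tf2", "GeneA"), ("Tf2", "GeneC")]

# Column labels are genes: both regulators keep two shared targets.
genes_as_columns = sources_passing_min_n(["GeneA", "GeneB", "GeneC"], edges)

# Column labels are sample names: nothing overlaps, so no source survives,
# and lowering min_n to 1 does not help.
samples_as_columns = sources_passing_min_n(["sample_1", "sample_2"], edges)
samples_min_n_1 = sources_passing_min_n(["sample_1", "sample_2"], edges, min_n=1)
```

With pandas inputs, the same check could be fed `mat.columns` and `zip(net["source"], net["target"])`, assuming the network uses `source` and `target` columns.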

@victorsanchezarevalo victorsanchezarevalo added the bug Something isn't working label Oct 21, 2024
@PauBadiaM
Member

Hi @victorsanchezarevalo,
decoupler follows the observations x features convention (commonly used in Python), rather than the features x observations convention (more typical in R). You can transpose your matrix to match this format, and it should work. Feel free to reach out if you have any further questions!
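
As a minimal sketch of the two conventions, assuming a small hypothetical pandas DataFrame (the gene and sample names below are made up), a single transpose moves genes from the rows into the columns:

```python
import pandas as pd

# Hypothetical features x observations matrix (genes in rows), as is common in R.
genes_by_samples = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=["GeneA", "GeneB", "GeneC"],
    columns=["sample_1", "sample_2"],
)

# Transpose to observations x features (samples in rows, genes in columns).
mat = genes_by_samples.T

print(mat.shape)          # (2, 3)
print(list(mat.columns))  # ['GeneA', 'GeneB', 'GeneC']
```

After the transpose, the column labels are gene names, which is what gets matched against the network's targets.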

@victorsanchezarevalo
Author

Hi,

I’m encountering an issue when running ULM analysis with decoupler. I have already transposed my expression matrix as recommended (observations x features), but I am still getting the following error related to the min_n=2 parameter:

Code used:

# Preparing matrix and transposing
mat = results_df[['stat']].T.rename(index={'stat': 'treatment.vs.control'}).T

# Checking the shape of the matrix
print(f"New mat shape: {mat.shape}")

# Assigning gene names from subset_adata.var['gene_name']
mat.index = subset_adata.var['gene_name'].values

# Checking the first 10 gene names
print(mat.index[:10])

# Retrieving CollecTRI gene regulatory network
settings.setup(curl_timeout=1200)
os.system('rm -rf ~/.cache/omnipathdb/*')
os.system('rm -rf ~/.cache/pypath/*')

collectri = dc.get_collectri(organism='mouse', split_complexes=False)

# Running ULM analysis with min_n=2
tf_acts, tf_pvals = dc.run_ulm(mat=mat, net=collectri, verbose=True, min_n=2)

# Checking results
print(tf_acts.head())
print(tf_pvals.head())

Error message:

ValueError: No sources with more than min_n=2 targets. Make sure mat and net have shared target features or reduce the number assigned to min_n

I have verified that the matrix has been transposed and gene names have been correctly assigned, but the issue persists. It seems that no transcription factors have more than 2 shared targets, even though min_n=2 is a reasonable threshold in this context.

Any help or suggestions would be appreciated!

Thank you!

@PauBadiaM
Member

Could you show me the head of your input mat?

mat.head()

@victorsanchezarevalo
Author

victorsanchezarevalo commented Oct 23, 2024

mat.head()
Out[20]: 
         treatment.vs.control
Sox17               -0.252628
Gm15452              0.320179
Gm26983              0.747064
Gm6187              -0.106062
Gm6119              -0.746242

@PauBadiaM
Member

PauBadiaM commented Oct 24, 2024

Hi @victorsanchezarevalo,

As your console output shows, you have one observation (one contrast) and n genes. Your matrix is therefore in the wrong format, features x observations (n, 1), rather than the required observations x features (1, n). Transpose it again and it should be fine.
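
For the single-contrast case, the layout could be built as in the sketch below. It assumes a hypothetical stand-in for `results_df` whose index already holds the gene names; in the code posted above they came from `subset_adata.var['gene_name']`.

```python
import pandas as pd

# Hypothetical stand-in for results_df: one 'stat' value per gene.
results_df = pd.DataFrame(
    {"stat": [-0.252628, 0.320179, 0.747064]},
    index=["Sox17", "Gm15452", "Gm26983"],
)

# Single transpose: the contrast becomes the only row and genes become columns.
mat = results_df[["stat"]].T.rename(index={"stat": "treatment.vs.control"})

print(mat.shape)  # (1, 3)

# The call from the thread would then receive genes as columns:
# tf_acts, tf_pvals = dc.run_ulm(mat=mat, net=collectri, verbose=True, min_n=2)
```

The code posted earlier applied `.T` twice, which cancels out and leaves the genes in the rows.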

@PauBadiaM PauBadiaM self-assigned this Oct 24, 2024