Skip to content

LLM-scCurator

LLM-scCurator logo

LLM-scCurator is a Python framework for noise-aware marker distillation that improves the robustness of zero-shot cell-type annotation with LLMs across single-cell and spatial transcriptomics.


What it does

LLM-scCurator adds a pre-prompt feature distillation layer: it suppresses recurrent biological/technical programs (e.g., ribosomal/mitochondrial, stress, cell cycle, TCR/Ig) while rescuing lineage- and state-defining markers, then applies leakage-safe lineage filters before LLM prompting.

Why it helps

  • Stabilizes inputs: reduces “garbage-in” marker lists by masking clonotype and ubiquitous programs.
  • Preserves biology: specificity-aware rescue retains informative, lineage-restricted markers.
  • Scales to discovery: supports hierarchical, coarse-to-fine annotation for complex tissues, including spatial modalities (Visium/Xenium).

Quick start


Privacy

We respect the sensitivity of clinical and biological data. LLM-scCurator is designed so that raw expression matrices and cell-level metadata can remain within your local environment.

  • Local execution: Preprocessing, confounder masking, and feature ranking run locally on your machine.
  • Minimal transmission (optional): If you choose to use an external LLM API, only anonymized, cluster-level marker lists (e.g., top 50 gene symbols) and minimal tissue context are sent.
  • User control: You decide what additional context (e.g., disease state, treatment, platform) to include. Always follow institutional policy and the LLM provider’s terms before sharing any information.

Example workflows (institutional-policy friendly)

Many institutions restrict which AI tools can be used with internal clinical or research datasets. To support these real-world constraints, we provide two end-to-end workflows that keep raw matrices and cell-level metadata local and avoid external LLM API calls unless explicitly permitted:

  • Fully local LLM (Ollama): Curate features and optionally annotate clusters using a local LLM backend (no external transmission). examples/local/local_quickstart_ollama.ipynb

  • Local feature distillation → Approved chat LLM annotation (no external LLM API calls): Curate features locally, export a curated cluster→genes table, then annotate it via an institution-approved chat interface (e.g., Microsoft Copilot “Work”) by uploading the CSV/Excel or pasting markers. examples/local/local_quickstart_approved_ai_workflow.ipynb


Project home and issue tracker: