Have you ever run Monocle3 and ended up with a tangled mess of trajectories on your UMAP plot? You’re not alone.
Monocle3 is one of the most popular tools for trajectory inference in single-cell RNA-seq, but its default settings can sometimes produce overly complex graphs — full of short loops, spurious branches, and paths that don’t make biological sense.
In this post, we’ll break down:
- Why your Monocle3 trajectories look messy
- What’s happening inside the learn_graph() function
- How to clean up your graph using the learn_graph_control parameters
💥 Why Monocle3 Trajectories Get Messy
Messy trajectories usually stem from a combination of the following:
- Overfitting to local variation
The principal graph may latch onto subtle, potentially noisy variation in the data and over-interpret it as biologically meaningful structure.
- High-dimensional noise
UMAP and PCA reduce thousands of gene expression features down to a few dimensions. Noise in the input space can easily distort the low-dimensional structure, especially if no proper filtering or batch correction was applied.
- Cells from different states appear close in space
UMAP preserves local structure, but cells from distinct biological states may end up near each other in the embedding. This can confuse the trajectory learner and lead to incorrect branching.
- Truly complex developmental processes
Sometimes the messiness is real — biology doesn’t always follow clean, linear paths. In such cases, you need tools that can summarize complexity without losing interpretability.
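One rough way to check the third cause is to ask how often a cell’s nearest neighbors in the embedding carry a different label. The sketch below uses only base R and simulated data. The helper name neighbor_mixing is made up for this post and is not part of Monocle3. In a real session the embedding could come from something like reducedDims(CDS)[["UMAP"]] and the labels from a column of colData(CDS), but which column holds your cell types is an assumption you’ll need to adapt.

```r
# Hypothetical helper (not a Monocle3 function): for each cell, the fraction
# of its k nearest neighbours in the embedding that carry a different label.
neighbor_mixing <- function(embedding, labels, k = 10) {
  stopifnot(nrow(embedding) == length(labels), k < nrow(embedding))
  d <- as.matrix(dist(embedding))  # full pairwise distances; small data only
  diag(d) <- Inf                   # a cell is not its own neighbour
  vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(labels[nn] != labels[i])
  }, numeric(1))
}

# Simulated embedding: groups A and C overlap, group B sits far away.
set.seed(1)
emb <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2),
  matrix(rnorm(100, mean = 0.5), ncol = 2)
)
lab <- rep(c("A", "B", "C"), each = 50)

mixing <- neighbor_mixing(emb, lab, k = 10)
tapply(mixing, lab, mean)  # overlapping groups score higher than the isolated one
```

Groups with a high mixing score are the ones most likely to be bridged by a spurious edge. The full distance matrix is only practical for a few thousand cells, so a larger dataset would need an approximate nearest-neighbor search instead.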
⚙️ How the learn_graph() Function Works (and How to Tune It)
The learn_graph() function in Monocle3 builds the trajectory backbone on top of your low-dimensional embedding (usually UMAP). Here’s what it does under the hood:
- Identifies representative “centers” — key points in the embedding that anchor the graph.
- Connects the centers using a graph learning algorithm (SimplePPT).
- Returns a principal graph that reflects likely lineages and branches.
The default settings, while often a good starting point, can sometimes be too sensitive, leading to overfitting of the graph to the underlying embedding.
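To build some intuition for the three steps listed above, here is a toy imitation in base R: k-means picks the centers and a minimum spanning tree connects them. This is a deliberately simplified stand-in, not Monocle3’s actual algorithm, which refines the centers and the graph together. The simulated data and the helper mst_edges are made up for illustration.

```r
# Toy illustration only: k-means centers joined by a minimum spanning tree.
set.seed(42)

# Simulated 2-D "embedding": points scattered along a curved path.
theta <- runif(500, 0, pi)
emb <- cbind(cos(theta), sin(theta)) + matrix(rnorm(1000, sd = 0.05), ncol = 2)

# Step 1: pick representative centers (the count is what Monocle3 calls ncenter).
ncenter <- 20
km <- kmeans(emb, centers = ncenter, nstart = 5, iter.max = 50)
centers <- km$centers

# Step 2: connect the centers with a minimum spanning tree (Prim's algorithm).
mst_edges <- function(pts) {
  d <- as.matrix(dist(pts))
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (e in seq_len(n - 1)) {
    # Cheapest edge from a node inside the tree to a node outside it.
    sub <- d[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    edges[e, ] <- c(which(in_tree)[idx[1]], which(!in_tree)[idx[2]])
    in_tree[edges[e, 2]] <- TRUE
  }
  edges
}
edges <- mst_edges(centers)

# Step 3: the tree over the centers stands in for the principal graph.
plot(emb, pch = 16, cex = 0.4, col = "grey70", xlab = "dim 1", ylab = "dim 2")
segments(centers[edges[, 1], 1], centers[edges[, 1], 2],
         centers[edges[, 2], 1], centers[edges[, 2], 2], lwd = 2)
points(centers, pch = 16, col = "red")
```

Rerunning this with a larger ncenter or a noisier embedding shows the same pattern you see in real data: more centers give the tree more chances to follow noise.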
🛠️ Clean Up Your Graph with learn_graph_control
To gain better control, use the learn_graph_control argument of learn_graph(). It takes a named list that lets you tune the complexity and smoothness of the learned trajectory.
Recommended Parameters
R
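# learn_graph() rejects control names it does not recognize; see ?learn_graph
# for the list your installed version accepts.
CDS <- learn_graph(CDS,
                   use_partition = TRUE,
                   verbose = FALSE,
                   learn_graph_control = list(ncenter = 50,
                                              eps = 1e-6,
                                              nn.k = 10,  # rann.k in older releases
                                              minimal_branch_len = 15))
Parameter Breakdown
- ncenter: Number of centers for the principal graph. Smaller = simpler graph with fewer branches. Try 30–100 depending on dataset size.
- eps: Convergence tolerance for the graph-learning iterations (the relative change in the objective at which it stops).
- nn.k: Number of nearest neighbors considered while learning the graph. Older releases call this rann.k.
- minimal_branch_len: Minimum length of a branch’s diameter path for the branch to survive pruning. Raising it removes the short “stub” branches that often clutter the trajectory.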
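When you try different values of ncenter or minimal_branch_len, it helps to compare the resulting graphs with numbers rather than by eye. The sketch below counts leaves and branch points from an edge list using base R. The helper summarise_graph is made up for this post. In a Monocle3 session, an edge list with from and to columns could be pulled with something like igraph::as_data_frame(principal_graph(CDS)[["UMAP"]], what = "edges"), assuming your graph was learned on the UMAP embedding.

```r
# Hypothetical helper: summarise a graph given as an edge list with
# `from` and `to` columns.
summarise_graph <- function(edges) {
  deg <- table(c(as.character(edges$from), as.character(edges$to)))
  c(nodes = length(deg),
    edges = nrow(edges),
    leaves = sum(deg == 1),         # tips of the trajectory
    branch_points = sum(deg >= 3))  # nodes where the trajectory forks
}

# Toy edge list: a path Y_1 - Y_2 - Y_3 that forks into Y_4 and Y_5.
toy <- data.frame(from = c("Y_1", "Y_2", "Y_3", "Y_3"),
                  to   = c("Y_2", "Y_3", "Y_4", "Y_5"))
summarise_graph(toy)  # one branch point (Y_3) and three leaves
```

If a parameter change cuts the number of branch points sharply while the main lineages stay intact on the plot, the removed branches were probably noise.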
Check Your Monocle3 Version
Some of these tuning options may vary depending on your Monocle3 version. Upgrade if needed to ensure compatibility with learn_graph_control. To check:
R
packageVersion("monocle3")
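If you share scripts with collaborators, a small guard at the top can catch an outdated install before learn_graph() fails halfway through. This is a minimal sketch: has_min_version is a made-up helper, and the "1.0.0" threshold is a placeholder rather than a documented requirement, so swap in whichever version you have tested against.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# "1.0.0" is a placeholder threshold, not a documented requirement.
if (!has_min_version("monocle3", "1.0.0")) {
  message("monocle3 is missing or older than the version this script expects")
}
```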
🧼 Still Messy? Check Your UMAP First
Even with proper graph parameters, if your UMAP is noisy, the learned graph will still reflect that noise.
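Here’s a simple trick: increase min_dist in UMAP (umap.min_dist in Monocle3, default 0.1). Larger values stop UMAP from packing points tightly together, so the embedding emphasizes broader structure over fine local detail. This can reduce spurious proximity between unrelated cell types and gives the graph less local noise to overfit.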
R
CDS <- preprocess_cds(CDS, ...)
CDS <- reduce_dimension(CDS,
                        reduction_method = "UMAP",
                        umap.metric = "cosine",
                        umap.min_dist = 0.4)
✅ Final Thoughts
Monocle3 is a powerful tool, but like any modeling algorithm, it benefits from careful parameter tuning and preprocessing. Messy graphs aren’t necessarily a bug — they often reflect noisy embeddings or overly sensitive defaults.