Monte Carlo Stability Tutorial

This tutorial explains why Monte Carlo stability analysis is useful and how the monte_carlo_fit routine in MagicA is designed and used.

Why Monte Carlo stability?

Large-sample-size effects occur when very large datasets make statistical tests extremely powerful: p-values become very small even for effects that are practically negligible. In other words, with enough data a test can reject the null hypothesis for differences that have no practical importance. The Monte Carlo stability workflow therefore looks for a sampling size where the chosen goodness-of-fit tests start to “pass” in a practical sense and parameter estimates stop changing much across repeats — that sample size is reported as the stability point.

Methodology (what we do)

Build a grid of sample sizes to test (or use a provided list).
For each size, generate multiple subsamples (repeats) from the original data.
For each subsample, fit the distribution (unless fixed parameters are supplied).
Compute goodness-of-fit metrics (e.g., KS p-value, chi-square p-value, RMSE).
Store all results in an xarray Dataset with dimensions sizes x repeats.
Detect stability points using a moving-window criterion on variability (e.g., CV).
Optionally save a 2x3 summary figure with red dashed vertical lines marking each variable’s detected stability sample size.

Inputs and options

sizes: list or auto-generated grid of sample sizes.
n_repeats: number of subsamples per size.
tests: list of metrics to compute: [‘ks’,’chi2’,’rmse’].
stability_method: method for detecting stability points (‘kneedle’ [default], ‘cv’, ‘plateau’).
sampling: sampling strategy (random, bootstrap, disjoint).
fit_kwargs: passed to the underlying fit routine (e.g., floc=0).
distribution_params: optional fixed parameters (skip fitting when supplied).
fig_output_path: optional path to save the 2x3 summary figure.
plot_type: ‘series’ (median + IQR) or ‘boxplots’.

Stability Detection Methods

MagicA provides three methods for detecting when parameters and goodness-of-fit metrics have stabilized. Each method has different strengths depending on the metric behavior.

CV Method (Coefficient of Variation)

Principle: Monitors the coefficient of variation (CV = std/mean) across repeats. Stability is detected when CV remains below a threshold for a consecutive window of sample sizes.

Mathematical definition:

\[CV_n = \frac{\sigma_n}{\mu_n}\]

Where \(\sigma_n\) is the standard deviation and \(\mu_n\) is the mean across repeats at sample size n.

When to use:

Erratic p-values (KS, Chi-square tests)
When you need quantitative variability assessment
Conservative estimates preferred

Parameters (via kwargs):

window_size: Number of consecutive sizes for validation (default: 25% of n_sizes)
cv_threshold: Maximum allowed CV (default: 0.1)

Kneedle Method (Elbow Detection)

Principle: Based on the “Kneedle” algorithm (Satopää et al., 2011). Finds the “elbow” or “knee” point in a curve by:

Normalizing the curve to [0, 1]
Computing a reference line from first to last point
Finding the point with maximum perpendicular distance from the reference line

Mathematical definition:

The knee point is where:

\[\text{knee} = \arg\max_i |y_i - (y_0 + (y_n - y_0) \cdot \frac{i}{n})|\]

Where \(y_i\) is the normalized curve value at point i.

When to use:

RMSE and converging parameters (RECOMMENDED)
Smooth, monotonic convergence patterns
Finding “point of diminishing returns”
Production analyses requiring objective detection

Parameters (via kwargs):

smooth: Whether to smooth curves before detection (default: True)
smoothing_method: ‘savgol’ (preserves features) or ‘spline’ (default: ‘savgol’)

Reference: Satopää, V., Albrecht, J., Irwin, D., & Raghavan, B. (2011). Finding a “Kneedle” in a Haystack: Detecting Knee Points in System Behavior. 2011 31st International Conference on Distributed Computing Systems Workshops, 166-171.

Plateau Method (Relative Gain Heuristic)

Principle: Detects when relative improvement between consecutive points becomes negligible. Stability is found when the relative change stays below a threshold for L consecutive points.

Mathematical definition:

Relative change at point i:

\[\Delta_i = \frac{|y_{i-1} - y_i|}{|y_{i-1}|}\]

Stability when:

\[\Delta_i < \epsilon \text{ for } L \text{ consecutive points}\]

When to use:

Limited computational budget
Early “good enough” detection
Iterative analyses with early stopping

Parameters (via kwargs):

consecutive_points: Number of points below tolerance (default: 3)
relative_tolerance: Maximum relative change (default: 0.01, i.e., 1%)

Method Selection Guide

Method	Best For	Typical Use Case
Kneedle	RMSE, parameters	Production RMSE analysis (RECOMMENDED)
CV	P-values (KS, Chi²)	Conservative p-value stability
Plateau	Early detection, limited budget	Quick exploration, iterative tuning

Recommended workflow:

# Use Kneedle for RMSE (most reliable)
results = adjuster.monte_carlo_fit(
    tests=['rmse'],
    stability_method='kneedle',
    smooth=True
)

recommended_n = results.attrs['recommended_size']
print(f"RMSE stable at n = {recommended_n}")

Sampling strategies

The helper _generate_subsample_indices() produces integer index lists that are used by the Monte Carlo routines to build subsamples from the original dataset. Below is a concise, practical explanation of the three supported strategies and how to choose between them.

random (without replacement) - What: Each subsample contains size unique indices drawn randomly from the

original dataset (no duplicates inside the same subsample).
- Constraint: size must be less than or equal to the original sample size N.
- When to use: Typical default for studying how estimates change with smaller sample sizes. Keeps each subsample representative and avoids internal duplication.
bootstrap (with replacement) - What: Each subsample is drawn with replacement, so the same original row

can appear multiple times in a single subsample.
- Constraint: None — size may be larger than N because indices can repeat.
- When to use: Use when you need to estimate uncertainty (variance, CIs) via resampling, or when you want to allow size >= N for simulation purposes.
disjoint (non-overlapping partitions) - What: The original indices are shuffled and partitioned into non-overlapping

blocks of length size. Each block is a subsample with no shared indices.
- Constraint: size must be <= N. The number of blocks per shuffle is N // size; for n_repeats larger than that, additional shuffles are used.
- When to use: Use when you want independent partitions (similar to simple cross-validation) and want to avoid overlap between subsamples within the same shuffle.

Reproducibility and seed

A random seed controls the pseudo-random generator used to create indices. Providing the same seed reproduces the same sequence of subsamples. This is recommended for experiments where results must be repeatable.

Practical guidance

Use random as a safe default when size <= N and you want unbiased subsamples without duplicates.
Use bootstrap when you need to estimate uncertainty from resampling or when you want to allow size >= N.
Use disjoint when you need non-overlapping partitions to compare independent fits or to maximize coverage of the original dataset without duplication.

See _generate_subsample_indices() source for exact behaviour and edge-case handling.

Outputs

An xarray.Dataset with dimensions: - sizes: tested sample sizes - repeats: repetition index
Data variables: param_0, param_1, …, ks_statistic, ks_pvalue, chi2_statistic, chi2_pvalue, rmse (depending on tests).
Attributes: - stability_points: dict of detected sizes per variable with method info - recommended_size: sample size where primary metric stabilizes - primary_metric: the metric used for recommended_size (typically ‘rmse’) - stable_pvalue_ks, stable_pvalue_chi2, stable_rmse: metric values at stable point - figure_path: path to saved figure (if generated) - Metadata about sampling, bins, and configuration

Quick example

import numpy as np
from magica.core import MagicAdjuster

data = np.random.weibull(2, 1000)
adjuster = MagicAdjuster(data)
adjuster.fit_distribution('weibull_min')

# Use Kneedle method for RMSE stability detection
ds = adjuster.monte_carlo_fit(
    sizes=[50, 100, 200, 400, 600, 800],
    n_repeats=50,
    tests=['ks', 'chi2', 'rmse'],
    stability_method='kneedle',  # Recommended for RMSE
    smooth=True,                 # Smooth curves before detection
    sampling='random',
    fit_kwargs={'floc': 0},
    fig_output_path='mc_summary.png',
    plot_type='series'
)

# Access recommended size
recommended_n = ds.attrs['recommended_size']
print(f"Recommended sample size: {recommended_n}")

# Check all stability points
print(ds.attrs['stability_points'])

Interpretation tips

Look at medians/boxplots of parameters across sizes — convergence indicates stable estimation.
Check ks_pvalue / chi2_pvalue behavior: rising p-values toward larger sizes suggest better fit at those sizes, but beware of the large-sample-size effect where p-values can become unreliable.
RMSE is the most reliable indicator for stability: it shows smooth, monotonic decrease and clear convergence. Always include RMSE in your tests.
Use recommended_size (based on primary metric, typically RMSE) as the recommended minimum sample size for robust parameter estimation.
The generated figure shows: - Parameter convergence across sample sizes - Goodness-of-fit evolution (KS, Chi-square p-values, RMSE) - Red vertical lines marking detected stability points - Orange dashed lines showing smoothed curves (for Kneedle method) - Info boxes with metric values at stable points
Compare stability points across different methods to understand metric behavior.

Method Comparison

When analyzing your results, compare how different metrics stabilize:

stability = ds.attrs['stability_points']

# RMSE typically stabilizes earliest and most reliably
rmse_stable = stability['rmse']['size']
rmse_method = stability['rmse']['method']

# P-values may stabilize later or show erratic behavior
ks_stable = stability['ks_pvalue']['size']

print(f"RMSE stable at n = {rmse_stable} (method: {rmse_method})")
print(f"KS p-value stable at n = {ks_stable}")

# Use RMSE-based recommendation for production
recommended = ds.attrs['recommended_size']
print(f"\n⭐ Recommended sample size: {recommended}")

Monte Carlo Stability Tutorial

Why Monte Carlo stability?

Methodology (what we do)

Inputs and options

Stability Detection Methods

CV Method (Coefficient of Variation)

Kneedle Method (Elbow Detection)

Plateau Method (Relative Gain Heuristic)

Method Selection Guide

Sampling strategies

Reproducibility and seed

Practical guidance

Outputs

Quick example

Interpretation tips

Method Comparison

Further reading