Monte Carlo Stability Analysis

The Monte Carlo stability analysis is a powerful method to determine the minimum sample size needed for reliable parameter estimation and to assess how statistical tests behave with different sample sizes.

Overview

The monte_carlo_fit() method performs repeated subsampling and distribution fitting to:

  1. Identify the minimum sample size where parameters stabilize

  2. Track how goodness-of-fit tests evolve with sample size

  3. Detect the “large sample size effect” where p-values become unreliable

  4. Provide empirical distributions for robust inference

Method Signature

def monte_carlo_fit(
    self,
    sizes: Optional[List[int]] = None,
    n_repeats: int = 20,
    tests: List[str] = ['ks'],
    stability_method: str = 'aggregate',
    fig_output_path: Optional[str] = None,
    plot_type: str = 'series',
    sampling: str = 'random',
    seed: Optional[int] = None,
    min_size: int = 50,
    max_size: Optional[int] = None,
    n_sizes: int = 10,
    distribution_params: Optional[Tuple] = None,
    **kwargs
) -> xr.Dataset

Parameters

Sample Size Configuration

sizesList[int], optional

Explicit list of sample sizes to test. If not provided, sizes are automatically generated based on min_size, max_size, and n_sizes.

Example: [100, 200, 500, 1000, 2000]

min_sizeint, default=50

Minimum sample size to test (used only if sizes is not provided).

max_sizeint, optional

Maximum sample size to test. Defaults to the size of the original dataset.

n_sizesint, default=10

Number of different sample sizes to test (used only if sizes is not provided).

Monte Carlo Configuration

n_repeatsint, default=20

Number of independent subsamples to draw for each sample size. Higher values provide better statistical estimates but increase computation time.

  • For quick exploration: 10-20 repeats

  • For production analysis: 30-50 repeats

  • For publication-quality results: 50-100 repeats

samplingstr, default=’random’

Sampling strategy for creating subsamples. Options:

  • 'random': Independent random draws without replacement (most common)

  • 'bootstrap': Random draws with replacement (useful for uncertainty quantification)

  • 'disjoint': Non-overlapping partitions (best for temporal/spatial data)

seedint, optional

Random seed for reproducibility. Always set this for reproducible analyses.

Goodness-of-Fit Tests

testsList[str], default=[‘ks’]

List of goodness-of-fit tests to perform. Options:

  • 'ks': Kolmogorov-Smirnov test (p-value based)

  • 'chi2': Chi-square test (p-value based)

  • 'rmse': Root Mean Square Error (distance metric)

⭐ Recommendation: Always include 'rmse' for stability detection, as it shows clearer convergence behavior than p-value based tests.

Example: tests=['ks', 'chi2', 'rmse']

Stability Detection

stability_methodstr, default=’kneedle’

Method for detecting when parameters and tests have stabilized. Options:

  • 'kneedle': Kneedle algorithm for elbow detection (default, best for RMSE)

  • 'cv': Coefficient of Variation method (robust for erratic metrics)

  • 'plateau': Relative gain heuristic (early “good enough” detection)

  • 'aggregate': Legacy median-based method (still supported)

  • None: No stability detection

⭐ Default: ‘kneedle’ - Recommended for RMSE stability detection as it excels at finding the “point of diminishing returns” in smooth, converging metrics.

See Stability Detection Methods for detailed explanations and selection guidance.

Method-Specific Parameters (via kwargs):

For CV method:

  • window_size: Number of consecutive sizes for validation (default: 25% of n_sizes)

  • cv_threshold: Maximum allowed coefficient of variation (default: 0.1)

For Kneedle method:

  • smooth: Whether to smooth curves before detection (default: True)

  • smoothing_method: ‘savgol’ (default) or ‘spline’

For Plateau method:

  • consecutive_points: Number of consecutive points below tolerance (default: 3)

  • relative_tolerance: Maximum relative change threshold (default: 0.01, i.e., 1%)

Visualization

fig_output_pathstr, optional

Path where the summary figure should be saved. If provided, automatically generates a 2×3 grid showing parameter and test evolution.

Example: 'monte_carlo_results.png'

plot_typestr, default=’series’

Style of plots to generate:

  • 'series': Line plots with median and IQR shading

  • 'boxplots': Box plots showing full distribution per size

Advanced Parameters

distribution_paramsTuple, optional

If provided, uses these fixed parameters instead of fitting. Useful for testing how well the method recovers known parameters.

kwargs

Additional keyword arguments passed to the goodness-of-fit tests (e.g., bins='doane' for chi-square test).

Return Value

Returns an xarray.Dataset containing:

Dimensions

  • sizes: Sample sizes tested (length = n_sizes)

  • repeats: Repetition index (length = n_repeats)

Data Variables

  • param_0, param_1, ...: Fitted distribution parameters for each (size, repeat) combination

  • ks_statistic, ks_pvalue: Kolmogorov-Smirnov test results (if 'ks' in tests)

  • chi2_statistic, chi2_pvalue: Chi-square test results (if 'chi2' in tests)

  • rmse: Root mean square error values (if 'rmse' in tests)

Attributes

  • distribution: Name of the fitted distribution

  • original_data_size: Size of the original dataset

  • sampling_method: Sampling strategy used

  • bins_method: Binning method used for chi-square test

  • stability_points: Dictionary with detected stability information

  • recommended_size: Sample size where primary metric stabilizes (None if not detected)

  • primary_metric: The metric used for recommended_size (typically ‘rmse’)

  • stable_pvalue_ks: KS test p-value at stability point (if ‘ks’ in tests)

  • stable_pvalue_chi2: Chi-square p-value at stability point (if ‘chi2’ in tests)

  • stable_rmse: RMSE value at stability point (if ‘rmse’ in tests)

  • n_repeats: Number of repetitions performed

  • min_size: Minimum sample size tested

  • max_size: Maximum sample size tested

  • n_sizes: Number of sample sizes tested

  • figure_path: Path to saved figure (if fig_output_path was provided)

  • created_by: Identifier for the method

Stability Detection Methods

MagicA provides three advanced methods for detecting when parameters and goodness-of-fit metrics have stabilized. Each method has specific strengths depending on the metric behavior.

CV Method (Coefficient of Variation)

How it works:

  1. For each parameter/test, computes the coefficient of variation (CV = std/mean) across repeats at each sample size

  2. Uses a sliding window (default: 25% of n_sizes) to check for stability

  3. Checks if CV stays below a threshold (default: 0.1) for all sizes in the window

  4. Returns the first size where this condition is met

Mathematical formula:

\[CV_n = \frac{\sigma_n}{\mu_n}\]

Where: - \(\sigma_n\) = standard deviation across repeats at size n - \(\mu_n\) = mean value across repeats at size n

Advantages:

  • Robust to erratic metrics (especially p-values)

  • Provides quantitative measure of variability

  • Good for assessing precision of estimates

  • Works well with any number of repeats

Disadvantages:

  • May be conservative (detects stability later than other methods)

  • Less sensitive to smooth convergence patterns

Parameters:

  • window_size: Number of consecutive sizes for validation (default: max(2, n_sizes // 4))

  • cv_threshold: Maximum allowed CV (default: 0.1)

Result format:

stability_points = {
    'rmse': {
        'size': 800,
        'index': 4,
        'cv_at_stability': 0.087,
        'smoothed_curve': None,
        'method': 'cv'
    }
}

When to use: - When working with erratic p-values (KS, Chi-square) - When you need quantitative variability assessment - When conservative estimates are preferred

Kneedle Method (Elbow Detection)

How it works:

  1. Normalizes the curve to [0, 1] range

  2. Computes the reference line connecting first and last points

  3. Calculates perpendicular distance from each point to the reference line

  4. The knee/elbow is at the point with maximum distance

  5. Optional: smooths the curve before detection using Savitzky-Golay filter or spline

Mathematical formula:

\[D_i = |y_i - (y_0 + (y_n - y_0) \cdot \frac{i}{n})|\]

Where: - \(D_i\) = distance from reference line at point i - \(y_i\) = normalized curve value at point i - \(y_0, y_n\) = first and last curve values

Advantages:

  • Optimal for RMSE: Excels at finding “point of diminishing returns” in smooth curves

  • Clear mathematical definition of “elbow”

  • Automatic curve direction detection

  • Smoothing reduces noise sensitivity

  • Based on published algorithm (Satopää et al., 2011)

Disadvantages:

  • Requires relatively smooth convergence

  • May not work well with highly erratic metrics

Parameters:

  • smooth: Whether to smooth before detection (default: True, recommended)

  • smoothing_method: ‘savgol’ (default, preserves features) or ‘spline’

Result format:

stability_points = {
    'rmse': {
        'size': 600,
        'index': 3,
        'cv_at_stability': None,
        'smoothed_curve': array([...]),  # If smoothing was used
        'method': 'kneedle'
    }
}

When to use: - ⭐ RECOMMENDED for RMSE stability detection - When metrics show smooth, monotonic convergence - When you want to find the “sweet spot” of diminishing returns - For production analyses requiring objective elbow detection

Reference:

Satopää, V., Albrecht, J., Irwin, D., & Raghavan, B. (2011). Finding a “Kneedle” in a Haystack: Detecting Knee Points in System Behavior. 2011 31st International Conference on Distributed Computing Systems Workshops, 166-171.

Plateau Method (Relative Gain Heuristic)

How it works:

  1. Computes relative change between consecutive points:

    \[\begin{split}\\Delta_i = \\frac{|y_{i-1} - y_i|}{|y_{i-1}|}\end{split}\]
  2. Checks if relative change stays below tolerance for L consecutive points

  3. Returns the first point where this “plateau” condition is met

Mathematical formula:

Stability detected when:

\[\Delta_i < \epsilon \text{ for } L \text{ consecutive points}\]

Where: - \(\Delta_i\) = relative change at point i - \(\epsilon\) = relative tolerance (default: 0.01, i.e., 1%) - \(L\) = consecutive points threshold (default: 3)

Advantages:

  • Early detection of “good enough” stabilization

  • Intuitive interpretation (% improvement)

  • Fast - no smoothing required

  • Good for early stopping in iterative analyses

Disadvantages:

  • May detect premature plateaus

  • Sensitive to parameter tuning

  • Less robust to noise than other methods

Parameters:

  • consecutive_points: Number of consecutive points below tolerance (default: 3)

  • relative_tolerance: Maximum relative change threshold (default: 0.01)

Result format:

stability_points = {
    'rmse': {
        'size': 500,
        'index': 2,
        'cv_at_stability': None,
        'smoothed_curve': None,
        'method': 'plateau'
    }
}

When to use: - When computational budget is limited - For early “good enough” detection - When slight improvements beyond plateau are acceptable - In iterative analyses where early stopping is beneficial

Method Selection Guide

Choose the appropriate method based on your metric and goals:

Method

Best For

Strengths

Typical Use Case

Kneedle

RMSE, converging parameters

Objective elbow detection, smooth curves

Production RMSE analysis (RECOMMENDED)

CV

P-values (KS, Chi-square)

Robust to noise, quantifies variability

P-value stability, conservative estimates

Plateau

Early detection, limited budget

Fast, intuitive, early stopping

Quick exploration, iterative tuning

Aggregate

Legacy analyses

Simple, robust

Backward compatibility

Recommended Workflow:

# 1. Use Kneedle for RMSE (most reliable)
results_rmse = processor.monte_carlo_fit(
    tests=['rmse'],
    stability_method='kneedle',
    smooth=True
)

# 2. Use CV for p-values (robust to noise)
results_pvalues = processor.monte_carlo_fit(
    tests=['ks', 'chi2'],
    stability_method='cv',
    cv_threshold=0.15  # Can relax for p-values
)

# 3. Compare stability points
rmse_stable = results_rmse.attrs['recommended_size']
print(f"RMSE stable at n = {rmse_stable} (MOST RELIABLE)")

Interpreting Stability Points

Each stability point contains:

  • size: The sample size where stability is first detected (None if not detected)

  • index: The index in the sizes list (None if not detected)

  • cv_at_stability: Coefficient of variation at stability (only for 'cv' method)

  • smoothed_curve: Smoothed curve data (only for 'kneedle' method with smoothing)

  • method: Name of the detection method used

Accessing stability information:

# Get recommended size (based on primary metric, usually RMSE)
recommended_n = results.attrs['recommended_size']
primary_metric = results.attrs['primary_metric']

if recommended_n:
    print(f"✓ {primary_metric.upper()} stable at n = {recommended_n}")

    # Get quality metrics at stable point
    if 'stable_rmse' in results.attrs:
        print(f"  RMSE at stable point: {results.attrs['stable_rmse']:.4f}")
    if 'stable_pvalue_ks' in results.attrs:
        print(f"  KS p-value at stable point: {results.attrs['stable_pvalue_ks']:.4f}")
else:
    print("✗ No stability detected - try larger sample sizes")

If stability is not detected:

  • The tested sample sizes are too small - increase max_size

  • The metric has high variability - increase n_repeats

  • The threshold is too strict - relax method-specific parameters

  • The data distribution may not be appropriate

Complete Example

import numpy as np
import magica as ma

# Load data
data = np.random.weibull(2, 10000) * 8 + 2
processor = ma.read_data(data)
processor.fit_distribution('weibull_min')

# Run comprehensive Monte Carlo analysis with Kneedle method
results = processor.monte_carlo_fit(
    sizes=[100, 200, 400, 600, 800, 1000, 1500, 2000, 3000, 4000],
    n_repeats=50,
    tests=['ks', 'chi2', 'rmse'],  # Include RMSE!
    stability_method='kneedle',     # Use Kneedle for RMSE
    smooth=True,                    # Smooth curves before detection
    sampling='random',
    seed=42,
    fig_output_path='monte_carlo_analysis.png'
)

# Check recommended size (based on RMSE with Kneedle)
recommended_n = results.attrs['recommended_size']
primary_metric = results.attrs['primary_metric']

print(f"⭐ {primary_metric.upper()} stabilizes at n = {recommended_n} (RECOMMENDED)")
print(f"   RMSE at stable point: {results.attrs['stable_rmse']:.4f}")
print(f"   KS p-value at stable point: {results.attrs['stable_pvalue_ks']:.4f}")

# Check all stability points
stability = results.attrs['stability_points']
print("\nAll stability points:")
for metric, info in stability.items():
    size = info['size']
    method = info['method']
    if size:
        print(f"   {metric}: n = {size} (method: {method})")
    else:
        print(f"   {metric}: no clear stability detected")

# Use the recommended size for robust inference
print(f"\nRecommended subsample size for production: {recommended_n}")

Sampling Strategies

Random Sampling (default)

Each subsample is drawn independently without replacement from the original data.

When to use:

  • General purpose analysis

  • Independent observations

  • Standard stability analysis

Example:

results = processor.monte_carlo_fit(
    sampling='random',
    seed=42
)

Bootstrap Sampling

Each subsample is drawn with replacement, allowing the same observation to appear multiple times.

When to use:

  • Uncertainty quantification

  • Confidence interval estimation

  • When subsample size > original data size

Example:

results = processor.monte_carlo_fit(
    sampling='bootstrap',
    seed=42
)

Disjoint Sampling

Data is divided into non-overlapping partitions. Each observation is used exactly once per sample size.

When to use:

  • Time series data

  • Spatially correlated data

  • When you want lower variance between repeats

  • More representative of data structure

Example:

results = processor.monte_carlo_fit(
    sampling='disjoint',
    seed=42
)

Large Sample Size Effect

For very large datasets (>10,000 samples), statistical tests become extremely powerful, causing:

  • P-values become very small even for negligible deviations

  • Difficulty distinguishing statistical from practical significance

  • Over-rejection of perfectly adequate distributions

Solution: Use the Monte Carlo CPS (Coefficient/P-value/Sample size) method to find the optimal subsample size where:

  1. Parameters have converged (stable estimates)

  2. P-values haven’t yet inflated (interpretable tests)

  3. RMSE has stabilized (good fit quality)

See the MagicAdjuster Tutorial for a complete demonstration.

See Also