Monte Carlo Stability Analysis
The Monte Carlo stability analysis is a powerful method to determine the minimum sample size needed for reliable parameter estimation and to assess how statistical tests behave with different sample sizes.
Overview
The monte_carlo_fit() method performs repeated subsampling and distribution fitting to:
Identify the minimum sample size where parameters stabilize
Track how goodness-of-fit tests evolve with sample size
Detect the “large sample size effect” where p-values become unreliable
Provide empirical distributions for robust inference
Method Signature
def monte_carlo_fit(
self,
sizes: Optional[List[int]] = None,
n_repeats: int = 20,
tests: List[str] = ['ks'],
stability_method: str = 'aggregate',
fig_output_path: Optional[str] = None,
plot_type: str = 'series',
sampling: str = 'random',
seed: Optional[int] = None,
min_size: int = 50,
max_size: Optional[int] = None,
n_sizes: int = 10,
distribution_params: Optional[Tuple] = None,
**kwargs
) -> xr.Dataset
Parameters
Sample Size Configuration
- sizesList[int], optional
Explicit list of sample sizes to test. If not provided, sizes are automatically generated based on
min_size,max_size, andn_sizes.Example:
[100, 200, 500, 1000, 2000]- min_sizeint, default=50
Minimum sample size to test (used only if
sizesis not provided).- max_sizeint, optional
Maximum sample size to test. Defaults to the size of the original dataset.
- n_sizesint, default=10
Number of different sample sizes to test (used only if
sizesis not provided).
Monte Carlo Configuration
- n_repeatsint, default=20
Number of independent subsamples to draw for each sample size. Higher values provide better statistical estimates but increase computation time.
For quick exploration: 10-20 repeats
For production analysis: 30-50 repeats
For publication-quality results: 50-100 repeats
- samplingstr, default=’random’
Sampling strategy for creating subsamples. Options:
'random': Independent random draws without replacement (most common)'bootstrap': Random draws with replacement (useful for uncertainty quantification)'disjoint': Non-overlapping partitions (best for temporal/spatial data)
- seedint, optional
Random seed for reproducibility. Always set this for reproducible analyses.
Goodness-of-Fit Tests
- testsList[str], default=[‘ks’]
List of goodness-of-fit tests to perform. Options:
'ks': Kolmogorov-Smirnov test (p-value based)'chi2': Chi-square test (p-value based)'rmse': Root Mean Square Error (distance metric)
⭐ Recommendation: Always include
'rmse'for stability detection, as it shows clearer convergence behavior than p-value based tests.Example:
tests=['ks', 'chi2', 'rmse']
Stability Detection
- stability_methodstr, default=’kneedle’
Method for detecting when parameters and tests have stabilized. Options:
'kneedle': Kneedle algorithm for elbow detection (default, best for RMSE)'cv': Coefficient of Variation method (robust for erratic metrics)'plateau': Relative gain heuristic (early “good enough” detection)'aggregate': Legacy median-based method (still supported)None: No stability detection
⭐ Default: ‘kneedle’ - Recommended for RMSE stability detection as it excels at finding the “point of diminishing returns” in smooth, converging metrics.
See Stability Detection Methods for detailed explanations and selection guidance.
Method-Specific Parameters (via kwargs):
For CV method:
window_size: Number of consecutive sizes for validation (default: 25% of n_sizes)
cv_threshold: Maximum allowed coefficient of variation (default: 0.1)For Kneedle method:
smooth: Whether to smooth curves before detection (default: True)
smoothing_method: ‘savgol’ (default) or ‘spline’For Plateau method:
consecutive_points: Number of consecutive points below tolerance (default: 3)
relative_tolerance: Maximum relative change threshold (default: 0.01, i.e., 1%)
Visualization
- fig_output_pathstr, optional
Path where the summary figure should be saved. If provided, automatically generates a 2×3 grid showing parameter and test evolution.
Example:
'monte_carlo_results.png'- plot_typestr, default=’series’
Style of plots to generate:
'series': Line plots with median and IQR shading'boxplots': Box plots showing full distribution per size
Advanced Parameters
- distribution_paramsTuple, optional
If provided, uses these fixed parameters instead of fitting. Useful for testing how well the method recovers known parameters.
- kwargs
Additional keyword arguments passed to the goodness-of-fit tests (e.g.,
bins='doane'for chi-square test).
Return Value
Returns an xarray.Dataset containing:
Dimensions
sizes: Sample sizes tested (length =n_sizes)repeats: Repetition index (length =n_repeats)
Data Variables
param_0, param_1, ...: Fitted distribution parameters for each (size, repeat) combinationks_statistic, ks_pvalue: Kolmogorov-Smirnov test results (if'ks'in tests)chi2_statistic, chi2_pvalue: Chi-square test results (if'chi2'in tests)rmse: Root mean square error values (if'rmse'in tests)
Attributes
distribution: Name of the fitted distributionoriginal_data_size: Size of the original datasetsampling_method: Sampling strategy usedbins_method: Binning method used for chi-square teststability_points: Dictionary with detected stability informationrecommended_size: Sample size where primary metric stabilizes (None if not detected)primary_metric: The metric used for recommended_size (typically ‘rmse’)stable_pvalue_ks: KS test p-value at stability point (if ‘ks’ in tests)stable_pvalue_chi2: Chi-square p-value at stability point (if ‘chi2’ in tests)stable_rmse: RMSE value at stability point (if ‘rmse’ in tests)n_repeats: Number of repetitions performedmin_size: Minimum sample size testedmax_size: Maximum sample size testedn_sizes: Number of sample sizes testedfigure_path: Path to saved figure (iffig_output_pathwas provided)created_by: Identifier for the method
Stability Detection Methods
MagicA provides three advanced methods for detecting when parameters and goodness-of-fit metrics have stabilized. Each method has specific strengths depending on the metric behavior.
CV Method (Coefficient of Variation)
How it works:
For each parameter/test, computes the coefficient of variation (CV = std/mean) across repeats at each sample size
Uses a sliding window (default: 25% of n_sizes) to check for stability
Checks if CV stays below a threshold (default: 0.1) for all sizes in the window
Returns the first size where this condition is met
Mathematical formula:
Where: - \(\sigma_n\) = standard deviation across repeats at size n - \(\mu_n\) = mean value across repeats at size n
Advantages:
Robust to erratic metrics (especially p-values)
Provides quantitative measure of variability
Good for assessing precision of estimates
Works well with any number of repeats
Disadvantages:
May be conservative (detects stability later than other methods)
Less sensitive to smooth convergence patterns
Parameters:
window_size: Number of consecutive sizes for validation (default: max(2, n_sizes // 4))cv_threshold: Maximum allowed CV (default: 0.1)
Result format:
stability_points = {
'rmse': {
'size': 800,
'index': 4,
'cv_at_stability': 0.087,
'smoothed_curve': None,
'method': 'cv'
}
}
When to use: - When working with erratic p-values (KS, Chi-square) - When you need quantitative variability assessment - When conservative estimates are preferred
Kneedle Method (Elbow Detection)
How it works:
Normalizes the curve to [0, 1] range
Computes the reference line connecting first and last points
Calculates perpendicular distance from each point to the reference line
The knee/elbow is at the point with maximum distance
Optional: smooths the curve before detection using Savitzky-Golay filter or spline
Mathematical formula:
Where: - \(D_i\) = distance from reference line at point i - \(y_i\) = normalized curve value at point i - \(y_0, y_n\) = first and last curve values
Advantages:
Optimal for RMSE: Excels at finding “point of diminishing returns” in smooth curves
Clear mathematical definition of “elbow”
Automatic curve direction detection
Smoothing reduces noise sensitivity
Based on published algorithm (Satopää et al., 2011)
Disadvantages:
Requires relatively smooth convergence
May not work well with highly erratic metrics
Parameters:
smooth: Whether to smooth before detection (default: True, recommended)smoothing_method: ‘savgol’ (default, preserves features) or ‘spline’
Result format:
stability_points = {
'rmse': {
'size': 600,
'index': 3,
'cv_at_stability': None,
'smoothed_curve': array([...]), # If smoothing was used
'method': 'kneedle'
}
}
When to use: - ⭐ RECOMMENDED for RMSE stability detection - When metrics show smooth, monotonic convergence - When you want to find the “sweet spot” of diminishing returns - For production analyses requiring objective elbow detection
Reference:
Satopää, V., Albrecht, J., Irwin, D., & Raghavan, B. (2011). Finding a “Kneedle” in a Haystack: Detecting Knee Points in System Behavior. 2011 31st International Conference on Distributed Computing Systems Workshops, 166-171.
Plateau Method (Relative Gain Heuristic)
How it works:
Computes relative change between consecutive points:
\[\begin{split}\\Delta_i = \\frac{|y_{i-1} - y_i|}{|y_{i-1}|}\end{split}\]Checks if relative change stays below tolerance for L consecutive points
Returns the first point where this “plateau” condition is met
Mathematical formula:
Stability detected when:
Where: - \(\Delta_i\) = relative change at point i - \(\epsilon\) = relative tolerance (default: 0.01, i.e., 1%) - \(L\) = consecutive points threshold (default: 3)
Advantages:
Early detection of “good enough” stabilization
Intuitive interpretation (% improvement)
Fast - no smoothing required
Good for early stopping in iterative analyses
Disadvantages:
May detect premature plateaus
Sensitive to parameter tuning
Less robust to noise than other methods
Parameters:
consecutive_points: Number of consecutive points below tolerance (default: 3)relative_tolerance: Maximum relative change threshold (default: 0.01)
Result format:
stability_points = {
'rmse': {
'size': 500,
'index': 2,
'cv_at_stability': None,
'smoothed_curve': None,
'method': 'plateau'
}
}
When to use: - When computational budget is limited - For early “good enough” detection - When slight improvements beyond plateau are acceptable - In iterative analyses where early stopping is beneficial
Method Selection Guide
Choose the appropriate method based on your metric and goals:
Method |
Best For |
Strengths |
Typical Use Case |
|---|---|---|---|
Kneedle |
RMSE, converging parameters |
Objective elbow detection, smooth curves |
Production RMSE analysis (RECOMMENDED) |
CV |
P-values (KS, Chi-square) |
Robust to noise, quantifies variability |
P-value stability, conservative estimates |
Plateau |
Early detection, limited budget |
Fast, intuitive, early stopping |
Quick exploration, iterative tuning |
Aggregate |
Legacy analyses |
Simple, robust |
Backward compatibility |
Recommended Workflow:
# 1. Use Kneedle for RMSE (most reliable)
results_rmse = processor.monte_carlo_fit(
tests=['rmse'],
stability_method='kneedle',
smooth=True
)
# 2. Use CV for p-values (robust to noise)
results_pvalues = processor.monte_carlo_fit(
tests=['ks', 'chi2'],
stability_method='cv',
cv_threshold=0.15 # Can relax for p-values
)
# 3. Compare stability points
rmse_stable = results_rmse.attrs['recommended_size']
print(f"RMSE stable at n = {rmse_stable} (MOST RELIABLE)")
Interpreting Stability Points
Each stability point contains:
size: The sample size where stability is first detected (Noneif not detected)index: The index in thesizeslist (Noneif not detected)cv_at_stability: Coefficient of variation at stability (only for'cv'method)smoothed_curve: Smoothed curve data (only for'kneedle'method with smoothing)method: Name of the detection method used
Accessing stability information:
# Get recommended size (based on primary metric, usually RMSE)
recommended_n = results.attrs['recommended_size']
primary_metric = results.attrs['primary_metric']
if recommended_n:
print(f"✓ {primary_metric.upper()} stable at n = {recommended_n}")
# Get quality metrics at stable point
if 'stable_rmse' in results.attrs:
print(f" RMSE at stable point: {results.attrs['stable_rmse']:.4f}")
if 'stable_pvalue_ks' in results.attrs:
print(f" KS p-value at stable point: {results.attrs['stable_pvalue_ks']:.4f}")
else:
print("✗ No stability detected - try larger sample sizes")
If stability is not detected:
The tested sample sizes are too small - increase
max_sizeThe metric has high variability - increase
n_repeatsThe threshold is too strict - relax method-specific parameters
The data distribution may not be appropriate
Recommended Practices
Choosing Sample Sizes
# For small to medium datasets (< 5000 samples)
sizes = [50, 100, 200, 400, 600, 800]
# For large datasets (> 10000 samples) - CPS method
sizes = [100, 200, 400, 600, 800, 1000, 1500, 2000, 3000, 4000]
Using RMSE for Stability Detection
⭐ Always include RMSE in your tests for stability detection:
results = adjuster.monte_carlo_fit(
sizes=[100, 200, 500, 1000, 2000],
n_repeats=50,
tests=['ks', 'chi2', 'rmse'], # RMSE is crucial!
sampling='random',
seed=42
)
Why RMSE is recommended:
Monotonic decrease: RMSE decreases smoothly as sample size increases
Clear convergence: Stabilization point is visually obvious
Robust: Less affected by sampling variability than p-values
Direct measure: Measures actual fit quality, not hypothesis test power
No inflation: Unlike p-values, doesn’t become artificially small with large samples
P-values (KS, Chi-square) can be erratic and are subject to the “large sample size effect” where they become very small even for good fits when the sample is large.
Accessing Stability Information
# Get recommended size (based on primary metric, typically RMSE)
recommended_n = results.attrs['recommended_size']
primary_metric = results.attrs['primary_metric']
if recommended_n is not None:
print(f"✓ {primary_metric.upper()} stabilizes at n = {recommended_n}")
print(f" This is the recommended minimum sample size")
# Get quality metrics at the stable point
print(f"\nMetrics at stable point (n = {recommended_n}):")
if 'stable_rmse' in results.attrs:
print(f" RMSE: {results.attrs['stable_rmse']:.4f}")
if 'stable_pvalue_ks' in results.attrs:
print(f" KS p-value: {results.attrs['stable_pvalue_ks']:.4f}")
if 'stable_pvalue_chi2' in results.attrs:
print(f" Chi-square p-value: {results.attrs['stable_pvalue_chi2']:.4f}")
else:
print("✗ No clear stability detected - try larger sample sizes or more repeats")
# Access individual stability points for all metrics
stability = results.attrs['stability_points']
for metric, info in stability.items():
size = info['size']
method = info['method']
if size:
print(f"{metric}: stable at n = {size} (method: {method})")
if info['cv_at_stability'] is not None:
print(f" CV at stability: {info['cv_at_stability']:.3f}")
Complete Example
import numpy as np
import magica as ma
# Load data
data = np.random.weibull(2, 10000) * 8 + 2
processor = ma.read_data(data)
processor.fit_distribution('weibull_min')
# Run comprehensive Monte Carlo analysis with Kneedle method
results = processor.monte_carlo_fit(
sizes=[100, 200, 400, 600, 800, 1000, 1500, 2000, 3000, 4000],
n_repeats=50,
tests=['ks', 'chi2', 'rmse'], # Include RMSE!
stability_method='kneedle', # Use Kneedle for RMSE
smooth=True, # Smooth curves before detection
sampling='random',
seed=42,
fig_output_path='monte_carlo_analysis.png'
)
# Check recommended size (based on RMSE with Kneedle)
recommended_n = results.attrs['recommended_size']
primary_metric = results.attrs['primary_metric']
print(f"⭐ {primary_metric.upper()} stabilizes at n = {recommended_n} (RECOMMENDED)")
print(f" RMSE at stable point: {results.attrs['stable_rmse']:.4f}")
print(f" KS p-value at stable point: {results.attrs['stable_pvalue_ks']:.4f}")
# Check all stability points
stability = results.attrs['stability_points']
print("\nAll stability points:")
for metric, info in stability.items():
size = info['size']
method = info['method']
if size:
print(f" {metric}: n = {size} (method: {method})")
else:
print(f" {metric}: no clear stability detected")
# Use the recommended size for robust inference
print(f"\nRecommended subsample size for production: {recommended_n}")
Sampling Strategies
Random Sampling (default)
Each subsample is drawn independently without replacement from the original data.
When to use:
General purpose analysis
Independent observations
Standard stability analysis
Example:
results = processor.monte_carlo_fit(
sampling='random',
seed=42
)
Bootstrap Sampling
Each subsample is drawn with replacement, allowing the same observation to appear multiple times.
When to use:
Uncertainty quantification
Confidence interval estimation
When subsample size > original data size
Example:
results = processor.monte_carlo_fit(
sampling='bootstrap',
seed=42
)
Disjoint Sampling
Data is divided into non-overlapping partitions. Each observation is used exactly once per sample size.
When to use:
Time series data
Spatially correlated data
When you want lower variance between repeats
More representative of data structure
Example:
results = processor.monte_carlo_fit(
sampling='disjoint',
seed=42
)
Large Sample Size Effect
For very large datasets (>10,000 samples), statistical tests become extremely powerful, causing:
P-values become very small even for negligible deviations
Difficulty distinguishing statistical from practical significance
Over-rejection of perfectly adequate distributions
Solution: Use the Monte Carlo CPS (Coefficient/P-value/Sample size) method to find the optimal subsample size where:
Parameters have converged (stable estimates)
P-values haven’t yet inflated (interpretable tests)
RMSE has stabilized (good fit quality)
See the MagicAdjuster Tutorial for a complete demonstration.
See Also
MagicAdjuster Tutorial: Complete tutorial with real data examples
Core Module: API reference for MagicAdjuster class
AutoFitter Tutorial: Automatic distribution selection