Monte Carlo Stability Tutorial
==============================

This tutorial explains why Monte Carlo stability analysis is useful and how
the `monte_carlo_fit` routine in MagicA is designed and used.

Why Monte Carlo stability?
--------------------------

Large-sample-size effects occur when very large datasets make statistical tests
extremely powerful: p-values become very small even for effects that are
practically negligible. In other words, with enough data a test can reject the
null hypothesis for differences that have no practical importance. The Monte
Carlo stability workflow therefore looks for a sampling size where the chosen
goodness-of-fit tests start to "pass" in a practical sense and parameter
estimates stop changing much across repeats — that sample size is reported as
the stability point.


Methodology (what we do)
-------------------------

1. Build a grid of sample sizes to test (or use a provided list).
2. For each size, generate multiple subsamples (repeats) from the original data.
3. For each subsample, fit the distribution (unless fixed parameters are supplied).
4. Compute goodness-of-fit metrics (e.g., KS p-value, chi-square p-value, RMSE).
5. Store all results in an xarray Dataset with dimensions `sizes` x `repeats`.
6. Detect stability points using a moving-window criterion on variability (e.g., CV).
7. Optionally save a 2x3 summary figure with red dashed vertical lines marking each
   variable's detected stability sample size.

Inputs and options
------------------

- `sizes`: list or auto-generated grid of sample sizes.
- `n_repeats`: number of subsamples per size.
- `tests`: list of metrics to compute: `['ks','chi2','rmse']`.
- `stability_method`: method for detecting stability points (`'kneedle'` [default], `'cv'`, `'plateau'`).
- `sampling`: sampling strategy (`random`, `bootstrap`, `disjoint`).
- `fit_kwargs`: passed to the underlying fit routine (e.g., `floc=0`).
- `distribution_params`: optional fixed parameters (skip fitting when supplied).
- `fig_output_path`: optional path to save the 2x3 summary figure.
- `plot_type`: `'series'` (median + IQR) or `'boxplots'`.

Stability Detection Methods
----------------------------

MagicA provides three methods for detecting when parameters and goodness-of-fit
metrics have stabilized. Each method has different strengths depending on the
metric behavior.

CV Method (Coefficient of Variation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Principle**: Monitors the coefficient of variation (CV = std/mean) across
repeats. Stability is detected when CV remains below a threshold for a
consecutive window of sample sizes.

**Mathematical definition**:

.. math::

    CV_n = \frac{\sigma_n}{\mu_n}

Where :math:`\sigma_n` is the standard deviation and :math:`\mu_n` is the mean
across repeats at sample size n.

**When to use**:

- Erratic p-values (KS, Chi-square tests)
- When you need quantitative variability assessment
- Conservative estimates preferred

**Parameters** (via kwargs):

- `window_size`: Number of consecutive sizes for validation (default: 25% of n_sizes)
- `cv_threshold`: Maximum allowed CV (default: 0.1)

Kneedle Method (Elbow Detection)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Principle**: Based on the "Kneedle" algorithm (Satopää et al., 2011). Finds the
"elbow" or "knee" point in a curve by:

1. Normalizing the curve to [0, 1]
2. Computing a reference line from first to last point
3. Finding the point with maximum perpendicular distance from the reference line

**Mathematical definition**:

The knee point is where:

.. math::

    \text{knee} = \arg\max_i |y_i - (y_0 + (y_n - y_0) \cdot \frac{i}{n})|

Where :math:`y_i` is the normalized curve value at point i.

**When to use**:

- **RMSE and converging parameters** (RECOMMENDED)
- Smooth, monotonic convergence patterns
- Finding "point of diminishing returns"
- Production analyses requiring objective detection

**Parameters** (via kwargs):

- `smooth`: Whether to smooth curves before detection (default: True)
- `smoothing_method`: 'savgol' (preserves features) or 'spline' (default: 'savgol')

**Reference**: Satopää, V., Albrecht, J., Irwin, D., & Raghavan, B. (2011).
*Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior*.
2011 31st International Conference on Distributed Computing Systems Workshops, 166-171.

Plateau Method (Relative Gain Heuristic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Principle**: Detects when relative improvement between consecutive points
becomes negligible. Stability is found when the relative change stays below a
threshold for L consecutive points.

**Mathematical definition**:

Relative change at point i:

.. math::

    \Delta_i = \frac{|y_{i-1} - y_i|}{|y_{i-1}|}

Stability when:

.. math::

    \Delta_i < \epsilon \text{ for } L \text{ consecutive points}

**When to use**:

- Limited computational budget
- Early "good enough" detection
- Iterative analyses with early stopping

**Parameters** (via kwargs):

- `consecutive_points`: Number of points below tolerance (default: 3)
- `relative_tolerance`: Maximum relative change (default: 0.01, i.e., 1%)

Method Selection Guide
~~~~~~~~~~~~~~~~~~~~~~

+----------------+------------------+-------------------------+
| Method         | Best For         | Typical Use Case        |
+================+==================+=========================+
| **Kneedle**    | RMSE,            | Production RMSE         |
|                | parameters       | analysis (RECOMMENDED)  |
+----------------+------------------+-------------------------+
| **CV**         | P-values         | Conservative p-value    |
|                | (KS, Chi²)       | stability               |
+----------------+------------------+-------------------------+
| **Plateau**    | Early detection, | Quick exploration,      |
|                | limited budget   | iterative tuning        |
+----------------+------------------+-------------------------+

**Recommended workflow**:

.. code-block:: python

    # Use Kneedle for RMSE (most reliable)
    results = adjuster.monte_carlo_fit(
        tests=['rmse'],
        stability_method='kneedle',
        smooth=True
    )
    
    recommended_n = results.attrs['recommended_size']
    print(f"RMSE stable at n = {recommended_n}")

Sampling strategies
-------------------

The helper `_generate_subsample_indices()` produces integer index lists that are
used by the Monte Carlo routines to build subsamples from the original dataset.
Below is a concise, practical explanation of the three supported strategies and
how to choose between them.

- random (without replacement)
  - What: Each subsample contains `size` unique indices drawn randomly from the
    original dataset (no duplicates inside the same subsample).
  - Constraint: `size` must be less than or equal to the original sample size
    `N`.
  - When to use: Typical default for studying how estimates change with smaller
    sample sizes. Keeps each subsample representative and avoids internal
    duplication.

- bootstrap (with replacement)
  - What: Each subsample is drawn with replacement, so the same original row
    can appear multiple times in a single subsample.
  - Constraint: None — `size` may be larger than `N` because indices can repeat.
  - When to use: Use when you need to estimate uncertainty (variance, CIs) via
    resampling, or when you want to allow `size` >= `N` for simulation purposes.

- disjoint (non-overlapping partitions)
  - What: The original indices are shuffled and partitioned into non-overlapping
    blocks of length `size`. Each block is a subsample with no shared indices.
  - Constraint: `size` must be <= `N`. The number of blocks per shuffle is
    `N // size`; for `n_repeats` larger than that, additional shuffles are used.
  - When to use: Use when you want independent partitions (similar to simple
    cross-validation) and want to avoid overlap between subsamples within the
    same shuffle.

Reproducibility and seed
------------------------

- A random `seed` controls the pseudo-random generator used to create indices.
  Providing the same seed reproduces the same sequence of subsamples. This is
  recommended for experiments where results must be repeatable.

Practical guidance
------------------

- Use `random` as a safe default when `size <= N` and you want unbiased
  subsamples without duplicates.
- Use `bootstrap` when you need to estimate uncertainty from resampling or when
  you want to allow `size >= N`.
- Use `disjoint` when you need non-overlapping partitions to compare independent
  fits or to maximize coverage of the original dataset without duplication.

See `_generate_subsample_indices()` source for exact behaviour and edge-case
handling.

Outputs
-------

- An `xarray.Dataset` with dimensions:
  - `sizes`: tested sample sizes
  - `repeats`: repetition index
- Data variables: `param_0, param_1, ...`, `ks_statistic`, `ks_pvalue`,
  `chi2_statistic`, `chi2_pvalue`, `rmse` (depending on `tests`).
- Attributes: 
  - `stability_points`: dict of detected sizes per variable with method info
  - `recommended_size`: sample size where primary metric stabilizes
  - `primary_metric`: the metric used for recommended_size (typically 'rmse')
  - `stable_pvalue_ks`, `stable_pvalue_chi2`, `stable_rmse`: metric values at stable point
  - `figure_path`: path to saved figure (if generated)
  - Metadata about sampling, bins, and configuration

Quick example
-------------

.. code-block:: python

    import numpy as np
    from magica.core import MagicAdjuster

    data = np.random.weibull(2, 1000)
    adjuster = MagicAdjuster(data)
    adjuster.fit_distribution('weibull_min')

    # Use Kneedle method for RMSE stability detection
    ds = adjuster.monte_carlo_fit(
        sizes=[50, 100, 200, 400, 600, 800],
        n_repeats=50,
        tests=['ks', 'chi2', 'rmse'],
        stability_method='kneedle',  # Recommended for RMSE
        smooth=True,                 # Smooth curves before detection
        sampling='random',
        fit_kwargs={'floc': 0},
        fig_output_path='mc_summary.png',
        plot_type='series'
    )

    # Access recommended size
    recommended_n = ds.attrs['recommended_size']
    print(f"Recommended sample size: {recommended_n}")
    
    # Check all stability points
    print(ds.attrs['stability_points'])

Interpretation tips
-------------------

- Look at medians/boxplots of parameters across `sizes` — convergence indicates
  stable estimation.
- Check `ks_pvalue` / `chi2_pvalue` behavior: rising p-values toward larger sizes
  suggest better fit at those sizes, but beware of the large-sample-size effect
  where p-values can become unreliable.
- **RMSE is the most reliable indicator** for stability: it shows smooth, monotonic
  decrease and clear convergence. Always include RMSE in your tests.
- Use `recommended_size` (based on primary metric, typically RMSE) as the
  recommended minimum sample size for robust parameter estimation.
- The generated figure shows:
  - Parameter convergence across sample sizes
  - Goodness-of-fit evolution (KS, Chi-square p-values, RMSE)
  - Red vertical lines marking detected stability points
  - Orange dashed lines showing smoothed curves (for Kneedle method)
  - Info boxes with metric values at stable points
- Compare stability points across different methods to understand metric behavior.

Method Comparison
-----------------

When analyzing your results, compare how different metrics stabilize:

.. code-block:: python

    stability = ds.attrs['stability_points']
    
    # RMSE typically stabilizes earliest and most reliably
    rmse_stable = stability['rmse']['size']
    rmse_method = stability['rmse']['method']
    
    # P-values may stabilize later or show erratic behavior
    ks_stable = stability['ks_pvalue']['size']
    
    print(f"RMSE stable at n = {rmse_stable} (method: {rmse_method})")
    print(f"KS p-value stable at n = {ks_stable}")
    
    # Use RMSE-based recommendation for production
    recommended = ds.attrs['recommended_size']
    print(f"\n⭐ Recommended sample size: {recommended}")

Further reading
---------------

See :doc:`/api/core` for function signature and options, and the example
notebook in this folder for a hands-on run.