Monte Carlo Stability Analysis
==============================

The Monte Carlo stability analysis is a powerful method to determine the minimum sample size needed for reliable parameter estimation and to assess how statistical tests behave with different sample sizes.

Overview
--------

The ``monte_carlo_fit()`` method performs repeated subsampling and distribution fitting to:

1. Identify the minimum sample size where parameters stabilize
2. Track how goodness-of-fit tests evolve with sample size
3. Detect the "large sample size effect" where p-values become unreliable
4. Provide empirical distributions for robust inference

Method Signature
----------------

.. code-block:: python

    def monte_carlo_fit(
        self,
        sizes: Optional[List[int]] = None,
        n_repeats: int = 20,
        tests: List[str] = ['ks'],
        stability_method: str = 'aggregate',
        fig_output_path: Optional[str] = None,
        plot_type: str = 'series',
        sampling: str = 'random',
        seed: Optional[int] = None,
        min_size: int = 50,
        max_size: Optional[int] = None,
        n_sizes: int = 10,
        distribution_params: Optional[Tuple] = None,
        **kwargs
    ) -> xr.Dataset

Parameters
----------

Sample Size Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

**sizes** : List[int], optional
    Explicit list of sample sizes to test. If not provided, sizes are automatically generated based on ``min_size``, ``max_size``, and ``n_sizes``.
    
    Example: ``[100, 200, 500, 1000, 2000]``

**min_size** : int, default=50
    Minimum sample size to test (used only if ``sizes`` is not provided).

**max_size** : int, optional
    Maximum sample size to test. Defaults to the size of the original dataset.

**n_sizes** : int, default=10
    Number of different sample sizes to test (used only if ``sizes`` is not provided).

Monte Carlo Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

**n_repeats** : int, default=20
    Number of independent subsamples to draw for each sample size. Higher values provide better statistical estimates but increase computation time.
    
    - For quick exploration: 10-20 repeats
    - For production analysis: 30-50 repeats
    - For publication-quality results: 50-100 repeats

**sampling** : str, default='random'
    Sampling strategy for creating subsamples. Options:
    
    - ``'random'``: Independent random draws without replacement (most common)
    - ``'bootstrap'``: Random draws with replacement (useful for uncertainty quantification)
    - ``'disjoint'``: Non-overlapping partitions (best for temporal/spatial data)

**seed** : int, optional
    Random seed for reproducibility. Always set this for reproducible analyses.

Goodness-of-Fit Tests
~~~~~~~~~~~~~~~~~~~~~

**tests** : List[str], default=['ks']
    List of goodness-of-fit tests to perform. Options:
    
    - ``'ks'``: Kolmogorov-Smirnov test (p-value based)
    - ``'chi2'``: Chi-square test (p-value based)
    - ``'rmse'``: Root Mean Square Error (distance metric)
    
    **⭐ Recommendation**: Always include ``'rmse'`` for stability detection, as it shows clearer convergence behavior than p-value based tests.
    
    Example: ``tests=['ks', 'chi2', 'rmse']``

Stability Detection
~~~~~~~~~~~~~~~~~~~

**stability_method** : str, default='kneedle'
    Method for detecting when parameters and tests have stabilized. Options:
    
    - ``'kneedle'``: Kneedle algorithm for elbow detection (default, best for RMSE)
    - ``'cv'``: Coefficient of Variation method (robust for erratic metrics)
    - ``'plateau'``: Relative gain heuristic (early "good enough" detection)
    - ``'aggregate'``: Legacy median-based method (still supported)
    - ``None``: No stability detection
    
    **⭐ Default: 'kneedle'** - Recommended for RMSE stability detection as it excels at finding the "point of diminishing returns" in smooth, converging metrics.

    See `Stability Detection Methods`_ for detailed explanations and selection guidance.

**Method-Specific Parameters (via kwargs):**

    For **CV method**:
    
    - ``window_size``: Number of consecutive sizes for validation (default: 25% of n_sizes)
    - ``cv_threshold``: Maximum allowed coefficient of variation (default: 0.1)
    
    For **Kneedle method**:
    
    - ``smooth``: Whether to smooth curves before detection (default: True)
    - ``smoothing_method``: 'savgol' (default) or 'spline'
    
    For **Plateau method**:
    
    - ``consecutive_points``: Number of consecutive points below tolerance (default: 3)
    - ``relative_tolerance``: Maximum relative change threshold (default: 0.01, i.e., 1%)

Visualization
~~~~~~~~~~~~~

**fig_output_path** : str, optional
    Path where the summary figure should be saved. If provided, automatically generates a 2×3 grid showing parameter and test evolution.
    
    Example: ``'monte_carlo_results.png'``

**plot_type** : str, default='series'
    Style of plots to generate:
    
    - ``'series'``: Line plots with median and IQR shading
    - ``'boxplots'``: Box plots showing full distribution per size

Advanced Parameters
~~~~~~~~~~~~~~~~~~~

**distribution_params** : Tuple, optional
    If provided, uses these fixed parameters instead of fitting. Useful for testing how well the method recovers known parameters.

**kwargs**
    Additional keyword arguments passed to the goodness-of-fit tests (e.g., ``bins='doane'`` for chi-square test).

Return Value
------------

Returns an ``xarray.Dataset`` containing:

**Dimensions**

- ``sizes``: Sample sizes tested (length = ``n_sizes``)
- ``repeats``: Repetition index (length = ``n_repeats``)

**Data Variables**

- ``param_0, param_1, ...``: Fitted distribution parameters for each (size, repeat) combination
- ``ks_statistic, ks_pvalue``: Kolmogorov-Smirnov test results (if ``'ks'`` in tests)
- ``chi2_statistic, chi2_pvalue``: Chi-square test results (if ``'chi2'`` in tests)
- ``rmse``: Root mean square error values (if ``'rmse'`` in tests)

**Attributes**

- ``distribution``: Name of the fitted distribution
- ``original_data_size``: Size of the original dataset
- ``sampling_method``: Sampling strategy used
- ``bins_method``: Binning method used for chi-square test
- ``stability_points``: Dictionary with detected stability information
- ``recommended_size``: Sample size where primary metric stabilizes (None if not detected)
- ``primary_metric``: The metric used for recommended_size (typically 'rmse')
- ``stable_pvalue_ks``: KS test p-value at stability point (if 'ks' in tests)
- ``stable_pvalue_chi2``: Chi-square p-value at stability point (if 'chi2' in tests)
- ``stable_rmse``: RMSE value at stability point (if 'rmse' in tests)
- ``n_repeats``: Number of repetitions performed
- ``min_size``: Minimum sample size tested
- ``max_size``: Maximum sample size tested
- ``n_sizes``: Number of sample sizes tested
- ``figure_path``: Path to saved figure (if ``fig_output_path`` was provided)
- ``created_by``: Identifier for the method

Stability Detection Methods
----------------------------

MagicA provides three advanced methods for detecting when parameters and goodness-of-fit metrics have stabilized. Each method has specific strengths depending on the metric behavior.

CV Method (Coefficient of Variation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**How it works:**

1. For each parameter/test, computes the **coefficient of variation** (CV = std/mean) across repeats at each sample size
2. Uses a **sliding window** (default: 25% of n_sizes) to check for stability
3. Checks if CV stays **below a threshold** (default: 0.1) for all sizes in the window
4. Returns the first size where this condition is met

**Mathematical formula:**

.. math::

    CV_n = \frac{\sigma_n}{\mu_n}

Where:
- :math:`\sigma_n` = standard deviation across repeats at size n
- :math:`\mu_n` = mean value across repeats at size n

**Advantages:**

- Robust to erratic metrics (especially p-values)
- Provides quantitative measure of variability
- Good for assessing precision of estimates
- Works well with any number of repeats

**Disadvantages:**

- May be conservative (detects stability later than other methods)
- Less sensitive to smooth convergence patterns

**Parameters:**

- ``window_size``: Number of consecutive sizes for validation (default: max(2, n_sizes // 4))
- ``cv_threshold``: Maximum allowed CV (default: 0.1)

**Result format:**

.. code-block:: python

    stability_points = {
        'rmse': {
            'size': 800,
            'index': 4,
            'cv_at_stability': 0.087,
            'smoothed_curve': None,
            'method': 'cv'
        }
    }

**When to use:** 
- When working with erratic p-values (KS, Chi-square)
- When you need quantitative variability assessment
- When conservative estimates are preferred

Kneedle Method (Elbow Detection)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**How it works:**

1. **Normalizes** the curve to [0, 1] range
2. Computes the **reference line** connecting first and last points
3. Calculates **perpendicular distance** from each point to the reference line
4. The **knee/elbow** is at the point with maximum distance
5. Optional: **smooths the curve** before detection using Savitzky-Golay filter or spline

**Mathematical formula:**

.. math::

    D_i = |y_i - (y_0 + (y_n - y_0) \cdot \frac{i}{n})|

Where:
- :math:`D_i` = distance from reference line at point i
- :math:`y_i` = normalized curve value at point i
- :math:`y_0, y_n` = first and last curve values

**Advantages:**

- **Optimal for RMSE**: Excels at finding "point of diminishing returns" in smooth curves
- Clear mathematical definition of "elbow"
- Automatic curve direction detection
- Smoothing reduces noise sensitivity
- Based on published algorithm (Satopää et al., 2011)

**Disadvantages:**

- Requires relatively smooth convergence
- May not work well with highly erratic metrics

**Parameters:**

- ``smooth``: Whether to smooth before detection (default: True, recommended)
- ``smoothing_method``: 'savgol' (default, preserves features) or 'spline'

**Result format:**

.. code-block:: python

    stability_points = {
        'rmse': {
            'size': 600,
            'index': 3,
            'cv_at_stability': None,
            'smoothed_curve': array([...]),  # If smoothing was used
            'method': 'kneedle'
        }
    }

**When to use:**
- **⭐ RECOMMENDED for RMSE stability detection**
- When metrics show smooth, monotonic convergence
- When you want to find the "sweet spot" of diminishing returns
- For production analyses requiring objective elbow detection

**Reference:**

Satopää, V., Albrecht, J., Irwin, D., & Raghavan, B. (2011). *Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior*. 2011 31st International Conference on Distributed Computing Systems Workshops, 166-171.

Plateau Method (Relative Gain Heuristic)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**How it works:**

1. Computes **relative change** between consecutive points:
   
   .. math::
   
       \\Delta_i = \\frac{|y_{i-1} - y_i|}{|y_{i-1}|}

2. Checks if relative change stays **below tolerance** for L consecutive points
3. Returns the first point where this "plateau" condition is met

**Mathematical formula:**

Stability detected when:

.. math::

    \Delta_i < \epsilon \text{ for } L \text{ consecutive points}

Where:
- :math:`\Delta_i` = relative change at point i
- :math:`\epsilon` = relative tolerance (default: 0.01, i.e., 1%)
- :math:`L` = consecutive points threshold (default: 3)

**Advantages:**

- Early detection of "good enough" stabilization
- Intuitive interpretation (% improvement)
- Fast - no smoothing required
- Good for early stopping in iterative analyses

**Disadvantages:**

- May detect premature plateaus
- Sensitive to parameter tuning
- Less robust to noise than other methods

**Parameters:**

- ``consecutive_points``: Number of consecutive points below tolerance (default: 3)
- ``relative_tolerance``: Maximum relative change threshold (default: 0.01)

**Result format:**

.. code-block:: python

    stability_points = {
        'rmse': {
            'size': 500,
            'index': 2,
            'cv_at_stability': None,
            'smoothed_curve': None,
            'method': 'plateau'
        }
    }

**When to use:**
- When computational budget is limited
- For early "good enough" detection
- When slight improvements beyond plateau are acceptable
- In iterative analyses where early stopping is beneficial

Method Selection Guide
~~~~~~~~~~~~~~~~~~~~~~

Choose the appropriate method based on your metric and goals:

+----------------+------------------+---------------------+-------------------------+
| Method         | Best For         | Strengths           | Typical Use Case        |
+================+==================+=====================+=========================+
| **Kneedle**    | RMSE, converging | Objective elbow     | **Production RMSE**     |
|                | parameters       | detection, smooth   | **analysis**            |
|                |                  | curves              | **(RECOMMENDED)**       |
+----------------+------------------+---------------------+-------------------------+
| **CV**         | P-values (KS,    | Robust to noise,    | P-value stability,      |
|                | Chi-square)      | quantifies          | conservative estimates  |
|                |                  | variability         |                         |
+----------------+------------------+---------------------+-------------------------+
| **Plateau**    | Early detection, | Fast, intuitive,    | Quick exploration,      |
|                | limited budget   | early stopping      | iterative tuning        |
+----------------+------------------+---------------------+-------------------------+
| **Aggregate**  | Legacy analyses  | Simple, robust      | Backward compatibility  |
|                |                  |                     |                         |
+----------------+------------------+---------------------+-------------------------+

**Recommended Workflow:**

.. code-block:: python

    # 1. Use Kneedle for RMSE (most reliable)
    results_rmse = processor.monte_carlo_fit(
        tests=['rmse'],
        stability_method='kneedle',
        smooth=True
    )
    
    # 2. Use CV for p-values (robust to noise)
    results_pvalues = processor.monte_carlo_fit(
        tests=['ks', 'chi2'],
        stability_method='cv',
        cv_threshold=0.15  # Can relax for p-values
    )
    
    # 3. Compare stability points
    rmse_stable = results_rmse.attrs['recommended_size']
    print(f"RMSE stable at n = {rmse_stable} (MOST RELIABLE)")

Interpreting Stability Points
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each stability point contains:

- ``size``: The sample size where stability is first detected (``None`` if not detected)
- ``index``: The index in the ``sizes`` list (``None`` if not detected)
- ``cv_at_stability``: Coefficient of variation at stability (only for ``'cv'`` method)
- ``smoothed_curve``: Smoothed curve data (only for ``'kneedle'`` method with smoothing)
- ``method``: Name of the detection method used

**Accessing stability information:**

.. code-block:: python

    # Get recommended size (based on primary metric, usually RMSE)
    recommended_n = results.attrs['recommended_size']
    primary_metric = results.attrs['primary_metric']
    
    if recommended_n:
        print(f"✓ {primary_metric.upper()} stable at n = {recommended_n}")
        
        # Get quality metrics at stable point
        if 'stable_rmse' in results.attrs:
            print(f"  RMSE at stable point: {results.attrs['stable_rmse']:.4f}")
        if 'stable_pvalue_ks' in results.attrs:
            print(f"  KS p-value at stable point: {results.attrs['stable_pvalue_ks']:.4f}")
    else:
        print("✗ No stability detected - try larger sample sizes")

**If stability is not detected:**

- The tested sample sizes are too small - increase ``max_size``
- The metric has high variability - increase ``n_repeats``
- The threshold is too strict - relax method-specific parameters
- The data distribution may not be appropriate

Recommended Practices
---------------------

Choosing Sample Sizes
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # For small to medium datasets (< 5000 samples)
    sizes = [50, 100, 200, 400, 600, 800]
    
    # For large datasets (> 10000 samples) - CPS method
    sizes = [100, 200, 400, 600, 800, 1000, 1500, 2000, 3000, 4000]

Using RMSE for Stability Detection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**⭐ Always include RMSE in your tests for stability detection:**

.. code-block:: python

    results = adjuster.monte_carlo_fit(
        sizes=[100, 200, 500, 1000, 2000],
        n_repeats=50,
        tests=['ks', 'chi2', 'rmse'],  # RMSE is crucial!
        sampling='random',
        seed=42
    )

**Why RMSE is recommended:**

1. **Monotonic decrease**: RMSE decreases smoothly as sample size increases
2. **Clear convergence**: Stabilization point is visually obvious
3. **Robust**: Less affected by sampling variability than p-values
4. **Direct measure**: Measures actual fit quality, not hypothesis test power
5. **No inflation**: Unlike p-values, doesn't become artificially small with large samples

P-values (KS, Chi-square) can be erratic and are subject to the "large sample size effect" where they become very small even for good fits when the sample is large.

Accessing Stability Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Get recommended size (based on primary metric, typically RMSE)
    recommended_n = results.attrs['recommended_size']
    primary_metric = results.attrs['primary_metric']
    
    if recommended_n is not None:
        print(f"✓ {primary_metric.upper()} stabilizes at n = {recommended_n}")
        print(f"  This is the recommended minimum sample size")
        
        # Get quality metrics at the stable point
        print(f"\nMetrics at stable point (n = {recommended_n}):")
        if 'stable_rmse' in results.attrs:
            print(f"  RMSE: {results.attrs['stable_rmse']:.4f}")
        if 'stable_pvalue_ks' in results.attrs:
            print(f"  KS p-value: {results.attrs['stable_pvalue_ks']:.4f}")
        if 'stable_pvalue_chi2' in results.attrs:
            print(f"  Chi-square p-value: {results.attrs['stable_pvalue_chi2']:.4f}")
    else:
        print("✗ No clear stability detected - try larger sample sizes or more repeats")
    
    # Access individual stability points for all metrics
    stability = results.attrs['stability_points']
    for metric, info in stability.items():
        size = info['size']
        method = info['method']
        if size:
            print(f"{metric}: stable at n = {size} (method: {method})")
            if info['cv_at_stability'] is not None:
                print(f"  CV at stability: {info['cv_at_stability']:.3f}")

Complete Example
----------------

.. code-block:: python

    import numpy as np
    import magica as ma
    
    # Load data
    data = np.random.weibull(2, 10000) * 8 + 2
    processor = ma.read_data(data)
    processor.fit_distribution('weibull_min')
    
    # Run comprehensive Monte Carlo analysis with Kneedle method
    results = processor.monte_carlo_fit(
        sizes=[100, 200, 400, 600, 800, 1000, 1500, 2000, 3000, 4000],
        n_repeats=50,
        tests=['ks', 'chi2', 'rmse'],  # Include RMSE!
        stability_method='kneedle',     # Use Kneedle for RMSE
        smooth=True,                    # Smooth curves before detection
        sampling='random',
        seed=42,
        fig_output_path='monte_carlo_analysis.png'
    )
    
    # Check recommended size (based on RMSE with Kneedle)
    recommended_n = results.attrs['recommended_size']
    primary_metric = results.attrs['primary_metric']
    
    print(f"⭐ {primary_metric.upper()} stabilizes at n = {recommended_n} (RECOMMENDED)")
    print(f"   RMSE at stable point: {results.attrs['stable_rmse']:.4f}")
    print(f"   KS p-value at stable point: {results.attrs['stable_pvalue_ks']:.4f}")
    
    # Check all stability points
    stability = results.attrs['stability_points']
    print("\nAll stability points:")
    for metric, info in stability.items():
        size = info['size']
        method = info['method']
        if size:
            print(f"   {metric}: n = {size} (method: {method})")
        else:
            print(f"   {metric}: no clear stability detected")
    
    # Use the recommended size for robust inference
    print(f"\nRecommended subsample size for production: {recommended_n}")

Sampling Strategies
-------------------

Random Sampling (default)
~~~~~~~~~~~~~~~~~~~~~~~~~~

Each subsample is drawn **independently without replacement** from the original data.

**When to use:**

- General purpose analysis
- Independent observations
- Standard stability analysis

**Example:**

.. code-block:: python

    results = processor.monte_carlo_fit(
        sampling='random',
        seed=42
    )

Bootstrap Sampling
~~~~~~~~~~~~~~~~~~

Each subsample is drawn **with replacement**, allowing the same observation to appear multiple times.

**When to use:**

- Uncertainty quantification
- Confidence interval estimation
- When subsample size > original data size

**Example:**

.. code-block:: python

    results = processor.monte_carlo_fit(
        sampling='bootstrap',
        seed=42
    )

Disjoint Sampling
~~~~~~~~~~~~~~~~~

Data is divided into **non-overlapping partitions**. Each observation is used exactly once per sample size.

**When to use:**

- Time series data
- Spatially correlated data
- When you want lower variance between repeats
- More representative of data structure

**Example:**

.. code-block:: python

    results = processor.monte_carlo_fit(
        sampling='disjoint',
        seed=42
    )

Large Sample Size Effect
-------------------------

For very large datasets (>10,000 samples), statistical tests become extremely powerful, causing:

- P-values become very small even for negligible deviations
- Difficulty distinguishing statistical from practical significance
- Over-rejection of perfectly adequate distributions

**Solution**: Use the Monte Carlo CPS (Coefficient/P-value/Sample size) method to find the optimal subsample size where:

1. Parameters have converged (stable estimates)
2. P-values haven't yet inflated (interpretable tests)
3. RMSE has stabilized (good fit quality)

See the :doc:`/tutorials/magic_adjuster_tutorial` for a complete demonstration.

See Also
--------

- :doc:`/tutorials/magic_adjuster_tutorial`: Complete tutorial with real data examples
- :doc:`core`: API reference for MagicAdjuster class
- :doc:`/tutorials/auto_fitter_tutorial`: Automatic distribution selection