Core Module
===========

The core module contains the primary classes for statistical data processing and distribution fitting.

.. automodule:: magica.core
   :members:
   :undoc-members:
   :show-inheritance:

DataProcessor
-------------

The `DataProcessor` class handles data loading and validation, and provides access to distribution fitting capabilities.

.. autoclass:: magica.core.DataProcessor
   :members:
   :undoc-members:
   :show-inheritance:

Automatic Distribution Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DataProcessor provides the `get_auto_fitter()` method to create an AutoFitter instance for automatic distribution selection:

.. code-block:: python

    import numpy as np
    import magica as ma
    
    # Load data
    data = np.random.weibull(2, 1000)
    processor = ma.read_data(data)
    
    # Get AutoFitter for automatic distribution selection
    auto_fitter = processor.get_auto_fitter(criterion='rmse')
    
    # Find best distribution
    best = auto_fitter.fit_best_distribution()
    print(f"Best distribution: {best['distribution']}")

For complete AutoFitter documentation, see :doc:`auto_fitter`.
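
The selection idea can be sketched with SciPy alone. This is not the AutoFitter implementation: the helper name `rank_by_rmse`, the candidate list, the fixed location parameter, and the definition of RMSE (empirical versus fitted CDF) are all assumptions made for illustration.

.. code-block:: python

    import numpy as np
    from scipy import stats

    def rank_by_rmse(data, candidates=("weibull_min", "gamma", "lognorm")):
        """Fit each candidate and rank by RMSE between empirical and fitted CDFs."""
        x = np.sort(np.asarray(data, dtype=float))
        # Empirical CDF at the sorted sample points (plotting position i / (n + 1))
        ecdf = np.arange(1, x.size + 1) / (x.size + 1)
        scores = {}
        for name in candidates:
            dist = getattr(stats, name)
            # Assumption: positive data, so the location is fixed at 0
            params = dist.fit(x, floc=0)
            scores[name] = float(np.sqrt(np.mean((dist.cdf(x, *params) - ecdf) ** 2)))
        # Lowest RMSE first
        return sorted(scores.items(), key=lambda kv: kv[1])

    rng = np.random.default_rng(0)
    sample = rng.weibull(2, 1000)
    ranking = rank_by_rmse(sample)
    best_name, best_rmse = ranking[0]
    print(f"Best distribution: {best_name} (RMSE={best_rmse:.4f})")

Ranking by a single criterion keeps the comparison simple, but the ordering can change with the criterion chosen, so it is worth inspecting the runner-up scores as well.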

Extreme Value Analysis
~~~~~~~~~~~~~~~~~~~~~~~

DataProcessor also provides the `get_extremes_analyzer()` method to create an ExtremesAnalyzer instance for return period and return value analysis:

.. code-block:: python

    import numpy as np
    import pandas as pd
    import magica as ma

    # Create time series with datetime index
    dates = pd.date_range('1980-01-01', '2023-12-31', freq='D')
    # Synthetic placeholder values; replace with your own observations
    wind_speeds = np.random.weibull(2, len(dates)) * 10
    series = pd.Series(wind_speeds, index=dates)
    
    # Load data and create extremes analyzer
    processor = ma.read_data(series)
    extremes = processor.get_extremes_analyzer(time_unit='years')
    
    # Fit GEV distribution and calculate return values
    extremes.fit_distribution('genextreme')
    rv_100 = extremes.return_value(100)  # 100-year return value
    print(f"100-year return value: {rv_100:.2f}")

For complete ExtremesAnalyzer documentation, see :doc:`extremes`.
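
For intuition, a return value can be computed directly with SciPy: for a distribution fitted to annual maxima, the T-year return value is the quantile at non-exceedance probability 1 - 1/T. The sketch below uses synthetic data and a block-maxima approach; it is an assumption about the general technique, not a description of how ExtremesAnalyzer works internally.

.. code-block:: python

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Synthetic daily values for 44 "years" of 365 days (placeholder data)
    daily = rng.weibull(2, size=(44, 365)) * 10.0
    annual_max = daily.max(axis=1)  # block maxima: one value per year

    # Fit a GEV to the annual maxima. Note that SciPy's shape parameter c
    # uses the opposite sign of the shape parameter in many other sources.
    c, loc, scale = stats.genextreme.fit(annual_max)

    def return_value(period_years):
        """Level exceeded on average once every `period_years` years."""
        return stats.genextreme.ppf(1.0 - 1.0 / period_years, c, loc=loc, scale=scale)

    rv_10, rv_100 = return_value(10), return_value(100)
    print(f"10-year: {rv_10:.2f}, 100-year: {rv_100:.2f}")

With only a few dozen annual maxima, estimates for long return periods carry wide uncertainty, so they are best reported together with confidence intervals.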

MagicAdjuster
---------------

The `MagicAdjuster` class is the central component for statistical distribution fitting and Monte Carlo stability analysis.

.. autoclass:: magica.core.MagicAdjuster
   :members:
   :undoc-members:
   :show-inheritance:

Monte Carlo Fitting
~~~~~~~~~~~~~~~~~~~

The `monte_carlo_fit` method performs stability analysis to determine minimum sample sizes for reliable parameter estimation.

.. note::
   For complete documentation including all parameters, stability detection methods, and best practices, see :doc:`monte_carlo`.

**Quick Example:**

.. code-block:: python

    import numpy as np
    from magica.core import MagicAdjuster
    
    # Generate sample data
    data = np.random.weibull(2, 1000)
    
    # Create adjuster and fit distribution
    adjuster = MagicAdjuster(data)
    adjuster.fit_distribution('weibull_min')
    
    # Run Monte Carlo stability analysis
    results = adjuster.monte_carlo_fit(
        sizes=[100, 200, 500, 1000],
        n_repeats=30,
        tests=['ks', 'chi2', 'rmse'],  # Always include RMSE!
        sampling='random',
        seed=42,
        fig_output_path='stability.png'
    )
    
    # Check RMSE stability (most reliable indicator)
    rmse_size = results.attrs['stability_points']['rmse']['size']
    print(f"Recommended minimum sample size: {rmse_size}")

**Return Value:**

An `xarray.Dataset` with:

- **Dimensions**:

  - `sizes`: Sample sizes tested
  - `repeats`: Repetition index for each size

- **Data Variables**:

  - `param_0`, `param_1`, ...: Fitted distribution parameters
  - `ks_statistic`, `ks_pvalue`: Kolmogorov-Smirnov test results (if requested)
  - `chi2_statistic`, `chi2_pvalue`: Chi-square test results (if requested)
  - `rmse`: Root mean square error values (if requested)

- **Attributes**:

  - `distribution`: Distribution name
  - `original_data_size`: Size of original dataset
  - `sampling_method`: Sampling strategy used
  - `stability_points`: Detected stability points for each variable (dict with 'size', 'index', 'cv_at_stability')
  - `figure_path`: Path to saved figure (if generated)

.. tip::
   **Always include 'rmse' in your tests.** It tends to give the most reliable stability detection because RMSE converges smoothly as the sample size grows, whereas p-values can fluctuate erratically.

For complete parameter documentation, stability methods, and best practices, see :doc:`monte_carlo`.
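
The idea behind the stability analysis can be illustrated with a small loop written against SciPy. This is a simplified sketch, not the `monte_carlo_fit` implementation: the sizes, the use of the fitted shape parameter, and the 5% coefficient-of-variation threshold are assumptions chosen for the example.

.. code-block:: python

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    data = rng.weibull(2, 1000)

    sizes = [50, 100, 200, 400]
    n_repeats = 10
    cv_by_size = {}
    for size in sizes:
        shapes = []
        for _ in range(n_repeats):
            # Random subsample without replacement
            sub = rng.choice(data, size=size, replace=False)
            shape, _loc, _scale = stats.weibull_min.fit(sub, floc=0)
            shapes.append(shape)
        shapes = np.asarray(shapes)
        # Coefficient of variation of the fitted shape across repeats
        cv_by_size[size] = float(shapes.std(ddof=1) / shapes.mean())

    # Hypothetical stability rule: first size whose CV drops below a threshold
    threshold = 0.05
    stable = next((s for s in sizes if cv_by_size[s] < threshold), None)
    print(cv_by_size, stable)

The spread of the fitted parameter shrinks as the subsample grows; a stability point is the size beyond which further data changes the estimate only marginally.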

Goodness-of-Fit Testing
~~~~~~~~~~~~~~~~~~~~~~~

`MagicAdjuster` supports three goodness-of-fit measures:

**Chi-Square Test:**
Tests the hypothesis that the data follow the fitted distribution by comparing observed and expected histogram bin counts.

**Kolmogorov-Smirnov Test:**
Non-parametric test based on the largest difference between the empirical and theoretical cumulative distribution functions.

**Root Mean Square Error (RMSE):**
Measures the root-mean-square deviation between the empirical and fitted distributions; lower values indicate a closer fit.
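
The three measures can be reproduced approximately with SciPy. The sketch below is illustrative only: the equal-probability binning, the choice of 10 bins, and the CDF-based RMSE are assumptions and may differ from what `MagicAdjuster` computes.

.. code-block:: python

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.weibull(2, 1000)
    params = stats.weibull_min.fit(data, floc=0)
    dist = stats.weibull_min(*params)

    # Kolmogorov-Smirnov: largest gap between empirical and fitted CDFs
    ks_stat, ks_p = stats.kstest(data, dist.cdf)

    # Chi-square: observed vs expected counts in equal-probability bins
    n_bins = 10
    interior_edges = dist.ppf(np.linspace(0, 1, n_bins + 1)[1:-1])
    observed = np.bincount(np.searchsorted(interior_edges, data), minlength=n_bins)
    expected = np.full(n_bins, data.size / n_bins)
    # ddof=2 accounts for the two fitted parameters (shape and scale)
    chi2_stat, chi2_p = stats.chisquare(observed, expected, ddof=2)

    # RMSE between the empirical CDF and the fitted CDF at the sample points
    x = np.sort(data)
    ecdf = np.arange(1, x.size + 1) / (x.size + 1)
    rmse = float(np.sqrt(np.mean((dist.cdf(x) - ecdf) ** 2)))

    print(f"KS={ks_stat:.3f} (p={ks_p:.3f}), chi2={chi2_stat:.2f} (p={chi2_p:.3f}), RMSE={rmse:.4f}")

Because the parameters are estimated from the same data being tested, the Kolmogorov-Smirnov p-value here is optimistic and should be read as a rough indicator rather than an exact significance level.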

Utility Methods
~~~~~~~~~~~~~~~

**Binning Strategies:**

The class supports multiple binning strategies for histogram-based tests:

- `_calculate_sturges_bins()`: Sturges' rule (log-based)
- `_calculate_rice_bins()`: Rice rule (cube root)
- `_calculate_freedman_diaconis_bins()`: Freedman-Diaconis rule (IQR-based)
- `_calculate_scott_bins()`: Scott's rule (standard deviation-based)
- `_calculate_doane_bins()`: Doane's rule (skewness-adjusted)
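
NumPy ships estimators for the same five rules, which makes it easy to see how much the resulting bin counts differ on a given sample. The sketch below uses `numpy.histogram_bin_edges` rather than the private methods listed above, whose exact formulas may differ in detail.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.weibull(2, 1000)

    # NumPy's named estimators for histogram bin edges
    rules = {
        "Sturges": "sturges",
        "Rice": "rice",
        "Freedman-Diaconis": "fd",
        "Scott": "scott",
        "Doane": "doane",
    }
    bin_counts = {
        label: len(np.histogram_bin_edges(data, bins=key)) - 1
        for label, key in rules.items()
    }

    # Sturges' rule written out by hand: k = ceil(log2(n)) + 1
    sturges_manual = int(np.ceil(np.log2(data.size))) + 1

    for label, count in bin_counts.items():
        print(f"{label}: {count} bins")
    print(f"Sturges (manual): {sturges_manual} bins")

Sturges and Rice depend only on the sample size, while the other three also respond to the spread or skewness of the data, so they can give noticeably different counts for heavy-tailed samples.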

**Subsampling:**

- `_generate_subsample_indices()`: Creates index lists for different sampling strategies

Sampling Strategies
-------------------

For a longer, tutorial-style discussion of the sampling strategies (`random`,
`bootstrap`, and `disjoint`) and practical advice on when to use each, see the
Monte Carlo tutorial: :doc:`/tutorials/monte_carlo`.
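
As a rough illustration, the three strategies can be expressed as index generators with NumPy. The function `subsample_indices` is hypothetical, and the semantics shown (random draws without replacement, bootstrap draws with replacement, disjoint draws that never share an observation across repeats) are an assumed reading of the strategy names; the tutorial remains the authoritative description.

.. code-block:: python

    import numpy as np

    def subsample_indices(n_total, size, n_repeats, sampling="random", seed=None):
        """Return a list of index arrays, one per repeat (hypothetical helper)."""
        rng = np.random.default_rng(seed)
        if sampling == "random":
            # Each repeat draws without replacement; repeats may overlap
            return [rng.choice(n_total, size=size, replace=False) for _ in range(n_repeats)]
        if sampling == "bootstrap":
            # Each repeat draws with replacement; indices may repeat within a draw
            return [rng.choice(n_total, size=size, replace=True) for _ in range(n_repeats)]
        if sampling == "disjoint":
            # Repeats are non-overlapping slices of one shuffled index array
            if size * n_repeats > n_total:
                raise ValueError("not enough data for disjoint subsamples")
            perm = rng.permutation(n_total)
            return [perm[i * size:(i + 1) * size] for i in range(n_repeats)]
        raise ValueError(f"unknown sampling strategy: {sampling!r}")

    random_idx = subsample_indices(1000, 100, 5, sampling="random", seed=42)
    bootstrap_idx = subsample_indices(1000, 100, 5, sampling="bootstrap", seed=42)
    disjoint_idx = subsample_indices(1000, 100, 5, sampling="disjoint", seed=42)

Under this reading, disjoint sampling limits the number of repeats to the original size divided by the subsample size, which is why it becomes impractical for large subsamples.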