Skip to content

Model Comparison

Compare different baseline methods for predicting spatial accuracy patterns.

Evaluation Scenarios

Scenario Split Method Question Answered
Unseen Spatial spatial How well predict accuracy at new locations?
Unseen Sampling sampling How well predict accuracy for new density levels?

Models Compared

Category Model Description
Traditional ML linear Linear Regression
svm Support Vector Regression (RBF)
lightgbm Gradient Boosting
GAM gam_monotonic Monotonic GAM with interaction
Interpolation interpolation IDW (Inverse Distance Weighting)
Two-Stage two_stage GAM (density) + SVM (spatial residual)

Usage

from geoequity.evaluation import eval_baseline_comparison

# Define models to compare
model_list = ['linear', 'lightgbm', 'svm', 'gam_monotonic', 'interpolation', 'two_stage']

# Scenario 1: Unseen Spatial
report_spatial = eval_baseline_comparison(
    df_analysis,
    model_list=model_list,
    density_bins=30,
    split_method='spatial',
    train_by='grid',
    evaluate_by='grid',
    metric='correlation',
    full_features='Spatial'
)

# Scenario 2: Unseen Sampling
report_sampling = eval_baseline_comparison(
    df_analysis,
    model_list=model_list,
    density_bins=30,
    split_method='sampling',
    train_by='grid',
    evaluate_by='sampling',
    metric='correlation',
    full_features='Spatial'
)

Visualization

import matplotlib.pyplot as plt
import numpy as np

def plot_comparison(report_spatial, report_sampling, model_list):
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    pred_name = list(report_spatial.keys())[0]
    spatial_scores = [report_spatial[pred_name].get(m, 0) for m in model_list]
    sampling_scores = [report_sampling[pred_name].get(m, 0) for m in model_list]

    x = np.arange(len(model_list))
    colors = ['#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7', '#DDA0DD', '#FF6B6B']

    axes[0].bar(x, spatial_scores, color=colors)
    axes[0].set_title('Unseen Spatial')
    axes[0].set_xticks(x)
    axes[0].set_xticklabels(model_list, rotation=45)

    axes[1].bar(x, sampling_scores, color=colors)
    axes[1].set_title('Unseen Sampling')
    axes[1].set_xticks(x)
    axes[1].set_xticklabels(model_list, rotation=45)

    plt.tight_layout()
    plt.show()

plot_comparison(report_spatial, report_sampling, model_list)

Key Insights

Unseen Spatial (New Locations)

  • Interpolation methods (IDW) leverage spatial autocorrelation
  • TwoStageModel captures both global density effect and local patterns
  • Traditional ML often struggles without spatial structure

Unseen Sampling (New Density Levels)

  • GAM excels at capturing density→accuracy relationship
  • TwoStageModel combines density modeling with spatial residuals
  • Linear/SVM fail to generalize to unseen density ranges

TwoStageModel Advantage

Decomposes the problem into interpretable components: - Stage 1: Global density effect (monotonic relationship) - Stage 2: Location-specific residuals (spatial patterns)

This decomposition provides: 1. Better generalization to new conditions 2. Interpretable insights about accuracy drivers 3. Separate modeling of different sources of variation