OGModel

The main class for the Overfit-to-Generalization framework.

Class Definition

og_learn.framework.OGModel

Bases: BaseEstimator, RegressorMixin

Overfit-to-Generalization Model

Combines a high-variance (HV) model for pseudo-label generation with a low-variance (LV) model for generalization, using density-aware sampling.

The OG framework works by:

1. Training the HV model (e.g., LightGBM) on the original data.
2. Each epoch of LV training:
    - Sample data with density-aware weighting (prioritize sparse regions)
    - Add noise to the features (oscillation)
    - Generate pseudo-labels from the HV model
    - Train the LV model on the pseudo-labels
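The per-epoch steps above can be sketched with NumPy stand-ins. Everything here is illustrative, not the library's internals: `hv_predict` plays the role of the fitted HV model's `predict()`, and the `density ** (-alpha)` weighting is an assumed form of the density-aware sampling.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data and a stand-in for the fitted HV model (assumptions for illustration).
X = rng.normal(size=(200, 3))
density = np.exp(-np.abs(X[:, 0]))                     # denser near x0 = 0
hv_predict = lambda X: X @ np.array([1.0, -2.0, 0.5])  # plays the HV model's predict()

oscillation, sampling_alpha = 0.05, 0.1

# 1. Density-aware sampling: up-weight sparse regions (assumed weighting form).
weights = density ** (-sampling_alpha)
weights /= weights.sum()
idx = rng.choice(len(X), size=len(X), replace=True, p=weights)

# 2. Oscillation: perturb the sampled features with small Gaussian noise.
X_epoch = X[idx] + rng.normal(scale=oscillation, size=X[idx].shape)

# 3. Pseudo-labels from the HV model at the perturbed points.
y_pseudo = hv_predict(X_epoch)

# 4. The LV model would now be trained on (X_epoch, y_pseudo) for this epoch.
print(X_epoch.shape, y_pseudo.shape)
```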

Parameters

hv : str or object, default='lightgbm'
    High-variance model. Can be:
    - str: 'lightgbm', 'xgboost', 'catboost' (uses preset config)
    - object: custom model with fit/predict interface

lv : str or object, default='mlp'
    Low-variance model. Can be:
    - str: 'mlp', 'resnet', 'transformer' (uses preset config)
    - object: custom model with fit/predict interface

oscillation : float, default=0.05
    Noise injection level for pseudo-labels (controls regularization).

sampling_alpha : float, default=0.1
    Weight for density-aware sampling (0 = uniform; higher = more sparse-focused).

epochs : int, default=100
    Number of training epochs for the LV model.

early_stopping : bool, default=True
    Whether to use early stopping.

patience : int, default=5
    Early stopping patience (epochs without improvement).

verbose : bool, default=True
    Whether to print training progress.
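To make `sampling_alpha` concrete, here is a small numeric illustration, assuming sampling weights of the form `w_i ∝ density_i ** (-alpha)` (an assumption for this sketch, not necessarily the library's exact formula): `alpha=0` reduces to uniform sampling, while larger values concentrate probability on sparse points.

```python
import numpy as np

def sampling_weights(density, alpha):
    # Assumed weighting form: sparse regions (low density) get larger weights.
    w = density ** (-alpha)
    return w / w.sum()

density = np.array([4.0, 1.0, 0.25])   # dense, medium, and sparse sample points

print(sampling_weights(density, 0.0))  # alpha=0: uniform sampling
print(sampling_weights(density, 1.0))  # alpha=1: the sparse point dominates
```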

Examples

Preset OG

```python
og = OGModel(hv='lightgbm', lv='mlp')
og.fit(X_train, y_train, density=density_values)
predictions = og.predict(X_test)
```

Custom OG

```python
from lightgbm import LGBMRegressor

custom_hv = LGBMRegressor(n_estimators=500, num_leaves=300)
og = OGModel(hv=custom_hv, lv='mlp', oscillation=0.03)
og.fit(X_train, y_train, density=density_values)
```

```python
__init__(hv='lightgbm', lv='mlp', oscillation=0.05, sampling_alpha=0.1,
         epochs=100, early_stopping=True, patience=5, eval_every_epochs=5,
         verbose=True, seed=42, tensorboard_dir=None, tensorboard_name=None)
```

```python
fit(X, y, density=None, X_valid=None, y_valid=None)
```

Fit the OG model.

Parameters

X : array-like of shape (n_samples, n_features)
    Training features.

y : array-like of shape (n_samples,)
    Training targets.

density : array-like of shape (n_samples,), optional
    Data density at each sample location. Higher values = denser areas.
    If None, uniform sampling is used (no density-aware weighting).

X_valid : array-like, optional
    Validation features for early stopping.

y_valid : array-like, optional
    Validation targets for early stopping.

Returns

self
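If precomputed density values are not available, a simple proxy can be derived from nearest-neighbour distances. This NumPy-only sketch is one illustrative option; the library's own `calculate_density` helper may use a different method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

# Crude density proxy: inverse of the mean distance to the k nearest
# neighbours (larger value = denser neighbourhood).
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
dists.sort(axis=1)
k = 10
density = 1.0 / dists[:, 1:k + 1].mean(axis=1)  # skip column 0 (self-distance)
print(density.shape)
```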

```python
predict(X)
```

Predict using the fitted LV model.

Parameters

X : array-like of shape (n_samples, n_features)
    Features to predict.

Returns

y_pred : ndarray of shape (n_samples,)
    Predicted values.


Constructor

```python
OGModel(
    hv='lightgbm',
    lv='mlp',
    oscillation=0.05,
    sampling_alpha=0.1,
    epochs=100,
    early_stopping=True,
    patience=10,
    seed=42,
    verbose=True,
    tensorboard_dir=None,
    tensorboard_name=None,
    eval_every_epochs=10
)
```

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `hv` | str or model | `'lightgbm'` | High-variance model. Can be a preset name or a model instance with `fit()`/`predict()` methods |
| `lv` | str or model | `'mlp'` | Low-variance model. Can be a preset name or a model instance |
| `oscillation` | float | 0.05 | Noise injection strength for pseudo-label generation |
| `sampling_alpha` | float | 0.1 | Exponent for density-aware sampling weights |
| `epochs` | int | 100 | Number of training epochs for the LV model |
| `early_stopping` | bool | True | Whether to use early stopping |
| `patience` | int | 10 | Early stopping patience (epochs without improvement) |
| `seed` | int | 42 | Random seed for reproducibility |
| `verbose` | bool | True | Whether to print training progress |
| `tensorboard_dir` | str | None | Directory for TensorBoard logs |
| `tensorboard_name` | str | None | Name for this run in TensorBoard |
| `eval_every_epochs` | int | 10 | Frequency of evaluation/logging |

Methods

fit

```python
model.fit(X, y, density=None, X_valid=None, y_valid=None, epochs=None)
```

Train the OG model.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `X` | array-like | Training features, shape (n_samples, n_features) |
| `y` | array-like | Training target, shape (n_samples,) |
| `density` | array-like | Spatial density for each sample, shape (n_samples,) |
| `X_valid` | array-like | Validation features (optional) |
| `y_valid` | array-like | Validation target (optional) |
| `epochs` | int | Override `epochs` from the constructor |

Returns: self

Example:

```python
model = OGModel(hv='lightgbm', lv='mlp')
model.fit(
    X_train, y_train,
    density=density_train,
    X_valid=X_valid,
    y_valid=y_valid,
    epochs=100
)
```

predict

```python
predictions = model.predict(X)
```

Make predictions.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `X` | array-like | Features, shape (n_samples, n_features) |

Returns: numpy.ndarray - Predictions, shape (n_samples,)

Example:

```python
predictions = model.predict(X_test)
```

Attributes

| Attribute | Type | Description |
|---|---|---|
| `_hv_model` | object | Fitted HV model instance |
| `_lv_model` | object | Fitted LV model instance |
| `hv_name` | str | Name of HV model |
| `lv_name` | str | Name of LV model |

Complete Example

```python
from og_learn import OGModel, calculate_density
from sklearn.metrics import r2_score

# Prepare data
density = calculate_density(X_train[:, 0], X_train[:, 1])

# Create and train model
model = OGModel(
    hv='lightgbm',
    lv='resnet',
    oscillation=0.05,
    sampling_alpha=0.1,
    epochs=100,
    early_stopping=True,
    patience=15,
    seed=42,
    tensorboard_dir='runs/og_resnet'
)

model.fit(
    X_train, y_train,
    density=density,
    X_valid=X_valid,
    y_valid=y_valid
)

# Evaluate
predictions = model.predict(X_test)
print(f"Test R²: {r2_score(y_test, predictions):.4f}")
```