OGModel¶
The main class for the Overfit-to-Generalization framework.
Class Definition¶
og_learn.framework.OGModel
¶
Bases: BaseEstimator, RegressorMixin
Overfit-to-Generalization Model
Combines a high-variance (HV) model for pseudo-label generation with a low-variance (LV) model for generalization, using density-aware sampling.
The OG framework works by:

1. Training the HV model (e.g., LightGBM) on the original data
2. In each epoch of LV training:
    - Sampling data with density-aware weighting (prioritizing sparse regions)
    - Adding noise to features (oscillation)
    - Generating pseudo-labels from the HV model
    - Training the LV model on the pseudo-labels
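The per-epoch loop above can be sketched with NumPy alone. This is an illustrative stand-in, not og_learn internals: the HV model is mimicked by a flexible polynomial fit, and the sampling weights are assumed to scale as `density ** (-sampling_alpha)` (the library's exact scheme may differ).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1-D regression data with uneven coverage of the input space.
X = rng.uniform(0.0, 6.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

# Stand-in HV model: a deliberately flexible polynomial fit
# (the real framework would use e.g. LightGBM).
coeffs = np.polyfit(X[:, 0], y, deg=9)

def hv_predict(X_new):
    return np.polyval(coeffs, X_new[:, 0])

# Density-aware sampling weights: sparse regions get larger weight.
counts, edges = np.histogram(X[:, 0], bins=20)
bin_idx = np.clip(np.searchsorted(edges, X[:, 0]) - 1, 0, 19)
density = counts[bin_idx].astype(float)
sampling_alpha = 0.1
weights = density ** (-sampling_alpha)
weights /= weights.sum()

# One LV "epoch": resample, oscillate features, pseudo-label via HV.
oscillation = 0.05
idx = rng.choice(len(X), size=len(X), p=weights)
X_batch = X[idx] + rng.normal(0.0, oscillation * X.std(), size=X[idx].shape)
pseudo_labels = hv_predict(X_batch)
# ...the LV model (e.g. an MLP) would now take a training step on
# (X_batch, pseudo_labels).
```

Because the LV model never sees the original labels directly, each epoch trains on a freshly perturbed, density-rebalanced view of the HV model's function.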
Parameters¶
hv : str or object
    High-variance model. Can be:

    - str: `'lightgbm'`, `'xgboost'`, `'catboost'` (uses preset config)
    - object: Custom model with fit/predict interface

lv : str or object
    Low-variance model. Can be:

    - str: `'mlp'`, `'resnet'`, `'transformer'` (uses preset config)
    - object: Custom model with fit/predict interface

oscillation : float, default=0.05
    Noise injection level for pseudo-labels (controls regularization)

sampling_alpha : float, default=0.1
    Weight for density-aware sampling (0 = uniform; higher = more focus on sparse regions)

epochs : int, default=100
    Number of training epochs for the LV model

early_stopping : bool, default=True
    Whether to use early stopping

patience : int, default=5
    Early stopping patience (epochs without improvement)

verbose : bool, default=True
    Whether to print training progress
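Any object exposing `fit(X, y)` and `predict(X)` can be passed as `hv` or `lv`. A minimal sketch of such a custom model (the `MeanRegressor` class here is illustrative, not part of og_learn):

```python
import numpy as np

class MeanRegressor:
    """Toy regressor satisfying the fit/predict interface OGModel expects."""

    def fit(self, X, y):
        # Memorise the mean target; a real model would learn from X.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

toy = MeanRegressor().fit(np.zeros((4, 2)), np.array([1.0, 2.0, 3.0, 4.0]))
preds = toy.predict(np.zeros((3, 2)))
```

An instance like this could then be supplied in place of a preset name, e.g. as the `hv` or `lv` argument.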
Examples¶
Preset OG¶
```python
og = OGModel(hv='lightgbm', lv='mlp')
og.fit(X_train, y_train, density=density_values)
predictions = og.predict(X_test)
```
Custom OG¶
```python
from lightgbm import LGBMRegressor

custom_hv = LGBMRegressor(n_estimators=500, num_leaves=300)
og = OGModel(hv=custom_hv, lv='mlp', oscillation=0.03)
og.fit(X_train, y_train, density=density_values)
```
__init__(hv='lightgbm', lv='mlp', oscillation=0.05, sampling_alpha=0.1, epochs=100, early_stopping=False, patience=5, eval_every_epochs=5, verbose=True, seed=42, tensorboard_dir=None, tensorboard_name=None)
¶
fit(X, y, density=None, X_valid=None, y_valid=None)
¶
Fit the OG model.
Parameters¶
X : array-like of shape (n_samples, n_features)
    Training features

y : array-like of shape (n_samples,)
    Training targets

density : array-like of shape (n_samples,), optional
    Data density at each sample location. Higher values = denser areas. If None, uniform sampling is used (no density-aware weighting).

X_valid : array-like, optional
    Validation features for early stopping

y_valid : array-like, optional
    Validation targets for early stopping
Returns¶
self
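The Complete Example at the bottom of this page obtains `density` from og_learn's `calculate_density` helper. As an illustration of what this argument represents, a crude per-sample density can also be estimated with a 2-D histogram (a sketch only; the library's own estimator may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))

# Bin the first two features, then look up each sample's bin count.
counts, x_edges, y_edges = np.histogram2d(X_train[:, 0], X_train[:, 1], bins=15)
ix = np.clip(np.searchsorted(x_edges, X_train[:, 0]) - 1, 0, 14)
iy = np.clip(np.searchsorted(y_edges, X_train[:, 1]) - 1, 0, 14)
density = counts[ix, iy]  # higher values = denser regions
```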
Constructor¶
```python
OGModel(
    hv='lightgbm',
    lv='mlp',
    oscillation=0.05,
    sampling_alpha=0.1,
    epochs=100,
    early_stopping=True,
    patience=10,
    seed=42,
    verbose=True,
    tensorboard_dir=None,
    tensorboard_name=None,
    eval_every_epochs=10
)
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `hv` | str or model | `'lightgbm'` | High-variance model. Can be a preset name or a model instance with `fit()`/`predict()` methods |
| `lv` | str or model | `'mlp'` | Low-variance model. Can be a preset name or a model instance |
| `oscillation` | float | `0.05` | Noise injection strength for pseudo-label generation |
| `sampling_alpha` | float | `0.1` | Exponent for density-aware sampling weights |
| `epochs` | int | `100` | Number of training epochs for LV model |
| `early_stopping` | bool | `True` | Whether to use early stopping |
| `patience` | int | `10` | Early stopping patience (epochs without improvement) |
| `seed` | int | `42` | Random seed for reproducibility |
| `verbose` | bool | `True` | Whether to print training progress |
| `tensorboard_dir` | str | `None` | Directory for TensorBoard logs |
| `tensorboard_name` | str | `None` | Name for this run in TensorBoard |
| `eval_every_epochs` | int | `10` | Frequency of evaluation/logging |
Methods¶
fit¶
Train the OG model.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `X` | array-like | Training features, shape (n_samples, n_features) |
| `y` | array-like | Training target, shape (n_samples,) |
| `density` | array-like | Spatial density for each sample, shape (n_samples,) |
| `X_valid` | array-like | Validation features (optional) |
| `y_valid` | array-like | Validation target (optional) |
| `epochs` | int | Override epochs from constructor |
Returns: self
Example:
```python
model = OGModel(hv='lightgbm', lv='mlp')
model.fit(
    X_train, y_train,
    density=density_train,
    X_valid=X_valid,
    y_valid=y_valid,
    epochs=100
)
```
predict¶
Make predictions.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `X` | array-like | Features, shape (n_samples, n_features) |
Returns: numpy.ndarray - Predictions, shape (n_samples,)
Example: `predictions = model.predict(X_test)` (see the Complete Example below).
Attributes¶
| Attribute | Type | Description |
|---|---|---|
| `_hv_model` | object | Fitted HV model instance |
| `_lv_model` | object | Fitted LV model instance |
| `hv_name` | str | Name of HV model |
| `lv_name` | str | Name of LV model |
Complete Example¶
```python
from og_learn import OGModel, calculate_density
from sklearn.metrics import r2_score

# Prepare data
density = calculate_density(X_train[:, 0], X_train[:, 1])

# Create and train model
model = OGModel(
    hv='lightgbm',
    lv='resnet',
    oscillation=0.05,
    sampling_alpha=0.1,
    epochs=100,
    early_stopping=True,
    patience=15,
    seed=42,
    tensorboard_dir='runs/og_resnet'
)
model.fit(
    X_train, y_train,
    density=density,
    X_valid=X_valid,
    y_valid=y_valid
)

# Evaluate
predictions = model.predict(X_test)
print(f"Test R²: {r2_score(y_test, predictions):.4f}")
```