
Feature Engineering API

Utilities for spatiotemporal feature engineering.


simple_feature_engineering

from og_learn.feature import simple_feature_engineering

df_processed, feature_cols = simple_feature_engineering(
    df,
    feature_cols,
    time_col='time',
    k=3,
    standardize=True,
    add_temporal=True
)

Apply the complete feature engineering pipeline in a single call.

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| df | DataFrame | - | Input data |
| feature_cols | list | - | Original feature column names |
| time_col | str | 'time' | Name of time column |
| k | int | 3 | Number of spatial harmonics |
| standardize | bool | True | Apply StandardScaler |
| add_temporal | bool | True | Add temporal features |

Returns:

- df_processed (DataFrame): Processed data
- feature_cols (list): Updated feature column names
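The `standardize` option is described only as applying StandardScaler. As a rough illustration of what that scaling does to the feature columns, here is a minimal sketch in plain pandas. The helper name `sketch_standardize` is hypothetical and not part of og_learn, and which columns the library actually scales is an assumption here.

```python
import pandas as pd


def sketch_standardize(df, feature_cols):
    """Hypothetical helper: zero-mean, unit-variance scaling of feature_cols.

    Mirrors scikit-learn's StandardScaler conventions (population standard
    deviation, constant columns left unscaled). Not og_learn's own code.
    """
    out = df.copy()
    cols = list(feature_cols)
    mean = out[cols].mean()
    std = out[cols].std(ddof=0)   # population std, as StandardScaler uses
    std = std.replace(0.0, 1.0)   # avoid dividing by zero for constant columns
    out[cols] = (out[cols] - mean) / std
    return out


demo = pd.DataFrame({
    "temp": [10.0, 20.0, 30.0],
    "pressure": [1000.0, 1000.0, 1000.0],  # constant column
    "station": ["a", "b", "c"],            # non-feature column, left untouched
})
scaled = sketch_standardize(demo, ["temp", "pressure"])
print(scaled)
```

Columns not listed in `feature_cols` pass through unchanged, which is why the function returns the full DataFrame rather than only the scaled features.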


compute_spatial_harmonics

from og_learn.feature import compute_spatial_harmonics

df = compute_spatial_harmonics(df, k=3, lon_col='longitude', lat_col='latitude')

Add spatial harmonic features.

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| df | DataFrame | - | Input data |
| k | int | 3 | Number of harmonics |
| lon_col | str | 'longitude' | Longitude column name |
| lat_col | str | 'latitude' | Latitude column name |

Returns: DataFrame with added harmonic columns

Created columns:

- lon_sin_1, lon_cos_1, ..., lon_sin_k, lon_cos_k
- lat_sin_1, lat_cos_1, ..., lat_sin_k, lat_cos_k
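This page does not give the formula behind the harmonic columns. The sketch below shows one common construction that yields the same column names, assuming harmonic i is the sine and cosine of i times the coordinate in radians. The helper `sketch_spatial_harmonics` is hypothetical, and og_learn's actual computation may differ.

```python
import numpy as np
import pandas as pd


def sketch_spatial_harmonics(df, k=3, lon_col="longitude", lat_col="latitude"):
    """Hypothetical re-implementation; og_learn's actual formula may differ."""
    out = df.copy()
    # Assumption: coordinates are in degrees and are converted to radians.
    lon = np.deg2rad(out[lon_col].to_numpy(dtype=float))
    lat = np.deg2rad(out[lat_col].to_numpy(dtype=float))
    for i in range(1, k + 1):
        # Harmonic i multiplies the angle by i before taking sin and cos.
        out[f"lon_sin_{i}"] = np.sin(i * lon)
        out[f"lon_cos_{i}"] = np.cos(i * lon)
        out[f"lat_sin_{i}"] = np.sin(i * lat)
        out[f"lat_cos_{i}"] = np.cos(i * lat)
    return out


demo = pd.DataFrame({"longitude": [0.0, 90.0], "latitude": [0.0, 45.0]})
harmonics = sketch_spatial_harmonics(demo, k=2)
print([c for c in harmonics.columns if c not in demo.columns])
```

With this construction each value of `k` adds four columns, so `k=3` would add twelve columns to the input.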


compute_temporal_features

from og_learn.feature import compute_temporal_features

df = compute_temporal_features(df, time_col='time')

Add temporal features derived from a datetime column.

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| df | DataFrame | - | Input data |
| time_col | str | 'time' | Time column name (datetime) |

Returns: DataFrame with added temporal columns

Created columns:

- time_month: Month (1-12), normalized to [0, 1]
- time_hour: Hour (0-23), normalized to [0, 1]
- time_day_of_month: Day (1-31), normalized to [0, 1]
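The exact normalization is not spelled out here. One plausible reading is min-max scaling over each field's nominal range, sketched below. The helper `sketch_temporal_features` is hypothetical, and the divisors (11, 23, 30) are assumptions rather than og_learn's documented behavior.

```python
import pandas as pd


def sketch_temporal_features(df, time_col="time"):
    """Hypothetical re-implementation; og_learn's exact scaling may differ."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col])
    # Assumption: min-max scaling over each field's nominal range.
    out["time_month"] = (ts.dt.month - 1) / 11.0          # 1..12 -> 0..1
    out["time_hour"] = ts.dt.hour / 23.0                  # 0..23 -> 0..1
    out["time_day_of_month"] = (ts.dt.day - 1) / 30.0     # 1..31 -> 0..1
    return out


demo = pd.DataFrame({"time": ["2024-01-01 00:00", "2024-12-31 23:00"]})
temporal = sketch_temporal_features(demo)
print(temporal[["time_month", "time_hour", "time_day_of_month"]])
```

Under this scaling, months shorter than 31 days never reach 1.0 in `time_day_of_month`; a per-month divisor would be an alternative design.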


Example

import pandas as pd
from og_learn.feature import (
    compute_spatial_harmonics,
    compute_temporal_features,
    simple_feature_engineering
)

# Load data
df = pd.read_parquet('data.parquet')

# Option 1: Step-by-step (kept in a separate variable so df stays unprocessed)
df_steps = compute_temporal_features(df, time_col='time')
df_steps = compute_spatial_harmonics(df_steps, k=3)

# Option 2: Complete pipeline, applied to the original df rather than df_steps
df_processed, feature_cols = simple_feature_engineering(
    df,
    feature_cols=['longitude', 'latitude', 'temp', 'pressure'],
    k=3,
    standardize=True
)