Usage
This library can be used in two ways. In both cases, first perform these steps:
- Import the library and read the historical data and the covariate datasets.
import pandas as pd
from tsf import TSF
df = pd.read_parquet("data/delivery_data.parquet")
weather_df = pd.read_csv("data/weather.csv")
- Create the TSF instance with the desired parameters.
tsf = TSF(data=df, date_col="date", target_col="quantity_delivered",
validation_starting_date="2023-01-01", validation_ending_date="2023-12-01",
forecast_starting_date="2024-01-01", forecast_ending_date="2024-12-01",
period="D", aggregate="M", agg=True, starting_date="2018-01-01", product_col="brand",
discontinue=True, check=[("2023-01-01", "2023-12-01")], covariates=[weather_df])
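To make the date and aggregation parameters more concrete, here is a minimal pandas sketch. It assumes, without confirmation from the library, that agg=True with period="D" and aggregate="M" sums each product's daily rows into monthly totals keyed by month start; the toy data is invented for illustration. It also counts the months covered by the validation and forecast windows passed above.

```python
import pandas as pd

# Toy daily deliveries for two brands (hypothetical data, same column names as above).
daily = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-15", "2023-02-03", "2023-01-10", "2023-02-20"]),
    "brand": ["A", "A", "A", "B", "B"],
    "quantity_delivered": [10, 5, 7, 3, 4],
})

# Assumption: agg=True with aggregate="M" sums each product's daily rows per month.
monthly = (
    daily.groupby(["brand", pd.Grouper(key="date", freq="MS")])["quantity_delivered"]
    .sum()
    .reset_index()
)

# Month-start dates spanning the validation and forecast windows given to TSF above.
validation_months = pd.date_range("2023-01-01", "2023-12-01", freq="MS")
forecast_months = pd.date_range("2024-01-01", "2024-12-01", freq="MS")

print(monthly)
print(len(validation_months), len(forecast_months))
```

Under this assumption, both windows span 12 monthly points each, so a daily dataset is reduced to one row per brand and month before modelling.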
Step-by-step
Use this approach to run preprocessing, feature engineering and forecasting one step at a time.
It is useful if you want to apply custom preprocessing or feature engineering after the library has done those steps, or if you need the intermediate dataframes rather than only the final forecast.
- Preprocessing:
df = tsf.preprocess()
- Feature engineering, with the appropriate parameters:
df = tsf.engineer(df=df, n_lags=12, order=2, fourier=6, seasonal_features=False, cyclic_features=True, time_features=True)
- Forecasting, with the appropriate model:
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import HuberRegressor
rf = RandomForestRegressor(random_state=42)
gb = GradientBoostingRegressor(random_state=42)
hr = HuberRegressor()
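# Stack the random forest and gradient-boosting models; the Huber regressor learns how to combine their predictions.
model = StackingRegressor(estimators=[("rf", rf), ("gb", gb)], final_estimator=hr, passthrough=False)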
df, model, total_scores, product_scores = tsf.forecast(df=df, model=model)
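This section does not spell out what n_lags and fourier compute. Assuming they mean lagged copies of the target and sine/cosine pairs of a yearly cycle, a custom feature step of that kind, applied to a dataframe such as the one returned by tsf.engineer, might look like the sketch below. add_lag_and_fourier_features is a hypothetical helper, not part of TSF.

```python
import numpy as np
import pandas as pd


def add_lag_and_fourier_features(df, target_col, n_lags, fourier, period=12):
    """Hypothetical helper (not part of TSF): lag and Fourier features.

    Assumes df holds a single series sorted by date, one row per month.
    """
    out = df.copy()
    # lag_k holds the target value from k rows (months) earlier; the first k rows are NaN.
    for k in range(1, n_lags + 1):
        out[f"lag_{k}"] = out[target_col].shift(k)
    # One sin/cos pair per harmonic h = 1..fourier of a cycle lasting `period` rows.
    t = np.arange(len(out))
    for h in range(1, fourier + 1):
        out[f"sin_{h}"] = np.sin(2 * np.pi * h * t / period)
        out[f"cos_{h}"] = np.cos(2 * np.pi * h * t / period)
    return out


series = pd.DataFrame({"quantity_delivered": [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]})
features = add_lag_and_fourier_features(series, "quantity_delivered", n_lags=2, fourier=1)
print(features)
```

Rows with NaN lags would normally be dropped (or imputed) before fitting a model such as the StackingRegressor above.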
In one go
Alternatively, to go straight to the forecast, call the tsf() method with the appropriate parameters:
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import HuberRegressor
rf = RandomForestRegressor(random_state=42)
gb = GradientBoostingRegressor(random_state=42)
hr = HuberRegressor()
model = StackingRegressor(estimators=[("rf", rf), ("gb", gb)], final_estimator=hr, passthrough=False)
df, model, total_scores, product_scores = tsf.tsf(model=model, n_lags=3, order=2, fourier=2, seasonal_features=True, cyclic_features=True, time_features=False)
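How TSF computes total_scores and product_scores is not described here. As a rough illustration of scoring the same stacking model over a validation window, the sketch below fits it on synthetic monthly data with three lag features (mirroring n_lags=3, assuming that is what the parameter means), trains on months before 2023 and reports mean absolute error on 2023. The data and the choice of MAE are assumptions, not the library's method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import HuberRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic monthly series: linear trend plus yearly seasonality (illustration only).
dates = pd.date_range("2018-01-01", "2023-12-01", freq="MS")
t = np.arange(len(dates))
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
df = pd.DataFrame({"date": dates, "quantity_delivered": y})

# Three lagged targets as features; drop the first rows, whose lags are undefined.
for k in range(1, 4):
    df[f"lag_{k}"] = df["quantity_delivered"].shift(k)
df = df.dropna()
features = ["lag_1", "lag_2", "lag_3"]

# Chronological split: train before the validation window, score on 2023.
train = df[df["date"] < "2023-01-01"]
valid = df[(df["date"] >= "2023-01-01") & (df["date"] <= "2023-12-01")]

model = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=42)),
                ("gb", GradientBoostingRegressor(random_state=42))],
    final_estimator=HuberRegressor(),
    passthrough=False,
)
model.fit(train[features], train["quantity_delivered"])
pred = model.predict(valid[features])
mae = mean_absolute_error(valid["quantity_delivered"], pred)
print(f"validation MAE: {mae:.2f}")
```

Splitting by date rather than at random keeps future months out of the training data, which is the purpose of a separate validation window.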