Data
The datasets consisted of Google ads performance, Microsoft ads performance and Meta ads performance data, with each dataframe containing features for date, campaign type (each channel could have multiple campaigns), cost of marketing, number of impressions generated, number of clicks and conversions.
To begin with, the provided datasets for Google Ads, Microsoft Ads, and Meta Ads are concatenated.
The Google, Microsoft and Meta datasets have multiple occurrences for some date-campaign pairs so they are aggregated on Dates and Campaign Types after adding the Campaign types column to dataframes which don’t have them (in this case Meta).
def aggregate(df, name):
df["Date"] = pd.to_datetime(df["Date"])
if "Campaign type" not in df.columns:
df["Campaign type"] = name
df = df.groupby(["Campaign type", "Date"]).aggregate('sum').reset_index()
df["Platform"] = name
return df
Next, it is found that some datasets may have missing dates for some Campaign types (Google and Microsoft in this case) so these dates are imputed with value 0 in the numerical columns.
def fill_missing_dates(df):
updated = pd.DataFrame()
for campaign in df["Campaign type"].unique():
temp = df[df["Campaign type"] == campaign].set_index("Date")
complete = pd.DataFrame(index=pd.date_range(temp.index.min(), temp.index.max(), freq='D')).reset_index().rename(columns={'index': 'Date'})
merged = pd.merge(complete, temp, how='left', on='Date').fillna(0)
merged["Campaign type"] = campaign
updated = pd.concat([updated, merged]).reset_index(drop=True)
return updated
These preprocessed dataframes are then joined into a single dataframe called ads.
def preprocess(df, name):
return fill_missing_dates(aggregate(df, name))
google = preprocess(google, "Google")
microsoft = preprocess(microsoft, "Microsoft")
meta = preprocess(meta, "Meta")
ads = pd.concat([google, microsoft, meta])[["Date", "Platform", "Campaign type", "Cost", "Impressions", "Clicks", "Conversions"]].set_index("Date")