Appearance
Column Types
The data warehouse uses a consistent set of BigQuery data types. These conventions ensure predictable behavior across all tables and prevent type-related errors in refresh scripts and views.
Type Conventions
| Category | BigQuery Type | Convention | Examples |
|---|---|---|---|
| Identifiers | STRING | All IDs are stored as STRING, even when numeric | CampaignID, AdID, CreativeID, AccountID |
| Rates / Percentages | FLOAT64 | All rates, percentages, and ratios use FLOAT64 | CTR, CPC, CPM, VideoViewsrate, ConversionRate |
| Counts | INT64 | Whole-number metrics use INT64 | Impressions, Clicks, VideoViews, Leads |
| Currency / Monetary | FLOAT64 | Cost and value columns use FLOAT64 | Cost, ConversionValue, CostPerConversion |
| Dates | DATE | Reporting dates use DATE (not DATETIME or TIMESTAMP) | Date |
| Timestamps | TIMESTAMP | Audit and log timestamps use TIMESTAMP | _ingested_at, run_timestamp |
| Text | STRING | Names, URLs, and descriptive fields use STRING | CampaignName, ImageURL, HeadlineText |
Rationale
STRING for All IDs
Even though most platform IDs are numeric, they are stored as STRING for several reasons:
- Leading zeros: Some campaign codes (e.g.,
0005547) have meaningful leading zeros that would be lost with numeric types - Platform consistency: LinkedIn uses alphanumeric creative IDs in some cases
- Join safety: Comparing STRING to STRING avoids implicit type casting in JOIN conditions
- Future-proofing: Platforms may change their ID formats without warning
FLOAT64 for Rates and Percentages
Rates and percentages are decimal values that require floating-point precision:
- A
CTRof0.0234(2.34%) cannot be represented as an integer VideoViewsratewas originally typed asINTEGER, which caused data loss. It was corrected toFLOAT64on 2026-02-08.- Using
FLOAT64consistently avoids implicit casting issues when computing derived metrics
INT64 for Counts
Count metrics are always whole numbers:
Impressions,Clicks,VideoViewsare always integers from the platform APIs- Using
INT64provides precise arithmetic without floating-point rounding issues - BigQuery's
SUM()onINT64returnsINT64, preserving precision
DATE for Reporting Dates
The Date column uses the DATE type (not DATETIME or TIMESTAMP) because:
- Advertising data is aggregated at the daily level
- There is no time-of-day component to the data
DATEpartitioning is more efficient thanTIMESTAMPpartitioning for daily data- Date comparisons are simpler without timezone considerations
TIMESTAMP for Audit Fields
The _ingested_at and run_timestamp columns use TIMESTAMP because:
- They record the exact moment of ingestion, including time of day
- They use UTC timezone implicitly
- They enable precise tracking of when data was last refreshed
NULL Handling
All columns should handle NULL values gracefully:
- Use
COALESCE(column, default_value)when a default is needed - Use
NULLIF(column, '')to convert empty strings to NULL - Use
SAFE_DIVIDE(a, b)instead ofa / bto avoid division-by-zero errors - UDFs return sensible defaults (e.g.,
"Unknown","None","Other") instead of NULL
Type Change Procedure
If a column's type needs to be corrected, follow the Schema Changes workflow. The key steps are:
ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE {new_type}on the staging tableALTER TABLE ... ALTER COLUMN ... SET DATA TYPE {new_type}on the final table- Verify with
INFORMATION_SCHEMA.COLUMNS - Test ingestion with the new type