Skip to content

Column Types

The data warehouse uses a consistent set of BigQuery data types. These conventions ensure predictable behavior across all tables and prevent type-related errors in refresh scripts and views.

Type Conventions

CategoryBigQuery TypeConventionExamples
IdentifiersSTRINGAll IDs are stored as STRING, even when numericCampaignID, AdID, CreativeID, AccountID
Rates / PercentagesFLOAT64All rates, percentages, and ratios use FLOAT64CTR, CPC, CPM, VideoViewsrate, ConversionRate
CountsINT64Whole-number metrics use INT64Impressions, Clicks, VideoViews, Leads
Currency / MonetaryFLOAT64Cost and value columns use FLOAT64Cost, ConversionValue, CostPerConversion
DatesDATEReporting dates use DATE (not DATETIME or TIMESTAMP)Date
TimestampsTIMESTAMPAudit and log timestamps use TIMESTAMP_ingested_at, run_timestamp
TextSTRINGNames, URLs, and descriptive fields use STRINGCampaignName, ImageURL, HeadlineText

Rationale

STRING for All IDs

Even though most platform IDs are numeric, they are stored as STRING for several reasons:

  • Leading zeros: Some campaign codes (e.g., 0005547) have meaningful leading zeros that would be lost with numeric types
  • Platform consistency: LinkedIn uses alphanumeric creative IDs in some cases
  • Join safety: Comparing STRING to STRING avoids implicit type casting in JOIN conditions
  • Future-proofing: Platforms may change their ID formats without warning

FLOAT64 for Rates and Percentages

Rates and percentages are decimal values that require floating-point precision:

  • A CTR of 0.0234 (2.34%) cannot be represented as an integer
  • VideoViewsrate was originally typed as INTEGER, which caused data loss. It was corrected to FLOAT64 on 2026-02-08.
  • Using FLOAT64 consistently avoids implicit casting issues when computing derived metrics

INT64 for Counts

Count metrics are always whole numbers:

  • Impressions, Clicks, VideoViews are always integers from the platform APIs
  • Using INT64 provides precise arithmetic without floating-point rounding issues
  • BigQuery's SUM() on INT64 returns INT64, preserving precision

DATE for Reporting Dates

The Date column uses the DATE type (not DATETIME or TIMESTAMP) because:

  • Advertising data is aggregated at the daily level
  • There is no time-of-day component to the data
  • DATE partitioning is more efficient than TIMESTAMP partitioning for daily data
  • Date comparisons are simpler without timezone considerations

TIMESTAMP for Audit Fields

The _ingested_at and run_timestamp columns use TIMESTAMP because:

  • They record the exact moment of ingestion, including time of day
  • They use UTC timezone implicitly
  • They enable precise tracking of when data was last refreshed

NULL Handling

All columns should handle NULL values gracefully:

  • Use COALESCE(column, default_value) when a default is needed
  • Use NULLIF(column, '') to convert empty strings to NULL
  • Use SAFE_DIVIDE(a, b) instead of a / b to avoid division-by-zero errors
  • UDFs return sensible defaults (e.g., "Unknown", "None", "Other") instead of NULL

Type Change Procedure

If a column's type needs to be corrected, follow the Schema Changes workflow. The key steps are:

  1. ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE {new_type} on the staging table
  2. ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE {new_type} on the final table
  3. Verify with INFORMATION_SCHEMA.COLUMNS
  4. Test ingestion with the new type

SUSE Paid Advertising Data Warehouse V2