Appearance
Dimension Normalization Rules
This page documents all known naming inconsistencies in the three campaign dimension columns (ProductLine, Product, Campaign) and the normalization rules applied by UDFs to produce consistent output. It serves as the single source of truth for both the engineering team and the customer/agency team.
Living Document
This page must be updated whenever a new normalization rule is added to a UDF. Anyone modifying get_product_line, get_product, or get_subproduct must update this page.
Impact Summary
Analysis conducted 2026-02-09 across all 6 platforms (~837K rows, $5.56M total spend). UDF fixes deployed 2026-02-09 — Unknown spend reduced from ~$316K to ~$81K (74% reduction). Catalog combinations reduced from 52 to 44.
| Category | Issues | Rows Affected | Spend Affected | Status |
|---|---|---|---|---|
Campaign name normalization (get_subproduct) | 4 | ~57,000 | $436K | ✅ Deployed |
Missing product keywords (get_product) | 6 | ~8,000 | $232K | ✅ Deployed |
| Regex tolerance (quarter/whitespace) | 4 of 5 | ~18,000 | $253K | ✅ 4/5 Deployed |
| Legacy naming conventions (no pipes) | ~77 campaigns | ~3,000 | $68K | Pending |
| Business decisions needed | 4 | ~20,000 | $270K | Awaiting customer input |
Why Normalization Is Needed
Campaign names in ad platforms cannot be renamed retroactively. Over time, naming convention drift produces variants like:
- Case differences: "Always On" vs "Always on"
- Abbreviation vs full name: "VW ABM" vs "Volkswagen ABM"
- Redundant prefixes: "MLS Umbrella" vs "Umbrella"
- Missing keywords: Products like "CentOS", "SAP", "Horizontal" not in UDF allowlists
- Typos: "Kubenetes" instead of "Kubernetes"
- Format bugs: Missing spaces (
AI|instead ofAI |), compound quarters (Q2+3)
The UDFs (get_product_line, get_product, get_subproduct) handle normalization at query time, so historical data is automatically corrected without backfilling.
Normalization Rules by UDF
get_subproduct — Campaign Name Normalization
Deployed 2026-02-09
All rules in this section are now live. get_subproduct has a CASE LOWER(TRIM(raw)) normalization layer matching the pattern used by get_product_line and get_product.
| # | Raw Value(s) | Normalized To | Product Lines | Platforms | Rows | Spend | Reason | Status |
|---|---|---|---|---|---|---|---|---|
| C-001 | Always on | Always On | Edge, Linux, AI | LinkedIn, Reddit | 15,825 | $156K | Case inconsistency — FY24 Q1 vs FY25+ | ✅ |
| C-002 | VW ABM | Volkswagen ABM | Linux | 6sense | 1,333 | $273 | Abbreviation — standardize to full name | ✅ |
| C-003 | MLS Umbrella | Umbrella | Linux | 6sense, LinkedIn, Reddit | 25,727 | $117K | Redundant "MLS" prefix — Product already says MLS. Combined with existing "Umbrella" rows (15,400 rows, $211K on Google/Microsoft), total impact is 41,127 rows / $328K | ✅ |
| C-004 | AI (when Campaign=None due to AI| bug) | AI | AI | 18,466 | $199K | Missing space before pipe in Google SRCH campaigns | ✅ |
get_product — Product Name Normalization
These are missing keywords in the product regex allowlist that cause Product = "Unknown" (and cascading Campaign = "None"). All deployed 2026-02-09.
| # | Raw Value in Campaign Name | Normalized To | Product Line | Platforms | Rows | Spend | Reason | Status |
|---|---|---|---|---|---|---|---|---|
| P-001 | ECM + Security | ECM | Cloud Native | 5,311 | $96K | Compound token — sub-variant of ECM | ✅ | |
| P-002 | Kubenetes | Kubernetes | Cloud Native | Google, LinkedIn | 261 | $18K | Typo — missing "r" | ✅ |
| P-003 | Horizontal | Edge | Edge | Google, LinkedIn | 2,843 | $45K | Edge go-to-market motion | ✅ |
| P-004 | CentOS | MLS | Linux | 498 | $14K | CentOS EOL migration → MLS | ✅ | |
| P-005 | SAP | SLES | Linux | Google, LinkedIn | 1,444 | $27K | SAP workloads → SLES | ✅ |
| P-006 | Kubernetes in get_subproduct | (add to regex) | Cloud Native | 50 | $4K | Missing from subproduct allowlist branches | ✅ |
get_product_line + get_product — Regex Tolerance Fixes
These campaigns follow the standard pipe-delimited convention but have minor formatting issues that break the regex. R-001 through R-004 deployed 2026-02-09.
| # | Issue | Example | Platforms | Rows | Spend | Fix | Status |
|---|---|---|---|---|---|---|---|
| R-001 | Q2+3 compound quarter | FY25 | Q2+3 | EDGE | Telco | ~1,000 | $32K | Change Q\d+ to Q[\d+]+ in regex | ✅ | |
| R-002 | Q1| / Q2| missing space | FY26 | Q1| Cloud Native | LinkedIn, 6sense | ~500 | $5K | Change \s+| to \s*| after quarter | ✅ |
| R-003 | FY26| missing space | SUSE | DISP | 6S | FY26| Q1 | 6sense | ~200 | $1K | Change \s+| to \s*| after FY | ✅ |
| R-004 | YTB channel not handled | SUSE | YTB | Google | ... | 130 | $4K | Add YTB.*Google branch to UDFs | ✅ | |
| R-005 | Old 6-segment format | EDGE | Horizontal | Always on (no Product slot) | Google, LinkedIn | 2,843 | $49K | Fallback regex for FY24 Q1 naming | Pending |
get_product_line — Product Line Normalization
No normalization rules needed — get_product_line already has a proper CASE UPPER(TRIM(raw)) block and handles all current product line values correctly.
Business Decisions Needed
These issues cannot be resolved by UDF rules alone — they require input from the paid media team or agency.
| # | Issue | Current State | Options | Spend | Priority |
|---|---|---|---|---|---|
| B-001 | Digital Sovereignty under ProductLine = All | All / All / Digital Sovereignty | (a) Keep as-is (cross-product initiative) or (b) Promote to its own ProductLine | $253K | HIGH |
| B-002 | Edge/Edge/Telco vs Edge/Telco split | Both exist — different naming eras | (a) Telco always a Product or (b) Telco always a SubProduct | $135K | MEDIUM |
| B-003 | Rancher AWS under ECM product | Cloud Native / ECM / Rancher AWS | Should it be Cloud Native / Rancher / Rancher AWS? | $5K | LOW |
| B-004 | KubeCon event campaigns | No product classification | Likely Cloud Native / Kubernetes / KubeCon | $9K | LOW |
Unknown Dimension Mapping
Category A: Fixable by UDF Rule (No Customer Input Needed)
Covered by the normalization rules tables above. Total recoverable: $248K (78% of Unknown spend).
Category B: Legacy Naming Convention (Customer Input Helpful)
~77 campaigns using pre-2025 underscore-delimited format. Not parseable by current pipe-delimited UDFs.
| Format | Example | Campaigns | Spend | Suggested Approach |
|---|---|---|---|---|
| Numeric ID prefix | 0005731_OA_LinkedIn_Carousel_IDC_SLES_&_SUMA_White_Paper | 24 | $25K | Keyword-based fallback regex (medium risk) |
| FY-prefixed legacy | FY23Q4_GDG_EL_LL_PDLX_OA_Linux_Distribution_EOL_Google_... | 20 | $39K | Map FY-prefix codes: EL→Linux, ECM→Cloud Native, EDG→Edge |
| Telco/Edge content | 0003489_Edge_Telco_Gorilla_Guide_Google_NA_TIER1_TOFU_MOFU | 14 | $7K | Keyword match on "Edge"/"Telco" |
| Total Category B | ~58 | $71K |
Category C: Genuinely Ambiguous (Customer Decision Required)
| Campaign Name | Platform | Spend | Ambiguity |
|---|---|---|---|
KubeCon 24 - Consideration - Web Visits - * | $9.4K | Event campaign — which product? | |
Combined_Assets_*_TIER* | $1.9K | Completely generic — no product indicators | |
0005991_Cloud_Native_Edge_Essentials_Ebook | $1.7K | Spans Cloud Native AND Edge | |
0 (literal zero) | $742 | Placeholder/test campaign | |
| Total Category C | ~$17K |
Data Quality Observations (Non-Breaking)
These issues do not affect UDF output today but represent source data inconsistencies:
| Issue | Example | Rows | Platforms | Risk |
|---|---|---|---|---|
| Double-space before pipe | ECM | (two spaces) | 17,840 | Handled by \s+ — would break if changed to \s | |
| Double-space after quarter | Q4 | (two spaces) | ~130 | Google, StackAdapt | Same — handled by \s+ |
Double-space after SUSE | | SUSE | PSOC (two spaces) | ~300 | Handled by .* patterns |
How to Add a New Rule
Identify the inconsistency in the catalog view:
sqlSELECT * FROM `paidteam-data-warehouse.customer_ads_suse.campaign_dimension_catalog_v2`;Determine which UDF produces the inconsistent value (
get_product_line,get_product, orget_subproduct)Add the normalization rule to the UDF SQL. All three dimension UDFs use a 2-layer pattern: extraction (inner subquery) then normalization (outer CASE):
sql-- get_subproduct: add to the CASE LOWER(TRIM(raw)) block WHEN 'new variant' THEN 'Canonical Name' -- get_product: add to the CASE UPPER(TRIM(raw)) block WHEN "NEW_VARIANT" THEN "Canonical" -- If the raw value isn't being extracted at all, also add the -- token to the regex allowlist in each platform branchUpdate this page with the new rule in the appropriate table above
Redeploy the UDF to BigQuery and verify via the catalog view
Update the changelog (
/changelog/)
Cross-Reference
- Catalog view:
campaign_dimension_catalog_v2— see all current dimension combinations - UDF source files:
sql/customer_ads_suse/routines/get_product_line.sql,get_product.sql,get_subproduct.sql - Dev-docs:
dev-docs/campaign-dimension-mapping/normalization-rules.md— technical implementation notes - Project instructions: Reference this page as mandatory reading before UDF changes