Skip to content

Dimension Normalization Rules

This page documents all known naming inconsistencies in the three campaign dimension columns (ProductLine, Product, Campaign) and the normalization rules applied by UDFs to produce consistent output. It serves as the single source of truth for both the engineering team and the customer/agency team.

Living Document

This page must be updated whenever a new normalization rule is added to a UDF. Anyone modifying get_product_line, get_product, or get_subproduct must update this page.

Impact Summary

Analysis conducted 2026-02-09 across all 6 platforms (~837K rows, $5.56M total spend). UDF fixes deployed 2026-02-09 — Unknown spend reduced from ~$316K to ~$81K (74% reduction). Catalog combinations reduced from 52 to 44.

CategoryIssuesRows AffectedSpend AffectedStatus
Campaign name normalization (get_subproduct)4~57,000$436K✅ Deployed
Missing product keywords (get_product)6~8,000$232K✅ Deployed
Regex tolerance (quarter/whitespace)4 of 5~18,000$253K✅ 4/5 Deployed
Legacy naming conventions (no pipes)~77 campaigns~3,000$68KPending
Business decisions needed4~20,000$270KAwaiting customer input

Why Normalization Is Needed

Campaign names in ad platforms cannot be renamed retroactively. Over time, naming convention drift produces variants like:

  • Case differences: "Always On" vs "Always on"
  • Abbreviation vs full name: "VW ABM" vs "Volkswagen ABM"
  • Redundant prefixes: "MLS Umbrella" vs "Umbrella"
  • Missing keywords: Products like "CentOS", "SAP", "Horizontal" not in UDF allowlists
  • Typos: "Kubenetes" instead of "Kubernetes"
  • Format bugs: Missing spaces (AI| instead of AI |), compound quarters (Q2+3)

The UDFs (get_product_line, get_product, get_subproduct) handle normalization at query time, so historical data is automatically corrected without backfilling.


Normalization Rules by UDF

get_subproduct — Campaign Name Normalization

Deployed 2026-02-09

All rules in this section are now live. get_subproduct has a CASE LOWER(TRIM(raw)) normalization layer matching the pattern used by get_product_line and get_product.

#Raw Value(s)Normalized ToProduct LinesPlatformsRowsSpendReasonStatus
C-001Always onAlways OnEdge, Linux, AILinkedIn, Reddit15,825$156KCase inconsistency — FY24 Q1 vs FY25+
C-002VW ABMVolkswagen ABMLinux6sense1,333$273Abbreviation — standardize to full name
C-003MLS UmbrellaUmbrellaLinux6sense, LinkedIn, Reddit25,727$117KRedundant "MLS" prefix — Product already says MLS. Combined with existing "Umbrella" rows (15,400 rows, $211K on Google/Microsoft), total impact is 41,127 rows / $328K
C-004AI (when Campaign=None due to AI| bug)AIAIGoogle18,466$199KMissing space before pipe in Google SRCH campaigns

get_product — Product Name Normalization

These are missing keywords in the product regex allowlist that cause Product = "Unknown" (and cascading Campaign = "None"). All deployed 2026-02-09.

#Raw Value in Campaign NameNormalized ToProduct LinePlatformsRowsSpendReasonStatus
P-001ECM + SecurityECMCloud NativeLinkedIn5,311$96KCompound token — sub-variant of ECM
P-002KubenetesKubernetesCloud NativeGoogle, LinkedIn261$18KTypo — missing "r"
P-003HorizontalEdgeEdgeGoogle, LinkedIn2,843$45KEdge go-to-market motion
P-004CentOSMLSLinuxGoogle498$14KCentOS EOL migration → MLS
P-005SAPSLESLinuxGoogle, LinkedIn1,444$27KSAP workloads → SLES
P-006Kubernetes in get_subproduct(add to regex)Cloud NativeGoogle50$4KMissing from subproduct allowlist branches

get_product_line + get_product — Regex Tolerance Fixes

These campaigns follow the standard pipe-delimited convention but have minor formatting issues that break the regex. R-001 through R-004 deployed 2026-02-09.

#IssueExamplePlatformsRowsSpendFixStatus
R-001Q2+3 compound quarterFY25 | Q2+3 | EDGE | TelcoLinkedIn~1,000$32KChange Q\d+ to Q[\d+]+ in regex
R-002Q1| / Q2| missing spaceFY26 | Q1| Cloud NativeLinkedIn, 6sense~500$5KChange \s+| to \s*| after quarter
R-003FY26| missing spaceSUSE | DISP | 6S | FY26| Q16sense~200$1KChange \s+| to \s*| after FY
R-004YTB channel not handledSUSE | YTB | Google | ...Google130$4KAdd YTB.*Google branch to UDFs
R-005Old 6-segment formatEDGE | Horizontal | Always on (no Product slot)Google, LinkedIn2,843$49KFallback regex for FY24 Q1 namingPending

get_product_line — Product Line Normalization

No normalization rules needed — get_product_line already has a proper CASE UPPER(TRIM(raw)) block and handles all current product line values correctly.


Business Decisions Needed

These issues cannot be resolved by UDF rules alone — they require input from the paid media team or agency.

#IssueCurrent StateOptionsSpendPriority
B-001Digital Sovereignty under ProductLine = AllAll / All / Digital Sovereignty(a) Keep as-is (cross-product initiative) or (b) Promote to its own ProductLine$253KHIGH
B-002Edge/Edge/Telco vs Edge/Telco splitBoth exist — different naming eras(a) Telco always a Product or (b) Telco always a SubProduct$135KMEDIUM
B-003Rancher AWS under ECM productCloud Native / ECM / Rancher AWSShould it be Cloud Native / Rancher / Rancher AWS?$5KLOW
B-004KubeCon event campaignsNo product classificationLikely Cloud Native / Kubernetes / KubeCon$9KLOW

Unknown Dimension Mapping

Category A: Fixable by UDF Rule (No Customer Input Needed)

Covered by the normalization rules tables above. Total recoverable: $248K (78% of Unknown spend).

Category B: Legacy Naming Convention (Customer Input Helpful)

~77 campaigns using pre-2025 underscore-delimited format. Not parseable by current pipe-delimited UDFs.

FormatExampleCampaignsSpendSuggested Approach
Numeric ID prefix0005731_OA_LinkedIn_Carousel_IDC_SLES_&_SUMA_White_Paper24$25KKeyword-based fallback regex (medium risk)
FY-prefixed legacyFY23Q4_GDG_EL_LL_PDLX_OA_Linux_Distribution_EOL_Google_...20$39KMap FY-prefix codes: EL→Linux, ECM→Cloud Native, EDG→Edge
Telco/Edge content0003489_Edge_Telco_Gorilla_Guide_Google_NA_TIER1_TOFU_MOFU14$7KKeyword match on "Edge"/"Telco"
Total Category B~58$71K

Category C: Genuinely Ambiguous (Customer Decision Required)

Campaign NamePlatformSpendAmbiguity
KubeCon 24 - Consideration - Web Visits - *LinkedIn$9.4KEvent campaign — which product?
Combined_Assets_*_TIER*LinkedIn$1.9KCompletely generic — no product indicators
0005991_Cloud_Native_Edge_Essentials_EbookLinkedIn$1.7KSpans Cloud Native AND Edge
0 (literal zero)LinkedIn$742Placeholder/test campaign
Total Category C~$17K

Data Quality Observations (Non-Breaking)

These issues do not affect UDF output today but represent source data inconsistencies:

IssueExampleRowsPlatformsRisk
Double-space before pipeECM | (two spaces)17,840LinkedInHandled by \s+ — would break if changed to \s
Double-space after quarterQ4 | (two spaces)~130Google, StackAdaptSame — handled by \s+
Double-space after SUSE |SUSE | PSOC (two spaces)~300LinkedInHandled by .* patterns

How to Add a New Rule

  1. Identify the inconsistency in the catalog view:

    sql
    SELECT * FROM `paidteam-data-warehouse.customer_ads_suse.campaign_dimension_catalog_v2`;
  2. Determine which UDF produces the inconsistent value (get_product_line, get_product, or get_subproduct)

  3. Add the normalization rule to the UDF SQL. All three dimension UDFs use a 2-layer pattern: extraction (inner subquery) then normalization (outer CASE):

    sql
    -- get_subproduct: add to the CASE LOWER(TRIM(raw)) block
    WHEN 'new variant' THEN 'Canonical Name'
    
    -- get_product: add to the CASE UPPER(TRIM(raw)) block
    WHEN "NEW_VARIANT" THEN "Canonical"
    
    -- If the raw value isn't being extracted at all, also add the
    -- token to the regex allowlist in each platform branch
  4. Update this page with the new rule in the appropriate table above

  5. Redeploy the UDF to BigQuery and verify via the catalog view

  6. Update the changelog (/changelog/)

Cross-Reference

  • Catalog view: campaign_dimension_catalog_v2 — see all current dimension combinations
  • UDF source files: sql/customer_ads_suse/routines/get_product_line.sql, get_product.sql, get_subproduct.sql
  • Dev-docs: dev-docs/campaign-dimension-mapping/normalization-rules.md — technical implementation notes
  • Project instructions: Reference this page as mandatory reading before UDF changes

SUSE Paid Advertising Data Warehouse V2