
BigQuery Data Platform - Pending Work


Status: Core infrastructure complete; data syncs and personalization pending
Last Updated: February 2026


What’s Been Implemented

The BigQuery data warehouse is operational with the core infrastructure in place:

| Component | Status | Notes |
| --- | --- | --- |
| GCP Project | Done | arlem-analytics-warehouse, australia-southeast1 |
| BigQuery Schema | Done | 8 tables: visitors, sessions, events, shopify_orders, meta_ads_spend, klaviyo_profiles, instagram_posts, gsc_queries |
| Cosmos to BigQuery Sync | Done | Hourly sync; 5,846 visitors / 8,158 sessions / 16,971 events migrated |
| Shopify Orders Sync | Done | 49 orders synced with product categorisation |
| Meta Ads Sync | Done | 286 rows (Feb 2023 - Jul 2024, $1,342.65 spend) |
| Analytics Views | Done | order_attribution, campaign_roas, weekly_performance, source_summary, channel_influence |
| BigQuery MCP Tools | Done | 5 tools in arlem-analytics MCP server |
| Visitor Segment Hook | Done | useVisitorSegment() returns Cold/Interested/Warm/Hot/Customer |
| Segment Reverse Sync | Done | BigQuery --> Cosmos --> API response --> localStorage |

Data Flow:

```
Website Events --> Cosmos DB --> BigQuery (hourly)
                                    |
External APIs (Shopify, Meta) ------+
                                    |
BigQuery --> Analytics Views --> MCP Tools --> Claude Code
         --> Cosmos --> API --> localStorage --> useVisitorSegment()
```

Pending Work

1. Data Sync: Klaviyo Profiles

Priority: Medium | Effort: 4-6 hours

Sync Klaviyo profiles to BigQuery for email-based identity resolution.

Why it matters:

  • Link email subscribers to visitor journeys
  • Enable email-based retargeting
  • Currently only 0.1% of visitors have email captured

Implementation:

  • Create scheduledKlaviyoSync.ts (daily timer trigger)
  • Use existing Klaviyo MCP API patterns
  • Sync profiles updated since last sync
  • Link to visitor_id via email match
  • Insert to klaviyo_profiles table

Table schema (already exists):

klaviyo_profiles (profile_id, email, first_name, last_name, created, subscribed, lists, visitor_id, synced_at)
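The "link via email match" step can be sketched as a pure transform, which keeps it easy to unit-test. The names here (`KlaviyoProfileRow` trimmed to the relevant fields, the lookup map) are illustrative; in the real sync the email-to-visitor map would come from a BigQuery or Cosmos query:

```typescript
// Illustrative sketch of the email-match step; KlaviyoProfileRow mirrors
// the klaviyo_profiles schema above, trimmed to the relevant fields.
interface KlaviyoProfileRow {
  profile_id: string;
  email: string;
  visitor_id: string | null;
  synced_at: string;
}

// Attach visitor_id where a visitor with the same (lowercased) email
// exists; leave it null otherwise so unmatched profiles still sync.
function linkProfilesToVisitors(
  profiles: KlaviyoProfileRow[],
  emailToVisitorId: Map<string, string>
): KlaviyoProfileRow[] {
  return profiles.map((p) => ({
    ...p,
    visitor_id: emailToVisitorId.get(p.email.toLowerCase()) ?? null,
  }));
}
```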

2. Data Sync: Instagram Posts

Priority: Low | Effort: 4-6 hours

Sync Instagram posts with insights to BigQuery.

Why it matters:

  • Track which posts drive traffic and sales
  • Currently can’t answer “Which Instagram posts lead to revenue?”

Implementation:

  • Create scheduledInstagramSync.ts (daily timer trigger)
  • Use existing Instagram MCP API patterns
  • Sync posts with engagement metrics
  • Insert to instagram_posts table

Table schema (already exists):

instagram_posts (post_id, posted_at, caption, media_type, permalink, impressions, reach, engagement, saves, synced_at)
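The per-post transform could look like the sketch below. The API-side field names (`timestamp`, `saved`, etc.) are assumptions based on the Instagram Graph API; the actual sync should reuse the existing MCP patterns:

```typescript
// Row shape mirroring the instagram_posts schema above.
interface InstagramPostRow {
  post_id: string;
  posted_at: string;
  caption: string;
  media_type: string;
  permalink: string;
  impressions: number;
  reach: number;
  engagement: number;
  saves: number;
  synced_at: string;
}

// Map one media object plus its insights into an insertable row.
// Missing metrics default to 0 so rows always insert cleanly.
function toInstagramPostRow(
  media: {
    id: string;
    timestamp: string;
    caption?: string;
    media_type: string;
    permalink: string;
  },
  insights: { impressions?: number; reach?: number; engagement?: number; saved?: number },
  syncedAt: string
): InstagramPostRow {
  return {
    post_id: media.id,
    posted_at: media.timestamp,
    caption: media.caption ?? "",
    media_type: media.media_type,
    permalink: media.permalink,
    impressions: insights.impressions ?? 0,
    reach: insights.reach ?? 0,
    engagement: insights.engagement ?? 0,
    saves: insights.saved ?? 0,
    synced_at: syncedAt,
  };
}
```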

3. Data Sync: Google Search Console

Priority: Medium | Effort: 2-4 hours

Current state: Table exists but has 0 rows. Pipeline broken or never enabled.

Why it matters:

  • Track search performance and SEO health
  • Required for content gap analysis

Implementation:

  • Create scheduledGSCSync.ts (daily timer trigger)
  • Use existing GSC MCP API patterns
  • Note: GSC data has 3-day delay
  • Insert to gsc_queries table

Table schema (already exists):

gsc_queries (date, query, page, clicks, impressions, ctr, position, synced_at)
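Handling the 3-day delay just means capping the query window at the most recent date GSC considers complete; a small, testable sketch (the function name is hypothetical):

```typescript
// Given "today", return the latest date (YYYY-MM-DD, matching
// gsc_queries.date) for which GSC data should be complete, assuming a
// 3-day reporting delay.
function latestCompleteGscDate(today: Date, delayDays = 3): string {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(today.getTime() - delayDays * msPerDay)
    .toISOString()
    .slice(0, 10);
}
```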

4. Personalization: Segment-Aware Components

Priority: High | Effort: 1-2 days

The useVisitorSegment() hook is working but no components use it yet.

Implementation:

  • Create ForSegment wrapper component
  • Create ForPropensity wrapper component
  • Implement segment-specific UI:

    - Cold: Brand story banner, trust badges
    - Interested: “Compare styles” prompt, size guide visibility
    - Warm: Customer reviews, “Still deciding?” message
    - Hot: Cart reminder bar, urgency messaging
    - Customer: Welcome back, cross-sell suggestions

Example usage (from spec):

```tsx
<ForSegment segments={['Warm', 'Hot', 'Customer']}>
  <CustomerReviews product={product} />
</ForSegment>
```
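One way to keep that wrapper testable is to put the gating decision in a pure predicate; `isSegmentAllowed` below is a hypothetical helper the component would call alongside `useVisitorSegment()`:

```typescript
// Hypothetical gating predicate behind a ForSegment wrapper.
type Segment = "Cold" | "Interested" | "Warm" | "Hot" | "Customer";

function isSegmentAllowed(current: Segment | null, allowed: Segment[]): boolean {
  // While the segment is unknown (hook still resolving, or no data),
  // render nothing rather than flashing the wrong experience.
  return current !== null && allowed.includes(current);
}
```

Inside the component this reduces to `return isSegmentAllowed(segment, segments) ? children : null;`.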

5. Personalization: Propensity Scoring

Priority: High | Effort: 1-2 days

Add propensity score alongside journey segment for finer-grained personalization.

The validated scoring model:

```
propensity_score (0-16) =
  Gallery swipes: (20+: +5, 10-19: +3, 5-9: +2, 1-4: +1)
  Sessions: (3+: +3, 2: +2)
  Products viewed: (3+: +2, 2: +1)
  AU/NZ location: +1
  High-intent source (IG Click, Google Shopping): +2
```
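As a reference, the listed factors translate directly into a pure function (a sketch only; the production score is meant to be computed in BigQuery, and note the itemised factors here sum to at most 13):

```typescript
// Input names are illustrative; the real values come from BigQuery.
interface PropensityInputs {
  gallerySwipes: number;
  sessions: number;
  productsViewed: number;
  isAuNz: boolean;
  highIntentSource: boolean; // e.g. IG Click, Google Shopping
}

function computePropensityScore(v: PropensityInputs): number {
  let score = 0;
  // Gallery swipes: 20+ -> +5, 10-19 -> +3, 5-9 -> +2, 1-4 -> +1
  if (v.gallerySwipes >= 20) score += 5;
  else if (v.gallerySwipes >= 10) score += 3;
  else if (v.gallerySwipes >= 5) score += 2;
  else if (v.gallerySwipes >= 1) score += 1;
  // Sessions: 3+ -> +3, exactly 2 -> +2
  if (v.sessions >= 3) score += 3;
  else if (v.sessions === 2) score += 2;
  // Products viewed: 3+ -> +2, exactly 2 -> +1
  if (v.productsViewed >= 3) score += 2;
  else if (v.productsViewed === 2) score += 1;
  if (v.isAuNz) score += 1;
  if (v.highIntentSource) score += 2;
  return score;
}
```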

Key thresholds:

| Score | Precision | Recall | Use Case |
| --- | --- | --- | --- |
| >= 5 | 16% | 51.5% | Standard personalization |
| >= 8 | 34% | 27% | Discount offers |
| >= 10 | 54% | 15% | VIP treatment |

Implementation:

  • Compute propensity_score in BigQuery (already in schema)
  • Return score in /api/ping response
  • Store in localStorage as _ps
  • Create usePropensityScore() hook
  • Trigger offers based on thresholds
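Mapping the score onto the validated thresholds could then be as simple as this sketch (the tier names are illustrative labels, not from the spec):

```typescript
type OfferTier = "vip" | "discount" | "standard" | "none";

// Thresholds from the precision/recall table: >=10 VIP, >=8 discount,
// >=5 standard personalization, otherwise no special treatment.
function offerTier(propensityScore: number): OfferTier {
  if (propensityScore >= 10) return "vip";
  if (propensityScore >= 8) return "discount";
  if (propensityScore >= 5) return "standard";
  return "none";
}
```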

6. Tracking Gaps to Fix

Priority: Critical (for attribution accuracy)

| Gap | Impact | Fix |
| --- | --- | --- |
| Order-to-visitor linkage | 78% of purchases unattributed | Store visitor_id in Shopify order notes via cart attributes |
| Click ID persistence | fbclid/gclid lost at checkout | Server-side tracking (Meta CAPI, Google Enhanced Conversions) |
| Email capture rate | Only 0.1% of visitors captured | Exit-intent popup, “Save cart” feature |
| Checkout events | No visibility between cart and purchase | Add checkout_started, checkout_completed events |
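For the order-to-visitor fix, the storefront would attach `visitor_id` as a cart attribute so it survives onto the Shopify order. A sketch assuming Shopify's AJAX Cart API (`/cart/update.js` accepts an `attributes` object); verify against the existing storefront integration before relying on it:

```typescript
// Build the /cart/update.js payload carrying the visitor_id cart attribute.
function buildCartAttributePayload(visitorId: string): {
  attributes: Record<string, string>;
} {
  return { attributes: { visitor_id: visitorId } };
}

// In the storefront (illustrative):
// await fetch("/cart/update.js", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildCartAttributePayload(visitorId)),
// });
```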

7. Validation Tasks

Priority: Medium | Effort: 2-4 hours

  • Validate Cosmos to BigQuery data integrity
  • Validate Shopify order counts (49 synced vs actual)
  • Spot-check 10 random visitor journeys
  • Compare BigQuery ROAS with Meta dashboard
  • Document expected differences
  • Performance test sync functions (< 15 min)
  • Test segment API response time (< 500ms)
  • Review and adjust journey stage thresholds

Key Insights from Analysis

These findings from the initial BigQuery analysis inform the pending work:

| Finding | Implication |
| --- | --- |
| Instagram Click converts 7.5x better than Direct | Shift ad spend to Instagram Click |
| Gallery swipes 20+ = 50%+ cart rate | Surface gallery earlier, trigger CTAs after swipes |
| Sunday converts 2x vs Tuesday | Schedule campaigns for Sunday |
| 517 Warm visitors stuck | High-intent, need personalization nudge |
| 86% cart abandonment | Above average but recoverable; address shipping/size uncertainty |
| TikTok: 355 visitors, 0 carts | Pause or fix targeting |
| Retargeting: 2% of spend, 0 carts | Increase budget to 30%, segment properly |

Files Reference

The original implementation is documented in:

  • analytics/data/constants.ts - BigQuery config
  • analytics/types/bigquery.ts - Row types
  • analytics/repositories/bigquery.ts - Client
  • analytics/repositories/bigqueryTransformers.ts - Data transforms
  • analytics/src/functions/scheduledBigQuerySync.ts - Hourly sync
  • analytics/src/functions/scheduledShopifySync.ts - Shopify orders
  • analytics/src/functions/scheduledMetaAdsSync.ts - Meta Ads spend
  • analytics/sql/views/*.sql - Analytics views
  • mcp/src/arlem-analytics/tools/bigquery.ts - MCP tools
  • web/lib/hooks/use-visitor-segment.ts - Segment hook

Cost

Current monthly cost: ~$5-10 (BigQuery streaming inserts are the main variable)

BigQuery storage is 12x cheaper than Cosmos DB, so adding this capability actually reduced overall costs.