Juniro AI Architecture
Version: 0.1 (Draft) | Last Updated: January 2026 | Status: Pre-implementation
Out of Scope for v0.1
The following AI features are explicitly not included in this version:
- Custom-trained ML models
- Real-time personalization engine
- Dynamic pricing algorithms
- Voice interfaces / speech recognition
- Computer vision / image analysis
- AI-powered chat support (use off-the-shelf: Intercom, Crisp)
- Recommendation engine with collaborative filtering
- Sentiment analysis pipeline
- Automated content moderation
Table of Contents
- Overview
- Stack Decisions
- Architecture
- Data Residency
- What Gets Embedded
- Database Schema
- API Endpoints
- Implementation
- Cost Estimates
- Phased Rollout
- Engineering Rules
1. Overview
Juniro's AI capabilities enhance discovery and content creation without being core to the product. The platform must work fully without AI (graceful degradation).
Core Principle
AI is a feature enhancement, not the product. Keep it simple, cheap, and swappable.
MVP AI Features
| Feature | Description | User Benefit |
|---|---|---|
| Semantic Search | Natural language activity search | "art classes for shy 5-year-old" |
| Review Summaries | AI-generated summary of reviews | Quick decision making |
| Content Assistant | Help providers write descriptions | Better listings |
2. Stack Decisions
LLM Provider: Groq (US-based)
Why Groq:
- US-based company (data stays in US)
- Extremely fast inference (~500 tokens/sec)
- Generous free tier (14,400 requests/day)
- OpenAI-compatible API
- Llama 3.1 models (open, capable)
Rejected alternatives:
| Provider | Reason for rejection |
|---|---|
| DeepSeek API | China-based, data residency concerns |
| OpenAI | Expensive for MVP |
| Anthropic | Expensive for MVP |
| Self-hosted | Ops burden too high for MVP |
Embeddings Provider: Together AI (US-based)
Why Together AI:
- US-based company
- Dedicated embeddings API
- Competitive pricing ($0.008/M tokens)
- Wide model selection
Vector Database: pgvector
Why pgvector:
- Free (Postgres extension)
- Already using Postgres
- No additional infrastructure
- Good enough for MVP scale
- Stays in regional database (data residency)
Summary
┌─────────────────────────────────────────┐
│ MVP AI Stack │
├─────────────────────────────────────────┤
│ LLM: Groq (Llama 3.1 8B) │
│ Embeddings: Together AI │
│ Vector DB: pgvector (in Postgres) │
│ Cache: Redis │
└─────────────────────────────────────────┘
3. Architecture
High-Level Flow
┌─────────────────────────────────────────────────────────┐
│ Frontend │
│ (No direct AI calls - all via API) │
└─────────────────────┬───────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────┐
│ Regional API │
│ ┌─────────────────────────────────────────────────┐ │
│ │ AI Service Layer │ │
│ │ • Strips PII before external calls │ │
│ │ • Caches responses (cost control) │ │
│ │ • Rate limits per user │ │
│ │ • Falls back gracefully if AI unavailable │ │
│ └──────────────┬───────────────────┬──────────────┘ │
│ │ │ │
│ ┌──────────────▼──────┐ ┌─────────▼──────────────┐ │
│ │ Regional Postgres │ │ Redis Cache │ │
│ │ + pgvector │ │ (response cache) │ │
│ │ (embeddings) │ │ │ │
│ └──────────────────────┘ └────────────────────────┘ │
└─────────────────────┬───────────────────────────────────┘
│
          ┌─────────────┼──────────────┐
          │             │              │
          ▼             ▼              ▼
    ┌──────────┐  ┌────────────┐  ┌──────────┐
    │   Groq   │  │  Together  │  │  Redis   │
    │   (US)   │  │    (US)    │  │  Cache   │
    │   LLM    │  │ Embeddings │  │          │
    └──────────┘  └────────────┘  └──────────┘
Request Flow: Semantic Search
1. User: "art classes for shy 5-year-old near Austin"
│
2. API receives request, checks cache
│
3. Cache miss → Generate embedding (Together AI)
│
4. Vector search in pgvector (top 20 matches)
│
5. Optional: LLM formats/ranks results (Groq)
│
6. Cache response (1 hour TTL)
│
7. Return results to user
Request Flow: Review Summary
1. User views activity page
│
2. API checks cache for summary
│
3. Cache miss → Fetch reviews from DB
│
4. Send reviews to Groq (anonymized)
│
5. Cache summary (24 hour TTL)
│
6. Return summary to user
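A minimal sketch of this flow, reusing the provider factory from §8 and stripPII from §4. Module paths, the Prisma review model, and its field names are assumptions, not existing code:
// src/services/ai/summary.ts (sketch)
import { redis } from '../cache'            // assumed Redis client module
import { db } from '../db'                  // assumed Prisma client module
import { createAIProvider } from './index'
import { stripPII } from './pii'
const ai = createAIProvider()
export async function getReviewSummary(activityId: string): Promise<string> {
  // Step 2: check cache
  const cacheKey = `summary:${activityId}`
  const cached = await redis.get(cacheKey)
  if (cached) return cached
  // Step 3: cache miss -> fetch reviews from the regional DB
  const reviews = await db.review.findMany({
    where: { activityId },
    select: { body: true },
    take: 50, // cap prompt size
  })
  // Step 4: anonymize, then summarize via Groq
  const text = reviews.map((r) => stripPII(r.body)).join('\n---\n')
  const summary = await ai.complete({
    system: 'Summarize these parent reviews in 2-3 sentences. Do not invent details.',
    prompt: text,
    maxTokens: 200,
  })
  // Step 5: cache for 24 hours
  await redis.set(cacheKey, summary, 'EX', 86400)
  return summary
}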
4. Data Residency
PII Rules
NEVER send to external AI services:
├── User names
├── Email addresses
├── Phone numbers
├── Physical addresses
├── Child names or identifiable info
├── Payment information
└── Any data that identifies a specific person
OK to send:
├── Activity descriptions (public content)
├── Provider business descriptions (public)
├── Anonymized queries ("classes for 5-year-old")
├── Review text (with names stripped)
└── Help documentation
Regional Considerations
US Region:
├── Groq API (US servers) ✓
├── Together AI (US servers) ✓
└── pgvector in us-central1 ✓
India Region:
├── Groq API (US servers) - OK for non-PII
├── Together AI (US servers) - OK for non-PII
└── pgvector in asia-south1 ✓ (embeddings stay in region)
PII Stripping Implementation
function stripPII(text: string): string {
return text
// Remove emails
.replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL]')
// Remove phone numbers
.replace(/(\+\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g, '[PHONE]')
// Remove names (if tagged)
.replace(/\[NAME:.*?\]/g, '[NAME]')
}
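Example input and output:
stripPII('Contact jane@example.com or 512-555-0199 about [NAME:Avery]')
// → 'Contact [EMAIL] or [PHONE] about [NAME]'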
5. What Gets Embedded
| Content | Embed? | Reason |
|---|---|---|
| Activity title + description | Yes | Semantic search |
| Provider business description | Yes | Provider discovery |
| Help docs / FAQs | Yes | Support Q&A |
| Category names | Yes | Search enhancement |
| Reviews | No | Query by activity_id, summarize on-demand |
| User data | Never | PII - no embeddings |
| Booking data | Never | PII - no embeddings |
Embedding Update Strategy
When to re-embed:
├── Activity created → Embed immediately
├── Activity description updated → Re-embed
├── Provider description updated → Re-embed
└── Batch job: Re-embed all weekly (catch drift)
When NOT to re-embed:
├── Price changes (not in embedding)
├── Schedule changes
├── Booking counts
└── Review additions (reviews not embedded)
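A sketch of this strategy, assuming the embedding factory from §8. Since Prisma cannot write Unsupported vector columns directly, the write goes through raw SQL; job wiring and model fields are assumptions:
// src/jobs/embeddings.ts (sketch)
import { db } from '../db'
import { createEmbeddingProvider } from '../services/ai'
const embedder = createEmbeddingProvider()
// Called from activity create/update handlers
export async function embedActivity(id: string, title: string, description: string) {
  const embedding = await embedder.embed(`${title}\n${description}`)
  await db.$executeRaw`
    UPDATE activities
    SET embedding = ${JSON.stringify(embedding)}::vector,
        embedding_updated_at = NOW()
    WHERE id = ${id}::uuid
  `
}
// Weekly batch: re-embed anything older than 7 days (catches drift)
export async function reembedStaleActivities() {
  const cutoff = new Date(Date.now() - 7 * 24 * 3600 * 1000)
  const stale = await db.activity.findMany({
    where: { embeddingUpdatedAt: { lt: cutoff } },
    select: { id: true, title: true, description: true },
    take: 500, // chunk to respect provider rate limits
  })
  for (const a of stale) {
    await embedActivity(a.id, a.title, a.description)
  }
}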
6. Database Schema
pgvector Setup
-- Enable extension (both US and IN regions)
CREATE EXTENSION IF NOT EXISTS vector;
-- Activity embeddings
ALTER TABLE activities
ADD COLUMN IF NOT EXISTS embedding vector(768),
ADD COLUMN IF NOT EXISTS embedding_updated_at TIMESTAMP;
-- Provider embeddings
ALTER TABLE providers
ADD COLUMN IF NOT EXISTS embedding vector(768),
ADD COLUMN IF NOT EXISTS embedding_updated_at TIMESTAMP;
-- Help docs table
CREATE TABLE IF NOT EXISTS help_docs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug VARCHAR(255) UNIQUE NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
category VARCHAR(100), -- 'parent' | 'provider' | 'general'
embedding vector(768),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Vector indexes (IVFFlat - good balance of speed/accuracy)
CREATE INDEX IF NOT EXISTS idx_activities_embedding
ON activities USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_providers_embedding
ON providers USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 50);
CREATE INDEX IF NOT EXISTS idx_help_docs_embedding
ON help_docs USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 20);
Prisma Schema Addition
model Activity {
// ... existing fields
embedding Unsupported("vector(768)")?
embeddingUpdatedAt DateTime? @map("embedding_updated_at")
}
model Provider {
// ... existing fields
embedding Unsupported("vector(768)")?
embeddingUpdatedAt DateTime? @map("embedding_updated_at")
}
model HelpDoc {
id String @id @default(uuid())
slug String @unique
title String
content String
category String?
embedding Unsupported("vector(768)")?
createdAt DateTime @default(now()) @map("created_at")
updatedAt DateTime @updatedAt @map("updated_at")
@@map("help_docs")
}
7. API Endpoints
AI Routes
POST /v1/ai/search # Semantic activity search
GET /v1/ai/activities/:id/summary # Review summary
POST /v1/ai/assist/description # Provider content help
Request/Response Formats
Semantic Search
// POST /v1/ai/search
// Request
{
"query": "art classes for shy 5-year-old",
"filters": {
"city": "Austin",
"ageMin": 4,
"ageMax": 6
},
"limit": 20
}
// Response
{
"success": true,
"data": {
"activities": [
{
"id": "uuid",
"title": "Creative Art Explorers",
"description": "...",
"relevanceScore": 0.92
}
],
"query_understanding": "Looking for art activities suitable for a 5-year-old who may be introverted"
}
}
Review Summary
// GET /v1/ai/activities/:id/summary
// Response
{
"success": true,
"data": {
"summary": "Parents consistently praise the patient instructors and small class sizes. Great for beginners.",
"highlights": ["Patient instructors", "Small classes", "Beginner-friendly"],
"reviewCount": 24,
"cachedAt": "2026-01-08T12:00:00Z"
}
}
Content Assistant
// POST /v1/ai/assist/description
// Request
{
"draft": "We teach kids art. Classes on Saturday.",
"activityType": "art",
"tone": "friendly"
}
// Response
{
"success": true,
"data": {
"improved": "Spark your child's creativity in our Saturday art classes! Our experienced instructors guide young artists through fun projects...",
"suggestions": [
"Add age range",
"Mention class size",
"Include what materials are provided"
]
}
}
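A sketch of the assist handler behind this endpoint, using the Groq provider from §8. The prompt wording is illustrative only, and output is length-capped per the cost controls in §9:
// src/services/ai/assist.ts (sketch)
import { createAIProvider } from './index'
const ai = createAIProvider()
export async function improveDescription(draft: string, activityType: string, tone: string) {
  const improved = await ai.complete({
    system: `You help activity providers polish listing descriptions. ` +
      `Tone: ${tone}. Never invent facts that are not in the draft.`,
    prompt: `Activity type: ${activityType}\nDraft: ${draft}\n\nRewrite this description.`,
    maxTokens: 300,
  })
  return { improved }
}
The suggestions array in the response could come from a second, structured prompt or from simple heuristics (missing age range, class size, materials).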
Rate Limits
| Endpoint | Limit | Window |
|---|---|---|
| /ai/search | 20 | per minute |
| /ai/summary | 10 | per minute |
| /ai/assist | 5 | per minute |
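One way to enforce these limits is a fixed-window counter in Redis, keyed per user and endpoint. A sketch, assuming Express-style middleware and a req.userId set by upstream auth:
// src/middleware/aiRateLimit.ts (sketch)
import type { Request, Response, NextFunction } from 'express'
import { redis } from '../cache' // assumed Redis client
export function aiRateLimit(name: string, limit: number) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // One counter per user per minute-window
    const window = Math.floor(Date.now() / 60_000)
    const key = `ratelimit:${name}:${(req as any).userId}:${window}`
    const count = await redis.incr(key)
    if (count === 1) await redis.expire(key, 60) // first hit starts the window
    if (count > limit) {
      return res.status(429).json({ success: false, error: 'Rate limit exceeded' })
    }
    next()
  }
}
// Usage: router.post('/v1/ai/search', aiRateLimit('search', 20), searchHandler)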
8. Implementation
AI Service Layer
// src/services/ai/index.ts
export interface CompletionParams {
  system: string
  prompt: string
  maxTokens?: number
  temperature?: number
}
export interface AIProvider {
  complete(params: CompletionParams): Promise<string>
}
export interface EmbeddingProvider {
embed(text: string): Promise<number[]>
embedBatch(texts: string[]): Promise<number[][]>
}
// Factory - swap providers easily
export function createAIProvider(): AIProvider {
return new GroqProvider()
}
export function createEmbeddingProvider(): EmbeddingProvider {
return new TogetherEmbeddingProvider()
}
Groq Provider
// src/services/ai/providers/groq.ts
const GROQ_BASE = 'https://api.groq.com/openai/v1'
export class GroqProvider implements AIProvider {
async complete(params: CompletionParams): Promise<string> {
const res = await fetch(`${GROQ_BASE}/chat/completions`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${config.groq.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'llama-3.1-8b-instant',
messages: [
{ role: 'system', content: params.system },
{ role: 'user', content: params.prompt },
],
max_tokens: params.maxTokens ?? 500,
temperature: params.temperature ?? 0.7,
}),
})
if (!res.ok) {
throw new AIProviderError('Groq API error', res.status)
}
const data = await res.json()
return data.choices[0].message.content
}
}
Together Embeddings Provider
// src/services/ai/providers/together.ts
const TOGETHER_BASE = 'https://api.together.xyz/v1'
export class TogetherEmbeddingProvider implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    const data = await this.request(text)
    return data.data[0].embedding
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    const data = await this.request(texts)
    return data.data.map((d: any) => d.embedding)
  }
  // Shared POST helper; accepts a single string or a batch
  private async request(input: string | string[]) {
    const res = await fetch(`${TOGETHER_BASE}/embeddings`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${config.together.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'togethercomputer/m2-bert-80M-8k-retrieval',
        input,
      }),
    })
    if (!res.ok) {
      throw new AIProviderError('Together API error', res.status)
    }
    return res.json()
  }
}
Semantic Search Service
// src/services/ai/search.ts
const embeddingProvider = createEmbeddingProvider()
export async function semanticActivitySearch(
  query: string,
  filters?: SearchFilters
): Promise<SearchResult[]> {
// 1. Check cache
const cacheKey = `search:${hash(query, filters)}`
const cached = await redis.get(cacheKey)
if (cached) return JSON.parse(cached)
// 2. Strip PII from query
const safeQuery = stripPII(query)
// 3. Generate embedding
const embedding = await embeddingProvider.embed(safeQuery)
  // 4. Vector search with filters
  // pgvector accepts the '[1,2,3,...]' text format, so serialize the array
  const vec = JSON.stringify(embedding)
  const results = await db.$queryRaw<SearchResult[]>`
    SELECT
      id, title, description, category,
      price_cents, age_min, age_max, city,
      1 - (embedding <=> ${vec}::vector) AS relevance_score
    FROM activities
    WHERE status = 'published'
      AND embedding IS NOT NULL
      ${filters?.city ? Prisma.sql`AND city = ${filters.city}` : Prisma.empty}
      ${filters?.ageMin ? Prisma.sql`AND age_max >= ${filters.ageMin}` : Prisma.empty}
      ${filters?.ageMax ? Prisma.sql`AND age_min <= ${filters.ageMax}` : Prisma.empty}
    ORDER BY embedding <=> ${vec}::vector
    LIMIT ${filters?.limit ?? 20}
  `
// 5. Cache results (1 hour)
await redis.set(cacheKey, JSON.stringify(results), 'EX', 3600)
return results
}
Graceful Degradation
// Always have fallback when AI is unavailable
export async function searchActivities(query: string, filters?: SearchFilters) {
try {
// Try semantic search first
return await semanticActivitySearch(query, filters)
} catch (error) {
logger.warn('AI search failed, falling back to text search', { error })
// Fallback to basic text search
return await textSearchActivities(query, filters)
}
}
9. Cost Estimates
Per-Operation Costs
| Operation | Provider | Cost |
|---|---|---|
| LLM (Llama 3.1 8B) | Groq | $0.05/M input, $0.08/M output |
| LLM (Llama 3.1 70B) | Groq | $0.59/M input, $0.79/M output |
| Embeddings | Together | $0.008/M tokens |
| Vector search | pgvector | $0 (in Postgres) |
Monthly Estimates
| Scale | AI Searches | Summaries | Assists | Total Cost |
|---|---|---|---|---|
| 1K users | 5K | 1K | 500 | ~$5 |
| 10K users | 50K | 10K | 2K | ~$20-30 |
| 50K users | 250K | 50K | 10K | ~$100-150 |
| 100K users | 500K | 100K | 20K | ~$200-300 |
Cost Control Measures
- Aggressive caching - Same query = cached response (1-24h TTL; see the cache-key sketch after this list)
- Rate limiting - Per-user limits prevent abuse
- Smaller models first - Use 8B model, escalate to 70B only if needed
- Embedding batching - Batch embed new activities (cheaper)
- Response length limits - Cap max_tokens
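The hash(query, filters) used for cache keys in the search service (§8) is not defined above; a minimal version (an assumption, not existing code) could be:
// src/services/ai/cacheKey.ts (sketch)
import { createHash } from 'node:crypto'
export function hash(query: string, filters?: SearchFilters): string {
  // Normalize so trivially different inputs share one cache entry
  const payload = JSON.stringify({ q: query.trim().toLowerCase(), f: filters ?? {} })
  return createHash('sha256').update(payload).digest('hex').slice(0, 32)
}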
10. Phased Rollout
Phase 1: MVP (Now)
Stack:
├── Groq API (Llama 3.1 8B)
├── Together AI (embeddings)
├── pgvector (in regional Postgres)
└── Redis (response cache)
Features:
├── Semantic search (basic)
├── Review summaries (cached)
└── Provider content assistant
Cost: ~$20-30/month
Phase 2: Enhanced (When needed)
Additions:
├── Larger model for complex queries (Llama 3.1 70B)
├── Query understanding / intent detection
├── Search result re-ranking
└── A/B testing framework for AI features
Trigger: User feedback indicates search quality issues
Phase 3: Scale (500K+ users)
Options:
├── Self-hosted models (if cost > $500/month)
│ ├── Ollama + Llama 3.1 on GCP
│ ├── One GPU per region
│ └── ~$1,000/month total
│
└── Or stay on APIs (if cost-effective)
└── Volume discounts from providers
Phase 4: Advanced (Future)
Consider when product-market fit proven:
├── Custom fine-tuned models
├── Real-time personalization
├── Recommendation engine
├── Dynamic pricing
└── Voice/vision AI
11. Engineering Rules
Must Do
- Never send PII to external AI services
- Always have fallback - Platform works without AI
- Cache aggressively - Identical queries return cached responses
- Rate limit all AI endpoints - Prevent abuse and cost overruns
- Log AI usage - Track costs, latency, error rates
- Abstract providers - Easy to swap Groq ↔ Together ↔ self-hosted
Must Not Do
- Never store AI responses as source of truth - AI can hallucinate
- Never make AI required for core flows - Booking, payment must work without AI
- Never expose API keys to frontend - All AI calls go through backend
- Never skip PII stripping - Even for "internal" calls
- Never trust AI for safety-critical decisions - Human review required
Monitoring
// Track these metrics
const aiMetrics = {
// Latency
'ai.search.latency_ms': histogram,
'ai.summary.latency_ms': histogram,
// Costs
'ai.tokens.input': counter,
'ai.tokens.output': counter,
// Errors
'ai.errors.provider': counter,
'ai.errors.timeout': counter,
// Cache
'ai.cache.hit': counter,
'ai.cache.miss': counter,
}
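A sketch of how these counters might be fed: wrap any AIProvider in a latency/error-recording decorator. MetricsClient is an assumed interface over whatever metrics backend is chosen:
// src/services/ai/instrumented.ts (sketch)
import type { AIProvider, CompletionParams } from './index'
interface MetricsClient {
  increment(name: string): void
  histogram(name: string, value: number): void
}
export class InstrumentedAIProvider implements AIProvider {
  constructor(
    private inner: AIProvider,
    private metrics: MetricsClient,
    private prefix: string, // e.g. 'ai.search' or 'ai.summary'
  ) {}
  async complete(params: CompletionParams): Promise<string> {
    const start = Date.now()
    try {
      return await this.inner.complete(params)
    } catch (err) {
      this.metrics.increment(`${this.prefix}.errors.provider`)
      throw err
    } finally {
      this.metrics.histogram(`${this.prefix}.latency_ms`, Date.now() - start)
    }
  }
}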
Environment Variables
# AI - LLM (Groq)
GROQ_API_KEY=gsk_...
# AI - Embeddings (Together AI)
TOGETHER_API_KEY=...
# AI - Feature flags
AI_SEARCH_ENABLED=true
AI_SUMMARIES_ENABLED=true
AI_ASSIST_ENABLED=true
# AI - Rate limits
AI_SEARCH_RATE_LIMIT=20
AI_SUMMARY_RATE_LIMIT=10
AI_ASSIST_RATE_LIMIT=5
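The config.groq.apiKey and config.together.apiKey references in §8 assume a small config module that reads these variables at startup, roughly:
// src/config.ts (sketch)
export const config = {
  groq: { apiKey: process.env.GROQ_API_KEY ?? '' },
  together: { apiKey: process.env.TOGETHER_API_KEY ?? '' },
  ai: {
    searchEnabled: process.env.AI_SEARCH_ENABLED === 'true',
    summariesEnabled: process.env.AI_SUMMARIES_ENABLED === 'true',
    assistEnabled: process.env.AI_ASSIST_ENABLED === 'true',
    searchRateLimit: Number(process.env.AI_SEARCH_RATE_LIMIT ?? 20),
    summaryRateLimit: Number(process.env.AI_SUMMARY_RATE_LIMIT ?? 10),
    assistRateLimit: Number(process.env.AI_ASSIST_RATE_LIMIT ?? 5),
  },
}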
Related Documents
- API Architecture - Backend API design
- AI Components - Frontend AI component specs