
Juniro AI Architecture

Version: 0.1 (Draft)
Last Updated: January 2026
Status: Pre-implementation


Out of Scope for v0.1

The following AI features are explicitly not included in this version:

  • Custom-trained ML models
  • Real-time personalization engine
  • Dynamic pricing algorithms
  • Voice interfaces / speech recognition
  • Computer vision / image analysis
  • AI-powered chat support (use off-the-shelf: Intercom, Crisp)
  • Recommendation engine with collaborative filtering
  • Sentiment analysis pipeline
  • Automated content moderation

Table of Contents

  1. Overview
  2. Stack Decisions
  3. Architecture
  4. Data Residency
  5. What Gets Embedded
  6. Database Schema
  7. API Endpoints
  8. Implementation
  9. Cost Estimates
  10. Phased Rollout
  11. Engineering Rules

1. Overview

Juniro's AI capabilities enhance discovery and content creation without being core to the product. The platform must work fully without AI (graceful degradation).

Core Principle

AI is a feature enhancement, not the product. Keep it simple, cheap, and swappable.

MVP AI Features

Feature            Description                          User Benefit
-----------------  -----------------------------------  ---------------------------------
Semantic Search    Natural language activity search     "art classes for shy 5-year-old"
Review Summaries   AI-generated summary of reviews      Quick decision making
Content Assistant  Help providers write descriptions    Better listings

2. Stack Decisions

LLM Provider: Groq (US-based)

Why Groq:

  • US-based company (data stays in US)
  • Extremely fast inference (~500 tokens/sec)
  • Generous free tier (14,400 requests/day)
  • OpenAI-compatible API
  • Llama 3.1 models (open, capable)

Rejected alternatives:

Provider      Reason for rejection
------------  --------------------------------------
DeepSeek API  China-based, data residency concerns
OpenAI        Expensive for MVP
Anthropic     Expensive for MVP
Self-hosted   Ops burden too high for MVP

Embeddings Provider: Together AI (US-based)

Why Together AI:

  • US-based company
  • Dedicated embeddings API
  • Competitive pricing ($0.008/M tokens)
  • Wide model selection

Vector Database: pgvector

Why pgvector:

  • Free (Postgres extension)
  • Already using Postgres
  • No additional infrastructure
  • Good enough for MVP scale
  • Stays in regional database (data residency)

Summary

┌─────────────────────────────────────────┐
│ MVP AI Stack │
├─────────────────────────────────────────┤
│ LLM: Groq (Llama 3.1 8B) │
│ Embeddings: Together AI │
│ Vector DB: pgvector (in Postgres) │
│ Cache: Redis │
└─────────────────────────────────────────┘

3. Architecture

High-Level Flow

┌─────────────────────────────────────────────────────────┐
│                        Frontend                         │
│           (No direct AI calls - all via API)            │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                     Regional API                        │
│  ┌─────────────────────────────────────────────────┐    │
│  │               AI Service Layer                  │    │
│  │  • Strips PII before external calls             │    │
│  │  • Caches responses (cost control)              │    │
│  │  • Rate limits per user                         │    │
│  │  • Falls back gracefully if AI unavailable      │    │
│  └──────────────┬───────────────────┬──────────────┘    │
│                 │                   │                   │
│  ┌──────────────▼──────┐  ┌─────────▼──────────────┐    │
│  │  Regional Postgres  │  │      Redis Cache       │    │
│  │     + pgvector      │  │    (response cache)    │    │
│  │    (embeddings)     │  │                        │    │
│  └─────────────────────┘  └────────────────────────┘    │
└─────────────────────┬───────────────────────────────────┘
                      │
         ┌────────────┼────────────┐
         │            │            │
         ▼            ▼            ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │   Groq   │ │ Together │ │  Redis   │
   │   (US)   │ │   (US)   │ │  Cache   │
   │   LLM    │ │Embeddings│ │          │
   └──────────┘ └──────────┘ └──────────┘
Request Flow: Semantic Search

1. User: "art classes for shy 5-year-old near Austin"

2. API receives request, checks cache

3. Cache miss → Generate embedding (Together AI)

4. Vector search in pgvector (top 20 matches)

5. Optional: LLM formats/ranks results (Groq)

6. Cache response (1 hour TTL)

7. Return results to user

Request Flow: Review Summary

1. User views activity page

2. API checks cache for summary

3. Cache miss → Fetch reviews from DB

4. Send reviews to Groq (anonymized)

5. Cache summary (24 hour TTL)

6. Return summary to user
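
This flow has no implementation in Section 8, so here is a minimal sketch of steps 2-6, assuming the AI provider factory and Redis client from Section 8, the stripPII helper defined in Section 4, and a Prisma review model (the model and field names are assumptions):

// Sketch only - cached, anonymized review summarization (steps 2-6 above).
// Assumes aiProvider (Section 8), redis, stripPII (Section 4), and a
// Prisma `review` model; adjust names to the real schema.
export async function getReviewSummary(activityId: string): Promise<string> {
  const cacheKey = `summary:activity:${activityId}`
  const cached = await redis.get(cacheKey)
  if (cached) return cached

  // Fetch reviews from the regional DB (step 3)
  const reviews = await db.review.findMany({
    where: { activityId },
    select: { text: true },
    take: 50, // keep the prompt within token limits
  })

  // Anonymize before anything leaves the region (step 4)
  const safeText = reviews.map((r) => stripPII(r.text)).join('\n---\n')

  const summary = await aiProvider.complete({
    system: 'Summarize these parent reviews in 2-3 sentences. Do not mention names.',
    prompt: safeText,
    maxTokens: 150,
  })

  await redis.set(cacheKey, summary, 'EX', 86400) // step 5: 24 hour TTL
  return summary
}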

4. Data Residency

PII Rules

NEVER send to external AI services:
├── User names
├── Email addresses
├── Phone numbers
├── Physical addresses
├── Child names or identifiable info
├── Payment information
└── Any data that identifies a specific person

OK to send:
├── Activity descriptions (public content)
├── Provider business descriptions (public)
├── Anonymized queries ("classes for 5-year-old")
├── Review text (with names stripped)
└── Help documentation

Regional Considerations

US Region:
├── Groq API (US servers) ✓
├── Together AI (US servers) ✓
└── pgvector in us-central1 ✓

India Region:
├── Groq API (US servers) - OK for non-PII
├── Together AI (US servers) - OK for non-PII
└── pgvector in asia-south1 ✓ (embeddings stay in region)

PII Stripping Implementation

function stripPII(text: string): string {
  return text
    // Remove emails
    .replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL]')
    // Remove phone numbers
    .replace(/(\+\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g, '[PHONE]')
    // Remove names (if tagged)
    .replace(/\[NAME:.*?\]/g, '[NAME]')
}
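
A quick example of the expected behavior:

// Input
stripPII('Great class! Email me at jane@example.com or call 512-555-0123.')
// Output
// 'Great class! Email me at [EMAIL] or call [PHONE].'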

5. What Gets Embedded

Content                         Embed?  Reason
------------------------------  ------  ------------------------------------------
Activity title + description    Yes     Semantic search
Provider business description   Yes     Provider discovery
Help docs / FAQs                Yes     Support Q&A
Category names                  Yes     Search enhancement
Reviews                         No      Query by activity_id, summarize on-demand
User data                       Never   PII - no embeddings
Booking data                    Never   PII - no embeddings

Embedding Update Strategy

When to re-embed:
├── Activity created → Embed immediately
├── Activity description updated → Re-embed
├── Provider description updated → Re-embed
└── Batch job: Re-embed all weekly (catch drift)

When NOT to re-embed:
├── Price changes (not in embedding)
├── Schedule changes
├── Booking counts
└── Review additions (reviews not embedded)
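
A minimal sketch of that decision, assuming the Prisma client (db) and embedding provider from Section 8. Hashing only the embedded fields means price, schedule, and review changes skip re-embedding automatically; embeddableText is a hypothetical helper covering exactly the fields that get embedded:

// Sketch only - re-embed on content change, skip everything else.
import { createHash } from 'crypto'

type EmbeddableActivity = { id: string; title: string; description: string }

function embeddableText(a: EmbeddableActivity): string {
  return `${a.title}\n${a.description}`
}

export async function maybeReembedActivity(
  before: EmbeddableActivity,
  after: EmbeddableActivity
) {
  const oldHash = createHash('sha256').update(embeddableText(before)).digest('hex')
  const newHash = createHash('sha256').update(embeddableText(after)).digest('hex')
  if (oldHash === newHash) return // price/schedule/review change - nothing to do

  const embedding = await embeddingProvider.embed(embeddableText(after))
  // The vector column is Unsupported in Prisma, so write via raw SQL,
  // serialized as a pgvector literal: '[0.1,0.2,...]'
  await db.$executeRaw`
    UPDATE activities
    SET embedding = ${`[${embedding.join(',')}]`}::vector,
        embedding_updated_at = NOW()
    WHERE id = ${after.id}::uuid
  `
}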

6. Database Schema

pgvector Setup

-- Enable extension (both US and IN regions)
CREATE EXTENSION IF NOT EXISTS vector;

-- Activity embeddings
ALTER TABLE activities
  ADD COLUMN IF NOT EXISTS embedding vector(768),
  ADD COLUMN IF NOT EXISTS embedding_updated_at TIMESTAMP;

-- Provider embeddings
ALTER TABLE providers
  ADD COLUMN IF NOT EXISTS embedding vector(768),
  ADD COLUMN IF NOT EXISTS embedding_updated_at TIMESTAMP;

-- Help docs table
CREATE TABLE IF NOT EXISTS help_docs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  slug VARCHAR(255) UNIQUE NOT NULL,
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  category VARCHAR(100), -- 'parent' | 'provider' | 'general'
  embedding vector(768),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Vector indexes (IVFFlat - good balance of speed/accuracy)
CREATE INDEX IF NOT EXISTS idx_activities_embedding
  ON activities USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

CREATE INDEX IF NOT EXISTS idx_providers_embedding
  ON providers USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 50);

CREATE INDEX IF NOT EXISTS idx_help_docs_embedding
  ON help_docs USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 20);

Prisma Schema Addition

model Activity {
  // ... existing fields
  embedding          Unsupported("vector(768)")?
  embeddingUpdatedAt DateTime? @map("embedding_updated_at")
}

model Provider {
  // ... existing fields
  embedding          Unsupported("vector(768)")?
  embeddingUpdatedAt DateTime? @map("embedding_updated_at")
}

model HelpDoc {
  id        String   @id @default(uuid())
  slug      String   @unique
  title     String
  content   String
  category  String?
  embedding Unsupported("vector(768)")?
  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("help_docs")
}
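
Because the vector columns are Unsupported, Prisma's typed client can't read or write them; raw SQL can. A sketch of the weekly batch job from Section 5, assuming the db client and the embedBatch method from Section 8 (the function name and batch size are assumptions):

// Sketch only - batch job for rows with missing embeddings (Section 5).
// Uses raw SQL because Prisma can't bind Unsupported vector columns.
export async function backfillActivityEmbeddings(batchSize = 64): Promise<number> {
  const rows = await db.$queryRaw<{ id: string; title: string; description: string }[]>`
    SELECT id, title, description
    FROM activities
    WHERE status = 'published' AND embedding IS NULL
    LIMIT ${batchSize}
  `
  if (rows.length === 0) return 0

  // One batched call is cheaper than N single embed calls
  const embeddings = await embeddingProvider.embedBatch(
    rows.map((r) => `${r.title}\n${r.description}`)
  )

  for (let i = 0; i < rows.length; i++) {
    await db.$executeRaw`
      UPDATE activities
      SET embedding = ${`[${embeddings[i].join(',')}]`}::vector,
          embedding_updated_at = NOW()
      WHERE id = ${rows[i].id}::uuid
    `
  }
  return rows.length
}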

7. API Endpoints

AI Routes

POST /v1/ai/search                  # Semantic activity search
GET  /v1/ai/activities/:id/summary  # Review summary
POST /v1/ai/assist/description      # Provider content help

Request/Response Formats

// POST /v1/ai/search
// Request
{
  "query": "art classes for shy 5-year-old",
  "filters": {
    "city": "Austin",
    "ageMin": 4,
    "ageMax": 6
  },
  "limit": 20
}

// Response
{
  "success": true,
  "data": {
    "activities": [
      {
        "id": "uuid",
        "title": "Creative Art Explorers",
        "description": "...",
        "relevanceScore": 0.92
      }
    ],
    "queryUnderstanding": "Looking for art activities suitable for a 5-year-old who may be introverted"
  }
}

Review Summary

// GET /v1/ai/activities/:id/summary
// Response
{
  "success": true,
  "data": {
    "summary": "Parents consistently praise the patient instructors and small class sizes. Great for beginners.",
    "highlights": ["Patient instructors", "Small classes", "Beginner-friendly"],
    "reviewCount": 24,
    "cachedAt": "2026-01-08T12:00:00Z"
  }
}

Content Assistant

// POST /v1/ai/assist/description
// Request
{
  "draft": "We teach kids art. Classes on Saturday.",
  "activityType": "art",
  "tone": "friendly"
}

// Response
{
  "success": true,
  "data": {
    "improved": "Spark your child's creativity in our Saturday art classes! Our experienced instructors guide young artists through fun projects...",
    "suggestions": [
      "Add age range",
      "Mention class size",
      "Include what materials are provided"
    ]
  }
}

Rate Limits

Endpoint     Limit  Window
-----------  -----  ----------
/ai/search   20     per minute
/ai/summary  10     per minute
/ai/assist   5      per minute
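
A minimal per-user limiter sketch behind these numbers, using a fixed-window Redis counter (the key scheme and function name are assumptions):

// Sketch only - fixed-window per-user rate limit in Redis.
// Window is 60 seconds to match the table above.
export async function checkAIRateLimit(
  userId: string,
  endpoint: 'search' | 'summary' | 'assist',
  limit: number
): Promise<boolean> {
  const window = Math.floor(Date.now() / 60_000) // current minute
  const key = `ratelimit:ai:${endpoint}:${userId}:${window}`

  const count = await redis.incr(key)
  if (count === 1) {
    await redis.expire(key, 60) // window keys clean themselves up
  }
  return count <= limit // false -> respond with 429
}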

8. Implementation

AI Service Layer

// src/services/ai/index.ts

export interface AIProvider {
  complete(params: CompletionParams): Promise<string>
}

export interface EmbeddingProvider {
  embed(text: string): Promise<number[]>
  embedBatch(texts: string[]): Promise<number[][]>
}

// Factory - swap providers easily
export function createAIProvider(): AIProvider {
  return new GroqProvider()
}

export function createEmbeddingProvider(): EmbeddingProvider {
  return new TogetherEmbeddingProvider()
}

Groq Provider

// src/services/ai/providers/groq.ts

const GROQ_BASE = 'https://api.groq.com/openai/v1'

export class GroqProvider implements AIProvider {
  async complete(params: CompletionParams): Promise<string> {
    const res = await fetch(`${GROQ_BASE}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${config.groq.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'llama-3.1-8b-instant',
        messages: [
          { role: 'system', content: params.system },
          { role: 'user', content: params.prompt },
        ],
        max_tokens: params.maxTokens ?? 500,
        temperature: params.temperature ?? 0.7,
      }),
    })

    if (!res.ok) {
      throw new AIProviderError('Groq API error', res.status)
    }

    const data = await res.json()
    return data.choices[0].message.content
  }
}

Together Embeddings Provider

// src/services/ai/providers/together.ts

const TOGETHER_BASE = 'https://api.together.xyz/v1'

export class TogetherEmbeddingProvider implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    const res = await fetch(`${TOGETHER_BASE}/embeddings`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${config.together.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'togethercomputer/m2-bert-80M-8k-retrieval',
        input: text,
      }),
    })

    if (!res.ok) {
      throw new AIProviderError('Together API error', res.status)
    }

    const data = await res.json()
    return data.data[0].embedding
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    const res = await fetch(`${TOGETHER_BASE}/embeddings`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${config.together.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'togethercomputer/m2-bert-80M-8k-retrieval',
        input: texts,
      }),
    })

    if (!res.ok) {
      throw new AIProviderError('Together API error', res.status)
    }

    const data = await res.json()
    return data.data.map((d: any) => d.embedding)
  }
}

Semantic Search Service

// src/services/ai/search.ts

export async function semanticActivitySearch(
  query: string,
  filters?: SearchFilters
): Promise<SearchResult[]> {
  // 1. Check cache
  const cacheKey = `search:${hash(query, filters)}`
  const cached = await redis.get(cacheKey)
  if (cached) return JSON.parse(cached)

  // 2. Strip PII from query
  const safeQuery = stripPII(query)

  // 3. Generate embedding, serialized as a pgvector literal ('[0.1,0.2,...]')
  //    since Prisma can't bind a number[] to a vector column directly
  const embedding = await embeddingProvider.embed(safeQuery)
  const vector = `[${embedding.join(',')}]`

  // 4. Vector search with filters (<=> is pgvector's cosine distance operator)
  const results = await db.$queryRaw<SearchResult[]>`
    SELECT
      id, title, description, category,
      price_cents, age_min, age_max, city,
      1 - (embedding <=> ${vector}::vector) AS relevance_score
    FROM activities
    WHERE status = 'published'
      AND embedding IS NOT NULL
      ${filters?.city ? Prisma.sql`AND city = ${filters.city}` : Prisma.empty}
      ${filters?.ageMin ? Prisma.sql`AND age_max >= ${filters.ageMin}` : Prisma.empty}
      ${filters?.ageMax ? Prisma.sql`AND age_min <= ${filters.ageMax}` : Prisma.empty}
    ORDER BY embedding <=> ${vector}::vector
    LIMIT ${filters?.limit ?? 20}
  `

  // 5. Cache results (1 hour)
  await redis.set(cacheKey, JSON.stringify(results), 'EX', 3600)

  return results
}
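
The hash helper above isn't specified anywhere in this doc; a minimal sketch using Node's crypto module (it assumes filters are always built with a consistent key order; sort keys first if that isn't guaranteed):

// Sketch only - stable cache key for a query + filters pair.
import { createHash } from 'crypto'

export function hash(query: string, filters?: SearchFilters): string {
  const normalized = JSON.stringify({
    q: query.trim().toLowerCase(),
    f: filters ?? {},
  })
  return createHash('sha256').update(normalized).digest('hex').slice(0, 32)
}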

Graceful Degradation

// Always have fallback when AI is unavailable

export async function searchActivities(query: string, filters?: SearchFilters) {
  try {
    // Try semantic search first
    return await semanticActivitySearch(query, filters)
  } catch (error) {
    logger.warn('AI search failed, falling back to text search', { error })

    // Fallback to basic text search
    return await textSearchActivities(query, filters)
  }
}
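
textSearchActivities isn't defined in this doc either; one plausible sketch uses Postgres full-text search, which needs no external calls at all (function shape and ranking are assumptions):

// Sketch only - keyword fallback via Postgres full-text search.
// Works even when every AI provider is down.
export async function textSearchActivities(
  query: string,
  filters?: SearchFilters
): Promise<SearchResult[]> {
  return db.$queryRaw<SearchResult[]>`
    SELECT id, title, description, category,
           price_cents, age_min, age_max, city,
           ts_rank(to_tsvector('english', title || ' ' || description),
                   plainto_tsquery('english', ${query})) AS relevance_score
    FROM activities
    WHERE status = 'published'
      AND to_tsvector('english', title || ' ' || description)
          @@ plainto_tsquery('english', ${query})
      ${filters?.city ? Prisma.sql`AND city = ${filters.city}` : Prisma.empty}
    ORDER BY relevance_score DESC
    LIMIT ${filters?.limit ?? 20}
  `
}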

9. Cost Estimates

Per-Operation Costs

Operation            Provider  Cost
-------------------  --------  ------------------------------
LLM (Llama 3.1 8B)   Groq      $0.05/M input, $0.08/M output
LLM (Llama 3.1 70B)  Groq      $0.59/M input, $0.79/M output
Embeddings           Together  $0.008/M tokens
Vector search        pgvector  $0 (in Postgres)

Monthly Estimates

Scale       AI Searches  Summaries  Assists  Total Cost
----------  -----------  ---------  -------  ----------
1K users    5K           1K         500      ~$5
10K users   50K          10K        2K       ~$20-30
50K users   250K         50K        10K      ~$100-150
100K users  500K         100K       20K      ~$200-300

Cost Control Measures

  1. Aggressive caching - Same query = cached response (1-24h TTL)
  2. Rate limiting - Per-user limits prevent abuse
  3. Smaller models first - Use 8B model, escalate to 70B only if needed
  4. Embedding batching - Batch embed new activities (cheaper)
  5. Response length limits - Cap max_tokens

10. Phased Rollout

Phase 1: MVP (Now)

Stack:
├── Groq API (Llama 3.1 8B)
├── Together AI (embeddings)
├── pgvector (in regional Postgres)
└── Redis (response cache)

Features:
├── Semantic search (basic)
├── Review summaries (cached)
└── Provider content assistant

Cost: ~$20-30/month

Phase 2: Enhanced (When needed)

Additions:
├── Larger model for complex queries (Llama 3.1 70B)
├── Query understanding / intent detection
├── Search result re-ranking
└── A/B testing framework for AI features

Trigger: User feedback indicates search quality issues

Phase 3: Scale (500K+ users)

Options:
├── Self-hosted models (if cost > $500/month)
│   ├── Ollama + Llama 3.1 on GCP
│   ├── One GPU per region
│   └── ~$1,000/month total
└── Or stay on APIs (if cost-effective)
    └── Volume discounts from providers

Phase 4: Advanced (Future)

Consider when product-market fit proven:
├── Custom fine-tuned models
├── Real-time personalization
├── Recommendation engine
├── Dynamic pricing
└── Voice/vision AI

11. Engineering Rules

Must Do

  1. Never send PII to external AI services
  2. Always have fallback - Platform works without AI
  3. Cache aggressively - Identical queries return cached responses
  4. Rate limit all AI endpoints - Prevent abuse and cost overruns
  5. Log AI usage - Track costs, latency, error rates
  6. Abstract providers - Easy to swap Groq ↔ Together ↔ self-hosted

Must Not Do

  1. Never store AI responses as source of truth - AI can hallucinate
  2. Never make AI required for core flows - Booking, payment must work without AI
  3. Never expose API keys to frontend - All AI calls go through backend
  4. Never skip PII stripping - Even for "internal" calls
  5. Never trust AI for safety-critical decisions - Human review required

Monitoring

// Track these metrics
const aiMetrics = {
  // Latency
  'ai.search.latency_ms': histogram,
  'ai.summary.latency_ms': histogram,

  // Costs
  'ai.tokens.input': counter,
  'ai.tokens.output': counter,

  // Errors
  'ai.errors.provider': counter,
  'ai.errors.timeout': counter,

  // Cache
  'ai.cache.hit': counter,
  'ai.cache.miss': counter,
}

Environment Variables

# AI - LLM (Groq)
GROQ_API_KEY=gsk_...

# AI - Embeddings (Together AI)
TOGETHER_API_KEY=...

# AI - Feature flags
AI_SEARCH_ENABLED=true
AI_SUMMARIES_ENABLED=true
AI_ASSIST_ENABLED=true

# AI - Rate limits
AI_SEARCH_RATE_LIMIT=20
AI_SUMMARY_RATE_LIMIT=10
AI_ASSIST_RATE_LIMIT=5
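
A sketch of how these might load into a typed config object (the module shape is an assumption; the env var names match the list above):

// Sketch only - typed config over the env vars above.
export const aiConfig = {
  searchEnabled: process.env.AI_SEARCH_ENABLED === 'true',
  summariesEnabled: process.env.AI_SUMMARIES_ENABLED === 'true',
  assistEnabled: process.env.AI_ASSIST_ENABLED === 'true',
  rateLimits: {
    search: Number(process.env.AI_SEARCH_RATE_LIMIT ?? 20),
    summary: Number(process.env.AI_SUMMARY_RATE_LIMIT ?? 10),
    assist: Number(process.env.AI_ASSIST_RATE_LIMIT ?? 5),
  },
}

// Routes check the flag before touching any provider, e.g.
// if (!aiConfig.searchEnabled) return textSearchActivities(query, filters)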