Juniro AI Architecture
Version: 0.1 (Draft) | Last Updated: January 2026 | Status: Pre-implementation
Out of Scope for v0.1
The following AI features are explicitly not included in this version:
- Custom-trained ML models
- Real-time personalization engine
- Dynamic pricing algorithms
- Voice interfaces / speech recognition
- Computer vision / image analysis
- AI-powered chat support (use off-the-shelf: Intercom, Crisp)
- Recommendation engine with collaborative filtering
- Sentiment analysis pipeline
- Automated content moderation
Table of Contents
- Overview
- Stack Decisions
- Architecture
- Data Residency
- What Gets Embedded
- Database Schema
- API Endpoints
- Implementation
- Cost Estimates
- Phased Rollout
- Engineering Rules
1. Overview
Juniro's AI capabilities enhance discovery and content creation without being core to the product. The platform must work fully without AI (graceful degradation).
Core Principle
AI is a feature enhancement, not the product. Keep it simple, cheap, and swappable.
MVP AI Features
| Feature | Description | User Benefit |
|---|---|---|
| Semantic Search | Natural language activity search | "art classes for shy 5-year-old" |
| Review Summaries | AI-generated summary of reviews | Quick decision making |
| Content Assistant | Help providers write descriptions | Better listings |
2. Stack Decisions
LLM Provider: Groq (US-based)
Why Groq:
- US-based company (data stays in US)
- Extremely fast inference (~500 tokens/sec)
- Generous free tier (14,400 requests/day)
- OpenAI-compatible API
- Llama 3.1 models (open, capable)
Rejected alternatives:
| Provider | Reason for rejection |
|---|---|
| DeepSeek API | China-based, data residency concerns |
| OpenAI | Expensive for MVP |
| Anthropic | Expensive for MVP |
| Self-hosted | Ops burden too high for MVP |
Embeddings Provider: Together AI (US-based)
Why Together AI:
- US-based company
- Dedicated embeddings API
- Competitive pricing ($0.008/M tokens)
- Wide model selection
Vector Database: pgvector
Why pgvector:
- Free (Postgres extension)
- Already using Postgres
- No additional infrastructure
- Good enough for MVP scale
- Stays in regional database (data residency)
Summary
┌─────────────────────────────────────────┐
│ MVP AI Stack │
├─────────────────────────────────────────┤
│ LLM: Groq (Llama 3.1 8B) │
│ Embeddings: Together AI │
│ Vector DB: pgvector (in Postgres) │
│ Cache: Redis │
└─────────────────────────────────────────┘
3. Architecture
High-Level Flow
┌─────────────────────────────────────────────────────────┐
│ Frontend │
│ (No direct AI calls - all via API) │
└─────────────────────┬───────────────────────────────────┘
│
┌─────────────────────▼───────────────────────────────────┐
│ Regional API │
│ ┌─────────────────────────────────────────────────┐ │
│ │ AI Service Layer │ │
│ │ • Strips PII before external calls │ │
│ │ • Caches responses (cost control) │ │
│ │ • Rate limits per user │ │
│ │ • Falls back gracefully if AI unavailable │ │
│ └──────────────┬───────────────────┬──────────────┘ │
│ │ │ │
│ ┌──────────────▼──────┐ ┌─────────▼──────────────┐ │
│ │ Regional Postgres │ │ Redis Cache │ │
│ │ + pgvector │ │ (response cache) │ │
│ │ (embeddings) │ │ │ │
│ └──────────────────────┘ └────────────────────────┘ │
└─────────────────────┬───────────────────────────────────┘
│
          ┌─────────────┼──────────────┐
          │             │              │
          ▼             ▼              ▼
    ┌──────────┐  ┌────────────┐  ┌──────────┐
    │   Groq   │  │  Together  │  │  Redis   │
    │   (US)   │  │    (US)    │  │  Cache   │
    │   LLM    │  │ Embeddings │  │          │
    └──────────┘  └────────────┘  └──────────┘
Request Flow: Semantic Search
1. User: "art classes for shy 5-year-old near Austin"
│
2. API receives request, checks cache
│
3. Cache miss → Generate embedding (Together AI)
│
4. Vector search in pgvector (top 20 matches)
│
5. Optional: LLM formats/ranks results (Groq)
│
6. Cache response (1 hour TTL)
│
7. Return results to user
Request Flow: Review Summary
1. User views activity page
│
2. API checks cache for summary
│
3. Cache miss → Fetch reviews from DB
│
4. Send reviews to Groq (anonymized)
│
5. Cache summary (24 hour TTL)
│
6. Return summary to user
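A minimal sketch of this flow, reusing the provider factory from §8 and stripPII from §4. Module paths, the Prisma review model, and its field names are assumptions, not existing code:
// src/services/ai/summary.ts (sketch)
import { redis } from '../cache'            // assumed Redis client module
import { db } from '../db'                  // assumed Prisma client module
import { createAIProvider } from './index'
import { stripPII } from './pii'
const ai = createAIProvider()
export async function getReviewSummary(activityId: string): Promise<string> {
  // Step 2: check cache
  const cacheKey = `summary:${activityId}`
  const cached = await redis.get(cacheKey)
  if (cached) return cached
  // Step 3: cache miss -> fetch reviews from the regional DB
  const reviews = await db.review.findMany({
    where: { activityId },
    select: { body: true },
    take: 50, // cap prompt size
  })
  // Step 4: anonymize, then summarize via Groq
  const text = reviews.map((r) => stripPII(r.body)).join('\n---\n')
  const summary = await ai.complete({
    system: 'Summarize these parent reviews in 2-3 sentences. Do not invent details.',
    prompt: text,
    maxTokens: 200,
  })
  // Step 5: cache for 24 hours
  await redis.set(cacheKey, summary, 'EX', 86400)
  return summary
}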
4. Data Residency
PII Rules
NEVER send to external AI services:
├── User names
├── Email addresses
├── Phone numbers
├── Physical addresses
├── Child names or identifiable info
├── Payment information
└── Any data that identifies a specific person
OK to send:
├── Activity descriptions (public content)
├── Provider business descriptions (public)
├── Anonymized queries ("classes for 5-year-old")
├── Review text (with names stripped)
└── Help documentation
Regional Considerations
US Region:
├── Groq API (US servers) ✓
├── Together AI (US servers) ✓
└── pgvector in us-central1 ✓
India Region:
├── Groq API (US servers) - OK for non-PII
├── Together AI (US servers) - OK for non-PII
└── pgvector in asia-south1 ✓ (embeddings stay in region)
PII Stripping Implementation
function stripPII(text: string): string {
return text
// Remove emails
.replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL]')
// Remove phone numbers
.replace(/(\+\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g, '[PHONE]')
// Remove names (if tagged)
.replace(/\[NAME:.*?\]/g, '[NAME]')
}
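Example input and output:
stripPII('Contact jane@example.com or 512-555-0199 about [NAME:Avery]')
// → 'Contact [EMAIL] or [PHONE] about [NAME]'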
5. What Gets Embedded
| Content | Embed? | Reason |
|---|---|---|
| Activity title + description | Yes | Semantic search |
| Provider business description | Yes | Provider discovery |
| Help docs / FAQs | Yes | Support Q&A |
| Category names | Yes | Search enhancement |
| Reviews | No | Query by activity_id, summarize on-demand |
| User data | Never | PII - no embeddings |
| Booking data | Never | PII - no embeddings |
Embedding Update Strategy
When to re-embed:
├── Activity created → Embed immediately
├── Activity description updated → Re-embed
├── Provider description updated → Re-embed
└── Batch job: Re-embed all weekly (catch drift)
When NOT to re-embed:
├── Price changes (not in embedding)
├── Schedule changes
├── Booking counts
└── Review additions (reviews not embedded)
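A sketch of this strategy, assuming the embedding factory from §8. Since Prisma cannot write Unsupported vector columns directly, the write goes through raw SQL; job wiring and model fields are assumptions:
// src/jobs/embeddings.ts (sketch)
import { db } from '../db'
import { createEmbeddingProvider } from '../services/ai'
const embedder = createEmbeddingProvider()
// Called from activity create/update handlers
export async function embedActivity(id: string, title: string, description: string) {
  const embedding = await embedder.embed(`${title}\n${description}`)
  await db.$executeRaw`
    UPDATE activities
    SET embedding = ${JSON.stringify(embedding)}::vector,
        embedding_updated_at = NOW()
    WHERE id = ${id}::uuid
  `
}
// Weekly batch: re-embed anything older than 7 days (catches drift)
export async function reembedStaleActivities() {
  const cutoff = new Date(Date.now() - 7 * 24 * 3600 * 1000)
  const stale = await db.activity.findMany({
    where: { embeddingUpdatedAt: { lt: cutoff } },
    select: { id: true, title: true, description: true },
    take: 500, // chunk to respect provider rate limits
  })
  for (const a of stale) {
    await embedActivity(a.id, a.title, a.description)
  }
}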
6. Database Schema
pgvector Setup
-- Enable extension (both US and IN regions)
CREATE EXTENSION IF NOT EXISTS vector;
-- Activity embeddings
ALTER TABLE activities
ADD COLUMN IF NOT EXISTS embedding vector(768),
ADD COLUMN IF NOT EXISTS embedding_updated_at TIMESTAMP;
-- Provider embeddings
ALTER TABLE providers
ADD COLUMN IF NOT EXISTS embedding vector(768),
ADD COLUMN IF NOT EXISTS embedding_updated_at TIMESTAMP;
-- Help docs table
CREATE TABLE IF NOT EXISTS help_docs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug VARCHAR(255) UNIQUE NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
category VARCHAR(100), -- 'parent' | 'provider' | 'general'
embedding vector(768),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Vector indexes (IVFFlat - good balance of speed/accuracy)
CREATE INDEX IF NOT EXISTS idx_activities_embedding
ON activities USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_providers_embedding
ON providers USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 50);
CREATE INDEX IF NOT EXISTS idx_help_docs_embedding
ON help_docs USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 20);
Prisma Schema Addition
model Activity {
// ... existing fields
embedding Unsupported("vector(768)")?
embeddingUpdatedAt DateTime? @map("embedding_updated_at")
}
model Provider {
// ... existing fields
embedding Unsupported("vector(768)")?
embeddingUpdatedAt DateTime? @map("embedding_updated_at")
}
model HelpDoc {
id String @id @default(uuid())
slug String @unique
title String
content String
category String?
embedding Unsupported("vector(768)")?
createdAt DateTime @default(now()) @map("created_at")
updatedAt DateTime @updatedAt @map("updated_at")
@@map("help_docs")
}
7. API Endpoints
AI Routes
POST /v1/ai/search # Semantic activity search
GET /v1/ai/activities/:id/summary # Review summary
POST /v1/ai/assist/description # Provider content help
Request/Response Formats
Semantic Search
// POST /v1/ai/search
// Request
{
"query": "art classes for shy 5-year-old",
"filters": {
"city": "Austin",
"ageMin": 4,
"ageMax": 6
},
"limit": 20
}
// Response
{
"success": true,
"data": {
"activities": [
{
"id": "uuid",
"title": "Creative Art Explorers",
"description": "...",
"relevanceScore": 0.92
}
],
"query_understanding": "Looking for art activities suitable for a 5-year-old who may be introverted"
}
}
Review Summary
// GET /v1/ai/activities/:id/summary
// Response
{
"success": true,
"data": {
"summary": "Parents consistently praise the patient instructors and small class sizes. Great for beginners.",
"highlights": ["Patient instructors", "Small classes", "Beginner-friendly"],
"reviewCount": 24,
"cachedAt": "2026-01-08T12:00:00Z"
}
}
Content Assistant
// POST /v1/ai/assist/description
// Request
{
"draft": "We teach kids art. Classes on Saturday.",
"activityType": "art",
"tone": "friendly"
}
// Response
{
"success": true,
"data": {
"improved": "Spark your child's creativity in our Saturday art classes! Our experienced instructors guide young artists through fun projects...",
"suggestions": [
"Add age range",
"Mention class size",
"Include what materials are provided"
]
}
}
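A sketch of the assist handler behind this endpoint, using the Groq provider from §8. The prompt wording is illustrative only, and output is length-capped per the cost controls in §9:
// src/services/ai/assist.ts (sketch)
import { createAIProvider } from './index'
const ai = createAIProvider()
export async function improveDescription(draft: string, activityType: string, tone: string) {
  const improved = await ai.complete({
    system: `You help activity providers polish listing descriptions. ` +
      `Tone: ${tone}. Never invent facts that are not in the draft.`,
    prompt: `Activity type: ${activityType}\nDraft: ${draft}\n\nRewrite this description.`,
    maxTokens: 300,
  })
  return { improved }
}
The suggestions array in the response could come from a second, structured prompt or from simple heuristics (missing age range, class size, materials).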
Rate Limits
| Endpoint | Limit | Window |
|---|---|---|
| /ai/search | 20 | per minute |
| /ai/summary | 10 | per minute |
| /ai/assist | 5 | per minute |
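One way to enforce these limits is a fixed-window counter in Redis, keyed per user and endpoint. A sketch, assuming Express-style middleware and a req.userId set by upstream auth:
// src/middleware/aiRateLimit.ts (sketch)
import type { Request, Response, NextFunction } from 'express'
import { redis } from '../cache' // assumed Redis client
export function aiRateLimit(name: string, limit: number) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // One counter per user per minute-window
    const window = Math.floor(Date.now() / 60_000)
    const key = `ratelimit:${name}:${(req as any).userId}:${window}`
    const count = await redis.incr(key)
    if (count === 1) await redis.expire(key, 60) // first hit starts the window
    if (count > limit) {
      return res.status(429).json({ success: false, error: 'Rate limit exceeded' })
    }
    next()
  }
}
// Usage: router.post('/v1/ai/search', aiRateLimit('search', 20), searchHandler)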
8. Implementation
AI Service Layer
// src/services/ai/index.ts
export interface CompletionParams {
  system: string
  prompt: string
  maxTokens?: number
  temperature?: number
}
export interface AIProvider {
  complete(params: CompletionParams): Promise<string>
}
export interface EmbeddingProvider {
embed(text: string): Promise<number[]>
embedBatch(texts: string[]): Promise<number[][]>
}
// Factory - swap providers easily
export function createAIProvider(): AIProvider {
return new GroqProvider()
}
export function createEmbeddingProvider(): EmbeddingProvider {
return new TogetherEmbeddingProvider()
}
Groq Provider
// src/services/ai/providers/groq.ts
const GROQ_BASE = 'https://api.groq.com/openai/v1'
export class GroqProvider implements AIProvider {
async complete(params: CompletionParams): Promise<string> {
const res = await fetch(`${GROQ_BASE}/chat/completions`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${config.groq.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'llama-3.1-8b-instant',
messages: [
{ role: 'system', content: params.system },
{ role: 'user', content: params.prompt },
],
max_tokens: params.maxTokens ?? 500,
temperature: params.temperature ?? 0.7,
}),
})
if (!res.ok) {
throw new AIProviderError('Groq API error', res.status)
}
const data = await res.json()
return data.choices[0].message.content
}
}
Together Embeddings Provider
// src/services/ai/providers/together.ts
const TOGETHER_BASE = 'https://api.together.xyz/v1'
export class TogetherEmbeddingProvider implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    const data = await this.request(text)
    return data.data[0].embedding
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    const data = await this.request(texts)
    return data.data.map((d: any) => d.embedding)
  }
  // Shared POST helper; accepts a single string or a batch
  private async request(input: string | string[]) {
    const res = await fetch(`${TOGETHER_BASE}/embeddings`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${config.together.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'togethercomputer/m2-bert-80M-8k-retrieval',
        input,
      }),
    })
    if (!res.ok) {
      throw new AIProviderError('Together API error', res.status)
    }
    return res.json()
  }
}
Semantic Search Service
// src/services/ai/search.ts
const embeddingProvider = createEmbeddingProvider()
export async function semanticActivitySearch(
  query: string,
  filters?: SearchFilters
): Promise<SearchResult[]> {
// 1. Check cache
const cacheKey = `search:${hash(query, filters)}`
const cached = await redis.get(cacheKey)
if (cached) return JSON.parse(cached)
// 2. Strip PII from query
const safeQuery = stripPII(query)
// 3. Generate embedding
const embedding = await embeddingProvider.embed(safeQuery)
  // 4. Vector search with filters
  // pgvector accepts the '[1,2,3,...]' text format, so serialize the array
  const vec = JSON.stringify(embedding)
  const results = await db.$queryRaw<SearchResult[]>`
    SELECT
      id, title, description, category,
      price_cents, age_min, age_max, city,
      1 - (embedding <=> ${vec}::vector) AS relevance_score
    FROM activities
    WHERE status = 'published'
      AND embedding IS NOT NULL
      ${filters?.city ? Prisma.sql`AND city = ${filters.city}` : Prisma.empty}
      ${filters?.ageMin ? Prisma.sql`AND age_max >= ${filters.ageMin}` : Prisma.empty}
      ${filters?.ageMax ? Prisma.sql`AND age_min <= ${filters.ageMax}` : Prisma.empty}
    ORDER BY embedding <=> ${vec}::vector
    LIMIT ${filters?.limit ?? 20}
  `
// 5. Cache results (1 hour)
await redis.set(cacheKey, JSON.stringify(results), 'EX', 3600)
return results
}
Graceful Degradation
// Always have fallback when AI is unavailable
export async function searchActivities(query: string, filters?: SearchFilters) {
try {
// Try semantic search first
return await semanticActivitySearch(query, filters)
} catch (error) {
logger.warn('AI search failed, falling back to text search', { error })
// Fallback to basic text search
return await textSearchActivities(query, filters)
}
}
9. Cost Estimates
Per-Operation Costs
| Operation | Provider | Cost |
|---|---|---|
| LLM (Llama 3.1 8B) | Groq | $0.05/M input, $0.08/M output |
| LLM (Llama 3.1 70B) | Groq | $0.59/M input, $0.79/M output |
| Embeddings | Together | $0.008/M tokens |
| Vector search | pgvector | $0 (in Postgres) |
Monthly Estimates
| Scale | AI Searches | Summaries | Assists | Total Cost |
|---|---|---|---|---|
| 1K users | 5K | 1K | 500 | ~$5 |
| 10K users | 50K | 10K | 2K | ~$20-30 |
| 50K users | 250K | 50K | 10K | ~$100-150 |
| 100K users | 500K | 100K | 20K | ~$200-300 |
Cost Control Measures
- Aggressive caching - Same query = cached response (1-24h TTL; see the cache-key sketch after this list)
- Rate limiting - Per-user limits prevent abuse
- Smaller models first - Use 8B model, escalate to 70B only if needed
- Embedding batching - Batch embed new activities (cheaper)
- Response length limits - Cap max_tokens
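The hash(query, filters) used for cache keys in the search service (§8) is not defined above; a minimal version (an assumption, not existing code) could be:
// src/services/ai/cacheKey.ts (sketch)
import { createHash } from 'node:crypto'
export function hash(query: string, filters?: SearchFilters): string {
  // Normalize so trivially different inputs share one cache entry
  const payload = JSON.stringify({ q: query.trim().toLowerCase(), f: filters ?? {} })
  return createHash('sha256').update(payload).digest('hex').slice(0, 32)
}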
10. Phased Rollout
Phase 1: MVP (Now)
Stack:
├── Groq API (Llama 3.1 8B)
├── Together AI (embeddings)
├── pgvector (in regional Postgres)
└── Redis (response cache)
Features:
├── Semantic search (basic)
├── Review summaries (cached)
└── Provider content assistant
Cost: ~$20-30/month
Phase 2: Enhanced (When needed)
Additions:
├── Larger model for complex queries (Llama 3.1 70B)
├── Query understanding / intent detection
├── Search result re-ranking
└── A/B testing framework for AI features
Trigger: User feedback indicates search quality issues
Phase 3: Scale (500K+ users)
Options:
├── Self-hosted models (if cost > $500/month)
│ ├── Ollama + Llama 3.1 on GCP
│ ├── One GPU per region
│ └── ~$1,000/month total
│
└── Or stay on APIs (if cost-effective)
└── Volume discounts from providers
Phase 4: Advanced (Future)
Consider when product-market fit proven:
├── Custom fine-tuned models
├── Real-time personalization
├── Recommendation engine
├── Dynamic pricing
└── Voice/vision AI
11. Engineering Rules
Must Do
- Never send PII to external AI services
- Always have fallback - Platform works without AI
- Cache aggressively - Identical queries return cached responses
- Rate limit all AI endpoints - Prevent abuse and cost overruns
- Log AI usage - Track costs, latency, error rates
- Abstract providers - Easy to swap Groq ↔ Together ↔ self-hosted
Must Not Do
- Never store AI responses as source of truth - AI can hallucinate
- Never make AI required for core flows - Booking, payment must work without AI
- Never expose API keys to frontend - All AI calls go through backend
- Never skip PII stripping - Even for "internal" calls
- Never trust AI for safety-critical decisions - Human review required
Monitoring
// Track these metrics
const aiMetrics = {
// Latency
'ai.search.latency_ms': histogram,
'ai.summary.latency_ms': histogram,
// Costs
'ai.tokens.input': counter,
'ai.tokens.output': counter,
// Errors
'ai.errors.provider': counter,
'ai.errors.timeout': counter,
// Cache
'ai.cache.hit': counter,
'ai.cache.miss': counter,
}
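A sketch of how these counters might be fed: wrap any AIProvider in a latency/error-recording decorator. MetricsClient is an assumed interface over whatever metrics backend is chosen:
// src/services/ai/instrumented.ts (sketch)
import type { AIProvider, CompletionParams } from './index'
interface MetricsClient {
  increment(name: string): void
  histogram(name: string, value: number): void
}
export class InstrumentedAIProvider implements AIProvider {
  constructor(
    private inner: AIProvider,
    private metrics: MetricsClient,
    private prefix: string, // e.g. 'ai.search' or 'ai.summary'
  ) {}
  async complete(params: CompletionParams): Promise<string> {
    const start = Date.now()
    try {
      return await this.inner.complete(params)
    } catch (err) {
      this.metrics.increment(`${this.prefix}.errors.provider`)
      throw err
    } finally {
      this.metrics.histogram(`${this.prefix}.latency_ms`, Date.now() - start)
    }
  }
}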
Environment Variables
# AI - LLM (Groq)
GROQ_API_KEY=gsk_...
# AI - Embeddings (Together AI)
TOGETHER_API_KEY=...
# AI - Feature flags
AI_SEARCH_ENABLED=true
AI_SUMMARIES_ENABLED=true
AI_ASSIST_ENABLED=true
# AI - Rate limits
AI_SEARCH_RATE_LIMIT=20
AI_SUMMARY_RATE_LIMIT=10
AI_ASSIST_RATE_LIMIT=5
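The config.groq.apiKey and config.together.apiKey references in §8 assume a small config module that reads these variables at startup, roughly:
// src/config.ts (sketch)
export const config = {
  groq: { apiKey: process.env.GROQ_API_KEY ?? '' },
  together: { apiKey: process.env.TOGETHER_API_KEY ?? '' },
  ai: {
    searchEnabled: process.env.AI_SEARCH_ENABLED === 'true',
    summariesEnabled: process.env.AI_SUMMARIES_ENABLED === 'true',
    assistEnabled: process.env.AI_ASSIST_ENABLED === 'true',
    searchRateLimit: Number(process.env.AI_SEARCH_RATE_LIMIT ?? 20),
    summaryRateLimit: Number(process.env.AI_SUMMARY_RATE_LIMIT ?? 10),
    assistRateLimit: Number(process.env.AI_ASSIST_RATE_LIMIT ?? 5),
  },
}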
Related Documents
- API Architecture - Backend API design
- AI Components - Frontend AI component specs