Design Decisions

Key architectural decisions and the rationale behind them.

1. Microservices Architecture

Decision: Separate API (Node.js) from workers (Python)

Rationale:

- Language choice: Node.js excels at I/O and HTTP; Python excels at ML/AI
- Independent scaling: The API can scale separately from the workers
- Technology isolation: Python ML code is not forced into a JavaScript runtime
- Team efficiency: Teams can work independently on their own domain

Trade-offs:

- ✅ Better technology match
- ✅ Independent scaling
- ❌ More operational complexity
- ❌ Network latency between services

Alternatives considered:

- Monolith (faster to build, harder to scale)
- Kubernetes (overkill for initial scale)

2. Redis Streams for Messaging

Decision: Use Redis Streams for async job queue instead of RabbitMQ or Kafka

Rationale:

- Lightweight: No separate infrastructure; reuses the existing Redis instance
- Fast: Sub-millisecond latency for job pickup
- Simple: Consumer groups are built in for job distribution
- Dual-use: Can also stream real-time events to clients

Trade-offs:

- ✅ Simple and fast
- ✅ Real-time event support
- ❌ Not suitable for massive scale (billions of messages)
- ❌ All data is held in memory (persistence must be configured for durability)

Alternatives considered:

- RabbitMQ (more features, more overhead)
- Kafka (designed for scale, too complex initially)
- In-memory queue (no durability)
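The consumer-group behaviour this decision relies on can be sketched without a Redis server. The class below is a toy in-memory model, not the Redis client API: `ToyStream` and its method names are made up for illustration, and each method only loosely mirrors the Redis command named in its docstring. It shows the two properties that matter for a job queue: each entry goes to exactly one consumer, and it stays pending until acknowledged.

```python
from collections import OrderedDict


class ToyStream:
    """In-memory stand-in for one Redis stream with a single consumer group."""

    def __init__(self):
        self._entries = OrderedDict()  # entry_id -> job payload
        self._next_id = 1
        self._last_delivered = 0       # highest id handed to any consumer
        self._pending = {}             # entry_id -> consumer name (not yet acked)

    def add(self, payload):
        """Append a job, loosely like XADD; returns the new entry id."""
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = payload
        return entry_id

    def read_group(self, consumer, count=1):
        """Hand out never-delivered entries, loosely like XREADGROUP with '>'."""
        batch = []
        for entry_id, payload in self._entries.items():
            if entry_id > self._last_delivered and len(batch) < count:
                batch.append((entry_id, payload))
        for entry_id, _ in batch:
            self._pending[entry_id] = consumer
            self._last_delivered = entry_id
        return batch

    def ack(self, entry_id):
        """Mark an entry as processed, loosely like XACK."""
        return self._pending.pop(entry_id, None) is not None

    def pending(self):
        """Entries delivered but not acknowledged, loosely like XPENDING."""
        return dict(self._pending)
```

With three jobs added, a first worker reading two entries and a second worker reading two more receive disjoint batches. Anything a worker never acknowledges remains visible in `pending()`, which is the hook a retry mechanism would use.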

3. PostgreSQL for Primary Storage

Decision: Use PostgreSQL as single source of truth instead of NoSQL

Rationale:

- ACID compliance: Transactions ensure data consistency
- Flexible schema: JSONB columns for semi-structured data
- Query power: Complex queries for analytics
- Proven: Battle-tested in production at scale

Trade-offs:

- ✅ Strong consistency guarantees
- ✅ Rich query language
- ❌ Scaling horizontally is harder
- ❌ Schema changes require migrations

Alternatives considered:

- MongoDB (more flexible, weaker consistency)
- DynamoDB (serverless, vendor lock-in)
- SQLite (simpler, not suitable for multi-user)

4. JWT for Authentication

Decision: Use JWT tokens instead of session cookies

Rationale:

- Stateless: API servers don't need session storage
- Scalable: Each request is independent
- Cross-origin: Works with CORS and mobile apps
- Revocable: A token blacklist in Redis can be added if needed

Trade-offs:

- ✅ Scales to many API servers
- ✅ Works with mobile/SPA clients
- ❌ Token size increases request size
- ❌ Can't revoke immediately without a Redis check

Alternatives considered:

- Server sessions (requires sticky sessions)
- OAuth2 (more complex, better for third-party)
- API keys (less secure, good for server-to-server)
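To make the stateless-versus-revocable trade-off concrete, here is a minimal HS256 sketch using only the Python standard library. It is not the project's implementation: the function names, the claim set (`sub`, `jti`, `exp`), and the `revoked_ids` set (standing in for the Redis blacklist) are assumptions for illustration, and a production service would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, user_id: str, token_id: str,
                ttl_seconds: int, now: Optional[float] = None) -> str:
    """Sign a compact token carrying the subject, a token id, and an expiry."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "jti": token_id, "exp": int(now) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(secret: bytes, token: str, revoked_ids,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is valid, otherwise None."""
    now = time.time() if now is None else now
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        signing_input = (header_b64 + "." + claims_b64).encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        return None
    if header.get("alg") != "HS256":
        return None
    if claims.get("exp", 0) <= now:
        return None
    # The only stateful step: a lookup that a shared store would serve.
    if claims.get("jti") in revoked_ids:
        return None
    return claims
```

Signature and expiry checks need no shared state, which is what lets any API server validate a request on its own. The `revoked_ids` lookup is the one step that reintroduces a round trip, matching the last trade-off listed above.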

5. Vercel for Hosting

Decision: Deploy to Vercel instead of self-managed servers

Rationale:

- Zero-ops: No server management needed
- Global CDN: Automatic edge caching
- Preview deployments: Test changes before production
- Easy integration: Git-based deployments

Trade-offs:

- ✅ Minimal ops overhead
- ✅ Excellent for static/Next.js sites
- ❌ Limited to the runtimes Vercel supports (Node.js, Python)
- ❌ Cold start latency (can be mitigated)

Alternatives considered:

- AWS EC2 (full control, more ops work)
- Fly.io (similar benefits, less ecosystem)
- Google Cloud Run (good option, Vercel simpler)

6. MkDocs for Documentation

Decision: Use MkDocs + Material theme instead of custom site

Rationale:

- Content-first: Focus on documentation, not tooling
- GitHub-native: Markdown in version control
- Search built-in: Full-text search out of the box
- Mobile-responsive: Theme handles all devices
- Fast: Static site generation, CDN ready

Trade-offs:

- ✅ Simple to maintain
- ✅ Version-controlled content
- ❌ Limited customization without code
- ❌ Can't easily add interactive features

Alternatives considered:

- Docusaurus (more flexible, React-based)
- Sphinx (powerful, Python ecosystem)
- Custom Next.js site (full control, maintenance burden)

7. Rate Limiting per User

Decision: Implement rate limiting at 100 requests/minute per user

Rationale:

- Fair usage: Prevents a single user from monopolizing resources
- Cost control: Limits runaway API calls
- DDoS mitigation: Makes attacks less effective
- Predictable: Users know their limits upfront

Trade-offs:

- ✅ Prevents abuse
- ✅ Controls costs
- ❌ Some users may hit limits legitimately
- ❌ Requires monitoring to adjust

Decision criteria:

- Measured actual usage patterns
- Set the limit at 10x the request rate of a typical heavy user
- Higher limits planned for premium tiers
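The text fixes the limit (100 requests per minute per user) but not the algorithm, so the sketch below assumes a sliding-window log kept in process memory. The class name and the choice of algorithm are illustrative; with several API servers the counters would have to live in a shared store such as Redis.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within any `window_seconds` span."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id, now):
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Passing `now` explicitly keeps the limiter deterministic under test; a request handler would pass `time.monotonic()`. A rejected request is not recorded, so a client that keeps retrying while blocked does not extend its own lockout.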

8. Document Processing Pipeline

Decision: Queue-based async processing instead of synchronous

Rationale:

- Long-running: Document processing can take 5-30 seconds
- Scalability: Workers can process in parallel
- Reliability: Failed jobs can be retried
- User experience: The user gets an immediate response; progress updates arrive asynchronously

Trade-offs:

- ✅ Better UX (instant response)
- ✅ Scales to many concurrent requests
- ❌ More complex (eventual consistency)
- ❌ Webhook/polling needed for results

Alternatives considered:

- Synchronous processing (simpler, slower)
- Timeout with result polling (less elegant)
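The submit, process, retry, and poll lifecycle described above can be sketched in a few lines. This is a single-process illustration under stated assumptions: `JobQueue`, the status names, and the three-attempt limit are hypothetical, and the real pipeline would hand jobs to separate workers over the message queue rather than call them in process.

```python
import uuid
from collections import deque


class JobQueue:
    """Toy job store: submit returns immediately, workers update status later."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self._queue = deque()  # job ids waiting for a worker
        self._jobs = {}        # job_id -> mutable job record

    def submit(self, document):
        """What the API does: record the job and return an id without waiting."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"status": "queued", "attempts": 0,
                              "document": document, "result": None}
        self._queue.append(job_id)
        return job_id

    def status(self, job_id):
        """What a polling client sees."""
        job = self._jobs[job_id]
        return {"status": job["status"], "attempts": job["attempts"],
                "result": job["result"]}

    def work_once(self, process):
        """What a worker does: run one job; returns False if the queue is empty."""
        if not self._queue:
            return False
        job_id = self._queue.popleft()
        job = self._jobs[job_id]
        job["status"] = "processing"
        job["attempts"] += 1
        try:
            job["result"] = process(job["document"])
            job["status"] = "done"
        except Exception:
            # Requeue until the attempt budget is spent, then give up.
            if job["attempts"] < self.max_attempts:
                job["status"] = "queued"
                self._queue.append(job_id)
            else:
                job["status"] = "failed"
        return True
```

The client only ever calls `submit` and `status`, which is why the response is immediate while the result arrives later. That gap between the two is the eventual consistency listed in the trade-offs.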

9. Full-Text Search in PostgreSQL

Decision: Use PostgreSQL full-text search instead of Elasticsearch

Rationale:

- Simpler ops: One less service to manage
- Adequate performance: Sufficient for most use cases
- Cost: No separate search infrastructure
- Consistency: Results always match database state

Trade-offs:

- ✅ Minimal infrastructure
- ✅ Always consistent
- ❌ Scaling search independently is harder
- ❌ Less powerful than Elasticsearch

When to revisit:

- If search latency becomes an issue
- If faceted search or analytics are needed
- If the corpus exceeds 100M documents
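For a sense of what the PostgreSQL option looks like in practice, the sketch below builds a ranked full-text query using the built-in `to_tsvector`, `plainto_tsquery`, and `ts_rank` functions. The `documents(id, title, body)` table, the `'english'` text search configuration, and the 100-row cap are assumptions made for the example, not details taken from the project's data model.

```python
# Hypothetical table: documents(id, title, body).
# A matching expression index lets the WHERE clause avoid a full scan:
#   CREATE INDEX documents_body_fts
#       ON documents USING GIN (to_tsvector('english', body));
SEARCH_SQL = """
SELECT id, title,
       ts_rank(to_tsvector('english', body), query) AS rank
FROM documents,
     plainto_tsquery('english', %(terms)s) AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY rank DESC
LIMIT %(limit)s
"""


def build_search(terms: str, limit: int = 20):
    """Return (sql, params) using psycopg-style named placeholders."""
    bounded = max(1, min(limit, 100))  # keep page sizes within a sane range
    return SEARCH_SQL, {"terms": terms.strip(), "limit": bounded}
```

Because the query runs against the same tables that hold the data, results reflect committed rows with no separate index to keep in sync, which is the consistency benefit claimed above.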

10. Environment-Based Configuration

Decision: Use environment variables for all configuration

Rationale:

- Security: Secrets not in code
- Flexibility: Change behavior without code changes
- Standard: Industry best practice (12-factor app)
- Container-friendly: Works well with Docker

Configuration levels:

.env (local development)
.env.test (test environment)
.env.staging (staging environment)
GitHub Secrets (production)

Trade-offs:

- ✅ Secure by default
- ✅ Follows best practices
- ❌ More environment setup needed
- ❌ Easy to forget to set required vars
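One common way to soften the last trade-off is to validate configuration once at startup and fail with a message naming every missing variable. The sketch below assumes hypothetical variable names (`DATABASE_URL`, `REDIS_URL`, `JWT_SECRET`, `RATE_LIMIT_PER_MINUTE`); the project's actual names may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


class MissingConfigError(RuntimeError):
    """Raised at startup when required environment variables are absent."""


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    jwt_secret: str
    rate_limit_per_minute: int


# Hypothetical names, chosen only for this example.
REQUIRED = ("DATABASE_URL", "REDIS_URL", "JWT_SECRET")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, reporting all missing names at once."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise MissingConfigError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        jwt_secret=env["JWT_SECRET"],
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "100")),
    )
```

Accepting any mapping rather than reading `os.environ` directly makes the loader easy to test and lets each environment supply its values without code changes.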

Summary Table

| Decision | Choice | Alternative | Reason |
|---|---|---|---|
| Architecture | Microservices | Monolith | Tech variety, independent scaling |
| Messaging | Redis Streams | RabbitMQ/Kafka | Simple, fast, dual-use |
| Database | PostgreSQL | MongoDB/DynamoDB | ACID, query power, proven scale |
| Auth | JWT | Sessions/OAuth | Stateless, scalable |
| Hosting | Vercel | AWS/GCP | Zero-ops, git integration |
| Docs | MkDocs | Custom/Docusaurus | Content-first, simple |
| Rate Limit | Per-user/min | Per-IP/hour | Fair, user-based |
| Processing | Async queue | Sync | Better UX, scalability |
| Search | PostgreSQL FTS | Elasticsearch | Simpler ops |
| Config | Env vars | Files | Security, standard |

Evolution Plan

These decisions are good for the current stage (MVP → early scale). As the product grows:

Month 1-3 (Current)

- Current architecture works well
- Monitor performance metrics
- Adjust rate limits based on real usage

Month 3-6 (Early Traction)

- Consider a caching layer if database queries slow down
- Possibly add Elasticsearch if search latency becomes an issue
- Load test with 1000 concurrent users

Month 6-12 (Scaling)

- Move to Kubernetes if horizontal scaling becomes critical
- Consider a managed database (RDS) if ops become a burden
- Add a CDN for large file distribution

Year 1+ (Scale)

- Multi-region deployment
- Separate read/write database replicas
- Advanced caching strategy

System Design - How decisions are implemented
Data Model - Database structure
Architecture Overview - High-level view