Design Decisions

Key architectural decisions and the rationale behind them.

1. Microservices Architecture

Decision: Separate API (Node.js) from workers (Python)

Rationale:
- Language choice: Node.js excels at I/O and HTTP; Python excels at ML/AI
- Independent scaling: The API can scale separately from the workers
- Technology isolation: Python ML code is not forced into a JavaScript runtime
- Team efficiency: Teams can work independently within their own domain

Trade-offs:
- ✅ Better technology match
- ✅ Independent scaling
- ❌ More operational complexity
- ❌ Network latency between services

Alternatives considered:
- Monolith (faster to build, harder to scale)
- Kubernetes (overkill for initial scale)


2. Redis Streams for Messaging

Decision: Use Redis Streams for async job queue instead of RabbitMQ or Kafka

Rationale:
- Lightweight: No separate infrastructure; reuses the existing Redis instance
- Fast: Sub-millisecond latency for job pickup
- Simple: Consumer groups are built in for job distribution
- Dual-use: Can also stream real-time events to clients

Trade-offs:
- ✅ Simple and fast
- ✅ Real-time event support
- ❌ Not suitable for massive scale (billions of messages)
- ❌ All data is held in memory (persistence must be configured for durability)

Alternatives considered:
- RabbitMQ (more features, more overhead)
- Kafka (designed for scale, too complex initially)
- In-memory queue (no durability)
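As a rough illustration of how consumer groups distribute jobs, the sketch below wraps three standard stream commands (`XADD`, `XREADGROUP`, `XACK`) as exposed by a redis-py style client. The stream name `jobs`, the group name `workers`, and both helper functions are hypothetical; the source does not describe the actual worker code.

```python
STREAM = "jobs"      # hypothetical stream name
GROUP = "workers"    # hypothetical consumer group name


def enqueue_job(client, payload: dict) -> str:
    """API side: append a job to the stream; Redis assigns the entry ID."""
    return client.xadd(STREAM, payload)


def process_batch(client, consumer: str, handler, count: int = 10, block_ms: int = 1000) -> int:
    """Worker side: read new entries, run the handler, and ack successes.

    Entries whose handler raises are not acknowledged, so they remain in the
    group's pending list and can be retried or claimed by another consumer.
    Returns the number of entries acknowledged.
    """
    acked = 0
    # ">" requests entries that have not yet been delivered to this group.
    response = client.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=count, block=block_ms)
    for _stream_name, entries in response or []:
        for entry_id, fields in entries:
            try:
                handler(fields)
            except Exception:
                continue  # leave the entry pending for a later retry
            client.xack(STREAM, GROUP, entry_id)
            acked += 1
    return acked
```

With redis-py, the group would be created once before workers start, for example with `client.xgroup_create(STREAM, GROUP, id="0", mkstream=True)`. Each worker then passes its own consumer name so that Redis hands every entry to only one of them.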


3. PostgreSQL for Primary Storage

Decision: Use PostgreSQL as single source of truth instead of NoSQL

Rationale:
- ACID compliance: Transactions ensure data consistency
- Flexible schema: JSONB columns for semi-structured data
- Query power: Complex queries for analytics
- Proven: Battle-tested in production at scale

Trade-offs:
- ✅ Strong consistency guarantees
- ✅ Rich query language
- ❌ Scaling horizontally is harder
- ❌ Schema changes require migrations

Alternatives considered:
- MongoDB (more flexible, weaker consistency)
- DynamoDB (serverless, vendor lock-in)
- SQLite (simpler, not suited to concurrent multi-user access)


4. JWT for Authentication

Decision: Use JWTs instead of session cookies

Rationale:
- Stateless: API servers don't need session storage
- Scalable: Each request is independent
- Cross-origin: Works with CORS and mobile apps
- Revocable: Token blacklist in Redis if needed

Trade-offs:
- ✅ Scales to many API servers
- ✅ Works with mobile/SPA clients
- ❌ Token size increases request size
- ❌ Can't revoke immediately without a Redis check on each request

Alternatives considered:
- Server sessions (require sticky sessions or a shared session store)
- OAuth2 (more complex, better for third-party access)
- API keys (less secure, good for server-to-server)
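To make the stateless check and the revocation trade-off concrete, here is a minimal sketch of HS256 signing and verification using only the Python standard library. The choice of HS256, the function names, and the use of the `jti` claim for the blacklist are assumptions for illustration; a production service would more likely use a maintained JWT library, and `revoked_ids` stands in for the Redis lookup.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: bytes, revoked_ids=frozenset(), now=None) -> dict:
    """Return the claims if the token is valid; raise ValueError otherwise."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", float("inf")) <= current:
        raise ValueError("token expired")
    # The only stateful step: without this lookup, a token stays valid until exp.
    if claims.get("jti") in revoked_ids:
        raise ValueError("token revoked")
    return claims
```

Everything except the final lookup needs no shared state, which is what lets any API server validate a request on its own. The revocation check is the part that reintroduces a round trip to Redis.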


5. Vercel for Hosting

Decision: Deploy to Vercel instead of self-managed servers

Rationale:
- Zero-ops: No server management needed
- Global CDN: Automatic edge caching
- Preview deployments: Test changes before production
- Easy integration: Git-based deployments

Trade-offs:
- ✅ Minimal ops overhead
- ✅ Excellent for static/Next.js sites
- ❌ Limited to the runtimes Vercel supports (e.g. Node.js, Python)
- ❌ Cold start latency (can be mitigated)

Alternatives considered:
- AWS EC2 (full control, more ops work)
- Fly.io (similar benefits, smaller ecosystem)
- Google Cloud Run (a good option, but Vercel is simpler)


6. MkDocs for Documentation

Decision: Use MkDocs + Material theme instead of custom site

Rationale:
- Content-first: Focus on documentation, not tooling
- GitHub-native: Markdown in version control
- Search built-in: Full-text search out of the box
- Mobile-responsive: Theme handles all devices
- Fast: Static site generation, CDN ready

Trade-offs:
- ✅ Simple to maintain
- ✅ Version-controlled content
- ❌ Limited customization without code
- ❌ Can't easily add interactive features

Alternatives considered:
- Docusaurus (more flexible, React-based)
- Sphinx (powerful, Python ecosystem)
- Custom Next.js site (full control, maintenance burden)


7. Rate Limiting per User

Decision: Implement rate limiting at 100 requests/minute per user

Rationale:
- Fair usage: Prevents a single user from monopolizing resources
- Cost control: Limits runaway API calls
- DDoS mitigation: Makes attacks less effective
- Predictable: Users know their limits upfront

Trade-offs:
- ✅ Prevents abuse
- ✅ Controls costs
- ❌ Some users may hit limits legitimately
- ❌ Requires monitoring to adjust

Decision criteria:
- Measured actual usage patterns
- Set the limit at 10x the request rate of a typical heavy user
- Higher limits planned for premium tiers
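One simple way to enforce a per-user limit is a fixed-window counter. The sketch below is an in-memory version with an injectable clock; the class name and the fixed-window strategy are assumptions, since the source states only the limit (100 requests/minute per user) and not the algorithm.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per user in each `window_seconds` window."""

    def __init__(self, limit: int = 100, window_seconds: int = 60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._state = {}  # user_id -> (window_index, request_count)

    def allow(self, user_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        last_window, count = self._state.get(user_id, (window, 0))
        if last_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self._state[user_id] = (window, count + 1)
        return True


limiter = FixedWindowRateLimiter()
if not limiter.allow("user-123"):
    print("429 Too Many Requests")
```

A fixed window can let through up to twice the limit across a window boundary; a sliding window or token bucket smooths that out at the cost of more state. With several API servers, the counters would need to live in a shared store rather than in process memory.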


8. Document Processing Pipeline

Decision: Queue-based async processing instead of synchronous

Rationale:
- Long-running: Document processing can take 5-30 seconds
- Scalability: Workers can process jobs in parallel
- Reliability: Failed jobs can be retried
- User experience: The user gets an immediate response; progress is updated asynchronously

Trade-offs:
- ✅ Better UX (instant response)
- ✅ Scales to many concurrent requests
- ❌ More complex (eventual consistency)
- ❌ Webhook or polling needed to deliver results

Alternatives considered:
- Synchronous processing (simpler, slower)
- Timeout with result polling (less elegant)
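The submit, process, retry, and poll cycle can be sketched in a few lines. This version uses an in-process `queue.Queue` and a dict as stand-ins for the real queue and job store; the function names, the status values, and the retry budget of 3 are all hypothetical.

```python
import queue
import uuid

MAX_ATTEMPTS = 3           # hypothetical retry budget

jobs = {}                  # job_id -> {"status", "result", "attempts"}
job_queue = queue.Queue()  # stand-in for the real message queue


def submit(document: str) -> str:
    """API side: record the job, enqueue it, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None, "attempts": 0}
    job_queue.put((job_id, document))
    return job_id


def work_once(process) -> bool:
    """Worker side: run one queued job. Returns False if the queue is empty."""
    try:
        job_id, document = job_queue.get_nowait()
    except queue.Empty:
        return False
    job = jobs[job_id]
    job["status"] = "processing"
    job["attempts"] += 1
    try:
        job["result"] = process(document)
        job["status"] = "done"
    except Exception:
        if job["attempts"] < MAX_ATTEMPTS:
            job["status"] = "queued"
            job_queue.put((job_id, document))  # requeue for another attempt
        else:
            job["status"] = "failed"
    return True


def get_status(job_id: str) -> dict:
    """Polling side: clients call this until the status is done or failed."""
    return dict(jobs[job_id])
```

The client receives `job_id` as soon as `submit` returns and sees the result only on a later `get_status` call, which is the eventual consistency noted in the trade-offs.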


9. Full-Text Search in PostgreSQL

Decision: Use PostgreSQL full-text search instead of Elasticsearch

Rationale:
- Simpler ops: One less service to manage
- Adequate performance: Sufficient for most use cases
- Cost: No separate search infrastructure
- Consistency: Results always match database state

Trade-offs:
- ✅ Minimal infrastructure
- ✅ Always consistent
- ❌ Scaling search independently is harder
- ❌ Less powerful than Elasticsearch

When to revisit:
- If search latency becomes an issue
- If faceted search or search analytics are needed
- If the corpus grows beyond 100M documents


10. Environment-Based Configuration

Decision: Use environment variables for all configuration

Rationale:
- Security: Secrets not in code
- Flexibility: Change behavior without code changes
- Standard: Industry best practice (12-factor app)
- Container-friendly: Works well with Docker

Configuration levels:

.env (local development)
.env.test (test environment)
.env.staging (staging environment)
GitHub Secrets (production)

Trade-offs:
- ✅ Secure by default
- ✅ Follows best practices
- ❌ More environment setup needed
- ❌ Easy to forget to set a required variable
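The last trade-off can be softened by validating configuration once at startup, so a missing variable fails fast instead of surfacing mid-request. A minimal sketch follows; the variable names (`DATABASE_URL`, `REDIS_URL`, `JWT_SECRET`, `RATE_LIMIT_PER_MINUTE`) and the `Settings` class are illustrative assumptions, not the project's actual configuration keys.

```python
import os
from dataclasses import dataclass

# Hypothetical variable names, chosen for illustration only.
REQUIRED = ("DATABASE_URL", "REDIS_URL", "JWT_SECRET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    jwt_secret: str
    rate_limit_per_minute: int = 100


def load_settings(env=None) -> Settings:
    """Read settings from the environment, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        jwt_secret=env["JWT_SECRET"],
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "100")),
    )
```

Calling `load_settings()` as the first step of process startup reports every missing variable in one error message rather than one at a time.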


Summary Table

| Decision | Choice | Alternative | Reason |
|---|---|---|---|
| Architecture | Microservices | Monolith | Tech variety, independent scaling |
| Messaging | Redis Streams | RabbitMQ/Kafka | Simple, fast, dual-use |
| Database | PostgreSQL | MongoDB/DynamoDB | ACID, query power, proven scale |
| Auth | JWT | Sessions/OAuth | Stateless, scalable |
| Hosting | Vercel | AWS/GCP | Zero-ops, git integration |
| Docs | MkDocs | Custom/Docusaurus | Content-first, simple |
| Rate Limit | Per-user/min | Per-IP/hour | Fair, user-based |
| Processing | Async queue | Sync | Better UX, scalability |
| Search | PostgreSQL FTS | Elasticsearch | Simpler ops |
| Config | Env vars | Files | Security, standard |

Evolution Plan

These decisions suit the current stage (MVP → early scale). As the product grows:

Month 1-3 (Current)
- Current architecture works well
- Monitor performance metrics
- Adjust rate limits based on real usage

Month 3-6 (Early Traction)
- Consider a caching layer if database queries slow down
- Possibly add Elasticsearch if search latency becomes an issue
- Load test with 1000 concurrent users

Month 6-12 (Scaling)
- Move to Kubernetes if horizontal scaling becomes critical
- Consider a managed database (RDS) if ops become a burden
- Add a CDN for large file distribution

Year 1+ (Scale)
- Multi-region deployment
- Separate read and write database replicas
- Advanced caching strategy