Caura: From Seven Questions to a Watchlist — My CS50 Final Project
1/1/2025 · 2 min read
I built Caura, a quiz-driven movie/TV recommender, in 2022 as my final project for Harvard's CS50 course. Users answer seven questions. Caura fetches, cleans, and caches IMDb data to deliver fast recommendations with trailers, ratings, and a Wikipedia link. Accounts support secure password resets, profile edits, and deletion. It’s mobile-first and API-quota-aware.
Demo video: linked under “Demo and code” at the end of this post.
Why I built it
Streaming paralysis is real. People spend minutes scrolling and still watch nothing. I wanted a zero-to-recommendation flow that feels like a conversation, not a search. CS50 was the perfect forcing function to ship something end-to-end: auth, UI, data pipelines, and production-style constraints like rate limits and latency.
What the user experiences
Create account with username, email, password.
Password reset flow is simple: provide email + new password, verify code, done.
Add IMDb API key via a guided tooltip.
Answer seven questions. Each question has a quick “?” explainer.
Get recommendations you can cycle through with Get another recommendation.
Each item shows release date, plot, rating, trailer, and a Wikipedia link for deeper context.
Profile management: change username, email, API key, and password. Passwords are hashed, never stored in plaintext.
Quick access bar: top genres, “best of the best”, title search.
Mobile-ready UI.
Delete account / log out in one tap.
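The reset flow in the list above (email plus new password, then a verification code) could be sketched roughly like this. It is an illustration, not Caura's actual code: the function names, the six-digit code format, the ten-minute expiry, and the in-memory store are all assumptions.

```python
import hashlib
import hmac
import secrets
import time

CODE_TTL_SECONDS = 600  # assumed 10-minute validity window

# Hypothetical in-memory store: email -> (SHA-256 of code, expiry timestamp).
# A real app would keep this in its database instead.
_pending_resets = {}


def _digest(code):
    """Hash the code so the raw value is never stored."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def start_reset(email, now=None):
    """Create a one-time six-digit code for this email and return it (to be emailed)."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    _pending_resets[email] = (_digest(code), now + CODE_TTL_SECONDS)
    return code


def verify_reset(email, code, now=None):
    """Return True only if the code matches, has not expired, and was not used before."""
    now = time.time() if now is None else now
    entry = _pending_resets.get(email)
    if entry is None:
        return False
    stored_digest, expires_at = entry
    if now > expires_at:
        del _pending_resets[email]  # expired codes are discarded
        return False
    if not hmac.compare_digest(stored_digest, _digest(code)):
        return False
    del _pending_resets[email]  # single use
    return True
```

Only after `verify_reset` returns True would the new password be hashed and saved.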
Architecture at a glance
[Client] → [Flask API] → [IMDb API]
   │            │            │
   │            ├─ Clean/normalize metadata
   │            ├─ Store in local SQL cache
   │            └─ Serve responses from cache first
   │
   └─ UI renders details, trailer, actions
Stack:
Backend: Python (Flask)
Data: IMDb API → cleaned → local SQL cache
Frontend: HTML/CSS/JS + jQuery/AJAX
Auth & Profile: username/email/password, API key storage
Security: password hashing; minimal PII, GDPR-aware practices
The caching playbook: beating API quotas
IMDb’s free tier is tight. Each “get another recommendation” can trigger multiple calls. That does not scale.
Strategy:
First time a combination of quiz answers is requested, Caura fetches data from IMDb, cleans it, and stores it locally.
Subsequent users with similar tastes get results instantly from cache.
This cuts latency and preserves the daily API quota for genuinely new requests.
Result: fewer external calls, faster UX, and a dataset that improves as more users interact.
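The cache-first strategy above can be sketched in a few lines of Python with the standard-library sqlite3 module. This is a minimal sketch under stated assumptions, not the project's real code: the table and column names are made up, `fetch_remote` is a hypothetical stand-in for the IMDb fetch-and-clean step, and the key only matches identical answer sets (matching "similar" tastes would need a coarser key).

```python
import json
import sqlite3


def make_cache(path=":memory:"):
    """Open (or create) the local SQL cache. Table and column names are assumed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations ("
        "answers_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def answers_key(answers):
    """Build a stable key so the same answers always map to the same row."""
    return json.dumps(answers, sort_keys=True, separators=(",", ":"))


def get_recommendations(conn, answers, fetch_remote):
    """Serve from the cache first; call the external API only on a miss.

    Returns (results, cache_hit).
    """
    key = answers_key(answers)
    row = conn.execute(
        "SELECT payload FROM recommendations WHERE answers_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0]), True  # hot path: no API quota spent

    results = fetch_remote(answers)  # cold path: the only place quota is used
    conn.execute(
        "INSERT OR REPLACE INTO recommendations (answers_key, payload) VALUES (?, ?)",
        (key, json.dumps(results)),
    )
    conn.commit()
    return results, False
```

Sorting the keys before serializing means two users who give the same answers in a different order still share one cache row.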
Data model and filtering
Normalization: titles, plots, genres, release dates, ratings, trailer links.
Filtering: basic quality gates to keep junk out.
Mapping: quiz answers → candidate sets → ranked list.
Ranking: rule-based for CS50 scope. Future work: learned scoring.
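To make the pipeline above concrete, here is a small sketch of normalization, a quality gate, and rule-based ranking. The raw field names, the vote threshold, and the two ranking rules are assumptions chosen for illustration; the post does not spell out Caura's actual schema or rules.

```python
from dataclasses import dataclass


@dataclass
class Title:
    name: str
    genres: frozenset
    year: int
    rating: float
    votes: int


def normalize(raw):
    """Turn one raw API record (field names assumed) into a clean Title, or None."""
    try:
        return Title(
            name=raw["title"].strip(),
            genres=frozenset(
                g.strip().lower() for g in raw["genres"].split(",") if g.strip()
            ),
            year=int(raw["year"]),
            rating=float(raw["rating"]),
            votes=int(raw["votes"]),
        )
    except (KeyError, ValueError, TypeError, AttributeError):
        return None  # malformed record: drop it rather than guess


def passes_quality_gate(title, min_votes=1000):
    """Assumed gate: needs a name, at least one genre, and enough votes."""
    return bool(title.name) and bool(title.genres) and title.votes >= min_votes


def recommend(raw_records, answers):
    """Map quiz answers to a candidate set, then rank it with simple rules."""
    titles = [t for t in map(normalize, raw_records) if t and passes_quality_gate(t)]
    wanted = {g.lower() for g in answers["genres"]}
    candidates = [
        t for t in titles
        if t.genres & wanted and t.year >= answers.get("min_year", 0)
    ]
    # Rule: more matching genres first, then higher rating.
    return sorted(
        candidates,
        key=lambda t: (len(t.genres & wanted), t.rating),
        reverse=True,
    )
```

Because ranking is an ordinary sort key, a learned score could later replace the tuple without touching the normalization or filtering steps.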
Security and privacy posture
Passwords are hashed. No plaintext.
Minimal PII. Email and API key stored with care.
User control. Edit profile or delete account anytime.
Operational caution. Avoid logging sensitive fields. Keep stack lean.
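For readers unfamiliar with what "hashed, no plaintext" means in practice, here is a minimal standard-library sketch using salted PBKDF2. The post does not say which scheme Caura used, so the algorithm, iteration count, and storage format below are assumptions. In a Flask app, Werkzeug's `generate_password_hash` and `check_password_hash` are a common alternative.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password):
    """Return 'salt_hex$hash_hex' using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"{salt.hex()}${derived.hex()}"


def check_password(stored, candidate):
    """Re-derive the hash with the stored salt and compare in constant time."""
    salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))
```

Only the string returned by `hash_password` is stored; the per-user random salt means two users with the same password get different stored values.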
UX systems
Seven-question flow with inline help.
Single CTA: “Get another recommendation.”
Trailer inline to reduce tab-hopping.
Wikipedia “More info” to answer the “is this for me?” question fast.
Mobile-first layout for real-world usage.
What I learned shipping Caura
APIs are contracts, not firehoses. Respect quotas from day one. Caching is not a retrofit.
Latency drives trust. Sub-second responses feel “smart,” even with a simple rules engine.
Recovery flows matter. Password reset was small to build and big for perceived quality.
Explainability > mystery. Tooltips on the quiz reduce drop-offs.
Roadmap
Smarter ranking: train a lightweight model on interactions to personalize beyond rules.
Session prefetching: speculate the next two candidates.
Lists & sharing: save favorites and share “tonight’s pick.”
Observability: track cold-path vs hot-path requests, cache hit rate, and SLA compliance.
Rate-limit guardrails: background refresh jobs and exponential backoff.
Privacy hardening: secrets vaulting and structured redaction in logs.
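As a rough idea of what the exponential-backoff guardrail on this roadmap could look like, here is a sketch. It is not something Caura ships today; `RateLimited` is a hypothetical exception standing in for whatever the API client raises on a quota error, and the delay values are assumptions.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the upstream API rejects a call over quota."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call func(); on RateLimited, wait up to base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Randomizing the wait ("full jitter") keeps clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.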
Developer notes
Flask keeps the surface area small. Great for teaching clean routes and separation of concerns.
jQuery/AJAX is sufficient for CS50 scope. A SPA can come later.
Local SQL cache beats re-hydrating from the network on every click.
Schema discipline upfront makes data cleaning predictable.
Demo and code
Video walkthrough: https://www.youtube.com/watch?v=psAxxOfCzxQ
Code: email me if you want a redacted educational copy.
Credits
Built for CS50x. Thanks to the course staff and community for the push to build something real.
