Caura: From Seven Questions to a Watchlist — My CS50 Final Project

1/1/2025 · 2 min read

I built Caura in 2022 as my final project for Harvard’s CS50: a quiz-driven movie and TV recommender. Users answer seven questions, and Caura fetches, cleans, and caches IMDb data to deliver fast recommendations with trailers, ratings, and a Wikipedia link. Accounts support secure password resets, profile edits, and deletion. It’s mobile-first and API-quota-aware.
Demo video: Scroll down, you will eventually find it.

Why I built it

Streaming paralysis is real. People spend minutes scrolling and still watch nothing. I wanted a zero-to-recommendation flow that feels like a conversation, not a search. CS50 was the perfect forcing function to ship something end-to-end: auth, UI, data pipelines, and production-style constraints like rate limits and latency.

What the user experiences

  1. Create account with username, email, password.

  2. Password reset flow is simple: provide email + new password, verify code, done.

  3. Add IMDb API key via a guided tooltip.

  4. Answer seven questions. Each question has a quick “?” explainer.

  5. Get recommendations you can cycle through with the “Get another recommendation” button.

  6. Each item shows release date, plot, rating, trailer, and a Wikipedia link for deeper context.

  7. Profile management: change username, email, API key, and password. Passwords are stored hashed, never in plaintext.

  8. Quick access bar: top genres, “best of the best”, title search.

  9. Mobile-ready UI.

  10. Delete account / log out in one tap.
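The reset flow in step 2 (email, new password, verification code) could be sketched roughly like this. It is an illustration rather than Caura's actual code: the function names `issue_reset_code` and `verify_reset_code`, the six-digit format, the in-memory store, and the 15-minute expiry are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical in-memory store: email -> (sha256 of code, expiry timestamp).
# A real app would keep this in the database instead.
_pending = {}

CODE_TTL_SECONDS = 15 * 60  # assumed expiry window


def _digest(code):
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def issue_reset_code(email, now=None):
    """Create a six-digit code, remember only its hash, and return the code
    so the caller can email it to the user."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    _pending[email] = (_digest(code), now + CODE_TTL_SECONDS)
    return code


def verify_reset_code(email, code, now=None):
    """Return True once for a correct, unexpired code; codes are single-use."""
    now = time.time() if now is None else now
    entry = _pending.get(email)
    if entry is None:
        return False
    stored_digest, expires_at = entry
    if now > expires_at:
        del _pending[email]  # expired: force the user to request a new code
        return False
    if not hmac.compare_digest(stored_digest, _digest(code)):
        return False
    del _pending[email]  # success: the code cannot be replayed
    return True
```

Only after `verify_reset_code` returns True would the new password be hashed and saved. Storing a digest of the code rather than the code itself means a leaked table does not hand out working reset codes.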

Architecture at a glance

[Client] → [Flask API] → [IMDb API]
   │            │
   │            ├─ Clean/normalize metadata
   │            ├─ Store in local SQL cache
   │            └─ Serve responses from cache first
   │
   └─ UI renders details, trailer, actions

Stack:

  • Backend: Python (Flask)

  • Data: IMDb API → cleaned → local SQL cache

  • Frontend: HTML/CSS/JS + jQuery/AJAX

  • Auth & Profile: username/email/password, API key storage

  • Security: password hashing; minimal PII, GDPR-aware practices

The caching playbook: beating API quotas

IMDb’s free tier is tight. Each “get another recommendation” can trigger multiple calls. That does not scale.

Strategy:

  • First time a combination of quiz answers is requested, Caura fetches data from IMDb, cleans it, and stores it locally.

  • Subsequent users whose answers match that combination get results straight from the cache, with no external call.

  • This cuts latency and preserves the daily API quota for genuinely new requests.

Result: fewer external calls, faster UX, and a dataset that improves as more users interact.
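A minimal sketch of that cache-first strategy, using SQLite from the standard library. The table name `recommendations`, the helper names, and the `fetch_from_api` callable are hypothetical stand-ins, and the key here matches only identical answer combinations.

```python
import hashlib
import json
import sqlite3


def answers_key(answers):
    """Build a stable key for a set of quiz answers, independent of dict order."""
    canonical = json.dumps(answers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def open_cache(path=":memory:"):
    """Open (or create) the local cache. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations ("
        "answers_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def get_recommendations(conn, answers, fetch_from_api):
    """Serve from the local cache first; call the external API only on a miss."""
    key = answers_key(answers)
    row = conn.execute(
        "SELECT payload FROM recommendations WHERE answers_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # hot path: no external call
    titles = fetch_from_api(answers)  # cold path: spends API quota
    conn.execute(
        "INSERT INTO recommendations (answers_key, payload) VALUES (?, ?)",
        (key, json.dumps(titles)),
    )
    conn.commit()
    return titles
```

Sorting the keys before hashing means two users who give the same answers in a different order still land on the same cache row.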

Data model and filtering

  • Normalization: titles, plots, genres, release dates, ratings, trailer links.

  • Filtering: basic quality gates to keep junk out.

  • Mapping: quiz answers → candidate sets → ranked list.

  • Ranking: rule-based for CS50 scope. Future work: learned scoring.
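To make the pipeline above concrete, here is one way normalization, a quality gate, and rule-based ranking might fit together. Everything specific is an assumption for illustration: the raw field names, the rating and vote thresholds, and the ranking rule (more matching genres first, then higher rating) are not taken from Caura's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    name: str
    genres: frozenset
    year: int
    rating: float
    votes: int


def normalize(raw):
    """Turn one raw record (hypothetical field names) into a clean Title."""
    genres = frozenset(
        g.strip().lower() for g in raw.get("genres", "").split(",") if g.strip()
    )
    return Title(
        name=raw["title"].strip(),
        genres=genres,
        year=int(raw["year"]),
        rating=float(raw.get("rating") or 0.0),
        votes=int(raw.get("votes") or 0),
    )


def passes_quality_gate(title, min_rating=6.0, min_votes=1000):
    """Assumed gate: drop low-rated or barely-rated titles."""
    return title.rating >= min_rating and title.votes >= min_votes


def rank(titles, answers):
    """Rule-based ranking: more matching genres first, then higher rating."""
    wanted = {g.lower() for g in answers["genres"]}
    candidates = [
        t for t in titles
        if passes_quality_gate(t)
        and t.genres & wanted
        and answers["min_year"] <= t.year <= answers["max_year"]
    ]
    return sorted(
        candidates, key=lambda t: (-len(t.genres & wanted), -t.rating, t.name)
    )
```

Keeping the gate and the sort key as small separate functions makes it easier to swap the rules for a learned score later without touching normalization.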

Security and privacy posture

  • Passwords are hashed. No plaintext.

  • Minimal PII. Email and API key stored with care.

  • User control. Edit profile or delete account anytime.

  • Operational caution. Avoid logging sensitive fields. Keep stack lean.
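For the "hashed, no plaintext" point, a standard-library sketch might look like the following. The post does not say which hashing scheme Caura uses, so PBKDF2-HMAC-SHA256, the iteration count, and the `iterations$salt$hash` storage format are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(password):
    """Return 'iterations$salt_hex$hash_hex' using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"{ITERATIONS}${salt.hex()}${derived.hex()}"


def check_password(stored, candidate):
    """Re-derive the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived.hex(), hash_hex)
```

Storing the iteration count next to the salt lets the work factor be raised later without invalidating existing accounts.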

UX systems

  • Seven-question flow with inline help.

  • Single CTA: “Get another recommendation.”

  • Trailer inline to reduce tab-hopping.

  • Wikipedia “More info” to answer the “is this for me?” question fast.

  • Mobile-first layout for real-world usage.

What I learned shipping Caura

  • APIs are contracts, not firehoses. Respect quotas from day one. Caching is not a retrofit.

  • Latency drives trust. Sub-second responses feel “smart,” even with a simple rules engine.

  • Recovery flows matter. Password reset was small to build and big for perceived quality.

  • Explainability > mystery. Tooltips on the quiz reduce drop-offs.

Roadmap

  • Smarter ranking: train a lightweight model on interactions to personalize beyond rules.

  • Session prefetching: speculate the next two candidates.

  • Lists & sharing: save favorites and share “tonight’s pick.”

  • Observability: add metrics on cold vs hot path, cache hit rate, and SLA.

  • Rate-limit guardrails: background refresh jobs and exponential backoff.

  • Privacy hardening: secrets vaulting and structured redaction in logs.
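The exponential backoff item on the roadmap could start from something like this. It is a sketch of a planned feature, not existing code: the `RateLimited` exception, the function names, and the default delays are hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's fetch function on a quota response."""


def backoff_delays(retries, base=0.5, cap=30.0):
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(fetch, retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait a jittered delay and try again."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fetch()
        except RateLimited:
            # Full jitter: sleep a random amount up to the scheduled delay
            # so many clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
    return fetch()  # final attempt; let the error propagate if it still fails
```

Passing `sleep` as a parameter keeps the function testable without real waiting.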

Developer notes

  • Flask keeps the surface area small. Great for teaching clean routes and separation of concerns.

  • jQuery/AJAX is sufficient for CS50 scope. A SPA can come later.

  • Local SQL cache beats re-hydrating from the network on every click.

  • Schema discipline upfront makes data cleaning predictable.

Demo and code

Credits

Built for CS50x. Thanks to the course staff and community for the push to build something real.