- Read the question carefully (twice).
- Break the task into the smallest steps.
- Sketch or write pseudocode before coding.
- Start small — test as you go.
- Check your solution with different cases.
Overview
Build a tiny URL-shortening service. Given a long URL, your program returns a short code (e.g., aZ3fQ9). When someone provides that short code, your program returns the original long URL.
Key constraint: No duplicate short codes may exist. If a code is already in use, your program must not reuse it for a different URL.
Learning objectives
You should be able to:
- Design simple data structures to map keys to values.
- Implement a collision-safe short-code generator.
- Validate and normalize user input (URLs).
- Write tests that verify uniqueness, idempotency, and lookup correctness.
Success criteria (what “done” looks like)
shorten(url)returns a 6–10 character short code and never creates a duplicate code.- Re-shortening the same URL returns the same code (idempotent behavior).
resolve(code)returns the original URL that was shortened.- Input validation: rejects empty strings and clearly invalid URLs.
- A test suite that passes: uniqueness under load, idempotency, lookup, and error cases.
Constraints & rules
- Codes are URL-safe: only
[A–Z][a–z][0–9]. - Codes must be unique. Handle collisions robustly.
- Keep state in memory (dictionary) for the core task.
Extension: add persistence with a simple file or SQLite. - Deterministic behavior for the same input (idempotency): if URL
Xwas shortened before, return the same code.
Starter design
Data structures
url_to_code: dict[str, str]– map long URL → short codecode_to_url: dict[str, str]– map short code → long URL
Core functions (implement these)
normalize_url(url: str) -> strgenerate_code(n: int = 6) -> strshorten(url: str) -> strresolve(code: str) -> str
Skeleton (Python)
Paste this into shortener.py. Do not change function names or signatures.
import re
import string
import random
from urllib.parse import urlparse
# In-memory stores
url_to_code: dict[str, str] = {}
code_to_url: dict[str, str] = {}
ALPHABET = string.ascii_letters + string.digits
CODE_LEN = 6
def normalize_url(url: str) -> str:
"""
Return a cleaned, canonical-looking URL or raise ValueError.
Rules:
- Must have scheme (http/https). If missing, assume https.
- Must have netloc (domain).
- Trim whitespace.
"""
if not isinstance(url, str):
raise ValueError("URL must be a string")
url = url.strip()
if not url:
raise ValueError("Empty URL")
# Prepend https:// if missing scheme
parsed = urlparse(url)
if not parsed.scheme:
url = "https://" + url
parsed = urlparse(url)
if parsed.scheme not in {"http", "https"}:
raise ValueError("Unsupported scheme")
if not parsed.netloc:
raise ValueError("Missing domain")
# Optional: lowercase scheme + host; keep path/query as-is
canonical = f"{parsed.scheme.lower()}://{parsed.netloc.lower()}{parsed.path}"
if parsed.params:
canonical += f";{parsed.params}"
if parsed.query:
canonical += f"?{parsed.query}"
if parsed.fragment:
canonical += f"#{parsed.fragment}"
return canonical
def generate_code(n: int = CODE_LEN) -> str:
"""
Generate a random code using ALPHABET. Collision-safe logic
is enforced in shorten() (this function only makes a candidate).
"""
return "".join(random.choices(ALPHABET, k=n))
def shorten(url: str) -> str:
"""
Return a short code for the given URL.
Requirements:
- Idempotent: the same normalized URL gets the same code.
- No duplicate codes: if a generated code exists, regenerate.
"""
normalized = normalize_url(url)
# Idempotency: if URL already shortened, return existing code
if normalized in url_to_code:
return url_to_code[normalized]
# Generate until an unused code is found (collision handling)
# This loop guarantees uniqueness even under rare collisions.
attempts = 0
while True:
attempts += 1
if attempts > 10_000:
raise RuntimeError("Too many collisions; increase code length.")
candidate = generate_code(CODE_LEN)
if candidate not in code_to_url:
# Reserve the code
code_to_url[candidate] = normalized
url_to_code[normalized] = candidate
return candidate
def resolve(code: str) -> str:
"""
Given a short code, return the original URL.
Raise KeyError if the code does not exist.
"""
if not isinstance(code, str) or not code:
raise ValueError("Code must be a non-empty string")
if not re.fullmatch(r"[A-Za-z0-9]+", code):
raise ValueError("Invalid code characters")
return code_to_url<pre><code>
Manual usage example (REPL)
>>> from shortener import shorten, resolve
>>> c = shorten("https://www.example.com/some/long/path?x=1")
>>> c
'aZ3fQ9' # example
>>> resolve(c)
'https://www.example.com/some/long/path?x=1'
>>> shorten("www.example.com/some/long/path?x=1") # idempotent after normalization
'aZ3fQ9'
Tests (save as test_shortener.py)
Run with python -m pytest -q or python -m unittest discover.
import re
from shortener import shorten, resolve, url_to_code, code_to_url, normalize_url, CODE_LEN
def test_normalization_adds_scheme_and_lowers_host():
u = normalize_url("Example.com/Path")
assert u.startswith("https://")
assert "example.com" in u
def test_shorten_returns_fixed_length_alnum_code():
code = shorten("https://example.com/alpha")
assert len(code) == CODE_LEN
assert re.fullmatch(r"[A-Za-z0-9]{%d}" % CODE_LEN, code)
def test_idempotent_same_url_same_code():
a = shorten("https://example.com/idempotent")
b = shorten("example.com/idempotent")
assert a == b
def test_resolve_round_trip():
code = shorten("https://example.com/roundtrip")
assert resolve(code) == "https://example.com/roundtrip"
def test_reject_resolve_unknown_code():
import pytest
with pytest.raises(KeyError):
resolve("NoSuchCode123")
def test_reject_invalid_inputs():
import pytest
with pytest.raises(ValueError):
normalize_url("")
with pytest.raises(ValueError):
resolve("bad-code!") # invalid characters
def test_uniqueness_under_load():
# Create many URLs; ensure all codes are unique
codes = set()
for i in range(2000):
url = f"https://example.org/resource/{i}"
code = shorten(url)
assert code not in codes
codes.add(code)
Stretch goals (optional)
- Deterministic codes (Base62 of an auto-increment id):
Maintain a counternext_id, encode it in Base62 for the code. Guarantees uniqueness without random collisions. - Custom code support:
Allow users to request a custom code (validate and ensure it’s unused). - Persistence:
Saveurl_to_codeandcode_to_urlto a JSON file on exit and load on start, or use SQLite with a unique index oncode. - CLI or Web API:
- CLI:
python cli.py shorten <url>andpython cli.py resolve <code>. - Web: tiny Flask app with
POST /shortenandGET /<code>; enforce a unique DB constraint oncode.
- CLI:
Hints & guidance
- Uniqueness enforcement lives at the point of insertion. Even with random codes, a
whileloop plus a set/dict membership check prevents duplicates. - Idempotency reduces storage and improves UX; normalize before looking up or inserting.
- Length vs collisions: Longer codes reduce collision probability. If you ever hit many collisions, increase
CODE_LEN. - Validation: Use
urllib.parse.urlparseto check scheme and netloc; be strict and clear in error messages.