- Read the question carefully (twice).
- Break the task into the smallest steps.
- Sketch or write pseudocode before coding.
- Start small — test as you go.
- Check your solution with different cases.
Overview
Write a function that takes a sentence (string) and returns the shortest word it contains.
- If there’s a tie, return the first shortest word that appears.
- Ignore leading/trailing punctuation around words (e.g., commas, periods).
- Treat internal apostrophes as part of a word (e.g., `don't` is one word).
- Treat hyphens as separators (e.g., `well-known` → `well`, `known`).
Learning objectives
You should be able to:
- Split and clean text reliably.
- Handle edge cases (punctuation, multiple spaces, ties, empty input).
- Write small, testable functions.
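To see why splitting and cleaning need care, here is a small sketch that contrasts a plain whitespace split with a split-then-strip cleanup. The helper names `naive_tokens` and `stripped_tokens` are hypothetical. Note that `string.punctuation` covers ASCII punctuation only, so an em dash or curly quote would not be stripped, and this approach does not enforce the "letters only" rule.

```python
import string


def naive_tokens(sentence: str) -> list[str]:
    # Whitespace split only: punctuation stays glued to its word.
    return sentence.split()


def stripped_tokens(sentence: str) -> list[str]:
    # Treat hyphens as spaces, split on whitespace, then trim ASCII punctuation
    # from both ends of each token. strip() never touches the middle, so the
    # apostrophe in "Don't" survives.
    tokens = (tok.strip(string.punctuation) for tok in sentence.replace("-", " ").split())
    # Tokens made only of punctuation (e.g. "...") become "" and are dropped.
    return [tok for tok in tokens if tok]


sentence = "No! Go on"
print(naive_tokens(sentence))                   # ['No!', 'Go', 'on']
print(min(naive_tokens(sentence), key=len))     # Go  ('No!' counts as 3 characters)
print(stripped_tokens(sentence))                # ['No', 'Go', 'on']
print(min(stripped_tokens(sentence), key=len))  # No
```

The naive version picks the wrong word because the `!` inflates the length of `No`.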
Success criteria
- `shortest_word(sentence)` returns a single string: the first shortest word.
- Works with arbitrary spacing and punctuation.
- Raises `ValueError` for empty/whitespace-only input or when no words are found.
- Includes tests that pass.
Constraints & rules
- Do not use external libraries.
- Words are sequences of letters with optional internal apostrophes.
- Hyphens split words.
- Letter case never changes a word's length; measure length as-is and return the chosen word with its original casing.
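The word and hyphen rules above can be encoded in a single regular expression. This is a minimal sketch assuming ASCII letters only; `find_words` is a hypothetical helper name. Letting the apostrophe group repeat (`*`) keeps words like `rock'n'roll` whole, while a quote at the start or end of a word is dropped because it lacks a letter on one side.

```python
import re

# Assumption: ASCII letters only.
#   [A-Za-z]+          a run of letters
#   (?:'[A-Za-z]+)*    zero or more "apostrophe + letters" groups, so an
#                      apostrophe is only kept when letters surround it
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def find_words(sentence: str) -> list[str]:
    # A hyphen is not a letter, so findall already splits "well-known".
    return WORD_RE.findall(sentence)


print(find_words("Don't stop: well-known rock'n'roll, 'quoted' words!"))
# ["Don't", 'stop', 'well', 'known', "rock'n'roll", 'quoted', 'words']
```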
Starter design
Function to implement
```python
def shortest_word(sentence: str) -> str:
    """Return the first shortest word in the sentence.

    Rules:
    - Words are letter sequences possibly containing internal apostrophes.
    - Hyphens split words into separate tokens.
    - Leading/trailing punctuation is ignored.
    - If multiple words share the minimum length, return the first one.
    - Raise ValueError if no valid words exist.
    """
    ...
```
Skeleton (Python) — shortest.py
```python
import re

# A word is a run of ASCII letters, optionally joined by internal apostrophes
# (don't, it's, rock'n'roll). A leading or trailing apostrophe is not matched.
_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def _tokenize(sentence: str) -> list[str]:
    """
    Split on hyphens first, then find word-like tokens:
    - A word is letters with optional internal apostrophe blocks, e.g., don't, rock'n'roll.
    - Leading/trailing punctuation is ignored by the regex.
    """
    parts = []
    for chunk in sentence.replace("-", " ").split():
        parts.extend(_WORD_RE.findall(chunk))
    return parts


def shortest_word(sentence: str) -> str:
    if not isinstance(sentence, str):
        # A wrong type is a TypeError, not a bad value
        raise TypeError("Input must be a string")
    sentence = sentence.strip()
    if not sentence:
        raise ValueError("Empty sentence")
    words = _tokenize(sentence)
    if not words:
        raise ValueError("No words found")
    # Keep the first word with the minimal length. The strict "<" means a later
    # word of equal length never replaces an earlier one. Length does not
    # depend on case, so no case folding is needed.
    min_len = None
    answer = None
    for w in words:
        length = len(w)
        if min_len is None or length < min_len:
            min_len = length
            answer = w
    return answer
```
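The explicit loop in `shortest_word` can be collapsed. As a sketch: Python's built-in `min()` with `key=len` returns the first of several equally minimal items, which is exactly the tie rule. `first_shortest` is a hypothetical helper name; it checks for an empty list itself because `min([])` would raise a `ValueError` with a generic message.

```python
def first_shortest(words: list[str]) -> str:
    # min() returns the first of several equally minimal items, so ties
    # resolve to the earliest word, matching the strict "<" in the loop.
    if not words:
        raise ValueError("No words found")
    return min(words, key=len)


print(first_shortest(["Go", "to", "bed"]))  # Go
```

The loop is still worth writing once by hand, since it makes the tie-breaking rule explicit.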
Quick manual examples
```python
>>> from shortest import shortest_word
>>> shortest_word("The quick brown fox.")
'The'
>>> shortest_word("An, apple; a day!")
'a'
>>> shortest_word("It's time to test.")
'to'
>>> shortest_word("A well-known fact.")
'A'
```
Tests — test_shortest.py
Run with `python -m pytest -q`. The tests are plain pytest functions that use `pytest.raises`, so `python -m unittest discover` will not collect them.
```python
import pytest
from shortest import shortest_word

def test_basic():
    assert shortest_word("The quick brown fox.") == "The"

def test_punctuation_around_words():
    # 'a' (1) is shorter than 'An' (2), 'day' (3), and 'apple' (5)
    assert shortest_word("An, apple; a day!") == "a"

def test_tie_returns_first():
    # 'Go' and 'to' both have length 2; the first one wins
    assert shortest_word("Go to bed.") == "Go"

def test_apostrophes_kept_inside():
    assert shortest_word("Don't stop believing") == "stop"  # stop(4) vs Don't(5)

def test_hyphen_splits():
    assert shortest_word("A well-known fact") == "A"
    # 'well' (4) only wins if the hyphen splits "well-known" into two words
    assert shortest_word("extraordinary well-known things") == "well"

def test_whitespace_only_raises():
    with pytest.raises(ValueError):
        shortest_word(" \t ")

def test_no_words_raises():
    with pytest.raises(ValueError):
        shortest_word("... --- !!!")

def test_mixed_punctuation():
    assert shortest_word("Hello, world!!! This—is—a—test.") == "a"

def test_case_irrelevant_for_length():
    assert shortest_word("BIG small TINY") == "BIG"  # first with min length 3
```
Stretch goals (optional)
- Unicode letters: Support non-ASCII letters (é, ł, ñ). Hint: use `\p{L}`-style character classes from the third-party `regex` module if allowed (the constraints above rule out external libraries), or stay with Python's `re` and take a broader approach via `unicodedata`.
- Return all shortest words: Change the API to return a list of all shortest words in order of appearance.
- Word index: Also return the start index (character position) of the chosen word.
- Ignore stopwords: Skip very common words (`a`, `the`, `and`) when choosing the shortest.
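The stretch goals can be combined into one sketch, under several assumptions. The class `[^\W\d_]` ("word characters minus digits and underscore") stands in for "any Unicode letter"; it is close but may also admit a few non-letter characters such as superscript digits. The input is NFC-normalized with `unicodedata` so that a letter written as base + combining accent counts as one character. `shortest_words` and `DEFAULT_STOPWORDS` are hypothetical names. The function returns `(word, start_index)` pairs for every shortest word.

```python
import re
import unicodedata

# In Python 3, \w in a str pattern is Unicode-aware by default, so
# [^\W\d_] matches roughly "any Unicode letter".
_LETTERS = r"[^\W\d_]+"
_UNICODE_WORD_RE = re.compile(_LETTERS + r"(?:'" + _LETTERS + r")*")

# Hypothetical stopword set for illustration; pass your own.
DEFAULT_STOPWORDS = frozenset({"a", "the", "and"})


def shortest_words(sentence: str, stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Return every shortest word with its start index, in order of appearance.

    Indices refer to the NFC-normalized sentence, which differs from the input
    only if the input contained decomposed characters.
    """
    text = unicodedata.normalize("NFC", sentence)
    # Hyphens, digits, and punctuation are not in the letter class, so they split words.
    candidates = [
        (m.group(), m.start())
        for m in _UNICODE_WORD_RE.finditer(text)
        if m.group().casefold() not in stopwords
    ]
    if not candidates:
        raise ValueError("No words found")
    min_len = min(len(word) for word, _ in candidates)
    return [(word, start) for word, start in candidates if len(word) == min_len]


print(shortest_words("naïve café ñu"))                          # [('ñu', 11)]
print(shortest_words("The cat and a dog", DEFAULT_STOPWORDS))  # [('cat', 4), ('dog', 14)]
```

Returning a list of pairs keeps the "first shortest word" answer available as `shortest_words(s)[0][0]`.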