Problem Set 4: Shortest Word — Find the shortest word in a sentence

How to Approach Problem Sets

Read the question carefully (twice).
Break the task into the smallest steps.
Sketch or write pseudocode before coding.
Start small — test as you go.
Check your solution with different cases.

Overview

Write a function that takes a sentence (string) and returns the shortest word it contains.

If there’s a tie, return the first shortest word that appears.
Ignore leading/trailing punctuation around words (e.g., commas, periods).
Treat internal apostrophes as part of a word (e.g., don't is one word).
Treat hyphens as separators (e.g., well-known → well, known).
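
To see how these rules interact on a single input, here is a rough sketch that applies them with plain string methods. The helper name rough_tokens is hypothetical and not part of the assignment. Stripping with string.punctuation only approximates the word definition, since it would keep digits or symbols inside a chunk.

```python
import string

def rough_tokens(sentence: str) -> list[str]:
    """Illustrative only: apply the overview rules with plain string methods."""
    tokens = []
    # Hyphens act as separators, so turn them into spaces before splitting.
    for chunk in sentence.replace("-", " ").split():
        # Drop punctuation from both ends; an internal apostrophe survives.
        word = chunk.strip(string.punctuation)
        if word:  # a chunk like "!!!" strips down to nothing
            tokens.append(word)
    return tokens

print(rough_tokens("Well-known, isn't it?"))
```

For the input "Well-known, isn't it?" this yields four tokens: Well, known, isn't, it.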

Learning objectives

You should be able to:

Split and clean text reliably.
Handle edge cases (punctuation, multiple spaces, ties, empty input).
Write small, testable functions.

Success criteria

shortest_word(sentence) returns a single string: the first shortest word.
Works with arbitrary spacing and punctuation.
Raises ValueError for empty/whitespace-only input or when no words are found.
Includes tests that pass.

Constraints & rules

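Do not use external libraries in your solution; the standard library (including re) is fine, and pytest is needed only to run the tests.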
Words are sequences of letters with optional internal apostrophes.
Hyphens split words.
Letter case does not affect a word's length; return the chosen word in its original casing.

Starter design

Function to implement

def shortest_word(sentence: str) -> str:
    """Return the first shortest word in the sentence.

    Rules:
      - Words are letter sequences possibly containing internal apostrophes.
      - Hyphens split words into separate tokens.
      - Leading/trailing punctuation is ignored.
      - If multiple words share the minimum length, return the first one.
      - Raise ValueError if no valid words exist.
    """
    ...

Skeleton (Python) — `shortest.py`

import re

_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def _tokenize(sentence: str) -> list[str]:
    """
    Split on hyphens first, then find word-like tokens:
    - A word is letters with an optional internal apostrophe block, e.g., don't, it's.
    - Leading/trailing punctuation is ignored by the regex.
    """
    parts = []
    for chunk in sentence.replace("-", " ").split():
        parts.extend(_WORD_RE.findall(chunk))
    return parts

def shortest_word(sentence: str) -> str:
    if not isinstance(sentence, str):
        raise ValueError("Input must be a string")
    sentence = sentence.strip()
    if not sentence:
        raise ValueError("Empty sentence")

    words = _tokenize(sentence)
    if not words:
        raise ValueError("No words found")

    # Keep the first word with the smallest length. The strict "<" means a
    # later word of equal length never replaces an earlier one.
    min_len = None
    answer = None
    for w in words:
        length = len(w)
        if min_len is None or length < min_len:
            min_len = length
            answer = w
    return answer
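
The explicit loop above makes the tie-breaking rule visible. As an aside, Python's built-in min returns the first minimal item when several compare equal, so the same selection could be sketched in a single call. The word list below is made up for illustration.

```python
# Illustrative word list (not from the problem set): three words tie at length 2.
words = ["to", "be", "or", "not"]

# min() scans left to right and keeps the first item with the smallest key,
# so ties resolve to the earliest word.
first_shortest = min(words, key=len)
print(first_shortest)  # to
```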

Quick manual examples

>>> from shortest import shortest_word
>>> shortest_word("The quick brown fox.")
'The'
>>> shortest_word("An, apple; a day!")
'a'
>>> shortest_word("It's time to test.")
'to'
>>> shortest_word("A well-known fact.")
'A'

Tests — `test_shortest.py`

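Run with python -m pytest -q. The tests below are plain functions that use pytest.raises, so python -m unittest discover will not collect them unless you rewrite them as unittest.TestCase methods.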

import pytest
from shortest import shortest_word

def test_basic():
    assert shortest_word("The quick brown fox.") == "The"

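def test_tie_returns_first():
    # Not a tie: 'a' (1) is strictly shorter than 'An' (2)
    assert shortest_word("An, apple; a day!") == "a"
    # Real tie: 'to', 'be', and 'or' all have length 2; the first one wins
    assert shortest_word("to be or not") == "to"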

def test_apostrophes_kept_inside():
    assert shortest_word("Don't stop believing") == "stop"  # stop(4) vs Don't(5)

def test_hyphen_splits():
    assert shortest_word("A well-known fact") == "A"

def test_whitespace_only_raises():
    with pytest.raises(ValueError):
        shortest_word("   \t  ")

def test_no_words_raises():
    with pytest.raises(ValueError):
        shortest_word("... --- !!!")

def test_mixed_punctuation():
    assert shortest_word("Hello, world!!! This—is—a—test.") == "a"

def test_case_irrelevant_for_length():
    assert shortest_word("BIG small TINY") == "BIG"  # first with min length 3

Stretch goals (optional)

Unicode letters: Support non-ASCII letters (é, ł, ñ). Hint: \p{L} works only in the third-party regex module, not in Python’s built-in re. With re alone, a class such as [^\W\d_] approximates "any letter", or you can test characters with str.isalpha().
Return all shortest words: Change the API to return a list of all shortest words in order of appearance.
Word index: Also return the start index (character position) of the chosen word.
Ignore stopwords: Skip very common words (a, the, and) when choosing the shortest.
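
As a starting point for the Unicode and word-index goals, here is a minimal sketch that uses only the built-in re module. The function name shortest_word_with_index and the pattern name are hypothetical. The class [^\W\d_] is an assumption standing in for "any letter" and may admit a few non-letter word characters, so treat it as an approximation and not a definitive solution.

```python
import re

# Assumption: [^\W\d_] (word characters minus digits and underscore)
# approximates "any Unicode letter", since built-in re has no \p{L}.
_UNICODE_WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")

def shortest_word_with_index(sentence: str) -> tuple[str, int]:
    """Return (word, start_index) for the first shortest word (hypothetical helper)."""
    best = None
    # finditer yields match objects in order, each carrying its position.
    for match in _UNICODE_WORD_RE.finditer(sentence):
        # Strict "<" keeps the earliest match when lengths tie.
        if best is None or len(match.group()) < len(best.group()):
            best = match
    if best is None:
        raise ValueError("No words found")
    return best.group(), best.start()

print(shortest_word_with_index("Señor José está aquí"))
```

Hyphens need no special handling here: a hyphen is not a letter, so the pattern stops matching at it and resumes on the next letter.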
