Problem Set 4: Shortest Word — Find the shortest word in a sentence

When you start a new problem set, your first instinct might be to open your computer and begin typing code right away. While this can feel productive, it often leads to frustration when things don't work as expected. Instead, take a few minutes to slow down and plan.

Here are some helpful strategies:

  1. Understand the problem clearly
    • Read the instructions carefully — twice if needed.
    • Ask yourself: What exactly is being asked?
  2. Break the problem into smaller steps
    • Think about the smallest possible actions the computer will need to perform.
    • For example: If the task is to find the first recurring letter in a word, what steps must happen first?
  3. Try solving it on paper first
    • Write out your steps in plain language (pseudocode).
    • Test your steps with a simple example before touching the keyboard.
  4. Translate your steps into code
    • Start small — write only a few lines at a time and test often.
    • Don't worry about perfection at first; get a working version, then improve it.
  5. Check your solution
    • Run it with different examples, including edge cases.
    • Ask: Does this solve the problem in all situations?
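
To make strategy 2 concrete, here is one way the recurring-letter example might break down into small steps. It is only a sketch of one possible decomposition; the function name first_recurring_letter is made up for illustration and is not part of this problem set.

```python
from typing import Optional


def first_recurring_letter(word: str) -> Optional[str]:
    """Return the first letter that shows up a second time, or None."""
    seen = set()              # step 1: remember the letters visited so far
    for letter in word:       # step 2: look at one letter at a time
        if letter in seen:    # step 3: already seen? then this letter recurs
            return letter
        seen.add(letter)      # step 4: otherwise remember it and move on
    return None               # step 5: reached the end without a repeat


print(first_recurring_letter("banana"))  # a
```

Each comment corresponds to one line of pseudocode you could have written on paper first.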

Quick checklist:

  • Read the question carefully (twice).
  • Break the task into the smallest steps.
  • Sketch or write pseudocode before coding.
  • Start small — test as you go.
  • Check your solution with different cases.

Overview

Write a function that takes a sentence (string) and returns the shortest word it contains.

  • If there’s a tie, return the first shortest word that appears.
  • Ignore leading/trailing punctuation around words (e.g., commas, periods).
  • Treat internal apostrophes as part of a word (e.g., don't is one word).
  • Treat hyphens as separators (e.g., well-known → well, known).
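
The rules above can be tried out with plain string methods before reaching for anything fancier. The sketch below is a rough illustration, not the required solution: rough_tokens is a hypothetical helper, and it assumes words are separated by whitespace or hyphens (it does not split something like "a,b" and does not reject digits).

```python
import string


def rough_tokens(sentence: str) -> list[str]:
    """Illustrative tokenizer built only from str methods."""
    tokens = []
    # Hyphens are separators, so turn them into spaces before splitting.
    # split() with no argument also copes with runs of spaces and tabs.
    for chunk in sentence.replace("-", " ").split():
        # strip() removes punctuation from both ends only, so the
        # apostrophe inside "isn't" survives.
        word = chunk.strip(string.punctuation)
        if word:  # skip chunks that were nothing but punctuation
            tokens.append(word)
    return tokens


print(rough_tokens("Well-known, isn't it?  Yes!"))
# ['Well', 'known', "isn't", 'it', 'Yes']
```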

Learning objectives

You should be able to:

  • Split and clean text reliably.
  • Handle edge cases (punctuation, multiple spaces, ties, empty input).
  • Write small, testable functions.

Success criteria

  • shortest_word(sentence) returns a single string: the first shortest word.
  • Works with arbitrary spacing and punctuation.
  • Raises ValueError for empty/whitespace-only input or when no words are found.
  • Includes tests that pass.

Constraints & rules

  • Do not use external libraries.
  • Words are sequences of letters with optional internal apostrophes.
  • Hyphens split words.
  • Letter case does not affect a word's length; return the chosen word with its original casing.
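
As a small illustration of the length and tie rules, the sketch below uses the built-in min with key=len, which returns the first item among those with the smallest key. The helper name first_shortest is an assumption made for this example; it expects a list of words that has already been cleaned.

```python
def first_shortest(words: list[str]) -> str:
    """Return the first word with the smallest length (hypothetical helper)."""
    if not words:
        raise ValueError("No words found")
    # min() keeps the earliest item when several share the smallest key,
    # which matches the "first shortest word" rule.
    return min(words, key=len)


# len() counts characters, so upper- and lowercase letters weigh the same.
assert len("BIG") == len("big")

print(first_shortest(["BIG", "small", "TINY", "cat"]))  # BIG
```

The word comes back exactly as it was passed in, so the original casing is preserved without extra work.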

Starter design

Function to implement

def shortest_word(sentence: str) -> str:
    """Return the first shortest word in the sentence.

    Rules:
      - Words are letter sequences possibly containing internal apostrophes.
      - Hyphens split words into separate tokens.
      - Leading/trailing punctuation is ignored.
      - If multiple words share the minimum length, return the first one.
      - Raise ValueError if no valid words exist.
    """
    ...

Skeleton (Python) — shortest.py

import re

# Letters, then zero or more apostrophe + letters blocks (don't, rock'n'roll).
_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def _tokenize(sentence: str) -> list[str]:
    """
    Split on hyphens first, then find word-like tokens:
    - A word is letters with an optional internal apostrophe block, e.g., don't, it's.
    - Leading/trailing punctuation is ignored by the regex.
    """
    parts = []
    for chunk in sentence.replace("-", " ").split():
        parts.extend(_WORD_RE.findall(chunk))
    return parts

def shortest_word(sentence: str) -> str:
    if not isinstance(sentence, str):
        # A wrong argument type is conventionally a TypeError, not a ValueError.
        raise TypeError("Input must be a string")
    sentence = sentence.strip()
    if not sentence:
        raise ValueError("Empty sentence")

    words = _tokenize(sentence)
    if not words:
        raise ValueError("No words found")

    # Keep the first word with the smallest length. The strict "<" means a
    # later word of equal length never replaces an earlier one.
    min_len = None
    answer = None
    for word in words:
        length = len(word)
        if min_len is None or length < min_len:
            min_len = length
            answer = word
    return answer

Quick manual examples

>>> from shortest import shortest_word
>>> shortest_word("The quick brown fox.")
'The'
>>> shortest_word("An, apple; a day!")
'a'
>>> shortest_word("It's time to test.")
'to'
>>> shortest_word("A well-known fact.")
'A'

Tests — test_shortest.py

Run with python -m pytest -q. (These are pytest-style test functions, so python -m unittest discover will not collect them.)

import pytest
from shortest import shortest_word

def test_basic():
    assert shortest_word("The quick brown fox.") == "The"

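def test_tie_returns_first():
    # 'It', 'is' and 'an' all have length 2; the first one wins
    assert shortest_word("It is an egg.") == "It"

def test_punctuation_around_words():
    # 'a' (1) beats 'An' (2) once the comma and semicolon are ignored
    assert shortest_word("An, apple; a day!") == "a"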

def test_apostrophes_kept_inside():
    assert shortest_word("Don't stop believing") == "stop"  # stop(4) vs Don't(5)

def test_hyphen_splits():
    assert shortest_word("A well-known fact") == "A"

def test_whitespace_only_raises():
    with pytest.raises(ValueError):
        shortest_word("   \t  ")

def test_no_words_raises():
    with pytest.raises(ValueError):
        shortest_word("... --- !!!")

def test_mixed_punctuation():
    assert shortest_word("Hello, world!!! This—is—a—test.") == "a"

def test_case_irrelevant_for_length():
    assert shortest_word("BIG small TINY") == "BIG"  # first with min length 3

Stretch goals (optional)

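  1. Unicode letters: Support non-ASCII letters (é, ł, ñ). Hint: Python's built-in re has no \p{L} class (that comes from the third-party regex module, which the rules exclude). With re, the class [^\W\d_] is a common approximation of "any letter"; you can also test characters with str.isalpha() or unicodedata.category().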
  2. Return all shortest words: Change the API to return a list of all shortest words in order of appearance.
  3. Word index: Also return the start index (character position) of the chosen word.
  4. Ignore stopwords: Skip very common words (a, the, and) when choosing the shortest.
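
As a starting point for stretch goals 1 to 3, here is a minimal sketch that stays within the standard library. The names _UNICODE_WORD_RE and all_shortest_with_index are assumptions made for this example, and the return shape (a list of (start, word) pairs) is just one possible API. Note that [^\W\d_] is only an approximation of "any letter": it also accepts a few numeric symbols that \d does not cover.

```python
import re
import unicodedata

# Word characters minus digits and underscore, with optional internal
# apostrophe blocks. Hyphens never match, so they still split words.
_UNICODE_WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def all_shortest_with_index(sentence: str) -> list[tuple[int, str]]:
    """Return (start_index, word) for every shortest word, in order."""
    # NFC folds a letter plus a combining accent into one code point so it
    # matches as a letter. Indexes refer to this normalized text.
    text = unicodedata.normalize("NFC", sentence)
    matches = list(_UNICODE_WORD_RE.finditer(text))
    if not matches:
        raise ValueError("No words found")
    min_len = min(len(m.group()) for m in matches)
    return [(m.start(), m.group()) for m in matches if len(m.group()) == min_len]


print(all_shortest_with_index("Él no vio el añil-azul."))
# [(0, 'Él'), (3, 'no'), (10, 'el')]
```

Stopword filtering (goal 4) could be layered on top by dropping matches whose lowercase form is in a small set before computing min_len.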