B2.1.2 Construct programs that can extract and manipulate substrings.
• Writing of programs that accurately identify and extract substrings from given strings, demonstrating the ability to perform various manipulations, such as altering, concatenating or replacing
The Big Idea
In programming, a substring is a contiguous sequence of characters within a string. Mastery of substring manipulation is fundamental to many tasks in software development—from parsing user input to handling file paths, processing text data, and implementing algorithms.
To construct programs that extract and manipulate substrings means to:
- Identify relevant sections of a string based on indices or patterns,
- Modify those substrings (e.g. replace, capitalize, reverse),
- Combine substrings to form new strings.
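The three steps above can be sketched in a few lines of Python. This is a minimal illustration only: the date string and the target format are assumed for the example and are not part of the syllabus text.

```python
# Hypothetical input: a date in "YYYY-MM-DD" form.
date = "2024-03-15"

# 1. Identify: pull out each field by its index range.
year = date[0:4]     # characters at indices 0..3
month = date[5:7]    # characters at indices 5..6
day = date[8:10]     # characters at indices 8..9

# 2. Modify: shorten the year to its last two digits.
short_year = year[2:]

# 3. Combine: build a new string in DD/MM/YY order.
reformatted = day + "/" + month + "/" + short_year
print(reformatted)   # 15/03/24
```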
This topic lies at the heart of string processing and builds foundational skills for text analysis, regular expressions, and data cleaning.
Key Concepts
1. Substring Extraction
The ability to select a portion of a string using character positions (indices).
In Python:
name = "MacKenty"
sub = name[0:3]  # 'Mac'
name[start:end] returns a substring starting at index start and ending before end.
In Java:
String name = "MacKenty";
String sub = name.substring(0, 3); // "Mac"
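Extraction by index also covers omitted bounds and negative indices, and extraction by pattern usually means finding a marker first and slicing around it. The sketch below shows both in Python; the "key=value" text is a made-up input used only for illustration.

```python
name = "MacKenty"

# Omitted bounds default to the start or the end of the string.
print(name[:3])     # Mac
print(name[3:])     # Kenty

# Negative indices count back from the end.
print(name[-3:])    # nty

# Pattern-based extraction: locate a marker with find(), then slice around it.
setting = "colour=blue"        # hypothetical input
eq = setting.find("=")         # find() returns -1 when the marker is absent
if eq != -1:
    key = setting[:eq]         # text before the marker
    value = setting[eq + 1:]   # text after the marker
    print(key, value)          # colour blue
```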
2. Altering Substrings
Changing parts of a string: transforming case, trimming, or slicing dynamically.
Python example:
message = "Hello, World!"
print(message.lower())  # "hello, world!"
print(message.replace("World", "IB"))  # "Hello, IB!"
Java example:
String message = "Hello, World!";
System.out.println(message.toLowerCase()); // "hello, world!"
System.out.println(message.replace("World", "IB")); // "Hello, IB!"
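Trimming and dynamic slicing can be illustrated in the same way. The following Python sketch assumes an input with stray whitespace and computes the slice boundary at run time instead of hard-coding it; the variable names are chosen for the example.

```python
raw = "   hello, world!  "

# Trimming: strip() removes leading and trailing whitespace.
message = raw.strip()
print(message)                  # hello, world!

# Slicing dynamically: compute the boundary instead of hard-coding it.
comma = message.index(",")      # position of the comma
first_word = message[:comma]    # everything before the comma
rest = message[comma:]          # the comma and everything after it

# Alter only the first word, then rebuild the full string.
shouted = first_word.upper() + rest
print(shouted)                  # HELLO, world!
```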
3. Concatenation
Combining strings together, possibly using extracted substrings.
Python:
greeting = "Hello"
name = "Bill"
full = greeting + ", " + name + "!"
Java:
String greeting = "Hello";
String name = "Bill";
String full = greeting + ", " + name + "!";
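Concatenation becomes more useful when the pieces are themselves extracted substrings. The Python sketch below builds a username from two names; the naming rule (first initial plus up to five letters of the surname) is an assumption made for the example.

```python
first = "Ada"
last = "Lovelace"

# First letter of the first name plus up to five letters of the surname.
username = first[0].lower() + last[:5].lower()
print(username)               # alovel

# An f-string places values inside a template.
print(f"{last}, {first}")     # Lovelace, Ada

# join() concatenates a list of pieces with a separator between them.
parts = [first, last, username]
print("-".join(parts))        # Ada-Lovelace-alovel
```

For a handful of pieces, + reads well; when many pieces are collected in a loop, join() is generally the more idiomatic choice in Python.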
4. Replacing and Reformatting
Replacing a part of a string with another value or format.
Python:
url = "https://example.com"
url = url.replace("https", "http")
Java:
String url = "https://example.com";
url = url.replace("https", "http");
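One detail worth seeing in code is that replace() changes every occurrence unless told otherwise, and that a substring can also be replaced by position using slices. The Python sketch below uses made-up values to show both.

```python
url = "https://example.com/https-guide"

# replace() changes every occurrence by default...
print(url.replace("https", "http"))       # http://example.com/http-guide

# ...while the optional third argument limits how many are replaced.
print(url.replace("https", "http", 1))    # http://example.com/https-guide

# Replacing by position: keep the text on either side of the range.
code = "AB-1234-XY"                       # hypothetical product code
start, end = 3, 7                         # the digits occupy indices 3..6
masked = code[:start] + "####" + code[end:]
print(masked)                             # AB-####-XY
```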
Practical Worked Examples
Python: Masking an Email Address
email = "[email protected]"
at_index = email.index("@")
masked = email[0] + "***" + email[at_index:]
print(masked) # s***@example.com
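The same masking idea can be wrapped in a function that copes with input containing no "@". This is a sketch under assumptions: the function name mask_email and the sample addresses are hypothetical, and returning unrecognised input unchanged is only one possible design choice.

```python
def mask_email(email):
    """Hide all but the first character of the part before the "@"."""
    at_index = email.find("@")     # find() returns -1 if there is no "@"
    if at_index < 1:               # no "@", or nothing in front of it
        return email               # leave unrecognised input unchanged
    return email[0] + "***" + email[at_index:]

print(mask_email("student@example.com"))   # s***@example.com
print(mask_email("not-an-email"))          # not-an-email
```

Using find() instead of index() avoids the ValueError that index() raises when the character is missing.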
Java: Extracting a File Extension
String filename = "report.pdf";
int dotIndex = filename.lastIndexOf(".");
String extension = filename.substring(dotIndex + 1);
System.out.println(extension); // "pdf"
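In the Java snippet, lastIndexOf returns -1 when the filename has no dot, so substring(dotIndex + 1) would return the whole filename. A Python sketch of the same task with that case handled might look like this; the function name file_extension and the decision to treat names such as ".bashrc" as having no extension are assumptions.

```python
def file_extension(filename):
    """Return the text after the last dot, or "" if there is no extension."""
    dot_index = filename.rfind(".")    # -1 when there is no dot at all
    if dot_index <= 0:                 # no dot, or a leading dot only
        return ""
    return filename[dot_index + 1:]

print(file_extension("report.pdf"))        # pdf
print(file_extension("archive.tar.gz"))    # gz
print(file_extension("README"))            # prints an empty line
```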
Python: Reversing Words in a Sentence
sentence = "IB Computer Science"
words = sentence.split(" ") # ['IB', 'Computer', 'Science']
reversed_sentence = " ".join(words[::-1])
print(reversed_sentence) # "Science Computer IB"
Java: Splitting and Capitalizing Words
String sentence = "ib computer science";
String[] words = sentence.split(" ");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].substring(0,1).toUpperCase() + words[i].substring(1);
}
String titleCase = String.join(" ", words);
System.out.println(titleCase); // "Ib Computer Science"
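The Java loop assumes every word has at least one character; a sentence with two consecutive spaces produces an empty word, and substring(0, 1) on it would throw an exception. The Python sketch below shows one way to sidestep that, using an input with a deliberate double space.

```python
sentence = "ib  computer science"      # note the double space

# split(" ") keeps an empty string between the two spaces...
print(sentence.split(" "))             # ['ib', '', 'computer', 'science']

# ...while split() with no argument collapses runs of whitespace.
words = sentence.split()
print(words)                           # ['ib', 'computer', 'science']

# word[:1] is safe even for an empty string, unlike word[0].
title_case = " ".join(word[:1].upper() + word[1:] for word in words)
print(title_case)                      # Ib Computer Science
```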
Applications
- Data Cleaning: Remove unwanted substrings from log files or input fields.
- Parsing User Input: Extract names, emails, or dates from user submissions.
- Search and Replace: Modify configuration files, search logs, or format outputs.
- URL or Path Manipulation: Extract filenames, domains, or query parameters.
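As an illustration of the last application, the Python sketch below pulls a domain, a filename, and a query string out of a URL using only find(), rfind(), and slicing. The URL is made up, and the code assumes it contains "://", a path, and a "?"; production code would more likely rely on the standard urllib.parse module.

```python
url = "https://example.com/docs/report.pdf?page=2"   # hypothetical URL

# Domain: the text between "://" and the next "/".
scheme_end = url.find("://") + 3          # index just past "://"
path_start = url.find("/", scheme_end)    # first "/" after the domain
domain = url[scheme_end:path_start]

# Path and query: split at the "?".
query_start = url.find("?")
path = url[path_start:query_start]
query = url[query_start + 1:]

# Filename: the text after the last "/" in the path.
filename = path[path.rfind("/") + 1:]

print(domain)     # example.com
print(filename)   # report.pdf
print(query)      # page=2
```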
Common Errors
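- Off-by-one errors: In both Python slices and Java's substring(), the start index is inclusive and the end index is exclusive, so name[0:3] returns the characters at indices 0, 1, and 2. Forgetting this is the most common indexing mistake in substring operations.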
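- Immutable Strings: In Java and Python, strings are immutable, so methods like replace() return a new string rather than changing the original. Assign the result to a variable to keep it.
- Out-of-bounds access: Always check that your indices are within the bounds of the string length. Java's substring() throws a StringIndexOutOfBoundsException for invalid indices; Python raises an IndexError for a single out-of-range index, although slices are clamped to the string's length without an error.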
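Each of these errors is easy to reproduce. The short Python sketch below demonstrates them with an arbitrary example word.

```python
word = "Python"

# Immutability: the method returns a new string; the original is unchanged.
word.upper()
print(word)               # Python
word = word.upper()       # keep the result by assigning it
print(word)               # PYTHON

# Off-by-one: the end index is exclusive, so [0:3] stops before index 3.
print(word[0:3])          # PYT
print(len(word[2:5]))     # 3  (end minus start)

# Out of bounds: a slice is clamped, but a single index raises IndexError.
print(word[2:100])        # THON
try:
    word[100]
except IndexError:
    print("IndexError raised")
```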
Summary
Constructing programs that manipulate substrings is a fundamental programming skill. It demands a clear understanding of string indexing, slicing, and built-in string methods in your chosen language. Whether you're sanitizing data, formatting outputs, or building more complex text algorithms, mastering this area will serve you well across nearly every field of computer science.