String Manipulation Mastery: Every Pattern From Palindromes to Anagram Matching
Strings are in 40% of coding interview problems. They look simple but hide deep patterns β rolling hash, Z-algorithm, Rabin-Karp, and more. This guide goes from fundamentals to advanced string techniques with full implementations.
Strings are deceptive. Every student can reverse a string or check a palindrome. But string problems in medium-hard interviews involve rolling hashes, sliding windows on character frequencies, and prefix patterns that trip up everyone who only knows the basics.
This guide covers the full spectrum β from patterns you'll use in every interview to advanced techniques that make O(nΒ²) solutions into O(n).
Fundamental Operations to Know Cold
s = "Hello, World!"
# Length
len(s) # 13
# Access and slicing
s[0] # 'H'
s[-1] # '!'
s[7:12] # 'World'
s[::-1] # '!dlroW ,olleH' (reverse)
s[::2] # 'Hlo ol!'
# Search
s.find("World") # 7 (first occurrence, -1 if not found)
s.index("World") # 7 (raises ValueError if not found)
s.count("l") # 3
"World" in s # True
# Modify (strings are immutable β returns new string)
s.lower() # 'hello, world!'
s.upper() # 'HELLO, WORLD!'
s.strip() # removes leading/trailing whitespace
s.lstrip("H") # 'ello, World!'
s.replace("World", "Python") # 'Hello, Python!'
# Split/join
s.split(", ") # ['Hello', 'World!']
" ".join(["Hello", "World"]) # 'Hello World'
# Check type
"abc123".isalnum() # True
" ".isspace() # True
Pattern 1: Two Pointers for Palindromes
def is_palindrome(s):
# Keep only alphanumeric, lowercase
clean = [c.lower() for c in s if c.isalnum()]
return clean == clean[::-1]
# Optimised: no extra list
def is_palindrome_optimised(s):
left, right = 0, len(s) - 1
while left < right:
while left < right and not s[left].isalnum():
left += 1
while left < right and not s[right].isalnum():
right -= 1
if s[left].lower() != s[right].lower():
return False
left += 1; right -= 1
return True
Longest Palindromic Substring (Expand Around Center)
def longest_palindrome(s):
n = len(s)
start, max_len = 0, 1
def expand(left, right):
while left >= 0 and right < n and s[left] == s[right]:
left -= 1; right += 1
return left + 1, right - 1 # valid palindrome boundaries
for i in range(n):
# Odd length: "aba"
l, r = expand(i, i)
if r - l + 1 > max_len:
start, max_len = l, r - l + 1
# Even length: "abba"
if i + 1 < n:
l, r = expand(i, i + 1)
if r - l + 1 > max_len:
start, max_len = l, r - l + 1
return s[start:start + max_len]
Why 2i - 1 centers? A string of length n has n odd-length centers and n-1 even-length centers. Total: 2n - 1 centers to check. Each expansion is O(n) in worst case β O(nΒ²) total.
Pattern 2: Hash Map for Anagram Problems
The canonical string interview pattern: character frequency as a hash map key.
def group_anagrams(strs):
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
# Sorted string is the canonical anagram key
key = tuple(sorted(s)) # "eat" β ('a', 'e', 't')
groups[key].append(s)
return list(groups.values())
# Alternative: character count as key (faster for long strings)
def group_anagrams_v2(strs):
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
count = [0] * 26
for c in s:
count[ord(c) - ord('a')] += 1
groups[tuple(count)].append(s)
return list(groups.values())
Minimum Window Substring (Sliding Window + Frequency Map)
from collections import Counter
def min_window(s, t):
if not t or not s:
return ""
need = Counter(t)
have = {}
satisfied = 0 # chars in window meeting their required count
required = len(need) # distinct chars we need to satisfy
left = 0
min_len = float('inf')
result = ""
for right, char in enumerate(s):
have[char] = have.get(char, 0) + 1
if char in need and have[char] == need[char]:
satisfied += 1
# Shrink window while all requirements are met
while satisfied == required:
window_len = right - left + 1
if window_len < min_len:
min_len = window_len
result = s[left:right + 1]
left_char = s[left]
have[left_char] -= 1
if left_char in need and have[left_char] < need[left_char]:
satisfied -= 1
left += 1
return result
Pattern 3: KMP Algorithm (Pattern Matching in O(n+m))
Naive substring search is O(nΓm). KMP is O(n+m) by preprocessing the pattern to skip redundant comparisons.
def kmp_search(text, pattern):
def build_lps(pattern):
"""Longest Proper Prefix which is also Suffix."""
m = len(pattern)
lps = [0] * m
length = 0 # length of previous longest prefix-suffix
i = 1
while i < m:
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
elif length != 0:
length = lps[length - 1] # don't increment i
else:
lps[i] = 0
i += 1
return lps
n, m = len(text), len(pattern)
if m == 0:
return 0
lps = build_lps(pattern)
occurrences = []
i = j = 0 # i for text, j for pattern
while i < n:
if text[i] == pattern[j]:
i += 1; j += 1
if j == m:
occurrences.append(i - j) # found at index i-j
j = lps[j - 1]
elif i < n and text[i] != pattern[j]:
if j != 0:
j = lps[j - 1] # skip characters we know will match
else:
i += 1
return occurrences
LPS array (Longest Proper Prefix which is also Suffix): For pattern "AABAAB": lps = [0, 1, 0, 1, 2, 3]
"If mismatch at j, don't restart from j=0. Use lps[j-1] to skip the part we know matches."
Pattern 4: Rabin-Karp (Rolling Hash)
Rabin-Karp finds pattern matches using hashing. Key application: detecting duplicate substrings.
def rabin_karp(text, pattern):
n, m = len(text), len(pattern)
if m > n:
return []
BASE = 31
MOD = 10**9 + 7
# Precompute hash of pattern and first window
def char_val(c):
return ord(c) - ord('a') + 1
pattern_hash = 0
window_hash = 0
power = 1 # BASE^(m-1) % MOD
for i in range(m):
pattern_hash = (pattern_hash * BASE + char_val(pattern[i])) % MOD
window_hash = (window_hash * BASE + char_val(text[i])) % MOD
if i < m - 1:
power = (power * BASE) % MOD
results = []
for i in range(n - m + 1):
if window_hash == pattern_hash:
if text[i:i+m] == pattern: # verify (handles hash collision)
results.append(i)
if i < n - m:
# Roll the window: remove leftmost, add rightmost
window_hash = (window_hash - char_val(text[i]) * power) % MOD
window_hash = (window_hash * BASE + char_val(text[i + m])) % MOD
window_hash = (window_hash + MOD) % MOD # ensure positive
return results
Key application β Longest Duplicate Substring: Binary search on length + Rabin-Karp to check if a substring of that length appears twice. O(n log n).
Pattern 5: Z-Algorithm (Linear Prefix Matching)
The Z-array: Z[i] = length of the longest substring starting at s[i] that is also a prefix of s.
def z_function(s):
n = len(s)
z = [0] * n
z[0] = n
left = right = 0
for i in range(1, n):
if i < right:
z[i] = min(right - i, z[i - left])
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
if i + z[i] > right:
left, right = i, i + z[i]
return z
# Pattern matching using Z-function
def z_search(text, pattern):
combined = pattern + "#" + text # '#' acts as separator
z = z_function(combined)
m = len(pattern)
occurrences = []
for i in range(m + 1, len(combined)):
if z[i] == m:
occurrences.append(i - m - 1)
return occurrences
Pattern 6: Character Frequency Sliding Window
Template for "find substring with property on character counts":
def longest_substring_k_distinct(s, k):
"""Longest substring with at most k distinct characters."""
freq = {}
left = 0
max_len = 0
for right, char in enumerate(s):
freq[char] = freq.get(char, 0) + 1
# Shrink window if constraint violated
while len(freq) > k:
left_char = s[left]
freq[left_char] -= 1
if freq[left_char] == 0:
del freq[left_char]
left += 1
max_len = max(max_len, right - left + 1)
return max_len
def longest_substring_no_repeat(s):
"""Classic - longest substring without repeating characters."""
seen = {}
left = 0
max_len = 0
for right, char in enumerate(s):
if char in seen and seen[char] >= left:
left = seen[char] + 1
seen[char] = right
max_len = max(max_len, right - left + 1)
return max_len
Interview-Ready Quick Wins
# Check if one string is rotation of another: s1 is rotation of s2
# iff s1 is a substring of s2+s2
def is_rotation(s1, s2):
return len(s1) == len(s2) and s1 in (s2 + s2)
# Count distinct characters without Counter
distinct = len(set(s))
# Check if all characters are unique (constant space)
def all_unique(s):
return len(s) == len(set(s))
# Most frequent character
from collections import Counter
most_common_char = Counter(s).most_common(1)[0][0]
# Compress string: "aabcccdddd" β "a2bc3d4"
def compress(s):
result = []
i = 0
while i < len(s):
count = 1
while i + count < len(s) and s[i + count] == s[i]:
count += 1
result.append(s[i])
if count > 1:
result.append(str(count))
i += count
return "".join(result)
Complexity Overview
| Algorithm | Time | Space | Use Case | |---|---|---|---| | Naive substring search | O(nΓm) | O(1) | Short patterns | | KMP | O(n+m) | O(m) | Single pattern, many searches | | Rabin-Karp | O(n+m) avg | O(1) | Multiple patterns, duplicate detection | | Z-algorithm | O(n+m) | O(n+m) | Prefix matching, pattern search | | Expand around center | O(nΒ²) | O(1) | Palindromes | | Manacher's algorithm | O(n) | O(n) | Palindromes (optimal) |
Practice order: Valid Anagram β Group Anagrams β Longest Substring Without Repeating β Minimum Window Substring β Longest Palindromic Substring. Each one builds on the previous. Filter by "Strings" in Practice.
Ready to practice what you just learned?
Apply these concepts with AI-powered tools built for CS students.