3.1 KiB
1.5. Levenshtein distance
Given two strings, check if they are one edit (or zero edits) away. Edits can be insertion, substitution or deletion
First idea
Difference in length must be at most 1
if abs(len(s1) - len(s2)) > 1:
return False
if s1 == s2:
return True
BCR is O(n) time, O(1) space.
This 'edit away' is Levenshtein distance, task is to check if Levenshtein distance <= 1.
Can we do s1 - s2? Or s1.difference(s2)? Not in python.
Second idea
From the Internet, this is O(1) time:
def levenshtein(s1, s2):
return len("".join(s1.rsplit(s2))) <= 1
But this splits the first word with the second word, it doesn't work if there's a gap/addition in the middle of the string.
Third idea
Keep a boolean flag that must be at most 1. Iterate through the smallest string, check for differences, if difference: flag += 1, if flag == 2, return False.
If equal lengths
Same length, match s1[i] vs s2[i]. if difference: flag += 1, if flag == 2, return False. if iteration done, return True
Unequal lengths
Iterate through the smallest string, match s1[i] vs s2[i]. If difference, the smallest string must have a character deletion. flag = 1, and keep matching s1[i] vs s2[i+1] (where s1 is the small string). If the smallest string is iterated and no errors, return True
Tests:
- ("", "") -> True, correct
- ("an", "a") -> True, correct
- ("a", "an") -> True, correct
- ("ane", "ae") -> True, correct
- ("ane", "ne") -> True, correct
- ("asdfgh", "asdgh") -> True, correct
- ("asdf", "asgf") -> True, correct
- ("asdfg", "adfgh") -> False, corrrect
- ("an", "ne") -> False, correct
Tests passed, O(min(len(s1), len(s2)))) time, O(1) space. Possibly code can be compressed but I don't know how.
Solution
First solution, 3 different cases for equal lengths, len1 < len2, len2 < len1. Seeing this, my code:
elif len1 < len2: # s1 shorter than s2
for i in range(len1):
if not flag:
if s1[i] != s2[i]:
if flag:
return False
flag = True
else: # there's already one difference
if s1[i] != s2[i + 1]:
return False
return True
elif len2 < len1: # s2 shorter than s1
for i in range(len2):
if not flag:
if s1[i] != s2[i]:
if flag:
return False
flag = True
else: # there's already one difference
if s1[i + 1] != s2[i]:
return False
return True
can be merged into one:
else:
index1 = 0
index2 = 0
while index1 < len1 and index2 < len2:
if s1[index1] != s2[index2]: # difference found
if flag:
return False
flag = True
## need to check for shorter string here...
The solution creates 2 new strings, short and long (using O(2n) space), and updates indexes accordingly (see java file).
What's better, create 2 new strings (O(n) space) and make code more compact, or use just 3 variables and 10 more lines of code? Don't know.