1.1. Is unique
Implement an algorithm to determine if a string S has all unique characters. What if you can't use additional data structures?
example: s = "abdceb"
1.1.1. First idea
def is_unique(s):
    return len(s) == len(set(s))  # complexity O(n)? from set
Complexity of set()? It is backed by a hash table with O(1) average insertion and lookup, so building it takes O(n) on average for n characters.
Tests:
- s = "" -> 0==0, True, correct
- s = "a" -> 1==1, True, correct
- s = "ab" -> 2==2, True, correct
- s = "aa" -> 2!=1, False, correct
1.1.2. Non-optimal / brute force idea
- compare 'a', 'bdceb', no repeated characters
- compare 'b', 'adceb', repeated character -> break, False
We don't need to compare against previous characters, only from position i onwards. Runtime is O(n^2).
1.1.3. Improved idea
- compare 'a', 'bdceb', no repeated characters
- compare 'b', 'dceb', repeated character -> break, False
- otherwise keep going; each step compares against a shorter remainder
Same tests as before. Runtime is still O(n^2) (up to n(n-1)/2 comparisons), so this is not an asymptotic improvement.
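A minimal sketch of the pairwise comparison described above, for the case where no additional data structures are allowed. The name is_unique_pairwise is made up for illustration.

```python
def is_unique_pairwise(s):
    """Return True if no character of s appears twice.

    O(n^2) time, O(1) extra space: no set, dict, or array is used.
    """
    for i in range(len(s)):
        # Only look at characters after position i;
        # pairs involving earlier positions were already checked.
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return False
    return True
```

With the example s = "abdceb", the second 'b' is found while scanning from the first 'b', so the function returns False.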
1.1.4. Expand first idea
The function set() uses a hash table, much like a dictionary in Python. Strictly speaking, a dictionary is just a mapping from keys to values; using a hash function to do that mapping is how Python implements it today, and the implementation could change in the future.
Using hash tables / dictionaries in this case might work.
def is_unique(s):
    mapping = dict()
    for letter in s:
        if letter in mapping:  # letter already present
            return False
        mapping[letter] = 1  # any value, doesn't matter
    return True
A hash table takes O(1) average in lookup/insertion, so O(n) for all.
Solution
Always ask whether the string is ASCII or Unicode. If it is Unicode, we'll need more storage.
One solution is to create an array of booleans, where the flag at index i indicates whether character i in the alphabet is present in the string. The second time we see character i, return False.
Also immediately return False if the string length is bigger than the number of unique characters in the alphabet, which is 128 in ASCII. But check this with the interviewer.
ane: Also immediately return True if the string length is 0 or 1.
boolean is_unique(String s) {
    if (s.length() > 128) return false; // more characters than ASCII has
    if (s.length() <= 1) return true; // my idea
    boolean[] char_set = new boolean[128];
    for (int i = 0; i < s.length(); i++) {
        int val = s.charAt(i);
        if (char_set[val]) return false; // seen this character before
        char_set[val] = true;
    }
    return true;
}
Time complexity is O(1) if the alphabet is fixed, because the loop runs at most 128 steps, and space complexity is O(1) because we store 128 flags. If the alphabet size c is not treated as a constant, then the runtime is O(min(c, n)) for a string of length n, and the space is O(c).
ane: this is my hash table but without the values... there's no need to create a table if we don't need the values.
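For comparison with the Python versions earlier, here is a rough port of the boolean-array solution. It assumes 7-bit ASCII input; the name is_unique_ascii and the ValueError on non-ASCII characters are my own additions, not part of the original solution.

```python
def is_unique_ascii(s):
    """Boolean-array check, assuming every character of s is 7-bit ASCII."""
    if len(s) > 128:
        # Pigeonhole: more characters than the alphabet holds, so one repeats.
        return False
    seen = [False] * 128  # seen[i] is True once character code i has appeared
    for ch in s:
        code = ord(ch)
        if code >= 128:
            # Assumption: reject input outside the ASCII range instead of guessing.
            raise ValueError("non-ASCII character: %r" % ch)
        if seen[code]:
            return False
        seen[code] = True
    return True
```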
We can reduce the space usage by a factor of 8 by using a bit vector. If we assume the string only uses lowercase letters a-z, a single int (26 bits) is enough. (see is_unique_checker() in is_unique.py)
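A sketch of the bit-vector idea, assuming only lowercase letters a-z. The name is_unique_bits is hypothetical, and this is not necessarily the same code as is_unique_checker() in is_unique.py.

```python
def is_unique_bits(s):
    """Bit-vector check, assuming s contains only lowercase letters 'a'..'z'."""
    checker = 0  # bit k is set once the k-th letter of the alphabet has been seen
    for ch in s:
        offset = ord(ch) - ord("a")
        if not 0 <= offset < 26:
            # Assumption: anything outside a-z is rejected rather than handled.
            raise ValueError("expected a lowercase letter a-z, got %r" % ch)
        bit = 1 << offset
        if checker & bit:
            return False  # this letter's bit was already set
        checker |= bit
    return True
```

For "abdceb", the bit for 'b' (1 << 1) is set on the second character and found already set on the last one, so the result is False.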