ctci/01. Arrays and strings/1.1. is_unique.md

3.3 KiB

1.1. Is unique

Implement an algorithm to determine if a S has all unique characters. What if you can't use additional data structures

example: s = "abdceb"

1.1.1. First idea

def is_unique(s):
    return len(s) == len(set(s)) # complexity O(n)? from set

Complexity of set()? It uses a hash table with O(1) average in insertion/lookup, so O(n) for all characters.

Tests:

  • s = "" -> 0==0, True, correct
  • s = "a" -> 1==1, True, correct
  • s = "ab" -> 2==2, True, correct
  • s = "aa" -> 2!=1, False, correct

1.1.2. Non-optimal / brute force idea

  • compare 'a', 'bdceb', no repeated characters
  • compare 'b', 'adceb', repeated character -> break, False

don't need to compare to previous chars, only from i onwards. Runtime O(n2)

1.1.3. Improved idea

  • compare 'a', 'bdceb', no repeated characters
  • compare 'b', 'dceb', repeated character -> break, False
  • else, compare with all, each step a smaller comparison

Test same as before, runtime still O(n2). not improving

1.1.4. Expand first idea

The function set() uses a hash table, equivalent to a dictionary in python. Actually a dictionary is a data structure that uses a hash function to map keys to values, but python may change the implementation in the future. A dictionary is just a mapping of a key to a value.

Using hash tables / dictionaries in this case might work.

def is_unique(s):
    mapping = dict()
    for letter in s:
        if letter in mapping: # letter already present
            return False
        mapping[letter] = 1 # any value, doesn't matter
    return True

A hash table takes O(1) average in lookup/insertion, so O(n) for all.

Solution

Always ask if the string is an ASCII string or an Unicode string. if Unicode, we'll need more storage.

One solution is to create an array of booleans, where the flag at index i indicates whether character i in the alphabet is present in the string. The second time we see character i, return False.

Also immediately return False if the string length is bigger than the number of unique characters in the alphabet, which is 128 in ASCII. But check this with the interviewer.

ane: Also immediately return True if the string length is 0 or 1.

boolean is_unique(String s) {
    if (str.length() > 128) return false;
    if (str.length() <= 1) return true; // my idea

    boolean[] char_set = new boolean[128];
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);
        if(char_set[val]) return false;
        char_set[val] = true;
    }
    return true;
}

Time complexity is O(1), because it'll be at most 128 steps, and space complexity O(1) because we store 128 values. If the string can have any length, then O(n) runtime and O(n) space.

ane: this is my hash table but without the values... there's no need to create a table if we don't need the values.

We can reduce the space usage by a factor of 8 by using a bit vector. We can assume the string only uses lowercase, and just use an int. (see is_unique_checker() in is_unique.py)

Run C++ in vscode