ctci/01. Arrays and strings/1.6. string_compression.md

1.6 KiB

1.6. String compression

Compression of count of repeated characters. If the compressed smaller isn't smaller than the original string, return the original string. The string only has lowercase and uppercase letters

example:

"aabcccccaaa" -> "a2b1c5a3"

return compr if len(compr) < slen else s

First idea

O(n), create a new empty string and add to that. Iterate through the string, keep count and add to the string. Start with compr=s[0] and count=1, and start at index=1.

def compress(s):
    slen = len(s)
    if slen < 3:
        return s
    compr = s[0]
    count = 1
    for i in range(1, slen):
        if len(compr) >= slen:
            return s
        if s[i] == compr[-1]:
            count += 1
            if i == slen - 1:
                compr += str(count)
        else:
            compr += str(count)
            compr += s[i]
            count = 1
            if i == slen - 1:
                compr += str(count)
    return compr if len(compr) < slen else s

If character changes, update count in compr and reset to 1. If the string ends, add the count at the end.

Tests:

  • "" -> "", correct
  • "a" -> "a", correct
  • "aa" -> "a2", correct
  • "aabbb" -> "a2b3", correct
  • "aabbbc" -> "aabbbc", correct

Remember that if len(compr) < len(s), return s. This has O(n) time, O(n) space.

Solution

Uses StringBuilder as optimal solution, because in the original (my) case, runtime is O(n + k2), where k is the number of character sequences. An optimization is, while creating the compressed string, if it gets bigger than the original one, stop and return s.