ctci/01. Arrays and strings/1.6. string_compression.md

54 lines
1.6 KiB
Markdown

# 1.6. String compression
## Compression of count of repeated characters. If the compressed smaller isn't smaller than the original string, return the original string. The string only has lowercase and uppercase letters
> example:
"aabcccccaaa" -> "a2b1c5a3"
```python
return compr if len(compr) < slen else s
```
## First idea
O(n), create a new empty string and add to that. Iterate through the string, keep count and add to the string. Start with `compr=s[0]` and `count=1`, and start at index=1.
```python
def compress(s):
slen = len(s)
if slen < 3:
return s
compr = s[0]
count = 1
for i in range(1, slen):
if len(compr) >= slen:
return s
if s[i] == compr[-1]:
count += 1
if i == slen - 1:
compr += str(count)
else:
compr += str(count)
compr += s[i]
count = 1
if i == slen - 1:
compr += str(count)
return compr if len(compr) < slen else s
```
If character changes, update count in `compr` and reset to 1. If the string ends, add the count at the end.
> Tests:
* "" -> "", correct
* "a" -> "a", correct
* "aa" -> "a2", correct
* "aabbb" -> "a2b3", correct
* "aabbbc" -> "aabbbc", correct
Remember that `if len(compr) < len(s), return s`. This has O(n) time, O(n) space.
## Solution
Uses `StringBuilder` as optimal solution, because in the original (my) case, runtime is O(n + k<sup>2</sup>), where `k` is the number of character sequences. An optimization is, while creating the compressed string, if it gets bigger than the original one, stop and return s.