# 17.13. Re-space ## The spaces from a sentence have been removed. Most words are present in the dictionary but some aren't. Given a dictionary (a list of strings) and the document (a string), unconcatenate the document in a way that minimizes the number of unrecognized characters > example: 'jesslookedjustliketimherbrother' -> 'jess looked just like tim her brother' ## First idea This is complicated... I could start first by unconcatenating the sentences considering they're all present in the dictionary, and then optimize. How do I unconcatenate in the first place? .tolower() first * Easiest case: 'iamalive' -> 'i am alive', only one possibility * Concatenation: 'theylookedible' -> 'they look edible', but 'looked' and 'edible' concatenate. I could do swaps of n-grams, start with the longest words and go downwards: > example: 'iamalive' 1. swap of len(s) = 8, 'iamalive' in dict? no 2. swap = 7, 'iamaliv', 'amalive' in dict? no 3. swap = 6, 'iamali', 'amaliv', 'malive' in dict? no 4. swap = 5, ยก'iamal', ..., 'alive'. 'alive' found and saved. How? 5. swap = 4, nothing 6. swap = 3, nothing 7. swap = 2, 'am' found 8. swap = 1, 'i' found. All the string is 'covered', done. > example: 'theylookedible' 1. swap of len(s) = 10, nothing 2. wap = 6, 'looked', 'edible' saved 3. swap = 4, 'they', 'look' 4. swap = 3, 'the' 5. swap = 2, 'he' found 6. swap = 1, nothing How to resolve the conflict of swap = 6? We could look for 'look' and 'ible'. If we find none, choose one randomly and 4 unrecognized words. If we find 'look', assign 'look' and 'edible' and discard 'looked'. How to save these? Don't know. This algorithm is O(n2) Also, once the substrings are saved, how to check for string completeness to break the loop and return?