chapter 10 done ✔️

2020-10-13 17:47:33 +02:00 · 2020-10-13 17:47:33 +02:00 · 86605d68aa
parent f52d0cde54
commit 86605d68aa
10 changed files with 176 additions and 7 deletions
--- a/searching/10.10.
+++ b/searching/10.10.
@ -0,0 +1,25 @@
+# 10.10. Rank for stream
+
+> You are reading in a stream of integers. Periodically, you want to look up the rank of a number x (the number of values less than or equal to x). Implement the data structures and algorithms to support these operations: track (int x), called each time a number is generated, and getRankOfNumber(int x), returning the number of values less than or equal to x (not including x itself)
+
+Example
+
+* Stream (in order of appearance): 5, 1, 4, 4, 5, 9, 7, 13, 3
+* getRankOfNumber(1) = 0
+* getRankOfNumber(3) = 1
+* getRankOfNumber(4) = 3
+
+During the streaming, the data must be stored in sorted order somehow. We can create an array where each index holds the number of occurrences of that number. To account for all integers, we can use a dynamic array or a static array with 4 billion entries.
+
+* First step (5 arrives): [0, 0, 0, 0, 0, 1]
+* Second step (1 arrives): [0, 1, 0, 0, 0, 1]
+* Third step (4 arrives): [0, 1, 0, 0, 1, 1]
+* and so on
+
+`track(x)` just does `arr[x] += 1` if the array is static; otherwise it enlarges the array first. `getRankOfNumber(x)` does `sum(arr[:x + 1]) - 1`, subtracting one so that x itself is not counted, which matches the example above.
+
+## Solution
+
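+Use a binary search tree where each node also stores the size of its left subtree. `track(x)` will run in O(log(n)) time on a balanced tree, where n is the size of the tree. To find the rank of a number, we could do an in-order traversal, keeping a counter as we traverse; by the time we find `x`, `counter` will equal the number of elements less than x. With the left-subtree sizes stored, a full traversal is not needed: whenever the search moves right, it adds the current node's left size plus one.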
+
+See example in book page 412.
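
The tree described above could be sketched in Python roughly as follows. This is a minimal, unbalanced version and not the book's Java code; the names `RankNode`, `RankTracker`, `left_size` and `get_rank_of_number` are assumptions made for illustration, and returning -1 for a number that was never tracked is also an assumption.

```python
class RankNode:
    """BST node that also remembers how many nodes sit in its left subtree."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.left_size = 0

    def insert(self, x):
        if x <= self.value:
            # Equal values go left, so they count towards the rank of x.
            if self.left is None:
                self.left = RankNode(x)
            else:
                self.left.insert(x)
            self.left_size += 1
        elif self.right is None:
            self.right = RankNode(x)
        else:
            self.right.insert(x)

    def rank(self, x):
        if x == self.value:
            return self.left_size
        if x < self.value:
            return -1 if self.left is None else self.left.rank(x)
        # Moving right skips this node and its whole left subtree.
        right_rank = -1 if self.right is None else self.right.rank(x)
        return -1 if right_rank == -1 else self.left_size + 1 + right_rank


class RankTracker:
    """Hypothetical wrapper exposing track / get_rank_of_number."""

    def __init__(self):
        self.root = None

    def track(self, x):
        if self.root is None:
            self.root = RankNode(x)
        else:
            self.root.insert(x)

    def get_rank_of_number(self, x):
        return -1 if self.root is None else self.root.rank(x)


tracker = RankTracker()
for value in [5, 1, 4, 4, 5, 9, 7, 13, 3]:
    tracker.track(value)
```

Because the tree is not rebalanced, both operations degrade to O(n) on already sorted input; a self-balancing tree would be needed to guarantee O(log(n)).
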
--- a/searching/10.3.
+++ b/searching/10.3.
@ -0,0 +1,55 @@
+# 10.3. Search in rotated array
+
+> Given a sorted array of n integers that has been rotated an unknown number of times, find an element in the array. The array was originally sorted in increasing order.
+
+* Input: find 5 in {15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14}
+* Output: 8 (the index of 5 in the array)
+
+If the first element is larger than our number, start search from the end. If not, start from the beginning.
+
+```python
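+from typing import Optional
+
+def find_in_rotated(ar: list, el: int) -> Optional[int]:
+    # Scan from the end if el is smaller than the first element, otherwise from the start.
+    # Iterating over indices avoids copying the list and keeps the original positions.
+    backwards = bool(ar) and el < ar[0]
+    indices = range(len(ar) - 1, -1, -1) if backwards else range(len(ar))
+    for i in indices:
+        if ar[i] == el:
+            return i
+    return None
+```
+
+The worst case is still O(n) time, for example when the element is missing, with O(1) space.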
+
+## Hints
+
+> Modify binary search for this purpose
+
+My implementation was brute force. This one uses binary search and has a runtime of O(log(n)) when all elements are distinct.
+
+```python
+def rotated_binary_search(arr, key):
+    N = len(arr)
+    L = 0
+    R = N - 1
+    while L <= R:
+        M = L + (R - L) // 2
+        if (arr[M] == key):
+            return M
+        # the bottom half is sorted
+        if (arr[L] <= arr[M]):
+            if (arr[L] <= key and key < arr[M]):
+                R = M - 1
+            else:
+                L = M + 1
+        # the upper half is sorted
+        else:
+            if (arr[M] < key and key <= arr[R]):
+                L = M + 1
+            else:
+                R = M - 1
+    return -1
+
+arr = [15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14]
+x = 10
+result = rotated_binary_search(arr, x)
+```
+
+If there are duplicates, the runtime might be O(n) because we will have to search both the left and right sides of the array.
--- a/searching/10.4.
+++ b/searching/10.4.
@ -0,0 +1,7 @@
+# 10.4. Sorted search, no size
+
+> You have an array-like data structure Listy which lacks a size method. It has an .elementAt(i) method returning the element at index i in O(1) time. If i is negative or bigger than the size, it returns -1. Given a Listy containing sorted, positive integers, find the index at which an element x occurs.
+
+An initial approach would be to check elements one by one from the beginning, which is O(n). Another is to probe a few indices first: arr.elementAt(i) for i=10000, 1000, 100, 10, starting with a bigger i if x is likely to be large. The probes should return -1 until i falls inside the list. Suppose arr.elementAt(100)=X. If x < X, do binary search between indices 0 and 100. If x > X, check arr.elementAt(200), then arr.elementAt(300), and so on, repeating the comparison each time.
+
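+The hints suggest probing at exponentially growing indices, 2, 4, 8, 16, etc., until elementAt returns -1 (or a value larger than x), and then doing binary search within that bound. The runtime is O(log(n) + log(n)) = O(log(n)): the first part finds an upper bound on the length, the second is the binary search.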
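
A rough Python sketch of the doubling-then-binary-search idea from the hints. The `Listy` class here is a hypothetical stand-in built on a plain list, only so the example runs; `search` and its variable names are also assumptions.

```python
class Listy:
    """Hypothetical stand-in: no size method, elementAt returns -1 out of range."""

    def __init__(self, values):
        self._values = list(values)

    def elementAt(self, i):
        return self._values[i] if 0 <= i < len(self._values) else -1


def search(listy, x):
    # Phase 1: double the index until we run off the end or pass x.
    hi = 1
    while listy.elementAt(hi) != -1 and listy.elementAt(hi) < x:
        hi *= 2

    # Phase 2: binary search between the previous probe and hi,
    # treating -1 (past the end) as "too big".
    lo = hi // 2
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        value = listy.elementAt(mid)
        if value == -1 or value > x:
            hi = mid - 1
        elif value < x:
            lo = mid + 1
        else:
            return mid
    return -1


listy = Listy([1, 3, 4, 5, 7, 10, 14, 15, 16, 19, 20, 25])
index_of_five = search(listy, 5)
```

Treating -1 as larger than any value only works because the problem states that the list holds positive integers.
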
--- a/searching/10.5.
+++ b/searching/10.5.
@ -0,0 +1,15 @@
+# 10.5. Sparse search
+
+> Given a sorted array of strings that is interspersed with empty strings, find the location of a given string
+
+Example:
+
+* array = ['at', '', '', '', 'ball', '', '', 'car', '', '', 'dad', '', '']
+* word = ball
+* Output: 4
+
+We don't know how many empty strings there are between words. Since the array is already sorted, we could filter out the empty strings and binary search the result. Filtering takes O(n) time and O(n) extra space and searching takes O(log(n)), so O(n) overall, although the index found no longer matches the original array. Alternatively, we would implement the binary search so that if the middle point is empty, we calculate the middle point between left and middle again until we find a word.
+
+The solution mentions doing binary search and if the middle point is empty, move mid to the closest non-empty string. The worst case scenario is O(n), since you can have an array full of empty strings except one.
+
+Consider what happens if the word to be searched for is an empty string: should the algorithm return an error? Discuss with the interviewer.
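
The modified binary search described in the solution might look like this in Python. It is a sketch rather than the book's code: `sparse_search` is a made-up name, and returning -1 for an empty search word is just one possible answer to the question above.

```python
def sparse_search(strings, word):
    if word == "":
        return -1  # assumption: searching for the empty string is not supported

    lo, hi = 0, len(strings) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2

        if strings[mid] == "":
            # Walk outwards from mid to the closest non-empty string in [lo, hi].
            left, right = mid - 1, mid + 1
            while True:
                if left < lo and right > hi:
                    return -1  # only empty strings remain in this range
                if right <= hi and strings[right] != "":
                    mid = right
                    break
                if left >= lo and strings[left] != "":
                    mid = left
                    break
                left -= 1
                right += 1

        if strings[mid] == word:
            return mid
        if strings[mid] < word:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


array = ['at', '', '', '', 'ball', '', '', 'car', '', '', 'dad', '', '']
position = sparse_search(array, 'ball')
```

The outward walk is what makes the worst case O(n): in an array that is almost entirely empty strings, a single step can scan most of the range.
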
--- a/searching/10.6.
+++ b/searching/10.6.
@ -0,0 +1,13 @@
+# 10.6. Sort big file
+
+> Given a 20GB file with one string per line, sort the file.
+
+I would distribute the file across many machines and do a sort of MapReduce: sort each portion separately (map) and then merge the sorted portions (reduce). Sorting takes O(n log(n)) per portion, and merging k sorted portions takes O(n log(k)) with a k-way merge. This would be a **merge sort**.
+
+A **quick sort** could also work, picking a random element in the array and partitioning the array around it, such that all elements lower than the partitioning element come before all elements greater than it.
+
+## Solution
+
+If an interviewer gives such a high data size, it's implying **you cannot have all this data in memory**.
+
+Divide the file into chunks of X MB each, where X is the amount of memory we have available. Each chunk is sorted separately and then saved back to the file system. Once all chunks are sorted, we merge them one by one. This is known as **external sort**.
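
A small Python sketch of an external sort along these lines. The function name and the `lines_per_chunk` parameter are assumptions; a line count stands in for the "X MB of available memory" budget, and `heapq.merge` performs a k-way merge of all chunks at once instead of merging them pairwise.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, lines_per_chunk=100_000):
    chunk_paths = []

    # Phase 1: read a bounded number of lines, sort them in memory,
    # and write each sorted chunk to its own temporary file.
    with open(src_path) as src:
        while True:
            chunk = list(islice(src, lines_per_chunk))
            if not chunk:
                break
            # Make sure every line ends with a newline so lines never fuse.
            chunk = [line if line.endswith("\n") else line + "\n" for line in chunk]
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as chunk_file:
                chunk_file.writelines(chunk)
            chunk_paths.append(path)

    # Phase 2: lazily merge the sorted chunk files, holding only
    # one line per chunk in memory at a time.
    files = [open(path) for path in chunk_paths]
    try:
        with open(dst_path, "w") as dst:
            dst.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for path in chunk_paths:
            os.remove(path)
```

With very many chunks the number of simultaneously open files can become a limit, in which case the merge would have to be done in several rounds.
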
--- a/searching/10.7.
+++ b/searching/10.7.
@ -0,0 +1,9 @@
+# 10.7. Missing int
+
+> Given an input file with 4B non-negative integers, generate an integer not contained in the file. Assume you have 1GB memory available for this task. Follow up question: what if you only have 10MB of memory? Assume that all values are distinct and there are 1B non-negative integers
+
+I would load 1GB of the data into memory at a time. Then, with Python, I would create a set of all indices from 0 to 4B and take the difference between that set and the chunk of the input file. Doing the same for every chunk and intersecting the differences gives the set of all numbers not present in the input file. This takes O(n) time, but the set of all indices alone would not fit in 1GB of memory.
+
+## Solution
+
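+It involves bit vectors and indexing, and is quite complicated to understand. The core idea: with one bit per possible value, the 2^31 non-negative integers need 2^31 bits = 256MB, which fits in 1GB. Set the bit for every number in the file, then scan for the first bit that is still 0.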
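
To make the bit vector idea more concrete, here is a scaled-down Python sketch. `find_missing_int` and `max_value` are hypothetical names, and the input is a small in-memory iterable instead of a 4B-line file; only the 1GB variant is covered, not the 10MB follow-up.

```python
def find_missing_int(numbers, max_value):
    # One bit per candidate value in [0, max_value], packed 8 per byte.
    bits = bytearray(max_value // 8 + 1)

    # First pass: flag every number that appears in the input.
    for n in numbers:
        bits[n // 8] |= 1 << (n % 8)

    # Second pass: the first unflagged value is not in the input.
    for candidate in range(max_value + 1):
        if not bits[candidate // 8] & (1 << (candidate % 8)):
            return candidate
    return None  # every value in the range is present


missing = find_missing_int([0, 1, 2, 4, 5], max_value=7)
```

For the real problem `max_value` would be 2^31 - 1, giving a 256MB `bytearray`, and `numbers` would be read from the file one value at a time.
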
--- a/searching/10.8.
+++ b/searching/10.8.
@ -0,0 +1,9 @@
+# 10.8. Find duplicates
+
+> With an array with all numbers from 1 to N where N is at most 32000, the array can have duplicate entries and you do not know what N is. With only 4KB of memory, print all duplicate elements in the array
+
+I would load 4KB of the data each time and keep a hash table mapping each integer to False if it has never been seen and True if it has. For each number, do `if visited[num]: print(num); else: visited[num] = True`. The hash table will be big, 32000 entries at most, and the worst case runtime is O(n). A boolean array of length 32000 works too, although at one byte or more per entry neither fits in 4KB.
+
+## Solution
+
+It involves bit vectors: create a bit vector with 32000 bits, iterate over the array and flag each element v by setting bit v to 1. When coming across an element whose bit is already set, print it. Same as my approach but using a bit vector instead of an array. The difference is memory: 32000 bits take 4000 bytes, which fits in 4KB, whereas an array with one byte or more per entry does not.
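
A possible Python sketch of the bit vector approach. `find_duplicates` is a made-up name, and it collects the duplicates in a list instead of printing them so the result can be checked; a strict 4KB budget would print each duplicate as it is found.

```python
def find_duplicates(numbers, max_value=32000):
    # One bit per value 0..max_value: 4001 bytes for 32000, under 4KB (4096 bytes).
    bits = bytearray(max_value // 8 + 1)
    duplicates = []

    for n in numbers:
        byte, mask = n // 8, 1 << (n % 8)
        if bits[byte] & mask:
            duplicates.append(n)  # bit already set: n was seen before
        else:
            bits[byte] |= mask    # first sighting: flag it
    return duplicates


found = find_duplicates([1, 5, 1, 10, 2000, 5, 32000, 1])
```

A value that appears three times is reported twice, once for each repeat after the first sighting.
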
--- a/sorted_matrix_search.md
+++ b/sorted_matrix_search.md
@ -1,8 +1,8 @@
 # 10.9. Sorted matrix search

-## Given an M x N matrix in which each row and each column is sorted in ascending order, write a method to find an element
+> Given an M x N matrix in which each row and each column is sorted in ascending order, write a method to find an element

-> example
+Example

 ```python
 A = [[1,2,4,5,7],
@ -258,4 +258,4 @@ Keep track of possible rows and cols, where num > first elem, and num < last ele

 If we compare `num` to the center number in the matrix, we can eliminate roughly one quarter. hm.

-p410, I don't really understand it. We can discard quarters of the matrix by checking the middle values in the matrix and recursively check the lower left quadrant and the upper right quadrant.
+p410, I don't really understand it. We can discard quarters of the matrix by checking the middle values in the matrix and recursively check the lower left quadrant and the upper right quadrant.
--- a/searching/README.md
+++ b/searching/README.md
@ -30,8 +30,7 @@ Java algorithm in book.

 Pick a random element and partition the array, such that all numbers lower than the partitioning element come before all elements greater than it. Repeatedly partitioning the array (and its sub-arrays) around an element, the array is eventually sorted. But as the partitioned element is not guaranteed to be the median or close to it, the sorting could be very slow. That is why the worst case runtime is O(n<sup>2</sup>).

-
-```python
+```pseudocode
 a = [3, 5, 2, 4, 1, 6, 7], left=0, right=6

 pivot = a[3] = 4
@ -77,6 +76,28 @@ This is a sorting algorithm that distributes the elements of an array into a num

 In **binary search**, we look for an element x in a sorted array by first comparing x to the midpoint of the array. If x is lower, we search the left half, and vice versa.

-See code in book.
+See code in book, see below for Python:
+
+```python
+def binarySearch(arr, x):
+    l = 0
+    r = len(arr) - 1
+    while l <= r:
+        mid = l + (r - l) // 2
+        if arr[mid] == x:
+            return mid
+  
+        # If x is greater, ignore left half
+        elif arr[mid] < x:
+            l = mid + 1
+        # If x is smaller, ignore right half
+        else:
+            r = mid - 1
+    return -1
+
+arr = [2, 3, 4, 10, 40]
+x = 10
+result = binarySearch(arr, x)
+```

 There are more search approaches than just binary search; binary trees and hash tables can be used too.
--- a/README.md
+++ b/README.md
@ -1,6 +1,6 @@
 # Cracking the coding interview exercises and notes

-If you can't afford to buy the book, you can find a free pdf [here](http://ahmed-badawy.com/blog/wp-content/uploads/2018/10/Cracking-the-Coding-Interview-6th-Edition-189-Programming-Questions-and-Solutions.pdf) (Updated as of 2020.03.14).
+If you can't afford to buy the book, you can find a free pdf [here](https://github.com/alxerg/Books-1/blob/master/Cracking%20the%20Coding%20Interview%2C%206th%20Edition%20189%20Programming%20Questions%20and%20Solutions.pdf) (last updated on 2020.10.09).

 ## Introduction

@ -92,7 +92,22 @@ If you can't afford to buy the book, you can find a free pdf [here](http://ahmed
 * Sorted merge
 * Anagrams
 * Search in rotated array
+* Sorted search, no size
 * Sorted matrix search
+* Sparse search
+* Sort big file
+* Missing int
+* Find duplicates
+* Sorted matrix search
+* Rank for stream
+
+## Chapter 11 Testing
+
+* Mistake
+* Random crashes
+* Chess test
+* Test a pen
+* Test an ATM

 ## Chapter 16 Moderate problems