# 14. Databases ## Basic SQL ```sql SELECT CITY FROM STATION WHERE CODE = '123' ``` Get distinct cities: ```sql SELECT DISTINCT CITY FROM STATION WHERE CODE = '123' ``` Count number of distinct cities ```sql SELECT count(DISTINCT CITY) FROM STATION WHERE CODE = '123' ``` Apart from count, there is also `sum`, `avg`, `ceil`, `floor`. Get the shortest city name and longest city name ```sql SELECT CITY, LENGTH(CITY) FROM STATION ORDER BY LENGTH(CITY), CITY asc limit 1; SELECT CITY, LENGTH(CITY) FROM STATION ORDER BY LENGTH(CITY), CITY desc limit 1; ``` Get cities ending with vowels ```sql SELECT DISTINCT CITY FROM STATION WHERE city REGEXP "[aeiou]$" ``` Select substring of column. The first two characters. 1 means first character, 2 means length. SQL has an index system starting from 1. To get the last ones, use -X. ```sql SELECT SUBSTR(NAME, 1, 2) FROM TABLE ``` Output a string for each case ```sql SELECT CASE WHEN A >= (B + C) OR B >= (A + C) OR C >= (A + B) THEN 'Not A Triangle' WHEN A = B AND A = C THEN 'Equilateral' WHEN A = B OR B = C OR A = C THEN 'Isosceles' ELSE 'Scalene' END FROM TRIANGLES; ``` Get the list of top earning employees, and the number of employees that earn this amount ```sql SELECT earnings, count(*) FROM Employee GROUP BY earnings ORDER BY earnings desc limit 1; ``` ```sql SELECT COUNTRY.Continent, FLOOR(AVG(CITY.Population)) FROM CITY INNER JOIN COUNTRY ON CITY.CountryCode=COUNTRY.Code GROUP BY COUNTRY.Continent; ``` ## Book SQL ```sql SELECT CourseName, TeacherName FROM Courses, Teachers WHERE Courses.TeacherID = Teachers.TeacherID ``` Normalized databases are designed to minimize redundancy, while denormalized databases are designed to optimize read time. We can denormalize the databases by storing redundant data and avoid doing many joins. As an example, we have this database. * indicates a primary key. ```sql Courses: CourseID*, CourseName, TeacherID Teachers: TeacherID*, TeacherName Students: StudentID*, StudentName StudentCourses: CourseID*, StudentID* ``` ### Query 1: student enrollment > Get a list of all students and how many courses each student is enrolled in ```sql SELECT StudentName, Students.StudentID, count(StudentCourses.CourseID) as [Cnt] FROM Students LEFT JOIN StudentCourses ON Students.StudentID = StudentCourses.StudentID GROUP BY Students.StudentID, Students.StudentName ``` For reasons and incorrect implementations and their justification, see chaper 14 in book. ### Query 2: Teacher class size > Get a list of all teachers and how many students they teach. If a teacher teaches the same student in two courses, double count the student. Sort the list in descending order of the number of students a teacher teaches ```sql SELECT TeacherName, isnull(StudentSize.Number, 0) FROM Teachers LEFT JOIN (SELECT TeacherID, count(StudentCourses.CourseID) AS [Number] FROM Courses INNER JOIN StudentCourses ON Courses.CourseID = StudentCourses.CourseID GROUP BY Courses.TeacherID) StudentSize ON Teachers.TeacherID = StudentSize.TeacherID ORDER BY StudentSize.Number DESC ``` ## Small database design How to design a small database 1. Handle ambiguity: understand exactly what you need to design, consult with the interviewer 2. Define the core objects: typically each core object translates into a table 3. Analyze relationships: how tables are connected to each other 4. Investigate actions: walk through the common actions that will be taken and understand how to store and retrieve the relevant data. ## Large database design When designing a large, scalable database, joins are generally very slow. You must *denormalize* your data. Duplicate the relevant data in many tables.