ctci/14. Databases
anebz f3f2c3dea4 sql cheatsheet and basic commands 2020-10-27 10:06:40 +01:00
..
14.1. Multiple apartments.md chapters 12-15, theory done 2020-10-19 14:16:31 +02:00
README.md sql cheatsheet and basic commands 2020-10-27 10:06:40 +01:00
cheatsheet.md sql cheatsheet and basic commands 2020-10-27 10:06:40 +01:00

README.md

14. Databases

Basic SQL

SELECT CITY FROM STATION WHERE CODE = '123'

Get distinct cities:

SELECT DISTINCT CITY FROM STATION WHERE CODE = '123'

Count number of distinct cities

SELECT count(DISTINCT CITY) FROM STATION WHERE CODE = '123'

Apart from count, there is also sum, avg, ceil, floor.

Get the shortest city name and longest city name

SELECT CITY, LENGTH(CITY) FROM STATION
ORDER BY LENGTH(CITY), CITY asc 
limit 1;

SELECT CITY, LENGTH(CITY) FROM STATION
ORDER BY LENGTH(CITY), CITY desc 
limit 1;

Get cities ending with vowels

SELECT DISTINCT CITY FROM STATION WHERE city REGEXP "[aeiou]$"

Select substring of column. The first two characters. 1 means first character, 2 means length. SQL has an index system starting from 1. To get the last ones, use -X.

SELECT SUBSTR(NAME, 1, 2) FROM TABLE

Output a string for each case

SELECT
    CASE
        WHEN A >= (B + C) OR B >= (A + C) OR C >= (A + B) THEN 'Not A Triangle'
        WHEN A = B AND A = C THEN 'Equilateral'
        WHEN A = B OR B = C OR A = C THEN 'Isosceles'
        ELSE 'Scalene'
    END
FROM TRIANGLES;

Get the list of top earning employees, and the number of employees that earn this amount

SELECT earnings, count(*) FROM Employee
GROUP BY earnings
ORDER BY earnings desc
limit 1;
SELECT COUNTRY.Continent, FLOOR(AVG(CITY.Population)) 
FROM CITY INNER JOIN COUNTRY 
ON CITY.CountryCode=COUNTRY.Code
GROUP BY COUNTRY.Continent;

Book SQL

SELECT CourseName, TeacherName FROM Courses, Teachers WHERE Courses.TeacherID = Teachers.TeacherID

Normalized databases are designed to minimize redundancy, while denormalized databases are designed to optimize read time. We can denormalize the databases by storing redundant data and avoid doing many joins.

As an example, we have this database. * indicates a primary key.

Courses: CourseID*, CourseName, TeacherID
Teachers: TeacherID*, TeacherName
Students: StudentID*, StudentName
StudentCourses: CourseID*, StudentID*

Query 1: student enrollment

Get a list of all students and how many courses each student is enrolled in

SELECT StudentName, Students.StudentID, count(StudentCourses.CourseID) as [Cnt]
FROM Students LEFT JOIN StudentCourses
ON Students.StudentID = StudentCourses.StudentID
GROUP BY Students.StudentID, Students.StudentName

For reasons and incorrect implementations and their justification, see chaper 14 in book.

Query 2: Teacher class size

Get a list of all teachers and how many students they teach. If a teacher teaches the same student in two courses, double count the student. Sort the list in descending order of the number of students a teacher teaches

SELECT TeacherName, isnull(StudentSize.Number, 0)
FROM Teachers LEFT JOIN 
    (SELECT TeacherID, count(StudentCourses.CourseID) AS [Number]
    FROM Courses INNER JOIN StudentCourses
    ON Courses.CourseID = StudentCourses.CourseID
    GROUP BY Courses.TeacherID) StudentSize
ON Teachers.TeacherID = StudentSize.TeacherID
ORDER BY StudentSize.Number DESC

Small database design

How to design a small database

  1. Handle ambiguity: understand exactly what you need to design, consult with the interviewer
  2. Define the core objects: typically each core object translates into a table
  3. Analyze relationships: how tables are connected to each other
  4. Investigate actions: walk through the common actions that will be taken and understand how to store and retrieve the relevant data.

Large database design

When designing a large, scalable database, joins are generally very slow. You must denormalize your data. Duplicate the relevant data in many tables.