Introduction to SQL Queries

January 27, 2020

This article is a written version of Rithm School’s SQL Query lecture. It follows our Introduction to Databases so the content here assumes some understanding of databases.

The goals of this lecture are:

  1. Learn core querying
  2. Learn about insertion, updating, and deletion
SQL Data Manipulation Language (DML)

With the practice we've gotten thus far, we know how to use SQL statements to communicate with databases.

SQL itself can be broken down into different categories of available statements. These different categories of statements are called sub-languages.

DML, or Data Manipulation Language, is one of these sub-languages. It is used to change records in the database that we're communicating with.

In the course of our work, most of the DML we'll be using will be related to CRUD (Create Read Update Delete) operations on rows.

The CRUD operations and their corresponding corresponds can be found below:

Letter Verb SQL Command
C Create INSERT INTO
R Read SELECT … FROM
U Update UPDATE … SET
D Delete DELETE FROM
SELECT

SELECT is the statement that we use to retreive rows from a table. It is also the most flexible and powerful command in SQL!

SELECT statements have subclauses, which, when performed in a specific order, allow us to tailor our queries to bring back exactly the data we want.

When used with the FROM keyword, we can specify the table that we'd like to retrieve records from:

-- Select all rows and all columns from the books table
SELECT * FROM books;

In the case above, by using the *, we're selecting all columns from every book in the books table.

💡Using the * (wildcard) character is generally not a good practice. Remember: we want as predictable behavior as possible in our code!

Imagine having an application where we've used the * character to bring back all columns. Later on, someone updates the database and adds a column for the Social Security Number of our users.

Because we haven't been explicit with what is being returned, we've now opened ourselves up to leaking private information we never intended to.

Always strive to be discerning and purposeful by selecting only the columns we actually need!
SELECT Certain Columns

If we do not want to return every single column in the database, and only care about a select few, we can narrow down the columns that are returned by specifying the column names we'd like returned:

-- Select two columns of all rows from the books table
SELECT title, author FROM books;

With the query above, the only columns returned in our results will be title and author.

We'll see shortly that we can retrieve rows from multiple tables and join the results together.

💡It's considered a good practice to "qualify" our column names with the table name. To qualify a column name, we prepend the table name before our column. From the example above, instead of title, we would write books.title.

When we get into more complex queries that draw from multiple tables, this qualification will help us easily read and identify which columns are coming from which table.
WHERE

WHERE allows us to filter the records that we retrieve using the SELECT statement.

For example, say we only want to return books that are less than $10:

-- Select all columns from cheap books
SELECT * FROM books WHERE price < 10;

Of all the individual records within the books table, our query will return only those that have a price greater than $10.

GROUP BY

Sometimes it's the case that we'd like to combine the records returned from our query into groups based upon some shared value.

The GROUP BY subclause allows us to do so. Using it will reduce the rows returned around a common column value.

For example, say we want to find all books that are over $10, but group them according to the author:

SELECT author, COUNT(*)
  FROM books
  GROUP BY author;

In this case, we are returned something that might look like this:

author count
Ernest Hemingway 24
James Joyce 9
F. Scott Fitzgerald 15

Even though we have multiple individual records in our books table that match our condition of being over $10, since we've used the GROUP BY subclause, those records are combined into one row, based on the author column value.

HAVING

The HAVING subclause allows us to take our filtering even further: when used after GROUP BY, we can indicate that we only want to return records whose column value satisfies certain criteria.

For example, say of the previous query where we grouped by author, we only want to return from those groups those authors whose book count is greater than 2:

SELECT author, COUNT(*)
  FROM books
  GROUP BY author
  HAVING COUNT(*) > 2;

In this case, we'd expect to only return Ernest Hemingway, since he's the only author in our database with a COUNT(*) greater than 2.

ORDER BY

As the name indicates, ORDER BY orders the rows that are returned to us.

By default, ORDER BY sorts records in ascending order, from A => Z.

SELECT title, author, price
  FROM books
  ORDER BY author;

The above query would retun to us the title, author, and price columns, sorted by the author's name.

LIMIT

If we'd like to limit the number of rows returned to us for a given query, we add the LIMIT subclause onto the very end:

-- Select 5 rows and three columns from the books table
SELECT title, author, price
  FROM books
  ORDER BY price
  LIMIT 5;
OFFSET

OFFSET allows us to skip rows.

For example, with the following query, the rows that are returned will not include the first row that would normally be returned:

SELECT title, author, price
  FROM books
  ORDER BY price
  OFFSET 1;
Pagination with LIMIT + OFFSET

When used together, the LIMIT and OFFSET subclauses can paginate our results.

Say, for example, we have hundreds of books in our books table. The user would like to see all of them, page by page, in our application.

To correspond to the first "page" of 50 books from our database, we'd use the following query:

SELECT title, author, price
  FROM books
  ORDER BY price
  LIMIT 50;

When the user clicks on something to navigate to the next 50 results in the database, we'd run the following query:

SELECT title, author, price
  FROM books
  ORDER BY price
  OFFSET 50
  LIMIT 50;

The above would skip the first 50 results from our table and return the rows 51-100, which is perfect since our user has already seen those -- they want to see the next 50 records!

SQL Operators

In SQL, as in most languages, there are reserved keywords: IN, NOT IN, BETWEEN, AND, and OR.

These SQL keywords, also known as operators, allow us to create more complex queries when used with the WHERE clause.

IN

IN allows us to return only rows whose column value exists within a provided collection of values:

-- Select all books if the id is 1, 12 or 30
SELECT * FROM books WHERE id IN (1, 12, 30);
AND and OR

AND and OR both allow us to combine multiple filtering conditions on the rows returned:

SELECT title, author
  FROM books
  WHERE price < 10
    AND page_count < 150;
SELECT title, author, publication_date, price
  FROM books
  WHERE publication_date > '01-01-2017'
    OR price > 100;
BETWEEN

Using BETWEEN paired withAND, we can define a range of values for a column to be filtered against:

-- Select all books of moderate length
SELECT *
  FROM books
  WHERE page_count BETWEEN 300 AND 500;

SQL Aggregates

SQL Aggregates are used to aggregate (or combine) rows together in order to extract data from the finished combination.

We've seen one example of an aggregate already: COUNT(*), which returns the total number of rows of a specified value.

Additional functions include: AVG, SUM, MIN, and MAX.

MIN and MAX

Both MIN and MAX are used to find the lowest and highest values for a given column of the rows that are queried.

For example, if we'd like to return the highest page count for all books:

-- Find the highest page_count from all books
SELECT MAX(page_count) FROM books;

And if we'd like to find the lowest price of all books:

-- Find lowest price
SELECT MIN(price) FROM books;
SUM and AVG

SUM can be used to return the total number of pages of all books in our table:

-- Find total number of pages
SELECT SUM(page_count) FROM books;

AVG allows us to find the average of all values in the column of our choosing.

In the case below, we will be returned the average price of all books:

-- Find the average price of all books
SELECT AVG(price) FROM books;

GROUP BY, HAVING and Aggregates

It is common to use GROUP BY and HAVING with aggregates.

-- Order authors from most to least prolific
SELECT author, SUM(page_count) AS total
  FROM books
  GROUP BY author
  ORDER BY total DESC;

In the above query, we are returning the columns author and total. The AS clause is used to alias (or nickname) a column of the SUM of all pages of the rows returned.

What we get back are all books, grouped by the author and ordered by the higheset most of pages to fewest.

Adding Rows to a Table with INSERT

To add new rows to a table, we use the INSERT clause:

-- Add a new book with title and author
INSERT INTO books (title, author)
  VALUES ('The Iliad', 'Homer');

In the case above, we are creating a brand new row in our table where the title column has a value of 'The Iliad' and the author column has a value of 'Homer'.

In the case that we'd like to add multiple records at one time:

-- Add three books, titles only
INSERT INTO books (title) VALUES
  ('War and Peace'),
  ('Emma'),
  ('Treasure Island');

Updating Existing Rows with UPDATE

To update a row in a table, we use the UPDATE clause:

-- Set one book's author column to 'Jane Austen'
UPDATE books SET author = 'Jane Austen' WHERE title = 'Emma';

Or, in the case we'd like to update all rows:

-- Set all books' author column to 'Ernest Hemingway'
UPDATE books SET author = 'Ernest Hemingway';

Deleting Rows with DELETE

To delete rows within a table, we use the DELETE clause:

-- Delete all books where the title is 'Emma'
DELETE FROM books WHERE title = 'Emma';

Finally, if we'd like to delete every row in our table:

-- Delete all books
DELETE FROM books;