Introduction to SQL Queries
January 27, 2020
The goals of this lecture are:
- Learn core querying
- Learn about insertion, updating, and deletion
SQL Data Manipulation Language (DML)
With the practice we've gotten thus far, we know how to use SQL statements to communicate with databases.
SQL itself can be broken down into different categories of available statements. These different categories of statements are called sub-languages.
DML, or Data Manipulation Language, is one of these sub-languages. It is used to change records in the database that we're communicating with.
In the course of our work, most of the DML we'll be using will be related to CRUD (Create Read Update Delete) operations on rows.
The CRUD operations and their corresponding corresponds can be found below:
Letter | Verb | SQL Command |
---|---|---|
C | Create | INSERT INTO |
R | Read | SELECT … FROM |
U | Update | UPDATE … SET |
D | Delete | DELETE FROM |
SELECT
SELECT
is the statement that we use to retreive rows from a table. It is also the most flexible and powerful command in SQL!
SELECT
statements have subclauses, which, when performed in a specific order, allow us to tailor our queries to bring back exactly the data we want.
When used with the FROM
keyword, we can specify the table that we'd like to retrieve records from:
-- Select all rows and all columns from the books table
SELECT * FROM books;
In the case above, by using the *
, we're selecting all columns from every book in the books
table.
*
(wildcard) character is generally not a good practice. Remember: we want as predictable behavior as possible in our code!
Imagine having an application where we've used the
*
character to bring back all columns. Later on, someone updates the database and adds a column for the Social Security Number of our users.
Because we haven't been explicit with what is being returned, we've now opened ourselves up to leaking private information we never intended to.
Always strive to be discerning and purposeful by selecting only the columns we actually need!
SELECT Certain Columns
If we do not want to return every single column in the database, and only care about a select few, we can narrow down the columns that are returned by specifying the column names we'd like returned:
-- Select two columns of all rows from the books table
SELECT title, author FROM books;
With the query above, the only columns returned in our results will be title
and author
.
We'll see shortly that we can retrieve rows from multiple tables and join the results together.
title
, we would write books.title
.
When we get into more complex queries that draw from multiple tables, this qualification will help us easily read and identify which columns are coming from which table.
WHERE
WHERE
allows us to filter the records that we retrieve using the SELECT
statement.
For example, say we only want to return books that are less than $10:
-- Select all columns from cheap books
SELECT * FROM books WHERE price < 10;
Of all the individual records within the books
table, our query will return only those that have a price greater than $10.
GROUP BY
Sometimes it's the case that we'd like to combine the records returned from our query into groups based upon some shared value.
The GROUP BY
subclause allows us to do so. Using it will reduce the rows returned around a common column value.
For example, say we want to find all books that are over $10, but group them according to the author:
SELECT author, COUNT(*)
FROM books
GROUP BY author;
In this case, we are returned something that might look like this:
author | count |
---|---|
Ernest Hemingway | 24 |
James Joyce | 9 |
F. Scott Fitzgerald | 15 |
Even though we have multiple individual records in our books table that match our condition of being over $10, since we've used the GROUP BY
subclause, those records are combined into one row, based on the author
column value.
HAVING
The HAVING
subclause allows us to take our filtering even further: when used after GROUP BY
, we can indicate that we only want to return records whose column value satisfies certain criteria.
For example, say of the previous query where we grouped by author
, we only want to return from those groups those authors whose book count is greater than 2:
SELECT author, COUNT(*)
FROM books
GROUP BY author
HAVING COUNT(*) > 2;
In this case, we'd expect to only return Ernest Hemingway, since he's the only author in our database with a COUNT(*)
greater than 2.
ORDER BY
As the name indicates, ORDER BY
orders the rows that are returned to us.
By default, ORDER BY
sorts records in ascending order, from A => Z.
SELECT title, author, price
FROM books
ORDER BY author;
The above query would retun to us the title
, author
, and price
columns, sorted by the author's name.
LIMIT
If we'd like to limit the number of rows returned to us for a given query, we add the LIMIT
subclause onto the very end:
-- Select 5 rows and three columns from the books table
SELECT title, author, price
FROM books
ORDER BY price
LIMIT 5;
OFFSET
OFFSET
allows us to skip rows.
For example, with the following query, the rows that are returned will not include the first row that would normally be returned:
SELECT title, author, price
FROM books
ORDER BY price
OFFSET 1;
Pagination with LIMIT + OFFSET
When used together, the LIMIT
and OFFSET
subclauses can paginate our results.
Say, for example, we have hundreds of books in our books
table. The user would like to see all of them, page by page, in our application.
To correspond to the first "page" of 50 books from our database, we'd use the following query:
SELECT title, author, price
FROM books
ORDER BY price
LIMIT 50;
When the user clicks on something to navigate to the next 50 results in the database, we'd run the following query:
SELECT title, author, price
FROM books
ORDER BY price
OFFSET 50
LIMIT 50;
The above would skip the first 50 results from our table and return the rows 51-100, which is perfect since our user has already seen those -- they want to see the next 50 records!
SQL Operators
In SQL, as in most languages, there are reserved keywords: IN
, NOT IN
, BETWEEN
, AND
, and OR
.
These SQL keywords, also known as operators, allow us to create more complex queries when used with the WHERE
clause.
IN
IN
allows us to return only rows whose column value exists within a provided collection of values:
-- Select all books if the id is 1, 12 or 30
SELECT * FROM books WHERE id IN (1, 12, 30);
AND and OR
AND
and OR
both allow us to combine multiple filtering conditions on the rows returned:
SELECT title, author
FROM books
WHERE price < 10
AND page_count < 150;
SELECT title, author, publication_date, price
FROM books
WHERE publication_date > '01-01-2017'
OR price > 100;
BETWEEN
Using BETWEEN
paired withAND
, we can define a range of values for a column to be filtered against:
-- Select all books of moderate length
SELECT *
FROM books
WHERE page_count BETWEEN 300 AND 500;
SQL Aggregates
SQL Aggregates are used to aggregate (or combine) rows together in order to extract data from the finished combination.
We've seen one example of an aggregate already: COUNT(*)
, which returns the total number of rows of a specified value.
Additional functions include: AVG
, SUM
, MIN
, and MAX
.
MIN and MAX
Both MIN
and MAX
are used to find the lowest and highest values for a given column of the rows that are queried.
For example, if we'd like to return the highest page count for all books:
-- Find the highest page_count from all books
SELECT MAX(page_count) FROM books;
And if we'd like to find the lowest price of all books:
-- Find lowest price
SELECT MIN(price) FROM books;
SUM and AVG
SUM
can be used to return the total number of pages of all books in our table:
-- Find total number of pages
SELECT SUM(page_count) FROM books;
AVG
allows us to find the average of all values in the column of our choosing.
In the case below, we will be returned the average price of all books:
-- Find the average price of all books
SELECT AVG(price) FROM books;
GROUP BY, HAVING and Aggregates
It is common to use GROUP BY
and HAVING
with aggregates.
-- Order authors from most to least prolific
SELECT author, SUM(page_count) AS total
FROM books
GROUP BY author
ORDER BY total DESC;
In the above query, we are returning the columns author
and total
. The AS
clause is used to alias (or nickname) a column of the SUM
of all pages of the rows returned.
What we get back are all books, grouped by the author and ordered by the higheset most of pages to fewest.
Adding Rows to a Table with INSERT
To add new rows to a table, we use the INSERT
clause:
-- Add a new book with title and author
INSERT INTO books (title, author)
VALUES ('The Iliad', 'Homer');
In the case above, we are creating a brand new row in our table where the title
column has a value of 'The Iliad' and the author
column has a value of 'Homer'.
In the case that we'd like to add multiple records at one time:
-- Add three books, titles only
INSERT INTO books (title) VALUES
('War and Peace'),
('Emma'),
('Treasure Island');
Updating Existing Rows with UPDATE
To update a row in a table, we use the UPDATE
clause:
-- Set one book's author column to 'Jane Austen'
UPDATE books SET author = 'Jane Austen' WHERE title = 'Emma';
Or, in the case we'd like to update all rows:
-- Set all books' author column to 'Ernest Hemingway'
UPDATE books SET author = 'Ernest Hemingway';
Deleting Rows with DELETE
To delete rows within a table, we use the DELETE
clause:
-- Delete all books where the title is 'Emma'
DELETE FROM books WHERE title = 'Emma';
Finally, if we'd like to delete every row in our table:
-- Delete all books
DELETE FROM books;