Introduction to Postgres in Node
February 12, 2020
The goals of this lecture are:
- Use `pg` to connect and execute SQL queries
- Understand what SQL injection is and how to prevent it
- Examine CRUD on a single resource with Express and `pg`
The Node SQL Ecosystem
Before we dive into `pg` as a library, let's first review the options we have when we've determined that we want to use SQL in our Node project.
Let's cover each of our options, from most "full-featured" to least.
ORMs
Previously in our work with Python, we learned about ORMs. ORMs (Object Relational Mappers) are our "heavy-lifters", handling for us the mapping of our relational database to model classes. These relationships are what allow us to write our code in the Object-Oriented way we've become accustomed to.
In Node, a popular choice for this category is Sequelize.
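To give a feel for the style (this snippet is illustrative and not from the lecture; the model name, column, and connection string are assumptions), a Sequelize model and query might look roughly like this:

```js
const { Sequelize, DataTypes } = require("sequelize");

// the connection string here is an assumption for this sketch
const sequelize = new Sequelize("postgres://localhost/users");

// a model class mapped to a "Users" table
const User = sequelize.define("User", { name: DataTypes.STRING });

async function getUsers() {
  // the ORM generates the SQL for us
  return await User.findAll();
}
```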
Query Builders
A step below ORMs sit our query builders. Query builders provide us with a thin layer of abstraction over our database, allowing us to use query-like methods in place of raw SQL.
We can even chain methods to create more complex queries, which should feel familiar from the JavaScript (particularly jQuery) that we've written previously:
```js
const student = await knex.first("fname", "lname")
  .from("students")
  .where("id", id);
```
Knex, used above, is one of the available options within this category.
Drivers
Finally, we have our third and "lowest level" option, and the one that we'll be covering today: drivers.
Drivers allow us to connect directly to our database and send it SQL queries. If we think back to our previous work in Python, the library `psycopg2` did exactly this.
The Node driver that we'll be using (and covering in the remainder of this post) is `pg`.
pg
Installing pg
To install `pg`, we run the following command within our project:
```sh
$ npm install pg
```
pg + Database Configuration
As we always do, we first need to set up `pg` to understand what database it's going to be communicating with.
To do so, we'll create a separate `db.js` file:
```js
/** Database setup for users. */

const { Client } = require("pg");

let DB_URI;

if (process.env.NODE_ENV === "test") {
  DB_URI = "postgresql:///users_test";
} else {
  DB_URI = "postgresql:///users";
}

let db = new Client({
  connectionString: DB_URI
});

db.connect();

module.exports = db;
```
The code above does several important things for us:
- Identifies the database: specifies which database to connect to, based on our current environment (the `if`/`else` at the top).
- Establishes the database connection: passing the address of our database to a new instance of `Client`, we establish a connection to our database.
- Exports the database connection: we export the connection we've established for use in the rest of our Node application (the `module.exports` at the bottom).
pg Queries
Now that we've established a connection with our database, let's look at how to execute some SQL queries.
To start, at the top of the file we want to make queries, we need to be sure to import our database connection:
```js
const db = require("../db");
```
In our example, we will be working in our `routes.js` file, which will hold all of our application's routes.
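The route snippets that follow assume some standard Express scaffolding that isn't shown explicitly; a minimal sketch of what `routes.js` might look like around them:

```js
// routes.js -- a sketch of the scaffolding the snippets below assume
const express = require("express");
const router = new express.Router();
const db = require("../db");

// ... route definitions from the snippets below ...

module.exports = router;
```

In `app.js`, this router would then be mounted with `app.use()`; the exact mount path depends on how the app is organized.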
In the snippet below, we're creating a route to get all of the users in our database. When a user visits our `/all` endpoint, we run a SQL query and return the results as JSON:
router.get("/all", function (req, res, next) {
const results = db.query(`SELECT * FROM users`);
return res.json(results.rows);
});
But there's a problem here! This response won't actually contain any users, even if they exist as rows in our `users` table.
Database Queries are Asynchronous
In order to return the results of a query, we have to make sure we (a)wait for the operation to be completed before we return it.
With a minor refactor to include `async` and `await`, our query will return the results we'd expect:
router.get("/all", async function (req, res, next) {
const results = await db.query(`SELECT * FROM users`);
return res.json(results.rows);
});
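One refinement worth noting (an addition here, not part of the lecture's example): if the query fails, the rejected promise goes unhandled. A common Express pattern is to wrap the query in `try`/`catch` and forward any error to the error-handling middleware:

```js
router.get("/all", async function (req, res, next) {
  try {
    const results = await db.query(`SELECT * FROM users`);
    return res.json(results.rows);
  } catch (err) {
    // hand any database error off to Express's error-handling middleware
    return next(err);
  }
});
```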
Query String Interpolation
Sometimes it's the case that we'd like to send dynamic information to our queries.
Perhaps our end user is searching for a particular keyword or column value, and we'd like to incorporate that directly into the query we're executing:
router.get("/search", async function (req, res, next) {
const type = req.body.type;
const results = await db.query(
`SELECT id, name, type
FROM users
WHERE type='${type}'`);
return res.json(results.rows);
});
In the case above, we're accepting a `type` of employee that has been submitted by the user. This query would return exactly what we'd expect, but there is a huge problem with this code.
SQL Injection
SQL injection is a technique where an attacker executes undesirable SQL statements on a database.
If we look back to our previous example, as long as the user behaves in the way that we think (i.e., sending in an innocent string value), there is no problem.
However, we should always be coding defensively. If we assume instead that our user is not a good actor, but a malicious one, we discover that our use of direct string interpolation allows a little too much freedom by way of query execution.
For example, if our user searches for a `type` of "staff", everything works fine.
But, if they instead set the `type` to `"bwah-hah'; DELETE FROM users; --"`, we have a problem.
The resulting query would be as follows:
```sql
SELECT id, name, type
FROM users
WHERE type='bwah-hah'; DELETE FROM users; --'
```
By placing the user input directly into our query, we are also giving them the ability to hijack the query, terminate it early, and finish it off with their own statement (in this case, a `DELETE`).
How can we avoid being the victims of a SQL injection attack?
Parameterized Queries
By using a feature called parameterized queries, we can ensure that any user-supplied values in our queries are treated strictly as data, never as executable SQL, and are therefore safe to run against our database.
In the past, when we've worked with an ORM to communicate SQL to our database, we haven't had to worry about this. That's because the ORM, doing many things for us, also took care of this sanitization.
Rather than interpolating user-submitted values directly, we write placeholders into the query, and then pass an array of the actual values as our second argument, as seen below:
router.get("/users", async function (req, res, next) {
const type = req.body.type;
const results = await db.query(`SELECT id, name, type
FROM users
WHERE type=$1`, [type]);
return res.json(results.rows);
});
Take note of the parameterized query string, which is the first argument to our `db.query()` method.
We've used a value placeholder, designated by the `$1`. We've also passed in an array containing the value that corresponds to the placeholder as the second argument of the function.
If we were including more than one variable, our placeholders would continue on with `$2`, `$3`, etc., and those values would also appear, in order, in the array that follows.
We can see this in action in the route snippet below:
router.post("/", async function (req, res, next) {
const { name, type } = req.body;
const result = await db.query(
`INSERT INTO users (name, type)
VALUES ($1, $2)
RETURNING id, name, type`,
[name, type]);
return res.status(201).json(result.rows[0]);
});
Notice that we're using two variables (with placeholders `$1` and `$2` in our query), whose actual values we're passing in, in order, as the second argument.
We've also used a `RETURNING` clause to specify which columns we'd like returned in our response. Similar to what we've learned about avoiding `SELECT *`, it's always better to be explicit and specific about the values you'd like returned, rather than returning everything in one go.
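To see multiple placeholders and `RETURNING` working together on an update, here's a sketch of what a hypothetical `PATCH` route might look like (this route isn't part of the lecture's examples):

```js
router.patch("/:id", async function (req, res, next) {
  const { name, type } = req.body;
  const result = await db.query(
    `UPDATE users SET name=$1, type=$2
     WHERE id=$3
     RETURNING id, name, type`,
    [name, type, req.params.id]);
  return res.json({ user: result.rows[0] });
});
```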
Committing Database Changes
During our work with SQLAlchemy, we got into the habit of having to explicitly commit our changes. This was because SQLAlchemy put the work that we did into a database transaction.
With `pg`, this is not the case: every action we make against our database is committed automatically.
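That said, if we ever need several statements to succeed or fail together, `pg` lets us manage a transaction ourselves by issuing the SQL commands directly. A minimal sketch, with illustrative table and column names:

```js
// inside an async function, using the db connection from db.js
try {
  await db.query("BEGIN");
  await db.query(
    `UPDATE accounts SET balance = balance - $1 WHERE id = $2`,
    [100, 1]);
  await db.query(
    `UPDATE accounts SET balance = balance + $1 WHERE id = $2`,
    [100, 2]);
  await db.query("COMMIT"); // both updates become permanent together
} catch (err) {
  await db.query("ROLLBACK"); // undo both if either fails
  throw err;
}
```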
Testing Our Database
As we saw in our setup for `pg` within our `db.js` file at the beginning of this post, we want to make sure that we have a separate database for testing purposes.
To do so, we put some conditional logic at the top of our `db.js` file: if our process is running in a test environment, we use the test database rather than production:
```js
const { Client } = require("pg");

const DB_URI = (process.env.NODE_ENV === "test")
  ? "postgresql:///users_test"
  : "postgresql:///users";

let db = new Client({
  connectionString: DB_URI
});

db.connect();

module.exports = db;
```
Setting Up and Tearing Down Test Database
Before writing any of our actual tests, in our `test.js` file, we need to be sure that we're setting `NODE_ENV` to `"test"`, so that we connect with the test database rather than our production database.
After that, we import any dependencies we need, as well as our application and our database connection:
```js
// connect to right DB --- set before loading db.js
process.env.NODE_ENV = "test";

// npm packages
const request = require("supertest");

// app
const app = require("../app");
const db = require("../db");
```
Once we've set up our database connection, as well as brought in everything we need, we can start writing our tests.
beforeEach
With `beforeEach`, we specify something that we want to occur before each one of our tests is run. In this case, we'd like to create a single record in the test database that we can work with:
```js
let testUser;

beforeEach(async function() {
  let result = await db.query(`
    INSERT INTO users (name)
    VALUES ('testUser')
    RETURNING id, name`);
  testUser = result.rows[0];
});
```
afterEach and afterAll
Using `afterEach`, we clean up our test database by deleting any records that were created during the course of the last test:
```js
afterEach(async function() {
  // delete any data created by the test
  await db.query("DELETE FROM users");
});
```
Finally, we use `afterAll` to close our database connection once all of our tests have finished running:
```js
afterAll(async function() {
  // close db connection
  await db.end();
});
```
Testing CRUD Actions
Let's look at some example tests, given what we've covered so far.
In the case that we'd like to ensure that our route is returning all users, our test would look like the following:
describe("GET /users", function() {
test("Gets a list of 1 user", async function() {
const response = await request(app).get(`/users`);
expect(response.statusCode).toEqual(200);
expect(response.body).toEqual({
users: [testUser]
});
});
});
And, if we'd like to test an endpoint for a specific user:
describe("GET /users/:id", function() {
test("Gets a single user", async function() {
const response = await request(app).get(`/users/${testUser.id}`);
expect(response.statusCode).toEqual(200);
expect(response.body).toEqual({user: testUser});
});
test("Responds with 404 if can't find user", async function() {
const response = await request(app).get(`/users/0`);
expect(response.statusCode).toEqual(404);
});
});
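These tests assume the route responds with a 404 when no matching row comes back, something our earlier snippets didn't handle. A sketch of a route that would satisfy them (the error-handling style here is an assumption):

```js
router.get("/:id", async function (req, res, next) {
  try {
    const result = await db.query(
      `SELECT id, name FROM users WHERE id=$1`,
      [req.params.id]);
    if (result.rows.length === 0) {
      // no matching row: respond with a 404, as the test above expects
      return res.status(404).json({ error: "User not found" });
    }
    return res.json({ user: result.rows[0] });
  } catch (err) {
    return next(err);
  }
});
```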
To test the creation of a user:
describe("POST /users", function() {
test("Creates a new user", async function() {
const response = await request(app)
.post(`/users`)
.send({
name: "Alissa"
});
expect(response.statusCode).toEqual(201);
expect(response.body).toEqual({
user: {id: expect.any(Number), name: "Alissa"}
});
});
});
To test the update of a user:
describe("PATCH /users/:id", function() {
test("Updates a single user", async function() {
const response = await request(app)
.patch(`/users/${testUser.id}`)
.send({
name: "Troll"
});
expect(response.statusCode).toEqual(200);
expect(response.body).toEqual({
user: {id: testUser.id, name: "Troll"}
});
});
test("Responds with 404 if can't find user", async function() {
const response = await request(app).patch(`/users/0`);
expect(response.statusCode).toEqual(404);
});
});
And, last but not least, testing to ensure user deletion works as expected:
describe("DELETE /users/:id", function() {
test("Deletes a single a user", async function() {
const response = await request(app)
.delete(`/users/${testUser.id}`);
expect(response.statusCode).toEqual(200);
expect(response.body).toEqual({ message: "User deleted" });
});
});
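For completeness, here's a sketch of a delete route that would satisfy this test; aside from the response message, which matches the expectation above, the details are assumptions:

```js
router.delete("/:id", async function (req, res, next) {
  try {
    await db.query(
      `DELETE FROM users WHERE id=$1`,
      [req.params.id]);
    // the message matches the test's expected response body
    return res.json({ message: "User deleted" });
  } catch (err) {
    return next(err);
  }
});
```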