Common Table Expressions in PostgresSQL


Getting a good handle on Common Table Expressions (CTEs) is something that has dramatically improved my ability to write more complex SQL queries as well as improved my understanding of how things fit together in a relational database. What follows is a brief tutorial geared towards newer SQL developers.

Common Table Expressions are queries that return a dataset that can then be used in another query. I think of them as the building blocks of a bigger query that are also helpful for organization and composition. Here is the basic syntax, taken from postgresqltutorial.com:

WITH cte_name (column_list) AS (
    CTE_query_definition 
)
statement;

Let's unpack this a bit. First, to create a CTE, you must always start by using the WITH keyboard. Then, you can give it any kind of name you'd like followed by an optional column list. The column list will name the return columns if you want, but I usually just name everything in the final statement. After tying the name to the query with the AS keyword, you can make any kind of SELECT, INSERT, UPDATE, or DELETE statement inside the parentheses. Lastly, you must create a SELECT statement at the end which pulls from the query definition.

Imagine a database with 2 tables, students and classes. We have recently created a third table, students_classes which will hold the links between the first two tables. We need to fill the new table programmatically so that every student in the students table will be linked to entry level English and Science classes.

WITH student_ids AS (
    SELECT id FROM students s
    WHERE s.id NOT IN (
        SELECT student_id 
        FROM students_classes
       )
   )
  , class_ids AS (
      SELECT id FROM classes c
      WHERE
      c.subject = 'English'
      OR
      c.subject = 'Science'
      AND c.level = 101
   )
   INSERT INTO students_classes
   (student_id, class_id)
   SELECT
   s.id AS student_id
   , c.id AS class_id
   FROM student_ids s,
   class_ids c;

First, I'm grabbing all of the Id numbers from the students table, taking care to only grab the ones that aren't already in the junction. Then, I get all the class Ids that I plan on inserting, in the second block. You can chain CTEs together endlessly by separating them with a comma and naming the next one. Finally, I use an INSERT statement that pulls from both of the temporary datasets, student_ids and class_ids. The CTE structure made it easy to grab just the data I needed from the first two tables and then insert data into students_classes based off of the information I collected previously. I'd encourage anyone interested in using CTEs to read through the postgreSQL tutorial website, (link posted above) and try them out as a means of breaking down a bigger problem into manageable parts.