Joins in SQL

When learning SQL, one of the first trickier things to get your head around are the various ways to join one table to another. Here, I’m going to give a basic summary of each as a reference, and, for Rails developers, show what using that join looks like from Active Record.

For the purposes of these examples, we’ll assume our database has a users table and an organisations table, where a user has an organisation_id column linking it to the organisation it belongs to.

In Rails, this would look like:

class User
  belongs_to :organisation
end

class Organisation
  has_many :users
end

Inner join

SELECT u.name, o.name
FROM users u
  INNER JOIN organisations o
  ON users.organisation_id = organisations.id;

This will give us every matching combination. Here, we will get every user with a valid organisation_id and their organisation. This join does not enforce uniqueness in either direction.

In Rails, this is more or less the “default” join. We can simply:

User.joins(:organisation)

One related Rails method to be aware of also performs a specific type of inner join, to check an association exists without returning any data from it:

Organisation.where.associated(:users)

Left join

SELECT u.name, o.name
FROM users u
  LEFT JOIN organisations o
  ON users.organisation_id = organisations.id;

The left join—or left outer join—is very similar to the inner join, with one key difference: if a user has no corresponding organisation, it will still be returned, with NULL values in the organisation’s fields.

Rails has a specific method for this join:

User.left_joins(:organisation)
# or its alias
User.left_outer_joins(:organisation)

And we also have the inverse of where.associated available using this join:

User.where.missing(:organisation)

The left join is used to find missing associations as Rails can simply look for rows where the association fields are NULL. This technique can be referred to as an “anti join”, though that’s not a specific join type.

Right join

SELECT u.name, o.name
FROM users u
  RIGHT JOIN organisations o
  ON users.organisation_id = organisations.id;

The right join—or right outer join—is the inverse of the left join, and makes it obvious where the name comes from. It returns all rows from the table being joined to—i.e, the right table—and corresponding rows from the left, filling them with NULL values where they don’t exist.

Rails doesn’t provide specific methods for any other join types, but we can pass joins SQL snippets:

User.joins("RIGHT JOIN organisations ON users.organisation_id = organisations.id")

Full join

SELECT u.name, o.name
FROM users u
  FULL JOIN organisations o
  ON users.organisation_id = organisations.id;

A full join, as you might guess, goes both ways, effectively returning the combined results of a left and right join. It will return rows from both tables and add NULLs on either table where there is missing data.

User.joins("FULL JOIN organisations ON users.organisation_id = organisations.id")

Natural join and Join using

These two are basically just syntactic sugar sprinkled on the above join types, and can be used in conjunction with any of them. If you don’t specify a join type, they will default to an inner join.

These methods simply give a more succinct syntax when column names are shared across tables. If both users and organisations used an organisation_id column (not the standard Rails way to do things) you could use users NATURAL JOIN organisations or users JOIN organisations USING (organisation_id). As you can see, the difference between the two is the latter specifies the column name. A natural join will use all columns with the same name, which can lead to confusion when columns are renamed and generic id columns are used. I caution against relying on it.

Self join

SELECT u.name, m.name AS manager
FROM users u
  INNER JOIN users m
  ON u.manager_id = m.id;

Similar to the “anti join”, this is a technique rather than a specific join type but warrants highlighting. A self join simply requires referencing the same table twice—once with an alias—and then joining on that alias. It’s mostly useful where a table references itself, such as employees and their managers in a single table.

Cross join

SELECT u.name, o.name
FROM users u
  CROSS JOIN organisations o;

This is where things start getting more unusual. The cross join doesn’t take a join condition, but returns every combination across the two tables, or to give it a mathematical name, the Cartesian product of the two tables.

User.joins("CROSS JOIN organisations")