01 May 2021
There are a few options available to you for performing counts of records with ActiveRecord. You would be forgiven for believing that length
, size
and count
are just aliases for one another. In some cases this is true, but there is a clear reason to use one over another in certain situations, this post highlights when to use each option to obtain the best performance from your application.
We'll be referencing a small rails app called "cars", the source code for which can be found here for you to follow along. In this example we are aiming to count the number of cars that are currently parked in each garage in our database. There is a one to many relationship between Garage & Car.
We want to get a count of all of the cars in each of our garages and in order to do this we can grab all of our Garage
records and iterate through them to fetch every Car
that is parked there. In order to prevent N+1 queries we are using #includes:
Garage.includes(:cars).map { |garage| garage.cars.count }
#=> [3, 2, 1, 1, 0]
This gives us exactly what we need, however if we look at our SQL logs we can see that every call to garage.cars.count
is making a request to the database to perform that count:
Garage Load (0.4ms) SELECT "garages".* FROM "garages"
Car Load (0.3ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" IN ($1, $2, $3, $4, $5) [[nil, 1], [nil, 2], [nil, 3], [nil, 4], [nil, 5]]
(0.5ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 1]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 2]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 3]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 4]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 5]]
The first line indicates fetching all of our Garage
records from the database. The second line is ActiveRecord
eager loading our Car
records from the database in the hope of preventing N+1 queries. However we can then see five separate requests to the database to fetch the Car
count for each Garage
. What the heck, using #includes
should mean this doesn't happen right?
Well it turns out that #count
will always make a request to the database to fetch the count of the records we want, regardless of whether we have eager loaded those records or not. This means that we might as well have done the following to save us a request to also fetch all of the cars from the database:
Garage.all.map { |garage| garage.cars.count }
Which gives us the same result but with one less database hit:
Garage Load (0.4ms) SELECT "garages".* FROM "garages"
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 1]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 2]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 3]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 4]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 5]]
So how can we make the most of eager loading to count record associations? Let's have a look at #length
.
Let's perform the same operation, substituting count for length:
Garage.includes(:cars).map { |garage| garage.cars.length }
#=> [3, 2, 1, 1, 0]
Much the same on the face of it but if we check the SQL logs we can see that there were no extra requests to the database in order to count the number of cars:
Garage Load (0.3ms) SELECT "garages".* FROM "garages"
Car Load (0.3ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" IN ($1, $2, $3, $4, $5) [[nil, 1], [nil, 2], [nil, 3], [nil, 4], [nil, 5]]
The reason for this is because ActiveRecord
is now honouring our eager loading of Car
records. As it's already performed a query to load all Car
records associated with each Garage
it can calculate the number of cars per garage in memory, without needing to hit the database again. It should be noted though, that if you use length when records are not eager loaded it will actually load all of the records you are counting from the database into memory and then count them in Ruby:
Garage.all.map { |g| g.cars.length }
Produces SQL logs:
Garage Load (0.3ms) SELECT "garages".* FROM "garages"
Car Load (0.2ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 1]]
Car Load (0.1ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 2]]
Car Load (0.1ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 3]]
Car Load (0.1ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 4]]
Car Load (0.1ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 5]]
Rails has one more trick up its sleeve when it comes to counting records in this way and it's size. On the face of it size performs the same as length:
Garage.includes(:cars).map { |garage| garage.cars.size }
#=> [3, 2, 1, 1, 0]
Producing the queries:
Garage Load (0.3ms) SELECT "garages".* FROM "garages"
Car Load (0.3ms) SELECT "cars".* FROM "cars" WHERE "cars"."garage_id" IN ($1, $2, $3, $4, $5) [[nil, 1], [nil, 2], [nil, 3], [nil, 4], [nil, 5]]
Size has a benefit over length in that it will always perform an SQL COUNT
statement if the records are not eagerly loaded, rather than loading them all into memory:
Garage.all.map { |garage| garage.cars.size }
Producing the queries:
Garage Load (0.3ms) SELECT "garages".* FROM "garages"
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 1]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 2]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 3]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 4]]
(0.2ms) SELECT COUNT(*) FROM "cars" WHERE "cars"."garage_id" = $1 [["garage_id", 5]]
Another perk of this method is that it will also observe a counter_cache if one is present for the association. A big plus!
So what does this all mean for performance? Let's compare the options (number in brackets it total execution time):
# count
Benchmark.measure { Garage.includes(:cars).map { |g| g.cars.count } }
#=> 0.004082 0.000419 0.004501 ( 0.006126)
# length
Benchmark.measure { Garage.includes(:cars).map { |g| g.cars.length } }
#=> 0.001571 0.000140 0.001711 ( 0.002080)
# size
Benchmark.measure { Garage.includes(:cars).map { |g| g.cars.size } }
#=> 0.001596 0.000114 0.001710 ( 0.002063)
As you can see count
is by far the slowest as it completely ignores the eager loading, while length
and size
are pretty much identical because they are both using length
under the hood.
So which option should you use? In this scenario you should use either length
or size
because we have eager loaded the cars associations. However, in general it's recommended to use size
as it will automatically delegate to either count
or length
depending on whether the association is loaded or not. It will also observe any counter caches that you have on your models.