The most important thing to understand about includes and joins is that each has its own optimal use case. includes eager loads the associated records, whereas joins leaves them to be lazy loaded on demand. Both are powerful, but either can easily be misused and hurt performance.
If we first take a look at the Ruby on Rails documentation, the most important point made in the description of the includes method is:
With includes, Active Record ensures that all of the specified associations are loaded using the minimum possible number of queries.
In other words, when querying a table for data with an associated table, the matching records from both tables are loaded into memory, which in turn reduces the number of database queries required to retrieve any associated data. In the example below we are retrieving all companies which have an associated active Person record:
@companies = Company.includes(:persons).where(:persons => { active: true }).all
@companies.each do |company|
  # persons is a collection, so read each person's name in turn
  company.persons.each { |person| person.name }
end
When iterating through each of the companies and displaying each person's name, we would normally have to retrieve the names with a separate database query for every company. However, when using the includes method, the associated Person records have already been eagerly loaded, so this block only requires a single query. Awesome, right?!
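To make the query counts concrete, here is a minimal sketch in plain Ruby rather than Active Record. FakeDB, its data and its method names are hypothetical stand-ins, and every call to its query helper counts as one database round trip. It models the "one query per table" flavour of eager loading. When the conditions reference the associated table, as in the example above, Active Record can instead fold everything into a single LEFT OUTER JOIN query.

```ruby
# Pure-Ruby stand-in for a database: each call to `query` counts as one round trip.
# FakeDB and its data are hypothetical and only illustrate the query counts.
class FakeDB
  attr_reader :query_count

  COMPANIES = [
    { id: 1, name: "Acme" },
    { id: 2, name: "Globex" },
    { id: 3, name: "Initech" }
  ].freeze

  PERSONS = [
    { id: 1, company_id: 1, name: "Ann", active: true },
    { id: 2, company_id: 2, name: "Bob", active: true },
    { id: 3, company_id: 3, name: "Cy",  active: true }
  ].freeze

  def initialize
    @query_count = 0
  end

  def companies
    query { COMPANIES }
  end

  # One query per call, like touching an association that was not preloaded.
  def persons_for(company_id)
    query { PERSONS.select { |p| p[:company_id] == company_id } }
  end

  # One query for many companies, like a preload using WHERE company_id IN (...).
  def persons_for_all(company_ids)
    query { PERSONS.select { |p| company_ids.include?(p[:company_id]) } }
  end

  private

  def query
    @query_count += 1
    yield
  end
end

# Without eager loading: 1 query for companies + 1 per company (the N+1 pattern).
lazy_db = FakeDB.new
lazy_names = lazy_db.companies.flat_map do |company|
  lazy_db.persons_for(company[:id]).map { |person| person[:name] }
end

# With eager loading: 1 query for companies + 1 for all of their persons.
eager_db = FakeDB.new
companies = eager_db.companies
persons_by_company = eager_db.persons_for_all(companies.map { |c| c[:id] })
                             .group_by { |p| p[:company_id] }
eager_names = companies.flat_map do |company|
  persons_by_company.fetch(company[:id], []).map { |person| person[:name] }
end

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

Both approaches collect the same names. Only the number of round trips differs, and the gap grows with every extra company in the list.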
So what happens if I want to retrieve all companies with an active associated Person record, but I don’t want to display any data from the Person table? It’s starting to seem a tad overkill loading the associated table…well that’s where the joins method starts to shine!
If we use the above example again, we can start to see how easily people can become confused between the includes and joins method, when very little has changed:
@companies = Company.joins(:persons).where(:persons => { active: true }).all
@companies.each do |company|
  company.name
end
Visually the only difference is replacing the includes method call with joins; however, under the hood there is a lot more going on. The joins method uses the associated Person table to filter the query (an INNER JOIN), but only loads the Company records into memory, as the Person data is not required. Therefore we are not loading redundant data into memory needlessly; although if we wanted to use the Person data later on from the same @companies variable, it would require further database queries, because those records are lazy loaded on demand.
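As a rough illustration of what the join does, the sketch below mimics inner-join semantics over two in-memory arrays. The row data and the helper name companies_joined_with_active_persons are hypothetical; this is plain Ruby, not Active Record. The associated table is used only as a filter, and the result carries company columns alone. It also shows a side effect worth knowing: an inner join yields one row per matching person, so a company with several active persons appears more than once unless the result is de-duplicated (Rails 4 and later offer distinct for this).

```ruby
# Hypothetical in-memory rows standing in for the companies and persons tables.
COMPANY_ROWS = [
  { id: 1, name: "Acme" },
  { id: 2, name: "Globex" },
  { id: 3, name: "Initech" }
].freeze

PERSON_ROWS = [
  { id: 1, company_id: 1, name: "Ann", active: true },
  { id: 2, company_id: 1, name: "Abe", active: true },
  { id: 3, company_id: 2, name: "Bob", active: false },
  { id: 4, company_id: 3, name: "Cy",  active: true }
].freeze

# Inner-join semantics: pair each company with every matching active person,
# then keep only the company row, much like SELECT companies.* with an INNER JOIN.
def companies_joined_with_active_persons(companies, persons)
  companies.flat_map do |company|
    matches = persons.select do |person|
      person[:company_id] == company[:id] && person[:active]
    end
    matches.map { company } # one result row per matching person
  end
end

joined = companies_joined_with_active_persons(COMPANY_ROWS, PERSON_ROWS)
joined_names = joined.map { |company| company[:name] }
unique_names = joined.uniq.map { |company| company[:name] }

puts joined_names.inspect
puts unique_names.inspect
```

Globex drops out because its only person is inactive, and none of the returned rows contain person data, which is why reading a person's name afterwards would need another query.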
I’m not convinced, I need some stats…stat!
Recently I fell victim to not using the awesome power behind the includes method in my Trado codebase. I noticed a severe performance leak when monitoring the database queries in my local instance server logs, which is a habit I would advise starting.
The following code in my index method was producing an abnormal number of database queries:
def index
  @shippings = Shipping.active.all
  respond_to do |format|
    format.html # index.html.erb
    format.json { render json: @shippings }
  end
end
For every row in the table, it was making two database queries to grab data from the associated zones and tiers tables. When scaled up, this starts to become a heavy load on resources, with an Active Record loading time of 265.2ms. So in the interest of preserving scalability and performance in my application, I modified the index method to take advantage of the includes method for the zones and tiers associations:
def index
  @shippings = Shipping.active.includes(:zones, :tiers).all
  respond_to do |format|
    format.html # index.html.erb
    format.json { render json: @shippings }
  end
end
The number of database queries has been reduced to just 5, which in turn drastically reduces the Active Record loading time from 265.2ms to just 2.8ms. That's roughly a 99% reduction!
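For anyone who wants to check the arithmetic, here is a small sketch. The timings and the figure of 5 queries come from the measurements above. The row count of 50 is purely an assumed example, and queries_without_includes is a hypothetical helper encoding "one query for the list plus two per row".

```ruby
# Timings reported above, in milliseconds.
before_ms = 265.2
after_ms  = 2.8

# Percentage reduction in Active Record loading time.
reduction = (before_ms - after_ms) / before_ms * 100
puts format("%.1f%% reduction", reduction) # prints "98.9% reduction"

# Without includes: one query for the shippings plus two per row (zones and tiers).
def queries_without_includes(rows)
  1 + 2 * rows
end

assumed_rows = 50 # assumption for illustration, not a figure from the logs
puts "without includes: #{queries_without_includes(assumed_rows)} queries"
puts "with includes:    5 queries, however many rows there are"
```

The point is that the first count grows linearly with the table, while the eager-loaded version stays flat.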
Source : http://tomdallimore.com/blog/includes-vs-joins-in-rails-when-and-where/