Harry's Engineering

29.07.14

Hacking your Bundler groups for fun and profit

By: Daniel Schwartz

TL;DR: We reduced our global memory footprint by ~40% by introducing a new style of Bundler groupings.

By default, Bundler groups in Rails are set up by environment. While this might not be a problem for single-server applications, it becomes one for multi-server applications, which covers most complex applications these days. Many PaaS providers (Heroku included) will even force you to break your application up into logical processes, which is the right thing to do. Heroku declares the different process types in a file called a Procfile. Consider the Procfile below:

web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
scheduler: bundle exec rake resque:scheduler
resque: bundle exec resque-pool

As you can see above, we have 3 different types of processes:

  1. web: Answers web requests
  2. scheduler: Runs the Resque Scheduler process for cron and interval type jobs
  3. resque: Runs a master process which will spawn child Resque workers

Now consider the Gemfile below:

source 'https://rubygems.org'
ruby '2.0.0'

# These get included everywhere!
gem 'rails', '3.2.19'
gem 'unicorn'
gem 'rack-timeout'
gem 'activeadmin'
gem 'resque'
gem 'resque-pool'
gem 'resque-scheduler'


# These are dev only deps
group :development do
    gem 'pry'
    gem 'rb-fsevent'
end

# These are test only deps
group :test do
    gem 'capybara-firebug'
    gem 'selenium-webdriver'
end

# These are things we only use on production
group :production do
    gem 'rack-attack'
end

As you can see above, in production we are including Unicorn, Rack::Timeout, Rack::Attack, and ActiveAdmin everywhere, even in our Scheduler and Resque processes. Again, for simple applications this might be OK, but we’re incurring a lot of memory overhead by requiring these gems in places where we will never use them. In real-world terms, this meant we were getting memory usage alerts and errors when starting up just 4 Resque workers on a single 1X dyno. Bringing up our environment with all our gems took 140MB per process, before we had processed a single action or request! ActiveAdmin alone added 20.87MB of overhead. That’s a pretty crazy amount, especially since we never use it on the Scheduler and Resque process types.

With our work cut out for us, we got to thinking and, of course, turned to Stack Overflow. Using a script we found there, we built a list of our gems and the memory each one consumed when the environment initialized. To fix our memory consumption issues, we subdivided our gems by both process type and environment, and ended up with the solution below. The Gemfile above can now be written like this:

source 'https://rubygems.org'
ruby '2.0.0'

# These get included everywhere!
group :default do
    gem 'rails', '3.2.19'
    # This still needs to be included everywhere so we can queue jobs
    gem 'resque'
end

# Web only group, regardless of environment
group :web do
    gem 'unicorn'
    gem 'rack-timeout'
    gem 'activeadmin'
end

# Resque only group, regardless of environment
group :resque do
    gem 'resque-pool'
end

# Scheduler only group, regardless of environment
group :scheduler do
    gem 'resque-scheduler'
end

# Development only, regardless of PROCESS_TYPE
group :development do
    gem 'pry'
    gem 'rb-fsevent'
end

# Test only, regardless of PROCESS_TYPE
group :test do
    gem 'capybara-firebug'
    gem 'selenium-webdriver'
end

# A group that will only load on the web PROCESS_TYPE in production
group :web_production do
    gem 'rack-attack'
end

Sure, the above is more verbose, but you’re saving a ton of memory! To get this to work natively within Rails we added the following to our application.rb file:

# Make sure we have a process type variable
if ENV.has_key?("PROCESS_TYPE")
    # If we do, assume it's a comma-separated list
    ENV["PROCESS_TYPE"].split(",").each do |type|
        # Require the group for the current process type, plus the group
        # named after the process type and environment joined by an underscore
        Bundler.require(type, "#{type}_#{Rails.env}")
    end
end

And changed our Procfile to:

web: env PROCESS_TYPE=web bundle exec unicorn -p $PORT -c ./config/unicorn.rb
scheduler: env PROCESS_TYPE=scheduler bundle exec rake resque:scheduler
resque: env PROCESS_TYPE=resque bundle exec resque-pool

The PROCESS_TYPE environment variable is how we tell Bundler and our application which gems to require. In your development environment, if you don’t care about memory usage, you can always set PROCESS_TYPE=web,resque,scheduler, and our code will simply require every gem that our Gemfile makes available in development.
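To make the mapping from PROCESS_TYPE to Bundler groups concrete, here is a minimal sketch of the same naming rule as a plain Ruby method. The helper name bundler_groups_for is hypothetical and not part of our application or of Bundler; it only computes the group names that the application.rb snippet would pass to Bundler.require, and it additionally trims whitespace and skips empty entries, which the original snippet does not do.

```ruby
# Hypothetical helper (an illustration, not code from our app): turns a
# PROCESS_TYPE value and an environment name into the list of Bundler
# group names that would be required for that process.
def bundler_groups_for(process_type_value, rails_env)
  # No PROCESS_TYPE set means no extra groups beyond the defaults.
  return [] if process_type_value.nil?

  process_type_value
    .split(",")            # PROCESS_TYPE is a comma-separated list
    .map(&:strip)          # tolerate "web, resque" (assumption, see lead-in)
    .reject(&:empty?)      # ignore stray commas
    .flat_map { |type| [type.to_sym, :"#{type}_#{rails_env}"] }
end

p bundler_groups_for("resque", "production")
# => [:resque, :resque_production]
```

Note that the result includes names such as :resque_production that the Gemfile above never defines. The original snippet passes those names to Bundler.require as well, so the sketch keeps that behavior rather than filtering them out.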

Using the above method, we took our Resque memory usage from 140MB to 100MB per process, and our Scheduler process from 140MB to 95MB. That is a saving of roughly 29% and 32% respectively!
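If you want to build a similar per-gem memory list for your own application, the idea can be sketched roughly as follows. This is not the Stack Overflow script we used; it is a simplified illustration that assumes a Unix-like system where `ps -o rss=` is available, and the method names (current_rss_kb, measure_require_cost, report) are made up for this example. It records how much the process's resident memory grows each time a library is required.

```ruby
# Resident set size of the current process in kilobytes, read via `ps`.
# Assumes a Unix-like system; this is not portable to Windows.
def current_rss_kb
  `ps -o rss= -p #{Process.pid}`.strip.to_i
end

# Requires each library in turn and records how much memory grew.
# `sampler` is any callable returning current memory usage in KB;
# it defaults to current_rss_kb and can be swapped out for testing.
def measure_require_cost(names, sampler = method(:current_rss_kb))
  names.map do |name|
    before = sampler.call
    begin
      require name
      { name: name, kb: sampler.call - before }
    rescue LoadError => e
      # Library could not be loaded; keep it in the list without a cost.
      { name: name, kb: nil, error: e.message }
    end
  end
end

# Formats the measured libraries, largest memory growth first.
def report(results)
  results
    .select { |r| r[:kb] }
    .sort_by { |r| -r[:kb] }
    .map { |r| format("%-20s %8.2f MB", r[:name], r[:kb] / 1024.0) }
end
```

Calling `puts report(measure_require_cost(%w[json set]))` in a fresh Ruby process prints one line per library. Treat the numbers as rough: a dependency shared by several gems is charged to whichever gem requires it first, and a library that is already loaded will show a cost close to zero.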