
Bj Makes Attachment_fu Happy


The attachment_fu plugin for Rails is great, and its support for S3 as a backend sounds really handy. Sadly though, it isn’t practical for anything other than a demo or a proof-of-concept. Luckily its limitations can be worked around with the cunning use of Bj.

When you upload a file to a Rails app, your browser waits while Mongrel buffers the file, then your browser waits some more while Rails processes the fully uploaded file. Normally this “processing” is simply copying the file from where Mongrel put it to where Rails wants it, no big whup. The problem arises when the “processing” is substantially more time consuming, like, say, transmitting the file to S3. There’s no guarantee that will finish in a reasonable amount of time, and meanwhile not only is the user’s browser creeping toward a timeout, but that whole Rails instance is blocked and no one else can use it either. Rails is single threaded, remember?

The solution is to write the file to disk like normal, then spawn a process in the background to take care of uploading it to S3. This is where Bj comes in. Install thusly:

./script/plugin install http://codeforpeople.rubyforge.org/svn/rails/plugins/bj

./script/bj setup

rake db:migrate

Bj is a lightweight work queue that uses your app’s database as its store. Requests go into the bj_job table and are run one at a time, outside of the mongrel_rails process. Say we have a model UploadFile that uses attachment_fu:

class UploadFile < ActiveRecord::Base
  has_attachment :storage => :file_system
  after_create :upload_to_s3

  BUCKET = 'your bucket'

  # Serve from S3 once the background upload has finished; until then,
  # fall back to attachment_fu's local copy.
  # NOTE: the S3 branch ignores thumbnail for now (see the exercises below).
  def s3_url(thumbnail = nil, use_https = false)
    if self.uploaded?
      "http#{'s' if use_https}://s3.amazonaws.com/#{BUCKET}/#{self.id}/#{self.filename}"
    else
      self.public_filename(thumbnail)
    end
  end

  protected

  # Queue a Bj job to push the file up to S3 outside the request cycle.
  def upload_to_s3
    Bj.submit("./script/runner ./jobs/s3_uploader.rb #{self.id}")
  end

end

Note: for this to work your upload_files table will need a boolean column called uploaded with default false, along with the standard attachment_fu columns.
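
If you’re adding that column fresh, a migration along these lines should do it (the migration name is just illustrative):

class AddUploadedToUploadFiles < ActiveRecord::Migration
  def self.up
    add_column :upload_files, :uploaded, :boolean, :default => false
  end

  def self.down
    remove_column :upload_files, :uploaded
  end
end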

Whenever a new UploadFile is created, a job is queued to send it off to S3. Meanwhile, we can still read the file off the local filesystem before its upload is complete.
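
This is what s3_url above is for: a view can call it without caring where the file lives at the moment. Something like the following (assuming an @upload_file set up by the controller) works either way:

<%# Serves the local public_filename until the background job flips uploaded %>
<%= image_tag @upload_file.s3_url %>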

To define the actual job we create a RAILS_ROOT/jobs directory and put this file in as s3_uploader.rb:

# Usage: ./script/runner ./jobs/s3_uploader.rb <id>

require 'upload_file'
require 'aws/s3'
include AWS::S3

ACCESS_KEY = 'your access key'
SECRET_KEY = 'your secret key'
BUCKET     = 'your bucket'
FILE_ID    = ARGV[0]

file = UploadFile.find(FILE_ID)

Base.establish_connection!(:access_key_id     => ACCESS_KEY,
                           :secret_access_key => SECRET_KEY)

# Stream the local copy up to S3 under the same key s3_url expects.
S3Object.store("/#{file.id}/#{file.filename}",
               open(file.full_filename),
               BUCKET,
               :access => :public_read)

# Flip the flag so s3_url starts pointing at S3.
file.update_attributes(:uploaded => true)

This will be run by Bj as its own process, leaving Rails to get on with life while the potentially slow upload to S3 drags on.

A couple of exercises for the reader: clean up the files on the local filesystem after they’ve been successfully uploaded to S3, save memory by ditching ./script/runner and accessing MySQL directly without ActiveRecord, support for thumbnails… A really slick one would be to retry the S3 upload by putting the job back in the queue if it fails; a rough sketch of that one follows.
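
Something like this in the job script, in place of the bare S3Object.store call, would do it (retry bookkeeping omitted; you’d want a cap on attempts before trusting this):

begin
  S3Object.store("/#{file.id}/#{file.filename}",
                 open(file.full_filename),
                 BUCKET,
                 :access => :public_read)
  file.update_attributes(:uploaded => true)
rescue => e
  # Put the job back in the queue and let Bj try again later.
  # (Unbounded retries; cap the attempts before using this for real.)
  Bj.submit("./script/runner ./jobs/s3_uploader.rb #{file.id}")
end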

In the past we solved this problem with ActiveMessaging, but it’s fidgety to get working right, it’s a memory hog, and deployment is a pain (the pollers don’t clean up well – we crashed a server once with 30 zombie pollers). There are still cases where you might want to use ActiveMessaging – one obvious one is if you need to talk to Java via ActiveMQ – but Bj wins for simplicity and ease of deployment.

Many MANY thanks to Ara Howard who wrote Bj and personally helped me get everything working properly.