rails solid-queue ruby debugging

Solid Queue recurring jobs silently break with Ruby keyword arguments

Ahmed Nadar · 4 min read

A background job failed 55 times in two days and I didn’t know until I manually checked.

Let me tell you how a single colon cost me two days of stale dashboard data, and what I learned about a gap in Solid Queue that nobody really warns you about.

The setup

I have a job that refreshes materialized views for our analytics dashboard. It runs every hour. It had been running fine for weeks. Then on March 27th it stopped working.

Every hour, Solid Queue picked it up, tried to run it, got an ArgumentError, and quietly dumped it into solid_queue_failed_executions. No alert. No notification. Just a counter going up while my dashboard served numbers from two days ago.

Here’s the job:

class RefreshAnalyticsViewsJob < ApplicationJob
  queue_as :default

  def perform(view: "all")
    case view
    when "daily"
      refresh_view("daily_report_stats")
    when "monthly"
      refresh_view("monthly_kpis")
    end
  end
end

And here’s how it’s scheduled in config/recurring.yml:

refresh_daily_report_stats:
  class: RefreshAnalyticsViewsJob
  args: { view: "daily" }
  schedule: every hour

Looks perfectly fine, right? args: { view: "daily" } maps to perform(view: "daily"). That’s what I thought too.

Here’s where it gets weird

Solid Queue doesn’t pass those args the way you’d expect.

When Solid Queue deserializes from YAML, that args: { view: "daily" } becomes perform({"view" => "daily"}). A positional hash with string keys. But the job signature expects perform(view: "all"), a keyword argument with a symbol key.

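If you’ve been writing Ruby for a while, this one might sting. We all got used to Ruby being forgiving about hash-to-kwargs conversion. Ruby 2.x would quietly turn a trailing hash with symbol keys into keyword arguments. Ruby 3.0+ keeps positional and keyword arguments separate, so a hash passed positionally stays positional. You get: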

ArgumentError: wrong number of arguments (given 1, expected 0)

The job expects zero positional arguments. It receives one. Ruby 3.3.3 raises. Solid Queue catches the error, records the failure, and moves on to the next job. Silently.
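The mismatch is easy to reproduce in plain Ruby, with no Rails or Solid Queue involved. A minimal sketch, assuming Ruby 3.0 or later: KeywordPerform is a hypothetical stand-in for the job, and the positional call imitates the argument shape described above rather than Solid Queue's actual dispatch code.

```ruby
# Hypothetical stand-in for the job: same signature, no Rails required.
class KeywordPerform
  def perform(view: "all")
    view
  end
end

job = KeywordPerform.new

# Called from Ruby code with a real keyword argument: works.
from_ruby = job.perform(view: "daily")

# Called with a positional hash (the shape described above): Ruby 3.0+
# no longer converts it to keywords, so the call raises ArgumentError.
positional_error =
  begin
    job.perform({ "view" => "daily" })
    nil
  rescue ArgumentError => e
    e.message
  end

puts from_ruby
puts positional_error
```

The first call is the path a unit test takes; the second is the path that ends up in solid_queue_failed_executions.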

The fix

One character:

# Before
def perform(view: "all")

# After
def perform(view = "all")

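Keyword argument becomes a positional argument. Now when Solid Queue passes {"view" => "daily"}, Ruby accepts it as view instead of raising. One caveat: view then holds the whole hash, not the string "daily", so the case statement only matches "daily" if the schedule passes the bare value (args: "daily") or the job pulls the value out of the hash first.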
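It is worth checking what view actually holds after this change. If the positional parameter receives the whole hash, a case statement on string values will not match it. The sketch below is one defensive option, not the original job: PositionalPerform is a hypothetical stand-in that returns the view name instead of refreshing anything, and the hash unwrapping is an assumption added for illustration. It also shows how the two YAML shapes parse.

```ruby
require "yaml"

# Hypothetical stand-in for the job after the fix. It returns the name of
# the view it would refresh so the behaviour is easy to inspect.
class PositionalPerform
  def perform(view = "all")
    # Assumption: accept a bare string or a one-key hash such as
    # {"view" => "daily"}, so either args shape selects the same branch.
    view = view["view"] || view[:view] if view.is_a?(Hash)

    case view
    when "daily"   then "daily_report_stats"
    when "monthly" then "monthly_kpis"
    end
  end
end

# The two shapes a recurring.yml entry could use for args.
hash_args   = YAML.safe_load('args: { view: "daily" }')["args"]
string_args = YAML.safe_load('args: "daily"')["args"]

# Without unwrapping, a hash never matches a string branch.
matches_without_unwrapping = ("daily" === hash_args)

job = PositionalPerform.new
from_hash   = job.perform(hash_args)
from_string = job.perform(string_args)

puts hash_args.inspect
puts matches_without_unwrapping
puts from_hash
puts from_string
```

Passing the bare value in args is the simpler route; the unwrapping only matters if existing schedules already pass a hash.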

That’s it. One colon removed. 55 failures resolved.

And here’s the part that bothered me most

My tests were green. All of them.

Unit tests call the job directly:

RefreshAnalyticsViewsJob.perform_now(view: "daily")

This passes a keyword argument from Ruby code. Ruby is happy. The test passes. But that’s not how Solid Queue calls the job. Solid Queue deserializes from the database and passes positionally. The test never exercises that path.

There’s no straightforward way to write a Minitest test that says “schedule this job via recurring.yml and verify it actually runs.” So the test suite was green while production failed every hour for two days. That’s a humbling gap.
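A partial check is still possible without running the scheduler: read the recurring config and flag any job whose perform declares keyword parameters. This is a rough sketch under stated assumptions. The helper keyword_perform_tasks, the two stand-in job classes, and the second schedule entry are all hypothetical, and the YAML uses the flat shape shown earlier (no per-environment top-level keys). It inspects method signatures only; it does not prove the job runs.

```ruby
require "yaml"

# Returns the names of recurring tasks whose job class declares keyword
# parameters on #perform. `jobs` maps class names to classes so the sketch
# runs without Rails; in an app, Object.const_get could do the lookup.
def keyword_perform_tasks(recurring_yaml, jobs)
  tasks = YAML.safe_load(recurring_yaml)

  flagged = tasks.select do |_name, task|
    job_class = jobs.fetch(task["class"])
    job_class.instance_method(:perform).parameters.any? do |kind, _param|
      %i[key keyreq keyrest].include?(kind)
    end
  end

  flagged.keys
end

# Hypothetical stand-ins for real job classes.
class KeywordJob
  def perform(view: "all"); end
end

class PositionalJob
  def perform(view = "all"); end
end

config = <<~YAML
  refresh_daily_report_stats:
    class: KeywordJob
    args: { view: "daily" }
    schedule: every hour
  refresh_monthly_kpis:
    class: PositionalJob
    args: "monthly"
    schedule: every day
YAML

jobs = { "KeywordJob" => KeywordJob, "PositionalJob" => PositionalJob }
flagged = keyword_perform_tasks(config, jobs)

puts flagged.inspect
```

Inside a Minitest test, asserting that the flagged list is empty would turn this class of mistake into a red build instead of a production surprise.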

The thing about Solid Queue that nobody warns you about

Solid Queue doesn’t notify you when jobs fail. It inserts a row into solid_queue_failed_executions and moves on. If you don’t check that table, you don’t know.

I have a RecoverStuckReportsJob that monitors my pipeline jobs: report analysis, enrichment, delivery. But it only watches specific job classes. RefreshAnalyticsViewsJob isn’t a pipeline job. It’s infrastructure. Nobody was watching the watchers.

And the effect of this particular failure, stale dashboard data, isn’t something anyone would immediately notice. The dashboard still loads. It still shows numbers. They’re just from two days ago. You’d have to know what today’s numbers should be to realize something is wrong.

That’s what makes this class of bug so frustrating. The system is designed to be resilient. Solid Queue doesn’t crash when a job fails. It records the failure and keeps processing other jobs. That’s the right design for a queue system. But resilience without observability is just silent failure.

Two changes came out of this

The first one is obvious. Use positional args for any job triggered by recurring.yml. Keyword arguments work fine when you call perform_later(view: "daily") from Ruby code. They break when Solid Queue deserializes from YAML. If a job is triggered both ways, use a positional arg. Save yourself the debugging.

The second one I should have done from day one. Monitor the failed executions table. SolidQueue::FailedExecution.count should be zero, always. If it’s not, something is silently broken somewhere. A simple recurring check that alerts when this count is non-zero would have caught this in the first hour, not after 55 failures over two days.
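Such a check could look roughly like the sketch below. FailedExecutionCheck is a hypothetical class, not part of Solid Queue. The count source and the notifier are injected so the logic runs without Rails; in an app the count source would presumably be a lambda around SolidQueue::FailedExecution.count, and the notifier whatever alerting channel is already in place.

```ruby
# Sketch of a failed-execution check. Both collaborators are injected so
# the logic can run (and be tested) without Rails or a database.
class FailedExecutionCheck
  def initialize(count_source:, notifier:)
    @count_source = count_source
    @notifier = notifier
  end

  # Calls the notifier when the count is non-zero; always returns the count.
  def call
    count = @count_source.call
    @notifier.call("Solid Queue has #{count} failed execution(s)") if count.positive?
    count
  end
end

alerts = []

# In a Rails app the count source might be:
#   -> { SolidQueue::FailedExecution.count }
failing = FailedExecutionCheck.new(
  count_source: -> { 55 },
  notifier: ->(message) { alerts << message }
)
healthy = FailedExecutionCheck.new(
  count_source: -> { 0 },
  notifier: ->(message) { alerts << message }
)

failing_count = failing.call
healthy_count = healthy.call

puts alerts.inspect
```

Wrapped in a job with a perform that takes no arguments and scheduled through recurring.yml, a check like this avoids the argument problem entirely. The alert still needs to go through something other than the queue it is watching.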

Check yours

If you’re using Solid Queue with recurring.yml, take a minute and check two things:

  1. Do any of your recurring jobs use keyword arguments in perform?
  2. What’s your SolidQueue::FailedExecution.count right now?

You might be surprised. I was.