Aws::Rails::SqsActiveJob#execute captures StandardError which is too wide #120

jeropaul · 2024-05-14T05:03:29Z

Steps to reproduce

Set up AWS RDS to manage master passwords and rotate
Configure your database details via environment properties
Wait for your DB password to change

Once step 3 occurs my ActiveRecord jobs start throwing ActiveRecord::NoDatabaseError which is a subclass of StandardError

The current behaviour of Aws::Rails::SqsActiveJob#execute captures StandardError.

Expected behavior

I would expect an ActiveRecord::NoDatabaseError NOT to be caught.
I would expect the poller to die.

In my case the poller process finishing would kill the container which would be rescheduled with refreshed environment properties.

Actual behavior

The poller continues consuming from the queue, retrying messages at a great rate of speed.

System configuration

Rails version: 7.1

Ruby version: 3.3

The text was updated successfully, but these errors were encountered:

jterapin · 2024-05-14T16:15:21Z

Thank you for the submitting this issue! We will investigate and get back to you.

alextwoods · 2024-05-14T16:30:47Z

This is a bit tricky - the logic for retrying StandardErrors is that arbitrary user code may throw transient StandardErrors which generally should be retried.

You can disable this behavior by configuring standard_error_retry to false (this logic was added recently in #115). However, this discards jobs rather than causing the entire poller to fail and stop running - in general, the poller was meant to be robust to errors in jobs. However, this seems like a special case where we want to detect such errors and actually kill the poller. I think we would need to come up with a list of these exceptions. ActiveRecordError - the superclass of NoDatabaseError - is likely too broad.

jeropaul · 2024-05-14T23:08:31Z

I don't expect that it will be possible to come up with a full and complete list of error classes that should be ignored.

Personally I'd be happy if I could configure a list of error classes that should be re-raised by the executor with some really tight default that I can expand over time.

In terms of prior art and it is a slightly different use case but off the top of my head NewRelic allows error classes to be ignored via configuration

Which for one of my apps looks like:

  error_collector:
    ignore_errors: "ActionController::RoutingError,Sinatra::NotFound,ActionDispatch::Http::MimeNegotiation::InvalidType"

TLDR:

continue to catch StandardError
re-raise if the class is included in a configured list

alextwoods · 2024-05-15T19:43:52Z

I've been doing some more investigation and thinking on this. I agree we can't really come up with a complete or useful list of error classes that should be considered fatal. (Side note: fatal may be a better description than ignored, since its some what the opposite. Those errors should cause the poller to raise the error and cause the worker to exit rather than being ignored).

I think rather that configuring a list, we should follow the user configured error handlers. That error handler could choose to re-raise specific exceptions, causing the poller to fail and exit.

I've marked this as a feature request and we'll create an internal story to track it. Or alternatively, we're happy to accept PRs!

jeropaul · 2024-05-15T23:34:00Z

we should follow the user configured error handlers.

Can you elaborate further in regards to what this solution looks like?

jeropaul · 2024-05-16T06:37:17Z

Are you referring to https://guides.rubyonrails.org/active_job_basics.html#exceptions where each job is expected to handle errors and anything not handled by a job causes the poller to fail.

alextwoods · 2024-05-16T15:37:40Z

No - when a job has un-handled errors, the job should fail but should not cause the entire poller to fail.

What I was thinking is that the executor's error handler could be configured by a user, similar to what is supported in Shoryuken's Exception handler.

What I'm thinking:

# interface is a callable with 3 args: exception, the sqs message (so that delete could be called) and the job
my_handler = proc do |exception, message, job|
  if should_terminate_poller?(exception) # some custom logic that inspects the exception
    raise exception # re-raise the exception, causing the poller to terminate
  end
  # some other custom handling
end
Aws::Rails::SqsActiveJob.configure do |config|
  config.exception_handlers = [my_handler, DefaultErrorHandler]
end

jeropaul · 2024-06-07T02:54:17Z

This is looking to be a bigger change than I was expecting!

The current behaviour of Concurrent::RubyThreadPoolExecutor which is the concrete implementation of Concurrent::ThreadPoolExecutor used by Aws::Rails::SqsActiveJob::Executor does not propagate exceptions. Even when Thread.abort_on_exception=true is configured. There are some details and work arounds discussed in ruby-concurrency/concurrent-ruby#634 which has been open since 2017.

But the short of it is the current implementation provides no way for an exception raised when processing a message to terminate the polling process!

alextwoods added the feature-request A feature should be added or improved. label May 15, 2024

RanVaknin added the p3 This is a minor priority issue label Jul 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aws::Rails::SqsActiveJob#execute captures StandardError which is too wide #120

Aws::Rails::SqsActiveJob#execute captures StandardError which is too wide #120

jeropaul commented May 14, 2024

jterapin commented May 14, 2024

alextwoods commented May 14, 2024

jeropaul commented May 14, 2024

alextwoods commented May 15, 2024

jeropaul commented May 15, 2024

jeropaul commented May 16, 2024

alextwoods commented May 16, 2024

jeropaul commented Jun 7, 2024

Aws::Rails::SqsActiveJob#execute captures StandardError which is too wide #120

Aws::Rails::SqsActiveJob#execute captures StandardError which is too wide #120

Comments

jeropaul commented May 14, 2024

Steps to reproduce

Expected behavior

Actual behavior

System configuration

jterapin commented May 14, 2024

alextwoods commented May 14, 2024

jeropaul commented May 14, 2024

alextwoods commented May 15, 2024

jeropaul commented May 15, 2024

jeropaul commented May 16, 2024

alextwoods commented May 16, 2024

jeropaul commented Jun 7, 2024