I have been seeing a failure on and off when running concurrent-ruby's specs against JRuby head. I believe the failing cyclic barrier spec may itself have a bug:
1) Concurrent::CyclicBarrier#number_waiting with waiting threads should be equal to the waiting threads count
Failure/Error: expect(thread_join).not_to be_nil, thread.inspect
#<Thread:0x15605d83@/home/travis/build/jruby/jruby/concurrent-ruby/spec/support/example_group_extensions.rb:35 aborting>
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-support-3.9.2/lib/rspec/support.rb:97:in `block in Support'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-support-3.9.2/lib/rspec/support.rb:106:in `notify_failure'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.9.1/lib/rspec/expectations/fail_with.rb:35:in `fail_with'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.9.1/lib/rspec/expectations/handler.rb:40:in `handle_failure'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.9.1/lib/rspec/expectations/handler.rb:72:in `block in handle_matcher'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.9.1/lib/rspec/expectations/handler.rb:27:in `with_matcher'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.9.1/lib/rspec/expectations/handler.rb:70:in `handle_matcher'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.9.1/lib/rspec/expectations/expectation_target.rb:78:in `not_to'
# ./spec/spec_helper.rb:62:in `block in /home/travis/build/jruby/jruby/concurrent-ruby/spec/spec_helper.rb'
# org/jruby/RubyBasicObject.java:2694:in `instance_exec'
# /home/travis/build/jruby/jruby/lib/ruby/gems/shared/gems/rspec-core-3.9.1/lib/rspec/core/example.rb:450:in `instance_exec'
The failure above is confusing because it reports one spec description but fails on a line belonging to another spec. Specifically, the failure error says it is waiting for a thread to join (which is the join_with logic used around line 200 in cyclic_barrier_spec.rb), but the spec description reports that a "waiting threads count" spec has failed.
Digging into both specs, I think the in_thread logic in the spec helpers may be flawed. Here's the waiting threads spec:
    context 'with waiting threads' do
      it 'should be equal to the waiting threads count' do
        in_thread { barrier.wait }
        in_thread { barrier.wait }
        repeat_until_success { expect(barrier.number_waiting).to eq 2 }
      end
    end
It starts up two threads to wait on the barrier and then checks that they eventually show up as waiting. I believe this spec passes fine, but those threads are little time bombs due to this code in in_thread:
This logic creates a new thread and adds it to a queue to be shut down later. But strangely, this code also sets the global abort_on_exception flag to true.
This, combined with the shutdown code, is likely leading to unexpected results:
My theory is that the failure above is due to one of these abandoned threads being lazily killed. The call to thread.kill causes the thread's barrier wait to be interrupted, raising an error and eventually terminating the thread. Because of abort_on_exception, whatever the main thread is running at that point (like a thread.join in join_with) will be interrupted, and that might happen while running other specs, as is the case here.
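The effect of the flag can be seen in isolation with a minimal sketch. This is not the concurrent-ruby helper code; the method name `main_thread_interrupted_by_worker` and the timings are made up for illustration, and the worker raises directly rather than being killed. It only shows that, with `Thread.abort_on_exception` set, an error escaping a background thread is re-raised in the main thread, whatever the main thread happens to be doing at that moment:

```ruby
# Hypothetical demo: an exception escaping a background thread is
# re-raised in the main thread when Thread.abort_on_exception is true.
def main_thread_interrupted_by_worker
  previous = Thread.abort_on_exception
  Thread.abort_on_exception = true # global flag, as described in the issue

  Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet
    sleep 0.1
    raise "boom from worker"
  end

  begin
    sleep 2 # stands in for unrelated work, e.g. a join in another spec
    nil     # reached only if nothing interrupted the main thread
  rescue RuntimeError => e
    e.message # the worker's error surfaced here, in the main thread
  end
ensure
  Thread.abort_on_exception = previous # restore the global setting
end
```

In a spec run, the "unrelated work" would be whichever example is executing when the abandoned thread dies, which would match the mismatched description and backtrace in the failure.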
I don't think we should be using abort_on_exception at all. If these threads are expected to run to completion, we should be testing for that. We should not allow a spec in one thread to abort the entire spec run because it happened to bubble out an error... especially when these threads are transient and being forcibly killed.
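As a sketch of what testing for completion could look like without the global flag, a helper can join the thread with a timeout and then read its value. The name `assert_thread_completes` and the default timeout are assumptions for illustration and are not part of the existing spec helpers. It relies only on `Thread#join(timeout)` returning nil on timeout and on `join`/`value` re-raising an exception that terminated the thread:

```ruby
# Hypothetical helper: fail in the calling spec, and only there, if the
# thread does not finish in time or finishes with an error.
def assert_thread_completes(thread, timeout = 1)
  # join returns nil on timeout and re-raises the thread's exception, if any
  joined = thread.join(timeout)
  raise "thread did not finish within #{timeout}s" if joined.nil?
  thread.value # the block's return value
end
```

With this shape, an error in a transient thread is reported by the spec that owns the thread rather than by whatever the main thread is running later.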
The trivial patch here would be to remove the abort_on_exception call:
Alternatively, we could keep the abort logic if we gracefully shut down all of these transient threads, rather than forcibly killing them.
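One possible shape for a graceful shutdown is sketched below, under stated assumptions: the class `TransientThreads` and its `release:` callback are hypothetical and do not exist in concurrent-ruby's spec support. Each thread is registered together with a callable that unblocks it, so teardown can release and join threads first and fall back to `kill` only for threads that still have not finished:

```ruby
# Hypothetical registry for threads started by a spec.
class TransientThreads
  def initialize
    @entries = []
  end

  # release: a callable that unblocks the thread so it can exit normally
  def spawn(release:, &block)
    thread = Thread.new(&block)
    thread.report_on_exception = false
    @entries << [thread, release]
    thread
  end

  # Returns one boolean per thread: true if it finished without being killed.
  def shutdown(timeout = 1)
    @entries.each { |_, release| release.call }
    @entries.map do |thread, _|
      finished =
        begin
          thread.join(timeout) # nil if still running after the timeout
        rescue StandardError
          thread # the thread ended with an error, but it did end
        end
      thread.kill unless finished # last resort only
      !finished.nil?
    end
  ensure
    @entries.clear
  end
end

# Usage with a stdlib Queue standing in for the blocking call:
gate = Queue.new
threads = TransientThreads.new
threads.spawn(release: -> { gate.close }) { gate.pop }
results = threads.shutdown
```

For the barrier specs, the release callable would need some way to wake the waiting threads; which barrier method is appropriate for that is left open here.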
I will be disabling the concurrent-ruby suite for JRuby's CI until we can resolve this.