Bug 1100158 - perl-Server-Starter-0.17 FTBFS sometimes because of races in tests
Summary: perl-Server-Starter-0.17 FTBFS sometimes because of races in tests
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: perl-Server-Starter
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Petr Pisar
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-05-22 07:03 UTC by Petr Pisar
Modified: 2014-09-09 07:40 UTC (History)
2 users (show)

Fixed In Version: perl-Server-Starter-0.17-8.fc22
Clone Of:
Environment:
Last Closed: 2014-09-09 07:40:55 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
CPAN 73711 0 None None None Never

Description Petr Pisar 2014-05-22 07:03:08 UTC
perl-Server-Starter-0.17 does not pass tests sometimes:

[...]
#           matches '(?^m:^FOO=foo-value2$)'
received TERM, sending TERM to all workers:30528,30535
worker 30528 died, status:0
worker 30535 died, status:0
exiting
[Test::TCP] Child process does not block(PID: 30526, PPID: 30525) at /builddir/build/BUILD/Server-Starter-0.17/inc/Test/TCP.pm line 89.
# Looks like you failed 1 test of 4.
t/07-envdir.t ........ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/4 subtests 
Test Summary Report
-------------------
t/06-autorestart.t (Wstat: 512 Tests: 28 Failed: 2)
  Failed tests:  6-7
  Non-zero exit status: 2
t/07-envdir.t      (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=8, Tests=103, 137 wallclock secs ( 0.06 usr  0.00 sys +  1.50 cusr  0.25 csys =  1.81 CPU)
Result: FAIL

This happens especially if the build host is busy and delays hardcoded in the tests are not enough. E.g. t/07-envdir.t:

       my $restart = sub {
            sleep 1;
            kill "HUP", $server_pid;
            sleep 2;
       };

If one removes these sleep()s, then the tests fails all the time. This is known to the upstream <https://rt.cpan.org/Public/Bug/Display.html?id=73711>.

Comment 1 Petr Pisar 2014-06-17 10:30:17 UTC
I fixed the t/07-envdir.t test.

Comment 2 Petr Pisar 2014-07-08 13:52:46 UTC
There is another race in t/06-autorestart.t:

#   Failed test 'status after auto-restart'
#   at t/06-autorestart.t line 55.
#                   '1:26986
# 2:26992
# '
#     doesn't match '(?^s:^2:\d+\n$)'
new worker is now running, sending USR1 to old workers:26992
sleep 1 secs
killing old workers
received TERM, sending TERM to all workers:26992,26997
worker 26992 died, status:15
worker 26997 died, status:0
exiting
[Test::TCP] Child process does not block(PID: 26985, PPID: 26967) at /builddir/build/BUILD/Server-Starter-0.17/inc/Test/TCP.pm line 89.
# Looks like you failed 1 test of 28.
t/06-autorestart.t ... 
Dubious, test returned 1 (wstat 256, 0x100)
Failed

Comment 3 Petr Pisar 2014-07-10 13:13:13 UTC
The t/06-autorestart.t is full of sleeps racing with 4s auto_restart_interval.

Comment 4 Petr Pisar 2014-07-10 13:55:34 UTC
I fixed the 5/06-autorestart.t test.

Comment 5 Petr Pisar 2014-08-08 10:44:56 UTC
The is another race in t/01-starter.t:

#   Failed test 'status after restart'
#   at t/01-starter.t line 53.
#                   '1:22344
# 2:22354
# '
#     doesn't match '(?^s:^2:\d+\n$)'
received TERM, sending TERM to all workers:22354
worker 22354 died, status:0
exiting
[Test::TCP] Child process does not block(PID: 22343, PPID: 22342) at /builddir/build/BUILD/Server-Starter-0.17/inc/Test/TCP.pm line 89.
start_server (pid:22356) starting now...
starting new worker 22357
received HUP (num_old_workers=0)
spawning a new worker (num_old_workers=0)
starting new worker 22358
new worker is now running, sending USR1 to old workers:22357
sleep 0 secs
killing old workers
#   Failed test 'status during restart'
#   at t/01-starter.t line 51.
#                   '1:22357
# '
#     doesn't match '(?^s:^1:\d+\n2:\d+$)'
#   Failed test 'status after restart'
#   at t/01-starter.t line 53.
#                   '1:22357
# 2:22358
# '
#     doesn't match '(?^s:^2:\d+\n$)'
received TERM, sending TERM to all workers:22357,22358
worker 22357 died, status:15
worker 22358 died, status:0
exiting
[Test::TCP] Child process does not block(PID: 22356, PPID: 22342) at /builddir/build/BUILD/Server-Starter-0.17/inc/Test/TCP.pm line 89.
# Looks like you failed 3 tests of 28.
t/01-starter.t ....... 
Dubious, test returned 3 (wstat 768, 0x300)
Failed

Comment 6 Petr Pisar 2014-08-08 11:15:40 UTC
perl-Server-Starter-0.17-5.fc22 fixes the t/01-starter.t. I hope this is everything.

Comment 7 Petr Pisar 2014-08-21 11:55:29 UTC
There is another race in t/05-killolddelay.t:

#   Failed test 'status after restart'
#   at t/05-killolddelay.t line 58.
#                   '1:19612
# 2:19621
# '
#     doesn't match '(?^s:^2:\d+\n$)'
received TERM, sending TERM to all workers:19621
worker 19621 died, status:0
exiting
[Test::TCP] Child process does not block(PID: 19611, PPID: 19610) at /builddir/build/BUILD/Server-Starter-0.17/inc/Test/TCP.pm line 89.
start_server (pid:20006) starting now...
starting new worker 20007
received HUP (num_old_workers=0)
spawning a new worker (num_old_workers=0)
starting new worker 20010
new worker is now running, sending USR1 to old workers:20007
sleep 3 secs
killing old workers
old worker 20007 died, status:0
received TERM, sending TERM to all workers:20010
worker 20010 died, status:0
exiting
[Test::TCP] Child process does not block(PID: 20006, PPID: 19610) at /builddir/build/BUILD/Server-Starter-0.17/inc/Test/TCP.pm line 89.
# Looks like you failed 1 test of 28.
t/05-killolddelay.t .. 
Dubious, test returned 1 (wstat 256, 0x100)
Failed

Comment 8 Petr Pisar 2014-09-08 15:52:09 UTC
There another race. If t/05-killolddelay-echod.pl is not fast enough with writing received signal name (e.g. by adding "sleep 2;" in the beginning of a signal handler), the t/06-autorestart.t tests fails:

t/05-killolddelay.t .. ok
start_server (pid:17417) starting now...
starting new worker 17418
autorestart triggered (interval=4)
spawning a new worker (num_old_workers=0)
starting new worker 17421
new worker is now running, sending TERM to old workers:17418
sleep 1 secs
killing old workers
old worker 17418 died, status:0
autorestart triggered (interval=4)
spawning a new worker (num_old_workers=0)
starting new worker 17431
# Server changed from <1:17418
# > to <2:17421
# >
#   Failed test 'signal sent on hup'
#   at t/06-autorestart.t line 63.
#          got: ''
#     expected: 'TERM'
new worker is now running, sending TERM to old workers:17421

Comment 9 Petr Pisar 2014-09-09 07:40:55 UTC
Fixed with perl-Server-Starter-0.17-8.fc22.


Note You need to log in before you can comment on or make changes to this bug.