Bug 589636

Summary: rpmbuild cycles during test part when ipv6 is enabled
Product: Red Hat Enterprise Linux 6
Reporter: Aleš Mareček <amarecek>
Component: nss
Assignee: Elio Maldonado Batiz <emaldona>
Status: CLOSED ERRATA
QA Contact: Aleš Mareček <amarecek>
Severity: high
Priority: high
Version: 6.0
CC: azelinka, borgan, ddumas, dgregor, ebenes, jrieden, rrelyea
Target Milestone: rc
Keywords: Reopened, TestBlocker
Hardware: All
OS: Linux
Fixed In Version: nss-3.12.9-4.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 14:03:22 UTC
Bug Blocks: 580448
Attachments:
Short-term fix for ssl test suites hangs on ipv6 type connections (flags: emaldona: review+)

Description Aleš Mareček 2010-05-06 15:25:51 UTC
Description of problem:
I tried to rebuild the nss package from source, and after entering the test phase it kept cycling. A SIGUSR1 signal caused the rebuild to finish.

Version-Release number of selected component (if applicable):
nss-3.12.6-2.el6.src.rpm

How reproducible:
Always

Steps to Reproduce:
1. Download the nss source package from brew
2. Rebuild it with "rpmbuild --rebuild nss-3.12.6-2.el6.src.rpm"
3. After some time you should see something like:
selfserv_9432 starting at Thu May  6 11:02:26 EDT 2010
selfserv_9432 -D -p 9432 -d ../server -n localhost.localdomain  \
          -w nss -r -r -i ../tests_pid.11198  &
trying to connect to selfserv_9432 at Thu May  6 11:02:26 EDT 2010
tstclnt -p 9432 -h localhost.localdomain  -q \
        -d ../client -v < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: Client timed out while waiting for connection to server: TCP connection reset by peer.
retrying to connect to selfserv_9432 at Thu May  6 11:03:32 EDT 2010
tstclnt -p 9432 -h localhost.localdomain  -q \
        -d ../client -v < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: Client timed out while waiting for connection to server: TCP connection reset by peer.
ssl.sh: #485: Waiting for Server - FAILED
kill -0 26327 >/dev/null 2>/dev/null
selfserv_9432 with PID 26327 found at Thu May  6 11:04:33 EDT 2010
selfserv_9432 with PID 26327 started at Thu May  6 11:04:33 EDT 2010
tstclnt -p 9432 -h localhost.localdomain -f -d ../client -v  \
        -T -w nss -n none  < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: connect: Operation is still in progress (probably a non-blocking connect).
tstclnt: about to call PR_Poll for connect completion!
tstclnt: PR_Poll returned 0x28 for socket out_flags.
tstclnt: unable to connect (poll): Connection refused by peer.
ssl.sh: #486: SSL3 Require client auth (client does not provide auth) produced a returncode of 1, expected is 1 - PASSED
trying to kill selfserv_9432 with PID 26327 at Thu May  6 11:04:33 EDT 2010
kill -USR1 26327
selfserv: 0 cache hits; 0 cache misses, 0 cache not reusable
          0 stateless resumes, 0 ticket parse failures
selfserv: normal termination
selfserv_9432 -b -p 9432 2>/dev/null;
selfserv_9432 with PID 26327 killed at Thu May  6 11:04:33 EDT 2010
ssl.sh: SSL3 Require client auth (bad password) ----
selfserv_9432 starting at Thu May  6 11:04:33 EDT 2010
selfserv_9432 -D -p 9432 -d ../server -n localhost.localdomain  \
          -w nss -r -r -i ../tests_pid.11198  &
trying to connect to selfserv_9432 at Thu May  6 11:04:33 EDT 2010
tstclnt -p 9432 -h localhost.localdomain  -q \
        -d ../client -v < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: Client timed out while waiting for connection to server: TCP connection reset by peer.
retrying to connect to selfserv_9432 at Thu May  6 11:05:38 EDT 2010
tstclnt -p 9432 -h localhost.localdomain  -q \
        -d ../client -v < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: Client timed out while waiting for connection to server: TCP connection reset by peer.
ssl.sh: #487: Waiting for Server - FAILED
kill -0 26396 >/dev/null 2>/dev/null
selfserv_9432 with PID 26396 found at Thu May  6 11:06:39 EDT 2010
selfserv_9432 with PID 26396 started at Thu May  6 11:06:39 EDT 2010
tstclnt -p 9432 -h localhost.localdomain -f -d ../client -v  \
        -T -n TestUser -w bogus  < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: connect: Operation is still in progress (probably a non-blocking connect).
tstclnt: about to call PR_Poll for connect completion!
tstclnt: PR_Poll returned 0x28 for socket out_flags.
tstclnt: unable to connect (poll): Connection refused by peer.
ssl.sh: #488: SSL3 Require client auth (bad password) produced a returncode of 1, expected is 1 - PASSED
trying to kill selfserv_9432 with PID 26396 at Thu May  6 11:06:39 EDT 2010
kill -USR1 26396
selfserv: 0 cache hits; 0 cache misses, 0 cache not reusable
          0 stateless resumes, 0 ticket parse failures
selfserv: normal termination
selfserv_9432 -b -p 9432 2>/dev/null;
selfserv_9432 with PID 26396 killed at Thu May  6 11:06:39 EDT 2010
ssl.sh: SSL3 Require client auth (client auth) ----
selfserv_9432 starting at Thu May  6 11:06:39 EDT 2010
selfserv_9432 -D -p 9432 -d ../server -n localhost.localdomain  \
          -w nss -r -r -i ../tests_pid.11198  &
trying to connect to selfserv_9432 at Thu May  6 11:06:39 EDT 2010
tstclnt -p 9432 -h localhost.localdomain  -q \
        -d ../client -v < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: Client timed out while waiting for connection to server: TCP connection reset by peer.
retrying to connect to selfserv_9432 at Thu May  6 11:07:45 EDT 2010
tstclnt -p 9432 -h localhost.localdomain  -q \
        -d ../client -v < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: Client timed out while waiting for connection to server: TCP connection reset by peer.
ssl.sh: #489: Waiting for Server - FAILED
kill -0 26465 >/dev/null 2>/dev/null
selfserv_9432 with PID 26465 found at Thu May  6 11:08:45 EDT 2010
selfserv_9432 with PID 26465 started at Thu May  6 11:08:45 EDT 2010
tstclnt -p 9432 -h localhost.localdomain -f -d ../client -v  \
        -T -n TestUser -w nss  < /root/rpmbuild/BUILD/nss-3.12.6/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: connect: Operation is still in progress (probably a non-blocking connect).
tstclnt: about to call PR_Poll for connect completion!
tstclnt: PR_Poll returned 0x28 for socket out_flags.
tstclnt: unable to connect (poll): Connection refused by peer.
ssl.sh: #490: SSL3 Require client auth (client auth) produced a returncode of 1, expected is 0 - FAILED
trying to kill selfserv_9432 with PID 26465 at Thu May  6 11:08:46 EDT 2010
kill -USR1 26465
selfserv: 0 cache hits; 0 cache misses, 0 cache not reusable
          0 stateless resumes, 0 ticket parse failures
selfserv: normal termination
selfserv_9432 -b -p 9432 2>/dev/null;
selfserv_9432 with PID 26465 killed at Thu May  6 11:08:46 EDT 2010
ssl.sh: skipping  TLS Request don't require client auth on 2nd hs (client does not provide auth) (non-FIPS only)
ssl.sh: skipping  TLS Request don't require client auth on 2nd hs (bad password) (non-FIPS only)
ssl.sh: skipping  TLS Request don't require client auth on 2nd hs (client auth) (non-FIPS only)
ssl.sh: SSL3 Require client auth on 2nd hs (client does not provide auth) ----
  
Actual results:
Rebuild cycles.

Expected results:
Successful rebuild.

Additional info:
Rebuild was done on RHEL6.0-Snapshot-2 (ppc64 and i386 machines)

Comment 2 RHEL Program Management 2010-05-06 17:25:16 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Elio Maldonado Batiz 2010-05-07 14:26:32 UTC
This problem seems to be the same as that reported in Bug 539183 which is now closed. It certainly doesn't happen with the brew builds. I wonder if it may be due to a particular network configuration setting in the machine in question. Can you consistently reproduce it?

Comment 4 Aleš Mareček 2010-05-10 07:20:48 UTC
1. Reserve a machine in RHTS, RHEL6.0-Snapshot-2
2. Now you can connect via ssh (there is an IP on eth0 and lo0)
3. Download the latest nss srpm for RHEL6 and try to rebuild it
4. You should get a cycle in the test part of the srpm rebuild

Additional info:
localhost and localhost.localdomain answered by 127.0.0.1 (ping)
localhost and localhost.localdomain answered by ::1 (ping6)
### LOG ###
tstclnt: connecting to localhost.localdomain:9432 (address=::1)
tstclnt: connect: Operation is still in progress (probably a non-blocking connect).
tstclnt: about to call PR_Poll for connect completion!
tstclnt: PR_Poll returned 0x28 for socket out_flags.
tstclnt: unable to connect (poll): Connection refused by peer.
###########
As you can see in the log, is the connection made to IPv6 only, or is it only this test that is cycling?

Comment 6 Elio Maldonado Batiz 2010-07-12 22:19:02 UTC
(In reply to comment #5)
> What status is here please? Some news?    

I cannot reproduce the problem. I was able to do a rebuild of nss-3.12.6-3.el6.i686 on a system running RHEL 6 beta 2 updated to the latest from the nightly internal repository. I downloaded the srpm from brew and executed
rpmbuild --rebuild nss-3.12.6-3.el6.src.rpm as per your instructions and it completed just fine.

Comment 7 Elio Maldonado Batiz 2010-07-12 22:26:07 UTC
No changes to nss were required, and I did this on a slow machine. Please verify that you can rebuild nss-3.12.6-3 with the most recent RHEL 6. I used today's nightly, which should be the same as Snapshot 7.

Comment 11 Denise Dumas 2010-08-24 18:54:21 UTC
Since comment 9 describes a workaround, and given where we are in RHEL6, we should move this to 6.1. Do we want a release note with the workaround?

Comment 12 Elio Maldonado Batiz 2010-08-24 19:08:12 UTC
By the way, I am trying to comment out lines in the spec file incorrectly.
#%{...} doesn't work, it should be #%%{..} or something else.
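
As an illustration of the escaping mentioned above: rpm expands macros even on lines that begin with `#`, so a commented-out line is only inert if each `%` is doubled. A minimal Python sketch follows; the helper name `comment_out_spec_line` is hypothetical, and the sample `make` line is illustrative only, not taken from nss.spec.

```python
def comment_out_spec_line(line: str) -> str:
    """Comment out one spec-file line, doubling '%' so rpm does not expand macros."""
    return "#" + line.replace("%", "%%")


if __name__ == "__main__":
    # Illustrative input only; not a line from the real nss.spec.
    print(comment_out_spec_line("make %{?_smp_mflags} check"))
```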

Getting back to the problem at hand, I can't reproduce it on my system or on the build machines. I enabled all the test suites and had no hang-ups on my system. Submitted a scratch build https://brewweb.devel.redhat.com/taskinfo?taskID=2704937
As I write, several have completed without problems.

Comment 33 Eduard Benes 2010-10-29 13:47:34 UTC
NSS builds fail under the root user due to failures in the following two tests from the test suite executed during the build:

dbtests.sh: #204: Dbtest r/w succeeded in an readonly directory 0 - FAILED
dbtests.sh: #4213: Dbtest r/w succeeded in an readonly directory 0 - FAILED

The same problem has been reported in Fedora as Bug 646045.

Comment 34 Aleš Mareček 2010-10-29 13:55:01 UTC
According to Elio's analysis these two tests are meant for non-root users. I did a rebuild as a non-root user and all tests passed.
One way to repair this could be to create a user in the test and then use it for these two tests. Another way could be to skip these tests when rebuilding as root, but I would not recommend this "fix".
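
The root-only failure can be illustrated with a minimal Python sketch. It is not part of the NSS test suite, and the helper name `write_succeeds_in_readonly_dir` is hypothetical. It assumes the dbtests check expects a write into a read-only directory to fail, which holds for ordinary users but typically not for root, since root normally bypasses directory permission checks.

```python
import os
import stat
import tempfile


def write_succeeds_in_readonly_dir() -> bool:
    """Return True if a file can be created in a directory with no write bits."""
    with tempfile.TemporaryDirectory() as parent:
        ro_dir = os.path.join(parent, "ronlydir")
        os.mkdir(ro_dir)
        os.chmod(ro_dir, stat.S_IRUSR | stat.S_IXUSR)  # r-x for the owner only
        try:
            with open(os.path.join(ro_dir, "probe"), "w"):
                pass
            return True
        except PermissionError:
            return False
        finally:
            # Restore write permission so the temporary tree can be removed.
            os.chmod(ro_dir, stat.S_IRWXU)


if __name__ == "__main__":
    print("running as root:", os.geteuid() == 0)
    print("write succeeded:", write_succeeds_in_readonly_dir())
```

A test that expects the write to fail would pass for a normal user and fail when the build runs as root, which matches the behaviour reported in the two dbtests.sh lines.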

Comment 35 Elio Maldonado Batiz 2010-10-29 16:48:14 UTC
Ales's workaround is fine for now. I have a patch for the test script at
https://bugzilla.redhat.com/attachment.cgi?id=455374&action=diff
that allows the test to be run as root. That will have to wait for RHEL 6.1 or RHEL 6.0_Z.

Comment 36 Elio Maldonado Batiz 2010-10-29 19:57:51 UTC
(In reply to comment #35) Even with the patch we should still find a way to create a non-root user to run the tests as Ales states, otherwise we aren't getting the intended testing.

Comment 37 Aleš Mareček 2011-02-22 12:36:34 UTC
Hi Elio,
I'm reopening this bug. We are now running TPS tests for errata, and this nss test breaks the whole system: no TPS job can continue because it freezes (cycles) on the nss rebuild.
I don't think it is proper behaviour for a test to cycle with nothing able to stop it. Also, what does "kill -0" mean? See the log in the description: kill -0 26327 >/dev/null 2>/dev/null
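
On the "kill -0" question: signal 0 is the POSIX null signal. Nothing is delivered to the process, but the existence and permission checks are still performed, so the exit status tells the caller whether the PID is still there. A minimal Python sketch of the same probe follows; the helper name `pid_is_alive` is hypothetical and is not part of ssl.sh.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Check whether a process exists, using the same null signal as `kill -0 PID`."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        # The process exists but belongs to another user. Note that the shell
        # command `kill -0` reports failure in this case.
        return True
    return True


if __name__ == "__main__":
    print(pid_is_alive(os.getpid()))
```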

Comment 38 Aleš Mareček 2011-02-22 12:38:54 UTC
Regarding the /etc/hosts "fix":
I found that an /etc/hosts like the following is correct, so I can't change it permanently; that wouldn't be the right solution.

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
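
The hosts file above gives localhost.localdomain both an IPv4 and an IPv6 loopback address, which is consistent with tstclnt choosing ::1 in the log. A minimal Python sketch that lists the addresses per name follows; the function name `addresses_by_name` is hypothetical, and it only parses text in /etc/hosts format, it does not model the resolver's ordering rules.

```python
import ipaddress
from collections import defaultdict

HOSTS_TEXT = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
"""


def addresses_by_name(hosts_text: str) -> dict:
    """Map each host name to the list of (family, address) pairs that carry it."""
    result = defaultdict(list)
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        family = "IPv%d" % ipaddress.ip_address(address).version
        for name in names:
            result[name].append((family, address))
    return dict(result)


if __name__ == "__main__":
    for family, address in addresses_by_name(HOSTS_TEXT)["localhost.localdomain"]:
        print(family, address)
```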

Comment 39 Elio Maldonado Batiz 2011-02-22 17:41:03 UTC
The problem has been independently reported upstream. A patch to the selfserv test tool has been suggested as a temporary fix for Linux that would allow the ssl tests to proceed under ipv6, and I am willing to apply it as soon as it's available. See https://bugzilla.mozilla.org/show_bug.cgi?id=617723#c13

Comment 40 Elio Maldonado Batiz 2011-02-23 19:29:02 UTC
Until we get a response from upstream, I'll pick the patch from
https://bugzilla.mozilla.org/attachment.cgi?id=499383&action=diff
as a temporary workaround so we unblock testing for the beta compose.

Comment 41 Elio Maldonado Batiz 2011-02-23 21:17:04 UTC
This patch, which works for Fedora, breaks the RHEL 6.1 builds. It seems that ipv6 is not enabled in the build system. We will have to wait for the better temporary fix that was mentioned in the upstream bug report. I will contact our upstream friends on the issue.

Comment 42 Elio Maldonado Batiz 2011-02-24 16:05:49 UTC
Created attachment 480793
Short-term fix for ssl test suites hangs on ipv6 type connections

Better patch from upstream. It changes selfserv to use a dual-stack IPv6 listening socket, which can accept connections from both IPv4 and IPv6 clients. NSPR's IPv6 sockets have the IPV6_V6ONLY socket option default to false. My tests look good so far. Waiting on results from other platforms before I apply it. There is a brew scratch build of nss-3.12.9-4.el6 in case someone else wants to test it.
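
The dual-stack idea can be sketched with plain sockets. This is a minimal Python illustration, not the actual selfserv patch (which uses NSPR); the function name `open_dual_stack_listener` is hypothetical. It assumes the host has IPv6 enabled; where socket creation or bind fails, the function returns None instead of a socket.

```python
import socket


def open_dual_stack_listener(port: int = 0):
    """Listen on [::] with IPV6_V6ONLY off so IPv4 clients are accepted too.

    Returns the listening socket, or None if IPv6 is unavailable on this host.
    """
    if not socket.has_ipv6:
        return None
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return None
    try:
        # Explicitly allow IPv4-mapped connections on this IPv6 socket.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("::", port))
        sock.listen(5)
    except OSError:
        sock.close()
        return None
    return sock


if __name__ == "__main__":
    server = open_dual_stack_listener()
    if server is None:
        print("IPv6 not available; cannot demonstrate dual-stack listening")
    else:
        port = server.getsockname()[1]
        # An IPv4 client reaches the IPv6 listener as an IPv4-mapped address.
        with socket.create_connection(("127.0.0.1", port), timeout=5):
            conn, peer = server.accept()
            print("accepted connection from", peer[0])
            conn.close()
        server.close()
```

With a listener of this kind, a client that resolves the host name to either 127.0.0.1 or ::1 can connect to the same port, which is the property the short-term fix relies on.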

Comment 44 Elio Maldonado Batiz 2011-02-24 18:35:19 UTC
Comment on attachment 480793
Short-term fix for ssl test suites hangs on ipv6 type connections

r+ by the NSS team.
Checking in nss-589636.patch;
/cvs/dist/rpms/nss/RHEL-6/nss-589636.patch,v  <--  nss-589636.patch
initial revision: 1.1
Checking in nss.spec;
/cvs/dist/rpms/nss/RHEL-6/nss.spec,v  <--  nss.spec
new revision: 1.68; previous revision: 1.67

Comment 48 errata-xmlrpc 2011-05-19 14:03:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0692.html