| Summary: | Restart NFS daemon failed on RHEL5.6/5.7 ppc64 arch by running automation job in beaker | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | yanfu,wang <yanwang> | ||||
| Component: | beah | Assignee: | Jan Stancek <jstancek> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | low | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 0.7 | CC: | bpeck, dcallagh, eguan, jburke, jstancek, mcsontos, rmancy, stl | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-05-20 06:58:30 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | 705387 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Description
yanfu,wang
2011-05-11 02:09:29 UTC
I ran the test plus /distribution/reserve. After logging in to the machine, even root could not stop nfsd with the init.d script.

ps afxl
1 0 3384 1 18 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3385 1 18 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3386 1 18 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3387 1 18 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3388 1 18 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3389 1 19 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3390 1 19 0 0 0 svc_re S ? 0:00 [nfsd]
1 0 3391 1 19 0 0 0 svc_re S ? 0:00 [nfsd]

service nfs stop -> no effect
killall -2 nfsd -> no effect
killall nfsd -> no effect
killall -9 nfsd -> this will finally kill it

This looks like a kernel issue. nfsd sets SIGINT as its terminating signal, but the handler seems to be left in the state it had in userspace before the execve to rpc.nfsd.
/etc/init.d/nfs uses SIGINT (-2) to stop nfsd.
--- cut runtest.sh ---
#!/bin/bash
rpc.nfsd 8
--- cut ---
--- cut runtest2.sh ---
#!/bin/bash
./runtest.sh &
--- cut ---
1. running ./runtest.sh will spawn nfsd, and it reacts to SIGINT properly by terminating -> GOOD
[root@ ~]# ./runtest.sh
[root@ ~]# ps afx | grep nfsd
6284 ? S< 0:00 \_ [nfsd4]
6294 pts/0 S+ 0:00 | \_ grep nfsd
6285 ? S 0:00 [nfsd]
6286 ? S 0:00 [nfsd]
6287 ? S 0:00 [nfsd]
6288 ? S 0:00 [nfsd]
6289 ? S 0:00 [nfsd]
6290 ? S 0:00 [nfsd]
6291 ? S 0:00 [nfsd]
6292 ? S 0:00 [nfsd]
[root@ ~]# killall -2 nfsd
[root@ ~]# ps afx | grep nfsd
6301 pts/0 S+ 0:00 | \_ grep nfsd
2. running runtest2.sh will also spawn nfsd, but SIGINT will no longer work -> BAD
[root@ ~]# ./runtest2.sh
[root@ ~]# ps afx | grep nfsd
6311 ? S< 0:00 \_ [nfsd4]
6321 pts/0 S+ 0:00 | \_ grep nfsd
6312 ? S 0:00 [nfsd]
6313 ? S 0:00 [nfsd]
6314 ? S 0:00 [nfsd]
6315 ? S 0:00 [nfsd]
6316 ? S 0:00 [nfsd]
6317 ? S 0:00 [nfsd]
6318 ? S 0:00 [nfsd]
6319 ? S 0:00 [nfsd]
[root@ ~]# killall -2 nfsd
[root@ ~]# ps afx | grep nfsd
6311 ? S< 0:00 \_ [nfsd4]
6324 pts/0 S+ 0:00 | \_ grep nfsd
6312 ? S 0:00 [nfsd]
6313 ? S 0:00 [nfsd]
6314 ? S 0:00 [nfsd]
6315 ? S 0:00 [nfsd]
6316 ? S 0:00 [nfsd]
6317 ? S 0:00 [nfsd]
6318 ? S 0:00 [nfsd]
6319 ? S 0:00 [nfsd]
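The difference between the two cases can be sketched without rpc.nfsd. This is only an illustration under stated assumptions: `sleep` stands in for rpc.nfsd, and the function name `survives_sigint` is hypothetical. It relies on two documented behaviours: a non-interactive shell (job control off) starts `&` commands with SIGINT ignored, and an ignored disposition is inherited across execve. Whether this is the whole story for the kernel nfsd threads is not established by this sketch.

```shell
#!/bin/bash
# Hypothetical stand-in for runtest2.sh: 'sleep' plays the role of rpc.nfsd.
# Returns 0 if a command backgrounded by a non-interactive shell survives
# SIGINT, 1 if it was terminated.
survives_sigint() {
    local pid
    # bash -c is non-interactive, so the '&' job is started with SIGINT
    # ignored; output is redirected so the command substitution can return.
    pid=$(bash -c 'sleep 30 >/dev/null 2>&1 & echo $!')
    sleep 1                       # give the child time to exec
    kill -INT "$pid" 2>/dev/null  # same signal as 'killall -2'
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"         # clean up, as 'killall -9' did above
        return 0
    fi
    return 1
}
```

Calling `survives_sigint && echo "SIGINT was ignored"` mirrors step 2 above: the backgrounded process only goes away on SIGKILL.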
Looking at the ptrace output, this is one of the differences before the execve:
from 1.) rt_sigaction(SIGINT, {0x1, [], 0}, <unfinished ...>
from 2.) rt_sigaction(SIGINT, {SIG_DFL, [], 0}, {SIG_DFL, [], 0}, 8) = 0
I ran both runtest.sh and runtest2.sh while the system was reserved.
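The SIGINT disposition of an already running process can also be checked without a tracer, by reading the `SigIgn` mask in `/proc/<pid>/status`. The following is a minimal sketch, not part of the original report; the helper names `mask_has_sigint` and `sigint_ignored` are hypothetical. SIGINT is signal 2, so it corresponds to bit 1 (value 0x2) of the hexadecimal mask.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting the SIGINT disposition of a process.

# Return 0 if the hex signal mask in $1 has the SIGINT bit set, 1 otherwise.
mask_has_sigint() {
    # Signal N is bit N-1 of the mask, so SIGINT (2) is value 0x2.
    (( (16#$1 & 0x2) != 0 ))
}

# Return 0 if process $1 ignores SIGINT, 1 if it does not,
# 2 if the mask could not be read (for example, no such process).
sigint_ignored() {
    local pid=$1 mask
    mask=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || return 2
    [ -n "$mask" ] || return 2
    mask_has_sigint "$mask"
}
```

For example, `sigint_ignored "$(pgrep -o rpc.nfsd)"` could be run before the daemon exits; for a process started as in runtest2.sh one would expect it to return 0, although that was not verified on the affected ppc64 systems.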
Created attachment 498907 [details]
patch, which makes the problem go away
With this patch, the reproducer (runtest2.sh) no longer triggers the problem, and the Beaker job also gives PASS. It does not explain why we see this only on ppc64, though.
Just ran into another issue on ppc64 (Bug 657566) and the case seems to be the same: el5 on ppc64 is using an old harness repo.

Keeping this open for tracking and reassigning to Jan. Thanks Jan!

Upon initial discussion with Jeff Layton, filing this bug for the kernel:
https://bugzilla.redhat.com/show_bug.cgi?id=705387

Can we close this now? The harness is now updated.

The updated harness no longer triggers the issue, so closing this one.
https://beaker.engineering.redhat.com/jobs/87345
https://beaker.engineering.redhat.com/jobs/87352