Bug 1151706
| Summary: | hwcert-backend tool cannot stop when test nfs kdump | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Hardware Certification Program | Reporter: | Amy Gou <goujm1> |
| Component: | Test Suite (tests) | Assignee: | Greg Nichols <gnichols> |
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 1.7.0 | CC: | bbrock, garrickyang, gbai, gnichols, juzou, qcai, rlandry |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1161648 | Environment: | |
| Last Closed: | 2014-11-11 12:42:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1161648 | | |

Amy,
Can you paste the content of the file /var/log/hwcert/runs/1/kdump/output.log?
FYI, I found the traceback below in the sosreport you attached:
Oct 11 11:08:20 localhost hwcert-backend: Traceback (most recent call last):
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/bin/hwcert-backend", line 45, in <module>
Oct 11 11:08:20 localhost hwcert-backend: success = hwcertBackend.do(args)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
Oct 11 11:08:20 localhost hwcert-backend: result = self.commands[self.command]()
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/harness.py", line 372, in doContinue
Oct 11 11:08:20 localhost hwcert-backend: return self._doRun(tests, continueRun=True)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/harness.py", line 463, in _doRun
Oct 11 11:08:20 localhost hwcert-backend: returnValue = self.runTest(tmpDirectory, test, run, outputFilePath)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/harness.py", line 909, in runTest
Oct 11 11:08:20 localhost hwcert-backend: rv = test.run()
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/tests/info/info.py", line 603, in run
Oct 11 11:08:20 localhost hwcert-backend: if not self.runSubTest(self.generateSystemReport, name="System Report", description="generate system report"):
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/test.py", line 473, in runSubTest
Oct 11 11:08:20 localhost hwcert-backend: result = subtestFunction()
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/tests/info/info.py", line 532, in generateSystemReport
Oct 11 11:08:20 localhost hwcert-backend: result = self.__processSystemReport("sosreport --batch -n selinux")
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/tests/info/info.py", line 567, in __processSystemReport
Oct 11 11:08:20 localhost hwcert-backend: shutil.copy(tarFile, self.getOutputDirectory())
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/lib64/python2.7/shutil.py", line 119, in copy
Oct 11 11:08:20 localhost hwcert-backend: copyfile(src, dst)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/lib64/python2.7/shutil.py", line 83, in copyfile
Oct 11 11:08:20 localhost hwcert-backend: with open(dst, 'wb') as fdst:
Oct 11 11:08:20 localhost hwcert-backend: IOError: [Errno 2] No such file or directory: u'/var/log/hwcert/runs/1/info'
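
For readers following the traceback: `shutil.copy` treats its destination as a plain file name when the destination directory does not exist, so the `open(dst, 'wb')` call fails with `Errno 2` if `/var/log/hwcert/runs/1` is missing. The following is only a minimal sketch of a defensive copy, not hwcert's actual code; the helper name `copy_into_directory` is hypothetical and introduced purely for illustration.

```python
import errno
import os
import shutil


def copy_into_directory(src, dst_dir):
    """Copy src into dst_dir, creating dst_dir first if it is missing.

    Hypothetical helper for illustration; it is not part of hwcert.
    Returns the full path of the copied file.
    """
    try:
        os.makedirs(dst_dir)
    except OSError as exc:
        # An already-existing directory is fine; any other error is real.
        if exc.errno != errno.EEXIST or not os.path.isdir(dst_dir):
            raise
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copy(src, dst)
    return dst
```

With the destination directory guaranteed to exist, the copy can no longer fail in the way shown in the traceback, although it can still fail for other reasons such as permissions or an SELinux denial.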
Created attachment 947165 [details]
kdump output
Hi Lenovo,
Could you make SELinux permissive with the command below and re-run your kdump test?
# setenforce 0

Hi Lenovo,
Your kdump test succeeded, but the info test was terminated abnormally. I'm still investigating the issue. Could you also try again with abrtd disabled?
# systemctl stop abrtd.service
# systemctl disable abrtd.service

Hi,
1. When we use the command "# setenforce 0", it displays as shown in the attachment kdump.jpg, and the kdump nfs test cannot start successfully.
2. We used the commands below, but it still cannot stop.
# systemctl stop abrtd.service
# systemctl disable abrtd.service

Created attachment 947181 [details]
kdump.jpg
(In reply to garrickyang from comment #5)
> Hi,
>
> 1. When use the command "# setenforce 0", it displayed as the attachment
> kdump.jpg, and the kdump nfs test cannot start successfully.
Answer y directly.

Hi,
After selecting y, it still cannot stop. With the command "hwcert-backend print", it displayed "hwcert is already running(lock file /var/lock/systems/hwcert found)"; after selecting y, it displayed info fail, kdump nfs incomplete.

Please attach the /var/hwcert/results.xml here.
Thanks

Created attachment 947183 [details]
result
According to comment 2, the kdump test finished successfully. The problem is the info test, which terminated abnormally with the traceback in comment 1; however, according to the results.xml from the last comment, the info test did run to the end with 'setenforce 0'. I suspect an SELinux policy issue, and I find the following messages strange:
--- [ cut ] ---
Oct 15 18:17:55 localhost setroubleshoot: Plugin Exception restorecon
Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/ls from getattr access on the directory . For complete SELinux messages. run sealert -l 5cafd25e-1e18-4c15-aaa3-ad51f0e3eb2f
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/ls from getattr access on the directory .
***** Plugin catchall (100. confidence) suggests **************************
If you believe that ls should be allowed getattr access on the directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# grep ls /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp
Oct 15 18:17:55 localhost setroubleshoot: Plugin Exception restorecon_source
Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/ls from getattr access on the directory . For complete SELinux messages. run sealert -l 5cafd25e-1e18-4c15-aaa3-ad51f0e3eb2f
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/ls from getattr access on the directory . $
If you believe that ls should be allowed getattr access on the directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# grep ls /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp
Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/bash from read access on the lnk_file . For complete SELinux messages. run sealert -l 8091a8f3-b1f0-4b24-8d11-bf7f97f4cb88
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/bash from read access on the lnk_file .
***** Plugin catchall (100. confidence) suggests **************************
If you believe that bash should be allowed read access on the lnk_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# grep barf /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp
Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/ls from read access on the directory . For complete SELinux messages. run sealert -l 00236891-6a66-4fb8-922d-e39419146ce3
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/ls from read access on the directory .
***** Plugin catchall (100. confidence) suggests **************************
If you believe that ls should be allowed read access on the directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# grep ls /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp
--- [ end ] ---
Another strange thing is the warnings in the lsof file from the sosreport:
--- [ cut ] ---
lsof: WARNING: can't stat() rootfs file system /
lsof: WARNING: can't stat() proc file system /proc
lsof: WARNING: can't stat() sysfs file system /sys
lsof: WARNING: can't stat() devtmpfs file system /dev
lsof: WARNING: can't stat() securityfs file system /sys/kernel/security
lsof: WARNING: can't stat() tmpfs file system /dev/shm
lsof: WARNING: can't stat() devpts file system /dev/pts
lsof: WARNING: can't stat() tmpfs file system /run
lsof: WARNING: can't stat() tmpfs file system /sys/fs/cgroup
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/systemd
lsof: WARNING: can't stat() pstore file system /sys/fs/pstore
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/cpuset
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/cpu,cpuacct
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/memory
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/devices
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/freezer
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/net_cls
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/blkio
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/perf_event
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/hugetlb
lsof: WARNING: can't stat() configfs file system /sys/kernel/config
lsof: WARNING: can't stat() xfs file system /
lsof: WARNING: can't stat() selinuxfs file system /sys/fs/selinux
lsof: WARNING: can't stat() debugfs file system /sys/kernel/debug
lsof: WARNING: can't stat() mqueue file system /dev/mqueue
lsof: WARNING: can't stat() hugetlbfs file system /dev/hugepages
lsof: WARNING: can't stat() rpc_pipefs file system /var/lib/nfs/rpc_pipefs
lsof: WARNING: can't stat() nfsd file system /proc/fs/nfsd
lsof: WARNING: can't stat() xfs file system /boot
lsof: WARNING: can't stat() binfmt_misc file system /proc/sys/fs/binfmt_misc
lsof: WARNING: can't stat() fusectl file system /sys/fs/fuse/connections
lsof: WARNING: can't stat() nfs4 file system /tmp/hwcert-kdump-S035mi/hwcert-nfs
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/0/gvfs
--- [ end ] ---
I don't know why those warnings appear. Are you using a clean OS for certification? I'm not sure the following operations will help; this is just a suggestion:
# yum remove 'abrt*'
# yum reinstall '*selinux*' sos

Created attachment 947466 [details]
selinux alert
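
When a log contains many repeated setroubleshoot messages like the ones quoted in this bug, it can help to collapse them into a count per distinct denial. The sketch below is only an illustration, assuming the "SELinux is preventing X from Y access on the Z" wording seen in this thread; the names `DENIAL_RE` and `summarize_denials` are hypothetical and not part of any SELinux tooling.

```python
import re
from collections import Counter

# Matches the message wording quoted in this bug report (an assumption;
# other setroubleshoot versions may phrase denials differently).
DENIAL_RE = re.compile(
    r"SELinux is preventing (?P<source>\S+) from (?P<perm>.+?) "
    r"access on the (?P<tclass>\S+)"
)


def summarize_denials(lines):
    """Count denials per (source program, permission, target class)."""
    counts = Counter()
    for line in lines:
        match = DENIAL_RE.search(line)
        if match:
            key = (match.group("source"), match.group("perm"),
                   match.group("tclass"))
            counts[key] += 1
    return counts
```

Passing the lines of a saved messages file to `summarize_denials` gives a quick overview of which programs were denied which access, before reading the full `sealert` output for each one.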
Created attachment 947479 [details]
reinstall selinux
Hi,
1. After using the commands below, it displays as shown in the attachment 'reinstall selinux.jpg'. We continued testing kdump nfs, and the test still cannot stop.
# yum remove 'abrt*'
# yum reinstall '*selinux*' sos
2. After the reboot during the kdump nfs test, an SELinux alert was displayed. Please refer to the 'selinux alert.txt' attachment for details.
Thanks

(In reply to garrickyang from comment #14)
> Created attachment 947479 [details]
> reinstall selinux
That's because you didn't set up any repo as the command prompted.

How do we set up a repo?

Please consult with our support team.

I used a clean OS to test kdump nfs, and the test process cannot stop either.

Really? Please attach the results.xml.

Created attachment 947541 [details]
new-os results.xml
Hello,
Your tests look good. Note that after the nfs kdump test finishes and the system reboots, the hwcert daemon continues the run by itself. So if you issue "hwcert-backend print" and encounter "hwcert is already running (lock file /var/lock/systems/hwcert found)", please answer n and wait several minutes for it to complete, because the info test often spends several minutes on sosreport generation. After that, please run the print command again.

Hi,
We waited about 1 hour and then used the command "hwcert-backend print"; it still displayed "hwcert is already running (lock file /var/lock/systems/hwcert found)". Is there any method to finish the test?
Thanks

Hi,
After "hwcert is already running (lock file /var/lock/systems/hwcert found)" was displayed, we selected y, and it displayed info pass, kdump nfs incomplete.

Hi,
All Lenovo servers have the same kdump nfs issue, and the results.xml has the abnormal information that you mentioned in comment 11. Is there any update on the bug?
Thanks

Hi,
The correct lock file is "/var/lock/subsys/hwcert". Why did it prompt "/var/lock/systems/hwcert" in your testing? Please "rm -f /etc/hwcert.xml" and re-run your test.

Hi,
After running "rm -f /etc/hwcert.xml" and re-running the test, it prompted "/var/lock/subsys/hwcert", but it still cannot stop.

I was not able to reproduce your failure on our boxes. I still suspect SELinux problems. Please run the following commands and paste the results:
# touch /var/lock/subsys/dbgcert
# ls -Z /var/lock/subsys/dbgcert
# rm -f /var/lock/subsys/dbgcert
# stat /var /tmp

Created attachment 948488 [details]
the new result
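
The advice earlier in this thread is to wait for the running hwcert daemon to finish before printing results. A minimal sketch of automating that wait is shown below, assuming only that the lock file named in the comments (`/var/lock/subsys/hwcert`) disappears when the run completes; the function `wait_for_lock_release` is hypothetical and not part of hwcert.

```python
import os
import time


def wait_for_lock_release(lock_path, timeout, poll_interval=5.0):
    """Poll until lock_path no longer exists.

    Returns True once the file is gone, or False if `timeout` seconds
    elapse while it is still present. Hypothetical helper, not hwcert code.
    """
    deadline = time.time() + timeout
    while os.path.exists(lock_path):
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

For example, `wait_for_lock_release("/var/lock/subsys/hwcert", timeout=1800)` would wait up to 30 minutes. A `False` result only says the lock is still present; it does not distinguish a slow run from a stuck one, which is the situation the reporter describes.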
Hi,
The result is as below:
[root@localhost Desktop]# touch /var/lock/subsys/dbgcert
[root@localhost Desktop]# ls -Z /var/lock/subsys/dbgcert
-rw-r--r--. root root unconfined_u:object_r:var_lock_t:s0 /var/lock/subsys/dbgcert
[root@localhost Desktop]# rm -f /var/lock/subsys/dbgcert
[root@localhost Desktop]# stat /var/tmp
File: ‘/var/tmp’
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fd00h/64768d Inode: 402654232 Links: 12
Access: (1777/drwxrwxrwt) Uid: ( 0/ root) Gid: ( 0/ root)
Context: system_u:object_r:tmp_t:s0
Access: 2014-10-20 16:51:09.995707523 +0800
Modify: 2014-10-20 17:53:08.472769566 +0800
Change: 2014-10-20 17:53:08.472769566 +0800
Birth: -

Hi,
Is there any update on the issue?

Lenovo,
To debug the issue, please follow these instructions exactly:
1. Edit /etc/init.d/hwcert-backend and comment out the only statement in the start() function.
2. Re-run your kdump nfs test, but right after the system reboots, run "hwcert-backend continue" and paste the result here.
Thanks

Created attachment 954815 [details]
result for hwcert-banckend continue
Oh, you must issue the test with the "--server" option. OK, it's fine. Please test again per comment 35, but with "hwcert-backend continue --server 192.168.5.1", and report back.
Thanks

Created attachment 954857 [details]
new-result
Created attachment 954860 [details]
print-result
Attachment 954857 [details] shows that the sosreport was successfully copied to the log directory:
Your sosreport has been generated and saved in:
/var/tmp/sosreport-localhost.localdomain-20141108014219.tar.xz
Copied sosreport --batch -n selinux /var/tmp/sosreport-localhost.localdomain-20141108014219.tar.xz to /var/log/hwcert/runs/1/info
And it's also enclosed as an attachment within the results.xml.
Created attachment 945908 [details]
sosreport

Description of problem:
The kdump nfs test cannot pass on RHEL7. It keeps running, and we cannot get a result showing whether it passed or failed. The server restarts after running the kdump test (using the command "hwcert-backend run -test=kdump -device=nfs -server=TC IP Address") and produces a vmcore file on the TC. After restarting and logging in, the command "hwcert-backend print" shows "hwcert is already running(lock file /var/lock/systems/hwcert found)".

Version-Release number of selected component (if applicable):
1.7.0.1-20140704

How reproducible:

Steps to Reproduce:
1. Use the command "hwcert-backend run -test=kdump -device=nfs -server=TC IP Address" to run the nfs kdump test.
2. Restart the SUT and log in.
3. Use the command "hwcert-backend print".

Actual results:
It shows "hwcert is already running(lock file /var/lock/systems/hwcert found)".

Expected results:
It shows the result of the nfs kdump test, pass or fail.

Additional info:
After logging in to RHEL7 again, the desktop shows "A problem in the hwcert-client package has been detected".