Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1151706

Summary: hwcert-backend tool cannot stop when test nfs kdump
Product: [Retired] Red Hat Hardware Certification Program
Reporter: Amy Gou <goujm1>
Component: Test Suite (tests)
Assignee: Greg Nichols <gnichols>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 1.7.0
CC: bbrock, garrickyang, gbai, gnichols, juzou, qcai, rlandry
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1161648 (view as bug list)
Environment:
Last Closed: 2014-11-11 12:42:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1161648
Attachments:
Description                             Flags
sosreport                               none
kdump output                            none
kdump.jpg                               none
result                                  none
selinux alert                           none
reinstall selinux                       none
new-os results.xml                      none
the new result                          none
result for hwcert-banckend continue     none
new-result                              none
print-result                            none

Description Amy Gou 2014-10-11 05:56:20 UTC
Created attachment 945908 [details]
sosreport

Description of problem:

    The kdump nfs test never completes on RHEL 7: it stays in the running state, so we cannot tell whether it passed or failed.
    The server restarts after the kdump test is started (with the command "hwcert-backend run -test=kdump -device=nfs -server=TC IP Address"), and a vmcore file is produced on the TC. After the restart and login, the command "hwcert-backend print" shows "hwcert is already running(lock file /var/lock/systems/hwcert found)".

Version-Release number of selected component (if applicable):

1.7.0.1-20140704

How reproducible:


Steps to Reproduce:
1. Run the command "hwcert-backend run -test=kdump -device=nfs -server=TC IP Address" to start the nfs kdump test.
2. Restart the SUT and log in.
3. Run the command "hwcert-backend print".

Actual results:

It shows "hwcert is already running(lock file /var/lock/systems/hwcert found)".

Expected results:

It shows the result of the nfs kdump test (pass or fail).

Additional info:

After logging in to RHEL 7 again, the desktop shows "A problem in the hwcert-client package has been detected".

Comment 1 Guangze Bai 2014-10-13 10:42:21 UTC
Amy,

Can you paste the content of the file /var/log/hwcert/runs/1/kdump/output.log?

FYI, I found the traceback below in the sosreport you attached:

Oct 11 11:08:20 localhost hwcert-backend: Traceback (most recent call last):
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/bin/hwcert-backend", line 45, in <module>
Oct 11 11:08:20 localhost hwcert-backend: success = hwcertBackend.do(args)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
Oct 11 11:08:20 localhost hwcert-backend: result = self.commands[self.command]()
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/harness.py", line 372, in doContinue
Oct 11 11:08:20 localhost hwcert-backend: return self._doRun(tests, continueRun=True)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/harness.py", line 463, in _doRun
Oct 11 11:08:20 localhost hwcert-backend: returnValue = self.runTest(tmpDirectory, test, run, outputFilePath)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/harness.py", line 909, in runTest
Oct 11 11:08:20 localhost hwcert-backend: rv = test.run()
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/tests/info/info.py", line 603, in run
Oct 11 11:08:20 localhost hwcert-backend: if not self.runSubTest(self.generateSystemReport, name="System Report", description="generate system report"):
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/lib/hwcert/test.py", line 473, in runSubTest
Oct 11 11:08:20 localhost hwcert-backend: result = subtestFunction()
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/tests/info/info.py", line 532, in generateSystemReport
Oct 11 11:08:20 localhost hwcert-backend: result = self.__processSystemReport("sosreport --batch -n selinux")
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/share/hwcert/tests/info/info.py", line 567, in __processSystemReport
Oct 11 11:08:20 localhost hwcert-backend: shutil.copy(tarFile, self.getOutputDirectory())
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/lib64/python2.7/shutil.py", line 119, in copy
Oct 11 11:08:20 localhost hwcert-backend: copyfile(src, dst)
Oct 11 11:08:20 localhost hwcert-backend: File "/usr/lib64/python2.7/shutil.py", line 83, in copyfile
Oct 11 11:08:20 localhost hwcert-backend: with open(dst, 'wb') as fdst:
Oct 11 11:08:20 localhost hwcert-backend: IOError: [Errno 2] No such file or directory: u'/var/log/hwcert/runs/1/info'
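
For readers following the traceback: the failure is in shutil.copy(), which opens the destination for writing and gets "No such file or directory", meaning the run's output path was not present at that moment. The sketch below only illustrates that failure mode and one defensive pattern (creating the directory before copying). It is not the hwcert code and not a proposed fix; the function name copy_report is hypothetical, and it is written for Python 3 although the harness in the log runs on Python 2.7.

```python
import os
import shutil


def copy_report(tar_file, output_dir):
    """Copy tar_file into output_dir, creating output_dir first if needed.

    Hypothetical helper for illustration only; not part of hwcert.
    """
    # Without this call, shutil.copy() raises an ENOENT error when the
    # parent of output_dir is missing, as seen in the traceback above.
    os.makedirs(output_dir, exist_ok=True)
    # shutil.copy() returns the path of the newly created file.
    return shutil.copy(tar_file, output_dir)
```

Whether creating the directory is the right behaviour for the harness is a separate question; a missing run directory may itself be the symptom worth investigating.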

Comment 2 garrickyang 2014-10-15 09:18:49 UTC
Created attachment 947165 [details]
kdump output

Comment 3 Guangze Bai 2014-10-15 09:38:28 UTC
Hi Lenovo,

Could you set SELinux to permissive mode with the command below and re-run your kdump test?

# setenforce 0

Comment 4 Guangze Bai 2014-10-15 09:58:41 UTC
Hi Lenovo,

Your kdump test succeeded, but the info test terminated abnormally. I'm still investigating the issue. Could you also try again with abrtd disabled?

# systemctl stop abrtd.service
# systemctl disable abrtd.service

Comment 5 garrickyang 2014-10-15 10:10:17 UTC
Hi,

1. After we ran the command "# setenforce 0", the output was as shown in the attachment kdump.jpg, and the kdump nfs test could not start successfully.

2. We ran the commands below, but the test still does not stop.
# systemctl stop abrtd.service
# systemctl disable abrtd.service

Comment 6 garrickyang 2014-10-15 10:11:02 UTC
Created attachment 947181 [details]
kdump.jpg

Comment 7 Guangze Bai 2014-10-15 10:15:09 UTC
(In reply to garrickyang from comment #5)
> Hi,
> 
> 1. When use the command "# setenforce 0", it displayed as the attachment
> kdump.jpg, and the kdump nfs test cannot start successfully.

Just answer y.

Comment 8 garrickyang 2014-10-15 10:22:49 UTC
Hi,

After selecting y, the test still does not stop. When we run "hwcert-backend print", it displays "hwcert is already running(lock file /var/lock/systems/hwcert found)"; after we select y, it reports info as fail and kdump nfs as incomplete.

Comment 9 Guangze Bai 2014-10-15 10:27:42 UTC
Please attach the /var/hwcert/results.xml here.

Thanks

Comment 10 garrickyang 2014-10-15 10:29:37 UTC
Created attachment 947183 [details]
result

Comment 11 Guangze Bai 2014-10-15 11:13:24 UTC
According to comment 2, the kdump test finished successfully. The problem is the info test, which terminated abnormally with the traceback in comment 1. However, according to the results.xml from the last comment, the info test did run to the end with 'setenforce 0'.

I suspect an SELinux policy issue, and I find the following messages strange:

--- [ cut ] ---
Oct 15 18:17:55 localhost setroubleshoot: Plugin Exception restorecon
Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/ls from getattr access on the directory . For complete SELinux messages. run sealert -l 5cafd25e-1e18-4c15-aaa3-ad51f0e3eb2f
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/ls from getattr access on the directory .

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that ls should be allowed getattr access on the  directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep ls /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp


Oct 15 18:17:55 localhost setroubleshoot: Plugin Exception restorecon_source
Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/ls from getattr access on the directory . For complete SELinux messages. run sealert -l 5cafd25e-1e18-4c15-aaa3-ad51f0e3eb2f
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/ls from getattr access on the directory .

If you believe that ls should be allowed getattr access on the  directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep ls /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp


Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/bash from read access on the lnk_file . For complete SELinux messages. run sealert -l 8091a8f3-b1f0-4b24-8d11-bf7f97f4cb88
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/bash from read access on the lnk_file .

*****  Plugin catchall (100. confidence) suggests   **************************
If you believe that bash should be allowed read access on the  lnk_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep barf /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp


Oct 15 18:17:55 localhost setroubleshoot: SELinux is preventing /usr/bin/ls from read access on the directory . For complete SELinux messages. run sealert -l 00236891-6a66-4fb8-922d-e39419146ce3
Oct 15 18:17:55 localhost python: SELinux is preventing /usr/bin/ls from read access on the directory .

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that ls should be allowed read access on the  directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep ls /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp

--- [ end ] ---

Another strange thing is the set of warnings in the lsof file from the sosreport:
--- [ cut ] ---
lsof: WARNING: can't stat() rootfs file system /
lsof: WARNING: can't stat() proc file system /proc
lsof: WARNING: can't stat() sysfs file system /sys
lsof: WARNING: can't stat() devtmpfs file system /dev
lsof: WARNING: can't stat() securityfs file system /sys/kernel/security
lsof: WARNING: can't stat() tmpfs file system /dev/shm
lsof: WARNING: can't stat() devpts file system /dev/pts
lsof: WARNING: can't stat() tmpfs file system /run
lsof: WARNING: can't stat() tmpfs file system /sys/fs/cgroup
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/systemd
lsof: WARNING: can't stat() pstore file system /sys/fs/pstore
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/cpuset
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/cpu,cpuacct
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/memory
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/devices
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/freezer
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/net_cls
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/blkio
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/perf_event
lsof: WARNING: can't stat() cgroup file system /sys/fs/cgroup/hugetlb
lsof: WARNING: can't stat() configfs file system /sys/kernel/config
lsof: WARNING: can't stat() xfs file system /
lsof: WARNING: can't stat() selinuxfs file system /sys/fs/selinux
lsof: WARNING: can't stat() debugfs file system /sys/kernel/debug
lsof: WARNING: can't stat() mqueue file system /dev/mqueue
lsof: WARNING: can't stat() hugetlbfs file system /dev/hugepages
lsof: WARNING: can't stat() rpc_pipefs file system /var/lib/nfs/rpc_pipefs
lsof: WARNING: can't stat() nfsd file system /proc/fs/nfsd
lsof: WARNING: can't stat() xfs file system /boot
lsof: WARNING: can't stat() binfmt_misc file system /proc/sys/fs/binfmt_misc
lsof: WARNING: can't stat() fusectl file system /sys/fs/fuse/connections
lsof: WARNING: can't stat() nfs4 file system /tmp/hwcert-kdump-S035mi/hwcert-nfs
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/0/gvfs
--- [ end ] ---

I don't know why these appear. Are you using a clean OS installation for certification?
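
Since the same denials repeat many times in the log, a short script can condense them. The following is a minimal sketch, assuming the setroubleshoot messages keep the "SELinux is preventing ... from ... access on the ..." wording quoted above; the function name summarize_denials is made up for illustration, and the script is not part of hwcert or setroubleshoot.

```python
import re
from collections import Counter

# Matches the wording of the setroubleshoot lines quoted in this comment.
DENIAL_RE = re.compile(
    r"SELinux is preventing (?P<path>\S+) from (?P<access>\S+) "
    r"access on the (?P<tclass>\S+)"
)


def summarize_denials(lines):
    """Count (program, access, object class) triples in the given log lines."""
    counts = Counter()
    for line in lines:
        match = DENIAL_RE.search(line)
        if match:
            key = (match["path"], match["access"], match["tclass"])
            counts[key] += 1
    return counts
```

For example, feeding it the lines of /var/log/messages would give one count per distinct denial, which makes it easier to see whether only ls and bash are affected.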

Comment 12 Guangze Bai 2014-10-15 11:22:34 UTC
I'm not sure the following operations will help; this is just a suggestion:

# yum remove 'abrt*'
# yum reinstall '*selinux*' sos

Comment 13 garrickyang 2014-10-16 07:08:50 UTC
Created attachment 947466 [details]
selinux alert

Comment 14 garrickyang 2014-10-16 07:14:03 UTC
Created attachment 947479 [details]
reinstall selinux

Comment 15 garrickyang 2014-10-16 07:20:03 UTC
Hi,

1. After running the commands below, the output was as shown in the attachment 'reinstall selinux.jpg'. We then continued with the kdump nfs test, and it still does not stop.
# yum remove 'abrt*'
# yum reinstall '*selinux*' sos

2. After the reboot during the kdump nfs test, an SELinux alert was displayed.
Please refer to the 'selinux alert.txt' attachment for details.

Thanks

Comment 16 Guangze Bai 2014-10-16 07:43:07 UTC
(In reply to garrickyang from comment #14)
> Created attachment 947479 [details]
> reinstall selinux

That's because you didn't set up any repo, as the command output indicated.

Comment 17 garrickyang 2014-10-16 08:08:31 UTC
How do we set up a repo?

Comment 18 Guangze Bai 2014-10-16 08:12:52 UTC
Please consult with our support team.

Comment 19 garrickyang 2014-10-16 08:35:13 UTC
I used a clean OS to test kdump nfs, and the test process still does not stop.

Comment 20 Guangze Bai 2014-10-16 08:44:18 UTC
Really? Please attach the results.xml.

Comment 21 garrickyang 2014-10-16 09:56:07 UTC
Created attachment 947541 [details]
new-os results.xml

Comment 22 Guangze Bai 2014-10-16 10:31:24 UTC
Hello,

Your tests look good.

Note that after the nfs kdump test finishes and the system reboots, the hwcert daemon resumes the run by itself. So if you issue "hwcert-backend print" and get "hwcert is already running (lock file /var/lock/systems/hwcert found)", please answer n and wait several minutes for the run to complete, because the info test often spends several minutes generating the sosreport. After that, run the print command again.
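
The waiting step can also be scripted instead of retried by hand. This is a minimal sketch, assuming the lock file disappears when the run completes; the function name wait_for_lock_release and the default timeout and interval are arbitrary choices for illustration, not part of hwcert.

```python
import os
import time


def wait_for_lock_release(lock_path, timeout=1800.0, interval=10.0):
    """Poll until lock_path no longer exists.

    Returns True once the file is gone, or False if it is still present
    after roughly `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(lock_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Pass whatever lock file path the "already running" message reports. A True result would mean it is reasonable to run "hwcert-backend print" again; a False result means the run is still holding the lock after the timeout.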

Comment 23 garrickyang 2014-10-17 02:38:52 UTC
Hi,

We waited about 1 hour and then ran "hwcert-backend print"; it still displayed "hwcert is already running (lock file /var/lock/systems/hwcert found)".
Is there any way to finish the test?

Thanks

Comment 24 garrickyang 2014-10-17 05:59:08 UTC
Hi,

After "hwcert is already running (lock file /var/lock/systems/hwcert found)" was displayed, we selected y, and it reported info as pass and kdump nfs as incomplete.

Comment 25 garrickyang 2014-10-20 07:10:30 UTC
Hi,

All Lenovo servers have the same kdump nfs issue, and the results.xml contains the abnormal information you mentioned in comment 11. Is there any update on this bug?

Thanks

Comment 26 Guangze Bai 2014-10-20 08:04:55 UTC
Hi,

The correct lock file is "/var/lock/subsys/hwcert". Why did your test prompt with "/var/lock/systems/hwcert"?

Please run "rm -f /etc/hwcert.xml" and re-run your test.

Comment 27 garrickyang 2014-10-20 08:48:53 UTC
Hi,

After running "rm -f /etc/hwcert.xml" and re-running the test, the prompt shows "/var/lock/subsys/hwcert", but the test still does not stop.

Comment 28 Guangze Bai 2014-10-20 10:07:21 UTC
I was not able to reproduce your failure on our boxes. I still suspect the SELinux problems. Please run the following commands and paste the results:

# touch /var/lock/subsys/dbgcert
# ls -Z /var/lock/subsys/dbgcert
# rm -f /var/lock/subsys/dbgcert

# stat /var /tmp

Comment 29 garrickyang 2014-10-20 10:22:11 UTC
Created attachment 948488 [details]
the new result

Comment 30 garrickyang 2014-10-20 10:23:23 UTC
Hi 

The result is below:

[root@localhost Desktop]# touch /var/lock/subsys/dbgcert
[root@localhost Desktop]# ls -Z /var/lock/subsys/dbgcert 
-rw-r--r--. root root unconfined_u:object_r:var_lock_t:s0 /var/lock/subsys/dbgcert
[root@localhost Desktop]# rm -f /var/lock/subsys/dbgcert 
[root@localhost Desktop]# stat /var/tmp
  File: ‘/var/tmp’
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: fd00h/64768d	Inode: 402654232   Links: 12
Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:tmp_t:s0
Access: 2014-10-20 16:51:09.995707523 +0800
Modify: 2014-10-20 17:53:08.472769566 +0800
Change: 2014-10-20 17:53:08.472769566 +0800
 Birth: -

Comment 31 garrickyang 2014-10-21 06:19:53 UTC
Hi 

Is there any update on this issue?

Comment 35 Guangze Bai 2014-11-07 04:05:36 UTC
Lenovo,

To debug the issue, please follow these instructions exactly:

1.
Edit /etc/init.d/hwcert-backend and comment out the only statement in the start() function.

2.
Re-run your kdump nfs test, but right after the system reboots, run "hwcert-backend continue" and paste the result here.

Thanks

Comment 36 garrickyang 2014-11-07 08:19:37 UTC
Created attachment 954815 [details]
result for hwcert-banckend continue

Comment 37 Guangze Bai 2014-11-07 09:29:44 UTC
Oh, you must run the test with the "--server" option. OK, that's fine.

Please test again per comment 35, but with "hwcert-backend continue --server 192.168.5.1", and report back.

Thanks

Comment 38 garrickyang 2014-11-07 09:45:17 UTC
Created attachment 954857 [details]
new-result

Comment 39 garrickyang 2014-11-07 09:49:20 UTC
Created attachment 954860 [details]
print-result

Comment 40 Greg Nichols 2014-11-07 14:20:42 UTC
Attachment 954857 [details] shows that the sosreport was successfully copied to the log directory:

Your sosreport has been generated and saved in:
/var/tmp/sosreport-localhost.localdomain-20141108014219.tar.xz
Copied sosreport --batch -n selinux /var/tmp/sosreport-localhost.localdomain-20141108014219.tar.xz to /var/log/hwcert/runs/1/info

And it's also enclosed as an attachment within the results.xml.