Bug 626970
| Summary: | FEAT: kdump test needs to verify kdump | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Hardware Certification Program | Reporter: | Greg Nichols <gnichols> |
| Component: | Test Suite (tests) | Assignee: | Greg Nichols <gnichols> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 1.2 | CC: | bugproxy, czhang, rlandry, sdenham, yuchen |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | v7 1.3 adds a new feature to the reboot test: it verifies the vmcore kdump image after the reboot finishes. This feature requires the matching kernel debuginfo package and the crash utility to be installed. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-19 14:31:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 635973, 701491 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Greg Nichols
2010-08-24 18:53:03 UTC
Created attachment 483690 [details]
reboot.py reboot test with kdump image verification
This revision to the reboot test verifies that a kdump image (vmcore) was produced when kdump is configured and running. The test parses /etc/kdump.conf to determine the location of the image files (default: /var/crash). It then looks for the image file whose timestamp most closely matches the one saved in /var/v7/bootprint when v7 caused the panic. The image file must be within 10 minutes of the panic.
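The timestamp-matching step described above can be sketched roughly as follows. The directory-name format is taken from the logs later in this bug; the function name and arguments are illustrative, not the actual reboot.py code:

```python
import os
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=10)  # allowed gap between panic and image

def find_vmcore(panic_time, crash_dir="/var/crash"):
    """Return the vmcore path whose directory timestamp is closest
    to panic_time, or None if nothing falls within MAX_GAP."""
    best, best_gap = None, MAX_GAP
    for name in os.listdir(crash_dir):
        # kdump names directories like 127.0.0.1-2011-05-05-20:27:24
        try:
            stamp = datetime.strptime(name.split("-", 1)[1],
                                      "%Y-%m-%d-%H:%M:%S")
        except (IndexError, ValueError):
            continue  # not a kdump image directory
        gap = abs(stamp - panic_time)
        if gap <= best_gap:
            best, best_gap = os.path.join(crash_dir, name, "vmcore"), gap
    return best
```

The real test additionally checks that the vmcore file actually exists in the selected directory.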
The image is verified by running crash on it, immediately quitting, and checking the pipe's return code.
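A minimal sketch of that validation step, assuming crash(8) is invoked as `crash <vmlinux> <vmcore>` with "quit" piped to stdin; the function name is illustrative, not the shipped test code:

```python
import subprocess

def verify_vmcore(vmlinux, vmcore, crash_cmd="crash"):
    """Return True if crash can open the dump and exits cleanly."""
    try:
        proc = subprocess.Popen([crash_cmd, vmlinux, vmcore],
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError:
        return False  # crash utility not installed
    # Quit immediately; a zero exit status means the image opened OK.
    proc.communicate(b"quit\n")
    return proc.returncode == 0
```

This only confirms that crash can load the image against the matching debuginfo vmlinux; it does not inspect the dump's contents.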
Created attachment 483691 [details]
revised continuation to mark reboot method (panic vs. reboot) and use localtime over gmtime.
Created attachment 483692 [details]
config file parser for v7
Questions on the implementation:
- The test requires additional rpms: crash, kernel-debuginfo, kernel-debug. Should these be run-time requires, used only if kdump is operational on the system, or should they be v7 package requires?
- How long a gap between the image file and the time of the panic should be allowed (the code above has it at 10 minutes)?
- Is the image validation sufficient (merely starting crash, then quitting)?
- Should v7 have an environment setting in /etc/v7.xml for the location of images, rather than parsing it out of /etc/kdump.conf?

Created attachment 485605 [details]
v7 spec file patch adding kernel-debuginfo to the required rpms
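For the open question about parsing the image location out of /etc/kdump.conf: a minimal sketch of that parsing, assuming kdump.conf's simple `option value` line format with `#` comments (the helper name is mine, not v7's configfile.py):

```python
def parse_kdump_conf(text):
    """Parse kdump.conf-style text into an {option: value} dict."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition(" ")
        options[key] = value.strip()
    return options

# Fall back to the kdump default location when "path" is unset.
conf = parse_kdump_conf("path /var/crash\n#net nfs.example.com:/export\n")
dump_path = conf.get("path", "/var/crash")
```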
Created attachment 492890 [details]
reboot.py patch on R28 adding network testing via nfs
This patch plans the reboot test twice, using the test's logical device parameter. It is planned on the "local" logical device to test local disk image dumps, observing the kdump path parameter but removing any net settings.
It is also planned on the "nfs" logical device, where it sets the net parameter to use the export directory of the v7 test server, testing network dumps over NFS. If there is already a net setting in kdump.conf, v7 will use it instead.
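The two planned instances described above could be modeled along these lines; `plan_kdump_options` and the `server_export` argument are hypothetical names, not the actual reboot.py API:

```python
def plan_kdump_options(device, options, server_export=None):
    """Return the kdump options to use for one planned test instance.

    device is the logical device the test was planned on ("local"
    or "nfs"); options is the parsed /etc/kdump.conf contents.
    """
    planned = dict(options)
    if device == "local":
        planned.pop("net", None)  # force a local disk dump
    elif device == "nfs":
        # Keep an existing net setting; otherwise point at the
        # v7 test server's export directory.
        planned.setdefault("net", server_export)
    return planned
```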
Created attachment 492893 [details]
R28 configfile.py patch to allow reset/removal of parameters
*** Bug 700846 has been marked as a duplicate of this bug. ***

Created attachment 495812 [details]
reboot test log
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
v7 1.3 adds a new feature to the reboot test: it verifies the vmcore kdump image after the reboot finishes. This feature requires the matching kernel debuginfo package and the crash utility to be installed.
------- Comment From lxie.com 2011-05-04 09:04 EDT-------
Red Hat, when will you be able to provide the fix in the next HTS build for IBM to test?

------- Comment From markwiz.com 2011-05-04 13:36 EDT-------
Red Hat - We have the -43 build. Can we assume it is fixed in it?
IBM - Yes.

------- Comment From lxie.com 2011-05-06 02:14 EDT-------
(In reply to comment #16)
> IBM - Yes.
Hien, when you get a chance, please verify build -43 and post the test results asap. Thanks, Linda

------- Comment From hienn.com 2011-05-06 16:40 EDT-------
Re-ran the reboot test with v7-1.3-43. The reboot test:
- passed on NFS
- failed on local
See attachment.

Created attachment 497458 [details]
output.log of the reboot test
------- Comment (attachment only) From hienn.com 2011-05-06 16:41 EDT-------
From the attached log:

Looking for vmcore image directories under /var/crash
127.0.0.1-2011-05-05-20:27:24 took 0:00:00
127.0.0.1-2011-04-28-21:52:21 took -7 days, 1:25:00
Found kdump image: /var/crash/127.0.0.1-2011-05-05-20:27:24/vmcore
crash: /usr/lib/debug/lib/modules/2.6.32-71.el6.ppc64/vmlinux: No such file or directory

It seems likely that the rpms for kernel-debug and/or kernel-debuginfo are not installed. Could you check this?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0497.html

------- Comment From lxie.com 2011-05-11 14:20 EDT-------
Red Hat, we tested version '-43': the reboot NFS test passed, but the reboot local test failed. See below for the test run done by Hien. So please reopen this bug.

I re-checked and reran the test. kernel-debuginfo is there. The results were the same as last time.
[root@eagle3 ~]# rpm -qa|grep v7
v7-1.3-43.el6.noarch
[root@eagle3 ~]# rpm -qa|grep kernel-debug
kernel-debug-devel-2.6.32-71.el6.ppc64
kernel-debug-2.6.32-71.el6.ppc64
kernel-debuginfo-2.6.32-71.24.1.el6.ppc64
kernel-debug-debuginfo-2.6.32-71.24.1.el6.ppc64
kernel-debuginfo-common-ppc64-2.6.32-71.24.1.el6.ppc64
[root@eagle3 ~]# uname -a
Linux eagle3.ltc.austin.ibm.com 2.6.32-71.el6.ppc64 #1 SMP Wed Sep 1 02:56:55 EDT 2010 ppc64 ppc64 ppc64 GNU/Linux
[root@eagle3 ~]# v7 print
Red Hat Hardware Certification test
--------------------------------------------
Test Suite: 1.3 Release: 43
Plan Created: 2011-05-05 21:22:08
Test Server: 10.1.1.29
--------------------------------------------
Run: 1 on 2011-05-05 21:29:11
--------------------------------------------
Tests: 10 planned, 3 run, 3 passed, 0 failed
--------------------------------------------
Test Run 1
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 - PASS
network eth0 net_00_21_5e_bf_19_64 - PASS
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs -
reboot local -
Run: 2 on 2011-05-05 21:42:10
--------------------------------------------
Tests: 10 planned, 5 run, 5 passed, 0 failed
--------------------------------------------
Test Run 2
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory - PASS
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C - PASS
core - PASS
profiler - PASS
info - PASS
reboot nfs -
reboot local -
Run: 3 on 2011-05-06 01:17:23
--------------------------------------------
Tests: 10 planned, 3 run, 2 passed, 1 failed
--------------------------------------------
Test Run 3
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs - PASS
reboot local - FAIL
Run: 4 on 2011-05-09 21:27:49
--------------------------------------------
Tests: 10 planned, 3 run, 2 passed, 1 failed
--------------------------------------------
Test Run 4
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs - PASS
reboot local - FAIL
Combined Results for 4 Runs:
--------------------------------------------
10 tests planned
9 tests run
1 tests always failed
8 tests always passed
[root@eagle3 ~]#

Please attach the results of rpm -ql kernel-debuginfo, and the file /var/v7/results.xml

Created attachment 498574 [details]
output of rpm -ql kernel-debuginfo
------- Comment (attachment only) From hienn.com 2011-05-12 11:59 EDT-------
------- Comment From hienn.com 2011-05-12 12:05 EDT-------
The file results.xml is larger than 4000KB (6136KB), so the attachment couldn't get through. I will clean up the v7 run logs and rerun only the reboot test, then attach the results.xml file later.

Created attachment 498577 [details]
file /var/v7/results.xml
------- Comment (attachment only) From hienn.com 2011-05-12 12:16 EDT-------
There are a couple of problems here:
1) There is no v7 test server for the nfs reboot/kdump test - please set the --server option to the v7 test server host.
2) The kernel-debuginfo release does not match the kernel release:
   kernel: 2.6.32-71.el6.ppc64
   kernel-debuginfo: 2.6.32-71.24.1.el6.ppc64
   Please install a matching kernel-debuginfo package.

------- Comment From hienn.com 2011-05-12 19:40 EDT-------
Applied the correct kernel-debuginfo:

[root@eagle3 v7]# rpm -qa|grep kernel-debug
kernel-debug-devel-2.6.32-71.el6.ppc64
kernel-debuginfo-2.6.32-71.el6.ppc64
kernel-debug-debuginfo-2.6.32-71.el6.ppc64
kernel-debuginfo-common-ppc64-2.6.32-71.el6.ppc64
kernel-debug-2.6.32-71.el6.ppc64
[root@eagle3 v7]#

The reboot test failed on nfs:

[root@eagle3 v7]# v7 print
Red Hat Hardware Certification test
--------------------------------------------
Test Suite: 1.3 Release: 43
Plan Created: 2011-05-13 03:49:22
Test Server: 10.1.1.29
--------------------------------------------
Run: 1 on 2011-05-13 03:50:31
--------------------------------------------
Tests: 10 planned, 3 run, 2 passed, 1 failed
--------------------------------------------
Test Run 1
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs - FAIL
reboot local - PASS
Combined Results for 1 Runs:
--------------------------------------------
10 tests planned
3 tests run
1 tests always failed
2 tests always passed
[root@eagle3 v7]#

Created attachment 498652 [details]
results of reboot test after install correct kernel-debuginfo
------- Comment (attachment only) From hienn.com 2011-05-12 19:43 EDT-------
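The debuginfo mismatch diagnosed earlier (kernel 2.6.32-71.el6 vs kernel-debuginfo 2.6.32-71.24.1.el6) reduces to a release-string comparison; a sketch with an illustrative helper, not part of v7:

```python
def debuginfo_matches(kernel_release, debuginfo_nvr):
    """True if a kernel-debuginfo package name matches `uname -r`.

    crash needs /usr/lib/debug/lib/modules/<release>/vmlinux, which
    only exists when the debuginfo release equals the running kernel
    release exactly.
    """
    prefix = "kernel-debuginfo-"
    return (debuginfo_nvr.startswith(prefix)
            and debuginfo_nvr[len(prefix):] == kernel_release)

# The failing pair from this bug: 71.el6 kernel vs 71.24.1.el6 debuginfo
ok = debuginfo_matches("2.6.32-71.el6.ppc64",
                       "kernel-debuginfo-2.6.32-71.24.1.el6.ppc64")  # False
```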
Is the v7 test server running on 10.1.1.29? You can check this with "v7 server status" on 10.1.1.29. Thanks!

------- Comment From hienn.com 2011-05-13 12:59 EDT-------
Of course, the v7 server is running on 10.1.1.29 and the two systems can communicate with each other.

[root@jeep01 tmp]# v7 server status
Tested OS: Red Hat Enterprise Linux Server 6 (Santiago)
Kernel RPM: kernel-2.6.32-71.el6
v7 version 1.3, release 43
make server RUNMODE=status
python network.py server status
lmbench bw_tcp is running
lmbench lat_udp is running
/var/v7/export is exported
rpc.svcgssd is stopped
rpc.mountd (pid 32209) is running...
nfsd (pid 32206 32205 32204 32203 32202 32201 32200 32199) is running...
rpc.rquotad (pid 32193) is running...
httpd (pid 32239) is running...
The v7 server daemon is running

[root@jeep01 tmp]# rpm -qa|grep kernel-debug
kernel-debug-debuginfo-2.6.32-71.el6.ppc64
kernel-debuginfo-2.6.32-71.el6.ppc64
kernel-debuginfo-common-ppc64-2.6.32-71.el6.ppc64

[root@jeep01 tmp]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:1A:64:D8:17:90
inet addr:10.1.1.29 Bcast:10.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::21a:64ff:fed8:1790/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8674589 errors:0 dropped:0 overruns:0 frame:0
TX packets:375535 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9836050158 (9.1 GiB) TX bytes:54048539826 (50.3 GiB)

[root@jeep01 tmp]# ping -c 3 -R 10.1.1.30
PING 10.1.1.30 (10.1.1.30) 56(124) bytes of data.
64 bytes from 10.1.1.30: icmp_seq=1 ttl=64 time=13.2 ms
RR: 10.1.1.29
10.1.1.30
10.1.1.30
10.1.1.29
64 bytes from 10.1.1.30: icmp_seq=2 ttl=64 time=0.042 ms (same route)
64 bytes from 10.1.1.30: icmp_seq=3 ttl=64 time=0.041 ms (same route)
--- 10.1.1.30 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.041/4.454/13.281/6.241 ms
[root@jeep01 tmp]# ping -c 3 -R 10.1.1.31
PING 10.1.1.31 (10.1.1.31) 56(124) bytes of data.
64 bytes from 10.1.1.31: icmp_seq=1 ttl=64 time=13.4 ms
RR: 10.1.1.29
10.1.1.31
10.1.1.31
10.1.1.29
64 bytes from 10.1.1.31: icmp_seq=2 ttl=64 time=0.042 ms (same route)
64 bytes from 10.1.1.31: icmp_seq=3 ttl=64 time=0.038 ms (same route)
--- 10.1.1.31 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.038/4.496/13.408/6.301 ms
[root@jeep01 tmp]#

------- Comment From lxie.com 2011-05-19 10:01 EDT-------
(In reply to comment #33)
Red Hat, just wanted to clarify/confirm that the latest test results above were from testing v7-1.3-43.el6.noarch.rpm, which is the same as the version listed in the ERRATA (http://rhn.redhat.com/errata/RHBA-2011-0497.html). So the ERRATA didn't fix this issue. Please reopen this bug on your side. Thanks, Linda

I still suspect issues in either communication with the server, or disk space on the server.
1) How much space is available on /var/v7/export/v7-net/var/crash on the server? Does this directory exist, and are there any core files?
2) Could you try removing the path setting from /etc/kdump.conf (apparently it's set to v7-net/var/crash), and rerun the test with --device nfs?
3) Is selinux set to permissive on the server?
4) Please attach /etc/kdump.conf after running the test.

I'd suggest we open a new bug for the issue in comment 33, as this bug is for verification of kdump-generated cores, and the current IBM issue is likely different. I'll close this bug again, and open a new one to track that issue.
Created BZ 706115 for the current issue. Closing this FEAT bug.

------- Comment From hienn.com 2011-06-01 14:34 EDT-------
Follow up on 72167. I am closing this bug.

*** This bug has been marked as a duplicate of bug 72167 ***