Bug 626970
| Summary: | FEAT: kdump test needs to verify kdump | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Hardware Certification Program | Reporter: | Greg Nichols <gnichols> |
| Component: | Test Suite (tests) | Assignee: | Greg Nichols <gnichols> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 1.2 | CC: | bugproxy, czhang, rlandry, sdenham, yuchen |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | v7 1.3 adds a new feature to the reboot test: it verifies the vmcore kdump image after the reboot finishes. This feature requires the matching kernel debuginfo package and the crash utility to be installed. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-19 14:31:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 635973, 701491 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Greg Nichols
2010-08-24 18:53:03 UTC
Created attachment 483690 [details]
reboot.py reboot test with kdump image verification
This revision to the reboot test verifies that a kdump image (vmcore) was produced when kdump is configured and running. The test parses /etc/kdump.conf to determine the location of the image files (default: /var/crash). It then looks for the image file whose timestamp most closely matches the one saved in /var/v7/bootprint when v7 caused the panic. The image file must be within 10 minutes of the panic.
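The timestamp-matching step described above can be sketched roughly as follows. The directory-name format is taken from the logs later in this bug; the function name and arguments are illustrative, not the actual reboot.py code:

```python
import os
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=10)  # allowed gap between panic and image

def find_vmcore(panic_time, crash_dir="/var/crash"):
    """Return the vmcore path whose directory timestamp is closest
    to panic_time, or None if nothing falls within MAX_GAP."""
    best, best_gap = None, MAX_GAP
    for name in os.listdir(crash_dir):
        # kdump names directories like 127.0.0.1-2011-05-05-20:27:24
        try:
            stamp = datetime.strptime(name.split("-", 1)[1],
                                      "%Y-%m-%d-%H:%M:%S")
        except (IndexError, ValueError):
            continue  # not a kdump image directory
        gap = abs(stamp - panic_time)
        if gap <= best_gap:
            best, best_gap = os.path.join(crash_dir, name, "vmcore"), gap
    return best
```

The real test additionally checks that the vmcore file actually exists in the selected directory.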
The image is verified by running crash on it, immediately quitting, and checking the pipe's return code.
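A minimal sketch of that validation step, assuming crash(8) is invoked as `crash <vmlinux> <vmcore>` with "quit" piped to stdin; the function name is illustrative, not the shipped test code:

```python
import subprocess

def verify_vmcore(vmlinux, vmcore, crash_cmd="crash"):
    """Return True if crash can open the dump and exits cleanly."""
    try:
        proc = subprocess.Popen([crash_cmd, vmlinux, vmcore],
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError:
        return False  # crash utility not installed
    # Quit immediately; a zero exit status means the image opened OK.
    proc.communicate(b"quit\n")
    return proc.returncode == 0
```

This only confirms that crash can load the image against the matching debuginfo vmlinux; it does not inspect the dump's contents.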
Created attachment 483691 [details]
revised continuation to mark reboot method (panic vs. reboot) and use localtime over gmtime.
Created attachment 483692 [details]
config file parser for v7
Questions on the implementation:
- The test requires additional rpms: crash, kernel-debuginfo, kernel-debug. Should these be run-time requires, used only if kdump is operational on the system, or should they be v7 package requires?
- How long a gap between the image file and the time of the panic should be allowed (the code above has it at 10 minutes)?
- Is the image validation sufficient (merely starting crash, then quitting)?
- Should v7 have an environment setting in /etc/v7.xml for the location of images, rather than parsing it out of /etc/kdump.conf?

Created attachment 485605 [details]
v7 spec file patch adding kernel-debuginfo to the required rpms
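For the open question about parsing the image location out of /etc/kdump.conf: a minimal sketch of that parsing, assuming kdump.conf's simple `option value` line format with `#` comments (the helper name is mine, not v7's configfile.py):

```python
def parse_kdump_conf(text):
    """Parse kdump.conf-style text into an {option: value} dict."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition(" ")
        options[key] = value.strip()
    return options

# Fall back to the kdump default location when "path" is unset.
conf = parse_kdump_conf("path /var/crash\n#net nfs.example.com:/export\n")
dump_path = conf.get("path", "/var/crash")
```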
Created attachment 492890 [details]
reboot.py patch on R28 adding network testing via nfs
This patch plans the reboot test twice, using the test's logical device parameter. It is planned on the "local" logical device to test local disk image dumps, observing the kdump path parameter but removing any net settings.
It is also planned on the "nfs" logical device, where it sets the net parameter to use the export directory of the v7 test server, testing network dumps over NFS. If there is already a net setting in kdump.conf, v7 will use it instead.
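The two planned instances described above could be modeled along these lines; `plan_kdump_options` and the `server_export` argument are hypothetical names, not the actual reboot.py API:

```python
def plan_kdump_options(device, options, server_export=None):
    """Return the kdump options to use for one planned test instance.

    device is the logical device the test was planned on ("local"
    or "nfs"); options is the parsed /etc/kdump.conf contents.
    """
    planned = dict(options)
    if device == "local":
        planned.pop("net", None)  # force a local disk dump
    elif device == "nfs":
        # Keep an existing net setting; otherwise point at the
        # v7 test server's export directory.
        planned.setdefault("net", server_export)
    return planned
```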
Created attachment 492893 [details]
R28 configfile.py patch to allow reset/removal of parameters
*** Bug 700846 has been marked as a duplicate of this bug. ***

Created attachment 495812 [details]
reboot test log
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
v7 1.3 adds a new feature to the reboot test: it verifies the vmcore kdump image after the reboot finishes. This feature requires the matching kernel debuginfo package and the crash utility to be installed.
------- Comment From lxie.com 2011-05-04 09:04 EDT-------
Red Hat, when will you be able to provide the fix in the next HTS build for IBM to test?

------- Comment From markwiz.com 2011-05-04 13:36 EDT-------
Red Hat - We have the -43 build. Can we assume it is fixed in it?
IBM - Yes.

------- Comment From lxie.com 2011-05-06 02:14 EDT-------
(In reply to comment #16)
> IBM - Yes.
Hien, when you get a chance, please verify build -43 and post the test results asap. Thanks, Linda

------- Comment From hienn.com 2011-05-06 16:40 EDT-------
Re-ran the reboot test with v7-1.3-43. The reboot test:
- passed on NFS
- failed on local
See attachment.

Created attachment 497458 [details]
output.log of the reboot test
------- Comment (attachment only) From hienn.com 2011-05-06 16:41 EDT-------
From the attached log:

Looking for vmcore image directories under /var/crash
127.0.0.1-2011-05-05-20:27:24 took 0:00:00
127.0.0.1-2011-04-28-21:52:21 took -7 days, 1:25:00
Found kdump image: /var/crash/127.0.0.1-2011-05-05-20:27:24/vmcore
crash: /usr/lib/debug/lib/modules/2.6.32-71.el6.ppc64/vmlinux: No such file or directory

It seems likely that the rpms for kernel-debug and/or kernel-debuginfo are not installed. Could you check this?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0497.html

------- Comment From lxie.com 2011-05-11 14:20 EDT-------
Red Hat, we tested version '-43': the reboot NFS test passed, but the reboot local test failed. See below for the test run done by Hien. So please reopen this bug.

I re-checked and reran the test. kernel-debuginfo is there. The results were the same as last time.
[root@eagle3 ~]# rpm -qa|grep v7
v7-1.3-43.el6.noarch
[root@eagle3 ~]# rpm -qa|grep kernel-debug
kernel-debug-devel-2.6.32-71.el6.ppc64
kernel-debug-2.6.32-71.el6.ppc64
kernel-debuginfo-2.6.32-71.24.1.el6.ppc64
kernel-debug-debuginfo-2.6.32-71.24.1.el6.ppc64
kernel-debuginfo-common-ppc64-2.6.32-71.24.1.el6.ppc64
[root@eagle3 ~]# uname -a
Linux eagle3.ltc.austin.ibm.com 2.6.32-71.el6.ppc64 #1 SMP Wed Sep 1 02:56:55 EDT 2010 ppc64 ppc64 ppc64 GNU/Linux
[root@eagle3 ~]# v7 print
Red Hat Hardware Certification test
--------------------------------------------
Test Suite: 1.3 Release: 43
Plan Created: 2011-05-05 21:22:08
Test Server: 10.1.1.29
--------------------------------------------
Run: 1 on 2011-05-05 21:29:11
--------------------------------------------
Tests: 10 planned, 3 run, 3 passed, 0 failed
--------------------------------------------
Test Run 1
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 - PASS
network eth0 net_00_21_5e_bf_19_64 - PASS
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs -
reboot local -
Run: 2 on 2011-05-05 21:42:10
--------------------------------------------
Tests: 10 planned, 5 run, 5 passed, 0 failed
--------------------------------------------
Test Run 2
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory - PASS
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C - PASS
core - PASS
profiler - PASS
info - PASS
reboot nfs -
reboot local -
Run: 3 on 2011-05-06 01:17:23
--------------------------------------------
Tests: 10 planned, 3 run, 2 passed, 1 failed
--------------------------------------------
Test Run 3
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs - PASS
reboot local - FAIL
Run: 4 on 2011-05-09 21:27:49
--------------------------------------------
Tests: 10 planned, 3 run, 2 passed, 1 failed
--------------------------------------------
Test Run 4
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs - PASS
reboot local - FAIL
Combined Results for 4 Runs:
--------------------------------------------
10 tests planned
9 tests run
1 tests always failed
8 tests always passed
[root@eagle3 ~]#

Please attach the results of rpm -ql kernel-debuginfo, and the file /var/v7/results.xml

Created attachment 498574 [details]
output of rpm -ql kernel-debuginfo
------- Comment (attachment only) From hienn.com 2011-05-12 11:59 EDT-------
------- Comment From hienn.com 2011-05-12 12:05 EDT-------
The file results.xml is larger than 4000KB (6136KB), so the attachment couldn't get through. I will clean up the v7 run logs and rerun only the reboot test, then attach the results.xml file later.

Created attachment 498577 [details]
file /var/v7/results.xml
------- Comment (attachment only) From hienn.com 2011-05-12 12:16 EDT-------
There are a couple of problems here:
1) There is no v7 test server for the nfs reboot/kdump test - please set the --server option to the v7 test server host.
2) The kernel-debuginfo release does not match the kernel release:
   kernel: 2.6.32-71.el6.ppc64
   kernel-debuginfo: 2.6.32-71.24.1.el6.ppc64
   Please install a matching kernel-debuginfo package.

------- Comment From hienn.com 2011-05-12 19:40 EDT-------
Applied the correct kernel-debuginfo:

[root@eagle3 v7]# rpm -qa|grep kernel-debug
kernel-debug-devel-2.6.32-71.el6.ppc64
kernel-debuginfo-2.6.32-71.el6.ppc64
kernel-debug-debuginfo-2.6.32-71.el6.ppc64
kernel-debuginfo-common-ppc64-2.6.32-71.el6.ppc64
kernel-debug-2.6.32-71.el6.ppc64
[root@eagle3 v7]#

The reboot test failed on nfs:

[root@eagle3 v7]# v7 print
Red Hat Hardware Certification test
--------------------------------------------
Test Suite: 1.3 Release: 43
Plan Created: 2011-05-13 03:49:22
Test Server: 10.1.1.29
--------------------------------------------
Run: 1 on 2011-05-13 03:50:31
--------------------------------------------
Tests: 10 planned, 3 run, 2 passed, 1 failed
--------------------------------------------
Test Run 1
----------------------------------------------------------------
usb -
network eth1 net_00_21_5e_bf_19_65 -
network eth0 net_00_21_5e_bf_19_64 -
memory -
storage sdb storage_serial_1IBM_IPR_0_0130A201AF6CAD0C -
core -
profiler -
info - PASS
reboot nfs - FAIL
reboot local - PASS
Combined Results for 1 Runs:
--------------------------------------------
10 tests planned
3 tests run
1 tests always failed
2 tests always passed
[root@eagle3 v7]#

Created attachment 498652 [details]
results of reboot test after install correct kernel-debuginfo
------- Comment (attachment only) From hienn.com 2011-05-12 19:43 EDT-------
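The debuginfo mismatch diagnosed earlier (kernel 2.6.32-71.el6 vs kernel-debuginfo 2.6.32-71.24.1.el6) reduces to a release-string comparison; a sketch with an illustrative helper, not part of v7:

```python
def debuginfo_matches(kernel_release, debuginfo_nvr):
    """True if a kernel-debuginfo package name matches `uname -r`.

    crash needs /usr/lib/debug/lib/modules/<release>/vmlinux, which
    only exists when the debuginfo release equals the running kernel
    release exactly.
    """
    prefix = "kernel-debuginfo-"
    return (debuginfo_nvr.startswith(prefix)
            and debuginfo_nvr[len(prefix):] == kernel_release)

# The failing pair from this bug: 71.el6 kernel vs 71.24.1.el6 debuginfo
ok = debuginfo_matches("2.6.32-71.el6.ppc64",
                       "kernel-debuginfo-2.6.32-71.24.1.el6.ppc64")  # False
```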
Is the v7 test server running on 10.1.1.29? You can check this with "v7 server status" on 10.1.1.29. Thanks!

------- Comment From hienn.com 2011-05-13 12:59 EDT-------
Of course, the v7 server is running on 10.1.1.29 and the two systems can communicate with each other.

[root@jeep01 tmp]# v7 server status
Tested OS: Red Hat Enterprise Linux Server 6 (Santiago)
Kernel RPM: kernel-2.6.32-71.el6
v7 version 1.3, release 43
make server RUNMODE=status
python network.py server status
lmbench bw_tcp is running
lmbench lat_udp is running
/var/v7/export is exported
rpc.svcgssd is stopped
rpc.mountd (pid 32209) is running...
nfsd (pid 32206 32205 32204 32203 32202 32201 32200 32199) is running...
rpc.rquotad (pid 32193) is running...
httpd (pid 32239) is running...
The v7 server daemon is running

[root@jeep01 tmp]# rpm -qa|grep kernel-debug
kernel-debug-debuginfo-2.6.32-71.el6.ppc64
kernel-debuginfo-2.6.32-71.el6.ppc64
kernel-debuginfo-common-ppc64-2.6.32-71.el6.ppc64

[root@jeep01 tmp]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:1A:64:D8:17:90
inet addr:10.1.1.29 Bcast:10.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::21a:64ff:fed8:1790/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8674589 errors:0 dropped:0 overruns:0 frame:0
TX packets:375535 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9836050158 (9.1 GiB) TX bytes:54048539826 (50.3 GiB)

[root@jeep01 tmp]# ping -c 3 -R 10.1.1.30
PING 10.1.1.30 (10.1.1.30) 56(124) bytes of data.
64 bytes from 10.1.1.30: icmp_seq=1 ttl=64 time=13.2 ms
RR: 10.1.1.29
10.1.1.30
10.1.1.30
10.1.1.29
64 bytes from 10.1.1.30: icmp_seq=2 ttl=64 time=0.042 ms (same route)
64 bytes from 10.1.1.30: icmp_seq=3 ttl=64 time=0.041 ms (same route)
--- 10.1.1.30 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.041/4.454/13.281/6.241 ms
[root@jeep01 tmp]# ping -c 3 -R 10.1.1.31
PING 10.1.1.31 (10.1.1.31) 56(124) bytes of data.
64 bytes from 10.1.1.31: icmp_seq=1 ttl=64 time=13.4 ms
RR: 10.1.1.29
10.1.1.31
10.1.1.31
10.1.1.29
64 bytes from 10.1.1.31: icmp_seq=2 ttl=64 time=0.042 ms (same route)
64 bytes from 10.1.1.31: icmp_seq=3 ttl=64 time=0.038 ms (same route)
--- 10.1.1.31 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.038/4.496/13.408/6.301 ms
[root@jeep01 tmp]#

------- Comment From lxie.com 2011-05-19 10:01 EDT-------
(In reply to comment #33)
Red Hat, just wanted to clarify/confirm that the latest test results above were from testing v7-1.3-43.el6.noarch.rpm, which is the same as the version listed in the ERRATA (http://rhn.redhat.com/errata/RHBA-2011-0497.html). So the ERRATA didn't fix this issue. Please reopen this bug on your side. Thanks, Linda

I still suspect issues in either communication with the server, or disk space on the server.
1) How much space is available on /var/v7/export/v7-net/var/crash on the server? Does this directory exist, and are there any core files?
2) Could you try removing the path setting from /etc/kdump.conf (apparently it's set to v7-net/var/crash), and rerun the test with --device nfs?
3) Is selinux set to permissive on the server?
4) Please attach /etc/kdump.conf after running the test.

I'd suggest we open a new bug for the issue in comment 33, as this bug is for verification of kdump-generated cores, and the current IBM issue is likely different. I'll close this bug again, and open a new one to track that issue.
Created BZ 706115 for the current issue. Closing this FEAT bug.

------- Comment From hienn.com 2011-06-01 14:34 EDT-------
Follow up on 72167. I am closing this bug.

*** This bug has been marked as a duplicate of bug 72167 ***