Bug 368941 - mkdumprd gives error to stderr with nfs root
Summary: mkdumprd gives error to stderr with nfs root
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Neil Horman
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 368981
Depends On:
Blocks: 426293
 
Reported: 2007-11-06 21:33 UTC by Scott Moser
Modified: 2018-10-19 22:34 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-12 17:19:14 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to remove nfs root detection from mkdumprd (1.25 KB, patch)
2007-12-18 19:34 UTC, Neil Horman
correct patch (1.45 KB, patch)
2007-12-18 22:12 UTC, Neil Horman
patch to enable transparent NFS root on kdump (6.72 KB, patch)
2008-05-27 16:55 UTC, Neil Horman
new patch to enable transparent NFS root on kdump (6.79 KB, patch)
2008-06-03 14:42 UTC, Neil Horman
console log of booting the kdump kernel after a triggered crash (8.00 KB, text/plain)
2008-06-04 15:25 UTC, IBM Bug Proxy
patch to remove nfs root detection from mkdumprd (1.25 KB, text/plain)
2008-08-03 04:35 UTC, IBM Bug Proxy
correct patch (1.45 KB, application/octet-stream; charset=ISO-8859-1)
2008-08-03 04:35 UTC, IBM Bug Proxy
patch to enable transparent NFS root on kdump (6.72 KB, text/plain)
2008-08-03 04:35 UTC, IBM Bug Proxy
new patch to enable transparent NFS root on kdump (6.79 KB, text/plain)
2008-08-03 04:35 UTC, IBM Bug Proxy
console log of booting the kdump kernel after a triggered crash (8.00 KB, text/plain)
2008-08-03 04:36 UTC, IBM Bug Proxy


Links
System                       ID     Private  Priority  Status  Summary  Last Updated
IBM Linux Technology Center  45210  0        None      None    None     Never

Description Scott Moser 2007-11-06 21:33:37 UTC
Description of problem:

On my QS21 set up to boot with NFS root, mkdumprd prints an error message to
stderr.  It exits with 0 (success), but because 'handlenetdev' is never actually
called, the resulting initrd is unlikely to work.

[root@ibm-qs21-01 ~]# rpm -q kexec-tools
kexec-tools-1.101-194.4.el5
[root@ibm-qs21-01 ~]# grep nfs /etc/fstab
192.168.79.232:/exports/ibm-qs21-01 /    nfs     defaults        1 1
[root@ibm-qs21-01 ~]# touch /etc/kdump.conf 
[root@ibm-qs21-01 ~]# /etc/init.d/kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-54.el5.rhel5u2.sm4kdump.img
/sbin/mkdumprd: line 1425: handle_netdev: command not found
Starting kdump:[  OK  ]
[root@ibm-qs21-01 ~]# service kdump status
Kdump is operational


The system is 'ibm-qs21-01' if you need access to it, or want me to test
anything, please feel free to ask.
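
For illustration, a minimal sketch of why a misspelled function call in a bash script produces exactly this kind of message while the script still exits with 0. The script below is a hypothetical stand-in, not code from mkdumprd; only the two spellings of the function name come from the report.

```shell
#!/bin/bash
# Hypothetical stand-in for a build script. Only the names handlenetdev and
# handle_netdev come from the report; everything else is made up.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/bash
handlenetdev() {
    echo "configuring $1"
}
handle_netdev eth0
echo "image built"
EOF

# The misspelled call fails with status 127, but without 'set -e' bash keeps
# going, so the script's exit status is that of the final echo.
output=$(bash "$demo_script" 2>&1)
status=$?
echo "$output"
echo "exit status: $status"
rm -f "$demo_script"
```

Because the last command succeeds, the caller (here the kdump init script) has no way to notice the failure from the exit status alone.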

Comment 1 Neil Horman 2007-12-18 19:34:31 UTC
Created attachment 289925 [details]
patch to remove nfs root detection from mkdumprd

Ugh, this is a holdover from when we forked mkdumprd from mkinitrd.  kexec
shouldn't care in the least about nfs root.  There's just some leftover code
that tries to automatically mount the root file system over nfs.  This patch
should fix it.  Mind you, this does require that you explicitly configure a dump
target in /etc/kdump.conf

Comment 2 Scott Moser 2007-12-18 21:43:18 UTC
I'm confused.  The problem described above is that mkdumprd tries to call
'handle_netdev'.  handle_netdev is not a program (or function).  However,
'handlenetdev' is a function defined in the program.

So the fix is:  sed -i 's/handle_netdev/handlenetdev/' /sbin/mkdumprd

Aside from that, I expected kdump to work correctly on nfs root with no
configuration.  My expectation was that it would follow the same path it does
on other root file system types:
 - determine the filesystem type and what modules are needed to mount it
 - create an initrd that mounts the root filesystem
 - dump to /var/crash/date...

I guess that might be asking a bit more than you'd like to provide.
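
The first step in that list can be sketched as follows. This is a minimal illustration that assumes the root entry is read from an fstab-style file; the helper name root_fs_info is hypothetical and is not how mkdumprd is known to do it. The sample line is the fstab entry quoted in the description.

```shell
# root_fs_info FSTAB: print "<device> <fstype>" for the entry mounted on /.
# Hypothetical helper, not part of mkdumprd.
root_fs_info() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1, $3; exit }' "$1"
}

sample_fstab=$(mktemp)
cat > "$sample_fstab" <<'EOF'
# sample based on the fstab line quoted in the report
192.168.79.232:/exports/ibm-qs21-01 /    nfs     defaults        1 1
EOF

# Split the helper's output into device and filesystem type.
read -r root_dev root_type <<INFO
$(root_fs_info "$sample_fstab")
INFO
rm -f "$sample_fstab"

case "$root_type" in
    nfs|nfs4) echo "root is NFS export $root_dev: initrd needs networking and NFS support" ;;
    *)        echo "root is $root_dev ($root_type)" ;;
esac
```

Detecting the type is the easy part; as the following comments explain, generating an init script that actually mounts the export and dumps to it is where the real work lies.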


Comment 3 Neil Horman 2007-12-18 22:00:30 UTC
"So the fix is:  sed -i 's/handle_netdev/handlenetdev/' /sbin/mkdumprd"
Not really, I've not tested the code that gets generated if you do that.  Given
the extent of the other changes that have been made to mkdumprd from mkinitrd,
the safest (best) thing to do is to remove the nfs root setup code entirely, as
I'm positive it doesn't generate an init script that works with the rest of the
current dump capture setup in the initramfs that mkdumprd generates.  If you
want to dump to your nfs root file system, the solution is direct: add nfs <nfs
server:/path spec> to /etc/kdump.conf.
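
As a rough sketch of that configuration step, the snippet below writes such a line to a scratch file rather than to the real /etc/kdump.conf. The directive keyword is copied verbatim from the sentence above; whether a given kexec-tools release accepts that exact keyword is an assumption to check against its kdump.conf documentation. The export is the root export from the fstab line in the description.

```shell
# Scratch file standing in for /etc/kdump.conf, so nothing real is modified.
kdump_conf=$(mktemp)

# Assumption: dump to the same export that serves as the root file system.
nfs_target="192.168.79.232:/exports/ibm-qs21-01"

# Keyword "nfs" as written in the comment above; verify it for your release.
printf 'nfs %s\n' "$nfs_target" >> "$kdump_conf"

conf_line=$(cat "$kdump_conf")
echo "$conf_line"
rm -f "$kdump_conf"
```

On a real system the kdump service would then need a restart so the initrd is rebuilt, as the "/etc/init.d/kdump restart" output in the description shows.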

"I guess that might be asking a bit more than you'd like to provide"
No, I'd love to provide transparent dumping to nfs roots.  I agree that we
should handle an nfs root filesystem just like any other root file system, but
doing that is not as simple as you make it out to be with your proposed fix
above (as your bz 368981 indicates, fixing it your way produces an initramfs
that loads, but doesn't dump properly).  There's a good deal more work to do than
what you suggest, and it would likely be something to slate for 5.3.  The
solution I've provided above gives you all the same abilities, as long as you
configure kdump appropriately (which our configuration guides strongly suggest
you do anyway, given that you can't rely on the integrity of the root file
system after a crash).

If you'd like to open an RFE to add nfs root support to kdump, I'll happily work
on it, but the above patch will fix the bug you're reporting here.

Comment 4 Neil Horman 2007-12-18 22:12:25 UTC
Created attachment 289946 [details]
correct patch

Comment 5 Andrew Hecox 2008-02-15 19:43:34 UTC
This patch resolved the issue for us; in our case, we were dumping via ssh.
Without the patch, ifup failed due to the duplicate entry in (the initrd's)
/etc/network/interfaces file; with the patch, the kdump worked correctly.

Any timeline for inclusion? I realize the OP wanted to dump to NFS via the file
path, but simply having any working mechanism, as this patch provides, is a much
higher priority for the customers we're working with.

Thanks.

Comment 6 Neil Horman 2008-02-16 01:54:42 UTC
5.3 is the timeframe

Comment 7 Andrew Hecox 2008-02-18 13:13:29 UTC
Thanks.

Comment 8 Neil Horman 2008-03-07 19:08:29 UTC
*** Bug 368981 has been marked as a duplicate of this bug. ***

Comment 10 Neil Horman 2008-04-14 16:41:53 UTC
Um, why did IBM close this?  I haven't fixed it yet.

Comment 11 Sam Knuth 2008-04-14 16:45:24 UTC
Neil - I'm sorry, the IT was actually opened for a separate issue (system was
crashing) and we uncovered the kdump problem as a result of that. So, IBM is
closing the original issue (the project has moved on - they just aren't going to
be hitting it anymore). We still want the kdump thing fixed but that was never
the intent of the issue on the customer side.

Do you need an IT opened? If so, I can ask the IBM tam to do that.

Comment 12 Neil Horman 2008-04-14 16:49:55 UTC
It's fine, I don't need it.  I just didn't want IBM thinking this problem was
solved only to find out sometime later that it wasn't.

Comment 13 Kevin Krafthefer 2008-05-22 16:05:13 UTC
This RFE has been reviewed during the RHEL RFE review with Red Hat product
management. This request has been *tentatively* approved for inclusion in the
next update. This decision is not final and still pends further technical
review and scoping by Red Hat development engineering.

Comment 14 Neil Horman 2008-05-27 16:55:38 UTC
Created attachment 306796 [details]
patch to enable transparent NFS root on kdump

Ok, this patch includes the patch from comment 4, which removes the old nfs root
generation code, and adds what I think is appropriate logic to detect and mount
NFS root devices.  I don't have an appropriate system set up here, but if you
could test it and provide your thumbs up, I'll get it in for 5.3.  Thanks!

Comment 15 Scott Moser 2008-05-27 17:30:49 UTC
Adding brad to CC.  He is the on-site representative at this point.  I am no
longer working on this.


Comment 16 IBM Bug Proxy 2008-06-03 12:48:57 UTC
------- Comment From jroth.com 2008-06-03 08:44 EDT-------
I was able to patch mkdumprd even though the patch couldn't be cleanly applied to
the latest src.rpm version: kexec-tools-1.102pre-21.el5.src.rpm

[root@localhost ~]# touch /etc/kdump.conf
[root@localhost ~]# /etc/init.d/kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):

/etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-92.el5kdump.img
ls: /etc/ld.so.conf.d/*: No such file or directory
awk: cmd. line:1: {print $5
awk: cmd. line:1:          ^ unexpected newline or end of string
Starting kdump:                                            [  OK  ]

[root@localhost ~]# /etc/init.d/kdump status
Kdump is operational

Comment 17 Neil Horman 2008-06-03 14:42:11 UTC
Created attachment 308244 [details]
new patch to enable transparent NFS root on kdump

I've checked in several other changes since the 5.2 release, so it probably does
need some massaging into -21.el5.  I've added the missing bracket for you to
continue testing.  Thanks
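
For reference, the awk error in comment 16 is what an unterminated action block looks like. The sketch below reproduces that class of error and the corrected form on a made-up input line; the actual input mkdumprd feeds to awk is not shown in this report, so sample_line is purely hypothetical.

```shell
# Hypothetical input; the real data processed by mkdumprd is not known here.
sample_line="one two three four five six"

# Missing closing brace: awk rejects the program and exits non-zero.
broken_out=$(echo "$sample_line" | awk '{print $5' 2>&1)
broken_status=$?

# With the brace restored, awk prints the fifth whitespace-separated field.
fixed_out=$(echo "$sample_line" | awk '{print $5}')
fixed_status=$?

echo "broken program: exit $broken_status"
echo "fixed program:  exit $fixed_status, output '$fixed_out'"
```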

Comment 18 IBM Bug Proxy 2008-06-03 16:00:51 UTC
------- Comment From jroth.com 2008-06-03 11:58 EDT-------
Looks good now, but couldn't the warning be suppressed by using "ls
/etc/ld.so.conf.d/" instead of "ls /etc/ld.so.conf.d/*"?

[root@localhost ~]# service kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):

/etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-92.el5kdump.img
ls: /etc/ld.so.conf.d/*: No such file or directory
Starting kdump:                                            [  OK  ]

[root@localhost ~]# service kdump status
Kdump is operational
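
The difference between the two ls forms can be demonstrated on an empty scratch directory. This is only a sketch of the shell behaviour behind the warning, assuming default globbing options (no nullglob or failglob); it says nothing about how mkdumprd uses the listing, and the guarded loop at the end is one possible alternative, not the fix that was committed.

```shell
empty_dir=$(mktemp -d)

# An unmatched glob is passed to ls literally, so ls complains and fails.
ls "$empty_dir"/* >/dev/null 2>&1
glob_status=$?

# Listing the directory itself succeeds and simply prints nothing.
ls "$empty_dir"/ >/dev/null 2>&1
dir_status=$?

echo "ls with unmatched glob: exit $glob_status"
echo "ls on the directory:    exit $dir_status"

# Alternative: iterate over the glob and skip the literal, unmatched pattern.
matched=0
for f in "$empty_dir"/*; do
    [ -e "$f" ] || continue
    matched=$((matched + 1))
done
echo "files found: $matched"
rmdir "$empty_dir"
```

Note that the two forms are not interchangeable when files do exist: the glob form yields full paths, while listing the directory yields bare file names.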

Comment 19 Neil Horman 2008-06-03 16:59:48 UTC
The warning is fixed as one of the updates I did after the 5.2 release.

As for testing, I see you managed to start the kdump service.  Have you crashed
the system to see if it by default properly mounts the root file system via NFS?

Comment 20 IBM Bug Proxy 2008-06-04 09:56:52 UTC
------- Comment From jroth.com 2008-06-04 05:54 EDT-------
Yes, I tried, but unfortunately the same problem as in bug #426293 occurs. The
system reboots right after the message "Freeing unused kernel memory: 320k
freed".

Right now I'm building the latest kernel with sys_open commented out, as
suggested in bug #426293.

Comment 21 Neil Horman 2008-06-04 10:54:31 UTC
Ok, copy that.  If you can get this to work with the sys_open commented out, I
can commit this.

Comment 22 IBM Bug Proxy 2008-06-04 15:25:15 UTC
Created attachment 308358 [details]
console log of booting the kdump kernel after a triggered crash

I was able to boot the kdump kernel up to the point where the tg3 network
module is loaded. See bug #426293.

Comment 23 Neil Horman 2008-06-04 15:29:54 UTC
Ok, that's a start.
Unfortunately only booting to that point isn't enough to verify that this patch
works.  I've tested in non-nfs root environments, so we should be safe against
regressions here, but I'd really rather confirm that this works properly in the
nfs case.  We'll just have to tackle bz426293.

Comment 24 IBM Bug Proxy 2008-08-03 04:35:24 UTC

Comment 25 IBM Bug Proxy 2008-08-03 04:35:37 UTC
Created attachment 313279 [details]
patch to remove nfs root detection from mkdumprd

Comment 26 IBM Bug Proxy 2008-08-03 04:35:44 UTC
Created attachment 313280 [details]
correct patch

Comment 27 IBM Bug Proxy 2008-08-03 04:35:51 UTC
Created attachment 313281 [details]
patch to enable transparent NFS root on kdump

Comment 28 IBM Bug Proxy 2008-08-03 04:35:59 UTC
Created attachment 313282 [details]
new patch to enable transparent NFS root on kdump

Comment 29 IBM Bug Proxy 2008-08-03 04:36:05 UTC
Created attachment 313283 [details]
console log of booting the kdump kernel after a triggered crash

Comment 30 Neil Horman 2008-08-03 16:09:44 UTC
What just happened here? All the comments on the bug were just copied, all the attachments cloned, and the state changed back to assigned.  There is no new information here, yet I seem unable to move the bug to the needinfo state.

Comment 32 IBM Bug Proxy 2008-11-13 13:33:45 UTC
Hello Red Hat,
fyi ... with the RHEL5.3 Snapshot 2 I am now closing this bugzilla and we will
address remaining issues in the RHEL5.4 timeframe based on the RHEL5.3
deliverable.
Please keep me informed in case of any questions.
Thanks for your support.

Comment 33 Neil Horman 2009-05-17 18:03:05 UTC
Does that mean I should close this bz as well? Or did you want to keep it open still?

