Description of problem:
On my QS21 set up to boot with NFS root, mkdumprd prints an error message to stderr. It exits with 0 (success), but because 'handlenetdev' is never actually called, it's unlikely to work.

[root@ibm-qs21-01 ~]# rpm -q kexec-tools
kexec-tools-1.101-194.4.el5
[root@ibm-qs21-01 ~]# grep nfs /etc/fstab
192.168.79.232:/exports/ibm-qs21-01 / nfs defaults 1 1
[root@ibm-qs21-01 ~]# touch /etc/kdump.conf
[root@ibm-qs21-01 ~]# /etc/init.d/kdump restart
Stopping kdump:[ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-54.el5.rhel5u2.sm4kdump.img
/sbin/mkdumprd: line 1425: handle_netdev: command not found
Starting kdump:[ OK ]
[root@ibm-qs21-01 ~]# service kdump status
Kdump is operational

The system is 'ibm-qs21-01'. If you need access to it, or want me to test anything, please feel free to ask.
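For readers wondering how a script can print "command not found" and still exit with 0: a minimal sketch of the general shell behaviour, not mkdumprd's actual code. The function body and the `build_image` wrapper are hypothetical; only the two function names come from the report.

```shell
#!/bin/bash
# Hypothetical stand-in for the function that mkdumprd defines.
handlenetdev() {
    echo "configuring $1"
}

build_image() {
    typo_status=0
    # Misspelled call: the shell reports "command not found" (status 127),
    # but without `set -e` the script simply carries on.
    handle_netdev eth0 2>/dev/null || typo_status=$?
    echo "image built"
    return 0
}

build_image
final_status=$?
echo "typo status: $typo_status, final status: $final_status"
```

The failed call is recorded only in that one status value, so the caller sees success unless the script checks each call or runs with `set -e`.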
Created attachment 289925 [details]
patch to remove nfs root detection from mkdumprd

Ugh, this is a holdover from when we forked mkdumprd from mkinitrd. kexec shouldn't care in the least about nfs root. There's just some leftover code that tries to automatically mount the root file system over nfs. This patch should fix it. Mind you, this does require that you explicitly configure a dump target in /etc/kdump.conf.
I'm confused. The problem described above is that mkdumprd tries to call 'handle_netdev'. handle_netdev is not a program (or function). However, 'handlenetdev' is a function defined in the program. So the fix is:

sed -i 's/handle_netdev/handlenetdev/' /sbin/mkdumprd

Aside from that, I expected kdump to work correctly on nfs root with no configuration. My expectation was that it would follow the same path it does on other root file system types:
- determine the filesystem type and what modules are needed to mount it
- create initrd that mounts the root filesystem
- dump to /var/crash/date...

I guess that might be asking a bit more than you'd like to provide.
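The first step in that list (working out what kind of root filesystem is in use) could be sketched roughly as follows. This is an illustration only, not what mkdumprd does; the `root_field` helper, the temporary file, and the tmpfs line are assumptions, and only the NFS fstab entry is taken from the report.

```shell
#!/bin/bash
# Print one field of the root ("/") entry from an fstab-style file.
# $1 = path to the file, $2 = field number (1 = device, 3 = fs type).
root_field() {
    awk -v n="$2" '$1 !~ /^#/ && $2 == "/" { print $n; exit }' "$1"
}

# Sample fstab; only the NFS line comes from the report above.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# sample entries for illustration
192.168.79.232:/exports/ibm-qs21-01 / nfs defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
EOF

root_type=$(root_field "$fstab" 3)
root_dev=$(root_field "$fstab" 1)
rm -f "$fstab"

# A network root needs network setup in the initrd before it can be mounted.
case "$root_type" in
    nfs|nfs4) root_kind="network" ;;
    *)        root_kind="local" ;;
esac
echo "root is $root_dev ($root_type, $root_kind)"
```

Detection is the easy part; as the following comments point out, generating an init script that then brings up the network and mounts the export inside the kdump initramfs is where the real work lies.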
"So the fix is: sed -i 's/handle_netdev/handlenetdev/' /sbin/mkdumprd"

Not really, I've not tested the code that gets generated if you do that. Given the extent of the other changes that have been made to mkdumprd since it was forked from mkinitrd, the safest (best) thing to do is to remove the nfs root setup code entirely, as I'm positive it doesn't generate an init script that works with the rest of the current dump capture setup in the initramfs that mkdumprd generates. If you want to dump to your nfs root file system, the solution is direct: add nfs <nfs server:/path spec> to /etc/kdump.conf.

"I guess that might be asking a bit more than you'd like to provide"

No, I'd love to provide transparent dumping to nfs roots. I agree that we should handle an nfs root filesystem just like any other root file system, but doing that is not as simple as you make it out to be with your proposed fix above (as your bz 368981 indicates, fixing it your way produces an initramfs that loads, but doesn't dump properly). There's a good deal more work to do than what you suggest, and it would likely be something to slate for 5.3. The solution I've provided above gives you all the same abilities, as long as you configure kdump appropriately (which our configuration guides strongly suggest you do anyway, given that you can't rely on the integrity of the root file system after a crash). If you'd like to open an RFE to add nfs root support to kdump, I'll happily work on it, but the above patch will fix the bug you're reporting here.
Created attachment 289946 [details] correct patch
This patch resolved the issue for us; in our case, we were dumping via ssh. Without the patch, ifup failed due to the duplicate entry in (the initrd's) /etc/network/interfaces file; with the patch, the kdump worked correctly. Any timeline for inclusion? I realize the OP wanted to dump to NFS via the file path, but simply having any working mechanism, as this patch provides, is a much higher priority for the customers we're working with. Thanks.
5.3 is the timeframe
Thanks.
*** Bug 368981 has been marked as a duplicate of this bug. ***
Um, why did IBM close this? I haven't fixed it yet.
Neil - I'm sorry, the IT was actually opened for a separate issue (system was crashing) and we uncovered the kdump problem as a result of that. So, IBM is closing the original issue (the project has moved on - they just aren't going to be hitting it anymore). We still want the kdump thing fixed but that was never the intent of the issue on the customer side. Do you need an IT opened? If so, I can ask the IBM TAM to do that.
It's fine, I don't need it. I just didn't want IBM thinking this problem was solved, only to find out sometime later that it wasn't.
This RFE has been reviewed during the RHEL RFE review with Red Hat product management. This request has been *tentatively* approved for inclusion in the next update. This decision is not final and still pends further technical review and scoping by Red Hat development engineering.
Created attachment 306796 [details]
patch to enable transparent NFS root on kdump

Ok, this patch includes the patch from comment 4, which removes the old nfs root generation code, and adds what I think is appropriate logic to detect and mount NFS root devices. I don't have an appropriate system set up here, but if you could test it and provide your thumbs up, I'll get it in for 5.3. Thanks!
Adding brad to CC. He is the on-site representative at this point. I am no longer working on this.
------- Comment From jroth.com 2008-06-03 08:44 EDT-------
I was able to patch mkdumprd, even though the patch couldn't be applied cleanly to the latest src.rpm version: kexec-tools-1.102pre-21.el5.src.rpm

[root@localhost ~]# touch /etc/kdump.conf
[root@localhost ~]# /etc/init.d/kdump restart
Stopping kdump: [ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-92.el5kdump.img
ls: /etc/ld.so.conf.d/*: No such file or directory
awk: cmd. line:1: {print $5
awk: cmd. line:1: ^ unexpected newline or end of string
Starting kdump: [ OK ]
[root@localhost ~]# /etc/init.d/kdump status
Kdump is operational
Created attachment 308244 [details]
new patch to enable transparent NFS root on kdump

I've checked in several other changes since the 5.2 release, so it probably does need some massaging into -21.el5. I've added the missing bracket so you can continue testing. Thanks
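The awk message in the log above is what an unclosed action block produces. A small sketch of the difference, using a made-up input line rather than whatever mkdumprd actually pipes into awk:

```shell
#!/bin/bash
# Made-up sample input; the real data mkdumprd feeds to awk is not shown in this bug.
sample="a b c d e f"

# Closed action block: prints the fifth whitespace-separated field.
fifth=$(echo "$sample" | awk '{print $5}')

# Unclosed action block, as in the log: awk reports a syntax error
# and exits with a non-zero status without printing anything.
broken_status=0
echo "$sample" | awk '{print $5' >/dev/null 2>&1 || broken_status=$?

echo "fifth field: $fifth, broken awk status: $broken_status"
```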
------- Comment From jroth.com 2008-06-03 11:58 EDT-------
Looks good now, but couldn't the warning be suppressed by using "ls /etc/ld.so.conf.d/" instead of "ls /etc/ld.so.conf.d/*"?

[root@localhost ~]# service kdump restart
Stopping kdump: [ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-92.el5kdump.img
ls: /etc/ld.so.conf.d/*: No such file or directory
Starting kdump: [ OK ]
[root@localhost ~]# service kdump status
Kdump is operational
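The warning appears because an unmatched glob is passed to ls as the literal string "/etc/ld.so.conf.d/*". One way to avoid it without calling ls at all is sketched below; the `list_conf_files` helper and the temporary directories are hypothetical, and this is not necessarily how the warning was fixed in kexec-tools.

```shell
#!/bin/bash
# Print every entry in a directory, quietly printing nothing if it is empty.
list_conf_files() {
    dir=$1
    for f in "$dir"/*; do
        # With no matches the pattern stays literal; skip that non-existent path.
        [ -e "$f" ] || continue
        echo "$f"
    done
}

# Stand-ins for /etc/ld.so.conf.d, one empty and one populated.
empty_dir=$(mktemp -d)
full_dir=$(mktemp -d)
touch "$full_dir/a.conf" "$full_dir/b.conf"

empty_count=$(list_conf_files "$empty_dir" | wc -l)
full_count=$(list_conf_files "$full_dir" | wc -l)
rm -rf "$empty_dir" "$full_dir"

echo "empty: $empty_count entries, populated: $full_count entries"
```

Listing the directory itself, as suggested above, also silences the warning, but it prints bare file names rather than full paths, so any code consuming the output would need adjusting.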
The warning was fixed in one of the updates I made after the 5.2 release. As for testing, I see you managed to start the kdump service. Have you crashed the system to see whether it properly mounts the root file system via NFS by default?
------- Comment From jroth.com 2008-06-04 05:54 EDT-------
Yes, I tried, but unfortunately the same problem as in bug #426293 occurs. The system reboots right after the message "Freeing unused kernel memory: 320k freed". Right now I'm building the latest kernel with sys_open commented out, as suggested in bug #426293.
Ok, copy that. If you can get this to work with the sys_open commented out, I can commit this.
Created attachment 308358 [details]
console log of booting the kdump kernel after a triggered crash

I was able to boot the kdump kernel up to the point where the tg3 network module is loaded. See bug #426293.
Ok, that's a start. Unfortunately, only booting to that point isn't enough to verify that this patch works. I've tested in non-nfs root environments, so we should be safe against regressions here, but I'd really rather confirm that this works properly in the nfs case. We'll just have to tackle bz426293.
Created attachment 313279 [details] patch to remove nfs root detection from mkdumprd
Created attachment 313280 [details] correct patch
Created attachment 313281 [details] patch to enable transparent NFS root on kdump
Created attachment 313282 [details] new patch to enable transparent NFS root on kdump
Created attachment 313283 [details] console log of booting the kdump kernel after a triggered crash
What just happened here? All the comments of the bug were just copied, all the attachments cloned, and the state changed back to assigned. There is no new information here, yet I seem unable to reassign the bug to needinfo state.
Hello Red Hat, FYI: with RHEL5.3 Snapshot 2 I am now closing this bugzilla, and we will address remaining issues in the RHEL5.4 timeframe based on the RHEL5.3 deliverable. Please keep me informed in case of any questions. Thanks for your support.
Does that mean I should close this bz as well? Or did you want to keep it open still?