Bug 500857
Field | Value | Field | Value
---|---|---|---
Summary: | [RHEL5 U4] Systems seems to hang on reboot | |
Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Burke <jburke>
Component: | kernel | Assignee: | Andy Gospodarek <agospoda>
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 5.4 | CC: | abdulkh, abjoglek, agospoda, andriusb, cward, dzickus, gcase, jim, jjarvis, jtluka, lwang, lwoodman, maurizio.antillon, mchristi, mgahagan, nhorman, pbunyan, peterm, savbu-lnx-drivers, scofeldm, zaitcev
Target Milestone: | rc | Keywords: | Regression
Target Release: | 5.4 | |
Hardware: | All | |
OS: | Linux | |
URL: | http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=8127400 | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2009-09-02 08:14:52 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 475528 | |
Attachments: | | |

Description
Jeff Burke
2009-05-14 15:09:50 UTC
I can't access any of the http://rhts.redhat.com URLs mentioned in the description. Problem on my end?

Scott - no, that's correct - rhts is an internal site, walled off from the interweb. Some folks here say there could be an issue with the virtual CDROM doing some flaking out at shutdown. I'll leave it to the experts to confirm that and reply back.

Actually we still believe it is a network issue, only because if we do an 'ifdown eth0' before the reboot, everything works fine. This goes along the line of some other bugs we have seen, because we have supposedly changed the way nic cards shut down for 5.4.

Oh yeah, one other thing: at some point I was thinking that I could just add some network and iscsi shutdown code to the /etc/init.d/halt script. I would have to load the /sbin/halt or kexec command in memory, then I could stop the network and stop iscsi (nfs is mounted read only in /etc/init.d/halt so it does not need a special shutdown, right?). Would that be better? How do you load a program in memory without running it? Would I just have to create a ram based FS and run it from there?

Mike, in response to your comments, following along with what you said, I see why you've done what you did in the way you did. I also think that, while it's not the safest idea to rely on the network while iscsi is shutting down, I guess you need to for now. We also shouldn't be hanging during shutdown, so to that end I'm trying to figure out why that might be via a bisection of our git tree. I'll put some thought into how we might bring iscsi to a stop in a slightly safer fashion. To answer your above question: yes, the way you access a program after you have unmounted your rootfs is to make a ramdisk and do a pivot root to it. I'll post here when I know more about the hang.

update: so I bisected the point where this started happening, and as andy thought it might, it started occurring with the latest ixgbe update.

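For anyone who wants to try the 'ifdown eth0' observation from the earlier comment on their own box, here is a rough sketch in Python. It only helps confirm whether the hang follows the network interface; it is not a fix. The helper names (`pre_reboot_commands`, `run_workaround`) are made up for illustration, and it assumes `ifdown` and `reboot` are on the PATH and that the affected interface is eth0. It defaults to a dry run so nothing is actually taken down unless you ask for it.

```python
import subprocess


def pre_reboot_commands(iface="eth0"):
    """Return the workaround commands in order: ifdown first, then reboot.

    Hypothetical helper; the interface name is an assumption.
    """
    return [["ifdown", iface], ["reboot"]]


def run_workaround(iface="eth0", dry_run=True):
    """Run the commands, or only print them when dry_run is True."""
    handled = []
    for cmd in pre_reboot_commands(iface):
        if dry_run:
            print("would run: " + " ".join(cmd))
        else:
            # check_call raises CalledProcessError on failure, so the
            # reboot is never issued while the interface is still up.
            subprocess.check_call(cmd)
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False (as root) to really reboot.
    run_workaround("eth0", dry_run=True)
```

If the box reboots cleanly with the interface taken down first and hangs without it, that points at the NIC driver's shutdown path rather than the virtual CDROM.
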
Given that the last warning is telling us that the device's pci interrupt is getting disabled, I wonder if we're not processing an outstanding interrupt when this is happening and that's causing a problem. I'm going to try adding some disable_irq's prior to our call to pci_disable_device to see if that fixes us up.

I think we can ignore the scsi stuff (unless you have further comment, pete). I just tried to remove all the usb modules prior to halt (which stops the khub_thread that prints out the above), and we still hung.

Created attachment 345936 [details]
gospos proposed patch

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1820023 I'm building a test kernel with andy's patch (minus the msix chunk at the end, on his request) for testing when the cisco box is feeling better and can see link on its interfaces.

Neil/Gospo - can you all detail the way to reproduce this issue? I'd like to keep Cisco in the loop to test this on their side as well.

Fire up the box, make sure the ixgbe network driver is installed and loaded, and reboot. It will hang during shutdown.

I've tested andy's patch from comment #36, and it passed several times for me. So I think we've found a winner. Interestingly, removing the last chunk to back out the msix vector fix in the patch caused the box to continue to hang, so even though the napi poll bug is still valid, it doesn't seem related to this hang. I'm going to test just the msix fix on sunday to confirm, but regardless, I think the whole patch needs to go in. Since it's gospo's patch, I'm reassigning this to him to post for 5.4 monday morning. Thanks andy!

in kernel-2.6.18-152.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.

~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience.
RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.

~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~ RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching. If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.

Patch is in -158.el5. Adding SanityOnly. Tested/Verified based on conversations with Shrijeet @ Cisco.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below.
You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html