Description of problem:
On one of our HP Proliant DL585s I'm seeing this output when booting the installer kernel:

NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
Kernel panic - not syncing: NetLabel: failed to initialize properly (-17)

This is happening with every x86_64 RHEL5 tree I've tested so far (0927-1012), but only on this system. All other trees, including i386 RHEL5, install normally, and I'm not seeing this behavior on another DL585 in the lab.

Version-Release number of selected component (if applicable):
RHEL5-Server-20060927.0 (kernel 2.6.18-1.2702.el5)
RHEL5-Server-20061006.2 (kernel 2.6.18-1.2714.el5)
RHEL5-Server-20061010.nightly (kernel 2.6.18-1.2717.el5)
RHEL5-Server-20061012.nightly (kernel 2.6.18-1.2717.el5)

How reproducible:
100%

Steps to Reproduce:
1. Boot an x86_64 RHEL5 install on dl585-02.rhts.boston.redhat.com

Actual results:
Kernel panic

Expected results:
Booting into anaconda

Additional info:
Console output from test runs available at:
RHEL5-Server-20060927.0 http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=870766
RHEL5-Server-20061006.2 http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=870556
RHEL5-Server-20061010.nightly http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=870019
Marking as a Beta blocker. Won't get testing on a significant platform. Section 6c of the release criteria. http://intranet.corp.redhat.com/ic/intranet/RHEL500Beta2ReleaseCriteria#HardwareCert
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering. This request is not yet committed for inclusion in release.
Netlabel should not be active, I think.
*** Bug 210191 has been marked as a duplicate of this bug. ***
Chip, was this actually on dl585-02.rhts.boston.redhat.com or do you have a second physical machine that had the same problem?
See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=208445 for a possibly related issue.
(In reply to comment #5)
> Chip, was this actually on dl585-02.rhts.boston.redhat.com or do you have a
> second physical machine that had the same problem?

This happened to me on dl585-03.lab.boston.redhat.com, so it is a second physical machine with the identical problem.

Chip
I think this is likely a duplicate of bug 208445; maybe someone in RHTS can test a 2.6.18-1.2725.el5 or later kernel and see if the bug is fixed? Chip
Just tried 2.6.18-1.2727.el5 and ended with

NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
Kernel panic - not syncing: NetLabel: failed to initialize properly (-17)

So it appears to not be fixed.
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: About to call netlbl_domhsh_init
NetLabel: returned from netlbl_domhsh_init with ret_val:0
NetLabel: About to call netlbl_netlink_init
NetLabel: returned from netlbl_netlink_init with ret_val:0
NetLabel: About to call netlbl_unlabel_defconf
NetLabel: returned from netlbl_netlink_init with ret_val:-17
Kernel panic - not syncing: NetLabel: failed to initialize properly (-17)

I added a few printk's (and apparently messed the last one up: it names netlbl_netlink_init but is actually the return from netlbl_unlabel_defconf), and it looks like netlbl_unlabel_defconf is where the EEXIST comes from. I'll start putting them deeper down tomorrow.

Also look at the thread http://marc.theaimsgroup.com/?l=linux-netdev&m=116016517332266&w=2 which might have some remote chance of applicability.
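For anyone reading the log without the errno table at hand, here is a minimal sketch of how the (-17) in the panic message maps to EEXIST. The helper name describe_kernel_ret is made up for illustration; it only relies on Python's standard errno module, whose numbering matches Linux for this value.

```python
import errno
import os


def describe_kernel_ret(ret_val):
    """Map a kernel-style return value (0 or a negative errno) to a label.

    Hypothetical helper for reading logs; not part of any kernel tooling.
    """
    if ret_val >= 0:
        return "success"
    code = -ret_val
    name = errno.errorcode.get(code, "UNKNOWN")
    return "-%s (%s)" % (name, os.strerror(code))


# The value printed by the panic message and by the debug printk's.
print(describe_kernel_ret(-17))
```

So the init path is being told that something it wants to create already exists.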
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: About to call netlbl_domhsh_init
NetLabel: returned from netlbl_domhsh_init with ret_val:0
NetLabel: About to call netlbl_netlink_init
NetLabel: returned from netlbl_netlink_init with ret_val:0
NetLabel: About to call netlbl_unlabel_defconf
NetLabel: netlbl_unlabel_defconf: about to call netlbl_domhsh_add_default
NetLabel: netlbl_domhsh_add: entry->type:5 NETLBL_NLTYPE_UNLABELED:5
NetLabel: netlbl_domhsh_add: entry->domain IS null
NetLabel: netlbl_domhsh_add: rcu_dereference(netlbl_domhsh_def) != NULL
NetLabel: netlbl_domhsh_def: ffffffffffffffff and &netlbl_domhsh_def: ffffffff808d6580
NetLabel: netlbl_unlabel_defconf: return from netlbl_domhsh_add_default with ret_val:-17
NetLabel: returned from netlbl_unlabel_defconf with ret_val:-17
Kernel panic - not syncing: NetLabel: failed to initialize properly (-17)
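A toy model of what this trace suggests, written only to make the failure mode explicit. It is not the kernel code: add_default and NULL are hypothetical stand-ins, and the only facts taken from the log are that the default-entry pointer was non-NULL (ffffffffffffffff) before anything had been added, and that the add then returned -17.

```python
import errno

NULL = 0  # stand-in for a zero-initialized pointer


def add_default(def_ptr, new_ptr):
    """Toy version of an 'install the default entry once' check.

    Returns (ret_val, resulting_pointer). If a default already appears to be
    set (pointer is not NULL), refuse with -EEXIST and leave it unchanged.
    """
    if def_ptr != NULL:
        return -errno.EEXIST, def_ptr
    return 0, new_ptr


# Expected case: the pointer starts out zeroed, so the add succeeds.
print(add_default(NULL, 0x1000))

# Case from the log: the pointer already holds garbage at first use,
# so the "already exists" check fires even though nothing was added.
print(add_default(0xFFFFFFFFFFFFFFFF, 0x1000))
```

Under this reading the NetLabel logic itself is behaving as written; the surprising part is the pointer's initial value.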
I'm not going to claim to understand this at ALL, but the PAGE_ALIGN changes to bad_addr discussed in the netdev thread seem to allow the machine to boot. This is going to take some research, as all of the addresses that bad_addr appears to be fixing up are so far away from 808d6580...
Just for reference, I decided to include the printk's I got from bad_addr, which print out the 'fixed-up' *addrp:

bad_addr: fixing up addr < 0x8000 *addrp: 0000000000008000
bad_addr: fixing up last >= table_start<<PAGE_SHIFT && addr < table_end<<PAGE_SHIFT *addrp: 0000000000029000
bad_addr: fixing up last >= 640*1024 && addr < 1024*1024 *addrp: 0000000000100000
bad_addr: fixing up last >= __pa_symbol(&_text) && last < __pa_symbol(&_end) *addrp: 0000000000ad7000
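A small sketch for sanity-checking these numbers. It assumes 4 KiB pages (PAGE_SHIFT = 12) and that kernel-image symbols on this x86_64 kernel are mapped at a fixed offset of 0xffffffff80000000; both are assumptions on my part, and the helper names are made up for illustration. It rounds addresses up to a page boundary the way a PAGE_ALIGN-style macro would, and converts the &netlbl_domhsh_def address from the earlier trace to a physical address for comparison with the fixed-up values.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
KERNEL_MAP_BASE = 0xFFFFFFFF80000000  # assumption: kernel image mapping offset


def page_align(addr):
    """Round addr up to the next page boundary (no-op if already aligned)."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def symbol_to_phys(vaddr):
    """Physical address of a kernel-image symbol under the assumed mapping."""
    return vaddr - KERNEL_MAP_BASE


# The fixed-up *addrp values printed by bad_addr above.
fixed_up = [0x8000, 0x29000, 0x100000, 0xAD7000]
for addr in fixed_up:
    print(hex(addr), "page aligned:", addr == page_align(addr))

# Address of netlbl_domhsh_def from the earlier trace.
def_phys = symbol_to_phys(0xFFFFFFFF808D6580)
print(hex(def_phys), "below the _end fixup:", def_phys < 0xAD7000)
```

If those assumptions hold, the variable sits inside the kernel image range rather than far from it, below the last fixed-up address.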
Intelligent conversation about the problem happens here: http://marc.theaimsgroup.com/?l=linux-netdev&m=116014524814284&w=2
Seems like Vivek's proposed patch on lkml on 10/6 fixes the problem.
Created attachment 138605 [details] fix for bss data corruption
Posted for review on 10/16.
in kernel-2.6.18-1.2728.el5
We've now had multiple successful RHTS installs of the 1020.1 tree on dl585-02. Marking this as VERIFIED/CURRENTRELEASE.
Actually running through the Verified state.