Bug 680278

Summary: crash in cache_alloc_refill when driver loaded from initrd - invalid opcode: 0000
Product: Red Hat Enterprise Linux 6 Reporter: jayamohank <jayamohan.kallickal>
Component: kernelAssignee: Dave Anderson <anderson>
Status: CLOSED WORKSFORME QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1CC: coughlan, mchristi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-03-21 21:38:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description jayamohank 2011-02-24 21:45:10 UTC
Description of problem: If  the driver is part of the initrd, there is a kernel crash as seen in the attachment. The crash happens in scsi_alloc_host which is the very first major call made.

Also, the crash does not happen when the system has booted up and insmod is done. 


Version-Release number of selected component (if applicable):
kernel-2.6.32-114.0.1.el6

How reproducible: Every time


Steps to Reproduce:
1.install the kernel
2. reboot
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 jayamohank 2011-02-24 21:47:07 UTC
Red Hat Enterprise Linux Server release 6.0 (Santiago)
Kernel 2.6.32-95.el6.scsiall2.x86_64 on an x86_64

DellSlot12.lab.sj.emulex.com login: bnx2i: dev eth1 does not support iSCSI
bnx2i: dev eth0 does not support iSCSI
No Boot Session for this pci_func,session Hndl = -1
scsi host9: beiscsi_dev_probe-No Boot Devices !!!!!
------------[ cut here ]------------
kernel BUG at mm/slab.c:3067!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/class
CPU 2 
Modules linked in: be2iscsi(+) iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon sr_mod cdrom sg dcdbas iTCO_wdt iTCO_vendor_support i7core_edac edac_core be2net bnx2 ext3 jbd mbcache sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_transport_iscsi]

Modules linked in: be2iscsi(+) iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon sr_mod cdrom sg dcdbas iTCO_wdt iTCO_vendor_support i7core_edac edac_core be2net bnx2 ext3 jbd mbcache sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_transport_iscsi]
Pid: 5762, comm: modprobe Not tainted 2.6.32-95.el6.scsiall2.x86_64 #1 PowerEdge M610
RIP: 0010:[<ffffffff8114f3c4>]  [<ffffffff8114f3c4>] cache_alloc_refill+0x1e4/0x240
RSP: 0018:ffff8801a0df1b78  EFLAGS: 00010082
RAX: 0000000000000004 RBX: ffff8801afd50540 RCX: 00000000ffffffff
RDX: 0000000000000168 RSI: 0000000000000000 RDI: ffff8801afc21a00
RBP: ffff8801a0df1bd8 R08: 0000000000000246 R09: 0000000000000040
R10: 0000000000000003 R11: 0000000000000000 R12: ffff8801a8dc51c0
R13: ffff8801afc219c0 R14: 0000000000000004 R15: ffff8801a5dc5e00
FS:  00007fa5d2f44700(0000) GS:ffff880031c40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000020f8108 CR3: 00000001a8e65000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 5762, threadinfo ffff8801a0df0000, task ffff8801a509d560)
Stack:
 ffff8801a0df1bb8 0000000000000093 ffff8801afc21a00 000492d0813f37dc
<0> ffff8801afc219e0 ffff8801afc219d0 0000000000000001 00000000000022e8
<0> 0000000000000010 00000000000080d0 ffff8801afd50540 ffffffff8133ebb0
Call Trace:
 [<ffffffff8133ebb0>] ? scsi_host_alloc+0x40/0x320
 [<ffffffff81150269>] __kmalloc+0x1a9/0x220
 [<ffffffff8133ebb0>] scsi_host_alloc+0x40/0x320
 [<ffffffff811d9390>] ? sysfs_ilookup_test+0x0/0x20
 [<ffffffffa0209a97>] iscsi_host_alloc+0x27/0xe0 [libiscsi]
 [<ffffffffa00f2ea2>] beiscsi_dev_probe+0x11d/0xd7b [be2iscsi]
 [<ffffffff811da68c>] ? sysfs_add_one+0x2c/0x130
 [<ffffffff81271817>] local_pci_probe+0x17/0x20
 [<ffffffff81272a01>] pci_device_probe+0x101/0x120
 [<ffffffff8132a0a2>] ? driver_sysfs_add+0x62/0x90
 [<ffffffff8132a240>] driver_probe_device+0xa0/0x2a0
 [<ffffffff8132a4eb>] __driver_attach+0xab/0xb0
 [<ffffffff8132a440>] ? __driver_attach+0x0/0xb0
 [<ffffffff813294a4>] bus_for_each_dev+0x64/0x90
 [<ffffffff81329fde>] driver_attach+0x1e/0x20
 [<ffffffff813298e0>] bus_add_driver+0x200/0x300
 [<ffffffff8132a816>] driver_register+0x76/0x140
 [<ffffffff81272c66>] __pci_register_driver+0x56/0xd0
 [<ffffffffa00fb000>] ? beiscsi_module_init+0x0/0xa8 [be2iscsi]
 [<ffffffffa00fb000>] ? beiscsi_module_init+0x0/0xa8 [be2iscsi]
 [<ffffffffa00fb068>] beiscsi_module_init+0x68/0xa8 [be2iscsi]
 [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
 [<ffffffff810a7a6f>] sys_init_module+0xdf/0x250
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: 89 ff e8 a0 48 11 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fÿMounting proc filesystem
Mounting sysfs filesystem

Comment 3 Dave Anderson 2011-02-25 15:35:31 UTC
It's not entirely clear to me why the output is showing a login
prompt at the beginning:

  Red Hat Enterprise Linux Server release 6.0 (Santiago)
  Kernel 2.6.32-95.el6.scsiall2.x86_64 on an x86_64

  DellSlot12.lab.sj.emulex.com login: bnx2i: dev eth1 does not support iSCSI
  bnx2i: dev eth0 does not support iSCSI
  No Boot Session for this pci_func,session Hndl = -1
  scsi host9: beiscsi_dev_probe-No Boot Devices !!!!!
  ------------[ cut here ]------------
  kernel BUG at mm/slab.c:3067!
  invalid opcode: 0000 [#1] SMP 
  last sysfs file: /sys/devices/pci0000

But I'm presuming that all of the output from "bnx2i: dev eth1"
and beyond is captured console output of a crash that occurred
during boot?

The crash is happening when scsi_alloc_host() was making this
allocation:

        shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);

I'm not sure what the "privsize" would be, but given that the 
Scsi_Host struture itself is 1504 bytes, I'm guessing that it would
come from the size-2048 slab cache and that the slab structure
associated with it woud be "off-slab".

Anyway, the kzallod() eventually reaches cache_alloc_refill(), 
where it BUG's here:

               /*
                 * The slab was either on partial or free list so
                 * there must be at least one object available for
                 * allocation.
                 */
                BUG_ON(slabp->inuse >= cachep->num);

From the registers, the "slabp->inuse" value is in the lower
32 bits of the RCX register or ffffffff, which is a nonsensical
value.  That inuse value tracks the number of objects within a
slab page (or pages) that are currently in use.  It's initialized
to zero and then incremented or decremented as objects in the slab
are allocated/deallocated:
  
  struct slab {
      struct list_head list;
      long unsigned int colouroff;
      void *s_mem;
      unsigned int inuse;
      kmem_bufctl_t free;
      short unsigned int nodeid;
  }

Anyway to get to ffffffff either:

(1) the slab structure itself was corrupted by some entity 
    overwriting it, or 
(2) if it got decremented to -1 here:

static void slab_put_obj(struct kmem_cache *cachep, struct slab *slabp,
                                void *objp, int nodeid)
{
        unsigned int objnr = obj_to_index(cachep, slabp, objp);

#if DEBUG
        /* Verify that the slab belongs to the intended node */
        WARN_ON(slabp->nodeid != nodeid);

        if (slab_bufctl(slabp)[objnr] + 1 <= SLAB_LIMIT + 1) {
                printk(KERN_ERR "slab: double free detected in cache "
                                "'%s', objp %p\n", cachep->name, objp);
                BUG();
        }
#endif
        slab_bufctl(slabp)[objnr] = slabp->free;
        slabp->free = objnr;
        slabp->inuse--;
}

DEBUG is not turned on, so the double-free detection is not there.

Given all of the error messages that preceded the final BUG(),
I don't know which of the failure modes would be most likely.
Without a dump, we don't know the contents of the rest of the
slab data structure to see if they are also nonsensical.

I don't know where the "2.6.32-95.el6.scsiall2.x86_64" kernel came
from, but maybe it would be helpful to put a printk in slab_put_obj()
above to at least verify that the slabp->inuse-- caused it?  And
if so, dump out the other structure contents?

Again without a dump, it's hard to pinpoint anything else.  Other than 
that, I don't know what else to suggest, as I declare complete 
cluelessness w/respect to initrd or scsi driver issues.

Comment 4 Dave Anderson 2011-02-25 18:24:44 UTC
I would suggest the first thing to do would be to run
this "2.6.32-95.el6.scsiall2" with its debug kernel.

If kernel-debug-6.32-95.el6.scsiall2.x86_64.rpm was built
with the existing kernel, then just install that that one
and see what happens.

Otherwise it would have to be rebuilt without turning off
the debug kernel line in the kernel.spec file.

Anyway, the debug kernel has CONFIG_SLAB_DEBUG turned on,
and may catch the earlier slab-cache offense.

Comment 5 Tom Coughlan 2011-02-25 19:17:20 UTC
And while you are at it, set up to capture a kdump of the failure.

Comment 6 jayamohank 2011-02-25 19:20:53 UTC
I already have the kdump vmcore. How can I send it to you.

It's about 60 MB compressed

Comment 7 Dave Anderson 2011-02-25 19:25:20 UTC
> I already have the kdump vmcore. How can I send it to you.

Actually, I don't know -- usually vmcores come via GSS intermediary.
But since this isn't an issue-tracker BZ, I don't know.

Tom?

Comment 8 jayamohank 2011-03-01 20:54:08 UTC
I have put the vmcore file in ftp.serverengines.com.

I will send the username/passwd in a private email

Comment 9 Dave Anderson 2011-03-01 21:16:54 UTC
OK, thanks, I'm downloading the vmcore...

However, the vmcore is useless without the associated vmlinux file.

As I mentioned before, I have no clue where you got this "2.6.32-95.el6.scsiall2"
kernel that you're running?

Can you put the kernel-debuginfo-2.6.32-95.el6.scsiall2.x86_64.rpm and
kernel-debuginfo-common-x86_64-2.6.32-95.el6.scsiall2.x86_64.rpm files
in the same place?

Comment 10 jayamohank 2011-03-01 22:03:54 UTC
Yep. I forgot to upload the vmlnux. Only worry is I have recompiled the kernel but don't remember making any changes other than my driver for debugging.

I do not have the rpm's you are referring to. I built this from src rpm.

Pl let me know if you need any further info

Comment 11 Dave Anderson 2011-03-01 22:38:17 UTC
If you built it from the src rpm, then the RPMS subdirectory should contain
the debuginfo rpms.  (That is, unless you specifically turned off the
"with-debuginfo" %define in the kernel.spec file.)  If there's any need 
to debug any kernel module code, then the full kernel-debuginfo package
is required.

Anyway, I see that this is a different crash path from the one that 
whose log is cut-and-pasted above:

crash> bt
PID: 8883   TASK: ffff88012783cb00  CPU: 0   COMMAND: "work_for_cpu"
 #0 [ffff8801298fb9d0] machine_kexec at ffffffff8103080b
 #1 [ffff8801298fba30] crash_kexec at ffffffff810b58b2
 #2 [ffff8801298fbb00] oops_end at ffffffff814daa80
 #3 [ffff8801298fbb30] die at ffffffff8100f2fb
 #4 [ffff8801298fbb60] do_trap at ffffffff814da374
 #5 [ffff8801298fbbc0] do_invalid_op at ffffffff8100ceb5
 #6 [ffff8801298fbc60] invalid_op at ffffffff8100bf5b
    [exception RIP: cache_alloc_refill+0x1e4]
    RIP: ffffffff81158b64  RSP: ffff8801298fbd10  RFLAGS: 00010086
    RAX: 0000000000000004  RBX: ffff88012fd50540  RCX: 00000000fffffffd
    RDX: 0000000000000115  RSI: 0000000000000000  RDI: ffff88012fc21a00
    RBP: ffff8801298fbd70   R8: 0000000000000246   R9: 0000000000000040
    R10: 0000000000000002  R11: 0000000000000000  R12: ffff88012fe96940
    R13: ffff88012fc219c0  R14: 0000000000000004  R15: ffff880127a73500
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8801298fbd68] scsi_host_alloc at ffffffff8134d7e0
 #8 [ffff8801298fbdc8] scsi_host_alloc at ffffffff8134d7e0
 #9 [ffff8801298fbdf8] iscsi_host_alloc at ffffffffa04d2a97 [libiscsi]
#10 [ffff8801298fbe28] beiscsi_dev_probe at ffffffffa027eeb2 [be2iscsi]
#11 [ffff8801298fbeb8] local_pci_probe at ffffffff8127e4a7
#12 [ffff8801298fbec8] do_work_for_cpu at ffffffff81088598
#13 [ffff8801298fbee8] kthread at ffffffff8108d476
#14 [ffff8801298fbf48] kernel_thread at ffffffff8100c1ca
crash>

It runs into the similar kind of slab corruption, though, where the
"privsize" tacked onto the end of the Scsi_Host structure must be
pretty big, because the slab-size that has a bad inuse counter 
of -3 (4294967293 unsigned) is the size-16384 slab:

crash> kmem -s
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
... [ cut ] ...
ffff88012fd705c0 size-32768             32768          5         5      5    32k
ffff88012fd60580 size-16384(DMA)        16384          0         0      0    16k
kmem: size-16384: partial list: slab: ffff880127a73500  bad inuse counter: 4294967293
ffff88012fd50540 size-16384             16384          2         2      2    16k
ffff88012fd40500 size-8192(DMA)          8192          0         0      0     8k
ffff88012fd304c0 size-8192               8192         65        65     65     8k
...

The whole slab structure is not corrupt, so it looks like a multiple-free
situation:

crash> slab ffff880127a73500
struct slab {
  list = {
    next = 0xffff880129c83dc0, 
    prev = 0xffff88012fc219c0
  }, 
  colouroff = 0x0, 
  s_mem = 0xffff8801298cc000, 
  inuse = 0xfffffffd, 
  free = 0x0, 
  nodeid = 0x0
}
crash>

I look at this some more tomorrow, but you really should run the
kernel-debuginfo kernel which has CONFIG_SLAB_DEBUG turned on.

If you left the "with_debug" %define in the kernel-spec file as is,
then that kernel should have been built, and its rpm files can
be found in the RPM subdirectory.

Note the difference in the kernel.spec file:

This one builds the kernel-debug kernel, which has CONFIG_SLAB_DEBUG
turned on:

%define with_debug     %{?_without_debug:     0} %{?!_without_debug:     1}

This one, for each kernel built, will also build the associated
kernel-debuginfo packages:

%define with_debuginfo %{?_without_debuginfo: 0} %{?!_without_debuginfo: 1}

Comment 12 Dave Anderson 2011-03-02 16:37:02 UTC
With respect to the kernel.spec file settings that I suggested above,
I guess I'm even more confused as to the genesis of this second kernel:

> Pid: 8883, comm: work_for_cpu Not tainted 2.6.32 #1 PowerEdge M805

where the kernel version string is:

Linux version 2.6.32 (root.sj.emulex.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1
 SMP Wed Feb 23 14:57:50 PST 2011

Was this kernel built from a generic 2.6.32 tree, i.e., not even built
from the "2.6.32-95.el6" base like the first one was?

I can only suggest things if you follow an rpmbuild procedure that
starts with a RHEL6 src.rpm, where you do something like this:

  # rpm -ivh kernel-2.6.32-71.el6.src.rpm
  # cd $HOME/rpmbuild/SPECS

Then modify the SPECS/kernel.spec in two places:

  (1) Create a .buildid %define as it suggests, and
  (2) It's probably also worth changing "with_firmware" to "1": 

# kernel-firmware
%define with_firmware  %{?_with_firmware:     1} %{?!_with_firmware:     0}

Then lay down the source tree:

  # rpm -bp kernel.spec

Then go to $HOME/rpmbuild/BUILD/kernel-2.6.32/linux-2.6.32.x86_64.

For each file that you modify, first copy the original file to a
"file.orig" file.  For each file that you create from scratch, 
create an empty "file.orig" file.  Then make your changes.

Then go to $HOME/rpmbuild/BUILD/kernel-2.6.32, and enter:

  # gendiff linux-2.6.32.x86_64 .orig > ../../SOURCES/linux-kernel-test.patch

Sanity-check the linux-kernel-test.patch, and then build everything;

  # cd $HOME/rpmbuild/SPECS
  # rpmbuild -ba kernel.spec

That will build the base kernel, the kernel-debug kernel, the 
kernel-firmware and debuginfo rpms (and a couple others).  
"rpm -ivh" the base kernel, the kernel-debug kernel, and the 
kernel-firmware rpms.

Reboot into the base kernel, and if the problem still occurs, 
then reboot with the kernel-debug kernel -- which will hopefully
catch the first double-free offender.

That's about all I can suggest at this point.

Comment 13 jayamohank 2011-03-21 21:38:00 UTC
I have tried to repro this bug using RHEL6.1 Alpha release and haven't been able to. So, closing this bug