Bug 722801

Summary: sd_mod: Kernel panic with call trace printed out.
Product: Red Hat Enterprise Linux 6
Reporter: Ethan <ethan.zhao>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1
CC: emilne, hejiash, oehmes, thenzl
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-12-17 22:54:47 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Ethan 2011-07-18 00:55:30 UTC
Description of problem:
 After installing Red Hat Enterprise Linux 6.1 on an x86_64 machine, the system panics on 4 to 5 out of every 10 boots, with the following call trace printed:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018       
IP: [<ffffffff8110fb2c>] mempool_alloc+0x5c/0x140                               
PGD 0                                                                           
Oops: 0000 [#1] SMP                                                             
last sysfs file: /sys/module/lpfc/initstate                                     
CPU 0                                                                           
Modules linked in: sd_mod crc_t10dif usb_storage lpfc ahci qla2xxx scsi_transport_fc scsi_tgt mpt2sas scsi_transport_sas raid_class dm_mod                      
                                                                                
Modules linked in: sd_mod crc_t10dif usb_storage lpfc ahci qla2xxx scsi_transport_fc scsi_tgt mpt2sas scsi_transport_sas raid_class dm_mod                      
Pid: 1549, comm: async/2 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 SUN FIRE X4470 M2 SERVER                                                                    
RIP: 0010:[<ffffffff8110fb2c>]  [<ffffffff8110fb2c>] mempool_alloc+0x5c/0x140
RSP: 0018:ffff883f641eb770  EFLAGS: 00010002
RAX: ffff883f64ffb540 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 0000000000000020 RDI: 0000000000011220
RBP: ffff883f641eb7f0 R08: 0000000000000000 R09: ffff883f664d1180
R10: 0000000000000000 R11: ffff883f6441dbc0 R12: 0000000000011220
R13: ffff883f641eb790 R14: ffff883f641eb7a8 R15: 0000000000000030
FS:  0000000000000000(0000) GS:ffff88018a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000003f642e5000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process async/2 (pid: 1549, threadinfo ffff883f641ea000, task ffff883f64ffb540)
Stack:
 ffff883f641eb790 ffffffff81356790 ffff883f64ffb540 0000000062951800
<0> ffff883f641eb7d0 ffffffff81356a8a ffff883f641eb7e0 ffff883f6447e5e0
<0> ffff883f62951800 ffff883f639a2aa0 ffff883f62951800 ffff883f6447e5e0
Call Trace:
 [<ffffffff81356790>] ? scsi_init_sgtable+0x40/0x70
 [<ffffffff81356a8a>] ? scsi_init_io+0x3a/0x170
 [<ffffffffa018c371>] sd_prep_fn+0x761/0xea0 [sd_mod]
 [<ffffffff8110d3d0>] ? sync_page+0x0/0x50
 [<ffffffff81249dd3>] blk_peek_request+0xd3/0x210
 [<ffffffff81355f93>] scsi_request_fn+0x63/0x590
 [<ffffffff8110d3d0>] ? sync_page+0x0/0x50
 [<ffffffff81247b52>] __generic_unplug_device+0x32/0x40
 [<ffffffff81247b8e>] generic_unplug_device+0x2e/0x50
 [<ffffffff81242914>] blk_unplug+0x34/0x70
 [<ffffffff81242962>] blk_backing_dev_unplug+0x12/0x20
 [<ffffffff811a21be>] block_sync_page+0x3e/0x50
 [<ffffffff8110d408>] sync_page+0x38/0x50
 [<ffffffff814dc1ea>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff811a8d70>] ? blkdev_get_block+0x0/0x70
 [<ffffffff8110d3a7>] __lock_page+0x67/0x70
 [<ffffffff8108e1a0>] ? wake_bit_function+0x0/0x50
 [<ffffffff8110f2aa>] do_read_cache_page+0xca/0x180
 [<ffffffff811a9cf0>] ? blkdev_readpage+0x0/0x20
 [<ffffffff8110f3a9>] read_cache_page_async+0x19/0x20
 [<ffffffff8110f3be>] read_cache_page+0xe/0x20
 [<ffffffff811e0910>] read_dev_sector+0x30/0xa0
 [<ffffffff811e3651>] read_lba+0x101/0x110
 [<ffffffff811e3a85>] find_valid_gpt+0xd5/0x6b0
 [<ffffffff811e40df>] efi_partition+0x7f/0x360
 [<ffffffff814dad34>] ? printk+0x41/0x45
 [<ffffffff811e1635>] rescan_partitions+0x1a5/0x470
 [<ffffffffa018b281>] ? sd_open+0x81/0x1f0 [sd_mod]
 [<ffffffff811aa456>] __blkdev_get+0x1b6/0x3c0
 [<ffffffff811aa670>] blkdev_get+0x10/0x20
 [<ffffffff811e0ad5>] register_disk+0x155/0x170
 [<ffffffff81251526>] add_disk+0xa6/0x160
 [<ffffffffa018e80b>] sd_probe_async+0x13b/0x210 [sd_mod]
 [<ffffffff8108e4c6>] ? add_wait_queue+0x46/0x60
 [<ffffffff81096302>] async_thread+0x102/0x250
 [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
 [<ffffffff81096200>] ? async_thread+0x0/0x250
 [<ffffffff8108ddf6>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd60>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 12 01 00 4c 8d 6d a0 4c 8d 7b 30 44 89 e0 44 89 e7 83 e0 10 4d 8d 75 18 83 e7 af 89 45 9c 65 48 8b 04 25 00 cc 00 00 48 89 45 90 <48> 8b 73 18 ff 53 20 48 85 c0 48 89 c2 74 24 48 89 d0 48 8b 5d 
RIP  [<ffffffff8110fb2c>] mempool_alloc+0x5c/0x140
 RSP <ffff883f641eb770>
CR2: 0000000000000018
---[ end trace 81629a40a9f90dca ]---
Kernel panic - not syncing: Fatal exception
Pid: 1549, comm: async/2 Tainted: G      D    ----------------   2.6.32-131.0.15.el6.x86_64 #1
Call Trace:
 [<ffffffff814dac28>] ? panic+0x78/0x143
 [<ffffffff814dec74>] ? oops_end+0xe4/0x100
 [<ffffffff81040cdb>] ? no_context+0xfb/0x260
 [<ffffffff81040f65>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff8112f009>] ? zone_statistics+0x99/0xc0
 [<ffffffff81041033>] ? bad_area_nosemaphore+0x13/0x20
 [<ffffffff8104170d>] ? __do_page_fault+0x31d/0x480
 [<ffffffff8110fa25>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffff8110fb33>] ? mempool_alloc+0x63/0x140
 [<ffffffff8125d53d>] ? cfq_service_tree_add+0x43d/0x530
 [<ffffffff814e0c3e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff814ddfe5>] ? page_fault+0x25/0x30
 [<ffffffff8110fb2c>] ? mempool_alloc+0x5c/0x140
 [<ffffffff81356790>] ? scsi_init_sgtable+0x40/0x70
 [<ffffffff81356a8a>] ? scsi_init_io+0x3a/0x170
 [<ffffffffa018c371>] ? sd_prep_fn+0x761/0xea0 [sd_mod]
 [<ffffffff8110d3d0>] ? sync_page+0x0/0x50
 [<ffffffff81249dd3>] ? blk_peek_request+0xd3/0x210
 [<ffffffff81355f93>] ? scsi_request_fn+0x63/0x590
 [<ffffffff8110d3d0>] ? sync_page+0x0/0x50
 [<ffffffff81247b52>] ? __generic_unplug_device+0x32/0x40
 [<ffffffff81247b8e>] ? generic_unplug_device+0x2e/0x50
 [<ffffffff81242914>] ? blk_unplug+0x34/0x70
 [<ffffffff81242962>] ? blk_backing_dev_unplug+0x12/0x20
 [<ffffffff811a21be>] ? block_sync_page+0x3e/0x50
 [<ffffffff8110d408>] ? sync_page+0x38/0x50
 [<ffffffff814dc1ea>] ? __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff811a8d70>] ? blkdev_get_block+0x0/0x70
 [<ffffffff8110d3a7>] ? __lock_page+0x67/0x70
 [<ffffffff8108e1a0>] ? wake_bit_function+0x0/0x50
 [<ffffffff8110f2aa>] ? do_read_cache_page+0xca/0x180
 [<ffffffff811a9cf0>] ? blkdev_readpage+0x0/0x20
 [<ffffffff8110f3a9>] ? read_cache_page_async+0x19/0x20
 [<ffffffff8110f3be>] ? read_cache_page+0xe/0x20
 [<ffffffff811e0910>] ? read_dev_sector+0x30/0xa0
 [<ffffffff811e3651>] ? read_lba+0x101/0x110
 [<ffffffff811e3a85>] ? find_valid_gpt+0xd5/0x6b0
 [<ffffffff811e40df>] ? efi_partition+0x7f/0x360
 [<ffffffff814dad34>] ? printk+0x41/0x45
 [<ffffffff811e1635>] ? rescan_partitions+0x1a5/0x470
 [<ffffffffa018b281>] ? sd_open+0x81/0x1f0 [sd_mod]
 [<ffffffff811aa456>] ? __blkdev_get+0x1b6/0x3c0
 [<ffffffff811aa670>] ? blkdev_get+0x10/0x20
 [<ffffffff811e0ad5>] ? register_disk+0x155/0x170
 [<ffffffff81251526>] ? add_disk+0xa6/0x160
 [<ffffffffa018e80b>] ? sd_probe_async+0x13b/0x210 [sd_mod]
 [<ffffffff8108e4c6>] ? add_wait_queue+0x46/0x60
 [<ffffffff81096302>] ? async_thread+0x102/0x250
 [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
 [<ffffffff81096200>] ? async_thread+0x0/0x250
 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd60>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
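
Triage note (a sketch added for readers, not part of the original report): the faulting address can be cross-checked against the register dump. This assumes the bytes marked `<48> 8b 73 18` in the Code: line decode as `mov 0x18(%rbx),%rsi`; the helper name `parse_registers` is hypothetical, and the excerpt is copied from the oops above.

```python
import re

# Excerpt copied from the oops above.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
RAX: ffff883f64ffb540 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 0000000000000020 RDI: 0000000000011220
CR2: 0000000000000018 CR3: 0000003f642e5000 CR4: 00000000000006f0
"""


def parse_registers(text):
    """Return {register_name: value} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS)

# Assumption: the faulting instruction reads from RBX + 0x18.
DISPLACEMENT = 0x18
fault_address = regs["RBX"] + DISPLACEMENT

# CR2 holds the address that triggered the page fault.
print(hex(fault_address), fault_address == regs["CR2"])
```

With RBX equal to zero, the computed address matches CR2. That is consistent with mempool_alloc being handed a NULL pool pointer, but it is only a reading of the dump, not a confirmed root cause.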

Version-Release number of selected component (if applicable):
 Only Red Hat Enterprise Linux 6.1 panics; 6.0 does not. A 6.0 installation running the 6.1 kernel also panics.
How reproducible:
 So far it can be reproduced on only one machine.
Steps to Reproduce:
1. Install option cards on SUT.
2. Install RHEL6.1 on SUT.
3. Boot the OS.
Configuration:
Platform: X86_64
CPU:  4 x E7-4860
DIMM: 304 GB
OS:   RHEL6.1
Option cards:
       slot0:x7281
       slot1:Erie-Ext
       slot2:Erie-Int
       slot3:x4446
       slot4:Pallene-Q
       slot6:CX2
       slot8:Niantic
       slot9:Pallene-E
  
Actual results:
 Panic and hang.
Expected results:
 No panic.
Additional info:

Comment 2 RHEL Program Management 2011-10-07 15:41:19 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 hejia 2012-05-07 08:18:48 UTC
Hi all, I googled and found this report. Unfortunately, we have hit a similar bug with a similar call trace in the panic.
Because there is no module dependency between mpt2sas.ko and sd_mod.ko, the kernel loads the two modules concurrently.
If we manually specify the dependency as mpt2sas.ko : sd_mod.ko, the bug disappears.
The root cause is still under investigation.
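
To illustrate the workaround described in this comment, here is a minimal sketch of how a declared dependency forces a load order. It assumes the "mpt2sas.ko : sd_mod.ko" notation follows the modules.dep convention, where the module left of the colon requires the modules listed on the right. The entries are simplified and hypothetical (real modules.dep lines carry full paths and further prerequisites), and `load_order` is an illustrative helper, not part of any kernel tool.

```python
def load_order(dep_lines):
    """Order modules so each one comes after the modules it depends on."""
    deps = {}
    for line in dep_lines:
        module, _, rest = line.partition(":")
        deps[module.strip()] = rest.split()

    order, seen = [], set()

    def visit(module):
        # Depth-first walk: emit prerequisites before the module itself.
        if module in seen:
            return
        seen.add(module)
        for dep in deps.get(module, []):
            visit(dep)
        order.append(module)

    for module in deps:
        visit(module)
    return order


# Hypothetical, simplified entries in "module: prerequisites" form.
without_dep = ["mpt2sas.ko:", "sd_mod.ko:"]            # no ordering implied
with_dep = ["mpt2sas.ko: sd_mod.ko", "sd_mod.ko:"]     # sd_mod.ko must come first

print(load_order(with_dep))
```

Without the declared dependency nothing constrains the relative order of the two modules, which matches the concurrent loading the comment describes. With it, sd_mod.ko is always sequenced before mpt2sas.ko.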

Comment 4 Sven Oehme 2012-09-28 15:47:14 UTC
I see exactly the same problem after upgrading my systems to 2.6.32-279.5.2.el6.x86_64.
With the stock 6.3 kernel 2.6.32-279.el6.x86_64 the problem doesn't exist.

Comment 10 Ewan D. Milne 2013-01-17 14:36:14 UTC
I think this is a duplicate of 888417.  Would the reporter of this
problem please try the test kernel with a fix available at:

http://people.redhat.com/emilne/RPMS/.bz888417/

and update this bug with whether the test kernel fixes the problem.

Comment 11 Ethan 2014-09-30 01:23:14 UTC
Sorry, I moved to another BU, so I couldn't test the fix.

Comment 12 Jiri Benc 2014-12-17 22:54:47 UTC
Closing per comment 11.