Bug 609146 - Kernel crash on mkfs.ext4 with thin provisioned iSCSI LUN
Status: CLOSED DUPLICATE of bug 610054
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64 Linux
Priority: low Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Depends On: 610054
Blocks: 594856
Reported: 2010-06-29 10:08 EDT by Shyam Iyer
Modified: 2015-04-28 00:18 EDT

Doc Type: Bug Fix
Last Closed: 2010-07-06 13:06:11 EDT


Attachments
Kernel panic log with upstream (4.54 KB, application/octet-stream)
2010-06-29 10:13 EDT, Shyam Iyer

Description Shyam Iyer 2010-06-29 10:08:24 EDT
Description of problem:
mkfs.ext4 on a thin provisioned LUN crashes the kernel

Version-Release number of selected component (if applicable):
RHEL6, any recent kernel.
Upstream kernel 2.6.35-rcX

How reproducible:
Always

Steps to Reproduce:
1. Create a thin provisioned LUN
2. Run mkfs.ext4 /dev/sdX
  
Actual results:
The kernel crashes in the libiscsi_tcp module

Expected results:
The filesystem should be formatted properly

Additional info:

1) mkfs.ext4 -K /dev/sdX does not crash the kernel

2) Formatting the LUN on a RHEL5 kernel does not crash

3) Changing the partition table type from MBR to GPT also avoids the crash.
Comment 2 Shyam Iyer 2010-06-29 10:13:02 EDT
Created attachment 427696
Kernel panic log with upstream
Comment 3 RHEL Product and Program Management 2010-06-29 10:23:17 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 5 Mike Snitzer 2010-06-29 12:13:25 EDT
I sent this in mail earlier but figured I'd capture it in this BZ too:

The BLKDISCARD ioctl does _not_ issue a discard with a barrier
(bio->bi_rw does not have BIO_RW_BARRIER set).  But discards through
ext4 do use BIO_RW_BARRIER.

I can avoid the NULL pointer if I have all discards use a barrier (I
verified this by having DM always set BIO_RW_BARRIER for discards).

This explains why Shyam didn't see the NULL pointer when he:
1) formatted the device using RHEL5's mkfs.ext4 (which doesn't issue the
   BLKDISCARD ioctl).
2) then used RHEL6's ext4 with -o discard to test that FS-generated discards
   work.
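
For reference, a minimal user-space sketch of the BLKDISCARD path discussed above. It only shows how a tool can ask the block layer to discard a byte range; it is not the mkfs.ext4 source, and discard_range() is a hypothetical helper name introduced here for illustration.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/*
 * Ask the block layer to discard `len` bytes starting at byte offset
 * `start` on the block device open as `fd`.
 * Returns 0 on success, -1 on failure with errno set by ioctl().
 * discard_range() is a hypothetical helper, not part of e2fsprogs.
 */
static int discard_range(int fd, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };  /* {offset, length} in bytes */

    return ioctl(fd, BLKDISCARD, range);
}
```

A caller would open the device for writing and pass the resulting descriptor. A successful discard throws away the data in that range, so this should only ever be pointed at a scratch LUN.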
Comment 6 Shyam Iyer 2010-06-29 13:39:36 EDT
I created the thin provisioned LUN using scsi_debug

modprobe scsi_debug dev_size_mb=100 unmap_max_desc=16 unmap_granularity=2048 sector_size=4096

When I run mkfs.ext4 on this device the issue is not seen.

Mike - You mentioned in email that this needs to be exported as an iSCSI device. I guess that is what I am missing. How do I do that?
Comment 7 Mike Snitzer 2010-06-29 13:48:37 EDT
(In reply to comment #6)
> I created the thin provisioned LUN using scsi_debug
> 
> modprobe scsi_debug dev_size_mb=100 unmap_max_desc=16 unmap_granularity=2048
> sector_size=4096
> 
> When I run mkfs.ext4 on this device the issue is not seen.
> 
> Mike - You mentioned in email that this needs to be exported as an iSCSI
> device. I guess that is what I am missing.. How to do that ?    

That is left as an exercise to the user ;)
More seriously: RHEL6 provides scsi-target-utils (aka tgtd)

I've not used tgtd for a bit, so I'm short on configuration details. To be clear: I was just speculating that this software-only iSCSI target (exporting a scsi_debug TP LUN) would allow us to reproduce the libiscsi_tcp NULL pointer. I didn't actually try it.
Comment 8 Shyam Iyer 2010-06-29 13:57:05 EDT
Got you.. this time :)
Comment 9 Mike Christie 2010-07-01 15:21:57 EDT
Found the problem.

For the WRITE_SAME_16 command, scsi_bufflen(scsi_cmnd) reports the size of the operation as the buffer length, and sc_data_direction is set to DMA_TO_DEVICE. The iscsi layer therefore sets things up as if it needs to write that much data. When it then goes to access the scatterlist, there is none, and we oops trying to access it.
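
The mismatch can be modelled in plain user-space C. This is only an illustrative sketch: the fake_* types and helper names are hypothetical stand-ins, not the kernel's struct scsi_cmnd or scatterlist, and it does not represent the actual fix (which came from the BZ 610054 patches).

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for kernel types. */
enum fake_dir { FAKE_DMA_TO_DEVICE, FAKE_DMA_FROM_DEVICE, FAKE_DMA_NONE };

struct fake_sg {
    const void *buf;
    unsigned int len;
};

struct fake_cmd {
    enum fake_dir dir;       /* models sc_data_direction */
    unsigned int bufflen;    /* models what scsi_bufflen() reports */
    struct fake_sg *sgl;     /* models the scatterlist; may be NULL */
    unsigned int sg_count;   /* number of scatterlist entries */
};

/* What a transport concludes from direction and length alone. */
static bool wants_data_out(const struct fake_cmd *c)
{
    return c->dir == FAKE_DMA_TO_DEVICE && c->bufflen > 0;
}

/* True only if there is actually a buffer behind the reported length. */
static bool has_payload(const struct fake_cmd *c)
{
    return c->sgl != NULL && c->sg_count > 0;
}

/*
 * The inconsistent state described above: a write of bufflen bytes is
 * expected, but there is no scatterlist to read those bytes from.
 */
static bool is_inconsistent(const struct fake_cmd *c)
{
    return wants_data_out(c) && !has_payload(c);
}
```

In this model, a command with a non-zero bufflen, FAKE_DMA_TO_DEVICE, and a NULL sgl is flagged as inconsistent, while an ordinary write with a populated scatterlist is not.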
Comment 10 Mike Snitzer 2010-07-01 17:53:33 EDT
We are now fairly confident that the patches associated with fixing BZ 610054 will also fix this iSCSI NULL pointer.

I'll have a brew scratch build x86_64 kernel available for test tomorrow.

I'll post the kernel on my people page for Dell to download and test.
Comment 11 Mike Snitzer 2010-07-02 10:44:55 EDT
(In reply to comment #10)
> I'll have a brew scratch build x86_64 kernel available for test tomorrow.
> 
> I'll post the kernel on my people page for Dell to download and test.    

Shyam, please download the test kernel rpm from:
http://people.redhat.com/msnitzer/RPMS/bz609146/

I have also provided the 'kernel-devel' package in case Dell 3rd party modules must be built.

Dell can access these RPMS directly for remote testing within Dell.

Setting to NEEDINFO, I look forward to getting Dell's test results.
Comment 12 Shyam Iyer 2010-07-02 13:53:45 EDT
Mike,

I will need the kernel-firmware to install this.

-Shyam
Comment 13 Mike Snitzer 2010-07-02 14:20:30 EDT
(In reply to comment #12)
> Mike,
> 
> I will need the kernel-firmware to install this.

Please just use the kernel-firmware that you currently have on your RHEL6 system.  Install the rpm with: rpm -ivh --nodeps kernel-2.6.32-42.el6.bz609146.x86_64.rpm
Comment 14 Shyam Iyer 2010-07-06 12:21:36 EDT
The results are positive with some quick testing.
Comment 15 Mike Snitzer 2010-07-06 13:06:11 EDT
Marking as a duplicate of BZ 610054 because its fixes address this BZ as a side-effect.

*** This bug has been marked as a duplicate of bug 610054 ***
