Description of problem:
mkfs.ext4 on a thin provisioned LUN crashes the kernel
Version-Release number of selected component (if applicable):
RHEL6, any recent kernel.
Upstream kernel 2.6.35-rcX
Steps to Reproduce:
1. Create a thin provisioned LUN
Actual results:
The kernel crashes in libiscsi_tcp
Expected results:
The filesystem should be formatted properly
Additional info:
1) mkfs.ext4 -K /dev/sdX does not crash the kernel
2) Formatting the LUN on a RHEL5 kernel does not crash
3) Changing the partition type from MBR to GPT also prevents the issue from occurring.
Created attachment 427696 [details]
Kernel panic log with upstream
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release. Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release. This request is not yet committed for
I sent this in mail earlier but figured I'd capture it in this BZ too:
The BLKDISCARD ioctl does _not_ issue a discard with a barrier
(bio->bi_rw does not have BIO_RW_BARRIER set). But discards through
ext4 do use BIO_RW_BARRIER.
I can avoid the NULL pointer if I have all discards use a barrier (I
verified this by having DM always set BIO_RW_BARRIER for discards).
This explains why Shyam didn't see the NULL pointer when he:
1) formatted the device using RHEL5's mkfs.ext4 (which doesn't issue
2) then used RHEL6's ext4 with -o discard to test FS generated discards
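For reference, the user-space discard path discussed above can be exercised on its own. The following is a minimal sketch, assuming a Linux host that provides <linux/fs.h>; discard_range is a hypothetical helper name and is not taken from this BZ. It issues the BLKDISCARD ioctl over a byte range, which is the non-barrier path the comment refers to.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/*
 * Hypothetical helper (not from this BZ): discard `len` bytes starting at
 * byte `offset` on the block device open as `fd`.
 * Returns 0 on success, or a negative errno value on failure.
 */
int discard_range(int fd, uint64_t offset, uint64_t len)
{
    /* BLKDISCARD takes a pair of 64-bit values: start and length, in bytes. */
    uint64_t range[2] = { offset, len };

    assert(len > 0);

    if (ioctl(fd, BLKDISCARD, range) < 0)
        return -errno;
    return 0;
}
```

Note that a successful call discards the data in the given range, so this should only be pointed at a scratch device. The descriptor must be opened for writing, and the range is expected to be aligned to the device's sector size.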
I created the thin provisioned LUN using scsi_debug
modprobe scsi_debug dev_size_mb=100 unmap_max_desc=16 unmap_granularity=2048 sector_size=4096
When I run mkfs.ext4 on this device the issue is not seen.
Mike, you mentioned in email that this needs to be exported as an iSCSI device. I guess that is what I am missing. How do I do that?
(In reply to comment #6)
> I created the thin provisioned LUN using scsi_debug
> modprobe scsi_debug dev_size_mb=100 unmap_max_desc=16 unmap_granularity=2048
> When I run mkfs.ext4 on this device the issue is not seen.
> Mike - You mentioned in email that this needs to be exported as an iSCSI
> device. I guess that is what I am missing.. How to do that ?
That is left as an exercise to the user ;)
More seriously: RHEL6 provides scsi-target-utils (aka tgtd)
I've not used tgtd for a bit so I'm short on configuration details. To be clear: I was just speculating that this software-only iSCSI target (exporting a scsi_debug TP LUN) would allow us to reproduce the libiscsi_tcp NULL pointer. I didn't actually try it.
Got you.. this time :)
Found the problem.
For the WRITE_SAME_16 command, scsi_bufflen(scsi_cmnd) reports the length as the size of the operation, and sc_data_direction is set to DMA_TO_DEVICE. The iSCSI layer therefore sets things up as if it needs to write that much data. When the iSCSI layer then goes to access the scatterlist, there isn't one, and we oops trying to access it.
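The mismatch described above can be illustrated in user space. This is only a sketch: the types below (fake_cmd, fake_sg, data_dir) and the function safe_data_out_len are hypothetical stand-ins, not the kernel's definitions, and the check shown is not the fix that went in under BZ 610054. It just models a command that claims a non-zero data-out length while carrying no scatterlist.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; NOT the real definitions. */
enum data_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_NONE };

struct fake_sg {
    const void *buf;
    size_t len;
};

struct fake_cmd {
    enum data_dir dir;    /* plays the role of sc_data_direction */
    size_t bufflen;       /* plays the role of scsi_bufflen() */
    struct fake_sg *sgl;  /* plays the role of the scatterlist */
};

/*
 * Returns the number of bytes a transport could safely send as data-out.
 * A command that advertises a write length but has no scatterlist (the
 * state described in this comment) yields 0 instead of being dereferenced.
 */
size_t safe_data_out_len(const struct fake_cmd *cmd)
{
    assert(cmd != NULL);

    if (cmd->dir != DIR_TO_DEVICE)
        return 0;
    if (cmd->bufflen == 0)
        return 0;
    if (cmd->sgl == NULL)   /* length says "write", but there is no data */
        return 0;
    return cmd->bufflen;
}
```

A transport that trusts the length and direction alone, without a check of this kind, would walk a NULL scatterlist, which matches the oops seen in libiscsi_tcp.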
We are now fairly confident that the patches associated with fixing BZ 610054 will also fix this iSCSI NULL pointer.
I'll have a brew scratch build x86_64 kernel available for test tomorrow.
I'll post the kernel on my people page for Dell to download and test.
(In reply to comment #10)
> I'll have a brew scratch build x86_64 kernel available for test tomorrow.
> I'll post the kernel on my people page for Dell to download and test.
Shyam, please download the test kernel rpm from:
I have also provided the 'kernel-devel' package in case Dell 3rd party modules must be built.
Dell can access these RPMS directly for remote testing within Dell.
Setting to NEEDINFO, I look forward to getting Dell's test results.
I will need the kernel-firmware to install this.
(In reply to comment #12)
> I will need the kernel-firmware to install this.
Please just use the kernel-firmware that you currently have on your RHEL6 system. Install the rpm with: rpm -ivh --nodeps kernel-2.6.32-42.el6.bz609146.x86_64.rpm
The results are positive with some quick testing.
Marking as a duplicate of BZ 610054 because its fixes address this BZ as a side-effect.
*** This bug has been marked as a duplicate of bug 610054 ***