Bug 609146 - Kernel crash on mkfs.ext4 with thin provisioned iSCSI LUN
Keywords:
Status: CLOSED DUPLICATE of bug 610054
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 610054
Blocks: 594856
Reported: 2010-06-29 14:08 UTC by Shyam Iyer
Modified: 2015-04-28 04:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-06 17:06:11 UTC
Target Upstream Version:
Embargoed:


Attachments
Kernel panic log with upstream (4.54 KB, application/octet-stream)
2010-06-29 14:13 UTC, Shyam Iyer

Description Shyam Iyer 2010-06-29 14:08:24 UTC
Description of problem:
mkfs.ext4 on a thin provisioned LUN crashes the kernel

Version-Release number of selected component (if applicable):
Any recent RHEL6 kernel.
Upstream kernel 2.6.35-rcX

How reproducible:
Always

Steps to Reproduce:
1. Create a thin provisioned LUN
2. Run mkfs.ext4 /dev/sdX on it
  
Actual results:
The kernel crashes in libiscsi_tcp

Expected results:
The filesystem should be formatted properly

Additional info:

1) mkfs.ext4 -K /dev/sdX does not crash the kernel

2) Formatting the LUN on a RHEL5 kernel does not crash

3) Changing the partition type from MBR to GPT also avoids the issue.

Comment 2 Shyam Iyer 2010-06-29 14:13:02 UTC
Created attachment 427696 [details]
Kernel panic log with upstream

Comment 3 RHEL Program Management 2010-06-29 14:23:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 5 Mike Snitzer 2010-06-29 16:13:25 UTC
I sent this in mail earlier but figured I'd capture it in this BZ too:

The BLKDISCARD ioctl does _not_ issue a discard with a barrier
(bio->bi_rw does not have BIO_RW_BARRIER set).  But discards through
ext4 do use BIO_RW_BARRIER.

I can avoid the NULL pointer if I have all discards use a barrier (I
verified this by having DM always set BIO_RW_BARRIER for discards).

This explains why Shyam didn't see the NULL pointer when he:
1) formatted the device using RHEL5's mkfs.ext4 (which doesn't issue the
   BLKDISCARD ioctl).
2) then used RHEL6's ext4 with -o discard to test that FS-generated
   discards work.
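
For illustration, the BLKDISCARD path mentioned above can be exercised from user space roughly as follows. This is a minimal sketch under assumptions, not what mkfs.ext4 literally runs: the helper names discard_range_arg and discard are hypothetical, and the request number assumes the Linux definition _IO(0x12, 119) from <linux/fs.h>. Calling discard() destroys the data in the given range, so only point it at a scratch device.

```python
import fcntl
import os
import struct

# Linux <linux/fs.h>: #define BLKDISCARD _IO(0x12, 119)
BLKDISCARD = (0x12 << 8) | 119


def discard_range_arg(offset: int, length: int) -> bytes:
    """Pack the {start, length} byte range (two u64 values) passed to BLKDISCARD."""
    return struct.pack("=QQ", offset, length)


def discard(path: str, offset: int, length: int) -> None:
    """Issue BLKDISCARD on a block device (hypothetical helper).

    Needs root and DESTROYS the data in [offset, offset + length).
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, discard_range_arg(offset, length))
    finally:
        os.close(fd)
```

Nothing runs on import; a call such as discard("/dev/sdX", 0, 1 << 20) would send one discard for the first MiB of the device.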

Comment 6 Shyam Iyer 2010-06-29 17:39:36 UTC
I created the thin provisioned LUN using scsi_debug

modprobe scsi_debug dev_size_mb=100 unmap_max_desc=16 unmap_granularity=2048 sector_size=4096

When I run mkfs.ext4 on this device the issue is not seen.

Mike - You mentioned in email that this needs to be exported as an iSCSI device. I guess that is what I am missing. How do I do that?

Comment 7 Mike Snitzer 2010-06-29 17:48:37 UTC
(In reply to comment #6)
> I created the thin provisioned LUN using scsi_debug
> 
> modprobe scsi_debug dev_size_mb=100 unmap_max_desc=16 unmap_granularity=2048
> sector_size=4096
> 
> When I run mkfs.ext4 on this device the issue is not seen.
> 
> Mike - You mentioned in email that this needs to be exported as an iSCSI
> device. I guess that is what I am missing.. How to do that ?    

That is left as an exercise to the user ;)
More seriously: RHEL6 provides scsi-target-utils (aka tgtd)

I've not used tgtd for a bit so I'm short on configuration details... and to be clear: I was just speculating that this software-only iSCSI target (which exports a scsi_debug TP LUN) would allow us to reproduce the libiscsi_tcp NULL pointer.  I didn't actually try it.

Comment 8 Shyam Iyer 2010-06-29 17:57:05 UTC
Got you.. this time :)

Comment 9 Mike Christie 2010-07-01 19:21:57 UTC
Found the problem.

For the WRITE_SAME_16 command, scsi_bufflen(scsi_cmnd) reports the length as the size of the operation, and sc_data_direction is set to DMA_TO_DEVICE. The iSCSI layer then sets things up thinking it needs to write that much data. When the iSCSI layer goes to access the scatterlist, there is none, and we oops trying to access it.
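
The mismatch described above can be pictured with a toy user-space model. This is only a sketch under stated assumptions: Cmd, data_out_segments, and their fields are hypothetical stand-ins, not the kernel's scsi_cmnd or the libiscsi_tcp code, and the explicit check illustrates the failure mode rather than the actual kernel fix.

```python
from dataclasses import dataclass
from typing import List, Optional

WRITE_SAME_16 = 0x93  # SCSI opcode for WRITE SAME(16)


@dataclass
class Cmd:
    """Hypothetical stand-in for the SCSI command fields that matter here."""
    opcode: int
    bufflen: int                # what scsi_bufflen() would report
    to_device: bool             # True models DMA_TO_DEVICE
    sg: Optional[List[bytes]]   # scatterlist; None models "no scatterlist"


def data_out_segments(cmd: Cmd) -> List[bytes]:
    """Return the segments a transport would send as write data."""
    if not cmd.to_device or cmd.bufflen == 0:
        return []
    if cmd.sg is None:
        # Length and direction promise data, but no buffer backs them.
        # Walking the scatterlist unchecked at this point models the oops.
        raise ValueError("bufflen=%d but no scatterlist" % cmd.bufflen)
    return cmd.sg


# A discard-style command: nonzero length, to-device, nothing to send.
discard = Cmd(WRITE_SAME_16, bufflen=1 << 20, to_device=True, sg=None)
try:
    data_out_segments(discard)
except ValueError as err:
    print("inconsistent command:", err)
```

The point of the model is that the length and direction alone are not enough to decide that a data-out phase exists; the buffer has to be there as well.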

Comment 10 Mike Snitzer 2010-07-01 21:53:33 UTC
We are now fairly confident that the patches associated with fixing BZ 610054 will also fix this iSCSI NULL pointer.

I'll have a brew scratch build x86_64 kernel available for test tomorrow.

I'll post the kernel on my people page for Dell to download and test.

Comment 11 Mike Snitzer 2010-07-02 14:44:55 UTC
(In reply to comment #10)
> I'll have a brew scratch build x86_64 kernel available for test tomorrow.
> 
> I'll post the kernel on my people page for Dell to download and test.    

Shyam, please download the test kernel rpm from:
http://people.redhat.com/msnitzer/RPMS/bz609146/

I have also provided the 'kernel-devel' package in case Dell 3rd party modules must be built.

Dell can access these RPMS directly for remote testing within Dell.

Setting to NEEDINFO; I look forward to getting Dell's test results.

Comment 12 Shyam Iyer 2010-07-02 17:53:45 UTC
Mike,

I will need the kernel-firmware to install this.

-Shyam

Comment 13 Mike Snitzer 2010-07-02 18:20:30 UTC
(In reply to comment #12)
> Mike,
> 
> I will need the kernel-firmware to install this.

Please just use the kernel-firmware that you currently have on your RHEL6 system.  Install the rpm with: rpm -ivh --nodeps kernel-2.6.32-42.el6.bz609146.x86_64.rpm

Comment 14 Shyam Iyer 2010-07-06 16:21:36 UTC
The results are positive with some quick testing.

Comment 15 Mike Snitzer 2010-07-06 17:06:11 UTC
Marking as a duplicate of BZ 610054 because its fixes address this BZ as a side-effect.

*** This bug has been marked as a duplicate of bug 610054 ***

