Bug 574266 - Data Changes In-Flight After I/O Submission
Summary: Data Changes In-Flight After I/O Submission
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-03-17 01:22 UTC by Ihab Hamadi
Modified: 2011-02-01 12:47 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-01 12:47:58 UTC
Target Upstream Version:
Embargoed:



Description Ihab Hamadi 2010-03-17 01:22:46 UTC
Description of problem: When using Emulex LightPulse 8Gb FC HBAs (model numbers 12000, 12002, 12004) with BlockGuard (Emulex's T10-DIF implementation plus DIX support), the data submitted for I/O can change after scsi_host_template->queuecommand() is called. This invalidates the guard tag (CRC or IP checksum), and the adapter then flags the error.

The pages containing the data are supposed to be locked down until the I/O completes, but there seems to be an issue with the page cache that leaves time windows in which pages are written to after they have been submitted for I/O (e.g. after a write starts).
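
To make the failure mode concrete, here is a minimal sketch of a checksum taken at submit time no longer matching the data once the buffer is rewritten. It is an illustration only: cksum (POSIX CRC-32) stands in for the real guard tag, which is a CRC16 or IP checksum computed per protected block, and the temporary file and the guard_of helper are hypothetical stand-ins for the in-flight pages and the tag computation.

```shell
#!/bin/sh
# Illustration only: cksum is NOT the T10-DIF guard algorithm; it merely
# plays the role of "a checksum computed over the submitted data".
guard_of() {
    cksum < "$1" | cut -d' ' -f1
}

buf=$(mktemp)
printf 'payload version A\n' > "$buf"

# Tag computed when the I/O is queued.
tag_at_submit=$(guard_of "$buf")

# The buffer is modified while the write is still "in flight".
printf 'payload version B\n' > "$buf"

# Tag recomputed from the data that actually reaches the checker.
tag_at_check=$(guard_of "$buf")

if [ "$tag_at_submit" != "$tag_at_check" ]; then
    result="guard_tag error"
else
    result="ok"
fi
echo "$result"
rm -f "$buf"
```

The two payloads have the same length and differ in a single byte, so the mismatch comes purely from the content change, which is the situation the adapter reports.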


Version-Release number of selected component (if applicable): This issue has existed for a while.  It was recently observed with 2.6.33


How reproducible: The issue is reproducible


Steps to Reproduce:
1. You need an 8Gb FC HBA from Emulex with BlockGuard firmware.
2. You need T10-DIF capable drives (e.g. HITACHI  HUS153073VLF40).
3. Load lpfc with BlockGuard enabled (insmod lpfc.ko lpfc_enable_bg=1 lpfc_prot_mask=0x11 lpfc_log_verbose=0x40).

# lsscsi --version
version: 0.21  2008/07/10
(A later version of lsscsi would work as well)
# lsscsi -p
[10:0:0:0]   disk    HITACHI  HUS153073VLF400  F410  /dev/sda   DIF/Type1  T10-DIF-TYPE1-IP    
[10:0:1:0]   disk    SEAGATE  ST3300656FC      MSE2  /dev/sdb   DIF/Type1  T10-DIF-TYPE1-IP    
[10:0:2:0]   disk    HITACHI  HUS153073VLF400  F410  /dev/sdc   DIF/Type1  T10-DIF-TYPE1-IP    
[10:0:3:0]   disk    HITACHI  HUS154545VLF400  F4B0  /dev/sdd   DIF/Type1  T10-DIF-TYPE1-IP    
4. In terminal 1, run the following:
# dd if=/dev/urandom of=/dev/sda bs=1k seek=100
5. In terminal 2, run the following:
# dd if=/dev/urandom of=/dev/sda bs=1k seek=1000
6. In terminal 3, run sync periodically (by hand):
# sync

The system log will show you something similar to:
lpfc 0000:0b:00.0: 0:(0):9030 FCP cmd x2a failed <0/0> status: x3 result: xd Data: x841 x8d4
lpfc 0000:0b:00.0: 0:9069 BLKGRD: BG ERROR in cmd 0x2a lba 0x8f24 blk cnt 0x250 bgstat=0x9 bghm=0x49d90
lpfc 0000:0b:00.0: 0:9055 BLKGRD: guard_tag error
end_request: I/O error, dev sda, sector 36644
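
As a cross-check on the sample log, the LBA in the BLKGRD line and the sector in the end_request line appear to refer to the same block: 0x8f24 is 36644 in decimal. The sketch below pulls the LBA out of such a line and converts it. The helper name blkgrd_lba_decimal is made up for illustration, and the pattern assumes the message format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of lpfc): read log lines on stdin, find
# "BLKGRD: BG ERROR" messages, and print each hex LBA as a decimal number.
blkgrd_lba_decimal() {
    sed -n 's/.*BLKGRD: BG ERROR.* lba \(0x[0-9a-fA-F]*\).*/\1/p' |
    while read -r lba; do
        printf '%d\n' "$lba"
    done
}

# Sample line copied from the log in this report.
line='lpfc 0000:0b:00.0: 0:9069 BLKGRD: BG ERROR in cmd 0x2a lba 0x8f24 blk cnt 0x250 bgstat=0x9 bghm=0x49d90'
printf '%s\n' "$line" | blkgrd_lba_decimal   # prints 36644
```

On a live system the same function could be fed from dmesg, for example `dmesg | blkgrd_lba_decimal`.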

  
Actual results: I/O failure due to invalid checksum/CRC


Expected results: I/O should not fail


Additional info:

Comment 1 Christof Schmitt 2010-05-31 11:36:58 UTC
I am seeing a similar problem on internal tests, see
http://marc.info/?l=linux-scsi&m=127530531808556&w=2

Adding a debug patch in sd_prep_fn shows that the guard tag does not match the user data on some write requests. I see the problem on the 2.6.34 kernel, but only on the ext2 filesystem, not on ext3/4.

Comment 2 Ric Wheeler 2010-06-01 13:15:14 UTC
Are you testing on RHEL5.4 or just upstream?

Thanks!

Comment 4 Christof Schmitt 2010-06-01 13:22:13 UTC
Upstream 2.6.34. Since the above comment mentioned 2.6.33, I wanted to know if this is the same problem. According to the e-mail discussion this is the same problem, and buffers can change while I/O is in flight.

Comment 5 Ric Wheeler 2010-06-01 13:47:44 UTC
I think that this BZ was filed incorrectly against RHEL5.4. Definitely an interesting upstream thread and something that we need to fix in the future.

For RHEL5.x or RHEL6.0, we do not support DIF/DIX via file systems. The narrow focus was for support of DIF/DIX for raw devices.

Regards,

Ric

Comment 8 RHEL Program Management 2010-06-07 16:07:15 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 10 Tom Coughlan 2010-07-01 23:24:17 UTC
Just for reference, there are three other RHEL 6 BZs related to DIF: 

Bug 549918 - kernel panic when running xfstests on ext4 over emulated DIF scsi disk

This one has a release note for 6.0, indicating that DIF can only be used with O_DIRECT.

That BZ also reports a possible bug in scsi_debug that causes a kernel panic when it is used to test DIF. If true, that will be queued for 6.1. 

Bug 549913 - xfs errors when running xfstests on emulated DIF scsi disk.
(closed)

Bug 606161 - xfs errors when running xfstests on emulated DIF scsi disk.

This BZ is against the Storage Admin Guide, to get the same info that is in the release note into the guide.

Comment 11 Tom Coughlan 2010-07-14 19:29:55 UTC
(In reply to comment #0)
> Description of problem: 
> 
> Steps to Reproduce:
  
> 4-In terminal 1, run the following:
> #dd if=/dev/urandom of=/dev/sda bs=1k seek=100 
> 5-In terminal 2, run the following:
> #dd if=/dev/urandom of=/dev/sda bs=1k seek=1000
> 6-In terminal 3, run syn periodically (by hand)
> #sync

Ihab,

So, you need to confine your BlockGuard/DIF testing to O_DIRECT. 

In the above, I believe all you need to do is add oflag=direct to the dd command line. Please try that. 

You may have access to other tests, like dt, that can also be set to use direct I/O, or you can use raw devices, or XFS in direct I/O mode. 

We appreciate your effort in testing this. 

Tom
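
A sketch of what the suggestion in comment 11 could look like for the two dd invocations from the reproduction steps. Because these commands overwrite /dev/sda, the helper below only prints the command lines rather than running them. dd_direct_cmd is a hypothetical name, and the device and seek values are simply the ones from the original report.

```shell
#!/bin/sh
# Hypothetical helper: print (do NOT execute) a dd command line that opens
# the output device with O_DIRECT via dd's oflag=direct.
dd_direct_cmd() {
    dev=$1
    seek=$2
    printf 'dd if=/dev/urandom of=%s bs=1k seek=%s oflag=direct\n' "$dev" "$seek"
}

dd_direct_cmd /dev/sda 100     # terminal 1
dd_direct_cmd /dev/sda 1000    # terminal 2
```

With oflag=direct the writes bypass the page cache, which is the path where the in-flight modification was observed.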

Comment 12 RHEL Program Management 2011-01-07 04:28:07 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 13 Suzanne Logcher 2011-01-07 16:25:32 UTC
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 14 RHEL Program Management 2011-02-01 05:57:42 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 15 Ric Wheeler 2011-02-01 12:47:58 UTC
Thanks for the testing!

As Tom noted above, DIF/DIX works only in O_DIRECT mode, so I will close this BZ on the assumption that your tests pass with O_DIRECT.

Please reopen if you see issues when using O_DIRECT, thanks!

