Bug 186008 - Bad page state after issue SCSI Write via SCSI Generic (sg) driver
Summary: Bad page state after issue SCSI Write via SCSI Generic (sg) driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: Mike Christie
QA Contact:
URL:
Whiteboard:
Duplicates: 239450
Depends On:
Blocks: 217099 246028
Reported: 2006-03-20 23:05 UTC by Tony De La Cruz
Modified: 2018-10-19 23:27 UTC
CC List: 10 users

Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-15 16:13:52 UTC
Target Upstream Version:
Embargoed:


Attachments
Contains various component versions; Kernel error output (4.13 KB, text/plain)
2006-03-20 23:05 UTC, Tony De La Cruz
sg.c sets PG_reserved for highmem pages (1.30 KB, application/octet-stream)
2007-05-08 19:57 UTC, yanling.qi@lsi.com
grab a reference to pages we are allocating (1.50 KB, patch)
2007-07-13 11:13 UTC, Mike Christie
program to reproduce problem (5.01 KB, text/x-csrc)
2007-07-13 17:41 UTC, Mike Christie


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2007:0791 | 0 | normal | SHIPPED_LIVE | Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6 | 2007-11-14 18:25:55 UTC

Description Tony De La Cruz 2006-03-20 23:05:57 UTC
Description of problem:
Kernel error 'kernel: Bad page state at __free_pages_ok' occurs while executing 
SCSI writes via the SG driver when using the iSCSI driver and the transfer 
length exceeds the host's memory page size.


Version-Release number of selected component (if applicable):
SG: 30531   3.5.31 [20040516]
SFNet iSCSI Driver Version ...4:0.1.11(12-Jan-2005)

 
How reproducible:
Easily reproducible

Steps to Reproduce:
1. Issue ioctl(sg_fd, SG_IO, &io_hdr) with a WRITE(10) CDB (opcode 0x2A) whose 
transfer length is greater than the host's memory page size.
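A minimal sketch of step 1, assuming 512-byte logical blocks and a caller-supplied LBA and data buffer. The helper names build_write10_cdb(), prepare_sg_write() and issue_sg_write() are hypothetical and this is not the attached reproducer; only the WRITE(10) opcode 0x2A and the SG_IO ioctl come from the report, while the sg_io_hdr_t fields are those declared in the Linux <scsi/sg.h> header.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Assumption: 512-byte logical blocks (real devices report their size
 * via READ CAPACITY). */
#define ASSUMED_BLOCK_SIZE 512u

/* Fill a WRITE(10) CDB: opcode 0x2A, big-endian LBA in bytes 2-5,
 * big-endian transfer length (in blocks) in bytes 7-8. */
void build_write10_cdb(unsigned char cdb[10], uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x2A;
    cdb[2] = (unsigned char)(lba >> 24);
    cdb[3] = (unsigned char)(lba >> 16);
    cdb[4] = (unsigned char)(lba >> 8);
    cdb[5] = (unsigned char)lba;
    cdb[7] = (unsigned char)(nblocks >> 8);
    cdb[8] = (unsigned char)nblocks;
}

/* Prepare an SG_IO header for a data-out transfer of nblocks from buf. */
void prepare_sg_write(sg_io_hdr_t *hdr, unsigned char cdb[10],
                      unsigned char *sense, unsigned char sense_len,
                      void *buf, uint16_t nblocks)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';               /* required by the sg driver */
    hdr->cmd_len = 10;
    hdr->cmdp = cdb;
    hdr->dxfer_direction = SG_DXFER_TO_DEV;
    hdr->dxferp = buf;
    hdr->dxfer_len = (unsigned int)nblocks * ASSUMED_BLOCK_SIZE;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 20000;                  /* milliseconds */
}

/* Send the command; returns the ioctl() result (-1 on error, errno set). */
int issue_sg_write(int sg_fd, sg_io_hdr_t *hdr)
{
    return ioctl(sg_fd, SG_IO, hdr);
}
```

With nblocks = 9 the data buffer is 4608 bytes, more than one 4096-byte page, which is the kind of transfer the description says triggers the error.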

  
Actual results:
Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: Bad page state at __free_pages_ok (in process 'tony', page 
c1075520)

Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: flags:0x20000060 mapping:c44eb849 mapcount:1 count:1

Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: Backtrace:

Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: Trying to fix it up, but a reboot is needed


Expected results:
No kernel error

Additional info:
I've run extensive write I/O using a QLogic HBA with nb=64, and also iSCSI with 
nb=8, and encountered NO problems over prolonged periods (24+ hrs). But when 
using nb=9 with the iSCSI driver, "Bad page state at __free_pages_ok" occurs 
within the first few I/Os.
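A worked check of the nb values above, assuming nb is the transfer length in 512-byte blocks and the page size is 4096 bytes (the i686 default); the report does not define nb, so both are assumptions. pages_spanned() is a hypothetical helper that does ceiling division.

```c
#include <stddef.h>

/* Pages needed to hold nblocks * block_size bytes (ceiling division). */
size_t pages_spanned(size_t nblocks, size_t block_size, size_t page_size)
{
    size_t bytes = nblocks * block_size;
    return (bytes + page_size - 1) / page_size;
}
```

Under those assumptions nb=8 fills exactly one page and nb=9 is the smallest transfer that needs a second page, which matches the description's statement that the failure starts once the transfer length exceeds the page size.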

Comment 1 Tony De La Cruz 2006-03-20 23:05:57 UTC
Created attachment 126368 [details]
Contains various component versions; Kernel error output

Comment 2 Mike Christie 2006-03-21 21:00:50 UTC
This looks like it is going to need a fix to the kernel iscsi_sfnet driver.
Changing components to kernel.

Comment 10 Mike Christie 2007-05-08 16:08:23 UTC
*** Bug 239450 has been marked as a duplicate of this bug. ***

Comment 11 yanling.qi@lsi.com 2007-05-08 19:57:17 UTC
Created attachment 154358 [details]
sg.c sets PG_reserved for highmem pages

This is a patch to scsi/sg.c. With it, the sg driver sets PG_reserved for
highmem pages in sg_page_malloc() and clears the bit/count in sg_page_free().
I tested it and it worked well. Do either of you see any side effects?

Comment 12 RHEL Program Management 2007-05-09 10:40:34 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 yanling.qi@lsi.com 2007-05-09 16:16:29 UTC
Mike,

It seems this issue is specific to the RHEL4 kernel.

I tried the same test on SLES10 SP1 with the open-iscsi driver (kernel
2.6.16.37-0.23), and it works fine. There, both alloc_pages() and
__get_free_pages() set page_count to 1 for the base page and all sub-pages.
Because page_count is 1, the sub-pages will not be recycled. The dump below
from that kernel shows count:1 for all eight pages.

It seems the mm code changed the behavior of alloc_pages() and
__get_free_pages() somewhere between 2.6.9 and 2.6.16.

Therefore, we don't have this issue in the upstream kernel or RHEL5.

0 page:ffff81007f8da240 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
1 page:ffff81007f8da278 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
2 page:ffff81007f8da2b0 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
3 page:ffff81007f8da2e8 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
4 page:ffff81007f8da320 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
5 page:ffff81007f8da358 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
6 page:ffff81007f8da390 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
7 page:ffff81007f8da3c8 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1

Thanks,
Yanling
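A simplified user-space model of the reference-count explanation in this comment, not kernel code: struct toy_page and the toy_* functions are hypothetical stand-ins for struct page, get_page() and put_page(). It assumes, as the comment implies, that on the RHEL4 2.6.9 kernel only the first page of a multi-page allocation starts with a count of 1, while on 2.6.16 every page does. An I/O path that takes and then drops a reference on a sub-page releases it early in the first case but not in the second.

```c
#include <stdbool.h>

#define ORDER 3                /* 2^3 = 8 pages, like the dump above */
#define NPAGES (1 << ORDER)

/* Toy stand-in for struct page: a reference count and a released flag. */
struct toy_page {
    int count;
    bool freed;
};

/* Model a multi-page allocation. count_subpages == false: only the head
 * page gets count 1 (assumed 2.6.9 behavior); true: every page gets count 1
 * (behavior observed on 2.6.16 in this comment). */
void toy_alloc(struct toy_page pages[NPAGES], bool count_subpages)
{
    for (int i = 0; i < NPAGES; i++) {
        pages[i].count = (i == 0 || count_subpages) ? 1 : 0;
        pages[i].freed = false;
    }
}

void toy_get_page(struct toy_page *p)
{
    p->count++;
}

/* Dropping the last reference releases the page by itself, even though it
 * still belongs to the caller's multi-page buffer. */
void toy_put_page(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = true;
}

/* Simulate an I/O path taking and dropping a reference on page idx;
 * returns true if that page was released prematurely. */
bool subpage_released_early(bool count_subpages, int idx)
{
    struct toy_page pages[NPAGES];
    toy_alloc(pages, count_subpages);
    toy_get_page(&pages[idx]);
    toy_put_page(&pages[idx]);
    return pages[idx].freed;
}
```

In this model, subpage_released_early(false, 1) is true and subpage_released_early(true, 1) is false. Giving every page its own reference up front roughly corresponds to the approach named in the title of the comment 17 patch ("grab a reference to pages we are allocating").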


Comment 15 Tom Sightler 2007-05-23 21:38:04 UTC
Are there any test kernels which have this patch?  This issue is causing us
grief: we have been using iSCSI on x86_32 to drive various devices (tape
changers, etc.) with great success; however, for some reason (perhaps different
default page sizes?), this problem bites us on x86_64 systems.

We'll try the patch.

Thanks,
Tom


Comment 17 Mike Christie 2007-07-13 11:13:10 UTC
Created attachment 159150 [details]
grab a reference to pages we are allocating

Based on feedback, I sent this patch to the RH kernel list.

I am still not sure whether it is better to backport the mm fix, to say we
only support block SG IO, or to go with this workaround, so I left that
decision to our kernel reviewers.

Comment 18 Jeff Moyer 2007-07-13 17:11:52 UTC
Could someone please attach the reproducer to the bugzilla?

Comment 19 Mike Christie 2007-07-13 17:41:13 UTC
Created attachment 159221 [details]
program to reproduce problem

You can use sg3_utils's sg_dd with

sg_dd of=/dev/sg10 if=/dev/zero count=65

to reproduce the problem. If you do not have easy access to sg3_utils, the
attached program will also hit it. Run the program with

./a.out /dev/sg10 20

The last argument can be any valid size, but larger values are needed to hit
the problem.

Comment 20 Don Howard 2007-07-18 21:39:15 UTC
A patch for this issue has been included in build 2.6.9-55.20.EL.

Comment 22 John Poelstra 2007-08-29 16:26:51 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 23 John Poelstra 2007-09-05 22:28:04 UTC
A fix for this issue should have been included in the packages contained in 
the RHEL4.6-Snapshot1 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed to ensure that it is 
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, 
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent 
symptoms of the problem you are having and change the status of the bug to 
FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test 
results to Issue Tracker.  If you need assistance accessing 
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 24 John Poelstra 2007-09-12 00:44:33 UTC
A fix for this issue should be included in RHEL4.6-Snapshot2--available soon on
partners.redhat.com.  


Comment 25 John Poelstra 2007-09-20 04:32:31 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6-Snapshot3 on partners.redhat.com.  



Comment 26 John Poelstra 2007-09-26 23:37:19 UTC
A fix for this issue should be included in the packages contained in
RHEL4.6-Snapshot4--available now on partners.redhat.com.  


Comment 27 John Poelstra 2007-10-05 02:59:30 UTC
A fix for this issue should be included in the packages contained in
RHEL4.6-Snapshot5--available now on partners.redhat.com.  


Comment 29 errata-xmlrpc 2007-11-15 16:13:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html


