Description of problem:
The kernel error 'kernel: Bad page state at __free_pages_ok' occurs while executing SCSI writes via the SG driver when the iSCSI driver is in use and the transfer length exceeds the host's memory page size.

Version-Release number of selected component (if applicable):
SG: 30531 3.5.31 [20040516]
SFNet iSCSI Driver Version ...4:0.1.11(12-Jan-2005)

How reproducible:
Easily reproducible

Steps to Reproduce:
1. Call ioctl(sg_fd, SG_IO, &io_hdr) where the CDB opcode is 0x2A and the transfer length is greater than the host's memory page size.

Actual results:
Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: Bad page state at __free_pages_ok (in process 'tony', page c1075520)
Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: flags:0x20000060 mapping:c44eb849 mapcount:1 count:1
Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: Backtrace:
Message from syslogd@dit-43-203 at Thu Mar 16 17:59:45 2006 ...
dit-43-203 kernel: Trying to fix it up, but a reboot is needed

Expected results:
No kernel error

Additional info:
I've run extensive write I/O using a QLogic HBA with nb=64, and also over iSCSI with nb=8, and encountered no problems over prolonged periods (24+ hrs). But with nb=9 and the iSCSI driver, "Bad page state at __free_pages_ok" occurs within the first few I/Os.
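For anyone trying to reproduce this without the original test program, here is a minimal sketch of the call described in the steps above. It is not the reporter's code. The helper names (build_write10_cdb, sg_write10), the 32-byte sense buffer, and the 20-second timeout are assumptions made for illustration; only the SG_IO ioctl and opcode 0x2A come from the report. Note that WRITE(10) overwrites data on the target device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define WRITE10_OPCODE  0x2A  /* CDB opcode named in the report */
#define WRITE10_CDB_LEN 10

/* Fill a 10-byte WRITE(10) CDB: opcode, 32-bit big-endian LBA,
 * and 16-bit big-endian transfer length in blocks. */
void build_write10_cdb(unsigned char cdb[WRITE10_CDB_LEN],
                       uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, WRITE10_CDB_LEN);
    cdb[0] = WRITE10_OPCODE;
    cdb[2] = (lba >> 24) & 0xff;
    cdb[3] = (lba >> 16) & 0xff;
    cdb[4] = (lba >> 8) & 0xff;
    cdb[5] = lba & 0xff;
    cdb[7] = (nblocks >> 8) & 0xff;
    cdb[8] = nblocks & 0xff;
}

/* Issue one WRITE(10) through the sg driver (hypothetical helper).
 * Returns the ioctl() result: 0 if the request was handed to the
 * driver, -1 on error (errno is set). */
int sg_write10(int sg_fd, uint32_t lba, uint16_t nblocks,
               unsigned char *data, unsigned int data_len)
{
    unsigned char cdb[WRITE10_CDB_LEN];
    unsigned char sense[32];          /* assumed sense buffer size */
    struct sg_io_hdr io_hdr;

    assert(data != NULL && data_len > 0);

    build_write10_cdb(cdb, lba, nblocks);

    memset(&io_hdr, 0, sizeof(io_hdr));
    io_hdr.interface_id = 'S';        /* required by the sg v3 interface */
    io_hdr.dxfer_direction = SG_DXFER_TO_DEV;
    io_hdr.cmd_len = sizeof(cdb);
    io_hdr.cmdp = cdb;
    io_hdr.dxferp = data;
    io_hdr.dxfer_len = data_len;
    io_hdr.sbp = sense;
    io_hdr.mx_sb_len = sizeof(sense);
    io_hdr.timeout = 20000;           /* milliseconds, assumed value */

    return ioctl(sg_fd, SG_IO, &io_hdr);
}
```

With 512-byte blocks, calling sg_write10(fd, 0, 9, buf, 9 * 512) on an open /dev/sg node would correspond to the nb=9 case above, while nblocks=8 stays within a single 4096-byte page.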
Created attachment 126368 [details] Contains various component versions; Kernel error output
This looks like it is going to need a fix to the kernel iscsi_sfnet driver. Changing components to kernel.
*** Bug 239450 has been marked as a duplicate of this bug. ***
Created attachment 154358 [details] sg.c sets PG_reserved for highmem pages
This is a patch to scsi/sg.c. The sg driver sets PG_reserved for highmem pages in sg_page_malloc() and clears the bit/count in sg_page_free(). I tested it and it worked well. Do you both see any side effects?
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Mike,

This issue seems to be specific to the RHEL4 kernel. I ran the same test on SLES10SP1 with the open-iscsi driver (lk 2.6.16.37-0.23) and it works fine. On that kernel, both alloc_pages() and __get_free_pages() set page_count to 1 for the base page and for the sub-pages. Because page_count = 1, the sub-pages are not recycled. It seems the mm code changed the behavior of alloc_pages() and __get_free_pages() somewhere between 2.6.9 and 2.6.16. Therefore, we don't have this issue in the upstream kernel or in RHEL5.

0 page:ffff81007f8da240 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
1 page:ffff81007f8da278 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
2 page:ffff81007f8da2b0 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
3 page:ffff81007f8da2e8 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
4 page:ffff81007f8da320 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
5 page:ffff81007f8da358 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
6 page:ffff81007f8da390 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1
7 page:ffff81007f8da3c8 flags:0x0100000000004000 mapping:0000000000000000 mapcount:0 count:1

Thanks,
Yanling
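To make the reference-count reasoning above easier to follow, here is a small userspace toy model. It is not kernel code and does not reproduce the real mm implementation. The assumptions are: on the older kernel only the base page of a multi-page allocation carries a count of 1 (this comment only implies that), and a lower layer such as the network path briefly takes and drops a reference on each sub-page during I/O. All names (toy_page, toy_alloc_base_only, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGES 8   /* eight sub-pages, matching the dump above */

struct toy_page {
    int  count;       /* stands in for page_count() */
    bool freed;       /* set when the count drops to zero */
};

/* Assumed older behaviour: only the base page is counted. */
void toy_alloc_base_only(struct toy_page pages[TOY_PAGES])
{
    for (int i = 0; i < TOY_PAGES; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].freed = false;
    }
}

/* Behaviour described in this comment: every page reports count 1. */
void toy_alloc_all_counted(struct toy_page pages[TOY_PAGES])
{
    for (int i = 0; i < TOY_PAGES; i++) {
        pages[i].count = 1;
        pages[i].freed = false;
    }
}

void toy_get_page(struct toy_page *p)
{
    p->count++;
}

void toy_put_page(struct toy_page *p)
{
    assert(p->count > 0);
    if (--p->count == 0)
        p->freed = true;   /* page would go back to the allocator */
}

/* A lower layer takes and drops a temporary reference on each page.
 * Returns how many pages were recycled while the owner of the buffer
 * still believes it holds them. */
int toy_transient_io(struct toy_page pages[TOY_PAGES])
{
    int recycled = 0;

    for (int i = 0; i < TOY_PAGES; i++) {
        toy_get_page(&pages[i]);
        toy_put_page(&pages[i]);
        if (pages[i].freed)
            recycled++;
    }
    return recycled;
}
```

In this model the base-only allocation loses its seven sub-pages during the transient get/put, so a later free of the whole buffer would find pages in an unexpected state. With every page counted, nothing is recycled.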
Are there any test kernels which have this patch? This issue is causing us grief. We have been using iSCSI on x86_32 to drive various devices (tape changers, etc.) with great success. However, for some reason (perhaps different default page sizes?), this problem bites us on x86_64 systems. We'll try the patch.

Thanks,
Tom
Created attachment 159150 [details] grab a reference to pages we are allocating
Based on feedback, I sent this patch to rh kernel. I am still not sure whether it is better to backport the mm fix, to say we only support block SG IO, or to go with this workaround, so I left that for our kernel reviewers to help decide.
Could someone please attach the reproducer to the bugzilla?
Created attachment 159221 [details] program to reproduce problem
You can use sg3_utils's sg_dd with

    sg_dd of=/dev/sg10 if=/dev/zero count=65

to reproduce the problem. If you do not have easy access to sg3_utils, the attached program will hit it too. Run the program with

    ./a.out /dev/sg10 20

The last argument can be any valid size, but you need to use higher numbers to hit this.
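As a rough guide to which sizes cross the page boundary, the arithmetic can be sketched as follows. This assumes 512-byte logical blocks and a 4096-byte page, that the whole count goes out as one transfer, and that the buffer is one contiguous power-of-two allocation. None of that is confirmed in this report, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_BLOCK_SIZE 512u   /* assumption: 512-byte logical blocks */
#define ASSUMED_PAGE_SIZE  4096u  /* assumption: 4 KiB pages */

/* Bytes moved by a transfer of nblocks blocks. */
size_t transfer_bytes(unsigned int nblocks)
{
    return (size_t)nblocks * ASSUMED_BLOCK_SIZE;
}

/* Number of pages needed to hold the transfer, rounded up. */
size_t pages_needed(size_t bytes)
{
    return (bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}

/* Smallest order such that 2^order pages cover the transfer, which is
 * what a single contiguous power-of-two allocation would require. */
unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;
    size_t pages;

    assert(bytes > 0);
    pages = pages_needed(bytes);
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions, 8 blocks fit in a single page (order 0), 9 blocks need two pages (order 1), and 65 blocks need nine pages (order 4). That is consistent with nb=8 passing and nb=9 failing in the original description.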
A patch for this issue has been included in build 2.6.9-55.20.EL.
A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com). Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot1 on partners.redhat.com. Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should be included in RHEL4.6-Snapshot2--available soon on partners.redhat.com. Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot3 on partners.redhat.com. Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should be included in the packages contained in RHEL4.6-Snapshot4--available now on partners.redhat.com. Please verify that your issue is fixed ASAP to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should be included in the packages contained in RHEL4.6-Snapshot5--available now on partners.redhat.com. Please verify that your issue is fixed ASAP to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html