Bug 239450 - Kernel panic when performing data transfer is greater than 4K through sg device node through iscsi_sfnet driver
Status: CLOSED DUPLICATE of bug 186008
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: All Linux
Priority: medium
Severity: high
Assigned To: Mike Christie
QA Contact: Martin Jenner
Reported: 2007-05-08 11:11 EDT by yanling.qi@lsi.com
Modified: 2007-11-16 20:14 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-05-08 12:08:21 EDT

Description yanling.qi@lsi.com 2007-05-08 11:11:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)

Description of problem:
When the sg driver accepts an SG_IO request from a user-space application such as SMagent, it calls the kernel API __get_free_pages() to allocate multiple pages to hold the data for the user-space I/O request. The allocation consists of one base page and a number of sub-pages. Each Linux page object carries an internal count (the number of active users) and flags (what the page is used for). After the sg driver allocates them, the pages have the following attributes:
0 page:000001007fb89ac0 flags:0x01000000 mapping:0000000000000000 mapcount:0 count:1
1 page:000001007fb89af8 flags:0x01000004 mapping:0000000000000000 mapcount:0 count:0
2 page:000001007fb89b30 flags:0x01000004 mapping:0000000000000000 mapcount:0 count:0
Note that only the base page has count=1; all sub-pages have count=0.

When the request reaches the iscsi-sfnet initiator driver, the driver sends the multi-page buffer one page at a time through the network interface API:
 rc = sock->ops->sendpage(sock, pg, pg_offset, len, flags);
At the network layer (linux/net/ipv4/tcp.c), the sendpage() operation calls get_page() first and put_page() later. get_page() increases the page's count by 1. put_page() does the following (excerpt from linux/mm/swap.c):
void put_page(struct page *page)
{
        if (unlikely(PageCompound(page))) {
                page = (struct page *)page->private;
                if (put_page_testzero(page)) {
                        void (*dtor)(struct page *page);
                        dtor = (void (*)(struct page *))page[1].mapping;
                        /* ... compound-page destructor is invoked here ... */
                }
        }
        if (!PageReserved(page) && put_page_testzero(page)) {
                /* ... count reached zero: the page is released here ... */
        }
}
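Note that put_page_testzero() decrements the count and returns true when it reaches 0, at which point the page is released and recycled to the free-page pool. Because a sub-page starts at count=0, the get_page()/put_page() pair in sendpage() raises it to 1 and drops it back to 0, so the sub-page is freed while the sg driver still owns it.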
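The counting problem described above can be sketched in user space. This is only a toy model, not kernel code: struct toy_page and the toy_* functions are hypothetical names invented for illustration, and the model tracks just the count value shown in the page dumps. It assumes the non-compound branch of put_page() is the one taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page (hypothetical, illustration only). */
struct toy_page {
    int count;   /* active users, as printed in the page dump */
    bool freed;  /* set when the toy "allocator" takes the page back */
};

/* Model of get_page(): take one reference. */
static void toy_get_page(struct toy_page *pg)
{
    pg->count++;
}

/* Model of the non-compound branch of put_page(): drop one reference
 * and release the page when the count reaches zero. */
static void toy_put_page(struct toy_page *pg)
{
    assert(pg->count > 0);  /* guard against underflow in the toy model */
    if (--pg->count == 0)
        pg->freed = true;
}

/* Model of one sendpage() call: a get/put pair around the transmit.
 * Returns true if the page was released behind the owner's back. */
static bool toy_sendpage_releases(struct toy_page *pg)
{
    toy_get_page(pg);
    toy_put_page(pg);
    return pg->freed;
}
```

Calling toy_sendpage_releases() on a page that starts at count=1 (the base page) leaves it intact at count=1, while a page that starts at count=0 (a sub-page) ends up marked as freed, which mirrors the sequence the report describes.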

By the time the sg driver frees its allocated pages by calling free_pages(), the sub-pages have already been reused by someone else. The result is a "Bad page" kernel exception such as the following:
Bad page state at __free_pages_ok (in process 'java', page 000001007fb89b30)
flags:0x0100103c mapping:0000010075a4eaf0 mapcount:0 count:2
Call Trace:<ffffffff8015d37f>{bad_page+112} <ffffffff8015d713>{__free_pages_ok+154} 
      <ffffffffa01d9fa5>{:sg:sg_remove_scat+276} <ffffffffa01da13e> {:sg:sg_finish_rem_req+238} 
      <ffffffffa01da56a>{:sg:sg_new_read+1050} <ffffffffa01dcb48>{:sg:sg_ioctl+929} 
      <ffffffff8030a0f5>{thread_return+0} <ffffffff801d42e6>{selinux_file_ioctl+711} 
      <ffffffff8030ab88>{schedule_timeout+224} <ffffffff8016bfb6>{find_extend_vma+22} 
      <ffffffff8014c6b0>{unqueue_me+138} <ffffffff8014c8ce>{do_futex+441} 
      <ffffffff80135752>{autoremove_wake_function+0} <ffffffff80135752>{autoremove_wake_function+0} 
      <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8012a122>{sg_ioctl_trans+832} 
      <ffffffff8019e8ac>{compat_sys_ioctl+235} <ffffffff80125bbb>{sysenter_do_call+27}
In the above exception, the page at address 000001007fb89b30 is already in use for block I/O, with an active count of 2 and a mapping set. Because the sg driver tries to free a page that is mapped and active, the kernel has no way to recover from the bad page state other than a hard panic.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Find an sg device node whose SCSI device is provided by the iscsi-sfnet driver, for example /dev/sg10.
2. Run: sg_dd of=/dev/sg10 if=/dev/zero count=65
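As a rough check on why this command crosses the 4K threshold, the arithmetic can be sketched as follows. The 512-byte block size and 4096-byte page size are assumptions (typical defaults), not values stated in this report, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed values, not taken from the report. */
#define ASSUMED_BLOCK_SIZE 512u   /* bytes per block for sg_dd */
#define ASSUMED_PAGE_SIZE  4096u  /* bytes per page */

/* Total bytes moved for a given block count. */
static size_t transfer_bytes(size_t blocks)
{
    return blocks * ASSUMED_BLOCK_SIZE;
}

/* Number of pages needed to hold that many bytes (rounded up). */
static size_t pages_needed(size_t bytes)
{
    return (bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}
```

Under these assumptions, count=65 gives 33280 bytes, which needs 9 pages, so the buffer necessarily contains sub-pages in addition to the base page.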

Actual Results:
Kernel panic.

Expected Results:
65 blocks are written to the iSCSI device.

Additional info:
Comment 1 yanling.qi@lsi.com 2007-05-08 11:13:50 EDT
This bug is related to Bug 239447.
Comment 2 Mike Christie 2007-05-08 12:07:05 EDT
I am taking this bz. We need to work out some issues upstream first I think.
Comment 3 Mike Christie 2007-05-08 12:08:21 EDT

*** This bug has been marked as a duplicate of 186008 ***
