Bug 239450 - Kernel panic when performing data transfer is greater than 4K through sg device node through iscsi_sfnet driver
Summary: Kernel panic when performing data transfer is greater than 4K through sg devi...
Keywords:
Status: CLOSED DUPLICATE of bug 186008
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mike Christie
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-05-08 15:11 UTC by yanling.qi@lsi.com
Modified: 2007-11-17 01:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 16:08:21 UTC
Target Upstream Version:
Embargoed:



Description yanling.qi@lsi.com 2007-05-08 15:11:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)

Description of problem:
When the sg driver accepts an sg_io request from a user-space application such as SMagent, it calls the kernel API __get_free_pages() to allocate multiple pages to hold the data for the user-space I/O request. The allocation consists of one base page and a number of sub-pages. Each Linux page object has an internal count (the number of active users) and flags (what the page is used for). The pages have the following attributes right after the sg driver allocates them:
0 page:000001007fb89ac0 flags:0x01000000 mapping:0000000000000000 mapcount:0 count:1
1 page:000001007fb89af8 flags:0x01000004 mapping:0000000000000000 mapcount:0 count:0
2 page:000001007fb89b30 flags:0x01000004 mapping:0000000000000000 mapcount:0 count:0
Please note that only the base page has count=1; all sub-pages have count=0.

After the request reaches the iscsi-sfnet initiator driver, the driver sends the multi-page buffer one page at a time through the network interface API:
 rc = sock->ops->sendpage(sock, pg, pg_offset, len, flags);
At the network layer (linux/net/ipv4/tcp.c), the sendpage() operation calls get_page() on the page first and put_page() later. get_page() increases the page’s count by 1. put_page() does the following (linux/mm/swap.c):
void put_page(struct page *page)
{
        if (unlikely(PageCompound(page))) {
                page = (struct page *)page->private;
                if (put_page_testzero(page)) {
                        void (*dtor)(struct page *page);
  
                        dtor = (void (*)(struct page *))page[1].mapping;
                        (*dtor)(page);
                }
                return;
        }
        if (!PageReserved(page) && put_page_testzero(page))
               __page_cache_release(page);
} 
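Please note that when put_page() drops the count to 0, the page is released and recycled to the free-page pool. A sub-page starts at count=0, so get_page() raises it to 1 and the matching put_page() brings it back to 0, which releases the sub-page while the sg driver still owns it.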

By the time the sg driver is ready to free its allocated pages by invoking free_pages(), the sub-pages have already been reused by someone else. We then get a "Bad page" kernel exception such as the following:
Bad page state at __free_pages_ok (in process 'java', page 000001007fb89b30)
flags:0x0100103c mapping:0000010075a4eaf0 mapcount:0 count:2
Backtrace:
Call Trace:<ffffffff8015d37f>{bad_page+112} <ffffffff8015d713>{__free_pages_ok+154} 
      <ffffffffa01d9fa5>{:sg:sg_remove_scat+276} <ffffffffa01da13e> {:sg:sg_finish_rem_req+238} 
      <ffffffffa01da56a>{:sg:sg_new_read+1050} <ffffffffa01dcb48>{:sg:sg_ioctl+929} 
      <ffffffff8030a0f5>{thread_return+0} <ffffffff801d42e6>{selinux_file_ioctl+711} 
      <ffffffff8030ab88>{schedule_timeout+224} <ffffffff8016bfb6>{find_extend_vma+22} 
      <ffffffff8014c6b0>{unqueue_me+138} <ffffffff8014c8ce>{do_futex+441} 
      <ffffffff80135752>{autoremove_wake_function+0} <ffffffff80135752>{autoremove_wake_function+0} 
      <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8012a122>{sg_ioctl_trans+832} 
      <ffffffff8019e8ac>{compat_sys_ioctl+235} <ffffffff80125bbb>{sysenter_do_call+27}
In the above exception, the page at address 000001007fb89b30 is in use for block I/O, with an active count of 2 and a memory mapping. Because the sg driver tries to free a page that is mapped and active, the kernel needs a hard panic to recover from the Bad page problem.
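
The sequence described above can be replayed in user space with a small model. This is only an illustrative sketch, not kernel code: struct sim_page and every sim_* function are hypothetical names made up for this example, the count is modelled as a plain integer, and the free-time check is loosely based on the "Bad page state" report above (a page being freed must have count 0 and no mapping).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the fields the report prints. */
struct sim_page {
    int count;            /* logical reference count ("count:" in the dump) */
    const void *mapping;  /* non-NULL once another user owns the page */
    bool is_free;         /* true while the page sits in the free-page pool */
};

/* Model the sg allocation: base page count=1, sub-pages count=0. */
static void sim_alloc_pages(struct sim_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pages[i].count = (i == 0) ? 1 : 0;
        pages[i].mapping = NULL;
        pages[i].is_free = false;
    }
}

static void sim_get_page(struct sim_page *p)
{
    p->count++;
}

/* Non-compound path of put_page(): release when the count drops to zero. */
static void sim_put_page(struct sim_page *p)
{
    if (--p->count == 0)
        p->is_free = true;
}

/* What the network layer does to each page handed to sendpage(). */
static void sim_sendpage(struct sim_page *p)
{
    sim_get_page(p);
    sim_put_page(p);
}

/* Another subsystem takes a page from the free pool and starts using it. */
static bool sim_reuse(struct sim_page *p, const void *new_mapping, int new_count)
{
    if (!p->is_free)
        return false;
    p->is_free = false;
    p->mapping = new_mapping;
    p->count = new_count;
    return true;
}

/*
 * Model free_pages() on the whole allocation: drop the base page's
 * reference, then sanity-check every page. Returns the number of pages
 * that would be reported as "Bad page state".
 */
static size_t sim_free_pages(struct sim_page *pages, size_t n)
{
    size_t bad = 0;

    if (--pages[0].count != 0)
        return 0; /* base page still referenced, nothing is freed yet */

    for (size_t i = 0; i < n; i++) {
        if (pages[i].count != 0 || pages[i].mapping != NULL)
            bad++;
    }
    return bad;
}
```

With three pages, calling sim_sendpage() on each leaves the base page allocated but puts both sub-pages in the free pool. If the third page is then taken over with sim_reuse() (count 2, non-NULL mapping), sim_free_pages() reports one bad page, which corresponds to the page 000001007fb89b30 in the trace above.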



Version-Release number of selected component (if applicable):
 2.6.9-42.EL

How reproducible:
Always


Steps to Reproduce:
1. Find an sg device node whose SCSI device is provided by the iscsi-sfnet driver, say /dev/sg10.
2. sg_dd of=/dev/sg10 if=/dev/zero count=65


Actual Results:
kernel panic

Expected Results:
65 blocks are written to the iSCSI device

Additional info:

Comment 1 yanling.qi@lsi.com 2007-05-08 15:13:50 UTC
This bug is related to Bug 239447.

Comment 2 Mike Christie 2007-05-08 16:07:05 UTC
I am taking this bz. We need to work out some issues upstream first I think.

Comment 3 Mike Christie 2007-05-08 16:08:21 UTC

*** This bug has been marked as a duplicate of 186008 ***

