Bug 184535

Summary: [BETA RHEL4 U3] brokenness in cfq_dispatch_requests
Product: Red Hat Enterprise Linux 4 Reporter: Issue Tracker <tao>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: jbaron, k.georgiou, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0575 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-10 22:34:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 181409    

Description Issue Tracker 2006-03-09 18:21:47 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2006-03-09 18:21:54 UTC
From User-Agent: XML-RPC

cfq_dispatch_requests() intends to pull a single request off each process
queue, but was actually just pulling requests off the head of the list.
This causes I/O to hang for extremely long periods of time. The fix has
been in mainline kernels for some time now.

It's a pretty clear code bug.
This event sent from IssueTracker by dmilburn  [Support Engineering Group]
 issue 88208

Comment 3 Issue Tracker 2006-03-09 18:22:14 UTC
From User-Agent: XML-RPC

Patch: 

--- linux-2.6.9-22.0.1.EL/drivers/block/cfq-iosched.c   2004-10-18 14:54:37.000000000 -0700
+++ /build/mfasheh/linux-2.6.9-22.0.1.EL_bio_tracing/drivers/block/cfq-iosched.c        2006-02-07 10:33:34.942588000 -0800
@@ -381,7 +381,7 @@ static int cfq_dispatch_requests(request
 restart:
        good_queues = 0;
        list_for_each_safe(entry, tmp, &cfqd->rr_list) {
-               cfqq = list_entry_cfqq(cfqd->rr_list.next);
+               cfqq = list_entry_cfqq(entry);

                BUG_ON(RB_EMPTY(&cfqq->sort_list));


Status set to: Waiting on Tech

This event sent from IssueTracker by dmilburn  [Support Engineering Group]
 issue 88208

Comment 21 Issue Tracker 2006-03-15 17:17:00 UTC
Greg, I still don't think there is any difference between the existing code
and what you are proposing.

Currently we have:

>>>cfqq = list_entry_cfqq(cfqd->rr_list.next);

And you are proposing changing that to:

>>>cfqq = list_entry_cfqq(entry);

Since list_for_each_safe sets entry to (head)->next, passing entry or
cfqd->rr_list.next to list_entry_cfqq() would do the same thing.  Right???

/**
 * list_for_each_safe   -       iterate over a list safe against removal of list entry
 * @pos:        the &struct list_head to use as a loop counter.
 * @n:          another &struct list_head to use as temporary storage
 * @head:       the head for your list.
 */
#define list_for_each_safe(pos, n, head) \
        for (pos = (head)->next, n = pos->next; pos != (head); \
                pos = n, n = pos->next)

This event sent from IssueTracker by martinez 
 issue 88208

Comment 22 Marizol Martinez 2006-03-15 17:20:12 UTC
Update from Oracle:

The following comment is from Mark Fasheh, the originator of this patch,
relayed here because he doesn't have an IssueTracker login.

There is a difference, and it's that the second line works and the first
doesn't.

They would clearly _not_ do the same thing.

-               cfqq = list_entry_cfqq(cfqd->rr_list.next);

This line takes the element at the head of the list on _every_ loop
iteration whereas

+               cfqq = list_entry_cfqq(entry);

takes the element pointed to by "entry". If you read the code you'll see
that the loop clearly intends to service each list element once, whereas
what it actually does is service the head element many times.

__________

Update from Red Hat:

Sunil, you/Mark are correct.  My mistake, I'll get this fixed.

Larry Woodman



Comment 24 Jason Baron 2006-03-16 22:10:58 UTC
*** Bug 182577 has been marked as a duplicate of this bug. ***

Comment 26 Bob Johnson 2006-03-29 18:49:54 UTC
A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 27 Bob Johnson 2006-04-11 17:21:39 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.

Comment 31 Red Hat Bugzilla 2006-08-10 22:34:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html