Bug 632631

Summary: block: fix s390 tape block driver crash that occurs when it switches the IO scheduler
Product: Red Hat Enterprise Linux 6
Reporter: Mike Snitzer <msnitzer>
Component: kernel
Assignee: Mike Snitzer <msnitzer>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.0
CC: bdonahue, brueckner, coughlan, cward, dhoward, jpirko, plyons
Target Milestone: rc
Keywords: OtherQA, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the s390 tape block driver crashed whenever it tried to switch the I/O scheduler. With this update, an official in-kernel API (elevator_change()) is used to switch the I/O scheduler safely, so the crashes no longer occur.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-19 12:05:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 641408    
Bug Blocks: 633864    

Description Mike Snitzer 2010-09-10 15:39:02 UTC
Description of problem:
Block layer queue initialization interface changes that were made to allow DM to initialize its request_queue more precisely had the unfortunate side-effect of introducing a panic in the s390 tape block driver, because it explicitly switches the IO scheduler that it uses.  Until recently, switching the IO scheduler was accomplished in a fragile manner.  Linux now has an official in-kernel API for switching the IO scheduler safely: elevator_change()
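
As a rough illustration of the calling pattern, here is a userspace sketch, not the actual driver patch. The struct layout, the mock elevator_change() body, and the helper name tape_setup_queue() are all assumptions made so the example compiles outside the kernel; the choice of "noop" as the target scheduler is likewise assumed. Only the call shape (a queue pointer plus a scheduler name, returning 0 or a negative errno) is meant to mirror the in-kernel API.

```c
#include <errno.h>
#include <string.h>

/* Userspace stand-in for the kernel's request_queue (assumption:
 * the real structure is far larger; this field is hypothetical). */
struct request_queue {
    const char *elevator_name; /* name of the currently active scheduler */
};

/* Mock with the same call shape as the kernel's elevator_change():
 * returns 0 on success or a negative errno if the scheduler is unknown.
 * The list of known schedulers is illustrative only. */
static int elevator_change(struct request_queue *q, const char *name)
{
    static const char *const known[] = { "noop", "deadline", "cfq" };
    size_t i;

    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strcmp(name, known[i]) == 0) {
            q->elevator_name = known[i];
            return 0;
        }
    }
    return -EINVAL; /* queue is left on its previous scheduler */
}

/* Hypothetical driver setup helper: instead of tearing down and
 * re-initializing the elevator by hand, make one call and check it. */
static int tape_setup_queue(struct request_queue *q)
{
    int rc = elevator_change(q, "noop");

    if (rc)
        return rc; /* caller would clean up the queue here */
    return 0;
}
```

The point of the pattern is that the driver never touches the elevator's internals directly; it names the scheduler it wants and handles a single return code.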

This impacts s390's tape block driver which we enable in RHEL6:
config-s390x:CONFIG_S390_TAPE_BLOCK=y

See the following for more detail:
http://lkml.org/lkml/2010/8/16/181

And these upstream commits for the 2 fixes destined for 2.6.36:
http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=5dd531a03ad721b41911dd
http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=52cc2eef31587b22ce9fbe

(this BZ should be considered for 0day, but I can't seem to set rhel-6.0.z -- the pull down won't pull down...)

Comment 3 Aristeu Rozanski 2010-09-23 19:05:27 UTC
Patch(es) available on kernel-2.6.32-73.el6

Comment 5 Chris Ward 2010-10-20 10:03:49 UTC
Mike, I'm not sure who to direct this to, as I see no partner/customer reference in the bugzilla. You're the reporter.

Here are a few arches to test the 0day erratum; please test and post feedback before Wed 27th.

http://people.redhat.com/kzhang/kernel-2.6.32-71.4.1.el6.x86_64.rpm
http://people.redhat.com/kzhang/kernel-2.6.32-71.4.1.el6.ppc64.rpm
http://people.redhat.com/kzhang/kernel-2.6.32-71.4.1.el6.s390x.rpm

Thanks. Let me know if you need additional arches.

Comment 6 Mike Snitzer 2010-10-20 13:19:50 UTC
(In reply to comment #5)
> Mike, I'm not sure who to direct this too, as i see no partner/customer
> reference in the bugzilla. You're the reporter.

I'm also the one who fixed it... this was motivated by upstream changes that had a side-effect on s390's tape driver.

> http://people.redhat.com/kzhang/kernel-2.6.32-71.4.1.el6.x86_64.rpm
> http://people.redhat.com/kzhang/kernel-2.6.32-71.4.1.el6.ppc64.rpm
> http://people.redhat.com/kzhang/kernel-2.6.32-71.4.1.el6.s390x.rpm

Provided these kernels also include the fix for bug #641408, simply loading the tape driver on s390 should suffice for testing.

Unfortunately, it looks like 641408 never progressed toward 0day like it needed to!  This BZ depends on it... grr.

Anyway, I have no idea about s390 tape drivers.  Likely best to have Hendrik (our IBM onsite s390 person) broker this -- but only once we include the fix for 641408.

Comment 7 Chris Ward 2010-10-20 14:22:26 UTC
Okay, so then #641408 is a TestBlocker, sounds like. No reason to push this to IBM until that bug is added to the kernel and it's re-spun then. If it's not added, should this bug be dropped?

Could you ping Hendrik once (if) all the pieces are in place and ask him to test?

Comment 8 Mike Snitzer 2010-10-20 14:48:45 UTC
(In reply to comment #7)
> Okay, so then #641408 is a TestBlocker, sounds like. No reason to push this to
> IBM until that bug is added to the kernel and it's re-spun then. If it's not
> added, should this bug be dropped?

We need the fix for 641408.  I'm not seeing how dropping this bug was ever an option.

Do we just drop bugs?

> Could you ping Hendrik once (if) all the pieces are in place and ask him to
> test?

That really shouldn't be on my plate.  I cc'd Hendrik.  But I'm not QE.

Comment 9 Chris Ward 2010-10-20 15:20:53 UTC
I've been told it's unlikely at this point that this bug will make it into the 6.0.z kernel.

To confirm your opinion: if this bug isn't included in the 6.0.z erratum, 641408 should still be included, i.e., there shouldn't be any negative side effects if we include the fix for 641408 but not for this bug, 632631?

Yes, sometimes we do drop bugs if we feel including them introduces uncontrolled risk. And being unable to test (e.g., due to dependency issues) introduces huge unknowns into the picture, which could /possibly/ be more uncontrolled than we would like.

I'm just trying to help clarify how we should proceed, given all the expected pieces aren't where they should be. 

Thanks.

Comment 10 Chris Ward 2010-10-20 15:56:41 UTC
I got my bugs backward. Apologies for confusing folks in comment #9.

641408 isn't approved for 6.0.z at this time. 632631 (this bug) is, as z-stream bug 633864.

Contact sly with requests to get 641408 included too. mmatouse (EUS PM) says it's too late officially, but sly might be able to work magic.

And to ensure I was clear, we generally only consider pulling bugs and respinning if there is concern that including only a partial fix would risk regressions. According to Tom C, this isn't a concern in this case.

Comment 11 Martin Prpič 2010-11-11 11:47:35 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, the s390 tape block driver crashed whenever it tried to switch the I/O scheduler. With this update, an official in-kernel API (elevator_change()) is used to switch the I/O scheduler safely, so the crashes no longer occur.

Comment 13 Chris Ward 2011-04-06 11:08:10 UTC
~~ Partners and Customers ~~

This bug was included in RHEL 6.1 Beta. Please confirm the status of this request as soon as possible.

If you're having problems accessing 6.1 bits, are delayed in your test execution or find in testing that the request was not addressed adequately, please let us know.

Thanks!

Comment 14 errata-xmlrpc 2011-05-19 12:05:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html