Bug 618862

Summary: Driver Update Disks containing updated/replacement drivers do not function
Product: Red Hat Enterprise Linux 6 Reporter: Jon Masters <jcm>
Component: anacondaAssignee: Martin Sivák <msivak>
Status: CLOSED CURRENTRELEASE QA Contact: Release Test Team <release-test-team-automation>
Severity: medium Docs Contact:
Priority: urgent    
Version: 6.0CC: chyang, dcantrell, ddumas, emcnabb, haruo.tomita, jcm, john.cooper, ltroan, martinez, martin.wilck, qcai, rwilliam, syeghiay
Target Milestone: rcKeywords: Regression
Target Release: 6.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: anaconda-13.21.78-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-10 19:52:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 253443    
Attachments:
Description Flags
/tmp/ dir loglevel=debug
none
/tmp/ tarball with loglevel=7
none
/tmp/ dir without loglevel specified.
none
/tmp/ tarball with loglevel=debug
none
console output from test session showing driver was loaded but not noticed
none
2.6.32-59.1 x86_64 driver disk containing sym53c8xx version sym-2.2.3.rhtest60s10
none
2.6.32-59.1 x86_64 driver disk containing virtio_blk from 2.6.32-61
none
console output from test session showing driver was loaded but not noticed (virtio_blk)
none
2.6.32-59.1 x86_64 driver disk containing virtio_blk from 2.6.32-61
none
2.6.32-63 x86_64 driver disk containing sym53c8xx version sym-2.2.3.rhtest60s10
none
2.6.32-63 x86_64 driver disk containing virtio_blk from 2.6.32-63
none
anaconda log file from aborted test installation with 20100815 x86_64 Server nightly compose
none
Screenshot of the installed system with fixed anaconda none

Description Jon Masters 2010-07-27 21:37:28 UTC
There are several problems with the current RHEL6 implementation of Driver
Update Disks (DUDs) when replacing an existing in-kernel RHEL6 driver (this is
the common case actually, in spite of it seeming like it would not be). The
problem manifests as follows (the example driver is a sym53c8xx SCSI driver for the KVM SCSI disk device that we test against - example disk attached, though you will need to rebuild this if the kernel changes from 2.6.32-52 as we have not yet fully frozen the kernel ABI versions in RHEL6 and might break compatibility with Driver Update Disks that are not rebuilt - I can rebuild, as can cyang as another point of contact for you):

1). Anaconda starts up and loads drivers for certain devices.
        - can confirm this in shell under e.g. /sys/module/sym53c8xx/version
2). A Driver Update Disk is inserted containing a replacement driver.
3). Anaconda correctly unloads the existing drivers, and has udev reload all
    of them, including the newly updated Driver Update.
        - can confirm that the version of sym53c8xx changes, for example
4). Anaconda then fails to realize anything changed, presumably because no
    additional devices were detected other than what was there before.
5). Anaconda then prompts for manual driver selection.
        - not clear if it tries to load the driver again here because the
          correct one is already loaded, but it probably does this right.
6). Anaconda then says it cannot find any e.g. storage devices in the example,
    even though the correct driver is loaded, and fdisk on a shell works.
        - not entirely sure why, but presume this is because it does not
          notice that the existing storage device is still there or does
          not re-scan devices correctly after loading the Driver Update Disk.

Note. It is correct for Anaconda to unload device drivers and reload all of
them. We want the behavior that nearly works already because it allows us to
replace any driver that might already have been loaded. We just need Anaconda
to notice that a driver it came with was the one that changed. Here are some
notes on some pseudo-code suggestions for implementing some fixes to this.

2 points in install loop
        1). Before loading driver disk.
                - For each directory in /sys/module (for example "sym53c8xx")
                - Store the version from e.g. /sys/module/sym53c8xx/version
                - One version file per driver (not all drivers have a version
                  and this version is not always incremented when changed)
                - Must handle lack of a version file without failure.
                - There should not be more than 100 modules loaded in general
                  if that aids sizing any data structures that you need.
        2). After loading driver disk
                - If new devices appeared/existing behavior, then proceed.
                - If no new devices appeared, re-read module version data and
                  compare with previously stored versions. If a change was
                  detected, then the driver has been updated so no need to
                  prompt the user for manual driver selection
                - If no version changes are detected, prompt the user anyway
                  and they would need to choose the updated driver manually.
                  This should only be the case for "bad" drivers without
                  correct version data and the vendor can fix that.
                - But in any case, if the driver chosen by the user manually is
                  loaded (file in /sys/module/<drivername>) and there are no
                  device changes, then you can still assume it replaced one.

There should be no error out unless no driver was updated, the user was asked
to choose one manually, no new devices appeared, *and* that driver was not
loaded correctly even at that point. Only error out if nothing changed, no
devices appeared, and even the manual driver selected does not show up as a
correctly loaded driver. Sure, you can't tell for sure that manual selection
and loading caused the driver to do something, but as long as the devices
are present for Anaconda to proceed, it should at least continue.

Jon.

Comment 2 Chris Lumens 2010-07-29 17:10:58 UTC
Can you add loglevel=debug, run through the driver disk loading code, get dumped back to the "No driver found" dialog, press ctrl-Z to get dropped to a shell, and attach all the log files out of /tmp to this bug report?

Comment 4 Chao Yang 2010-07-30 06:09:18 UTC
Created attachment 435471 [details]
/tmp/ dir loglevel=debug

Comment 5 Chao Yang 2010-07-30 07:00:54 UTC
Created attachment 435480 [details]
/tmp/ tarball with loglevel=7

Comment 6 Chao Yang 2010-07-30 07:03:02 UTC
Created attachment 435481 [details]
/tmp/ dir without loglevel specified.

Comment 7 Chao Yang 2010-07-30 07:54:54 UTC
Created attachment 435490 [details]
/tmp/ tarball with loglevel=debug

Comment 8 Chao Yang 2010-07-30 08:15:33 UTC
Hi Chirs,

I have attached /tmp/ tarball, with and without loglevel specified. I found something: Whether the loglevel parameter is added or not, the anaconda.log always have the DEBUG log output. Any idea on this?

Best,
Chao

Comment 9 Chris Lumens 2010-07-30 14:15:11 UTC
We always record to the files at DEBUG, but change what gets logged to the console.  That way we can see everything in bug reports but users don't get potentially confusing debugging messages.  In other words, don't worry about it.

Comment 10 Chris Lumens 2010-07-30 14:23:16 UTC
Can you reproduce this problem with a module other than sym53c8xx.ko?  If you look at the syslog, you'll see the following:

<4>WARNING: at /builddir/build/BUILD/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/dma-mapping.h:154 ___free_dma_mem_cluster+0x102/0x110 [sym53c8xx]() (Not tainted)
<4>Hardware name: KVM
<4>Modules linked in: jbd mbcache cdrom crc_t10dif sym53c8xx(-) scsi_transport_spi virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix ipv6 iscsi_ibft pcspkr edd floppy iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs [last unloaded: sd_mod]
<4>Pid: 359, comm: rmmod Not tainted 2.6.32-52.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8106b473>] warn_slowpath_common+0x83/0xc0
<4> [<ffffffff8106b4c4>] warn_slowpath_null+0x14/0x20
<4> [<ffffffffa0138af2>] ___free_dma_mem_cluster+0x102/0x110 [sym53c8xx]
<4> [<ffffffffa01388d2>] __sym_mfree+0xd2/0x100 [sym53c8xx]
<4> [<ffffffffa0138969>] __sym_mfree_dma+0x69/0xf0 [sym53c8xx]
<4> [<ffffffffa01310d5>] sym_hcb_free+0x65/0x1f0 [sym53c8xx]
<4> [<ffffffffa012e936>] sym_free_resources+0x46/0x90 [sym53c8xx]
<4> [<ffffffffa012ea33>] sym_detach+0xb3/0xe0 [sym53c8xx]
<4> [<ffffffffa012ea9b>] sym2_remove+0x3b/0x60 [sym53c8xx]
<4> [<ffffffff81272be7>] pci_device_remove+0x37/0x70
<4> [<ffffffff81332c1f>] __device_release_driver+0x6f/0xe0
<4> [<ffffffff81332d58>] driver_detach+0xc8/0xd0
<4> [<ffffffff81331abe>] bus_remove_driver+0x8e/0x110
<4> [<ffffffff81333522>] driver_unregister+0x62/0xa0
<4> [<ffffffff81272ef4>] pci_unregister_driver+0x44/0xb0
<4> [<ffffffffa0139975>] sym2_exit+0x15/0x47 [sym53c8xx]
<4> [<ffffffff810acbe6>] sys_delete_module+0x1a6/0x280
<4> [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
<4>---[ end trace 9955f74768a9ca36 ]---

I wonder if the end result of that trace is that the rmmod of the original sym53c8xx.ko from the initrd fails, so the module doesn't get unloaded, so we don't even try to load the new one.

Comment 11 Denise Dumas 2010-08-03 21:22:26 UTC
Chao Yang, have you had a chance to try this problem with a module other than sym53c8xx.ko?

Comment 12 Chao Yang 2010-08-04 05:49:05 UTC
Hi Denise,

Sure. I will try another driver, and let you know ASAP.

Best,
Chao

Comment 13 Jon Masters 2010-08-06 17:12:46 UTC
Let's get this tested today. I'll spin another Driver Disk shortly with a SATA or virtIO or other storage driver in it.

Jon.

Comment 19 Jon Masters 2010-08-10 04:30:05 UTC
Created attachment 437759 [details]
console output from test session showing driver was loaded but not noticed

Comment 22 Jon Masters 2010-08-10 04:55:33 UTC
Created attachment 437762 [details]
2.6.32-59.1 x86_64 driver disk containing sym53c8xx version sym-2.2.3.rhtest60s10

Comment 23 Jon Masters 2010-08-10 05:34:35 UTC
Created attachment 437768 [details]
2.6.32-59.1 x86_64 driver disk containing virtio_blk from 2.6.32-61

Comment 24 Jon Masters 2010-08-10 05:37:27 UTC
Exactly the same behavior with virtio_blk. The updated driver is loaded (as witnessed by the obvious kernel debug message I inserted), but Anaconda behaves exactly as before. I have attached two driver disks for two different drivers - the virtio one will work out of the box with KVM, the SCSI one only requires you swap the virtio for a SCSI device - and think this is Anaconda's again issue now. I should have some more time in a couple of days, I have other stuff aswell.

Jon.

Comment 25 Jon Masters 2010-08-10 05:57:14 UTC
Created attachment 437774 [details]
console output from test session showing driver was loaded but not noticed (virtio_blk)

Comment 26 Jon Masters 2010-08-10 05:58:23 UTC
All of this is using 20100809.0 compose, so here you have two drivers - one for SCSI and one for virtio - and both of them load and work, but are not detected by Anaconda. Please keep me in the loop. Thanks!

Comment 27 Jon Masters 2010-08-10 06:24:03 UTC
Comment 25 contains the wrong log file. I am attaching the right one.

NOTE: A bug in virt-manager resulted in the deletion of the SCSI disk and swapping back to virtio still leaving also a SCSI device, which is why the sym53c8xx driver was still loaded and caused a backtrace in that example too. But in the corrected log file you will see there is no kernel warning when unloading the modules. The warning is not directly related to the issue here and the modules do function correctly in any case. Anyway, now we have proof that running with virtio_blk (totally different driver) behaves exactly the same way.

Comment 28 Jon Masters 2010-08-10 06:30:40 UTC
Created attachment 437779 [details]
2.6.32-59.1 x86_64 driver disk containing virtio_blk from 2.6.32-61

Comment 31 David Lehman 2010-08-10 17:06:38 UTC
The attached /tmp tarballs do not include /tmp/storage.log, which is critical.

Comment 32 Jon Masters 2010-08-10 17:10:24 UTC
Dave: can you not trivially reproduce this for yourself now? I've attached two driver disks that work under KVM, the virtio_blk one works "out of the box" with any RHEL6 virtual machine you create in virt-manager...

Jon.

Comment 34 Jon Masters 2010-08-10 17:39:26 UTC
Additionally, in those examples, the kmod was not installed on the target. My guess is Anaconda decides it didn't load anything so shouldn't install it.

Comment 35 Jon Masters 2010-08-16 06:49:48 UTC
The disk is detected and the driver is loaded during install, but the installation aborts with an error due to a python regex type issue:

TypeError: unsupported operand type(s) for +: '_sre.SRE_Pattern' and 'str'
</binding><binding fileName="/tmp/anaconda-tb-qvxy0m" name="pythonUnhandledException" type="text">anaconda 13.21.76 exception report
Traceback (most recent call first):
  File "/usr/lib/anaconda/backend.py", line 77, in copyFirmware
    for f in glob.glob(DD_EXTRACTED+"/lib/firmware/*"):
  File "/usr/lib/anaconda/backend.py", line 98, in doPostInstall
    self.copyFirmware(anaconda)

This likely should have come up in your testing. I am attaching the full log, along with updated copies of the Driver Update Disks built against kernel 2.6.32-63.el6.x86_64. The test installation in this case was performed using the virtio_blk driver on a stock KVM guest with virtio (default) and 20100815 x86_64, minimal server install using text mode. i.e. a completely default KVM guest. To reproduce, create a KVM guest (F13 used here), use the driver disk. Boom.

Jon.

Comment 36 Jon Masters 2010-08-16 06:51:33 UTC
Created attachment 438898 [details]
2.6.32-63 x86_64 driver disk containing sym53c8xx version sym-2.2.3.rhtest60s10

Comment 37 Jon Masters 2010-08-16 06:53:04 UTC
Created attachment 438899 [details]
2.6.32-63 x86_64 driver disk containing virtio_blk from 2.6.32-63

Comment 38 Jon Masters 2010-08-16 06:53:56 UTC
Created attachment 438900 [details]
anaconda log file from aborted test installation with 20100815 x86_64 Server nightly compose

Comment 40 Jon Masters 2010-08-16 07:02:53 UTC
Please let me know when there's another anaconda build to test. Thanks.

Comment 41 Evan McNabb 2010-08-16 15:18:33 UTC
During a default beaker install of RHEL6.0-20100816.n.0_nfs-Server-x86_64, I hit the traceback reported in comment #35. Not sure if this should be opened up in a new BZ or not.

Performing post-installation configuration
anaconda 13.21.76 exception report
Traceback (most recent call first):
  File "/usr/lib/anaconda/backend.py", line 77, in copyFirmware
    for f in glob.glob(DD_EXTRACTED+"/lib/firmware/*"):
  File "/usr/lib/anaconda/backend.py", line 98, in doPostInstall
    self.copyFirmware(anaconda)
  File "/usr/lib/anaconda/yuminstall.py", line 1829, in doPostInstall
    AnacondaBackend.doPostInstall(self, anaconda)
  File "/usr/lib/anaconda/backend.py", line 299, in doPostInstall
    anaconda.backend.doPostInstall(anaconda)
  File "/usr/lib/anaconda/dispatch.py", line 208, in moveStep
    rc = stepFunc(self.anaconda)
  File "/usr/lib/anaconda/dispatch.py", line 126, in gotoNext
    self.moveStep()
  File "/usr/lib/anaconda/cmdline.py", line 187, in run
    anaconda.dispatch.gotoNext()
  File "/usr/bin/anaconda", line 1115, in <module>
    anaconda.intf.run(anaconda)
TypeError: unsupported operand type(s) for +: '_sre.SRE_Pattern' and 'str'

install exited abnormally [1/1] 
The system will be rebooted when you press Ctrl-C or Ctrl-Alt-Delete.

Comment 42 David Cantrell 2010-08-16 19:06:38 UTC
*** Bug 623546 has been marked as a duplicate of this bug. ***

Comment 43 Marizol Martinez 2010-08-16 20:12:36 UTC
Jon -- Looks like an updated anaconda just landed (anaconda-13.21.77-1).

Comment 44 Jon Masters 2010-08-17 07:06:25 UTC
Yes, and it will be tested with the nightly from 20100817 once it's available. Using an "updates" image is nice, but it's not the same as an actual test. We need to know that actual media tests work, all the way through, out of the box :)

Comment 46 Jon Masters 2010-08-17 07:50:22 UTC
TEST RESULTS
------------

The updates image supplied by Martin does get through the install, HOWEVER, the kmod is not installed on the finish system and the updated driver is not used:

$ rpm -qa | grep kmod
<nothing>

This urgently requires fixing. Please investigate why the kmod RPM is not targeted for installation, and perform any necessary testing.

Thanks for initial fix - we're half way there :)

Jon.

Comment 50 Martin Sivák 2010-08-17 12:52:19 UTC
Created attachment 439101 [details]
Screenshot of the installed system with fixed anaconda

Ok, there was another problem with matching the module name to proper provide symbol during the install. That should be fixed now too.

My logs show, that the module got installed.

So expect this to hit anaconda-13.21.78-1 tonight. I can provide updates image before that.

Comment 51 Jon Masters 2010-08-17 17:38:59 UTC
If you're online, please do provide an updates image. Otherwise, I will test the moment a compose is available to do so.

Comment 52 Jon Masters 2010-08-18 13:49:36 UTC
This testing is still blocking on a full 20100818 compose being available. Hopefully in a few hours, since that is being worked on.

Comment 54 Jon Masters 2010-08-19 09:03:14 UTC
Provisional testing of 20100818 shows that driver disks are now working, under virtualized systems, with (in this example) a virtio_blk driver correctly installing, with the RPMs on the virtual disk and so forth. I am able to upgrade the kernel from -63 to -64 and the driver is picked up by the upgraded kernel, which boots correctly with the new driver. HOWEVER: We still need to do some bare metal testing, to be sure.

Comment 55 Radek Vykydal 2010-08-19 09:09:29 UTC
*** Bug 625322 has been marked as a duplicate of this bug. ***

Comment 56 Jon Masters 2010-08-19 09:26:57 UTC
Currently have a graphical install running too.

Comment 57 Jon Masters 2010-08-19 19:28:36 UTC
CONFIRMED with graphical install. Bare metal tests will commence.

Comment 59 Jon Masters 2010-08-30 15:40:04 UTC
My testing is done and I can verify that this is working on bare metal (laptop with replacement network drivers for wired and wireless in this test).

Comment 60 Robert M Williams 2010-08-30 16:37:22 UTC
moving to verified based on comment 59

Comment 61 Jon Masters 2010-08-31 03:51:26 UTC
Thanks

Comment 62 releng-rhel@redhat.com 2010-11-10 19:52:09 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.