Bug 480317 - guest reports repeatedly ATA error
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Michal Novotny
QA Contact: Virtualization Bugs
Duplicates: 526662
Depends On:
Blocks:
Reported: 2009-01-16 08:35 EST by Karel Volný
Modified: 2014-02-02 17:36 EST
CC: 18 users

See Also:
Fixed In Version: xen-3.0.3-102.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 04:59:22 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
bootscreen (8.71 KB, image/png)
2009-02-20 07:53 EST, Karel Volný
Qemu Libata fix (887 bytes, patch)
2009-11-26 09:47 EST, Michal Novotny

Description Karel Volný 2009-01-16 08:35:00 EST
Description of problem:
I tried to install Fedora 10 as a guest on RHEL-5 host. The installation process got frozen at the end, which I believe to be a consequence of this error (as it was the bootloader installation phase). Despite that, I am able to boot the guest system, but I am getting a lot of ATA errors.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-128.el5
xen-3.0.3-80.el5

How reproducible:
always

Steps to Reproduce:
1. (run xen enabled system)
2. virt-install -n F10 -r 512 -f F10.img -s 10 --vnc --hvm -c ./boot.iso
3. perform the default installation
4. reboot the guest
  
Actual results:
the guest console is flooded with repetitions of the following error message:

ata2: soft resetting link
ata2.00: configured for MWDMA2
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 1e 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
         res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
ata2.00: status: { DRDY ERR }
ata2: soft resetting link


Expected results:
no errors occur

Additional info:
the physical hardware is Dell Precision 490

What I find strange is that virt-manager reports the virtual disk drive as IDE (hda), while during the guest installation it was detected as sda.
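The `res` field in these libata messages holds the register values read back after the failed command: the byte before the slash is the ATA Status register and the byte after it is the Error register. Below is a minimal sketch, assuming the standard ATA Status register bit layout (BSY 0x80, DRDY 0x40, DF 0x20, DRQ 0x08, ERR 0x01; obsolete bits omitted), that decodes the status byte the way the `status: { ... }` line summarizes it. `parse_res` and `decode_status` are hypothetical helpers for illustration, not libata functions.

```python
import re

# ATA Status register bits (subset; obsolete bits omitted)
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x08, "DRQ"),   # data request: device wants to transfer data
    (0x01, "ERR"),   # error: details are in the Error register
]


def decode_status(status: int) -> list[str]:
    """Return the names of the set Status bits, most significant first."""
    return [name for mask, name in STATUS_BITS if status & mask]


def parse_res(line: str) -> tuple[int, int]:
    """Extract (status, error) from a libata 'res SS/EE:...' line."""
    m = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2}):", line)
    if m is None:
        raise ValueError("no 'res' field found")
    return int(m.group(1), 16), int(m.group(2), 16)


line = "res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)"
status, error = parse_res(line)
print(hex(status), decode_status(status))  # 0x41 ['DRDY', 'ERR']
```

The same decoding gives `['DRDY', 'DRQ', 'ERR']` for the `dev_stat 0x49` and `['DRDY', 'DRQ']` for the `res 48/...` status reported later in this thread, which is why those reports end up being treated as a different symptom.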
Comment 1 Karel Volný 2009-02-20 07:53:56 EST
Created attachment 332698
bootscreen

I have experienced the same problem, installation freezing at the end, also with recent Rawhide

unfortunately, after rebooting the guest, I am unable to boot it, see the screenshot - pay attention also to the reported hard drive size
Comment 2 Karel Volný 2009-02-20 07:57:28 EST
I forgot to mention that the virtual guest in the screenshot uses a disk partition instead of an image file as the hard drive device.
Comment 4 Sergey Tuchkin 2009-03-02 07:23:38 EST
Reproduced on an FC10 guest with an image file as hard drive device hda.
The host is Scientific Linux 5.2 x86_64, xen-3.0.3-64.el5_2.9.x86_64
Comment 5 Chris Lalancette 2009-03-02 09:27:46 EST
Can you try passing "clocksource=acpi_pm" to the guest kernel before you boot it, and see if that makes a difference?  There is a bug in F-10 having to do with paravirtualized clocks, and I'm wondering if this is another instance of it.

I'm also going to change the component to "xen" for the time being; this is either a bug in the guest emulation (i.e. xen), or it's a bug in the guest kernel (in which case we would move it to F-10 kernel).  But it's definitely not python-virtinst's problem.

Chris Lalancette
Comment 6 Sergey Tuchkin 2009-03-02 09:54:17 EST
Yes, I tried, but it didn't help - I see the same ata2 errors in dmesg output:

[root@fc10 ~]# cat /proc/cmdline 
ro root=/dev/VolGroup00/LogVol00 rhgb quiet clocksource=acpi_pm
[root@fc10 ~]# dmesg|tail
ata2.00: status: { DRDY ERR }
ata2: soft resetting link
ata2.00: configured for MWDMA2
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
         res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
ata2.00: status: { DRDY ERR }
ata2: soft resetting link
[root@fc10 ~]# uname -a
Linux fc10.xen.home 2.6.27.15-170.2.24.fc10.x86_64 #1 SMP Wed Feb 11 23:14:31 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

And I agree that python-virtinst is not the source of this problem.
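In these logs the failing command is `cmd a0/...`, ATA opcode 0xA0 (PACKET), so the device on ata2 is an ATAPI device and the `cdb` line is the SCSI/MMC command block carried by the packet. Its first byte is the SCSI operation code. The lookup below is an illustrative sketch covering only the opcodes that appear in this bug; the table and `describe_cdb` are hypothetical, not kernel code.

```python
# SCSI/MMC operation codes seen in the cdb lines of this bug (not exhaustive)
OPCODES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x1E: "PREVENT ALLOW MEDIUM REMOVAL",
    0x4A: "GET EVENT STATUS NOTIFICATION",
    0x5A: "MODE SENSE(10)",
}


def describe_cdb(cdb_line: str) -> str:
    """Name the SCSI command in a libata 'cdb xx xx ...' line."""
    fields = cdb_line.split()  # also strips the leading indentation
    if len(fields) < 2 or fields[0] != "cdb":
        raise ValueError("expected a line starting with 'cdb <opcode>'")
    opcode = int(fields[1], 16)
    return OPCODES.get(opcode, f"unknown opcode 0x{opcode:02x}")


print(describe_cdb("cdb 1e 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00"))
print(describe_cdb("cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00"))
```

Applied to the two logs so far, this names PREVENT ALLOW MEDIUM REMOVAL (original report) and TEST UNIT READY (comment 6): both are routine housekeeping commands to the ATAPI device rather than disk reads or writes.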
Comment 7 Karel Volný 2009-03-05 05:33:19 EST
(In reply to comment #5)
> Can you try passing "clocksource=acpi_pm" to the guest kernel before you
> boot it, and see if that makes a difference?

the same for me, passing this option does not help

tried using Rawhide, kernel 2.6.29-0.179.rc6.git5.fc11.x86_64
Comment 8 Michal Novotny 2009-06-09 07:36:18 EDT
Karel: Could you attach the domain configuration file to this BZ?

Sergey: I have run into this issue using xen-3.0.3-87.el5 RPMs with kernel-xen-2.6.18-146.el5xen too. This may be a kernel-xen problem as well as an IOEMU problem, but it is most definitely not a python-virtinst problem, because it happens even after the VM is installed. I tried an F10 i386 FV guest...
Comment 9 Michal Novotny 2009-06-09 11:08:15 EDT
I've been poking around the IOEMU code with no luck so far, but it may be a kernel issue, because I found some related information on fedora-kernel-list at:

http://www.mail-archive.com/fedora-kernel-list@redhat.com/msg00087.html

It may be related to the 2.6.20+ kernels discussed there, but the "pci=nomsi" workaround does not help either. Maybe it is a kernel issue.

Michal
Comment 10 Sam Wilson 2009-06-15 03:10:40 EDT
I have been running into this issue as well; however, it may still be related to libvirt somehow: once I created a secondary disk in virt-manager set to "SCSI Disk", there were no errors when accessing that disk, whereas there is a stream of soft resetting link errors when accessing the IDE device (which shows up as /dev/sda1).

Sam.
Comment 11 Michal Novotny 2009-10-09 08:58:38 EDT
Hi Sam,
Well, you are suggesting a libvirt connection or something like that. I don't think that is the issue, but for clarification, could you provide your libvirt version and the exact steps you took to see (and not see) those errors?

Thanks,
Michal
Comment 12 Jeff Bastian 2009-10-09 12:20:30 EDT
I'm also hitting this error installing early builds of RHEL 6.0 on a RHEL 5.4 host with 
   kernel-xen-2.6.18-164.el5
   libvirt-0.6.3-20.1.el5_4
   libvirt-python-0.6.3-20.1.el5_4
   python-virtinst-0.400.3-5.el5
   virt-manager-0.6.1-8.el5
   xen-3.0.3-94.el5
   xen-libs-3.0.3-94.el5

I started the RHEL 6 install with virt-install:
  virt-install -n rhel6 -r 512 --vcpus=1 -f /var/lib/xen/images/rhel6 \
    -b xenbr0 --vnc --noautoconsole -v --os-type=linux --os-variant=fedora11 \
    -c /tmp/rhel6/boot.iso

Note that I used an OS variant of fedora11 since rhel6 is not listed yet for virt-install.

On the first boot after installation it spit out hundreds of these errors, but it eventually booted all the way.

This thread implies this is fixed upstream:
  http://www.mail-archive.com/linux-ide@vger.kernel.org/msg14513.html
Comment 14 Andrew Jones 2009-10-12 04:31:13 EDT
*** Bug 526662 has been marked as a duplicate of this bug. ***
Comment 16 Paolo Bonzini 2009-11-25 10:45:43 EST
Upstream patch is here: http://www.mail-archive.com/qemu-devel@nongnu.org/msg11844.html

The backport to Xen's qemu is almost trivial.
Comment 17 Michal Novotny 2009-11-26 07:55:47 EST
(In reply to comment #16)
> Upstream patch is here:
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg11844.html
> 
> The backport to Xen's qemu is almost trivial.  

Thanks for pointing this out. I'll backport this one ...

Michal
Comment 18 Michal Novotny 2009-11-26 09:47:32 EST
Created attachment 374020
Qemu Libata fix

Well, I have this backported but I am unable to reproduce it even with Fedora 10 and Fedora 12 x86_64... This is the patch; could somebody tell me how to reproduce the problem?

Michal
Comment 20 Karel Volný 2009-11-26 11:17:28 EST
(In reply to comment #18)
> Well, I have this backported but I am unable to reproduce it even with Fedora
> 10 and Fedora 12 x86_64...

could it be somehow hardware dependent?

(unfortunately, I can't reinstall my machine to RHEL-5 right now to try)
Comment 21 Andrew Jones 2009-11-26 11:34:37 EST
When I boot a RHEL-6 64b fv guest with xen -100 I get tons and tons of ATA
errors on the console. After applying the patch in comment 18 I don't get those
errors to the console anymore, but dmesg still shows a few of these.

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: BMDMA stat 0x5
ata2.00: cmd a0/01:00:00:80:00/00:00:00:00:00/a0 tag 0 dma 16512 in
         cdb 5a 00 2a 00 00 00 00 00  80 00 00 00 00 00 00 00
         res 48/20:02:00:1c:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link

The same results for f11 (2.6.30.9-96).
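The log above differs from the original report in two numeric fields. Below is a minimal sketch decoding them, assuming libata's error-mask bit values (AC_ERR_DEV = 0x1, AC_ERR_HSM = 0x2, AC_ERR_TIMEOUT = 0x4, AC_ERR_MEDIA = 0x8; higher bits omitted) and the bus-master IDE status register bits (0x01 active, 0x02 error, 0x04 interrupt). `decode` and the two tables are hypothetical helpers written for this note. Read this way, `Emask 0x3` in the original report is a device error plus an HSM violation, `Emask 0x2` here is an HSM violation alone, and `BMDMA stat 0x5` shows the DMA engine marked active with its interrupt raised and no DMA error bit set.

```python
# libata error-mask bits (AC_ERR_*), low bits only
EMASK_BITS = [(0x1, "DEV"), (0x2, "HSM"), (0x4, "TIMEOUT"), (0x8, "MEDIA")]

# Bus-master IDE (BMDMA) status register bits
BMDMA_BITS = [(0x01, "ACTIVE"), (0x02, "ERR"), (0x04, "INTR")]


def decode(value: int, table: list[tuple[int, str]]) -> list[str]:
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


print(decode(0x3, EMASK_BITS))  # ['DEV', 'HSM']: original report
print(decode(0x2, EMASK_BITS))  # ['HSM']: this comment
print(decode(0x5, BMDMA_BITS))  # ['ACTIVE', 'INTR']
```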
Comment 22 Michal Novotny 2009-11-26 17:31:06 EST
(In reply to comment #21)
> When I boot a RHEL-6 64b fv guest with xen -100 I get tons and tons of ATA
> errors on the console. After applying the patch in comment 18 I don't get those
> errors to the console anymore, but dmesg still shows a few of these.
> 
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x5
> ata2.00: cmd a0/01:00:00:80:00/00:00:00:00:00/a0 tag 0 dma 16512 in
>          cdb 5a 00 2a 00 00 00 00 00  80 00 00 00 00 00 00 00
>          res 48/20:02:00:1c:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> 
> The same results for f11 (2.6.30.9-96).  

Well, maybe that is expected with the upstream qemu patch: once an error is reported, further HSM violations are not shown, so only a few of them appear. So did this improve the situation?

Michal
Comment 23 Andrew Jones 2009-11-27 05:28:44 EST
Now that I look closer, the error I reported in comment 21 is different from the originally reported error in this bug. I have DRDY DRQ and the original report was for DRDY ERR. It looks like the proposed patch does eliminate the DRDY ERRs. So the DRDY DRQ errors are something else and deserve a different bug.
Comment 24 Andrew Jones 2009-11-27 05:46:18 EST
OK, I just went back and double-checked without the patch. The error continuously output to my console is Emask 0x2 { DRDY DRQ ERR }. So I never reproduced exactly the same thing as the originator. This may not make a difference, but it should perhaps be investigated. I'll sort it out and open a new bug for it if necessary.

As far as this bug goes, I believe the patch works. When the HSM is not violated, we avoid getting constant exceptions.
Comment 26 Ioannis Aslanidis 2009-11-30 07:03:25 EST
I still see the same errors under a fully-virtualized environment:

Linux fedora-11-64 2.6.30.9-96.fc11.x86_64 #1 SMP Wed Nov 4 00:02:04 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

{{{
 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
         cdb 4a 01 00 00 10 00 00 00  08 00 00 00 00 00 00 00
         res 41/50:03:00:08:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
 ata2.00: status: { DRDY ERR }
 ata2: soft resetting link
 ata2.00: configured for MWDMA2
 ata2: EH complete
}}}

Apart from that, the guest tends to hang every few days.
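For PACKET commands, bits 7:4 of the ATAPI Error register carry the SCSI sense key, so the second byte of `res` can be read as a sense key. A sketch assuming the standard SPC sense-key numbering (only the common keys are listed); `sense_key` is a hypothetical helper. Read this way, `res 41/20` in the original report means NOT READY, while `res 41/50` in the log above means ILLEGAL REQUEST.

```python
# SCSI sense keys (SPC numbering), common values only
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}


def sense_key(error: int) -> str:
    """Decode the sense key held in bits 7:4 of an ATAPI Error register value."""
    key = (error >> 4) & 0xF
    return SENSE_KEYS.get(key, f"sense key 0x{key:x}")


for err in (0x20, 0x50):
    print(f"0x{err:02x}: {sense_key(err)}")
# 0x20: NOT READY
# 0x50: ILLEGAL REQUEST
```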
Comment 27 Michal Novotny 2009-12-07 05:32:13 EST
(In reply to comment #26)
> I still see the same errors under a fully-virtualized environment:
> 
> Linux fedora-11-64 2.6.30.9-96.fc11.x86_64 #1 SMP Wed Nov 4 00:02:04 EST 2009
> x86_64 x86_64 x86_64 GNU/Linux
> 
> {{{
>  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>  ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
>          cdb 4a 01 00 00 10 00 00 00  08 00 00 00 00 00 00 00
>          res 41/50:03:00:08:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
>  ata2.00: status: { DRDY ERR }
>  ata2: soft resetting link
>  ata2.00: configured for MWDMA2
>  ata2: EH complete
> }}}
> 
> Apart from that, the guest tends to hang every few days.  

Well, this may be kernel related... Does it happen with older/newer kernels?

Michal
Comment 28 Ioannis Aslanidis 2009-12-07 05:51:53 EST
Seems to be doing it with all fedora 11 kernels I tried, including the last one. It may be related to bug #543947
Comment 29 Michal Novotny 2009-12-07 06:19:49 EST
(In reply to comment #28)
> Seems to be doing it with all fedora 11 kernels I tried, including the last
> one. It may be related to bug #543947  

Well, I can't claim I understand that stuff well, but could you also try with F10 or F12 kernels? If the issue does not occur with F10 and F12 kernels, it may be related to the bug you mentioned above...

Michal
Comment 30 Ioannis Aslanidis 2009-12-07 06:50:45 EST
I can tell you for sure that it does not happen with Fedora 9 kernels. I did not try with Fedora 12 or Fedora 10.
Comment 31 Ioannis Aslanidis 2009-12-10 04:41:23 EST
Any updates on this?
Comment 32 Michal Novotny 2009-12-10 11:10:33 EST
Well, I did some testing with Fedora 8, 9 and 10 kernels (all 32-bit i386 guests) just to be sure, and this problem did not occur on those guests: DRDY DRQ messages appear in the dmesg output, but no DRDY DRQ ERR ones. It seems to be related to BZ #543947. Also, I have not been able to install Fedora 12 again (there were some errors), so we need to be sure...

Michal
Comment 33 Michal Novotny 2009-12-10 12:29:31 EST
Well, I managed to install a Fedora 12 32-bit guest and saw no DRDY DRQ ERR errors, only DRDY DRQ messages, so the problem really seems to be related to bug #543947, because I saw no such issue on any guest other than Fedora 11.

Michal
Comment 43 Alan Lehman 2009-12-25 11:30:33 EST
I am seeing this problem with a Fedora 12 fully virtualized guest on RHEL 5.3.

kernel-2.6.18-164.6.1.el5xen
xen-3.0.3-94.el5_4.2

This string of errors is logged every few seconds whenever the guest is up:

Dec 24 19:05:24 web1 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 24 19:05:24 web1 kernel: ata2.00: ST_FIRST: DRQ=1 with device error, dev_stat 0x49
Dec 24 19:05:24 web1 kernel: ata2.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in
Dec 24 19:05:24 web1 kernel:         cdb 12 00 00 00 24 00 00 00  00 00 00 00 00 00 00 00
Dec 24 19:05:24 web1 kernel:         res 49/20:01:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
Dec 24 19:05:24 web1 kernel: ata2.00: status: { DRDY DRQ ERR }
Dec 24 19:05:24 web1 kernel: ata2: soft resetting link
Dec 24 19:05:25 web1 kernel: ata2.00: configured for MWDMA2
Dec 24 19:05:25 web1 kernel: ata2: EH complete
Comment 44 Alan Lehman 2009-12-25 11:55:08 EST
A little more info on my post above:

host hardware: Proliant DL365 Opteron 
I tried clocksource=acpi_pm, but it made no difference.

guest: 2.6.31.6-166.fc12.x86_64
Comment 45 Paolo Bonzini 2009-12-29 07:11:47 EST
Alan, packages that fix this bug will be available shortly.
Comment 53 XinSun 2010-01-04 03:16:12 EST
Following comment #52, I checked this bug with xen-3.0.3-102.el5 on RHEL 5.4 for the x86_64 and i386 platforms:
1. (run xen enabled system)
2. virt-install -n F10 -r 512 -f F10.img -s 10 --vnc --hvm -c /root/Fedora-10-i386-DVD.iso
3. perform the default installation
4. reboot the guest
5. dmesg | grep ata2

After step 5, I get the following results:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: BMDMA stat 0x5
ata2.00: cmd a0/01:00:00:80:00/00:00:00:00:00/a0 tag 0 dma 16512 in
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for MWDMA2
ata2: EH complete

These ata2 messages are about { DRDY DRQ }, not the original { DRDY ERR }, so this bug is fixed in xen-3.0.3-102.el5; changing the bug's status to VERIFIED.
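The verification above comes down to checking which `status: { ... }` flag sets appear in the guest's dmesg. A sketch that counts them per port, so that { DRDY ERR } (this bug) can be told apart from { DRDY DRQ } and { DRDY DRQ ERR } (discussed separately in this thread). The regex and `count_status_sets` are assumptions written for this note, and the sample combines lines taken from different comments above.

```python
import re
from collections import Counter

# Matches libata lines such as "ata2.00: status: { DRDY DRQ }"
STATUS_RE = re.compile(r"(ata\d+)\.\d+: status: \{ ([A-Z ]+?) \}")


def count_status_sets(dmesg_text: str) -> Counter:
    """Count (port, status flags) pairs in libata 'status: { ... }' lines."""
    counts: Counter = Counter()
    for port, flags in STATUS_RE.findall(dmesg_text):
        counts[(port, " ".join(flags.split()))] += 1
    return counts


sample = """\
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: status: { DRDY ERR }
Dec 24 19:05:24 web1 kernel: ata2.00: status: { DRDY DRQ ERR }
"""

counts = count_status_sets(sample)
print(counts[("ata2", "DRDY ERR")])  # 1: the symptom this bug is about
```

To use it on a guest, pass the text of `dmesg` (for example `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`); a nonzero count for `("ata2", "DRDY ERR")` after updating to the fixed package would suggest the original symptom is still present.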
Comment 55 errata-xmlrpc 2010-03-30 04:59:22 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0294.html
Comment 56 Paolo Bonzini 2010-04-08 11:44:28 EDT
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).
