Bug 1563525
Summary: | Cisco VIC caused VM paused forever | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Chen <cchen> |
Component: | qemu-kvm-rhev | Assignee: | Alex Williamson <alex.williamson> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | xiywang |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.4 | CC: | cchen, chayang, juzhang, knoel, michen, pezhang, rbalakri, siliu, virt-maint, xiywang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-06-11 22:25:59 UTC | Type: | Bug |
Description
Chen
2018-04-04 05:53:33 UTC
Are https://bugzilla.redhat.com/show_bug.cgi?id=1563524 and https://bugzilla.redhat.com/show_bug.cgi?id=1563525 the same issue?

*** Bug 1563524 has been marked as a duplicate of this bug. ***

Hi Jun Yi,

Sorry, I closed 1563524 as a duplicate of this one.

Best Regards,
Chen

The device generated an uncorrected AER error, as seen in messages:

Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0000
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0420(Receiver ID)
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: device [10b5:8632] error status/mask=00200000/00100000
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: [21] ACS Violation (First)
Apr 4 13:34:53 rhel7-bare kvm: 5 guests now active
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: AER: Device recovery failed
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0000
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0420(Receiver ID)
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: device [10b5:8632] error status/mask=00200000/00100000
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: [21] ACS Violation (First)
Apr 4 13:34:53 rhel7-bare kernel: pcieport 0000:04:04.0: AER: Device recovery failed

There are multiple levels of switches used in this system:

00:03.0->{03:00.0->04:04.0}->{06:00.0->07:01.0}->{09:00.0->0a:00.0}->0b:00.0
             PLX switch         Cisco switch        Cisco switch

The ACS violation appears to be detected by the downstream port of the PLX switch (04:04.0) and forwarded up to the PCIe root port (00:03.0). This is a hardware issue, not a software issue. QEMU pauses the VM for data collection when it receives an uncorrected AER error. The customer should work with the hardware vendors to determine the cause of the violation.
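For anyone reading the log lines above, the following is a minimal sketch of how the `error status/mask` values and the `id=0420` field can be decoded by hand. It is not part of the kernel or QEMU; the function names `decode_uncor_status` and `decode_bdf` are hypothetical, and the bit names are taken from the PCIe AER Uncorrectable Error Status register layout as I understand it.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status / Mask registers.
# Names follow the PCIe specification; this table is an illustrative subset.
UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_uncor_status(value):
    """Return the names of all bits set in an uncorrectable status/mask value."""
    return [name for bit, name in sorted(UNCOR_BITS.items()) if value & (1 << bit)]


def decode_bdf(req_id):
    """Split a 16-bit PCIe ID into bus:device.function notation."""
    bus = (req_id >> 8) & 0xFF   # upper 8 bits
    dev = (req_id >> 3) & 0x1F   # next 5 bits
    fn = req_id & 0x7            # lowest 3 bits
    return f"{bus:02x}:{dev:02x}.{fn}"


if __name__ == "__main__":
    # Values copied from the log: error status/mask=00200000/00100000, id=0420
    print(decode_uncor_status(0x00200000))  # bits set in the status register
    print(decode_uncor_status(0x00100000))  # bits set in the mask register
    print(decode_bdf(0x0420))               # device that reported the error
```

Under these assumptions, status `00200000` has only bit 21 set, which agrees with the `[21] ACS Violation` line, and `id=0420` decodes to `04:04.0`, the PLX downstream port named in the topology.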
The customer case has been closed. Can we also close this bz?