Bug 1215618 - Unhelpful error message on Power when SMT is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: ppc64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-27 10:19 UTC by Andrea Bolognani
Modified: 2019-08-15 04:31 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.3.0-19.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-04 16:40:54 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2546 normal SHIPPED_LIVE qemu-kvm-rhev bug fix and enhancement update 2015-12-04 21:11:56 UTC

Description Andrea Bolognani 2015-04-27 10:19:54 UTC
Description of problem:

  SMT must be turned off on Power for KVM to work. If it happens to be
  turned on, KVM doesn't start and the error message displayed is not
  very helpful.


Version-Release number of selected component (if applicable):

  qemu-kvm-rhev-2.2.0-8.el7.ppc64


How reproducible:

  Always.


Steps to Reproduce:

  1. Turn on SMT with

       # ppc64_cpu --smt=on

  2. Run KVM with

       # /usr/libexec/qemu-kvm -nographic
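
  Before step 2, the current SMT state can be checked so that the failure
  is expected rather than surprising. This is a minimal sketch, assuming
  that `ppc64_cpu --smt` (with no value) prints `SMT is off`, `SMT is on`,
  or `SMT=N`. The helper name `smt_state` is hypothetical and is not part
  of powerpc-utils.

```shell
# smt_state: classify the text printed by `ppc64_cpu --smt`
# (hypothetical helper; the output formats matched here are an assumption).
# Prints "off", "on", or "unknown".
smt_state() {
    case "$1" in
        "SMT is off") echo off ;;
        "SMT is on"|SMT=*) echo on ;;
        *) echo unknown ;;
    esac
}

# Example (on a Power host with powerpc-utils installed):
#   smt_state "$(ppc64_cpu --smt)"
```

  The function only parses text, so it can be tried on any machine by
  passing a sample string instead of the real command output.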


Actual results:

  # /usr/libexec/qemu-kvm -nographic
  error: kvm run failed Device or resource busy
  NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000
  MSR 0000000000000000 HID0 0000000000000000  HF 0000000000000000 idx 1
  TB 00000000 00000000 DECR 00000000
  GPR00 0000000000000000 0000000000000000 0000000000000000 0000000007fb0000
  GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
  FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPSCR 0000000000000000
   SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004b0201 VRSAVE 0000000000000000
  SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
  SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
   CFAR 0000000000000000
   SDR1 0000000000000002   DAR 0000000000000000  DSISR 0000000000000000


Expected results:

  A helpful error message (something along the lines of "SMT must be
  turned off") is displayed.
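
  On builds without such a message, a wrapper can add the hint itself.
  The sketch below is only a workaround idea, not the way QEMU handles
  it: it passes QEMU's output through unchanged and appends a hint when
  it sees the "Device or resource busy" line from the actual results
  above. The function name `explain_kvm_error` and the hint wording are
  hypothetical.

```shell
# explain_kvm_error: read QEMU's output on stdin, pass it through unchanged,
# and append a hint if the "Device or resource busy" failure from this
# report was seen. (Hypothetical helper, not part of QEMU.)
explain_kvm_error() {
    busy=0
    while IFS= read -r line; do
        printf '%s\n' "$line"
        case "$line" in
            *"kvm run failed Device or resource busy"*) busy=1 ;;
        esac
    done
    if [ "$busy" -eq 1 ]; then
        echo "hint: on Power, SMT must be turned off (ppc64_cpu --smt=off)"
    fi
}

# Example:
#   /usr/libexec/qemu-kvm -nographic 2>&1 | explain_kvm_error
```

  The hint is printed after the register dump, so it stays visible as
  the last line of output.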


Additional info:

  None.

Comment 2 Facundo Garat 2015-05-17 00:34:13 UTC
Hi, I hope this provides more information:

Setup:
  - RHEV-M 3.5 with RHEL 7
  - RHEV-H 3.4.4 on Power8  - RHEV for IBM POWER release 3.4.4 build 31 service (pkvm2_1_1)
  - SMT isn't enabled by default; I had to add smt-enabled=on in grub

Problem:
With ppc64_cpu --smt=off, machines boot without any issues.
With ppc64_cpu --smt=on, machines on RHEV don't start and report this error message:
2015-05-16 21:27:20,850 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-68) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ubuntu1504-1 is down with error. Exit message: Lost connection with qemu process.

or with this error message:
2015-05-16 21:26:38,506 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-14) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ubuntu1504-1 is down with error. Exit message: Domain not found: no domain with matching uuid '997670ed-4c06-4369-b4c7-b50bc80afcf4' (ubuntu1504-1).

Comment 3 David Gibson 2015-05-25 04:05:01 UTC
I'm confused by comment 2.  In the initial description you give a version of the qemu-kvm-rhev package which exists in the RHEL based hypervisor.  In comment 2 you say it's using an IBM PowerKVM based hypervisor.

Laurent,

I saw that you posted a patch for this upstream.  Can you add a mailing list archive link to this BZ?

Comment 4 Andrea Bolognani 2015-05-25 06:26:51 UTC
The bug was reported by me and the setup described is that of one of our own p8 servers, while in comment 2 Facundo is describing his own setup. So that's two different people describing two different setups :)

Hope this clears things up.

Comment 5 Laurent Vivier 2015-06-02 11:32:28 UTC
David, the link to the ML archive is:

https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg03570.html

Comment 6 Miroslav Rezanina 2015-08-24 15:08:54 UTC
Fix included in qemu-kvm-rhev-2.3.0-19.el7

Comment 8 Xujun Ma 2015-08-26 03:54:54 UTC
Reproduced the issue on the old version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-1.ael7b.ppc64le.rpm

Steps to Reproduce:
1. Turn on SMT with
  # ppc64_cpu --smt=on

2. Run KVM with
  # /usr/libexec/qemu-kvm -nographic

Results:
[root@ibm-p8-rhevm-11 ~]# /usr/libexec/qemu-kvm -nographic
error: kvm run failed Device or resource busy
NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 0000000000000000 HID0 0000000000000000  HF 0000000000000000 idx 1
TB 00000000 00000000 DECR 00000000
GPR00 0000000000000000 0000000000000000 0000000000000000 0000000007fb0000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
 SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004b0201 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
 CFAR 0000000000000000
 SDR1 0000000000000002   DAR 0000000000000000  DSISR 0000000000000000

Verified the fix on the latest version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-img-rhev-2.3.0-19.el7.ppc64le

Steps to Reproduce:
1. Turn on SMT with
  # ppc64_cpu --smt=on

2. Run KVM with
  # /usr/libexec/qemu-kvm -nographic

Results:
[root@ibm-p8-rhevm-11 ~]# /usr/libexec/qemu-kvm -nographic
error: kvm run failed Device or resource busy
This is probably because your SMT is enabled.
VCPU can only run on primary threads with all secondary threads offline.
NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 0000000000000000 HID0 0000000000000000  HF 0000000000000000 idx 1
TB 00000000 00000000 DECR 00000000
GPR00 0000000000000000 0000000000000000 0000000000000000 000000001ffb0000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
 SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004b0201 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
 CFAR 0000000000000000
 SDR1 0000000000000004   DAR 0000000000000000  DSISR 0000000000000000


A helpful error message is now displayed:
This is probably because your SMT is enabled.
VCPU can only run on primary threads with all secondary threads offline.

I think it would be friendlier to remove the register information. Could you confirm whether we can call this verified?

Comment 9 Laurent Vivier 2015-08-26 08:50:28 UTC
Yes, this is verified. Register information cannot be removed because it is generated at a different level of QEMU.
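
The condition named in the new message (all secondary threads offline) can also be inspected from sysfs. The sketch below is an illustration under stated assumptions, not something taken from the fix: it assumes each CPU exposes an `online` file under /sys/devices/system/cpu/cpuN/, and that primary threads are the CPUs whose number is a multiple of the threads-per-core value (commonly 8 on POWER8). The function name `online_secondary_threads` is hypothetical.

```shell
# online_secondary_threads: list online CPUs that are not the primary
# thread of their core. Assumes CPUs are numbered so that every
# N-th CPU (0, 8, 16, ... for N=8) is a primary thread.
#   $1: threads per core (assumed value, e.g. 8)
#   $2: sysfs CPU directory (default /sys/devices/system/cpu)
online_secondary_threads() {
    threads_per_core=$1
    cpu_dir=${2:-/sys/devices/system/cpu}
    for f in "$cpu_dir"/cpu[0-9]*/online; do
        [ -r "$f" ] || continue
        n=${f%/online}      # strip the trailing file name
        n=${n##*/cpu}       # keep only the CPU number
        if [ $((n % threads_per_core)) -ne 0 ] && [ "$(cat "$f")" = "1" ]; then
            echo "cpu$n"
        fi
    done
}

# Example:
#   online_secondary_threads 8
```

If the function prints nothing, no secondary thread is online under these assumptions; any CPU it does print is a candidate cause of the "Device or resource busy" failure.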

Comment 17 errata-xmlrpc 2015-12-04 16:40:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html

