Bug 1215618

Summary: Unhelpful error message on Power when SMT is enabled
Product: Red Hat Enterprise Linux 7 Reporter: Andrea Bolognani <abologna>
Component: qemu-kvm-rhev Assignee: Laurent Vivier <lvivier>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: low Docs Contact:
Priority: unspecified    
Version: 7.1 CC: dgibson, facundo.garat, knoel, lvivier, michen, mrezanin, ngu, qzhang, rbalakri, sraje, thuth, virt-maint, xuhan, xuma, zhengtli
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.3.0-19.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-04 16:40:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andrea Bolognani 2015-04-27 10:19:54 UTC
Description of problem:

  SMT must be turned off on Power for KVM to work. If it happens to
  be turned on, KVM doesn't start, and the error message displayed
  is not very helpful.


Version-Release number of selected component (if applicable):

  qemu-kvm-rhev-2.2.0-8.el7.ppc64


How reproducible:

  Always.


Steps to Reproduce:

  1. Turn on SMT with

       # ppc64_cpu --smt=on

  2. Run KVM with

       # /usr/libexec/qemu-kvm -nographic


Actual results:

  # /usr/libexec/qemu-kvm -nographic
  error: kvm run failed Device or resource busy
  NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000
  MSR 0000000000000000 HID0 0000000000000000  HF 0000000000000000 idx 1
  TB 00000000 00000000 DECR 00000000
  GPR00 0000000000000000 0000000000000000 0000000000000000 0000000007fb0000
  GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
  FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  FPSCR 0000000000000000
   SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004b0201 VRSAVE 0000000000000000
  SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
  SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
   CFAR 0000000000000000
   SDR1 0000000000000002   DAR 0000000000000000  DSISR 0000000000000000


Expected results:

  A helpful error message (something along the lines of "SMT must be
  turned off") is displayed.


Additional info:

  None.
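
  Until the message improves, a wrapper script could check the SMT state
  before launching the guest. The following is only a minimal sketch: it
  assumes that `ppc64_cpu --smt` without a value prints the current state
  and reports the disabled state as "SMT is off". The function names
  smt_is_off and run_guest are hypothetical and are not part of
  qemu-kvm-rhev.

```sh
#!/bin/sh
# Sketch only: refuse to start a guest while SMT is enabled on the host.
# Assumption: `ppc64_cpu --smt` reports the disabled state as "SMT is off".

# smt_is_off: succeeds when the given status text says SMT is off.
smt_is_off() {
    case "$1" in
        *"SMT is off"*) return 0 ;;
        *) return 1 ;;
    esac
}

# run_guest: hypothetical wrapper that checks SMT before launching qemu-kvm.
run_guest() {
    status=$(ppc64_cpu --smt 2>&1)
    if ! smt_is_off "$status"; then
        echo "SMT must be turned off (ppc64_cpu --smt=off); current state: $status" >&2
        return 1
    fi
    /usr/libexec/qemu-kvm "$@"
}
```

  It would be used as `run_guest -nographic`. The check lives in the
  wrapper rather than in QEMU, so it does not change the error path
  described above.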

Comment 2 Facundo Garat 2015-05-17 00:34:13 UTC
Hi, I hope this provides more information:

Setup:
  - RHEV-M 3.5 with RHEL 7
  - RHEV-H 3.4.4 on Power8  - RHEV for IBM POWER release 3.4.4 build 31 service (pkvm2_1_1)
  - SMT is not enabled by default at boot; I had to add smt-enabled=on in grub

Problem:
With ppc64_cpu --smt=off, machines boot without any issues.
With ppc64_cpu --smt=on, machines on RHEV don't start and fail with this error message:
2015-05-16 21:27:20,850 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-68) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ubuntu1504-1 is down with error. Exit message: Lost connection with qemu process.

or with this error message:
2015-05-16 21:26:38,506 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-14) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ubuntu1504-1 is down with error. Exit message: Domain not found: no domain with matching uuid '997670ed-4c06-4369-b4c7-b50bc80afcf4' (ubuntu1504-1).

Comment 3 David Gibson 2015-05-25 04:05:01 UTC
I'm confused by comment 2.  In the initial description you give a version of the qemu-kvm-rhev package which exists in the RHEL based hypervisor.  In comment 2 you say it's using an IBM PowerKVM based hypervisor.

Laurent,

I saw that you posted a patch for this upstream.  Can you add a mailing list archive link to this BZ?

Comment 4 Andrea Bolognani 2015-05-25 06:26:51 UTC
The bug was reported by me and the setup described is that of one of our own p8 servers, while in comment 2 Facundo is describing his own setup. So that's two different people describing two different setups :)

Hope this clears things up.

Comment 5 Laurent Vivier 2015-06-02 11:32:28 UTC
David, the link to the ML archive is:

https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg03570.html

Comment 6 Miroslav Rezanina 2015-08-24 15:08:54 UTC
Fix included in qemu-kvm-rhev-2.3.0-19.el7

Comment 8 Xujun Ma 2015-08-26 03:54:54 UTC
Reproduced the issue on the old version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-1.ael7b.ppc64le.rpm

Steps to Reproduce:
1. Turn on SMT with
  # ppc64_cpu --smt=on

2. Run KVM with
  # /usr/libexec/qemu-kvm -nographic

Results:
[root@ibm-p8-rhevm-11 ~]# /usr/libexec/qemu-kvm -nographic
error: kvm run failed Device or resource busy
NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 0000000000000000 HID0 0000000000000000  HF 0000000000000000 idx 1
TB 00000000 00000000 DECR 00000000
GPR00 0000000000000000 0000000000000000 0000000000000000 0000000007fb0000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
 SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004b0201 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
 CFAR 0000000000000000
 SDR1 0000000000000002   DAR 0000000000000000  DSISR 0000000000000000

Verified the issue on the latest version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-img-rhev-2.3.0-19.el7.ppc64le

Steps to Reproduce:
1. Turn on SMT with
  # ppc64_cpu --smt=on

2. Run KVM with
  # /usr/libexec/qemu-kvm -nographic

Results:
[root@ibm-p8-rhevm-11 ~]# /usr/libexec/qemu-kvm -nographic
error: kvm run failed Device or resource busy
This is probably because your SMT is enabled.
VCPU can only run on primary threads with all secondary threads offline.
NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 0000000000000000 HID0 0000000000000000  HF 0000000000000000 idx 1
TB 00000000 00000000 DECR 00000000
GPR00 0000000000000000 0000000000000000 0000000000000000 000000001ffb0000
GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000000000000
 SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004b0201 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
 CFAR 0000000000000000
 SDR1 0000000000000004   DAR 0000000000000000  DSISR 0000000000000000


A helpful error message is now displayed:
This is probably because your SMT is enabled.
VCPU can only run on primary threads with all secondary threads offline.

I think it would be friendlier to remove the register information. Could you help confirm whether we can mark this as verified?

Comment 9 Laurent Vivier 2015-08-26 08:50:28 UTC
Yes, this is verified. Register information cannot be removed because it is generated at a different level of QEMU.
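
If the register dump gets in the way when reading logs, it can be filtered outside QEMU. A minimal sketch, assuming the dump always starts at the first line whose first field is "NIP", as in the output pasted in comment 8; strip_register_dump is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch only: keep qemu-kvm's error text and drop the register dump.
# Assumption: the dump starts at the first line whose first field is "NIP".
# Everything from that line onward is discarded.

strip_register_dump() {
    # Set a flag at the NIP line and stop printing; keep reading input so
    # the writer does not receive SIGPIPE.
    awk '$1 == "NIP" { skip = 1 } !skip { print }'
}
```

For a failing start it could be used as `/usr/libexec/qemu-kvm -nographic 2>&1 | strip_register_dump`. It is not suitable for a guest that actually boots, since any later output would be dropped as well.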

Comment 17 errata-xmlrpc 2015-12-04 16:40:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html