Bug 1257781

Summary:	The prompt is confusing when booting a guest with more vCPUs than the host has physical CPUs
Product:	Red Hat Enterprise Linux 7	Reporter:	Qunfang Zhang <qzhang>
Component:	qemu-kvm-rhev	Assignee:	David Gibson <dgibson>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.2	CC:	bugproxy, dgibson, drjones, dyuan, dzheng, gsun, hannsj_uhl, knoel, lersek, michen, mrezanin, mtosatti, mzhan, ngu, pbonzini, shuyu, virt-maint, xuhan, xuma, zhengtli, zhwang
Target Milestone:	rc	Keywords:	Reopened
Target Release:	7.2
Hardware:	ppc64le
OS:	Linux
Whiteboard:
Fixed In Version:	qemu-kvm-rhev-2.3.0-22.el7	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-12-04 16:54:53 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1201513

Description Qunfang Zhang 2015-08-28 02:53:26 UTC

Description of problem:
Currently, the maximum number of vCPUs supported on Power KVM is 160, but on x86 it is 240. We should fix this to match the x86 limit.

On x86 host:
kernel-3.10.0-308.el7.x86_64
qemu-kvm-rhev-2.3.0-19.el7.x86_64

 # /usr/libexec/qemu-kvm -smp 242 -monitor stdio
 Number of SMP cpus requested (242), exceeds max cpus supported by machine `pc-i440fx-rhel7.2.0' (240)

On power host:
kernel-3.10.0-306.0.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-19.el7.ppc64le

# /usr/libexec/qemu-kvm test.qcow2 -m 512G -smp 161 -monitor stdio
Warning: Number of SMP cpus requested (161) exceeds the recommended cpus supported by KVM (160)
Number of SMP cpus requested (161) exceeds the maximum cpus supported by KVM (160) 

Version-Release number of selected component (if applicable):
kernel-3.10.0-306.0.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-19.el7.ppc64le

How reproducible:
Always


Comment 2 Qunfang Zhang 2015-08-28 03:08:53 UTC

If the fix belongs in the kernel, please adjust the component. Thanks.

Comment 3 Gu Nini 2015-08-28 08:06:31 UTC

On my host with 152 available cpus, the limit is 152 as follows:

# /usr/libexec/qemu-kvm -name virtioblk-0828-le -m 257024 -smp 160 -monitor stdio
Warning: Number of SMP cpus requested (160) exceeds the recommended cpus supported by KVM (152)
Number of SMP cpus requested (160) exceeds the maximum cpus supported by KVM (152)

So it seems that on a Power host the guest vCPU limit is determined by the number of CPUs available on the host rather than by a fixed value as on x86. If so, the warning above is worded misleadingly, since it says 'the recommended cpus supported by KVM is **152**', which is a host-dependent value rather than a property of KVM.

Comment 4 Qunfang Zhang 2015-08-28 09:05:53 UTC

(In reply to Gu Nini from comment #3)
> On my host with 152 available cpus, the limit is 152 as follows:
> 
> # /usr/libexec/qemu-kvm -name virtioblk-0828-le -m 257024 -smp 160 -monitor
> stdio
> Warning: Number of SMP cpus requested (160) exceeds the recommended cpus
> supported by KVM (152)
> Number of SMP cpus requested (160) exceeds the maximum cpus supported by KVM
> (152)
> 
> So it seems on a power host, the guest vcpu limit is decided by available
> cpu number in the host instead of a fix value as that on x86. If so, above
> warning prompt has some syntax problem since it said 'the recommended cpus
> supported by KVM is **152**, a variable'.

Yes. First, the warning confuses the user, which is not good. Also, since we have no host with more than 160 CPUs on hand, we could not test whether more than 160 vCPUs are supported on Power. We are waiting for test results from the IBM folks. Thanks a lot!

Comment 5 David Gibson 2015-08-30 23:28:28 UTC

This doesn't look like a bug to me.  qemu is printing a warning message when the number of requested vCPUs exceeds the "soft" vCPU limit.  That "soft" limit is equal to the number of host threads, because a guest with more vCPUs than host threads will not perform well.

Since this is a warning and doesn't prevent the guest from starting, I think this is correct behaviour.

Comment 6 IBM Bug Proxy 2015-08-31 02:12:06 UTC

------- Comment From fnovak.com 2015-08-31 02:01 EDT-------
(In reply to comment #5)
> This doesn't look like a bug to me.  qemu is printing a warning message when
> the number of requested vcpus exceeds the "soft" vCPU limit.  That "soft"
> limit is equal to the number of host threads, because adding more vCPUS will
> not perform well.
> Since this is a warning and doesn't prevent the guest from starting, I think
> this is correct behaviour.

Yeah, and this is the case across architectures. One should not define a virtual system larger than the physical system, and the more constraints one puts on it, the lower the likelihood of a happy ending.

For experimental purposes it is fine if someone wants to push the number of CPUs, etc., to see where the boundaries are, but expect to see some warnings.

Comment 7 Qunfang Zhang 2015-08-31 03:26:06 UTC

(In reply to David Gibson from comment #5)
> This doesn't look like a bug to me.  qemu is printing a warning message when
> the number of requested vcpus exceeds the "soft" vCPU limit.  That "soft"
> limit is equal to the number of host threads, because adding more vCPUS will
> not perform well.
> 
> Since this is a warning and doesn't prevent the guest from starting, I think
> this is correct behaviour.

Hi, David

Thanks for the comment. But I still have some questions:

1. The warning confuses the user. E.g., if I boot a guest with 16 vCPUs on a host with 8 physical CPUs, qemu prints something like "KVM supports 8 CPUs". (For details, please see comment 3.)

2. On an x86 host, we can boot a guest with 240 vCPUs even on a host with only 8 physical CPUs. This is sometimes useful, e.g., when I just want to check current KVM CPU scalability, although we will not schedule functional tests with such an overcommitted configuration. I think qemu should handle this situation on Power the same way as on x86. What's your opinion?

I am re-opening the bug and adjusting the bug summary to track item 1. Please correct me if anything is wrong.

Regards,
Qunfang

Comment 8 David Gibson 2015-08-31 05:14:36 UTC

I misread something in comment 3.  I think the first message:

Warning: Number of SMP cpus requested (160) exceeds the recommended cpus supported by KVM (152)

is correct.  It says starting a guest with this many vCPUs is a bad idea, because it *is* a bad idea (except in certain testing circumstances).

The second error message:

Number of SMP cpus requested (160) exceeds the maximum cpus supported by KVM (152)

is not correct and indicates that qemu thinks the hard cpu limit has been exceeded.


I will need to investigate why it is printing both messages instead of just one.


I'm a bit confused by the soft CPU limit.  The value used on Power (number of host threads) makes sense to me.  On x86 it is fixed at 240, but I can't see any sensible reason for that.

Comment 9 Karen Noel 2015-08-31 20:43:48 UTC

(In reply to David Gibson from comment #8) 
> 
> I'm a bit confused by the soft CPU limit.  The value used on Power (number
> of host threads) makes sense to me.  On x86 it is fixed at 240, but I can't
> see any sensible reason for that.

On X86_64, 240 is the max number of vcpus tested by Red Hat OEM partners, with the same number of physical cpus (threads).

Comment 10 David Gibson 2015-09-01 04:50:42 UTC

Karen,

The thing I'm questioning isn't so much why it is 240, but why it's fixed, rather than varying with the number of host CPUs, like on Power.

Comment 11 David Gibson 2015-09-01 06:06:52 UTC

Ugh.  Ok, I see why qemu is failing completely and not just giving a warning.

A downstream-only patch is forcing the hard limit to be equal to the soft limit (see bug 998708).  I'm going to consult some people to figure out how to deal with this.
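
To illustrate why both messages appear together, here is a simplified stand-alone model of the two checks. It is a sketch, not the actual kvm-all.c code: the enum and function names are made up for illustration, and only the soft/hard semantics come from this thread.

```c
#include <assert.h>

/* Hypothetical result type; the names are illustrative, not from QEMU. */
enum vcpu_verdict {
    VCPU_OK,      /* at or below the soft limit */
    VCPU_WARN,    /* above the soft limit only: warn, guest still starts */
    VCPU_REJECT   /* above the hard limit: refuse to start */
};

/* Compare a requested vCPU count against the recommended (soft)
 * and maximum (hard) limits. */
enum vcpu_verdict check_vcpus(int requested, int soft_limit, int hard_limit)
{
    /* The soft limit is assumed never to exceed the hard limit. */
    assert(soft_limit <= hard_limit);

    if (requested > hard_limit) {
        return VCPU_REJECT;
    }
    if (requested > soft_limit) {
        return VCPU_WARN;
    }
    return VCPU_OK;
}
```

With the numbers from comment 3 and the hard limit forced equal to the soft limit, check_vcpus(160, 152, 152) yields VCPU_REJECT. With a separate hard limit (say 240, a value assumed here only for the example), the same request would yield VCPU_WARN.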

Comment 12 Laszlo Ersek 2015-09-01 11:39:14 UTC

David mentioned downstream commit c6c39a1 and bug 998708.

If I understand correctly, the problem in one sentence is:

On x86 hosts, the user is allowed to oversubscribe the host CPUs as long as the absolute number remains in the supported range, whereas on Power hosts, oversubscription is completely disabled.

I think that first of all we should come up with an architecture-independent definition for what the "soft limit" means. Once that's done, we can see if QEMU needs an update too. Just my opinion.

Comment 13 David Gibson 2015-09-02 04:42:53 UTC

Laszlo, that's a good summary of the outcome, though not the cause.

I agree I'd like to come up with a consensus as to what the soft limit is supposed to mean.  To my mind the Power approach - a recommended maximum for this specific system - seems more useful.

In the short term (RHEL 7.2), does fixing the qemu hard limit at 240, while leaving the soft limit at whatever the kernel reports, seem like a reasonable approach?

Comment 14 Laszlo Ersek 2015-09-02 09:09:11 UTC

(Although you've asked Paolo, I'll give you my mind :))

I don't recall all the reasoning behind the current patch (ie. commit c6c39a1). However, given that 7.2 deadlines are pressing, I think you could go ahead with a surgical fix (one that later on we could back out even) -- refer to commit 90b21e81, which is a similarly host-dependent workaround. (That specific workaround is obsolete now, because Paolo found the right fix for that issue, but backing out that workaround will be part of BZ 1188656, so let me not digress more than this.)

So, I'm thinking I'd be willing to ACK a downstream patch like this:

> diff --git a/kvm-all.c b/kvm-all.c
> index dc71496..9bcbe93 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1509,8 +1509,10 @@ static int kvm_init(MachineState *ms)
>      soft_vcpus_limit = kvm_recommended_vcpus(s);
>      hard_vcpus_limit = kvm_max_vcpus(s);
>  
> +#ifndef HOST_PPC64
>      /* RHEL doesn't support nr_vcpus > soft_vcpus_limit */
>      hard_vcpus_limit = soft_vcpus_limit;
> +#endif
>  
>      while (nc->name) {
>          if (nc->num > soft_vcpus_limit) {

The patch may not be correct, but it shows the idea.

... Hm, yes, I think the above is not correct, because on a ppc64 RHEL7 host it would set the hard limit to 2048:

kvm_max_vcpus() [kvm-all.c]
  kvm_check_extension(..., KVM_CAP_MAX_VCPUS)
    kvm_ioctl()
      ioctl()
        kvm_vm_ioctl_check_extension() [kernel: arch/powerpc/kvm/powerpc.c]
          returns KVM_MAX_VCPUS

and "arch/powerpc/include/asm/kvm_host.h" defines KVM_MAX_VCPUS as NR_CPUS. Finally, in "redhat/configs/generic/powerpc64/CONFIG_NR_CPUS", we have

CONFIG_NR_CPUS=2048
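
For anyone who wants to see what a given host reports, the values at the bottom of that call chain can be read from userspace with the KVM_CHECK_EXTENSION ioctl on /dev/kvm. The following is a minimal sketch, not QEMU code: the helper names are hypothetical, and it assumes a Linux host with the kernel headers installed. KVM_CAP_NR_VCPUS is the recommended (soft) value and KVM_CAP_MAX_VCPUS the maximum (hard) value.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Ask the kernel for one KVM capability value.
 * Returns the value reported by KVM_CHECK_EXTENSION (0 means the
 * capability is not supported), or -1 if /dev/kvm cannot be opened
 * or the ioctl fails. Hypothetical helper, not part of QEMU. */
int kvm_query_cap(long cap)
{
    int fd = open("/dev/kvm", O_RDWR);
    if (fd < 0) {
        return -1;
    }
    int value = ioctl(fd, KVM_CHECK_EXTENSION, cap);
    close(fd);
    return value;
}

/* Recommended (soft) vCPU limit as reported by the kernel. */
int kvm_soft_vcpu_limit(void)
{
    return kvm_query_cap(KVM_CAP_NR_VCPUS);
}

/* Maximum (hard) vCPU limit as reported by the kernel. */
int kvm_hard_vcpu_limit(void)
{
    return kvm_query_cap(KVM_CAP_MAX_VCPUS);
}
```

The results depend entirely on the host and kernel, so no expected output is given here.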

So what about:

> diff --git a/kvm-all.c b/kvm-all.c
> index dc71496..49b5704 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1509,8 +1509,15 @@ static int kvm_init(MachineState *ms)
>      soft_vcpus_limit = kvm_recommended_vcpus(s);
>      hard_vcpus_limit = kvm_max_vcpus(s);
>  
> +#ifdef HOST_PPC64
> +    hard_vcpus_limit = 240;
> +    if (soft_vcpus_limit > hard_vcpus_limit) {
> +        soft_vcpus_limit = hard_vcpus_limit;
> +    }
> +#else
>      /* RHEL doesn't support nr_vcpus > soft_vcpus_limit */
>      hard_vcpus_limit = soft_vcpus_limit;
> +#endif
>  
>      while (nc->name) {
>          if (nc->num > soft_vcpus_limit) {
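
The effect of the second diff can be tried out in isolation. The sketch below restates its #ifdef logic as an ordinary function with a runtime flag so both branches can be exercised. The struct, the function name and the `host_is_ppc64` parameter are assumptions made for illustration; the thread does not say whether the shipped fix is identical to this proposal.

```c
#include <stdbool.h>

/* Hard cap taken from the second diff above. */
#define RHEL_PPC64_HARD_VCPUS 240

struct vcpu_limits {
    int soft;  /* recommended maximum; exceeding it only warns */
    int hard;  /* absolute maximum; exceeding it is an error */
};

/* Hypothetical helper: apply the proposed downstream adjustment to the
 * limits reported by the kernel. */
struct vcpu_limits adjust_vcpu_limits(int kernel_soft, int kernel_hard,
                                      bool host_is_ppc64)
{
    struct vcpu_limits l = { kernel_soft, kernel_hard };

    if (host_is_ppc64) {
        /* Fixed hard limit; the soft limit may not exceed it. */
        l.hard = RHEL_PPC64_HARD_VCPUS;
        if (l.soft > l.hard) {
            l.soft = l.hard;
        }
    } else {
        /* RHEL doesn't support nr_vcpus > soft limit on other hosts. */
        l.hard = l.soft;
    }
    return l;
}
```

For a Power host where the kernel reports a soft limit of 80 and a hard limit of 2048, this gives soft 80 and hard 240. On a non-Power host the hard limit collapses to the soft limit, as before.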

Comment 15 Miroslav Rezanina 2015-09-03 11:37:11 UTC

Fix included in qemu-kvm-rhev-2.3.0-22.el7

Comment 16 Qunfang Zhang 2015-09-06 10:06:17 UTC

Verified the bug with qemu-kvm-rhev-2.3.0-22.el7.ppc64le:

1) Boot a guest with 240 vCPUs on a host with 80 physical CPUs:

# /usr/libexec/qemu-kvm -name qzhang-test -machine pseries,accel=kvm,usb=off -m 4G -smp 240 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=rhel-7.2-0904.0-virtio-scsi.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive file=RHEL-7.2-20150904.0-Server-ppc64le-dvd1.iso,if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on,downscript=/etc/qemu-ifdown -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c
Warning: Number of SMP cpus requested (240) exceeds the recommended cpus supported by KVM (80)
Warning: Number of hotpluggable cpus requested (240) exceeds the recommended cpus supported by KVM (80)
QEMU 2.3.0 monitor - type 'help' for more information
(qemu) 

QEMU now gives a more suitable message than before.

Checked the vCPU count inside the guest: it is 240. Basic operations (reboot, ping an external host, shutdown) all work well.

2) Boot a guest with 241 vCPUs: QEMU refuses to start the guest and prints:

Warning: Number of SMP cpus requested (241) exceeds the recommended cpus supported by KVM (80)
Number of SMP cpus requested (241) exceeds the maximum cpus supported by KVM (240)
[root@ibm-p8-rhevm-13 home]# 

Based on the above, I think the issue is fixed.

David,

Just one more question to confirm with you: what does "Number of hotpluggable cpus" mean here?

Thanks,
Qunfang

Comment 17 David Gibson 2015-09-07 02:34:17 UTC

"Number of hotpluggable cpus" here just means the maximum allowed number of CPUs, including both CPUs configured initially and any which could be added to the guest later.  Since we don't yet have CPU hotplug support for Power, this is the same as the initial number of CPUs in practice.
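
As a rough illustration of why two warnings show up in comment 16, the sketch below checks both counts in one loop, loosely modeled on the `while (nc->name)` loop visible in the diff context in comment 14. The table contents, struct and function name are assumptions; only the message wording is copied from the output in this bug.

```c
#include <stdio.h>

/* One CPU count to validate, with the label used in the message. */
struct cpu_count {
    const char *name;
    int num;
};

/* Hypothetical helper: check the initial and the hotpluggable CPU count.
 * Returns the number of warnings printed, or -1 as soon as one count
 * exceeds the hard limit. */
int check_cpu_counts(int smp_cpus, int max_cpus, int soft_limit, int hard_limit)
{
    const struct cpu_count counts[] = {
        { "SMP",          smp_cpus },
        { "hotpluggable", max_cpus },
        { NULL, 0 }
    };
    int warnings = 0;

    for (const struct cpu_count *nc = counts; nc->name; nc++) {
        if (nc->num > soft_limit) {
            fprintf(stderr,
                    "Warning: Number of %s cpus requested (%d) exceeds "
                    "the recommended cpus supported by KVM (%d)\n",
                    nc->name, nc->num, soft_limit);
            warnings++;
            if (nc->num > hard_limit) {
                fprintf(stderr,
                        "Number of %s cpus requested (%d) exceeds "
                        "the maximum cpus supported by KVM (%d)\n",
                        nc->name, nc->num, hard_limit);
                return -1;
            }
        }
    }
    return warnings;
}
```

When the hotpluggable count equals the initial count, as described above, check_cpu_counts(240, 240, 80, 240) prints one warning per count and returns 2, while check_cpu_counts(241, 241, 80, 240) prints one warning plus the error and returns -1.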

Comment 18 Qunfang Zhang 2015-09-07 03:35:13 UTC

(In reply to David Gibson from comment #17)
> "Number of hotpluggable cpus" here just means the maximum allowed number of
> CPUs, including both CPUs configured initially and any which could be added
> to the guest later.  Since we don't yet have CPU hotplug support for Power,
> this is the same as the initial number of CPUs in practice.

Got it, thanks for the explanation. Setting to VERIFIED according to comment 16.

Comment 20 errata-xmlrpc 2015-12-04 16:54:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html