Bug 1444728
| Summary: | Pacemaker does not detect a change in CPU count while it is running. |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Reporter: | Xu Yin <xyin> |
| Component: | pacemaker |
| Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA |
| QA Contact: | cluster-qe <cluster-qe> |
| Severity: | low |
| Docs Contact: | |
| Priority: | medium |
| Version: | 7.3 |
| CC: | abeekhof, cluster-maint, mnovacek, qguo |
| Target Milestone: | rc |
| Target Release: | 7.4 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | pacemaker-1.1.16-9.el7 |
| Doc Type: | Enhancement |
| Doc Text: | Feature: Pacemaker will now detect CPU cores that have been hot-plugged into a virtual machine used as a cluster node. Reason: Pacemaker throttles the number of actions it will execute concurrently based on the system load and the number of CPU cores. Result: Pacemaker's action throttling is now more accurate when CPU cores are hot-plugged. |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2017-08-01 17:54:39 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | |
Hi, I thought Pacemaker checked the CPU count each time, but you're right, it's cached after the first check. That makes sense: since it can check frequently, the extra I/O and processing at every check, for changes that are extremely rare, would be inefficient.

My planned solution is:

* Pacemaker can subscribe to udev notifications of hot-plug events. This would automatically keep the CPU count current, as long as the libgudev1 library package is available and udev is functioning properly.
* We could implement a user command to manually refresh the CPU count, something like "pcs cluster cpu-update". This would be a fallback method if the automatic method were unavailable or not working.

I've thought of a cleaner solution that doesn't add any overhead: we already process /proc/stat at every throttle check, to get the I/O load. That file has an identifiable line for each CPU, so we can count the cores there.

Fixed upstream: https://github.com/ClusterLabs/pacemaker/pull/1267

The CPU count is now updated from /proc/stat every time it is needed. The I/O checks were determined to be unnecessary and removed.

QA: How to test:

Prep:
1. Configure a cluster of at least 2 VMs (as full cluster nodes) on a single host (which is not part of the cluster). For this example, I'll assume all VMs are created with 1 vcpu each. It's fine to use a different number and adjust the commands accordingly.
2. Choose one of the VMs (I'll call it $VM). Ensure it is stopped.
3. On the host, give the VM an additional CPU with:
   virsh setvcpus $VM 2 --maximum --config
   Confirm the change with:
   virsh dumpxml $VM | grep vcpu
   which will show something like:
   <vcpu placement='static' current='1'>2</vcpu>
4. Start $VM and log in. Verify the number of CPUs is 1 (e.g. grep processor /proc/cpuinfo). Edit /etc/sysconfig/pacemaker and set PCMK_debug=crmd to get more throttling messages in the pacemaker detail log.

Repeat this test with the old and new packages:
1.
On the VM, install the packages, reboot, and log in.
2. Verify that the VM has 1 cpu.
3. pcs cluster start --all
4. Watch the pacemaker detail log for messages containing "throttle". Before the fix, you will see messages like:
   crmd: debug: throttle_load_avg: Current load is 0.270000 (full: 0.27 0.15 0.10 3/159 15885)
   crmd: debug: throttle_io_load: Current IO load is 0.000000
   The "IO load" will always be 0. After the fix, you will see:
   crmd: debug: throttle_mode: Current load is 0.000000 across 1 core(s)
   with no IO load messages. The "Current load" is the 1-minute load average as reported by /proc/loadavg, top, etc.
5. Do something to generate CPU load. This can be as simple as "while true; do true; done", using the "stress" command, or anything else. Watch the "Current load" go up in the log. If the load goes up enough, you will see messages like:
   crmd: notice: throttle_check_thresholds: High CPU load detected: 1.180000
   crmd: info: throttle_send_command: New throttle mode: 0100 (was 0000)
   crmd: debug: throttle_update: Host rhel7-1 supports a maximum of 2 jobs and throttle mode 0100. New job limit is 1
6. On the host, hot-plug an additional CPU with:
   virsh setvcpus $VM 2
7. On the VM, verify the number of CPUs again. Continue watching the logs; with the older packages, nothing will change. With the new packages, the log message will show the correct number of cores, and the throttle mode should change back.

It would be worthwhile to check the new packages on all supported architectures just to verify that the load and number of cores are reported correctly (even after hot-plugging, if supported).

Docs: not sure if this is worth a release note or not.
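The fix counts online cores directly from /proc/stat, which has one "cpuN" line per online core (plus an aggregate "cpu" line). A minimal sketch of that idea; this is an illustration, not Pacemaker's actual code:

```python
import re

def count_cores(stat_text: str) -> int:
    """Count per-CPU lines ("cpu0", "cpu1", ...) in /proc/stat content.

    The first "cpu" line is the system-wide aggregate and is skipped;
    each "cpuN" line corresponds to one online core, so a hot-plugged
    core shows up as soon as the kernel brings it online.
    """
    return len(re.findall(r"^cpu\d+ ", stat_text, flags=re.MULTILINE))

# Example /proc/stat excerpt with two online cores:
sample = """cpu  2255 34 2290 22625563 6290 127 456
cpu0 1132 34 1441 11311718 3675 127 438
cpu1 1123 0 849 11313845 2614 0 18
intr 114930548 113199788 3 0 5 263 0 4
"""
print(count_cores(sample))  # 2
```

Since /proc/stat was already being read at every throttle check for the I/O load, re-counting cores there adds no extra file access.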
I have verified that hot-added cpus are correctly detected by pacemaker-1.1.16-9

----

[root@big-02 ~]# virsh setvcpus virt-135.cluster-qe.lab.eng.brq.redhat.com 2 --maximum --config
[root@big-02 ~]# virsh dumpxml virt-135.cluster-qe.lab.eng.brq.redhat.com | grep \<vcpu
  <vcpu placement='static' current='1'>2</vcpu>
[root@big-02 ~]# virsh start virt-135.cluster-qe.lab.eng.brq.redhat.com
Domain virt-135.cluster-qe.lab.eng.brq.redhat.com started
...
[root@virt-135 ~]# grep 'core id' /proc/cpuinfo
core id : 0
[root@virt-135 ~]# pcs cluster stop --all
virt-135: Stopping Cluster (pacemaker)...
virt-134: Stopping Cluster (pacemaker)...
virt-136: Stopping Cluster (pacemaker)...
virt-135: Stopping Cluster (corosync)...
virt-134: Stopping Cluster (corosync)...
virt-136: Stopping Cluster (corosync)...
[root@virt-135 ~]# ( while /bin/true; do yes > /dev/null; done ) &

before the fix (pacemaker-1.1.16-1.el7.x86_64)
==============================================
[root@virt-135 ~]# pcs cluster start --all
virt-136: Starting Cluster...
virt-134: Starting Cluster...
virt-135: Starting Cluster...

/var/log/cluster/corosync.log:
...
May 24 16:10:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_cib_load: cib load: 0.000333 (1 ticks in 30s)
May 24 16:10:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_load_avg: Current load is 0.050000 (full: 0.05 0.03 0.05 2/166 8967)
May 24 16:10:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_io_load: Current IO load is 0.000000

# insert new cpu
[root@big-01 ~]# virsh setvcpus virt-135.cluster-qe.lab.eng.brq.redhat.com 2

[root@virt-135 ~]# grep 'core id' /proc/cpuinfo
core id : 0
core id : 0

May 24 16:14:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_io_load: Current IO load is 0.000000
May 24 16:14:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: info: throttle_send_command: New throttle mode: 0001 (was 0000)
May 24 16:14:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_update: Host virt-135 supports a maximum of 2 jobs and throttle mode 0001. New job limit is 1
May 24 16:14:59 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_cib_load: cib load: 0.000000 (0 ticks in 30s)
May 24 16:14:59 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_load_avg: Current load is 0.800000 (full: 0.80 0.32 0.15 4/175 9630)
May 24 16:14:59 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_handle_load: Noticeable CPU load detected: 0.800000
May 24 16:14:59 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_io_load: Current IO load is 0.000000
May 24 16:15:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_cib_load: cib load: 0.000000 (0 ticks in 30s)
May 24 16:15:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_load_avg: Current load is 0.880000 (full: 0.88 0.39 0.18 3/175 9703)
May 24 16:15:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: info: throttle_handle_load: Moderate CPU load detected: 0.880000
May 24 16:15:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_io_load: Current IO load is 0.000000
May 24 16:15:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: info: throttle_send_command: New throttle mode: 0010 (was 0001)
>> May 24 16:15:29 [8485] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_update: Host virt-135 supports a maximum of 2 jobs and throttle mode 0010. New job limit is 1

no cpu addition detected

after the fix (pacemaker-1.1.16-9)
==================================
/var/log/cluster/corosync.log:
...
May 24 14:40:10 [2629] virt-135 crmd: debug: throttle_mode: Current load is 0.170000 across 1 core(s)
May 24 14:40:10 [2629] virt-135 crmd: info: throttle_send_command: New throttle mode: 0000 (was ffffffff)
May 24 14:40:10 [2629] virt-135 crmd: debug: throttle_update: Host virt-135 supports a maximum of 2 jobs and throttle mode 0000. New job limit is 2
May 24 14:40:40 [2629] virt-135 crmd: debug: throttle_cib_load: cib load: 0.000333 (1 ticks in 30s)
May 24 14:40:40 [2629] virt-135 crmd: debug: throttle_mode: Current load is 0.100000 across 1 core(s)
May 24 14:41:10 [2629] virt-135 crmd: debug: throttle_cib_load: cib load: 0.000333 (1 ticks in 30s)
May 24 14:41:10 [2629] virt-135 crmd: debug: throttle_mode: Current load is 0.060000 across 1 core(s)

[root@big-01 ~]# virsh setvcpus virt-135.cluster-qe.lab.eng.brq.redhat.com 2

[root@virt-135 ~]# grep 'core id' /proc/cpuinfo
core id : 0
core id : 0

/var/log/cluster/corosync.log:
...
May 24 14:57:41 [2629] virt-135 crmd: info: throttle_send_command: New throttle mode: 0000 (was 0100)
>> May 24 14:57:41 [2629] virt-135 crmd: debug: throttle_update: Host virt-135 supports a maximum of 2 jobs and throttle mode 0000. New job limit is 2
May 24 14:58:11 [2629] virt-135 crmd: debug: throttle_cib_load: cib load: 0.000000 (0 ticks in 30s)
May 24 14:58:11 [2629] virt-135 crmd: debug: throttle_mode: Current load is 2.420000 across 2 core(s)
...
[root@big-01 ~]# virsh setvcpus virt-135.cluster-qe.lab.eng.brq.redhat.com 1
...
May 24 15:08:41 [2629] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_cib_load: cib load: 0.000000 (0 ticks in 30s)
May 24 15:08:41 [2629] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_mode: Current load is 2.870000 across 1 core(s)
May 24 15:08:41 [2629] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: notice: throttle_check_thresholds: High CPU load detected: 2.870000
May 24 15:08:41 [2629] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: info: throttle_send_command: New throttle mode: 0100 (was 0010)
May 24 15:08:41 [2629] virt-135.cluster-qe.lab.eng.brq.redhat.com crmd: debug: throttle_update: Host virt-135 supports a maximum of 2 jobs and throttle mode 0100. New job limit is 1
...
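When eyeballing long detail logs like the transcript above, it can help to pull out just the load and core-count figures from the throttle_mode messages. A throwaway sketch; the message format is copied from the log lines above, everything else is illustrative:

```python
import re

# Matches the throttle_mode debug message seen in the transcript above.
PATTERN = re.compile(r"throttle_mode: Current load is ([\d.]+) across (\d+) core\(s\)")

def extract_throttle(lines):
    """Yield (load_average, core_count) for each throttle_mode message."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            yield float(m.group(1)), int(m.group(2))

log = [
    "May 24 14:40:40 [2629] virt-135 crmd: debug: throttle_mode: Current load is 0.100000 across 1 core(s)",
    "May 24 14:58:11 [2629] virt-135 crmd: debug: throttle_mode: Current load is 2.420000 across 2 core(s)",
]
print(list(extract_throttle(log)))  # [(0.1, 1), (2.42, 2)]
```

The jump from "1 core(s)" to "2 core(s)" after virsh setvcpus is exactly what the fix makes visible.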
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862
Created attachment 1273519 [details]

Test Time: April 25, 13:20

Description of problem:
Pacemaker does not detect a change in CPU count while it is running.

Version-Release number of selected component (if applicable):
Same issue in my test env and the customer's env.

My test env:
~~~
pacemaker-1.1.15-11.el7_3.2.x86_64
pacemaker-cluster-libs-1.1.15-11.el7_3.2.x86_64
pacemaker-libs-1.1.15-11.el7_3.2.x86_64
pacemaker-cli-1.1.15-11.el7_3.2.x86_64
~~~

Customer's env:
~~~
pacemaker-1.1.13-10.el7.x86_64
pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
~~~

How reproducible:

Steps to Reproduce:
1. Hot-plug CPUs into the test VM; in my test case, I changed the CPU count from 2 to 12. The threshold for the "notice: High CPU load detected: XX" message is calculated as the CPU count * 1.6, so before adding CPUs it should be 3.2, and after the change it should be 19.2. Verify the CPUs were added successfully.
~~~
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(4) Converting physical 4 to logical package 2
Apr 24 13:21:50 pcmk1 kernel: CPU2 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(6) Converting physical 6 to logical package 3
Apr 24 13:21:50 pcmk1 kernel: CPU3 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(8) Converting physical 8 to logical package 4
Apr 24 13:21:50 pcmk1 kernel: CPU4 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(a) Converting physical 10 to logical package 5
Apr 24 13:21:50 pcmk1 kernel: CPU5 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(c) Converting physical 12 to logical package 6
Apr 24 13:21:50 pcmk1 kernel: CPU6 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(e) Converting physical 14 to logical package 7
Apr 24 13:21:50 pcmk1 kernel: CPU7 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(10) Converting physical 16 to logical package 8
Apr 24 13:21:50 pcmk1 kernel: CPU8 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(12) Converting physical 18 to logical package 9
Apr 24 13:21:50 pcmk1 kernel: CPU9 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(14) Converting physical 20 to logical package 10
Apr 24 13:21:50 pcmk1 kernel: CPU10 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: APIC(16) Converting physical 22 to logical package 11
Apr 24 13:21:50 pcmk1 kernel: CPU11 has been hot-added
Apr 24 13:21:50 pcmk1 kernel: smpboot: Booting Node 0 Processor 10 APIC 0x14
Apr 24 13:21:50 pcmk1 kernel: Disabled fast string operations
Apr 24 13:21:50 pcmk1 kernel: mce: CPU supports 0 MCE banks
...
# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 2047 MB
node 0 free: 915 MB
node distances:
node   0
  0:  10
~~~
2. Run something to add workload; in my test case, I ran a simple script:
~~~
# cat /tmp/workload
#!/bin/python
while True:
    pass
~~~
3. You can notice the high CPU load in /var/log/messages, like the following:
~~~
Apr 24 13:29:16 pcmk1 crmd[1840]: notice: High CPU load detected: 7.600000
Apr 24 13:29:45 pcmk1 root: zuodasi
Apr 24 13:29:46 pcmk1 crmd[1840]: notice: High CPU load detected: 7.760000
....
~~~
but this notice should only be printed when the load is higher than 19.2.
4. We can make Pacemaker re-read the CPU count by restarting it; after the restart, it behaves normally.
~~~
...
Apr 24 13:29:16 pcmk1 crmd[1840]: notice: High CPU load detected: 7.600000
Apr 24 13:29:45 pcmk1 root: zuodasi
Apr 24 13:29:46 pcmk1 crmd[1840]: notice: High CPU load detected: 7.760000
Apr 24 13:29:53 pcmk1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Apr 24 13:29:53 pcmk1 pacemakerd[1773]: notice: Caught 'Terminated' signal
Apr 24 13:29:53 pcmk1 pacemakerd[1773]: notice: Shutting down Pacemaker
Apr 24 13:29:53 pcmk1 pacemakerd[1773]: notice: Stopping crmd
...
Apr 24 13:29:59 pcmk1 systemd: Starting Corosync Cluster Engine...
...
Apr 24 13:32:31 pcmk1 crmd[8319]: notice: High CPU load detected: 24.120001
Apr 24 13:33:01 pcmk1 crmd[8319]: notice: High CPU load detected: 29.160000
...
~~~

Actual results:
Pacemaker doesn't know the CPU count has changed.

Expected results:
Pacemaker recalculates the load thresholds after CPUs or other hardware resources are hot-plugged.

Additional info:
Sosreport uploaded, please help to check.

Test Time: April 25, 13:20
Environment: ESXi6, RHEL7.3 (3.10.0-514.el7.x86_64), pacemaker-1.1.15-11.el7_3.2.x86_64
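The reporter's threshold arithmetic can be sketched as follows. The 1.6-per-core factor is taken from this report's own observation, not from Pacemaker's source, so treat it as an assumption:

```python
def high_load_threshold(cores: int, factor: float = 1.6) -> float:
    """Load above which "High CPU load detected" fires, per this
    report's observation: CPU count * factor."""
    return cores * factor

# Before hot-plug (2 cores) and after (12 cores), per the report:
print(round(high_load_threshold(2), 1))   # 3.2
print(round(high_load_threshold(12), 1))  # 19.2

# With a stale cached count of 2 cores, a load of 7.6 is flagged as
# "high" even though it is well under the true 12-core threshold:
assert high_load_threshold(2) < 7.6 < high_load_threshold(12)
```

This is exactly the symptom in step 3 above: "High CPU load detected: 7.600000" on a 12-core VM, because the cached 2-core threshold of 3.2 was still in effect.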