Bug 1286462 - Vdsm daemon failed to start, because incorrect cpu affinity
Summary: Vdsm daemon failed to start, because incorrect cpu affinity
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.17.11
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-3.6.1
Target Release: 4.17.12
Assignee: Francesco Romani
QA Contact: Artyom
URL:
Whiteboard: virt
Depends On:
Blocks: RHEV3.6PPC
Reported: 2015-11-29 18:56 UTC by Artyom
Modified: 2016-02-21 11:06 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-12-16 12:19:31 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-3.6.z+
rule-engine: blocker+
mgoldboi: planning_ack+
michal.skrivanek: devel_ack+
mavital: testing_ack+


Attachments
vdsm log (9.39 MB, text/plain)
2015-11-29 18:56 UTC, Artyom


Links
System            ID       Private  Priority   Status  Summary                                           Last Updated
Red Hat Bugzilla  1279431  0        high       CLOSED  enable vdsm taskset pinning by default            2021-02-22 00:41:40 UTC
oVirt gerrit      49402    0        master     MERGED  lib: daemon: autodetect online cpus for affinity  Never
oVirt gerrit      49460    0        master     MERGED  daemon: revert cpu-affinity enabling by default   Never
oVirt gerrit      49463    0        ovirt-3.6  MERGED  daemon: revert cpu-affinity enabling by default   Never
oVirt gerrit      49612    0        ovirt-3.6  MERGED  lib: daemon: autodetect online cpus for affinity  Never
oVirt gerrit      49613    0        ovirt-3.6  MERGED  daemon: reformat __set_cpu_affinity               Never

Internal Links: 1279431

Description Artyom 2015-11-29 18:56:42 UTC
Created attachment 1100263
vdsm log

Description of problem:
The vdsm daemon fails to start on hosts that have no CPU numbered 1, because the default CPU affinity is invalid there. Traceback:
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 166, in run
    __set_cpu_affinity()
  File "/usr/share/vdsm/vdsm", line 280, in __set_cpu_affinity
    taskset.set(os.getpid(), cpu_set, all_tasks=True)
  File "/usr/lib/python2.7/site-packages/vdsm/taskset.py", line 82, in set
    raise Error(rc, out, err)
Error: Process failed with rc=1 out=["pid 129019's current affinity list: 8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152"] err=["taskset: failed to set pid 129019's affinity: Invalid argument"]
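
The error text already shows the cause: the list of CPUs the process may run on does not contain CPU 1. As a minimal sketch (the helper name parse_affinity_list is made up for illustration and is not part of vdsm or taskset), the mismatch can be checked like this, assuming the output contains only single ids and simple "a-b" ranges:

```python
# Output line copied from the error message in the traceback above.
TASKSET_OUT = ("pid 129019's current affinity list: "
               "8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152")


def parse_affinity_list(line):
    """Return the set of CPU ids listed after the first ':' in the line.

    Hypothetical helper: handles single ids and simple 'a-b' ranges only.
    """
    cpu_list = line.split(":", 1)[1].strip()
    cpus = set()
    for part in cpu_list.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


allowed = parse_affinity_list(TASKSET_OUT)
requested = {1}                 # the default cpu_affinity value
missing = requested - allowed   # CPUs requested but not available
print("requested but not available:", sorted(missing))
```

Any non-empty result here means the requested set cannot be applied as is on this host.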


Version-Release number of selected component (if applicable):
vdsm-4.17.11-0.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Start the vdsm daemon on a host that has no CPU numbered 1:
# cat /proc/cpuinfo 
processor       : 8
cpu             : POWER8E (raw), altivec supported
clock           : 3690.000000MHz
revision        : 2.1 (pvr 004b 0201)

processor       : 16
cpu             : POWER8E (raw), altivec supported
clock           : 3690.000000MHz
revision        : 2.1 (pvr 004b 0201)
....

Actual results:
The vdsm daemon fails to start with the exception above.

Expected results:
vdsm starts successfully.

Additional info:
The problem is the default value in the vdsm config file:
('cpu_affinity', '1',
            'Comma separated whitelist of CPU cores on which VDSM is allowed '
            'to run. The default is "1", meaning VDSM can be scheduled by '
            ' the OS to run on the second core of the system. '
            'Valid examples: "1", "0,1", "0,2,3"')
If I change the default to an empty string ('cpu_affinity', ''), everything works fine.
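
To illustrate how such a value could be interpreted, here is a rough sketch. It assumes, based on the behaviour reported above, that an empty string disables pinning and that any other value is a comma separated list of CPU ids. The function names are hypothetical and this is not the actual vdsm code:

```python
def parse_cpu_affinity(value):
    """Parse a cpu_affinity string into a frozenset of CPU ids.

    Assumption: an empty (or blank) value means "do not pin";
    that case is returned as None.
    """
    if value.strip() == "":
        return None
    return frozenset(int(cpu.strip()) for cpu in value.split(","))


def check_affinity(value, available_cpus):
    """Return (cpu_set, unavailable) for a configured value.

    'unavailable' holds the configured CPUs that are not in
    available_cpus; pinning to them would be rejected.
    """
    cpu_set = parse_cpu_affinity(value)
    if cpu_set is None:
        return None, frozenset()
    return cpu_set, cpu_set - frozenset(available_cpus)


# CPU ids taken from the first entries of the affinity list in the traceback.
host_cpus = {8, 16, 24}
print(check_affinity("1", host_cpus))   # default value: CPU 1 is unavailable
print(check_affinity("", host_cpus))    # empty value: pinning disabled
```

A check of this kind before calling taskset would let the daemon log a clear message instead of failing at startup.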

Comment 1 Gil Klein 2015-11-29 19:21:04 UTC
Seems to be related to the enablement of BZ #1279431

Comment 2 Michal Skrivanek 2015-11-29 19:27:52 UTC
Decreasing Severity, as there is a configuration workaround: pin to a different cpu or disable pinning altogether.

This is not ppc specific; any platform with cpu1 offline would show the same failure. We should have gone with 0.
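
A hardcoded CPU number can be avoided by choosing from the CPUs that are actually online. The sketch below is only an illustration and not the merged patch: it assumes the kernel's /sys/devices/system/cpu/online file (a list such as "0-3,8"), and the "lowest online CPU" policy and all function names are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single id has no '-' part, so the range covers just that id.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def read_online_cpus(path="/sys/devices/system/cpu/online"):
    """Read the set of online CPUs from sysfs (Linux only)."""
    with open(path) as f:
        return parse_cpu_list(f.read())


def pick_default_cpu(online_cpus):
    """Hypothetical policy: pin to the lowest-numbered online CPU."""
    if not online_cpus:
        raise ValueError("no online CPUs reported")
    return min(online_cpus)


# Example input modelled on the affinity list from the traceback.
print(pick_default_cpu(parse_cpu_list("8,16,24,32")))
```

With this approach the chosen CPU exists on any host, whether or not cpu0 or cpu1 is online.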

Comment 3 Francesco Romani 2015-11-30 20:40:23 UTC
patches merged on both master and 3.6 branch -> MODIFIED

Comment 4 Sandro Bonazzola 2015-12-01 15:25:36 UTC
This bug is referenced in the 4.17.12 git log but has no target release set.
Please check.

Comment 5 Artyom 2015-12-07 14:57:14 UTC
Verified on vdsm-4.17.12-0.el7ev.noarch

Comment 6 Sandro Bonazzola 2015-12-16 12:19:31 UTC
According to verification status and target milestone this issue should be fixed in oVirt 3.6.1. Closing current release.

