Bug 1286462 - Vdsm daemon failed to start, because incorrect cpu affinity
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: General
Version: 4.17.11
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-3.6.1
Target Release: 4.17.12
Assigned To: Francesco Romani
QA Contact: Artyom
Whiteboard: virt
Keywords: Regression, Triaged
Depends On:
Blocks: RHEV3.6PPC
Reported: 2015-11-29 13:56 EST by Artyom
Modified: 2016-02-21 06:06 EST
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-16 07:19:31 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.z+
rule-engine: blocker+
mgoldboi: planning_ack+
michal.skrivanek: devel_ack+
mavital: testing_ack+


Attachments
vdsm log (9.39 MB, text/plain)
2015-11-29 13:56 EST, Artyom


External Trackers
Tracker       ID     Priority   Status  Summary                                           Last Updated
oVirt gerrit  49402  master     MERGED  lib: daemon: autodetect online cpus for affinity  Never
oVirt gerrit  49460  master     MERGED  daemon: revert cpu-affinity enabling by default   Never
oVirt gerrit  49463  ovirt-3.6  MERGED  daemon: revert cpu-affinity enabling by default   Never
oVirt gerrit  49612  ovirt-3.6  MERGED  lib: daemon: autodetect online cpus for affinity  Never
oVirt gerrit  49613  ovirt-3.6  MERGED  daemon: reformat __set_cpu_affinity               Never

Description Artyom 2015-11-29 13:56:42 EST
Created attachment 1100263
vdsm log

Description of problem:
The vdsm daemon fails to start on hosts that have no CPU numbered 1, because the configured CPU affinity is invalid on such hosts. It fails with this traceback:
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 166, in run
    __set_cpu_affinity()
  File "/usr/share/vdsm/vdsm", line 280, in __set_cpu_affinity
    taskset.set(os.getpid(), cpu_set, all_tasks=True)
  File "/usr/lib/python2.7/site-packages/vdsm/taskset.py", line 82, in set
    raise Error(rc, out, err)
Error: Process failed with rc=1 out=["pid 129019's current affinity list: 8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152"] err=["taskset: failed to set pid 129019's affinity: Invalid argument"]


Version-Release number of selected component (if applicable):
vdsm-4.17.11-0.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Start the vdsm daemon on a host that has no CPU numbered 1, for example:
# cat /proc/cpuinfo 
processor       : 8
cpu             : POWER8E (raw), altivec supported
clock           : 3690.000000MHz
revision        : 2.1 (pvr 004b 0201)

processor       : 16
cpu             : POWER8E (raw), altivec supported
clock           : 3690.000000MHz
revision        : 2.1 (pvr 004b 0201)
....
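
To check whether a host is affected, the processor numbers can be read out of /proc/cpuinfo. The following is a minimal Python 3 sketch, not part of vdsm; the helper name cpu_ids_from_cpuinfo and the SAMPLE text (copied from the output above) are only for illustration.

```python
import re


def cpu_ids_from_cpuinfo(text):
    """Return the sorted CPU numbers found in /proc/cpuinfo-style text."""
    # Each CPU block starts with a line such as "processor       : 8".
    pattern = re.compile(r"^processor[ \t]*:[ \t]*(\d+)[ \t]*$", re.MULTILINE)
    return sorted(int(match.group(1)) for match in pattern.finditer(text))


# Excerpt of the /proc/cpuinfo output shown in this report.
SAMPLE = """\
processor       : 8
cpu             : POWER8E (raw), altivec supported
clock           : 3690.000000MHz
revision        : 2.1 (pvr 004b 0201)

processor       : 16
cpu             : POWER8E (raw), altivec supported
clock           : 3690.000000MHz
revision        : 2.1 (pvr 004b 0201)
"""

ids = cpu_ids_from_cpuinfo(SAMPLE)
print(ids)       # [8, 16]
print(1 in ids)  # False: a default affinity of "1" cannot be applied here
```

On a real host the text would come from open("/proc/cpuinfo").read() instead of SAMPLE.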

Actual results:
The vdsm daemon fails to start with the exception above.

Expected results:
The vdsm daemon starts successfully.

Additional info:
The problem is the default value in the vdsm config file:
('cpu_affinity', '1',
            'Comma separated whitelist of CPU cores on which VDSM is allowed '
            'to run. The default is "1", meaning VDSM can be scheduled by '
            ' the OS to run on the second core of the system. '
            'Valid examples: "1", "0,1", "0,2,3"')
If I change the default value from '1' to '' (an empty string), everything works fine.
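
One way to avoid the hard failure is to check the configured value against the CPUs that are actually online before applying it. The following is a minimal Python 3 sketch under that assumption. It is not the vdsm code and not the merged fix; resolve_affinity is a hypothetical helper, and treating an empty string as "pinning disabled" mirrors the workaround described above.

```python
def resolve_affinity(configured, online):
    """Return the set of CPUs to pin to, or None when pinning is disabled.

    configured: comma separated CPU numbers such as "1" or "0,2,3";
                an empty string disables pinning.
    online:     iterable of CPU numbers usable on this host.
    """
    if configured.strip() == "":
        return None
    wanted = {int(part) for part in configured.split(",")}
    missing = wanted - set(online)
    if missing:
        # Fail with a readable message instead of taskset's "Invalid argument".
        raise ValueError(
            "cpu_affinity names CPUs that are not online: %s" % sorted(missing)
        )
    return wanted


online = [8, 16, 24]  # first CPUs of the host from this report
print(resolve_affinity("", online))      # None
print(resolve_affinity("8,16", online))  # {8, 16}
try:
    resolve_affinity("1", online)
except ValueError as exc:
    print(exc)  # cpu_affinity names CPUs that are not online: [1]
```

Whether to raise, log a warning, or silently skip pinning in the mismatch case is a design choice; raising is used here only to keep the sketch short.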
Comment 1 Gil Klein 2015-11-29 14:21:04 EST
Seems to be related to the enablement of BZ #1279431
Comment 2 Michal Skrivanek 2015-11-29 14:27:52 EST
Decreasing Severity as there is a configuration workaround to pin to a different cpu or disable it altogether

This is not ppc specific; any platform with cpu1 offline would show the same behavior. We should have gone with 0.
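
Because no fixed CPU number is guaranteed to be online, the CPU could instead be picked from the set the kernel reports. The following is a minimal Python 3 sketch of that idea. It assumes the Linux file /sys/devices/system/cpu/online, which holds a list such as "0-3" or "8,16,24". The helpers parse_cpu_list and pick_cpu are hypothetical, and the policy of preferring the second online CPU is an assumption taken from the "second core" wording of the config comment, not from the merged patches.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        # A chunk without "-" is a single CPU, so last falls back to first.
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def pick_cpu(online):
    """Pick the second online CPU if there is one, otherwise the only one."""
    if not online:
        raise ValueError("no online CPUs reported")
    return online[1] if len(online) > 1 else online[0]


print(pick_cpu(parse_cpu_list("0-3")))      # 1
print(pick_cpu(parse_cpu_list("8,16,24")))  # 16
print(pick_cpu(parse_cpu_list("0")))        # 0
```

On a real host the input would be read from /sys/devices/system/cpu/online, and the result could be applied with os.sched_setaffinity(0, {cpu}) on Python 3 under Linux.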
Comment 3 Francesco Romani 2015-11-30 15:40:23 EST
patches merged on both master and 3.6 branch -> MODIFIED
Comment 4 Sandro Bonazzola 2015-12-01 10:25:36 EST
This bug is referenced in 4.17.12 git log and has target release unset.
Please check
Comment 5 Artyom 2015-12-07 09:57:14 EST
Verified on vdsm-4.17.12-0.el7ev.noarch
Comment 6 Sandro Bonazzola 2015-12-16 07:19:31 EST
According to verification status and target milestone this issue should be fixed in oVirt 3.6.1. Closing current release.
