Bug 1324731

Summary: Missing executables not handled correctly
Product: [oVirt] vdsm
Reporter: Nir Soffer <nsoffer>
Component: Core
Assignee: Marcin Sobczyk <msobczyk>
Status: CLOSED WONTFIX
QA Contact: Petr Kubica <pkubica>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.16.31
CC: bugs, fromani, mperina, nsoffer, pstehlik
Target Milestone: ---
Flags: mperina: ovirt-4.4?, rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-03 15:38:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1496381

Description Nir Soffer 2016-04-07 06:35:43 UTC
Description of problem:

When trying to run a missing executable, vdsm expects to get the following error:

>>> subprocess.check_call(["foobar"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

This error is typically not handled and will fail the current flow, showing
a very clear traceback.

However, when vdsm is configured to use cpu_affinity, it runs *all* commands
via taskset, which fails with exit code 1:

>>> subprocess.check_call(["taskset", "-c", "1", "foobar"])
taskset: failed to execute foobar: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['taskset', '-c', '1', 'foobar']' returned non-zero exit status 1

In the vdsm log, we see a failure from the taskset command:

Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/fuser /tmp/tmpwTY0g3 (cwd None)
Storage.Misc.excCmd: DEBUG: FAILED: <err> = 'taskset: failed to execute /sbin/fuser: No such file or directory\n'; <rc> = 1

But the vdsm code running the command is not aware that taskset is being
used, and treats the error as the error of the actual command. In the case
of fuser, exit code 1 means that no process has the file open.
This may lead to catastrophic results.

When running a missing command via the shell, the shell fails with the
special exit code 127:

>>> subprocess.check_call(["sh", "-c", "foobar"])
sh: foobar: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sh', '-c', 'foobar']' returned non-zero exit status 127

Similar tools that run other processes try to simulate the shell
behavior when an executable is not found:

>>> subprocess.check_call(["nice", "-n", "5", "foobar"])
nice: foobar: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['nice', '-n', '5', 'foobar']' returned non-zero exit status 127

See http://tldp.org/LDP/abs/html/exitcodes.html for common special exit codes.
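
The exit code convention above suggests one way a caller could tell "the
command could not be started" apart from the command's own result. The
following is a minimal Python sketch, not vdsm's actual execCmd: run_checked
and WrapperError are hypothetical names, and the sketch assumes the wrapper
follows the shell convention (126: found but cannot be executed, 127: not
found), which according to this report holds for nice but not for taskset.

```python
import subprocess

# Exit codes used by the shell (and by tools such as nice) when the
# requested command cannot be run at all.
EXIT_CANNOT_EXECUTE = 126
EXIT_NOT_FOUND = 127


class WrapperError(Exception):
    """Hypothetical error: the command itself was never executed."""


def run_checked(cmd):
    """Run cmd and return (rc, out, err).

    Raises WrapperError when the exit code says the command could not be
    started, instead of handing that code to the caller as if it were
    the command's own result.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode in (EXIT_CANNOT_EXECUTE, EXIT_NOT_FOUND):
        raise WrapperError("cannot execute %r: rc=%d err=%r"
                           % (cmd, proc.returncode, err))
    return proc.returncode, out, err
```

A command may legitimately exit with 126 or 127 itself, so this check is a
heuristic and only helps with wrappers that follow the convention.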

We have two issues:

1. execCmd wraps calls using nice, taskset and other wrappers, but it
   does not check the exit code to detect failure of the wrapper
   commands, so it returns invalid output to the caller.

2. taskset does not report a missing executable in a helpful way like other
   tools (e.g. nice).
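
One possible way to address the first issue without depending on the wrapper's
exit code is to resolve the executable before wrapping the command. The sketch
below is an illustration only, not vdsm code: wrap_command and its parameters
are hypothetical, and it relies on shutil.which, which requires Python 3.3 or
later (vdsm at the time of this report ran on Python 2.7 and would need an
equivalent lookup).

```python
import errno
import os
import shutil


def wrap_command(cmd, cpu_list=None, nice=None):
    """Return cmd wrapped with nice and taskset (hypothetical helper).

    The executable is looked up first, so a missing command raises the
    same OSError (ENOENT) the caller would get without any wrapper.
    """
    executable = shutil.which(cmd[0])
    if executable is None:
        raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), cmd[0])
    wrapped = [executable] + list(cmd[1:])
    if nice is not None:
        wrapped = ["nice", "-n", str(nice)] + wrapped
    if cpu_list is not None:
        wrapped = ["taskset", "--cpu-list", cpu_list] + wrapped
    return wrapped
```

The check is not atomic: the executable can still disappear between the lookup
and the actual exec, so this narrows the problem rather than eliminating it.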

Version-Release number of selected component (if applicable):
- 4.16.31 when cpu_affinity is enabled
- 4.17 and later

How reproducible:
Always

Steps to Reproduce:
1. Run vdsm tests on a system without "fuser" or "which" commands

Actual results:
False results (e.g. fuser output is "")

Expected results:
Clear OSError about missing executable

Additional info:

If vdsm dependencies are correct, it should not be possible to remove a command
vdsm depends on.

Comment 1 Nir Soffer 2016-04-07 07:03:28 UTC
The same issue exists when using ionice, sudo, and setsid - this is not related to
cpu_affinity and exists in all vdsm versions.

>>> subprocess.check_call(["ionice", "-c", "3", "bleep", "1"])
ionice: failed to execute bleep: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ionice', '-c', '3', 'bleep', '1']' returned non-zero exit status 1

>>> subprocess.check_call(["sudo", "foobar"])
sudo: foobar: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', 'foobar']' returned non-zero exit status 1

>>> subprocess.check_call(["setsid", "-w", "foobar"])
setsid: failed to execute foobar: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['setsid', '-w', 'foobar']' returned non-zero exit status 1
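
Since these wrappers all exit with status 1, the only remaining hint in the
output above is the stderr message. As a rough heuristic only (the message
formats are copied from the output shown in this report, may differ between
tool versions and locales, and missing_executable is a hypothetical name), a
caller could look for them like this:

```python
import re

# Message formats copied from the output shown in this report; other
# versions of these tools may word them differently.
_WRAPPER_FAILURE = re.compile(
    r"^(?P<wrapper>[\w-]+): "
    r"(?:failed to execute (?P<cmd1>\S+): No such file or directory"
    r"|(?P<cmd2>\S+): command not found)$",
    re.MULTILINE)


def missing_executable(err):
    """Return the name of the missing executable, or None.

    err is the stderr of the wrapped command, as text.
    """
    match = _WRAPPER_FAILURE.search(err)
    if match is None:
        return None
    return match.group("cmd1") or match.group("cmd2")
```

Matching on stderr text is fragile, so this is at most a stopgap compared to
wrappers reporting a distinct exit code.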

Comment 2 Nir Soffer 2016-04-07 10:39:43 UTC
I sent a pull request for util-linux fixing this issue in taskset and ionice
(and all other commands running other commands). Let's wait for a response.
https://github.com/karelzak/util-linux/pull/311

Comment 3 Francesco Romani 2016-04-07 10:56:51 UTC
(In reply to Nir Soffer from comment #2)
> I sent a pull request for util-linux fixing this issue in taskset and ionice
> (and all other commands running other commands). Let's wait for a response.
> https://github.com/karelzak/util-linux/pull/311

Very nice!

Comment 4 Yaniv Bronhaim 2016-04-10 01:54:58 UTC
We currently don't require any specific version of util-linux. Please update if it gets in and we'll add this requirement.

Comment 5 Oved Ourfali 2016-04-11 12:25:26 UTC
Yaniv - please target this one to the relevant milestone.

Comment 6 Red Hat Bugzilla Rules Engine 2016-04-12 13:33:34 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 7 Red Hat Bugzilla Rules Engine 2016-04-12 13:43:52 UTC
This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.

Comment 8 Nir Soffer 2017-03-06 22:16:05 UTC
Yaniv, what info is needed?

Comment 9 Yaniv Bronhaim 2017-03-07 11:17:10 UTC
Based on comment #2 you posted https://github.com/karelzak/util-linux/pull/311 to fix the issue. What else is needed?

Comment 10 Yaniv Kaul 2017-09-04 14:13:10 UTC
(In reply to Yaniv Bronhaim from comment #9)
> Based on comment #2 you posted
> https://github.com/karelzak/util-linux/pull/311 to fix the issue. What else
> is needed?

Yaniv - please discuss with Nir and agree on the status of the bug

Comment 11 Nir Soffer 2017-09-26 10:56:06 UTC
Yaniv, the suggested change was discussed on the mailing list:
https://www.spinics.net/lists/util-linux-ng/msg12763.html
and there was agreement about this direction, but completing the work to get this
accepted required a lot of time and I never had the time to work on it.

I think we can try to push a smaller change fixing only taskset (the original patch
tried to fix this issue in all util-linux executables).

Comment 12 Martin Perina 2017-11-21 13:44:13 UTC
According to the last comment it won't be finished in 4.2, so moving to 4.3

Comment 13 Marcin Sobczyk 2020-03-03 15:38:30 UTC
Given the latest comments by the bug author in [1] I think we should close it.

[1] https://gerrit.ovirt.org/#/c/106859/