Description of problem:

When trying to run a missing executable, vdsm expects to get the following error:

>>> subprocess.check_call(["foobar"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

This error is typically not handled and will fail the current flow, showing a very clear traceback.

However, when vdsm is configured to use cpu_affinity, it runs *all* commands via taskset, which fails with exit code 1:

>>> subprocess.check_call(["taskset", "-c", "1", "foobar"])
taskset: failed to execute foobar: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['taskset', '-c', '1', 'foobar']' returned non-zero exit status 1

In the vdsm log, we see a failure from the taskset command:

Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/fuser /tmp/tmpwTY0g3 (cwd None)
Storage.Misc.excCmd: DEBUG: FAILED: <err> = 'taskset: failed to execute /sbin/fuser: No such file or directory\n'; <rc> = 1

But the vdsm code running the command is not aware that taskset is being used, and treats the error as the error of the actual command. In the case of fuser, exit code 1 means that no process has the file open, so a missing fuser is reported as "file not in use". This may lead to catastrophic results.
When running commands via the shell, a missing executable fails with the special exit code 127:

>>> subprocess.check_call(["sh", "-c", "foobar"])
sh: foobar: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sh', '-c', 'foobar']' returned non-zero exit status 127

Similar tools that run other processes try to simulate the shell behavior when an executable is not found:

>>> subprocess.check_call(["nice", "-n", "5", "foobar"])
nice: foobar: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['nice', '-n', '5', 'foobar']' returned non-zero exit status 127

See http://tldp.org/LDP/abs/html/exitcodes.html for common special exit codes.

We have two issues:

1. execCmd wraps calls using nice, taskset and other wrappers, but it does not check the exit code to detect a failure of the wrapper commands, so it returns invalid output to the caller.
2. taskset does not report a missing executable in a helpful way like other tools (e.g. nice).

Version-Release number of selected component (if applicable):
- 4.16.31 when cpu_affinity is enabled
- 4.17 and later

How reproducible: Always

Steps to Reproduce:
1. Run vdsm tests on a system without "fuser" or "which" commands

Actual results: False results (e.g. fuser output is "")

Expected results: Clear OSError about missing executable

Additional info: If vdsm dependencies are correct, it should not be possible to remove a command vdsm depends on.
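One possible direction for issue 1, as a rough sketch only: check that the wrapped executable exists before handing it to the wrapper, and raise the same OSError that an unwrapped call would raise. The names find_executable and run_wrapped are hypothetical and are not part of vdsm; the real execCmd has a different signature and more responsibilities.

```python
import errno
import os
import subprocess


def find_executable(name, path=None):
    """Return the path of an executable file, or None if it is not found.

    Hypothetical helper, not vdsm code. Names containing a path separator
    are checked as given; bare names are looked up in PATH.
    """
    if os.sep in name:
        candidates = [name]
    else:
        if path is None:
            path = os.environ.get("PATH", os.defpath)
        candidates = [os.path.join(d, name)
                      for d in path.split(os.pathsep) if d]
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def run_wrapped(wrapper, command):
    """Run command under wrapper, failing early if command[0] is missing.

    Raises OSError(ENOENT) before the wrapper is started, so the caller
    never sees the wrapper's exit code 1 for a missing executable.
    """
    if find_executable(command[0]) is None:
        raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), command[0])
    return subprocess.call(list(wrapper) + list(command))
```

With this check, run_wrapped(["taskset", "-c", "1"], ["foobar"]) raises OSError with errno 2 instead of returning 1. The check is racy (the file can disappear between the check and the exec), and under sudo the lookup may use a different PATH, so it narrows the problem rather than closing it.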
The same issue exists when using ionice, sudo, and setsid. This is not related to cpu_affinity and exists in all vdsm versions.

>>> subprocess.check_call(["ionice", "-c", "3", "bleep", "1"])
ionice: failed to execute bleep: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ionice', '-c', '3', 'bleep', '1']' returned non-zero exit status 1

>>> subprocess.check_call(["sudo", "foobar"])
sudo: foobar: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', 'foobar']' returned non-zero exit status 1

>>> subprocess.check_call(["setsid", "-w", "foobar"])
setsid: failed to execute foobar: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['setsid', '-w', 'foobar']' returned non-zero exit status 1
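Since all of these wrappers return 1, the exit code alone cannot tell a wrapper failure from a real result. As an illustration only, a caller could inspect stderr for the message shapes shown in the transcripts above. The function wrapper_failure is hypothetical, assumes stderr is available as text, and relies on untranslated messages, so it is a fragile stopgap and not a proposed fix.

```python
def wrapper_failure(wrappers, rc, err):
    """Return the wrapper that reported a missing executable, or None.

    Hypothetical helper, not vdsm code. `wrappers` is a collection of
    wrapper basenames (e.g. "taskset"), `rc` the exit code and `err`
    the captured stderr as text. Only the two message shapes seen in
    this report are recognized:
        "<wrapper>: failed to execute <cmd>: <reason>"
        "<wrapper>: <cmd>: command not found"
    """
    if rc == 0:
        return None
    for line in err.splitlines():
        name, sep, rest = line.partition(": ")
        if not sep or name not in wrappers:
            continue
        if (rest.startswith("failed to execute ")
                or rest.endswith(": command not found")):
            return name
    return None
```

For the log line from the description, wrapper_failure({"taskset"}, 1, "taskset: failed to execute /sbin/fuser: No such file or directory\n") returns "taskset", while a genuine fuser exit code 1 with empty stderr returns None.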
I sent a pull request for util-linux fixing this issue in taskset and ionice (and all other commands running other commands). Let's wait for a response.

https://github.com/karelzak/util-linux/pull/311
(In reply to Nir Soffer from comment #2)
> I sent a pull request for util-linux fixing this issue in taskset and ionice
> (and all other commands running other commands). Let's wait for a response.
> https://github.com/karelzak/util-linux/pull/311

Very nice!
We currently don't require any specific version of util-linux. Please update this bug if the fix gets in, and we'll add the requirement.
Yaniv - please target this one to the relevant milestone.
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.
Yaniv, what info is needed?
Based on comment #2, you posted https://github.com/karelzak/util-linux/pull/311 to fix the issue. What else is needed?
(In reply to Yaniv Bronhaim from comment #9)
> Based on comment #2 you posted
> https://github.com/karelzak/util-linux/pull/311 to fix the issue. what else
> is needed ?

Yaniv - please discuss with Nir and agree on the status of the bug.
Yaniv, the suggested change was discussed on the mailing list:
https://www.spinics.net/lists/util-linux-ng/msg12763.html

There was agreement about this direction, but completing the work to get it accepted required a lot of time, and I never had the time to work on it. I think we can try to push a smaller change fixing only taskset (the original patch tried to fix this issue in all util-linux executables).
According to the last comment it won't be finished in 4.2, so moving to 4.3
Given the latest comments by the bug author in [1] I think we should close it. [1] https://gerrit.ovirt.org/#/c/106859/