Bug 1662449
Summary: | Bad error handling when running lvm commands in vdsm | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Frank DeLorey <fdelorey> |
Component: | vdsm | Assignee: | Nir Soffer <nsoffer> |
Status: | CLOSED ERRATA | QA Contact: | Liran Rotenberg <lrotenbe> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | 4.2.7 | CC: | fdelorey, gwatson, jcoscia, lsurette, nashok, nsoffer, rbarry, rdlugyhe, srevivo, stirabos, tnisan, vpagar, ycui |
Target Milestone: | ovirt-4.3.1 | ||
Target Release: | 4.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
If a user with an invalid sudo configuration uses sudo to run commands, sudo appends a "last login" message to the command output. When this happens, VDSM fails to run lvm commands. Previously, the VDSM log did not contain helpful information about what caused those failures.
The current release improves error handling in the VDSM code running lvm commands. Now, if VDSM fails, an error message clearly states that there was invalid output from the lvm commands, and shows the output added by sudo. Although this change does not fix the root cause, an invalid sudo configuration, it makes it easier to understand the issue.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2019-05-08 12:36:32 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1664147, 1666215, 1666603 | ||
Bug Blocks: |
Description
Frank DeLorey
2018-12-28 15:44:26 UTC
Created attachment 1517228 [details]
Agent log
Created attachment 1517229 [details]
Broker log
Created attachment 1517230 [details]
vdsm log where issue starts
Created attachment 1517231 [details]
Latest vdsm log
Created attachment 1517245 [details]
rpms from host
ovirt-ha-broker continuously restarts since it's unable to connect the storage server: MainThread::WARNING::2018-12-28 07:02:47,618::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Couldn't connect to VDSM within 60 seconds MainThread::INFO::2018-12-28 07:02:50,009::broker::45::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.2.18 started we have to investigate what can cause it. Frank, can you please attach the output of pvs vgs lvs executed on that host? Frank, can you enable debug logs in vdsm, restart vdsm and post log with the failure in debug mode? To enable debug log, edit /etc/vdsm/logger.conf, and replace the log level to DEBUG for these loggers: - root - storge - virt - vds This will show the actual pvs command that vdsm uses, and the response from lvm, which may explain what is wrong with lvm output. (In reply to Frank DeLorey from comment #0) > 2018-12-28 04:23:37,632-0500 ERROR (vm/de454dd5) > [storage.StorageDomainCache] Error while looking for domain > `d355d3c0-c6eb-4a46-9dfc-c9a902ec3d46` (sdc:174) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 169, in > _findUnfetchedDomain > return mod.findDomain(sdUUID) > File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line > 1637, in findDomain > return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID)) > File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line > 1575, in findDomainPath > vg = lvm.getVG(sdUUID) > File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 935, in > getVG > vg = _lvminfo.getVg(vgName) # returns single VG namedtuple > File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 581, in > getVg > vgs = self._reloadvgs(vgName) > File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 389, in > _reloadvgs > pv_name = fields[pvNameIdx] > IndexError: list index out of range This means that we got unexpected output from pvs, and storage initialization failed, which cause vdsm to remain in "not ready" state. Frank, the attached patch will log the offending line that cause the IndexError. Can you get this patch applied on one of the hosts so we can get more info on this error? Nir, I've provided the customer with a patched lvm.py file. I've tested it here, i.e. restarted vdsm and activated the host. Regards, GFW. The issue is because of custom configuration in the pam.d/auth-ac. With the Nir++ patch https://gerrit.ovirt.org/#/c/96505/, we are getting below traceback. ==== Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 169, in _findUnfetchedDomain return mod.findDomain(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1637, in findDomain return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID)) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1575, in findDomainPath vg = lvm.getVG(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 953, in getVG vg = _lvminfo.getVg(vgName) # returns single VG namedtuple File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 599, in getVg vgs = self._reloadvgs(vgName) File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 406, in _reloadvgs raise InvalidLine("pv_name", line) InvalidLine: Missing field 'pv_name' in line 'Last login: Fri Dec 28 21:41:33 EST 2018' === So we are getting an unexpected value "Last login: Fri Dec 28 21:41:33 EST 2018" which is a "lastlog" output. While checking further I can see that there is a custom configuration in "/etc/pam.d/system-auth". ==== session required pam_lastlog.so showfailed ==== In the latest sudo rpm, it adds new configuration in /etc/pam.d/sudo (that's the reason, the customer began to experience the issue after updating the server) session include system-auth So whenever a sudo command is executed, it also provides "lastlog" in the output since the customer have the above configuration in system-auth. The vdsm get this unexpected value and fails during processing the vgs and pvs output. With the same configuration in my test environment. ==== # su vdsm -s /bin/sh sh-4.2$ /usr/bin/sudo -n /sbin/lvm vgs Last login: Sat Dec 29 13:56:06 IST 2018 ===> last log output VG #PV #LV #SN Attr VSize VFree ec909468-4a08-49fb-885b-f707792ce6f3 1 13 0 wz--n- 48.62g 31.25g rhel_lab-orac-11g-client 1 3 0 wz--n- <59.51g 64.00m ==== My "hsm/init" also fails with the same error after adding the "pam_lastlog" in system-auth with latest sudo package. I have asked the customer to remove the mentioned line from system-auth and restart the services. Thanks, Nir for the quick patch. Thansk nijin! The fact that user configuration can change the output from any command is crazy. There is no way to use sudo in a management program in this way. I think we don't have anything to do here expect improving the log, and maybe file sudo bug. sudo should never change application output For 4.4 we can more the lvm command execution to supervdsm or another process running as root. How to test the patch: echo "session required pam_lastlog.so showfailed" >> /etc/pam.d/system-auth systemctl restart vdsmd After restarting vdsm it will fail with: InvalidOutputLine: Invalid pvs command ouptut line: 'Last login: Sat Dec 29 14:42:39 IST 2018' Remove the added line from /etc/pam.d/system-auth to recover. This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1. Verified on: vdsm-4.30.10-1.el7ev.x86_64 Steps: 1. On the host run: a. echo "session required pam_lastlog.so showfailed" >> /etc/pam.d/system-auth b. systemctl restart vdsmd 2. Check vdsm log Results: VDSM restarted and showed the following error: 2019-03-06 11:22:03,917+0200 ERROR (hsm/init) [storage.HSM] FINISH thread <Thread(hsm/init, started daemon 140708589577984)> failed (concurrent:202) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 407, in storageRefresh lvm.bootstrap(skiplvs=blockSD.SPECIAL_LVS_V4) File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 747, in bootstrap _lvminfo.bootstrap() File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 368, in bootstrap self._reloadpvs() File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 393, in _reloadpvs raise InvalidOutputLine("pvs", line) InvalidOutputLine: Invalid pvs command ouptut line: 'Last login: Wed Mar 6 11:22:03 IST 2019' Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1077 |