Bug 1570562
Summary: vdsm is dead after upgrade to vdsm-4.20.26-1.el7ev.x86_64

| Field | Value |
|---|---|
| Product | [oVirt] vdsm |
| Component | Core |
| Version | 4.20.19 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Keywords | Regression |
| Reporter | Michael Burman <mburman> |
| Assignee | Martin Perina <mperina> |
| QA Contact | Michael Burman <mburman> |
| CC | bugs, dfediuck, mburman, mperina, pkliczew |
| Flags | rule-engine: ovirt-4.2+, rule-engine: blocker+ |
| Target Milestone | ovirt-4.2.5 |
| Target Release | --- |
| Fixed In Version | vdsm-4.20.35 |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| oVirt Team | Infra |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Cloudforms Team | --- |
| Category | --- |
| Bug Depends On | 1597179 |
| Last Closed | 2018-07-31 15:31:33 UTC |

Description (Michael Burman, 2018-04-23 09:02:06 UTC)

I attempted to reproduce it: I upgraded from 4.20.23-1 to 4.20.26-4.git6b485e4 and noticed a similar error during stopping:

Apr 25 06:13:21 localhost systemd: Stopped Virtual Desktop Server Manager.
Apr 25 06:13:21 localhost systemd: Stopping Auxiliary vdsm service for running helper functions as root...
Apr 25 06:13:21 localhost daemonAdapter: Traceback (most recent call last):
Apr 25 06:13:21 localhost daemonAdapter: File "/usr/lib64/python2.7/multiprocessing/util.py", line 268, in _run_finalizers
Apr 25 06:13:21 localhost daemonAdapter: finalizer()
Apr 25 06:13:21 localhost daemonAdapter: File "/usr/lib64/python2.7/multiprocessing/util.py", line 201, in __call__
Apr 25 06:13:21 localhost daemonAdapter: res = self._callback(*self._args, **self._kwargs)
Apr 25 06:13:21 localhost daemonAdapter: OSError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'

but later I see:

Apr 25 06:13:23 localhost vdsmd_init_common.sh: vdsm: Running mkdirs
Apr 25 06:13:23 localhost vdsmd_init_common.sh: vdsm: Running configure_coredump
Apr 25 06:13:23 localhost vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs
Apr 25 06:13:23 localhost vdsmd_init_common.sh: vdsm: Running wait_for_network
Apr 25 06:13:23 localhost vdsmd_init_common.sh: vdsm: Running run_init_hooks
Apr 25 06:13:23 localhost vdsmd_init_common.sh: vdsm: Running check_is_configured
Apr 25 06:13:24 localhost vdsmd_init_common.sh: abrt is already configured for vdsm
Apr 25 06:13:24 localhost vdsmd_init_common.sh: lvm is configured for vdsm
Apr 25 06:13:24 localhost vdsmd_init_common.sh: libvirt is already configured for vdsm
Apr 25 06:13:24 localhost vdsmd_init_common.sh: Current revision of multipath.conf detected, preserving
Apr 25 06:13:24 localhost vdsmd_init_common.sh: vdsm: Running validate_configuration
Apr 25 06:13:24 localhost vdsmd_init_common.sh: SUCCESS: ssl configured to true. No conflicts
Apr 25 06:13:24 localhost vdsmd_init_common.sh: vdsm: Running prepare_transient_repository
Apr 25 06:13:25 localhost vdsmd_init_common.sh: vdsm: Running syslog_available
Apr 25 06:13:25 localhost vdsmd_init_common.sh: vdsm: Running nwfilter
Apr 25 06:13:25 localhost vdsmd_init_common.sh: vdsm: Running dummybr
Apr 25 06:13:25 localhost vdsmd_init_common.sh: vdsm: Running tune_system
Apr 25 06:13:25 localhost vdsmd_init_common.sh: vdsm: Running test_space
Apr 25 06:13:25 localhost vdsmd_init_common.sh: vdsm: Running test_lo
Apr 25 06:13:25 localhost systemd: Started Virtual Desktop Server Manager.
Apr 25 06:13:25 localhost systemd: Started MOM instance configured for VDSM purposes.
Apr 25 06:13:25 localhost systemd: Starting MOM instance configured for VDSM purposes...

When I activated this host it was marked as 'UP'. Can you please retest it on a different host and use 4.20.26-4 as the final vdsm version?
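
For context on the traceback above: the svdsm.sock error comes from a multiprocessing finalizer running at interpreter shutdown. The snippet below is a minimal, self-contained sketch (not vdsm's code; the socket path and helper names are invented) showing why a finalizer that unlinks a unix-socket path fails with OSError [Errno 2] when another cleanup path has already removed the file, and how a guarded cleanup tolerates that.

```python
# Minimal sketch, not vdsm's code: SOCK_PATH and the cleanup helpers are
# illustrative only.
import errno
import os
import tempfile
from multiprocessing import util

SOCK_PATH = os.path.join(tempfile.gettempdir(), "svdsm-example.sock")

def naive_cleanup(path):
    os.unlink(path)          # raises OSError(ENOENT) if the file is already gone

def guarded_cleanup(path):
    try:
        os.unlink(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise            # ignore only "already removed"

open(SOCK_PATH, "w").close() # stand-in for the listener's socket file
os.unlink(SOCK_PATH)         # simulate another shutdown path winning the race

# Registered without an owning object, so it runs from _run_finalizers() at
# interpreter exit -- the same frame seen in the traceback above.
util.Finalize(None, guarded_cleanup, args=(SOCK_PATH,), exitpriority=0)
# Swapping in naive_cleanup would reproduce the logged OSError traceback.
```
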
Additional observations: in your case I see that vdsm was started, which triggered a supervdsm start. Supervdsm started, but for some reason it was restarted and vdsm ended up being stopped.

I see that you are using CentOS 7.5 whereas I used 7.4. Based on the logs I do not see any issue with our code. Let's focus on OS differences.

(In reply to Piotr Kliczewski from comment #1)
> Can you please retest it on a different host and use 4.20.26-4 as the final vdsm version?

Reproduced on 4 different hosts.

(In reply to Piotr Kliczewski from comment #2)
> I see that you are using CentOS 7.5 whereas I used 7.4. Based on the logs I do not see any issue with our code. Let's focus on OS differences.

I use RHEL 7.5 with the latest kernel, 3.10.0-862.el7.x86_64.

I tested the vdsm upgrade from 4.20.23-1 to 4.20.26-4 four times by running:
- install vdsm
- add the host
- set the host to Maintenance
- enable the repo with the newer version
- run yum update
- activate the host
- set the host to Maintenance
- remove the host
- remove vdsm from the host
- disable the newer repo

I used a RHEL 7.5 VM (fresh install) with the 3.10.0-862.el7.x86_64 kernel. All four upgrades were successful. You tested an upgrade to 4.20.26-1 and I tested to 4.20.26-4. Please check whether you are able to reproduce with the newer vdsm version.
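
For reference, the host-side portion of this cycle is roughly the following (a hedged sketch: the add/remove host and Maintenance/Activate steps are driven from the engine, and the repo id below is a placeholder, not a real repo name from this bug):

```bash
# On the host, after putting it into Maintenance from the engine:
yum-config-manager --enable newer-vdsm-repo   # placeholder repo id
yum update -y 'vdsm*'                         # the failing scenario also pulls libvirt in the same transaction
systemctl status vdsmd supervdsmd --no-pager  # both units should be active after the upgrade
# Then activate the host from the engine and confirm it goes to 'UP'.
```
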
(In reply to Piotr Kliczewski from comment #5)
> Please check whether you are able to reproduce with the newer vdsm version.

Hi, what is this version? QE has only 4.20.26-1 available; is it a master build? I can test with the next d/s build we get.

(In reply to Michael Burman from comment #6)
> What is this version? QE has only 4.20.26-1 available; is it a master build?

It is the 4.2 snapshot repo [1]. From my test env I have no access to the repo you are using. Please use a newer vdsm build and let me know the result.

[1] http://resources.ovirt.org/pub/ovirt-4.2-snapshot/rpm/el7/noarch/

(In reply to Piotr Kliczewski from comment #7)
> It is the 4.2 snapshot repo [1]. Please use a newer vdsm build and let me know the result.

Piotr, this is an upstream vdsm; I don't think this is a good test. I can try it, but testing an upgrade from d/s vdsm to u/s vdsm is not a good test.

(In reply to Michael Burman from comment #8)
> Piotr, this is an upstream vdsm; I don't think this is a good test.

I get dependency issues with the gluster packages: master vdsm requires newer gluster versions.
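
If someone does want to repeat the upstream-snapshot check, a throwaway repo definition along these lines would make that build visible to yum (a sketch only: the repo id, name, and gpgcheck=0 are assumptions for a disposable test host; the baseurl is derived from the URL quoted above):

```bash
cat > /etc/yum.repos.d/ovirt-4.2-snapshot.repo <<'EOF'
[ovirt-4.2-snapshot]
name=oVirt 4.2 nightly snapshot
baseurl=http://resources.ovirt.org/pub/ovirt-4.2-snapshot/rpm/el7/
enabled=1
gpgcheck=0
EOF
yum --showduplicates list vdsm   # check which vdsm builds (and their gluster deps) are now resolvable
```
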
(In reply to Michael Burman from comment #8)
> Piotr, this is an upstream vdsm; I don't think this is a good test. I can try it, but testing an upgrade from d/s vdsm to u/s vdsm is not a good test.

Michael, you should receive vdsm-4.20.27-1.el7ev in today's 4.2.3 compose.

(In reply to Martin Perina from comment #10)
> Michael, you should receive vdsm-4.20.27-1.el7ev in today's 4.2.3 compose.

Great, I will test it then with the new vdsm build for QE. Thanks, Martin.

Piotr, please note that the latest update we had, 4.20.26-1 (4.2.3-2), also included libvirt packages, which may be related to this issue.

The bug is easily reproduced when updating vdsm and libvirt at once, which was the case for QE on the latest build. I just did a yum history undo of the latest update (vdsm + libvirt), updated again, and it happened.

libvirt-daemon-3.9.0-14.el7_5.3.x86_64
libvirt-client-3.9.0-14.el7_5.3.x86_64

(In reply to Michael Burman from comment #13)
> The bug is easily reproduced when updating vdsm and libvirt at once, which was the case for QE on the latest build.
> I just did a yum history undo of the latest update (vdsm + libvirt), updated again, and it happened.

What about if you update only vdsm?

> libvirt-daemon-3.9.0-14.el7_5.3.x86_64
> libvirt-client-3.9.0-14.el7_5.3.x86_64

And what about if you update only libvirt?

(In reply to Michael Burman from comment #13)
> The bug is easily reproduced when updating vdsm and libvirt at once, which was the case for QE on the latest build.

Michael, when you mention libvirt, are you sure that you performed the update while the host was in Maintenance?

Updating only vdsm or only libvirt seems to be OK. As for Maintenance, it doesn't matter: in both cases vdsm is dead. I usually do the update while the host is up, but I tested with Maintenance as well; vdsm should stay alive in both cases.
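
The yum-history reproduction described a few comments up can be captured as a short sequence (hedged sketch; the transaction id 42 is an example and must be looked up on the host first):

```bash
yum history list vdsm                 # find the transaction that updated vdsm + libvirt together, e.g. id 42
yum -y history undo 42                # roll back to the pre-upgrade packages
yum -y history redo 42                # re-apply vdsm and libvirt in one shot
systemctl is-active vdsmd supervdsmd  # expected "active"; with this bug, vdsmd ends up dead
```
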
Would it be possible to test [1]? It is a fix that was created for a different issue, but it could potentially solve this one as well.

[1] https://gerrit.ovirt.org/#/c/89446/

(In reply to Michael Burman from comment #17)
> Updating only vdsm or only libvirt seems to be OK. As for Maintenance, it doesn't matter: in both cases vdsm is dead. I usually do the update while the host is up, but I tested with Maintenance as well; vdsm should stay alive in both cases.

Well, from the customer point of view it does matter, because host upgrades performed while the host is not in Maintenance are not supported [1]. But yes, if this is reproducible while the host is in Maintenance, then we need to resolve it.

[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/upgrade_guide/manually_updating_virtualization_hosts

(In reply to Piotr Kliczewski from comment #18)
> Would it be possible to test [1]? It is a fix that was created for a different issue, but it could potentially solve this one as well.

- Piotr, I need a d/s RPM to test this potential fix.
- Martin, yes, I know and understand that from the customer point of view it does matter and is not supported, but vdsm should be kept alive even when performing an update while the host is UP. As part of our new effort in QE to test manual tier 4, such tests are important, and I always prefer to push the system to its edges rather than only to what is supported for customers. I personally have multiple environments and I always perform updates in three ways:
  1) while the host is in Maintenance
  2) while the host is UP (not on the SPM host)
  3) while the host is UP and has at least one VM running (not on the SPM host)
  This way I usually see bugs (or potential bugs) others don't see. Anyhow, this specific bug indeed reproduced on a host in Maintenance.

Please retest with RHV 4.2.3-4, which contains vdsm-4.20.27-1.el7ev.x86_64.rpm.

Same result. Upgraded from vdsm-4.20.25-1.el7ev.x86_64 -> vdsm-4.20.27-1.el7ev.x86_64; vdsm is dead with the same error:

Apr 28 12:43:50 red-vds4.qa.lab.tlv.redhat.com daemonAdapter[14863]: OSError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'
Apr 28 12:43:50 red-vds4.qa.lab.tlv.redhat.com systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.

This reproduced when updating vdsm and libvirt in one shot:

libvirt-3.9.0-14.el7_5.2 -> 3.9.0-14.el7_5.3
vdsm-4.20.25-1.el7ev.x86_64 -> vdsm-4.20.27-1.el7ev.x86_64

The bug also reproduced when doing a yum history undo of both vdsm and libvirt; vdsm is dead with the same error.

Created attachment 1428017 [details]
failedQA logs

After offline discussion, removing the blocker flag and retargeting to 4.2.4.

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

This seems to be a systemd bug and I would like to open one. Please provide systemd logs from when the issue occurs so we can give the systemd devs enough information to analyze it.
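
A sketch of how the requested systemd-side evidence could be collected on the affected host (commands assumed by the editor, not taken from the bug; the time window matches the Apr 28 log lines above and should be adjusted to the actual failure):

```bash
systemctl status vdsmd supervdsmd libvirtd --no-pager -l
journalctl -u vdsmd -u supervdsmd -u libvirtd \
    --since "2018-04-28 12:40:00" --until "2018-04-28 12:50:00" > vdsm-upgrade-units.log
journalctl -b -o short-precise > full-boot-journal.log   # full boot journal, if size permits
```
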
Verified on vdsm-4.20.35-1.el7ev.x86_64:

Upgraded libvirt-3.9.0-14.el7_5.2 -> libvirt-3.9.0-14.el7_5.6.x86_64
Upgraded vdsm-4.20.25-1.el7ev -> vdsm-4.20.35-1.el7ev.x86_64

vdsm is alive after the upgrade.

This bugzilla is included in the oVirt 4.2.5 release, published on July 30th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.