Bug 1393012
Summary: | when restarting vdsm, mom also restarting, and immediately failed. | ||
---|---|---|---|
Product: | [oVirt] vdsm | Reporter: | Shira Maximov <mshira> |
Component: | Services | Assignee: | Martin Sivák <msivak> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Shira Maximov <mshira> |
Severity: | urgent | Docs Contact: | |
Priority: | high | ||
Version: | 4.18.15 | CC: | alukiano, bugs, gklein, mburman, mgoldboi, msivak |
Target Milestone: | ovirt-4.0.6 | Keywords: | AutomationBlocker, Regression |
Target Release: | 4.18.18 | Flags: | gklein:
ovirt-4.0.z?
gklein: blocker? gklein: planning_ack? rule-engine: devel_ack+ rule-engine: testing_ack+ |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
VDSM takes some time to start and MOM is fast in trying to contact it.
Consequence:
MOM died with connection refused error.
Fix:
MOM is configured to try multiple times now (each 5 seconds, but no more than 10 times per two minutes). Exceeding the retry counter will cause MOM to shut down permanently as the issue is more substantial and needs sysadmins intervention. MOM can be re-enabled by using systemctl reset-failed mom-vdsm.service
Result:
MOM will start eventually. It might log one or two restarts.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-01-18 07:29:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Shira Maximov
2016-11-08 17:05:11 UTC
*** Bug 1396826 has been marked as a duplicate of this bug. *** I am moving this to VDSM component as that is where the fix will be published. *** Bug 1397268 has been marked as a duplicate of this bug. *** I don't think this bug should be ON_QA, it actually should failed QA. messages log still spammed with 'vdsm MOM WARN MOM not available, KSM stats will be missing.' on vdsm-4.18.17-1.el7ev.x86_64 The fix will be part of 4.0.6. It is already merged too, but the automation is not smart enough to know that it is not present in the last weeks build.. I still see this bug on 4.0.6.3-0.1.el7ev vdsm-4.18.18-4.git198e48d.el7ev.x86_64 Dec 11 11:38:11 silver-vdsa journal: vdsm MOM WARNING MOM not available. Dec 11 11:38:11 silver-vdsa journal: vdsm MOM WARNING MOM not available, KSM stats will be missing. Dec 11 11:38:27 silver-vdsa journal: vdsm MOM WARNING MOM not available. Dec 11 11:38:27 silver-vdsa journal: vdsm MOM WARNING MOM not available, KSM stats will be missing. Dec 11 11:38:43 silver-vdsa journal: vdsm MOM WARNING MOM not available. Don't think this report should be ON_QA Michael: Check /usr/lib/systemd/system/mom-vdsm.service, it should contain this: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=init/systemd/mom-vdsm.service.in;h=e83acb40fcfd12f78d27503e92f7e5e4068b731f;hb=e4be1485f2a905c247d935aed72b106026713478 The link will show you the content of the file we ship with 4.18.18 and it contains the proper fix. So you either - haven't reloaded systemd, have an .rpmnew file or systemd override laying somewhere (/etc/systemd/system/mom-vdsm.*) or triggered the maximum failure count (So check /var/log/vdsm/mom.log and journalctl for mom-vdsm). Martin, cat /usr/lib/systemd/system/mom-vdsm.service [Unit] Description=MOM instance configured for VDSM purposes Requires=vdsmd.service After=vdsmd.service Conflicts=momd.service ksmtuned.service [Service] Type=simple LimitCORE=infinity ExecStart=/usr/sbin/momd -c /etc/vdsm/mom.conf Restart=always RestartSec=5 User=vdsm Group=kvm TimeoutStopSec=10 StartLimitBurst=10 StartLimitInterval=120 [Install] WantedBy=multi-user.target Looks like my vdsm vdsm-4.18.18-4.git198e48d.el7ev.x86_64 do contain the fix but messages log is still spammed with those errors. Do i need to do something after updating vdsm? Check whether mom is up or down: systemctl status mom-vdsm.service and get the reason why it is down. Check journalctl for mom to see why it was not restarted: journalctl --unit mom-vdsm.service Check mom log for errors: cat /var/log/vdsm/mom.log Check what happens when you try starting mom manually: systemctl start mom-vdsm.service If mom was not started because of triggering the burst limits, use the following to reset them: systemctl reset-failed mom-vdsm.service Btw, you can also try systemctl daemon-reload to make sure systemd knows about the updated unit files. Try to verify on : vdsm-4.18.18-4.git198e48d.el7ev.x86_64 mom-0.5.8-1.el7ev.noarch Verification steps: 0. look at mom logs 1. restart Vdsm 2. Check in mom logs again immediately after upgrade vdsm, mom is down with the same failure: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run hypervisor_iface = self.get_hypervisor_interface() File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface return module.instance(self.config) File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py", line 47, in instance return JsonRpcVdsmBulkInterface() File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py", line 29, in __init__ super(JsonRpcVdsmBulkInterface, self).__init__() File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py", line 43, in __init__ .orRaise(RuntimeError, 'No connection to VDSM.') File "/usr/lib/python2.7/site-packages/mom/optional.py", line 28, in orRaise raise exception(*args, **kwargs) RuntimeError: No connection to VDSM. But after restarting vdsmd service, mom is running OK. I'm not sure that this is valid behaviour, Martin is this OK that mom didn't work after upgrading vdsm? Verifying this bug, msivak check and this is a valid behavior. |