Bug 1374976

Summary: mom is not available for vdsClient commands on ngn vlan devices
Product: [oVirt] mom Reporter: Michael Burman <mburman>
Component: Core Assignee: Martin Sivák <msivak>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 0.5.5 CC: bugs, danken, dfediuck, fdeutsch, mavital, mburman, mgoldboi
Target Milestone: ovirt-4.0.4 Keywords: TestOnly
Target Release: 0.5.5 Flags: rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
dfediuck: devel_ack+
mavital: testing_ack+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-26 12:39:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
logs none
vdsm logs none

Description Michael Burman 2016-09-11 08:37:18 UTC
Created attachment 1199867 [details]
logs

Description of problem:
mom is not available for vdsClient commands on ngn vlan devices.

After adding a rhvh-4.0-0.20160906.0+1 server to rhv-m 4.0.4 on top of a vlan device (configured via ifcfg-* files), mom isn't available.

vdsClient -s 0 getVdsCaps
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 2980, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClient.py", line 543, in do_getCap
    return self.ExecAndExit(self.s.getVdsCapabilities())
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/m2cutils.py", line 203, in connect
    sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 110] Connection timed out


Version-Release number of selected component (if applicable):
mom-0.5.5-1.el7ev.noarch
rhvh-4.0-0.20160906.0+1
vdsm-4.18.11-1.el7ev.x86_64

How reproducible:
100% on rhvh-4.0-0.20160906.0+1 with vlan devices that were configured via ifcfg-* files.
Non-vlan devices worked as expected.
On rhel servers all devices worked.

Steps to Reproduce:
1. Create a vlan device (nic/bond/static/dhcp) on a clean rhvh-4.0-0.20160906.0+1 with ifcfg-* files and restart the network
2. Successfully add the host to rhv-m 4.0.4
3. Run a vdsClient command on the host

Actual results:
Connection timed out because mom isn't available
- In the UI, refresh caps seems to work

Expected results:
The vdsClient command should complete without a connection timeout.

Comment 1 Yaniv Kaul 2016-09-11 13:11:51 UTC
That's quite a confusing bug - is that a mom bug? Is that because of the VLAN, because of RHVH, or a combo of both?

Comment 2 Michael Burman 2016-09-11 13:42:54 UTC
I'm not sure if it's a mom bug, but the issue is seen only with the combination of VLAN and RHVH.

Comment 3 Martin Sivák 2016-09-12 07:51:39 UTC
1) I am not sure we support manual edits of ifcfg files at all
2) VDSM uses unix domain sockets to talk to MOM, that should not be affected by VLANs at all
3) MOM uses xmlrpc client as provided by VDSM and that might stall in case the network gets bad.. but there is no data to indicate this
4) Your traceback is coming from vdsClient when trying to connect to VDSM, not to MOM


So my main question is.. where did you see anything MOM related when testing this?
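
To illustrate point 2 above: a UNIX domain socket is addressed by a filesystem path, not by a host name or interface, so a check like the following does not touch the VLAN configuration at all. This is only a minimal sketch; the function name is hypothetical, and the socket path must be supplied by the caller (no real VDSM or MOM path is assumed here).

```python
import socket


def unix_socket_alive(path, timeout=1.0):
    """Return True if something accepts connections on the UNIX socket at path.

    Hypothetical helper for illustration; not part of VDSM or MOM.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # The address is a filesystem path; no DNS lookup or routing happens.
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing path, a refused connection, and a timeout.
        return False
    finally:
        sock.close()
```

If such a check succeeds while a TCP connection to VDSM times out, the problem is more likely on the TCP/host name side than in the local socket.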

Comment 4 Michael Burman 2016-09-13 05:21:33 UTC
Hi Martin, I'm not sure it's a mom-related issue, but I see it related to mom when trying to run vdsClient commands on the server.
Please advise whom I should contact in order to investigate this. I have a server with this issue waiting for someone to take a look. Thanks!

Comment 5 Martin Sivák 2016-09-13 07:29:04 UTC
(In reply to Michael Burman from comment #4)
> i see it related to mom
> when trying to run vdsClient commands on the server.

Where do you see it? The issue is definitely not on the mom side, since you cannot connect to VDSM using either MOM or vdsClient.

> Please advise whom I should contact in order to investigate this.
> I have a server with this issue waiting for someone to take a look. Thanks!

Someone from VDSM networking? But they will ask for vdsm logs too.

Comment 6 Michael Burman 2016-09-13 07:57:29 UTC
I see it in the journalctl output:

Sep 13 10:53:36 orchid-vds2.qa.lab.tlv.redhat.com vdsm[22629]: vdsm MOM WARNING MOM not available.
Sep 13 10:53:36 orchid-vds2.qa.lab.tlv.redhat.com vdsm[22629]: vdsm MOM WARNING MOM not available, KSM stats will be missing.
Sep 13 10:53:36 orchid-vds2.qa.lab.tlv.redhat.com vdsm[22629]: vdsm root ERROR Report host stats failed
                                                               Traceback (most recent call last):
                                                                 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 113, in report_stats
                                                                   report[prefix + '.cpu.ksm_pages'] = hoststats['ksmPages']
                                                               KeyError: 'ksmPages'
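
For illustration only: the KeyError above appears to arise because the stats dictionary has no 'ksmPages' entry when MOM does not answer, as the preceding warning suggests. A minimal sketch of a tolerant report builder follows. The function below is hypothetical and is not the actual VDSM code or its fix; only the key 'ksmPages' and the suffix '.cpu.ksm_pages' are taken from the traceback.

```python
def build_report(hoststats, prefix="hosts.example"):
    """Build a flat stats report, skipping KSM fields that MOM did not supply.

    Hypothetical sketch; the default prefix is a placeholder.
    """
    report = {}
    # Fields that are present only when MOM responded.
    optional = {"ksmPages": ".cpu.ksm_pages"}
    for key, suffix in optional.items():
        if key in hoststats:
            report[prefix + suffix] = hoststats[key]
    return report
```

With this shape, an empty stats dictionary yields an empty report instead of raising.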

Comment 7 Michael Burman 2016-09-13 07:58:31 UTC
Created attachment 1200379 [details]
vdsm logs

Comment 8 Michael Burman 2016-09-13 10:42:27 UTC
Danken has found the reason that prevents running a vdsClient -s 0 command on vlan devices: vdsm tries to connect to an fqdn that isn't the vlan one.

- For example, on our server orchid-vds2.qa.lab.tlv.redhat.com, the fqdn for vlan 162 is orchid-vds2-vlan162.qa.lab.tlv.redhat.com. When we configure a device with vlan 162 and restart the network, the hostname isn't updated, so vdsm tries to connect to orchid-vds2.qa.lab.tlv.redhat.com, and that is why it fails. Running the vdsClient command with the ip or with the correct fqdn works, but this does not explain the 'MOM not available' errors in the logs.
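
The difference between the two names can be checked with a plain TCP probe. The sketch below is only an illustration: the helper names are hypothetical, and port 54321 is assumed to be the VDSM management port on this setup.

```python
import socket

VDSM_PORT = 54321  # assumption: VDSM management port on this host


def can_connect(host, port=VDSM_PORT, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections, and timeouts.
        return False


def probe(candidates, port=VDSM_PORT, timeout=3.0):
    """Map each candidate host name or IP address to its reachability."""
    return {host: can_connect(host, port, timeout) for host in candidates}
```

Calling probe() with socket.getfqdn() and the vlan name (orchid-vds2-vlan162.qa.lab.tlv.redhat.com in the example above) would show which of the two names actually accepts connections.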

Comment 9 Martin Sivák 2016-09-20 09:07:04 UTC
Ahh so this might be the same issue we saw in 

https://bugzilla.redhat.com/show_bug.cgi?id=1377161

and

https://bugzilla.redhat.com/show_bug.cgi?id=1358530


Can you please check it and confirm that it is the same behaviour? If it is, then this can be marked as test only, because it was fixed in

https://gerrit.ovirt.org/#/c/63308/2

Comment 10 Martin Sivák 2016-09-20 09:08:38 UTC
The 'MOM not available' messages are caused by a network timeout: MOM itself is stuck (or down) waiting for the vdsm connection.

Comment 11 Michael Burman 2016-09-20 15:08:13 UTC
(In reply to Martin Sivák from comment #9)
> Ahh so this might be the same issue we saw in 
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1377161
> 
> and
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1358530
> 
> 
> Can you please check it and confirm that it is the same behaviour? If it is,
> then this can be marked as test only, because it was fixed in
> 
> https://gerrit.ovirt.org/#/c/63308/2

Hi Martin, 
I can't confirm that these bugs are the same issue, but I can confirm that with rhvh-4.0-0.20160919.0+1 and vdsm-4.18.13-1.el7ev.x86_64 everything works as expected.
Running vdsClient -s 0 getVdsCaps on a vlan device is OK. Thanks

Comment 12 Red Hat Bugzilla Rules Engine 2016-09-21 10:15:04 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 13 Michael Burman 2016-09-25 08:19:10 UTC
Verified on - rhvh-4.0-0.20160919.0+1 and vdsm-4.18.13-1.el7ev.x86_64