Bug 1588455 - RHV Hosts are continuously logging error: database connection failed (No such file or directory)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-provider-ovn
Version: 4.2.3
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.5
Target Release: ---
Assignee: Marcin Mirecki
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-07 11:46 UTC by Siddhant Rao
Modified: 2023-10-06 17:49 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-07 10:48:54 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-36222 0 None None None 2021-12-10 16:33:52 UTC
Red Hat Knowledge Base (Solution) 3480261 0 None None None 2018-07-13 17:27:41 UTC
oVirt gerrit 92136 0 'None' MERGED driver: do not consider non-running ovs as an error 2021-02-08 15:49:48 UTC

Description Siddhant Rao 2018-06-07 11:46:52 UTC
Description of problem:
The hosts continuously log the following error in /var/log/messages:
ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)

On the vdsm side we see the below errors,

var/log/vdsm/vdsm.log:2018-06-07 10:01:09,178+1200 INFO  (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0 err=Failed to get Open VSwitch system-id . err = ['ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)']


I am not sure whether this is fatal or critical at the moment. Please let me know if it is critical.


Version-Release number of selected component (if applicable):
rhvm-4.2.3.5-0.1.el7.noarch
ovirt-provider-ovn-1.2.10-1.el7ev.noarch

How reproducible: 100%

Steps to Reproduce:
1. Run the below command on the Host just after installing the Host,
# ovs-vsctl get Open_vSwitch . external_ids:system-id

Actual results:
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)

Expected results:
"# ovs-vsctl get Open_vSwitch . external_ids:system-id" should work.

Additional info:

On seeing the error below, I inspected the hook '/usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook':

var/log/vdsm/vdsm.log:2018-06-07 10:01:09,178+1200 INFO  (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0 err=Failed to get Open VSwitch system-id . err = ['ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)']



It seems to be failing in the code below (please confirm):

CMD_LINE = ['ovs-vsctl', 'get', 'Open_vSwitch', '.', 'external_ids:system-id']

def _get_open_vswitch_host_id():
    retcode, out, err = hooking.execCmd(CMD_LINE, sudo=True)
    if retcode == 0:
        return out[0].replace('"', '')
    hooking.log('Failed to get Open VSwitch system-id . err = %s' % (err))


Running the command in CMD_LINE manually gives the following:

# ovs-vsctl get Open_vSwitch . external_ids:system-id
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)


After this I checked the openvswitch service and found that it was dead:

# systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Please note that the service is disabled.

On starting the service and trying again it works,

# systemctl start openvswitch
# ovs-vsctl get Open_vSwitch . external_ids:system-id
"69f42e91-6d45-45b0-bf72-9315f787329a"

So I believe the problem is that the service is disabled by default (kindly confirm this). If any additional change or data is needed, kindly let me know.

Regards,
Siddhant Rao
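
The "No such file or directory" part of the error refers to the ovsdb socket itself, so a quick way to tell "OVS is not running" apart from other ovs-vsctl failures is to check for that socket. The following is only a minimal sketch in Python 3: the socket path is taken from the error message above, while the function names `ovs_db_socket_present` and `diagnose` are hypothetical and are not part of vdsm or ovirt-provider-ovn.

```python
import os
import stat

# Path copied from the error message in this report.
OVS_DB_SOCK = '/var/run/openvswitch/db.sock'


def ovs_db_socket_present(path=OVS_DB_SOCK):
    """Return True if `path` exists and is a UNIX domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Covers "No such file or directory" as well as permission errors.
        return False
    return stat.S_ISSOCK(mode)


def diagnose(path=OVS_DB_SOCK):
    """Return a short human-readable hint (hypothetical helper)."""
    if ovs_db_socket_present(path):
        return 'ovsdb socket present; ovs-vsctl should be able to connect'
    return 'ovsdb socket missing; openvswitch is probably not running'


if __name__ == '__main__':
    print(diagnose())
```

A missing socket only suggests that the service is stopped; `systemctl status openvswitch`, as used above, remains the authoritative check.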

Comment 1 Marcin Mirecki 2018-06-07 14:18:29 UTC
I assume the cause was the openvswitch service not being started.
How did you configure the host?
Did you use vdsm-tool: vdsm-tool ovn-config <ovn central ip> <tunneling ip>?
This command should enable and start the openvswitch and ovn-controller services.
Any errors while executing these?

Comment 2 Dan Kenigsberg 2018-06-07 16:36:33 UTC
Was the host added to rhvm-4.2, or is it an upgrade? Does its cluster have ovn as its default provider? Did you intend to use ovn?

On non-ovn cluster, we expect ovs service not to be running, and the ovn driver must not spam the logs.

Comment 4 Siddhant Rao 2018-06-11 19:13:47 UTC
Marcin,

How did you configure the host?
Did you use vdsm-tool: vdsm-tool ovn-config <ovn central ip> <tunneling ip>?

Ans. I did not configure the host using the vdsm-tool command you mentioned above; these errors appear immediately after the Manager is upgraded.

I am not familiar with <ovn central ip> or <tunneling ip>; let me check these and see if I get any errors.




Dan,

The host was upgraded to 4.2. The cluster does have OVN as its default provider.


---------

This is seen immediately after the Manager and any one Host (in my test setup) are upgraded to 4.2.3, after we select "Yes" to configure ovn-provider. We are asked the following in engine-setup:

oVirt OVN provider user[admin@internal]: 
oVirt OVN provider password:

After entering these and after the engine-setup ends, we observe that the openvswitch service is running on the manager but not on the Hosts.

Should the service be disabled on the Host even if OVN is configured during engine-setup?
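
To compare the Manager and the hosts, the running and boot-time state of the service can be collected the same way on each machine. This is a rough Python 3 sketch built on the standard `systemctl is-active` and `systemctl is-enabled` queries; the helper names (`_systemctl`, `service_state`, `explain`) are made up for illustration, and the unit name `openvswitch` is the one shown in the systemctl output earlier in this report.

```python
import subprocess


def _systemctl(verb, unit):
    """Run `systemctl <verb> <unit>` and return its one-word answer."""
    proc = subprocess.run(
        ['systemctl', verb, unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    # A non-zero exit status is normal here (e.g. for "inactive"),
    # so only the printed state is used.
    return proc.stdout.strip()


def service_state(unit='openvswitch', query=_systemctl):
    """Collect runtime and boot-time state; `query` is injectable for testing."""
    return {
        'active': query('is-active', unit),
        'enabled': query('is-enabled', unit),
    }


def explain(state):
    """Turn the collected state into a short summary (hypothetical helper)."""
    if state['active'] == 'active':
        return 'openvswitch is running'
    if state['enabled'] == 'enabled':
        return 'openvswitch is enabled but not running'
    return 'openvswitch is not running and not enabled at boot'


if __name__ == '__main__':
    print(explain(service_state()))
```

On the host described above (disabled, inactive), this would fall through to the last branch.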

Comment 6 Marcin Mirecki 2018-06-12 09:36:30 UTC
Hi Siddhant,

Sorry for the overly detailed question; I assumed you had configured OVN on the host and that it was not working.
The error message is raised when vdsm tries to gather some additional information needed by the OVN provider. It cannot do this because the ovs service is not running.

This error is not critical, and can be ignored when you are not using OVN networks on the host.

I will add a fix to mute the log message if ovn is not configured properly on the host, and add this in a more suitable place (one that does not spam the log).
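
One way to picture the intended behaviour is sketched below in Python. This is not the merged patch, only an illustration under assumptions: the command runner and logger are passed in as plain callables (`exec_cmd`, `log`) instead of using vdsm's `hooking` module, and matching on the "database connection failed" text from the error above is an assumed heuristic for "OVS is not running".

```python
CMD_LINE = ['ovs-vsctl', 'get', 'Open_vSwitch', '.', 'external_ids:system-id']

# Assumed heuristic: this text in stderr means ovsdb is unreachable,
# which is expected on hosts that do not use OVN.
NOT_RUNNING_MARKER = 'database connection failed'


def get_open_vswitch_host_id(exec_cmd, log):
    """Return the OVS system-id, or None if it cannot be read.

    exec_cmd(cmd) must return (retcode, out_lines, err_lines), mirroring
    the tuple used by the hook quoted in the description.
    """
    retcode, out, err = exec_cmd(CMD_LINE)
    if retcode == 0:
        return out[0].replace('"', '')
    if any(NOT_RUNNING_MARKER in line for line in err):
        # OVS is not running: not an error on a non-OVN host, so stay quiet.
        return None
    log('Failed to get Open VSwitch system-id. err = %s' % (err,))
    return None
```

Any other failure is still logged, so genuine OVN misconfiguration on a host that does run OVS would remain visible.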

Comment 7 Sandro Bonazzola 2018-06-29 16:28:11 UTC
Dan, this bug moved to modified without adding any patch.
Can you please detail which builds will contain the fix?

Comment 8 Marcin Mirecki 2018-07-04 07:05:55 UTC
This bug is fixed by: https://gerrit.ovirt.org/#/c/92136/
contained in build: ovirt-provider-ovn-1.2.12-1

Comment 9 Meni Yakove 2018-07-04 08:29:11 UTC
ovs-vsctl get Open_vSwitch . external_ids:system-id
"7c6a0eec-6d83-4ce3-9215-e83cb39651a4"

Verified on ovirt-provider-ovn-driver-1.2.12-1.el7ev.noarch

Comment 10 Marina Kalinin 2018-08-31 20:01:29 UTC
Meni / Marcin,

This bug is on verified and 4.2.5 is out.
Why is this bug still open?
Who can confirm the correct status of it?

Thank you!

Comment 11 Michael Burman 2018-09-01 13:55:54 UTC
(In reply to Marina from comment #10)
> Meni / Marcin,
> 
> This bug is on verified and 4.2.5 is out.
> Why is this bug still open?
> Who can confirm the correct status of it?
> 
> Thank you!

Marina,
The fix for this bug is for ovirt-provider-ovn and it was verified by Meni (see comment 9) with RHV 4.2.5.
The status is verified; I have no idea why it didn't move to current release.
Let's wait for Marcin's reply.

Comment 13 Marina Kalinin 2018-09-04 18:56:55 UTC
https://access.redhat.com/errata/RHBA-2018:2471

Comment 15 Klaas Demter 2019-04-04 06:52:15 UTC
This is still happening with ovirt-provider-ovn-driver-1.2.16-1.el7ev.noarch
"ovs-vsctl: ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)" in logs, seems to correspond to the rhv-manager being restarted.

The cluster is not using ovn, service is disabled.

See case 02352174

Greetings
Klaas
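
To check whether the messages really line up with Manager restarts, the matching lines can be pulled out of a log file. The following Python sketch is only an illustration: the regular expression is derived from the two sample lines quoted in this bug, and the names `DB_CONN_ERR` and `find_db_connection_errors` are hypothetical.

```python
import re

# Pattern built from the sample line in this report:
# ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (...)
DB_CONN_ERR = re.compile(
    r'ovs\|\d+\|db_ctl_base\|ERR\|(?P<target>\S+): '
    r'database connection failed \((?P<reason>[^)]*)\)'
)


def find_db_connection_errors(lines):
    """Return a (target, reason) tuple for every matching log line."""
    hits = []
    for line in lines:
        match = DB_CONN_ERR.search(line)
        if match:
            hits.append((match.group('target'), match.group('reason')))
    return hits


if __name__ == '__main__':
    # /var/log/messages is where the description reports the error.
    with open('/var/log/messages', errors='replace') as log_file:
        for target, reason in find_db_connection_errors(log_file):
            print(target, reason)
```

Because the pattern uses `search`, any syslog timestamp or process prefix in front of the `ovs|...` part is ignored.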

Comment 16 Klaas Demter 2019-04-04 06:54:18 UTC
Also have another host with ovirt-provider-ovn-driver-1.2.17-1.el7ev.noarch with the same error in logs

Comment 17 Siddhant Rao 2019-04-04 11:50:03 UTC
Hello,

I have a customer who is still facing this problem even after updating with the packages in which the fix has been shipped.
Please have a look into it.
Let me know if you require any logs or if you need any other information.

The customer is using the below versions of the package,

rhvm-4.2.6.4-0.1.el7ev.noarch
ovirt-provider-ovn-1.2.14-1.el7ev.noarch

On the Host :

ovirt-provider-ovn-driver-1.2.16-1.el7ev.noarch
openvswitch-2.9.0-70.el7fdp.1.x86_64

Comment 18 Michael Burman 2019-04-07 10:48:54 UTC
(In reply to Siddhant Rao from comment #17)
> Hello,
> 
> I have a customer who is still facing this problem even after updating with
> the packages in which the fix has been shipped.
> Please have a look into it.
> Let me know if you require any logs or if you need any other information.
> 
> The customer is using the below versions of the package,
> 
> rhvm-4.2.6.4-0.1.el7ev.noarch
> ovirt-provider-ovn-1.2.14-1.el7ev.noarch
> 
> On the Host :
> 
> ovirt-provider-ovn-driver-1.2.16-1.el7ev.noarch
> openvswitch-2.9.0-70.el7fdp.1.x86_64

Hi Siddhant,

Please open a new bug to track the issue your customer is facing. This current bug was verified and closed as current release and should stay this way. Thanks

