Bug 2154796 - Add an appctl cmd to reconnect ovn-controller when SB goes down
Summary: Add an appctl cmd to reconnect ovn-controller when SB goes down
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn22.12
Version: FDP 22.L
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
: ---
Assignee: OVN Team
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-19 09:59 UTC by zenghui.shi
Modified: 2022-12-20 02:23 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-20 02:23:23 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2553 0 None None None 2022-12-19 10:10:28 UTC

Description zenghui.shi 2022-12-19 09:59:35 UTC
Description of problem:

ovn-controller inactivity probes are disabled on unix connections. 

The problem is that if the SB somehow goes down, ovn-controller might not be able to determine on its own that the connection went away (for example, if ovn-controller doesn't need to send data to the SB, which is often the case). Although we could restart ovn-controller every time the SB goes down to force a reconnect, it would be good to have a way to tell ovn-controller to reconnect, for example an appctl command.

Comment 1 Ilya Maximets 2022-12-19 12:40:19 UTC
(In reply to zenghui.shi from comment #0)
> The problem is that if the SB somehow goes down, ovn-controller might not be
> able to determine on its own that the connection went away (for example, if
> ovn-controller doesn't need to send data to the SB, which is often the case).

Hmm. I'm not sure that is actually a problem. Unix sockets are
typically good at waking up processes that are polling them when the
other side goes down. ovn-controller should receive a POLLERR, wake up
and try to reconnect. That is not the case for communication over
the network, because a remote node can go away without signalling. But on
the same host the kernel always knows that the other process is dead,
so the state of a unix socket connection should always be clear.
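
The behaviour described above can be checked outside OVN with a small sketch. This is not ovn-controller code: it assumes Linux, uses a socketpair() as a stand-in for the unix socket between ovn-controller and the SB, and the function name is made up for illustration. On Linux the wake-up for an idle client typically shows up as POLLIN/POLLHUP followed by a zero-byte read rather than strictly POLLERR, but in either case the poller is woken without having sent anything.

```python
import select
import socket


def peer_close_wakes_poller(timeout_ms=1000):
    """Close one end of a unix socket pair and see what the idle end observes.

    Returns (idle_events, wake_mask, saw_eof). Hypothetical helper, for
    illustration only.
    """
    # "client" plays the idle ovn-controller side, "server" plays the SB.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    poller = select.poll()
    # POLLHUP and POLLERR are always reported; they need not be requested.
    poller.register(client, select.POLLIN)
    try:
        # While the peer is alive and silent, nothing is pending.
        idle_events = poller.poll(0)

        # Simulate the SB process going away.
        server.close()

        # The idle side never sent any data, yet poll() returns an event.
        events = poller.poll(timeout_ms)
        wake_mask = events[0][1] if events else 0

        # A zero-byte read confirms the peer is gone (end of stream).
        saw_eof = bool(events) and client.recv(1) == b""
        return idle_events, wake_mask, saw_eof
    finally:
        client.close()


if __name__ == "__main__":
    idle, mask, eof = peer_close_wakes_poller()
    print("events while peer alive:", idle)
    print("POLLIN set:", bool(mask & select.POLLIN))
    print("POLLHUP set:", bool(mask & select.POLLHUP))
    print("POLLERR set:", bool(mask & select.POLLERR))
    print("read returned EOF:", eof)
```

The same experiment over TCP between two hosts would not be reliable in this way: if the remote node vanishes without sending a FIN or RST, the local socket stays quiet, which is why inactivity probes matter for network connections.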

Comment 2 Dumitru Ceara 2022-12-19 14:35:32 UTC
(In reply to Ilya Maximets from comment #1)
> (In reply to zenghui.shi from comment #0)
> > The problem is that if the SB somehow goes down, ovn-controller might not be
> > able to determine on its own that the connection went away (for example, if
> > ovn-controller doesn't need to send data to the SB, which is often the case).
> 
> Hmm. I'm not sure that is actually a problem. Unix sockets are
> typically good at waking up processes that are polling them when the
> other side goes down. ovn-controller should receive a POLLERR, wake up
> and try to reconnect. That is not the case for communication over
> the network, because a remote node can go away without signalling. But on
> the same host the kernel always knows that the other process is dead,
> so the state of a unix socket connection should always be clear.

You're right. My bad, I had the (wrong) impression that ovn-controller
wouldn't receive any event. I think it's safe to close this as NOTABUG.

@zshi what do you think?

Comment 3 zenghui.shi 2022-12-20 02:23:23 UTC
> > Hmm. I'm not sure that is actually a problem. Unix sockets are
> > typically good at waking up processes that are polling them when the
> > other side goes down. ovn-controller should receive a POLLERR, wake up
> > and try to reconnect. That is not the case for communication over
> > the network, because a remote node can go away without signalling. But on
> > the same host the kernel always knows that the other process is dead,
> > so the state of a unix socket connection should always be clear.
> 
> You're right. My bad, I had the (wrong) impression that ovn-controller
> wouldn't receive any event. I think it's safe to close this as NOTABUG.
> 
> @zshi what do you think?

Agreed, let's close it as NOTABUG.

Thanks Ilya and Dumitru for the quick response and clarification!