Bug 2119224

Summary: ovn-dbs resource remains in 'stopped' state after the Controller reboot.
Product: Red Hat OpenStack Reporter: Julia Marciano <jmarcian>
Component: python-networking-ovn Assignee: Fernando Royo <froyo>
Status: NEW --- QA Contact: Eran Kuris <ekuris>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 16.2 (Train) CC: apevec, eolivare, froyo, jlibosva, lhh, majopela, scohen
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Julia Marciano 2022-08-17 23:29:40 UTC
Description of problem:

This issue is reproduced intermittently by Tobiko tests and was previously reported in https://bugzilla.redhat.com/show_bug.cgi?id=1914911#c17:

After a hard reboot of the controller node that holds the OC main VIP, the ovn-dbs pacemaker resource is stopped and never recovers.
Before the reboot, all ovn-dbs resources are healthy:

<tobiko.shell.sh._ssh.SSHShellProcessFixture object at 0x7f86d1434080>
2022-08-16 01:21:18.872 310259 DEBUG tobiko.shell.sh._execute - Command executed:
command: 'sudo pcs status resources |grep ocf'
exit_status: 0
...
       * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
        * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
        * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2

Then controller-1 is hard-rebooted:
2022-08-16 01:21:18.938 310259 INFO tobiko.shell.sh._reboot - Host '192.168.24.18' is rebooting (command='sudo /bin/sh -c 'echo 1 > /proc/sys/kernel/sysrq && echo b > /proc/sysrq-trigger'').

After the reboot, ovn-dbs-bundle-1 tries to start but ends up in the 'Stopped' state on controller-1:
        * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
        * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Stopped controller-1
        * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
      * ip-172.17.1.121	(ocf::heartbeat:IPaddr2):	 Started controller-0
        * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    

2022-08-16 01:35:00.682 310259 DEBUG tobiko.tripleo.pacemaker - Got pcs status :

Version-Release number of selected component (if applicable):


How reproducible:
From time to time.

Steps to Reproduce:
1. Run job [1] with Tobiko faults tests (see the links in the comment below).

Actual results:
The ovn-dbs-bundle-1 resource is stopped on controller-1.

Expected results:
All ovn-dbs resources are healthy (1 master and 2 slaves).
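
The expected state above could be checked automatically with something like the following. This is only a rough sketch, not the Tobiko implementation: the function names (parse_resources, ovn_dbs_roles, ovn_dbs_healthy) are made up for illustration, and it assumes the 'sudo pcs status resources |grep ocf' output has the same line layout and the same role names (Master, Slave, Stopped) as the excerpts in this report.

```python
import re
from collections import Counter

# Assumed line layout, taken from the pcs output quoted in this report:
#   * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers):  Stopped controller-1
RESOURCE_LINE = re.compile(
    r"\*\s+(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<role>\S+)(?:\s+(?P<node>\S+))?"
)


def parse_resources(pcs_output):
    """Return one dict (name, agent, role, node) per resource line found."""
    resources = []
    for line in pcs_output.splitlines():
        match = RESOURCE_LINE.search(line)
        if match:
            resources.append(match.groupdict())
    return resources


def ovn_dbs_roles(pcs_output, agent="ocf::ovn:ovndb-servers"):
    """Count the roles reported for resources managed by the given agent."""
    return Counter(
        res["role"] for res in parse_resources(pcs_output) if res["agent"] == agent
    )


def ovn_dbs_healthy(pcs_output, masters=1, slaves=2):
    """True only if the roles are exactly `masters` Master and `slaves` Slave."""
    return ovn_dbs_roles(pcs_output) == Counter({"Master": masters, "Slave": slaves})


if __name__ == "__main__":
    # Sample lines copied from the post-reboot output in this report.
    after_reboot = "\n".join([
        "  * ovn-dbs-bundle-0\t(ocf::ovn:ovndb-servers):\t Master controller-0",
        "  * ovn-dbs-bundle-1\t(ocf::ovn:ovndb-servers):\t Stopped controller-1",
        "  * ovn-dbs-bundle-2\t(ocf::ovn:ovndb-servers):\t Slave controller-2",
        "  * ip-172.17.1.121\t(ocf::heartbeat:IPaddr2):\t Started controller-0",
    ])
    print(dict(ovn_dbs_roles(after_reboot)))
    print(ovn_dbs_healthy(after_reboot))
```

With the post-reboot output from this report, the check fails because one replica is reported as Stopped instead of Slave. Resources managed by other agents (IPaddr2, podman) are ignored by the role count.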


Additional info: