Bug 2119224 - ovn-dbs resource remains in 'stopped' state after the Controller reboot.
Summary: ovn-dbs resource remains in 'stopped' state after the Controller reboot.
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Fernando Royo
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-17 23:29 UTC by Julia Marciano
Modified: 2023-07-12 16:04 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-18249 | 0 | None | None | None | 2022-08-17 23:36:12 UTC

Description Julia Marciano 2022-08-17 23:29:40 UTC
Description of problem:

This issue is reproduced intermittently by Tobiko tests and was previously reported in https://bugzilla.redhat.com/show_bug.cgi?id=1914911#c17:

After a hard reboot of the controller node that holds the OC main VIP, the ovn-dbs pacemaker resource on that node is stopped and never recovers.
Before the reboot, all ovn-dbs resources are healthy:

<tobiko.shell.sh._ssh.SSHShellProcessFixture object at 0x7f86d1434080>
2022-08-16 01:21:18.872 310259 DEBUG tobiko.shell.sh._execute - Command executed:
command: 'sudo pcs status resources |grep ocf'
exit_status: 0
...
        * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
        * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Slave controller-1
        * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2

Then controller-1 is hard-rebooted:
2022-08-16 01:21:18.938 310259 INFO tobiko.shell.sh._reboot - Host '192.168.24.18' is rebooting (command='sudo /bin/sh -c 'echo 1 > /proc/sys/kernel/sysrq && echo b > /proc/sysrq-trigger'').

After the reboot, ovn-dbs-bundle-1 tries to start but eventually goes into the 'stopped' state on controller-1:
        * ovn-dbs-bundle-0	(ocf::ovn:ovndb-servers):	 Master controller-0
        * ovn-dbs-bundle-1	(ocf::ovn:ovndb-servers):	 Stopped controller-1
        * ovn-dbs-bundle-2	(ocf::ovn:ovndb-servers):	 Slave controller-2
      * ip-172.17.1.121	(ocf::heartbeat:IPaddr2):	 Started controller-0
        * openstack-cinder-volume-podman-0	(ocf::heartbeat:podman):	 Started controller-0
    

2022-08-16 01:35:00.682 310259 DEBUG tobiko.tripleo.pacemaker - Got pcs status :

Version-Release number of selected component (if applicable):


How reproducible:
Intermittently.

Steps to Reproduce:
1. Run job [1] with the Tobiko faults tests (see the links in the comment below).

Actual results:
The ovn-dbs-bundle-1 resource is stopped on controller-1.

Expected results:
All ovn-dbs resources are healthy (1 master and 2 slaves).
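
The expected state could be checked automatically with something like the sketch below. It is not the Tobiko implementation. It only parses text in the shape of the 'sudo pcs status resources |grep ocf' output quoted in this report, and the role names (Master, Slave, Stopped) are taken from that output. The names OVN_DBS_LINE, parse_ovn_dbs_roles and ovn_dbs_healthy are hypothetical, as is the assumption that a healthy cluster always has exactly three replicas.

```python
import re
from collections import Counter

# Hypothetical pattern, matching lines such as:
#   * ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
OVN_DBS_LINE = re.compile(
    r"\*\s+(?P<name>ovn-dbs-bundle-\d+)\s+"
    r"\(ocf::ovn:ovndb-servers\):\s+"
    r"(?P<role>\w+)(?:\s+(?P<node>\S+))?"
)


def parse_ovn_dbs_roles(pcs_output: str) -> dict:
    """Return {resource name: role} for every ovn-dbs replica in the output."""
    roles = {}
    for line in pcs_output.splitlines():
        match = OVN_DBS_LINE.search(line)
        if match:
            roles[match.group("name")] = match.group("role")
    return roles


def ovn_dbs_healthy(pcs_output: str) -> bool:
    """True only when exactly 1 Master and 2 Slaves are reported (assumed layout)."""
    counts = Counter(parse_ovn_dbs_roles(pcs_output).values())
    return counts == Counter({"Master": 1, "Slave": 2})


# Sample text copied from the status output in this report.
BEFORE_REBOOT = (
    "* ovn-dbs-bundle-0\t(ocf::ovn:ovndb-servers):\t Master controller-0\n"
    "* ovn-dbs-bundle-1\t(ocf::ovn:ovndb-servers):\t Slave controller-1\n"
    "* ovn-dbs-bundle-2\t(ocf::ovn:ovndb-servers):\t Slave controller-2\n"
)
AFTER_REBOOT = (
    "* ovn-dbs-bundle-0\t(ocf::ovn:ovndb-servers):\t Master controller-0\n"
    "* ovn-dbs-bundle-1\t(ocf::ovn:ovndb-servers):\t Stopped controller-1\n"
    "* ovn-dbs-bundle-2\t(ocf::ovn:ovndb-servers):\t Slave controller-2\n"
    "* ip-172.17.1.121\t(ocf::heartbeat:IPaddr2):\t Started controller-0\n"
)

if __name__ == "__main__":
    print("before reboot healthy:", ovn_dbs_healthy(BEFORE_REBOOT))
    print("after reboot healthy:", ovn_dbs_healthy(AFTER_REBOOT))
    print("after reboot roles:", parse_ovn_dbs_roles(AFTER_REBOOT))
```

Lines for other resources (for example the IPaddr2 entry) are ignored because the pattern matches only the ocf::ovn:ovndb-servers agent. A deployment that reports different role names would need the expected counts adjusted.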


Additional info:

