Bug 2137934

Summary: Controller node nova-* services are down after reboot (happens only with FDP repo)
Product: Red Hat Enterprise Linux Fast Datapath Reporter: Maor <mblue>
Component: openvswitch Assignee: Timothy Redaelli <tredaelli>
openvswitch sub component: other QA Contact: qding
Status: CLOSED WORKSFORME Docs Contact:
Severity: urgent    
Priority: urgent CC: chrisw, ctrautma, ekuris, fleitner, qding, scohen
Version: FDP 22.J   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-12-28 14:28:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Maor 2022-10-26 15:56:26 UTC
Description of problem:
Controller node services (nova-conductor, nova-scheduler, nova-compute) are down and do not start up after the node reboot in the test 'test_ovn_dns_name_after_networker_reboot'.
This seems to happen only *when the FDP repo is used* for this job.

Job link:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve-gate-ovn/238/

Failure Links:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve-gate-ovn/238/testReport/neutron_plugin.tests.scenario.test_internal_dns/InternalDNSInterruptionsAdvancedTestOvn/test_ovn_dns_name_after_networker_reboot_id_31275dd6_744b_41d2_b4ae_43116901107d_slow_/

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve-gate-ovn/241/testReport/neutron_plugin.tests.scenario.test_internal_dns/InternalDNSInterruptionsAdvancedTestOvn/test_ovn_dns_name_after_networker_reboot_id_31275dd6_744b_41d2_b4ae_43116901107d_slow_/

Job run that passes without the FDP repo:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve-gate-ovn/242/

Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220909.n.0
FDP 22.J (Didn't happen with non-FDP runs for this job)

How reproducible:
100% AFAIK
I'm working on deploying and reproducing this failure on a live environment for further debugging.

Steps to Reproduce:
1. Run this test 'test_ovn_dns_name_after_networker_reboot' on this job

Actual results:
Services fail to start up after reboot, test fails.

Expected results:
Services come back up after reboot, test passes.

Additional info:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve-gate-ovn/238/undercloud-0/home/stack/tempest-dir/tempest.log.gz

 |
 V

2022-10-17 15:49:47.839 321074 INFO tempest_helper_plugin.common.waiters [-] Node '88ed3b0d-aa0b-42fa-8d63-e1c52f642c94' reached state -> power:power on, provision:active, maintenance:False
2022-10-17 15:51:18.943 321074 INFO tempest.lib.common.rest_client [req-b7c9610b-c945-4bae-b0c8-7e938b68a803 ] Request (InternalDNSInterruptionsAdvancedTestOvn:test_ovn_dns_name_after_networker_reboot): 200 GET http://10.0.0.125:8774/v2.1/os-services 1.083s
2022-10-17 15:51:18.944 321074 DEBUG tempest.lib.common.rest_client [req-b7c9610b-c945-4bae-b0c8-7e938b68a803 ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
    Response - Headers: {'date': 'Mon, 17 Oct 2022 15:51:17 GMT', 'server': 'Apache', 'content-length': '1614', 'openstack-api-version': 'compute 2.1', 'x-openstack-nova-api-version': '2.1', 'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version,Accept-Encoding', 'x-openstack-request-id': 'req-b7c9610b-c945-4bae-b0c8-7e938b68a803', 'x-compute-request-id': 'req-b7c9610b-c945-4bae-b0c8-7e938b68a803', 'content-type': 'application/json', 'connection': 'close', 'status': '200', 'content-location': 'http://10.0.0.125:8774/v2.1/os-services'}
        Body: b'{"services": [{"binary": "nova-conductor", "host": "controller-0.redhat.local", "id": 2, "zone": "internal", "status": "enabled", "state": "down", "updated_at": "2022-10-17T15:48:24.000000", "disabled_reason": null}, {"binary": "nova-scheduler", "host": "controller-0.redhat.local", "id": 8, "zone": "internal", "status": "enabled", "state": "down", "updated_at": "2022-10-17T15:48:25.000000", "disabled_reason": null}, {"binary": "nova-conductor", "host": "controller-1.redhat.local", "id": 26, "zone": "internal", "status": "enabled", "state": "up", "updated_at": "2022-10-17T15:51:08.000000", "disabled_reason": null}, {"binary": "nova-conductor", "host": "controller-2.redhat.local", "id": 29, "zone": "internal", "status": "enabled", "state": "up", "updated_at": "2022-10-17T15:51:16.000000", "disabled_reason": null}, {"binary": "nova-scheduler", "host": "controller-1.redhat.local", "id": 41, "zone": "internal", "status": "enabled", "state": "up", "updated_at": "2022-10-17T15:51:14.000000", "disabled_reason": null}, {"binary": "nova-scheduler", "host": "controller-2.redhat.local", "id": 53, "zone": "internal", "status": "enabled", "state": "up", "updated_at": "2022-10-17T15:51:08.000000", "disabled_reason": null}, {"binary": "nova-compute", "host": "compute-0.redhat.local", "id": 65, "zone": "nova", "status": "enabled", "state": "up", "updated_at": "2022-10-17T15:51:09.000000", "disabled_reason": null}, {"binary": "nova-compute", "host": "compute-1.redhat.local", "id": 68, "zone": "nova", "status": "enabled", "state": "up", "updated_at": "2022-10-17T15:51:09.000000", "disabled_reason": null}]}' _log_request_full /usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py:455
2022-10-17 15:51:21.024 321074 INFO tempest.lib.common.rest_client [req-d943a211-475a-4a3f-bc8e-d00b7382fafa ] Request (InternalDNSInterruptionsAdvancedTestOvn:test_ovn_dns_name_after_networker_reboot): 200 GET http://10.0.0.125:8774/v2.1/os-services 1.075s
2022-10-17 15:51:21.025 321074 DEBUG tempest.lib.common.rest_client [req-d943a211-475a-4a3f-bc8e-d00b7382fafa ] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
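
For quicker triage, the per-service state can be pulled out of the os-services response body logged above. The following is only a sketch: the helper name down_services is hypothetical, and the sample is a trimmed copy of the logged body that keeps just the binary, host and state fields.

```python
import json


def down_services(body: bytes) -> list[tuple[str, str]]:
    """Return (binary, host) pairs for services whose state is not 'up'."""
    services = json.loads(body)["services"]
    return [(s["binary"], s["host"]) for s in services if s["state"] != "up"]


# Trimmed sample modeled on the response body in the tempest log
# (only the fields needed here are kept; this is not the full payload).
sample = (
    b'{"services": ['
    b'{"binary": "nova-conductor", "host": "controller-0.redhat.local", "state": "down"}, '
    b'{"binary": "nova-scheduler", "host": "controller-0.redhat.local", "state": "down"}, '
    b'{"binary": "nova-conductor", "host": "controller-1.redhat.local", "state": "up"}, '
    b'{"binary": "nova-compute", "host": "compute-0.redhat.local", "state": "up"}'
    b']}'
)

for binary, host in down_services(sample):
    print(f"{binary} on {host} is down")
```

Applied to the full body in the log excerpt, the same filter would flag only nova-conductor and nova-scheduler on controller-0.redhat.local; every other listed service, including nova-compute on both compute nodes, reports state "up".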

Comment 1 Flavio Leitner 2022-10-27 15:01:34 UTC
Hi Maor,

Can you help me understand why this is an Open vSwitch issue?

The failure I see in the log is below:
"""
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: ServicesClient failed to reach within the required time (300 s).
"""

and that could be anything.

Thanks,
fbl