Bug 1789822 - Controller replacement breaks Swift config
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: zstream
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Christian Schwede (cschwede)
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks: 1793684 1794758
Reported: 2020-01-10 14:07 UTC by David Rosenfeld
Modified: 2020-12-15 18:36 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20200905153422.e621f61.el8ost
Doc Type: Known Issue
Doc Text:
Replacement of an overcloud Controller might cause swift rings to become inconsistent across nodes. This results in decreased availability of Object Storage service.

Workaround: Log in to the previously existing Controller node using SSH, deploy the updated rings, and restart the Object Storage containers:

```
(undercloud) [stack@undercloud-0 ~]$ source stackrc
(undercloud) [stack@undercloud-0 ~]$ nova list
...
| 3fab687e-99c2-4e66-805f-3106fb41d868 | controller-1 | ACTIVE | - | Running | ctlplane=192.168.24.17 |
| a87276ea-8682-4f27-9426-6b272955b486 | controller-2 | ACTIVE | - | Running | ctlplane=192.168.24.38 |
| a000b156-9adc-4d37-8169-c1af7800788b | controller-3 | ACTIVE | - | Running | ctlplane=192.168.24.35 |

(undercloud) [stack@undercloud-0 ~]$ for ip in 192.168.24.17 192.168.24.38 192.168.24.35; do ssh $ip 'sudo podman restart swift_copy_rings ; sudo podman restart $(sudo podman ps -a --format="{{.Names}}" --filter="name=swift_*")'; done
```
Clone Of:
Clones: 1793684
Environment:
Last Closed: 2020-12-15 18:35:44 UTC
Target Upstream Version:
Embargoed:
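
The workaround in the Doc Text redeploys the rings and restarts the Swift containers, but it does not show how to confirm that the rings ended up identical on every Controller. The following is a minimal sketch of one way to check this by comparing ring file checksums across nodes. The function names `collect_ring_checksums` and `find_inconsistent_rings` are hypothetical, and the `RING_GLOB` path is an assumption that is not taken from this bug report; adjust it to wherever the ring files live in your deployment.

```shell
#!/usr/bin/env bash
# Sketch only: RING_GLOB is an assumed host path, not confirmed by this bug.
RING_GLOB='/var/lib/config-data/puppet-generated/swift/etc/swift/*.ring.gz'

# Hypothetical helper: print "host checksum ringfile" lines for every ring
# file on each Controller IP given as an argument.
collect_ring_checksums() {
    local ip
    for ip in "$@"; do
        # The glob is expanded by the root shell on the remote node.
        ssh "$ip" "sudo sh -c 'md5sum $RING_GLOB'" | sed "s/^/$ip /"
    done
}

# Hypothetical helper: read "host checksum ringfile" lines on stdin and print
# each ring file whose checksum is not the same on every host.
find_inconsistent_rings() {
    awk '
        # First time a ring file is seen: remember its checksum.
        !($3 in first) { first[$3] = $2; next }
        # Later hosts: compare as strings against the remembered checksum.
        ($2 "") != (first[$3] "") { bad[$3] = 1 }
        END { for (f in bad) print f }
    ' | sort
}
```

With the Controller addresses from the `nova list` output above, running `collect_ring_checksums 192.168.24.17 192.168.24.38 192.168.24.35 | find_inconsistent_rings` should print nothing when all rings match, and the path of each mismatched ring file otherwise.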


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1892674 | 0 | None | None | None | 2020-08-24 06:14:58 UTC
OpenStack gerrit | 747621 | 0 | None | MERGED | Fix Swift ring file synchronization issue | 2020-12-24 09:21:30 UTC
Red Hat Knowledge Base (Solution) | 3690491 | 0 | None | None | None | 2020-06-15 14:40:45 UTC
Red Hat Product Errata | RHEA-2020:5413 | 0 | None | None | None | 2020-12-15 18:36:45 UTC

Description David Rosenfeld 2020-01-10 14:07:35 UTC
Description of problem: Controller replacement regression jobs succeed until the Tempest test stage, where between 30 and 40 Tempest tests fail. Examples of failed Tempest tests:

Run Tempest Tests / tempest.api.object_storage.test_container_services_negative.ContainerNegativeTest.test_delete_non_empty_container[id-42da116e-1e8c-4c96-9e06-2f13884ed2b1,negative]
Run Tempest Tests / tempest.api.network.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_add_remove_network_from_dhcp_agent[id-a0856713-6549-470c-a656-e97c8df9a14d]
Run Tempest Tests / tempest.api.network.admin.test_dhcp_agent_scheduler.DHCPAgentSchedulersTestJSON.test_list_networks_hosted_by_one_dhcp[id-30c48f98-e45d-4ffb-841c-b8aad57c7587]
Run Tempest Tests / .tearDownClass (tempest.api.object_storage.test_account_quotas.AccountQuotasTest)
Run Tempest Tests / tempest.api.object_storage.test_container_quotas.ContainerQuotasTest.test_upload_too_many_objects[id-3a387039-697a-44fc-a9c0-935de31f426b,smoke]
Run Tempest Tests / tempest.api.object_storage.test_container_quotas.ContainerQuotasTest.test_upload_valid_object[id-9a0fb034-86af-4df0-86fa-f8bd7db21ae0,smoke]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_end_marker[id-55b4fa5c-e12e-4ca9-8fcf-a79afe118522]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_format_json[id-196f5034-6ab0-4032-9da9-a937bbb9fba9]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_format_xml[id-655a53ca-4d15-408c-a377-f4c6dbd0a1fa]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_limit[id-297ec38b-2b61-4ff4-bcd1-7fa055e97b61]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_marker[id-c31ddc63-2a58-4f6b-b25c-94d2937e6867]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_no_object[id-4646ac2d-9bfb-4c7d-a3c5-0f527402b3df]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_path[id-58ca6cc9-6af0-408d-aaec-2a6a7b2f0df9]
Run Tempest Tests / tempest.api.object_storage.test_object_version.ContainerTest.test_versioned_container[id-a151e158-dcbf-4a1f-a1e7-46cd65895a6f]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_contents_with_prefix[id-77e742c7-caf2-4ec9-8aa4-f7d509a3344c]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_container_metadata[id-96e68f0e-19ec-4aa2-86f3-adc6a45e14dd,smoke]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_list_no_container_metadata[id-a2faf936-6b13-4f8d-92a2-c2278355821e]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_update_container_metadata_with_create_and_delete_metadata[id-cf19bc0b-7e16-4a5a-aaed-cb0c2fe8deef]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_update_container_metadata_with_create_metadata[id-2ae5f295-4bf1-4e04-bfad-21e54b62cec5]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_update_container_metadata_with_create_metadata_key[id-31f40a5f-6a52-4314-8794-cd89baed3040]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_update_container_metadata_with_delete_metadata[id-3a5ce7d4-6e4b-47d0-9d87-7cd42c325094]
Run Tempest Tests / tempest.api.object_storage.test_container_services.ContainerTest.test_update_container_metadata_with_delete_metadata_key[id-a2e36378-6f1f-43f4-840a-ffd9cfd61914]
Run Tempest Tests / tempest.api.object_storage.test_container_staticweb.StaticWebTest.test_web_listing_css[id-bc37ec94-43c8-4990-842e-0e5e02fc8926]
Run Tempest Tests / .tearDownClass (tempest.api.object_storage.test_container_staticweb.StaticWebTest)
Run Tempest Tests / tempest.scenario.test_object_storage_basic_ops.TestObjectStorageBasicOps.test_swift_basic_ops[id-b920faf1-7b8a-4657-b9fe-9c4512bfb381,object_storage]
Run Tempest Tests / tempest.api.object_storage.test_object_services.ObjectTest.test_get_object_with_x_object_manifest[id-11b4515b-7ba7-4ca8-8838-357ded86fc10]
Run Tempest Tests / tempest.api.object_storage.test_object_slo.ObjectSloTest.test_delete_large_object[id-87b6dfa1-abe9-404d-8bf0-6c3751e6aa77]
Run Tempest Tests / tempest.api.object_storage.test_object_services.ObjectTest.test_object_upload_in_segments[id-e3e6a64a-9f50-4955-b987-6ce6767c97fb]
Run Tempest Tests / tempest.api.object_storage.test_object_services.ObjectTest.test_update_object_metadata_with_create_and_remove_metadata[id-f726174b-2ded-4708-bff7-729d12ce1f84]
Run Tempest Tests / tempest.api.object_storage.test_object_slo.ObjectSloTest.test_retrieve_large_object[id-49bc49bc-dd1b-4c0f-904e-d9f10b830ee8]
Run Tempest Tests / tempest.api.object_storage.test_object_slo.ObjectSloTest.test_upload_manifest[id-2c3f24a6-36e8-4711-9aa2-800ee1fc7b5b]
Run Tempest Tests / .tearDownClass (tempest.api.object_storage.test_object_services.ObjectTest)
Run Tempest Tests / tempest.api.object_storage.test_object_temp_url.ObjectTempUrlTest.test_put_object_using_temp_url[id-9b08dade-3571-4152-8a4f-a4f2a873a735]
Run Tempest Tests / tempest.api.object_storage.test_account_bulk.BulkTest.test_extract_archive[id-a407de51-1983-47cc-9f14-47c2b059413c]
Run Tempest Tests / .tearDownClass (tempest.api.object_storage.test_object_temp_url_negative.ObjectTempUrlNegativeTest)
Run Tempest Tests / tempest.api.object_storage.test_container_acl.ObjectTestACLs.test_read_object_with_rights[id-a3270f3f-7640-4944-8448-c7ea783ea5b6]
Run Tempest Tests / tempest.api.object_storage.test_container_acl.ObjectTestACLs.test_write_object_with_rights[id-aa58bfa5-40d9-4bc3-82b4-d07f4a9e392a]
Run Tempest Tests / tempest.api.object_storage.test_account_services.AccountTest.test_list_containers[id-3499406a-ae53-4f8c-b43a-133d4dc6fe3f,smoke]
Run Tempest Tests / tempest.api.object_storage.test_account_services.AccountTest.test_list_containers_with_limit[id-5cfa4ab2-4373-48dd-a41f-a532b12b08b2]
Run Tempest Tests / tempest.api.object_storage.test_account_services.AccountTest.test_list_containers_with_marker_and_end_marker[id-ac8502c2-d4e4-4f68-85a6-40befea2ef5e]

Version-Release number of selected component (if applicable): RHOS_TRUNK-16.0-RHEL-8-20200103.n.1


How reproducible: Every time the controller replacement Jenkins regression jobs are executed.


Steps to Reproduce:
1. Execute DFG-df-controller_replacement-16-virthost-3cont_3comp-yes_UC_SSL-yes_OC_SSL-lvm-ipv4-geneve-replace_controller-corrupt_disk-RHELOSP-38494 job in Jenkins

Actual results: 30 to 40 Tempest tests fail


Expected results: Tempest tests complete successfully


Additional info:

Comment 13 Christian Schwede (cschwede) 2020-08-24 06:14:58 UTC
I found the regression in Train/RHOSP16, and opened an upstream bug [1] and proposed a patch to fix it [2].

[1] https://bugs.launchpad.net/tripleo/+bug/1892674
[2] https://review.opendev.org/#/c/747621/

This only applies to Stein and Train. I'm not sure whether this is the same cause that Takashi found on RHOSP13, but I will look into that next.

Comment 14 Christian Schwede (cschwede) 2020-09-03 09:51:31 UTC
Further debugging shows that this is not a regression; it also affects OSP13, as Takashi noticed. I updated the Launchpad bug entry and the patch on Gerrit. The fix needs to be applied to our downstream releases as well.

Comment 15 Christian Schwede (cschwede) 2020-09-04 07:36:03 UTC
Patch merged on master, proposed backports:

https://review.opendev.org/#/c/749883/ Train
https://review.opendev.org/#/c/749884/ Ussuri
https://review.opendev.org/#/c/749885/ Stein
https://review.opendev.org/#/c/749886/ Rocky
https://review.opendev.org/#/c/749887/ Queens

Comment 17 David Rosenfeld 2020-10-14 21:11:20 UTC
Yes, the controller job is passing with the current build.

Comment 18 David Rosenfeld 2020-10-28 12:24:43 UTC
All the storage Tempest tests that originally failed during the controller replacement job now pass. Build RHOS-16.1-RHEL-8-20201021.n.0 was used.

Comment 29 errata-xmlrpc 2020-12-15 18:35:44 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5413

