Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1733392

Summary: Galera split-brained unnoticed for a month, resulting in two data sets being accessed by a production environment.
Product: Red Hat OpenStack
Component: galera
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: urgent
Priority: urgent
Keywords: Triaged, ZStream
Reporter: coldford <coldford>
Assignee: Damien Ciabrini <dciabrin>
QA Contact: nlevinki <nlevinki>
CC: bdobreli, dvd, lmiccini, mbayer, pkovacs, rcernin
Last Closed: 2019-08-22 15:30:44 UTC
Type: Bug

Description coldford@redhat.com 2019-07-25 23:40:43 UTC
Description of problem:

On 06/23, one controller node had a hardware issue that required removing the node from the cluster by following https://access.redhat.com/solutions/3062611. Once the hardware issue was resolved, the customer added the node back on 06/28. Unknown to the customer, the Galera cluster had split-brained, yet even today pcs status and clustercheck report it as healthy. The split brain was only discovered today, revealing that for approximately 30 days reads and writes had gone to both clusters. It surfaced through numerous reported problems, including missing images and 50+ instances unable to start (it turned out the related stacks had been deleted).
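Because pcs status and clustercheck kept reporting healthy, a check that compares what each node sees, rather than looking at each node in isolation, may help catch this condition earlier. The following is a minimal sketch, assuming the values of the Galera status variables wsrep_cluster_state_uuid, wsrep_cluster_status and wsrep_cluster_size have already been collected from every controller (for example with SHOW GLOBAL STATUS LIKE 'wsrep_%'). The NodeStatus type and the find_split_brain function are hypothetical and are not part of pcs, clustercheck or Galera. It is a heuristic: it only flags a divergence that at least one of these values reveals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    # Hypothetical container for wsrep status values read from one controller.
    name: str
    cluster_state_uuid: str  # value of wsrep_cluster_state_uuid
    cluster_status: str      # value of wsrep_cluster_status, e.g. "Primary"
    cluster_size: int        # value of wsrep_cluster_size


def find_split_brain(nodes: List[NodeStatus], expected_size: int) -> List[str]:
    """Return a list of warnings; an empty list means no divergence was detected."""
    problems: List[str] = []
    # Nodes that share one cluster history should report the same state UUID.
    uuids = {n.cluster_state_uuid for n in nodes}
    if len(uuids) > 1:
        problems.append(f"nodes report {len(uuids)} different cluster state UUIDs")
    for n in nodes:
        # A node outside the primary component reports a non-Primary status.
        if n.cluster_status != "Primary":
            problems.append(f"{n.name}: cluster status is {n.cluster_status}")
        # Every member of a fully joined cluster should see all expected nodes.
        if n.cluster_size != expected_size:
            problems.append(f"{n.name}: sees {n.cluster_size} of {expected_size} nodes")
    return problems
```

As a hypothetical example, if controllers 0 and 1 each report a cluster size of 2 and controller 2 reports 1 while 3 nodes are expected, every controller is flagged even though each one may consider itself healthy. The expected_size parameter has to be lowered deliberately while a node is intentionally out of the cluster.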


Version-Release number of selected component (if applicable):

RHOSP 10

How reproducible:

CONTROLLERS 0/1
MariaDB [glance]> select * from images where name like '%bnc_nc_v1.0.2%'\G
*************************** 1. row ***************************
id: 77e1b61e-9f25-4177-89b6-b1ee2a0c4be1
name: bnc_nc_v1.0.2
size: 2456223744
status: active
is_public: 0
created_at: 2019-07-10 22:23:25
updated_at: 2019-07-10 22:24:27
deleted_at: NULL
deleted: 0
disk_format: qcow2
container_format: bare
checksum: 130d7c1da262e89d8748ff6d61f07911
owner: 3b00ee7a0cf44455830b073067a261ad
min_disk: 0
min_ram: 0
protected: 0
virtual_size: NULL
1 row in set (5.32 sec)

CONTROLLER 2

MariaDB [glance]> select * from images where name like '%bnc_nc_v1.0.2%'\G
Empty set (0.75 sec)

We need the database differences reviewed and merged into a single dataset.
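As a first step toward reviewing the differences, each table could be exported from both data sets and compared row by row on its primary key. The following is a minimal sketch, assuming the rows have already been fetched from controllers 0/1 and from controller 2 into lists of dicts; diff_by_key and the sample rows are hypothetical. It only classifies rows. Note that the glance images table shown above uses soft-delete columns (deleted, deleted_at), so a row deleted on only one side is reported as conflicting rather than missing. Choosing which version of a conflicting row wins, and checking cross-table and cross-service references (for example the deleted stacks behind the instances that cannot start), still needs manual review.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def diff_by_key(
    left: Iterable[Row], right: Iterable[Row], key: str = "id"
) -> Tuple[List[Any], List[Any], List[Any]]:
    """Classify rows of one table exported from two diverged data sets.

    Returns sorted key lists: (only in left, only in right, in both but different).
    """
    left_rows = {row[key]: row for row in left}
    right_rows = {row[key]: row for row in right}
    only_left = sorted(left_rows.keys() - right_rows.keys())
    only_right = sorted(right_rows.keys() - left_rows.keys())
    # Rows present on both sides whose column values differ need a manual decision.
    conflicting = sorted(
        k for k in left_rows.keys() & right_rows.keys() if left_rows[k] != right_rows[k]
    )
    return only_left, only_right, conflicting


# Illustrative rows modelled on the glance query in this report.
controllers_0_1 = [
    {"id": "77e1b61e-9f25-4177-89b6-b1ee2a0c4be1", "name": "bnc_nc_v1.0.2", "deleted": 0}
]
controller_2: List[Row] = []
print(diff_by_key(controllers_0_1, controller_2))
# -> (['77e1b61e-9f25-4177-89b6-b1ee2a0c4be1'], [], [])
```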

Additional info:
We put controller 2 into standby to limit further drift, but resources written to that data set are now being reported as missing.