Bug 2114013

Summary: ocf_heartbeat_galera - trouble with subsequent Galera instance(s)
Product: Red Hat Enterprise Linux 8
Component: resource-agents
Version: 8.8
Hardware: x86_64
OS: Linux
Status: ASSIGNED
Severity: urgent
Priority: unspecified
Reporter: lejeczek <peljasz>
Assignee: Oyvind Albrigtsen <oalbrigt>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cluster-maint, dciabrin, fdinitto, nwahl
Target Milestone: rc
Type: Bug

Description lejeczek 2022-08-02 14:54:18 UTC
Description of problem:

I have a 'galera' resource that manages the standard, OS-RPM Galera instance, and that resource works fine.
When I add a second 'galera' resource for a second, non-standard Galera instance, though using the same binaries from the OS RPM installation, that resource goes haywire.

Here is that second 'galera' resource:
-> $ pcs resource config mariadb-apps-clone
Clone: mariadb-apps-clone
  Meta Attributes: mariadb-apps-clone-meta_attributes
    failure-timeout=30s
    promotable=true
    promoted-max=2
    target-role=Stopped
  Resource: mariadb-apps (class=ocf provider=heartbeat type=galera)
    Attributes: mariadb-apps-instance_attributes
      check_passwd=pacemaker#98
      check_user=pacemaker
      cluster_host_map=drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7
      config=/apps/etc/mariadb-server.cnf
      datadir=/apps/mysql/data
      group=mysql
      log=/var/log/mariadb/maria-apps.log
      pid=/run/mariadb/maria-apps.pid
      socket=/var/lib/mysql/maria-apps.sock
      user=mysql
      wsrep_cluster_address=gcomm://10.0.1.6:5567,10.0.1.7:5567
    Meta Attributes: mariadb-apps-meta_attributes
      failure-timeout=30s
    Operations:
      demote: mariadb-apps-demote-interval-0s
        interval=0s
        timeout=120s
      monitor: mariadb-apps-monitor-interval-20s
        interval=20s
        timeout=30s
        OCF_CHECK_LEVEL=0
      monitor: mariadb-apps-monitor-interval-10s
        interval=10s
        timeout=30s
        role=Master
        OCF_CHECK_LEVEL=0
      monitor: mariadb-apps-monitor-interval-30s
        interval=30s
        timeout=30s
        role=Slave
        OCF_CHECK_LEVEL=0
      promote: mariadb-apps-promote-interval-0s
        interval=0s
        timeout=300s
      start: mariadb-apps-start-interval-0s
        interval=0s
        timeout=120s
      stop: mariadb-apps-stop-interval-0s
        interval=0s
        timeout=120s
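
For reference, a resource like the one above can be created in one go roughly as follows (a sketch assembled from the attributes listed; not necessarily the exact command used here, and pcs syntax may differ slightly between versions):

-> $ pcs resource create mariadb-apps ocf:heartbeat:galera \
       check_user=pacemaker check_passwd='pacemaker#98' \
       cluster_host_map='drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7' \
       config=/apps/etc/mariadb-server.cnf datadir=/apps/mysql/data \
       user=mysql group=mysql \
       log=/var/log/mariadb/maria-apps.log pid=/run/mariadb/maria-apps.pid \
       socket=/var/lib/mysql/maria-apps.sock \
       wsrep_cluster_address='gcomm://10.0.1.6:5567,10.0.1.7:5567' \
       meta failure-timeout=30s \
       promotable promoted-max=2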

The second Galera instance works as expected outside of pcs/pacemaker.
Galera config in /apps/etc/mariadb-server.cnf:

[mariadb]
port=3307

[mariadb-10.3]

[mysqld]
port=3307
datadir=/apps/mysql/data
socket=/var/lib/mysql/maria-apps.sock
log-error=/var/log/mariadb/maria-apps.log
pid-file=/run/mariadb/maria-apps.pid
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2

# Mandatory settings
wsrep_on=1
bind-address=0.0.0.0
#bind-address=10.0.1.7
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="apps"
wsrep_cluster_address="gcomm://10.0.1.6:5567,10.0.1.7:5567"
wsrep_node_address=10.0.1.7:5567
wsrep_sst_receive_address=10.0.1.7:5444
# Optional setting
wsrep_slave_threads=1
innodb_flush_log_at_trx_commit=0
wsrep_provider_options="ist.recv_addr=10.0.1.7:5568"
wsrep_sst_method=rsync
wsrep_sst_auth=root:

Instantiate the Galera cluster to confirm it works outside of 'pacemaker':
-> $ sudo -u mysql /usr/libexec/mysqld --defaults-file=/apps/etc/mariadb-server.cnf --wsrep-new-cluster
then on the other node(s):
-> $ sudo -u mysql /usr/libexec/mysqld --defaults-file=/apps/etc/mariadb-server.cnf 

The first issue, I think, is:
- the 'config' attribute does not seem to do much of anything: if the 'datadir' attribute is not specified, the resource agent tries to start MariaDB from '/var/lib/mysql', even though Galera's '/apps/etc/mariadb-server.cnf' already contains those settings.

The second issue is:
such a galera resource, set up as above, seems to think that it is working, i.e. started:
...
  * Clone Set: mariadb-apps-clone [mariadb-apps] (promotable):
    * mariadb-apps	(ocf::heartbeat:galera):	 Slave sucker.internal.ccn
    * mariadb-apps	(ocf::heartbeat:galera):	 Slave drunk.internal.ccn

but in reality no second Galera cluster has been started - the first Galera instance managed by pcs/pacemaker does get started and operates fine.

And it does not matter whether the first Galera resource is up & running or not; even with:
-> $ pcs resource disable mariadb-clone
and 'mariadb-apps-clone' re-enabled, pacemaker still shows:
...
  * Clone Set: mariadb-apps-clone [mariadb-apps] (promotable):
    * mariadb-apps	(ocf::heartbeat:galera):	 Slave sucker.internal.ccn
    * mariadb-apps	(ocf::heartbeat:galera):	 Slave drunk.internal.ccn

Version-Release number of selected component (if applicable):

resource-agents-4.9.0-27.el8.x86_64
pacemaker-2.1.4-4.el8.x86_64


Comment 1 Reid Wahl 2022-08-02 18:34:10 UTC
Can you share your full cluster configuration? An sosreport would be ideal if possible, so that we can also see the log files. (Or you could attach logs manually if you'd prefer.)

---

See also this mailing list thread started by the reporter:
  - [ClusterLabs] resource agent OCF_HEARTBEAT_GALERA issue/broken - ? (https://lists.clusterlabs.org/pipermail/users/2022-July/030433.html)

> First issue I think is:
> - 'config' attr does not do much/anything? because if 'datadir' attr is not specified then resource/agent wants to start Mariadb from '/var/lib/mysql' - and !..
> Galera's '/apps/etc/mariadb-server.cnf' does contain those bits anyway.

As discussed in that thread, the config option specifies the `--defaults-file` option that we use in the mysql start command. Another CLI option may override what's in the file, I presume.

However, it's not obvious without looking at the script that if datadir is not specified, the resource agent explicitly passes --datadir=/var/lib/mysql.
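
To make that concrete, here is a simplified sketch (not the literal agent code, and the flag order is from memory) of how the common start path assembles the command:
```
# 'config' becomes --defaults-file; 'datadir' falls back to /var/lib/mysql
# when the attribute is unset and is then passed explicitly on the CLI.
: ${OCF_RESKEY_datadir=/var/lib/mysql}

${OCF_RESKEY_binary} --defaults-file=${OCF_RESKEY_config} \
    --pid-file=${OCF_RESKEY_pid} \
    --socket=${OCF_RESKEY_socket} \
    --datadir=${OCF_RESKEY_datadir} \
    --user=${OCF_RESKEY_user} ...
```
Since --datadir ends up on the command line, it would override whatever datadir= the defaults file sets, which would explain why the database starts from /var/lib/mysql when the attribute is left out.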

I made a suggestion on the thread:

> It would be reasonable at least to add a note to the resource agent
> metadata, to say something like
> 
>   datadir: Directory containing databases. If this option is not
> specified, then --datadir=/var/lib/mysql will be used when starting
> the database.
> 
> instead of the current
> 
>   datadir: Directory containing databases

Comment 2 lejeczek 2022-08-03 06:55:33 UTC
In the logs:
...
ERROR: Could not determine initial best node from galera name <10.0.1.7:5567>.
ERROR: MySQL is not running
...

but again, this galera cluster starts perfectly fine outside of pcs cluster control.

Comment 3 Reid Wahl 2022-08-03 06:59:09 UTC
Can you share your full cluster configuration? An sosreport would be ideal if possible, so that we can also see the log files. (Or you could attach logs manually if you'd prefer.)

Comment 4 lejeczek 2022-08-03 07:27:07 UTC
When I attempt:
-> $ pcs resource debug-promote mariadb-apps --full

I do not see 'mysql_common_start' anywhere in the output.

Comment 5 Reid Wahl 2022-08-03 07:56:08 UTC
Can you share your full cluster configuration? An sosreport would be ideal if possible, so that we can also see the log files. (Or you could attach logs manually if you'd prefer.)

Comment 6 lejeczek 2022-08-03 09:30:08 UTC
Sorry, I cannot just yet.
Also, this is - as is the thread I started on the list - against CentOS 8, not RHEL; my bad. Feel free to change that.

And to cover all bits, here is the resource which works perfectly fine:
-> $ pcs resource config mariadb-clone
Clone: mariadb-clone
  Meta Attributes: mariadb-clone-meta_attributes
    promotable=true
    promoted-max=2
  Resource: mariadb (class=ocf provider=heartbeat type=galera)
    Attributes: mariadb-instance_attributes
      check_passwd=pacemaker#98
      check_user=pacemaker
      cluster_host_map=drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7
      group=mysql
      user=mysql
      wsrep_cluster_address=gcomm://10.0.1.6,10.0.1.7
    Meta Attributes: mariadb-meta_attributes
      failure-timeout=30s
    Operations:
...
So: in the very same pcs cluster, one galera resource works while the other does not.

Possibly interesting: when I change the attribute to:
      wsrep_cluster_address=gcomm://drunk.internal.ccn:5567,sucker.internal.ccn:5567
and adjust '/apps/etc/mariadb-server.cnf' accordingly, I can drop 'cluster_host_map'; as a result the 'ERROR: Could not determine initial best node from galera name' disappears, but:
...
ERROR: MySQL is not running
INFO: Waiting on node <drunk.internal.ccn:5567> to report database status before Master instances can start.
INFO: Waiting on node <sucker.internal.ccn:5567> to report database status before Master instances can start.
...
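
(For the record, applying that change looks roughly like the following - a sketch; as far as I know, setting an attribute to an empty value with 'pcs resource update' is what removes it:)

-> $ pcs resource update mariadb-apps \
       wsrep_cluster_address='gcomm://drunk.internal.ccn:5567,sucker.internal.ccn:5567' \
       cluster_host_map=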

I'm quite certain I took care of SELinux (non-standard ports & paths); I even tried permissive/disabled.
Now that I look at it - do you really think the 'global' config has anything to do with it, given that one galera resource works while the other does not?
Also, the only differences between the two - in theory - are different IP ports & paths (hence the extra attributes in the second resource).

Comment 7 lejeczek 2022-08-03 09:42:09 UTC
aha!!! This little thing seems to break the whole resource/agent:

wsrep_cluster_address="gcomm://10.0.1.6:5567,10.0.1.7:5567" -> breaks
VS
wsrep_cluster_address="gcomm://10.0.1.6,10.0.1.7" -> works okay

Is that expected behaviour?
You mention docs/man pages which you say should be better - I say: code/binaries with poor documentation, or none at all, are not worth much if they are meant for public use, however smart the code itself may be.

'wsrep_cluster_address' - another thing - it seems to be mandatory! And if it is, there is also no mention of that in the man page.

Comment 8 Reid Wahl 2022-08-03 10:13:00 UTC
(In reply to lejeczek from comment #7)
> aha!!! This little thing seems to break the whole resource/agent:
> 
> wsrep_cluster_address="gcomm://10.0.1.6:5567,10.0.1.7:5567" -> breaks
> VS
> wsrep_cluster_address="gcomm://10.0.1.6,10.0.1.7" -> works okay
> 
> Is that expected behaviour?

It may be improvable; I haven't gotten that far and will probably leave it to the resource-agents maintainer to decide. However, it makes sense that the current version of the agent can't properly parse the ports from the wsrep_cluster_address:
```
galera_to_pcmk_name()
{
    local galera=$1
    if [ -z "$OCF_RESKEY_cluster_host_map" ]; then
        echo $galera
    else
        echo "$OCF_RESKEY_cluster_host_map" | tr ';' '\n' | tr -d ' ' | sed 's/:/ /' | awk -F' ' '$2=="'"$galera"'" {print $1;exit}'
    fi
}
...
detect_first_master()
{
    ...
    all_nodes=$(echo "$OCF_RESKEY_wsrep_cluster_address" | sed 's/gcomm:\/\///g' | tr -d ' ' | tr -s ',' ' ')
    best_node_gcomm=$(echo "$all_nodes" | sed 's/^.* \(.*\)$/\1/')
    best_node=$(galera_to_pcmk_name $best_node_gcomm)
    if [ -z "$best_node" ]; then
        ocf_log err "Could not determine initial best node from galera name <${best_node_gcomm}>."
        return
    fi
...
}
```
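
To illustrate the mismatch with the values from this report, here are the same pipelines run by hand (a reproduction sketch, not agent output):
```
$ addr="gcomm://10.0.1.6:5567,10.0.1.7:5567"
$ map="drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7"

# detect_first_master(): the last entry of the address list keeps its port
$ all_nodes=$(echo "$addr" | sed 's/gcomm:\/\///g' | tr -d ' ' | tr -s ',' ' ')
$ best_node_gcomm=$(echo "$all_nodes" | sed 's/^.* \(.*\)$/\1/')
$ echo "$best_node_gcomm"
10.0.1.7:5567

# galera_to_pcmk_name(): the map's second field is "10.0.1.7" (no port), so the
# comparison against "10.0.1.7:5567" never matches and the result is empty
$ echo "$map" | tr ';' '\n' | tr -d ' ' | sed 's/:/ /' | \
    awk -F' ' '$2=="'"$best_node_gcomm"'" {print $1;exit}'
$
```
An empty result here is what produces the "Could not determine initial best node from galera name <10.0.1.7:5567>" error shown in comment 2.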

> You mention docs/man pages which you say should be better - I say: code/binaries
> with poor documentation, or none at all, are not worth much if they are meant
> for public use, however smart the code itself may be.
> 
> 'wsrep_cluster_address' - another thing - it seems to be mandatory! And if it
> is, there is also no mention of that in the man page.

It's listed as required, and it's listed as "gcomm://node,node,node" -- not "gcomm://node[:port],node[:port],node[:port]". The documentation also says that the wsrep_cluster_address entries "are expected to match valid pacemaker node names. If both names need to differ, you must provide a mapping in option cluster_host_map." In this case, the mapping didn't match the wsrep_cluster_address because of the ports appended in the address.
```
[root@fastvm-rhel-8-0-23 ~]# rpm -q resource-agents
resource-agents-4.9.0-16.el8.x86_64
[root@fastvm-rhel-8-0-23 ~]# pcs resource describe ocf:heartbeat:galera
...
Resource options:
  ...
  wsrep_cluster_address (required): The galera cluster address. This takes the form of: gcomm://node,node,node Only nodes present in this node list will be allowed to start a galera instance. The galera node
                                    names listed in this address are expected to match valid pacemaker node names. If both names need to differ, you must provide a mapping in option cluster_host_map.

```

Comment 9 lejeczek 2022-08-03 10:33:39 UTC
In their own configs, mariadb/galera allow ports to be omitted - the code sets default ones - so I guarantee that people coming to this resource (and every other one, too) will assume that ports can either be present or absent, just like I did.
The docs could cover this better, leaving no doubt.

Comment 10 lejeczek 2023-01-02 18:38:28 UTC
The issue is back. In comment #7 I wrote:

wsrep_cluster_address="gcomm://10.0.1.6,10.0.1.7" -> works okay

but, a few rpm updates later, it does not any more.

The second resource "instance" now fails with:

  * mariadb-apps_promote_0 on sucker.internal.ccn 'error' (1): call=1085, status='complete', exitreason='Failed initial monitor action', last-rc-change='Mon Jan  2 18:13:10 2023', queued=0ms, exec=9618ms

Everything - resource & mariadb configs - remains the same as shown above.
Also, again: when that Galera (the "second/non-regular" resource) is started at the same moment but outside of pcs/HA, with the commands I showed above, that Galera cluster runs just fine.

What I also noticed - which I think might point to where the culprit is - is that when I start the first/regular Galera resource (which was disabled), the second resource "amazingly" starts and sees no monitor problems, as soon as that first Galera resource has elected/promoted its Galera cluster.

As soon as I disable that first Galera resource - while both resources were working fine - pcs/HA stops/fails the second resource with the same:

  * mariadb-apps_promote_0 on sucker.internal.ccn 'error' (1): call=1207, status='complete', exitreason='Failed initial monitor action', last-rc-change='Mon Jan  2 18:24:30 2023', queued=0ms, exec=7555ms
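
(A possible way to check whether the agent has recorded the per-node status it waits for before promoting - a sketch; the attribute name below is assumed from the agent's '<resource-name>-last-committed' convention, so adjust as needed:)

-> $ crm_mon -A1   # one-shot status including transient node attributes
-> $ crm_attribute -N sucker.internal.ccn -l reboot --name mariadb-apps-last-committed --query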

"first/regular" resource:
-> $ pcs resource config mariadb
Resource: mariadb (class=ocf provider=heartbeat type=galera)
  Attributes: mariadb-instance_attributes
    check_passwd=pacemaker#98
    check_user=pacemaker
    cluster_host_map=drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7
    group=mysql
    user=mysql
    wsrep_cluster_address=gcomm://10.0.1.6,10.0.1.7
  Meta Attributes: mariadb-meta_attributes
    failure-timeout=30s
  Operations:
    demote: mariadb-demote-interval-0s
      interval=0s
      timeout=120s
    monitor: mariadb-monitor-interval-20s
      interval=20s
      timeout=30s
      OCF_CHECK_LEVEL=0
    monitor: mariadb-monitor-interval-10s
      interval=10s
      timeout=30s
      role=Master
      OCF_CHECK_LEVEL=0
    monitor: mariadb-monitor-interval-30s
      interval=30s
      timeout=30s
      role=Slave
      OCF_CHECK_LEVEL=0
    promote: mariadb-promote-interval-0s
      interval=0s
      timeout=300s
    start: mariadb-start-interval-0s
      interval=0s
      timeout=120s
    stop: mariadb-stop-interval-0s
      interval=0s
      timeout=120s

This "first/regular" resource is not affected, does not fail, if/when the "second/non-regular" resource is stopped/disabled.

many thanks, L.