Bug 2176424

Summary: Checksum error at offset
Product: Red Hat Enterprise Linux 8
Reporter: shan.wu <shan.wu>
Component: lvm2
Assignee: David Teigland <teigland>
lvm2 sub component: LVM lock daemon / lvmlockd
QA Contact: cluster-qe <cluster-qe>
Status: ASSIGNED
Severity: urgent
Priority: high
CC: agk, bstinson, heinzm, jbrassow, jwboyer, prajnoha, teigland, zkabelac
Version: CentOS Stream
Keywords: Triaged
Target Milestone: rc
Hardware: x86_64
OS: Linux
Attachments:
- pvck --dump metadata
- output of pvs -vvvv
- output of pvck --dump headers
- output of pvck --dump metadata_all -v
- lvm script
- lvm-read-test
- lvm.conf
- sanlock.conf
- host-14-sanlock.log
- host-40-sanljock.log

Description shan.wu 2023-03-08 11:07:49 UTC
Created attachment 1948968 [details]
pvck --dump metadata


Description of problem:
The error is reported as follows (lvm version 2.03.11):
[root@172-27-31-11 ~]# vgs --nolocking
  WARNING: File locking is disabled.
  /dev/mapper/3378dd83538f55425: Checksum error at offset 1820485120
  WARNING: invalid metadata text from /dev/mapper/3378dd83538f55425 at 1820485120.
  WARNING: metadata on /dev/mapper/3378dd83538f55425 at 1820485120 has invalid summary for VG.
  WARNING: bad metadata text on /dev/mapper/3378dd83538f55425 in mda1
  WARNING: scanning /dev/mapper/3378dd83538f55425 mda1 failed to read metadata summary.
  WARNING: repair VG metadata on /dev/mapper/3378dd83538f55425 with vgck --updatemetadata.
  WARNING: scan failed to get metadata summary from /dev/mapper/3378dd83538f55425 PVID kZXmHW6o8K2jBXklareOwOcdUWl3nOLl
  /dev/mapper/335fc1a306a17644c: Checksum error at offset 1820485120
  WARNING: invalid metadata text from /dev/mapper/335fc1a306a17644c at 1820485120.
  WARNING: metadata on /dev/mapper/335fc1a306a17644c at 1820485120 has invalid summary for VG.
  WARNING: bad metadata text on /dev/mapper/335fc1a306a17644c in mda1
  WARNING: scanning /dev/mapper/335fc1a306a17644c mda1 failed to read metadata summary.
  WARNING: repair VG metadata on /dev/mapper/335fc1a306a17644c with vgck --updatemetadata.
  WARNING: scan failed to get metadata summary from /dev/mapper/335fc1a306a17644c PVID 8t58Vvo2FTwy1LlEfyfke2LKPbDEkj5R
  /dev/mapper/33d6c93fbb2cfa457: Checksum error at offset 1820485120
  WARNING: invalid metadata text from /dev/mapper/33d6c93fbb2cfa457 at 1820485120.
  WARNING: metadata on /dev/mapper/33d6c93fbb2cfa457 at 1820485120 has invalid summary for VG.
  WARNING: bad metadata text on /dev/mapper/33d6c93fbb2cfa457 in mda1
  WARNING: scanning /dev/mapper/33d6c93fbb2cfa457 mda1 failed to read metadata summary.
  WARNING: repair VG metadata on /dev/mapper/33d6c93fbb2cfa457 with vgck --updatemetadata.
  WARNING: scan failed to get metadata summary from /dev/mapper/33d6c93fbb2cfa457 PVID 1f8QNk54zf76Z1EdQNoA08bv44fEBJ6D



How reproducible: Constantly create, extend, activate and delete LVs.
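
For reference, a minimal loop in the spirit of the reproducer (the real scripts are attached as "lvm script" and "lvm-read-test"; the VG name and LV sizes here are placeholders):

  VG=vg1                                # placeholder shared VG name
  while true; do
    lvcreate -an -n test1 -L 1G "$VG"   # create without activating
    lvextend -L +1G "$VG/test1"         # extend
    lvchange -ay "$VG/test1"            # activate (takes the LV lock via lvmlockd)
    lvchange -an "$VG/test1"            # deactivate
    lvremove -y "$VG/test1"             # delete
  done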



Actual results: all PVs report checksum errors.


Expected results: normal operation, with no checksum errors.


Additional info: I found that the end of the metadata read out with pvck --dump metadata is incomplete, because mda_header_1.raw_locn[0].size (seen with pvck --dump headers) is lower than the actual metadata size (seen with pvck --dump metadata_all), even though the metadata itself is complete on the actual block device.
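
For reference, a minimal sketch of the comparison described above (the device name is one of the PVs from this report, and the grep pattern is only illustrative of the raw_locn fields shown in the attached dumps):

  DEV=/dev/mapper/3378dd83538f55425

  # size/offset/checksum the mda header records for the current metadata copy
  pvck --dump headers "$DEV" | grep 'raw_locn\[0\]'

  # all metadata copies actually present in the metadata area, with their real sizes
  pvck --dump metadata_all -v "$DEV"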

Comment 1 David Teigland 2023-03-08 22:23:28 UTC
This sort of problem is most likely caused by missing locking between hosts.

Please attach the script you are using, or the sequence of exact commands that are being run on each host when the errors appear.  I will need to look at these commands for missing locking.

Is this a persistent or transient condition?  i.e. if you stop all commands and run a single "pvs" command, does it report the errors?  If it is persistent, then please attach:
- The output of a single "pvs -vvvv" while no other commands are running.
- The output of pvck --dump headers <device> from each PV in the VG.
- The output of pvck --dump metadata_all -v <device> from each PV in the VG.
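
For example, a collection sketch for the three items above (the VG name and output directory are placeholders, not part of the original request):

  VG=myvg
  OUT=/tmp/lvm-debug; mkdir -p "$OUT"

  # single pvs -vvvv while no other commands are running
  pvs -vvvv > "$OUT/pvs-vvvv.txt" 2>&1

  # pvck dumps from each PV in the VG
  for dev in $(pvs --noheadings -o pv_name --select vg_name="$VG"); do
    name=$(basename "$dev")
    pvck --dump headers "$dev" > "$OUT/headers-$name.txt" 2>&1
    pvck --dump metadata_all -v "$dev" > "$OUT/metadata_all-$name.txt" 2>&1
  done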

If it's a transient problem, i.e. a single "pvs" command does not report the errors, then I will need to try to reproduce the problem using the sequence of commands you provide.

Comment 2 shan.wu 2023-03-09 02:09:49 UTC
Created attachment 1949192 [details]
output of pvs -vvvv

Comment 3 shan.wu 2023-03-09 02:14:38 UTC
Created attachment 1949193 [details]
output of pvck --dump headers

Comment 4 shan.wu 2023-03-09 02:17:32 UTC
Created attachment 1949194 [details]
output of pvck --dump metadata_all -v

Comment 5 shan.wu 2023-03-09 02:28:25 UTC
Created attachment 1949197 [details]
lvm script

Comment 6 shan.wu 2023-03-09 02:56:03 UTC
The LVM operations use lvmlockd and sanlock for locking between hosts. I suspect this is caused by an incorrect size and offset in the metadata headers.

Comment 7 shan.wu 2023-03-09 06:51:33 UTC
I am now certain that the problem is caused by the metadata size stored in the metadata header being smaller than the actual size on the block device.

Comment 8 David Teigland 2023-03-10 20:49:13 UTC
Thanks for the info, I've begun looking through it.  I will attempt to reproduce it myself next.

Please provide the output of:
rpm -q lvm2
rpm -q sanlock
lvm --version
sanlock -V

If the versions are old enough, I may need you to update to recent versions of both lvm and sanlock to check if the problem still exists in current versions (2.03.11 is quite old, so you should be using the latest RHEL8 update, or even better try on RHEL9.)

It looks like you have 10 hosts running this script at once?

Is each host running the script on both shared VGs at once?

I'm guessing that there are there 20 instances of the script running concurrently across 10 hosts and 2 VGs?

How long do you run this test before the problem appears?

It would be helpful to scale down the test to reduce the volume of data to analyze, e.g. will the problem appear if you run the script on only one VG? then on only 5 hosts?

If this problem persists with recent versions, then I'll probably need to capture the debug logging from the lvm command that writes the bad metadata header.  Because of the scale of the test, we should set up logging to limit the amount of data.  By setting the LVM_LOG_FILE_EPOCH env var, each lvm command will write a separate log file (unfortunately, LVM_LOG_FILE_EPOCH can only include letters.)  I'd suggest setting LVM_LOG_FILE_EPOCH to "hosta", "hostb", ... on each host running the script.  Then add this option to each command line in the script: --config 'log {level=7 file=/tmp/lvm}'   (or save them somewhere else like /var/log)

This will result in log files named like: /tmp/lvm_hosta_1916277_2662401585
where the first number is the command pid and the second a time value.

You could use some method to limit the number of log files, e.g. keep the last 200 files/commands, or keep the last 5 minutes.
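
Putting the pieces above together, one possible per-host setup (the "hosta" tag, the paths, and the 5-minute cleanup policy are only examples):

  export LVM_LOG_FILE_EPOCH=hosta      # letters only, unique per host

  # every lvm command in the test script gets the logging option, e.g.
  # ($VG is a placeholder for the shared VG used by the script):
  lvcreate --config 'log {level=7 file=/tmp/lvm}' -an -n test1 -L 1G "$VG"

  # drop log files older than 5 minutes so the volume of data stays manageable
  find /tmp -maxdepth 1 -name "lvm_${LVM_LOG_FILE_EPOCH}_*" -mmin +5 -delete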

Stop the script immediately when the error appears, and collect data for analysis:
- collect all the command log files from all the hosts (as described above)
- collect "pvck --dump headers" from all PVs in the faulty VG
- collect "pvck --dump metadata_all -v" from all PVs in the faulty VG

Comment 9 shan.wu 2023-03-11 10:53:46 UTC
lvm2-2.03.11-5.el7.x86_64
sanlock-3.8.3-9.g5d2ac84.el7.x86_64

lvm version:
 LVM version:     2.03.11(2)-RHEL8 (2021-01-28)
  Library version: 1.02.175-RHEL8 (2021-01-28)
  Driver version:  4.37.1
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-fsadm --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-cmdlib --enable-dmeventd --with-cluster=internal --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --with-cache=internal --enable-lvmpolld --enable-lvmlockd-sanlock --enable-dmfilemapd --with-writecache=internal --with-vdo=internal --with-vdo-format=/usr/bin/vdoformat --with-integrity=internal --disable-silent-rules

sanlock -V:
sanlock 3.8.3 (built Mar 22 2022 04:30:43)


Yes, I have 10 physical machines. Each machine runs two scripts at the same time (one read and one write; the read commands are given the -t option). The scripts ran for about half a day. There are two VGs on each machine, but only one VG is used by the scripts. This problem did not seem to occur earlier when there was only one VG.

Comment 10 shan.wu 2023-03-11 10:59:26 UTC
Created attachment 1949823 [details]
lvm-read-test

Comment 11 David Teigland 2023-03-13 22:44:07 UTC
Those are strange versions; are these customized/non-standard OS installations?

lvm-2.03.11 and sanlock-3.8.3 are RHEL8 versions, so "el7" in the rpm must be incorrect?

It sounds like the setup and test is as follows:
10 machines
2 shared VGs with sanlock
3 PVs in each VG
one VG is not used during the test
lvm-r and lvm-w scripts are run in one VG by all nodes concurrently

The lvm-w.sh script creates/changes/removes one LV at a time.  If one of these scripts runs on each host, then I would expect to see 10 different LVs in the VG.  But, the pvs output in comment 2 shows 174 LVs.  Do you know why this is?

Do the commands run from the lvm-w script ever report the checksum errors?

The commands run from lvm-r use --nolocking, which can cause those commands to see inconsistent metadata on disk.  I'll investigate whether this might include checksum errors.

One of my original questions was whether these checksum errors are persistent, i.e. once the error appears, if you stop all commands and run a single "pvs" command, does that pvs command still report the same checksum error?  I can't quite tell if you answered that.
Thanks

Comment 12 shan.wu 2023-03-14 07:26:02 UTC
Yes, the checksum error is persistent (not only while reading the VG), and it ultimately makes it impossible to create LVs in the VG.

Comment 13 shan.wu 2023-03-14 07:28:17 UTC
I installed lvm2 2.03.11 on Red Hat 7 by rebuilding the RHEL 8 SRPM.

Comment 14 David Teigland 2023-03-16 14:36:09 UTC
I've been running the r&w scripts as described for a couple days and haven't reproduced the errors (using 7 nodes).  I'm using the latest lvm2 from the upstream git and the latest upstream sanlock release.  I suggest you try testing with the latest versions to see if the problem still exists.  If you still get the errors, then configure per-command debug logging and collect the info described in comment 8.

Comment 15 shan.wu 2023-03-29 02:09:39 UTC
(In reply to David Teigland from comment #14)
> I've been running the r&w scripts as described for a couple days and haven't
> reproduced the errors (using 7 nodes).  I'm using the latest lvm2 from the
> upstream git and the latest upstream sanlock release.  I suggest you try
> testing with the latest versions to see if the problem still exists.  If you
> still get the errors, then configure per-command debug logging and collect
> the info described in comment 8.

"I have successfully replicated this issue recently, but it is not consistent with the previous phenomenon. This time, I conducted a read and write test on a vg with three pvs, resulting in two pvs having metadata corruption, and one is normal. From my comparison of good and bad pv metadata, the checksum of the latest metadata copies of the two pvs with metadata corruption is equal to the checksum of the same sequence number in the good pv, and because their sequence numbers are consistent,", "The metadata copy is inconsistent, so I suspect that concurrent metadata writing can cause this problem. It can also cause all pvs to be corrupted and the metadata size to be incorrect, as before."

Comment 16 shan.wu 2023-03-29 02:19:09 UTC
"It should be caused by two lvrename commands writing metadata at the same time. Because they both executed successfully, my log did not collect the detailed output of lvrename - vvvv. Therefore, I would like to ask if there is a problem with my lvm or sanlock configuration."? Because I found from the log that a command would take a long time to complete without timeout, the configuration for lvm and sanlock is attached. The following are the startup commands for lvmlockd and sanlock:

/sbin/lvmlockd --daemon-debug --sanlock-timeout 20 --adopt 1
/usr/sbin/sanlock daemon

Comment 17 shan.wu 2023-03-29 02:22:44 UTC
Created attachment 1954340 [details]
lvm.conf

Comment 18 shan.wu 2023-03-29 02:23:07 UTC
Created attachment 1954341 [details]
sanlock.conf

Comment 19 David Teigland 2023-04-06 18:43:32 UTC
I suggest that you remove "renewal_read_extend_sec = 24" and use the default behavior.  I don't know what will happen if you use a value larger than the io timeout.

When you set use_watchdog=0, then you should expect to see corruption if a host's delta lease expires.  Disabling the watchdog can be useful during testing, but you need to manually check if a delta lease expires.
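
For clarity, a sketch of the suggested sanlock.conf changes (only the two settings discussed here; everything else in the attached file stays as it is):

  # remove this line and rely on the default behavior:
  #renewal_read_extend_sec = 24

  # re-enable the watchdog outside of controlled tests, so an expired
  # delta lease fences the host instead of risking corruption:
  use_watchdog = 1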

What are the "zstack" settings used for?

Comment 20 shan.wu 2023-04-13 06:04:48 UTC
I found that two hosts simultaneously obtained the exclusive VG lock and both performed an operation that writes VG metadata, resulting in metadata corruption. Below are some lvmlockd logs:

host-1:
Mar 27 21:13:09 172-27-31-14 lvmlockd: 1679922789 recv lvrename[13480] cl 267202 update vg "98e69d79725f49e2bcafdcd2f1a5ff9c" mode iv flags 0
Mar 27 21:13:09 172-27-31-14 lvmlockd: 1679922789 S lvm_98e69d79725f49e2bcafdcd2f1a5ff9c R VGLK action update iv
Mar 27 21:13:09 172-27-31-14 lvmlockd: 1679922789 S lvm_98e69d79725f49e2bcafdcd2f1a5ff9c R VGLK res_update cl 267202 lk version to 163122
Mar 27 21:13:09 172-27-31-14 lvmlockd: 1679922789 send lvrename[13480] cl 267202 update vg rv 0

host-2:
Mar 27 21:13:08 172-27-31-40 lvmlockd: 1679922788 recv lvrename[17764] cl 4518 update vg "98e69d79725f49e2bcafdcd2f1a5ff9c" mode iv flags 0
Mar 27 21:13:08 172-27-31-40 lvmlockd: 1679922788 S lvm_98e69d79725f49e2bcafdcd2f1a5ff9c R VGLK action update iv
Mar 27 21:13:08 172-27-31-40 lvmlockd: 1679922788 S lvm_98e69d79725f49e2bcafdcd2f1a5ff9c R VGLK res_update cl 4518 lk version to 163122
Mar 27 21:13:08 172-27-31-40 lvmlockd: 1679922788 send lvrename[17764] cl 4518 update vg rv 0

After investigating this, I don't have many ideas, and the problem is also difficult to reproduce.

Comment 21 David Teigland 2023-04-13 16:25:29 UTC
Can you include messages from /var/log/sanlock.log covering this time?

Comment 22 shan.wu 2023-04-14 02:19:56 UTC
Created attachment 1957424 [details]
host-14-sanlock.log

Comment 23 shan.wu 2023-04-14 02:20:21 UTC
Created attachment 1957425 [details]
host-40-sanljock.log

Comment 24 David Teigland 2023-04-14 21:18:29 UTC
Thanks, I suspect that the following sections of sanlock.log are showing the problem in part, but I'll probably need to see sanlock.log from all the hosts covering this same event.  Many hosts are attempting to acquire the same VG lease at once, so all those hosts participate in the paxos algorithm.  For example, the log from host_id 66 shows that host_id 92 committed one owner value, but I don't have the log from host_id 92 to see why it did that.

Also, could you check if host_id 66 and host_id 101 are the two hosts you showed in the comment 20 lvrename messages?


host_id 66:

2023-03-27 21:12:29 1146535 [31507]: cmd_acquire 2,9,14028 ci_in 2 fd 9 count 1 flags 9
2023-03-27 21:12:29 1146535 [31507]: s1:r2 paxos_acquire begin offset 69206016 0xa 512 1048576
2023-03-27 21:12:30 1146536 [31507]: s1:r2 leader 1886509 dblocks 12:4379892013:0:0:0:0:1886509:0,20:4379894021:0:0:0:0:1886509:0,27:4379892028:0:0:0:0:1886509:0,31:4379894032:0:0:0:0:1886509:0,
2023-03-27 21:12:30 1146536 [31507]: s1:r2 leader 1886509 dblocks 39:4379892040:0:0:0:0:1886509:0,45:4379892046:0:0:0:0:1886509:0,46:4379890047:0:0:0:0:1886509:0,50:4379894051:4379894051:92:1:1146639:1886509:0,
2023-03-27 21:12:30 1146536 [31507]: s1:r2 leader 1886509 dblocks 58:4379894059:4379894059:92:1:1146639:1886509:0,65:4379892066:0:0:0:0:1886509:0,71:4379890072:0:0:0:0:1886509:0,82:4379892083:0:0:0:0:1886509:0,
2023-03-27 21:12:30 1146536 [31507]: s1:r2 leader 1886509 owner 92 1 1146639 dblocks 90:4379892091:0:0:0:0:1886509:0,91:4379892092:0:0:0:0:1886509:0,
2023-03-27 21:12:30 1146536 [31507]: s1:r2 paxos_acquire leader 1886509 owner 92 1 1146639 max mbal[58] 4379894059 our_dblock 4379892066 0 0 0 0 1886509
2023-03-27 21:12:30 1146536 [31507]: s1:r2 paxos_acquire owner 92 1 1146639 host_status 92 1 1146596 wait_start 1146495
2023-03-27 21:12:32 1146537 [31507]: s1:r2 paxos_acquire owner 92 delta 92 1 1146638 alive
2023-03-27 21:12:33 1146539 [31507]: s1:r2 paxos_acquire owner 92 1 1146639 writer 59 owner dblock released
2023-03-27 21:12:34 1146540 [31507]: s1:r2 paxos_acquire leader 1886509 owner 92 1 1146639
2023-03-27 21:12:34 1146540 [31507]: s1:r2 ballot 1886510 phase1 write mbal 4379896066
2023-03-27 21:12:37 1146543 [31507]: s1:r2 ballot 1886510 phase1 read 20:4379896021:0:0:0:0:1886510:0,45:4379896046:0:0:0:0:1886510:0,65:4379896066:0:0:0:0:1886510:0,
2023-03-27 21:12:37 1146543 [31507]: s1:r2 ballot 1886510 phase2 write bal 4379896066 inp 66 1 1146543 q_max -1
2023-03-27 21:12:40 1146545 [31507]: s1:r2 ballot 1886510 phase2 read 12:4379896013:0:0:0:0:1886510:0,20:4379896021:0:0:0:0:1886510:0,44:4379896045:0:0:0:0:1886510:0,45:4379896046:0:0:0:0:1886510:0,
2023-03-27 21:12:40 1146545 [31507]: s1:r2 ballot 1886510 abort2 larger mbal in bk[80] 4379896081:0:0:0:0:1886510 our dblock 4379896066:4379896066:66:1:1146543:1886510
2023-03-27 21:12:40 1146545 [31507]: s1:r2 ballot 1886510 phase2 read 65:4379896066:4379896066:66:1:1146543:1886510:0,80:4379896081:0:0:0:0:1886510:0,
2023-03-27 21:12:40 1146545 [31507]: s1:r2 paxos_acquire 1886510 retry delay 684582 us
2023-03-27 21:12:42 1146548 [31507]: s1:r2 paxos_acquire leader 1886509 owner 92 1 1146639
2023-03-27 21:12:42 1146548 [31507]: s1:r2 ballot 1886510 phase1 write mbal 4379898066
2023-03-27 21:12:45 1146551 [31507]: s1:r2 ballot 1886510 phase1 read 12:4379896013:0:0:0:0:1886510:0,20:4379898021:0:0:0:0:1886510:0,39:4379896040:0:0:0:0:1886510:0,44:4379898045:0:0:0:0:1886510:0,
2023-03-27 21:12:45 1146551 [31507]: s1:r2 ballot 1886510 phase1 read 45:4379898046:0:0:0:0:1886510:0,46:4379896047:0:0:0:0:1886510:0,50:4379896051:0:0:0:0:1886510:0,65:4379898066:0:0:0:0:1886510:0,
2023-03-27 21:12:45 1146551 [31507]: s1:r2 ballot 1886510 abort1 larger mbal in bk[80] 4379898081:0:0:0:0:1886510 our dblock 4379898066:0:0:0:0:1886510
2023-03-27 21:12:45 1146551 [31507]: s1:r2 ballot 1886510 phase1 read 71:4379896072:0:0:0:0:1886510:0,80:4379898081:0:0:0:0:1886510:0,
2023-03-27 21:12:45 1146551 [31507]: s1:r2 paxos_acquire 1886510 retry delay 739602 us
2023-03-27 21:12:47 1146553 [31507]: s1:r2 paxos_acquire leader 1886510 owner 66 1 1146543
2023-03-27 21:12:47 1146553 [31507]: s1:r2 paxos_acquire 1886510 owner is our inp 66 1 1146543 committed by 92
2023-03-27 21:12:47 1146553 [31507]: s1:r2 acquire_disk rv 1 lver 1886510 at 1146543
2023-03-27 21:12:49 1146555 [31507]: cmd_acquire 2,9,14028 result 0 pid_dead 0

(This shows that host_id 92 decided to set host_id 66 as the owner for lver 1886510.)


host_id 101:

2023-03-27 21:12:31 1146471 [30594]: cmd_acquire 2,9,12513 ci_in 2 fd 9 count 1 flags 9
2023-03-27 21:12:31 1146471 [30594]: s1:r2 paxos_acquire begin offset 69206016 0xa 512 1048576
2023-03-27 21:12:33 1146473 [30594]: s1:r2 leader 1886509 dblocks 12:4379892013:0:0:0:0:1886509:0,20:4379894021:0:0:0:0:1886509:0,27:4379892028:0:0:0:0:1886509:0,31:4379894032:0:0:0:0:
2023-03-27 21:12:33 1146473 [30594]: s1:r2 leader 1886509 dblocks 39:4379892040:0:0:0:0:1886509:0,45:4379892046:0:0:0:0:1886509:0,46:4379890047:0:0:0:0:1886509:0,50:4379894051:43798940
2023-03-27 21:12:33 1146473 [30594]: s1:r2 leader 1886509 dblocks 58:4379894059:4379894059:92:1:1146639:1886509:0,65:4379892066:0:0:0:0:1886509:0,71:4379890072:0:0:0:0:1886509:0,82:437
2023-03-27 21:12:33 1146473 [30594]: s1:r2 leader 1886509 owner 92 1 1146639 dblocks 90:4379892091:0:0:0:0:1886509:0,91:4379892092:0:0:0:0:1886509:1,
2023-03-27 21:12:33 1146473 [30594]: s1:r2 paxos_acquire leader 1886509 owner 92 1 1146639 max mbal[58] 4379894059 our_dblock 0 0 0 0 0 0
2023-03-27 21:12:33 1146473 [30594]: s1:r2 paxos_acquire owner 92 1 1146639 host_status 92 1 1146638 wait_start 1146459
2023-03-27 21:12:34 1146474 [30594]: s1:r2 paxos_acquire owner 92 delta 92 1 1146638 alive
2023-03-27 21:12:36 1146475 [30594]: s1:r2 paxos_acquire owner 92 1 1146639 writer 59 owner dblock released
2023-03-27 21:12:37 1146477 [30594]: s1:r2 paxos_acquire leader 1886509 owner 92 1 1146639
2023-03-27 21:12:37 1146477 [30594]: s1:r2 ballot 1886510 phase1 write mbal 4379896101
2023-03-27 21:12:41 1146480 [30594]: s1:r2 ballot 1886510 abort1 larger mbal in bk[20] 4379898021:0:0:0:0:1886510 our dblock 4379896101:0:0:0:0:1886510
2023-03-27 21:12:41 1146480 [30594]: s1:r2 ballot 1886510 phase1 read 12:4379896013:0:0:0:0:1886510:0,20:4379898021:0:0:0:0:1886510:0,
2023-03-27 21:12:41 1146480 [30594]: s1:r2 paxos_acquire 1886510 retry delay 207718 us
2023-03-27 21:12:42 1146482 [30594]: s1:r2 paxos_acquire leader 1886509 owner 92 1 1146639
2023-03-27 21:12:42 1146482 [30594]: s1:r2 ballot 1886510 phase1 write mbal 4379898101
2023-03-27 21:12:46 1146486 [30594]: s1:r2 ballot 1886510 phase1 read 12:4379896013:0:0:0:0:1886510:0,20:4379898021:0:0:0:0:1886510:0,39:4379896040:0:0:0:0:1886510:0,44:4379898045:0:0:
2023-03-27 21:12:46 1146486 [30594]: s1:r2 ballot 1886510 phase1 read 45:4379898046:0:0:0:0:1886510:0,46:4379896047:0:0:0:0:1886510:0,50:4379896051:0:0:0:0:1886510:0,65:4379898066:0:0:
2023-03-27 21:12:46 1146486 [30594]: s1:r2 ballot 1886510 phase1 read 71:4379896072:0:0:0:0:1886510:0,80:4379898081:0:0:0:0:1886510:0,82:4379898083:0:0:0:0:1886510:0,100:4379898101:0:0
2023-03-27 21:12:46 1146486 [30594]: s1:r2 ballot 1886510 phase2 write bal 4379898101 inp 101 1 1146486 q_max -1
2023-03-27 21:12:50 1146490 [30594]: s1:r2 ballot 1886510 phase2 read 12:4379896013:0:0:0:0:1886510:0,20:4379898021:0:0:0:0:1886510:0,39:4379896040:0:0:0:0:1886510:0,44:4379898045:0:0:
2023-03-27 21:12:50 1146490 [30594]: s1:r2 ballot 1886510 phase2 read 45:4379898046:0:0:0:0:1886510:0,46:4379896047:0:0:0:0:1886510:0,50:4379896051:0:0:0:0:1886510:0,65:4379898066:0:0:
2023-03-27 21:12:50 1146490 [30594]: s1:r2 ballot 1886510 phase2 read 71:4379896072:0:0:0:0:1886510:0,80:4379898081:0:0:0:0:1886510:0,82:4379898083:0:0:0:0:1886510:0,100:4379898101:437
2023-03-27 21:12:52 1146491 [30594]: s1:r2 ballot 1886510 commit self owner 101 1 1146486
2023-03-27 21:12:52 1146491 [30594]: s1:r2 acquire_disk rv 1 lver 1886510 at 1146486
2023-03-27 21:12:53 1146493 [30594]: cmd_acquire 2,9,12513 result 0 pid_dead 0

(This shows that host_id 101 decided to make itself the owner of lver 1886510.)