So far the reproducibility is 2/3, and the hang happens only when a non-synchronized mirror loses a leg. Will run more tests to get
a) more accurate reproducibility numbers and
b) more debugging information.
Comment 6 Jonathan Earl Brassow
2014-04-10 15:57:21 UTC
We are talking about killing a leg before the mirror is fully initialized, right?
Which device are you killing? You say "leg[1]". Does that imply *_mimage_1?
I've performed this test, and I can get it to hang for a while, but only as long as error messages are streaming to the console. After that, it seems to complete.
Any more advice for reproducing? Do you have any other I/O happening at the same time as the background sync?
(In reply to Jonathan Earl Brassow from comment #6)
> We are talking about killing a leg before the mirror is fully initialized,
> right?
Yes. The secondary leg is killed before the mirror is fully synced.
>
> Which device are you killing? You say "leg[1]". Does that imply *_mimage_1?
>
Nope, '[1]' is related to:
> [1] Used recommended `echo 1 > .../delete` and udev event was generated (and pvscan --cache executed though lvmetad not configured.)

> I've performed this test, and I've got it to hang for me for a while, but
> only as long as error messages are streaming to the console. After that, it
> seems to complete.
>
> Any more advice for reproducing? Do you have any other I/O happening at the
> same time as the background sync?
Yes, there were 2 processes writing to the FS on the LV.
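For context, footnote [1]'s device removal uses the kernel's sysfs "delete" hook. A minimal sketch of what that looks like (the block only prints the commands; /dev/sdc is a hypothetical stand-in for the killed leg, since the real path in the report is truncated):

```shell
#!/bin/sh
# Print the sysfs device-removal sequence from footnote [1].
# /dev/sdc is a hypothetical stand-in for the mirror leg being killed.
sysfs_delete_cmds() {
    cat <<'EOF'
echo 1 > /sys/block/sdc/device/delete   # detach the SCSI device; fires a udev remove event
pvscan --cache                          # what udev then ran (lvmetad was not configured)
EOF
}
sysfs_delete_cmds
```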
Comment 9 Jonathan Earl Brassow
2014-06-10 03:08:47 UTC
I've tried this a couple different ways with rhel6.6 RPMs and upstream code. I've not been able to reproduce it. Here are the steps I use:
1) create fresh VG
2) create 10G mirror
3) kill secondary image while mirror recovery is happening
I also start I/O (I've used both buffered and direct) before and after killing the device. If I start I/O after killing the device, it ensures that dmeventd first responds to a 'sync' error. If I start I/O before killing the device, it gives a chance of hitting an 'i/o' error instead of a sync error.
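The three steps above, plus the background I/O, could be scripted roughly as follows. This is a hedged sketch, not the actual test harness: the PV names, VG name, and which leg lands on which device are assumptions, and the block only prints the commands so nothing destructive runs by accident.

```shell
#!/bin/sh
# Sketch of the reproduction steps from comment 9 (hypothetical device names).
# The function only prints the commands; run them manually, as root, on scratch disks.
repro_commands() {
    cat <<'EOF'
# 1) create a fresh VG on scratch PVs
vgcreate testvg /dev/sdb /dev/sdc /dev/sdd
# 2) create a 10G two-legged mirror (log on the third PV)
lvcreate --type mirror -m 1 -L 10G -n mirrorlv testvg /dev/sdb /dev/sdc /dev/sdd
# background I/O, started before or after the kill (direct I/O shown)
dd if=/dev/zero of=/dev/testvg/mirrorlv bs=1M count=512 oflag=direct &
# 3) kill the secondary image (assumed here to live on /dev/sdc) during the initial sync
echo 1 > /sys/block/sdc/device/delete
# then watch whether dmeventd repairs the mirror
lvs -a -o +devices testvg
EOF
}
repro_commands
```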
I'll need some pointers to reproduce this issue. Do you have a script?
Comment 12 Red Hat Bugzilla
2023-09-14 01:40:10 UTC
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.
Created attachment 676851 [details]
logs etc

Description of problem:
After removing leg[1] the mirror was not repaired by dmeventd. All I got from dmeventd in syslog related to the leg failure are these two lines:

Jan 10 13:15:18 zaphodc1-node01 lvm[5786]: Monitoring mirror device helter_skelter-nonsyncd_secondary_2legs_1 for events.
Jan 10 13:15:35 zaphodc1-node01 lvm[5786]: Secondary mirror device 253:4 sync failed.

dmeventd is stuck waiting for ioctl - see backtrace below.

[1] Used recommended `echo 1 > .../delete` and udev event was generated (and pvscan --cache executed though lvmetad not configured.)

Version-Release number of selected component (if applicable):
lvm2-2.02.98-6.18

How reproducible:
???

Steps to Reproduce:
1. ???

Actual results:
mirror was not repaired.

Expected results:
Mirror should have been repaired.

Additional info:
Core was generated by `/sbin/dmeventd'.
#0  0x00007f085a829a47 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libsepol-2.0.41-4.el6.x86_64
(gdb) info threads
  2 Thread 0x7f085b76f7a0 (LWP 5786)  0x00007f085a82a4f3 in select () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7f0859e90700 (LWP 13968)  0x00007f085a829a47 in ioctl () at ../sysdeps/unix/syscall-template.S:82
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f085b76f7a0 (LWP 5786))]
#0  0x00007f085a82a4f3 in select () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt full
#0  0x00007f085a82a4f3 in select () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x000000000040514b in _client_read (fifos=0x7fff8389b170, msg=0x7fff8389b120) at dmeventd.c:1335
        t = {tv_sec = 0, tv_usec = 155468}
        bytes = 0
        ret = 0
        fds = {fds_bits = {32, 0 <repeats 15 times>}}
        size = 8
        header = 0x7fff8389b000
        buf = 0x7fff8389b000 "v0LTlDB6mZTMC4h205wLR 65280"
#2  0x0000000000405777 in _process_request (fifos=0x7fff8389b170) at dmeventd.c:1478
        die = 0
        msg = {cmd = 0, size = 0, data = 0x0}
#3  0x000000000040695d in main (argc=1, argv=0x7fff8389b278) at dmeventd.c:2007
        opt = -1 '\377'
        fifos = {client = 5, server = 4, client_path = 0x406c2b "/var/run/dmeventd-client", server_path = 0x406c44 "/var/run/dmeventd-server"}
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f0859e90700 (LWP 13968))]
#0  0x00007f085a829a47 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt full
#0  0x00007f085a829a47 in ioctl () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x00007f085ad28049 in _do_dm_ioctl (dmt=0x7f0844000b20, command=3241737480, buffer_repeat_count=0, retry_repeat_count=1, retryable=0x7f0859e8fc18) at ioctl/libdm-iface.c:1726
        dmi = 0x7f0844003100
        ioctl_with_uevent = 0
#2  0x00007f085ad288ad in dm_task_run (dmt=0x7f0844000b20) at ioctl/libdm-iface.c:1844
        dmi = 0x0
        command = 3241737480
        check_udev = 0
        rely_on_udev = 0
        suspended_counter = 0
        ioctl_retry = 1
        retryable = 0
        dev_name = 0x0
        dev_uuid = 0x7f0844000bf0 "LVM-EUowLYZxEkWGJWJr6pY77z70j03YTf2VIwo2fi8k6Yav0LTlDB6mZTMC4h205wLR"
#3  0x0000000000403afd in _event_wait (thread=0x1671180, task=0x7f0859e8fde8) at dmeventd.c:651
        set = {__val = {18446744067266821880, 69, 140735400226736, 139673854582850, 139673844910528, 1, 139673477319664, 139673860032013, 0, 139673844906944, 23218096, 139673477319456, 3, 4294967296, 8606191168529454668, 5140695445782813004}}
        ret = 0
        dmt = 0x7f0844000b20
        info = {exists = 1508441520, suspended = 32520, live_table = -2088128592, inactive_table = 32767, open_count = 1508444608, event_nr = 32520, major = 0, minor = 0, read_only = 3, target_count = 0}
#4  0x0000000000403faf in _monitor_thread (arg=0x1671180) at dmeventd.c:787
        __clframe = {__cancel_routine = 0x403d71 <_monitor_unregister>, __cancel_arg = 0x1671180, __do_it = 1, __cancel_type = 0}
        thread = 0x1671180
        wait_error = 1
        task = 0x0
#5  0x00007f085af3f851 in start_thread (arg=0x7f0859e90700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7f0859e90700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139673844909824, -8766475169563209079, 140735400226736, 139673844910528, 0, 3, 8667048386050331273, 8667050627825078921}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#6  0x00007f085a83190d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.
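A full-thread backtrace like the one above can be captured non-interactively from the running daemon; a sketch of the gdb invocation (printed rather than executed here; assumes gdb and matching lvm2/glibc debuginfo packages are installed):

```shell
#!/bin/sh
# Print a one-shot gdb invocation that dumps "bt full" for every dmeventd thread.
gdb_capture_cmd() {
    cat <<'EOF'
gdb -p "$(pidof dmeventd)" -batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt full'
EOF
}
gdb_capture_cmd
```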