Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1116334

Summary: Hang on suspend ioctl call in dmevetnd while trying to repair mirror with allocate policy
Product: Red Hat Enterprise Linux 6 Reporter: Nenad Peric <nperic>
Component: lvm2Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Mirroring and RAID (RHEL6) QA Contact: Cluster QE <mspqa-list>
Status: CLOSED WORKSFORME Docs Contact:
Severity: medium    
Priority: medium CC: agk, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, zkabelac
Version: 6.6Keywords: TestBlocker
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-08-21 14:14:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
debug output of dmeventd, outputs of dmsetup and lvs. none

Description Nenad Peric 2014-07-04 08:55:48 UTC
Created attachment 914655 [details]
debug output of dmeventd, outputs of dmsetup and lvs.

Description of problem:

When tests with mirrors+allocate are being run every once in a while lvm would get locked up waiting for dmeventd, which in turn waits on ioctl (suspend) to finish. When this happens no other LVM commands can be issued and the whole thing ends up in a deadlock. 

Version-Release number of selected component (if applicable):

kernel -> 2.6.32-488.el6.x86_64

lvm2-2.02.107-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
lvm2-libs-2.02.107-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
lvm2-cluster-2.02.107-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
udev-147-2.55.el6    BUILT: Wed Jun 18 13:30:21 CEST 2014
device-mapper-1.02.86-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
device-mapper-libs-1.02.86-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
device-mapper-event-1.02.86-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
device-mapper-event-libs-1.02.86-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 15:43:06 CEST 2014
cmirror-2.02.107-1.el6    BUILT: Mon Jun 23 16:44:45 CEST 2014


How reproducible:

Around 80% of the time, on different tests with allocate repair policy


Steps to Reproduce:

easily reproduced with tests which fail mirror legs with allocate policy. 
helter_skelter and revolution_9 are both affected.


Actual results:

Repair does not finish, dmeventd and lvm get locked up.

Expected results:

Repair to finish successfully.

Additional info:

Will attach bt of the stuck thread of dmeventd and output of dmsetup and lvs commands.

Comment 3 Jonathan Earl Brassow 2014-08-14 04:09:28 UTC
ignore_lvm_mirrors set to '1' in lvm.conf?

Comment 4 Nenad Peric 2014-08-14 14:21:32 UTC
I did not change ignore_lvm_mirrors to anything specifically, so it is there as it is by default. I would presume that it would be '1'. 

I will try and reproduce the problem but it has been a while and I do not have the concrete system I was testing on available. 

For now I will leave needinfo flag.

Comment 5 Nenad Peric 2014-08-14 15:13:17 UTC
Current mirror/raid device structure(s):
  Couldn't find device with uuid 8kJ9eu-Ni3y-4qmU-BvDW-Lp2X-6X1O-NCX55P.
  LV                     Attr       LSize   Cpy%Sync Devices
   mirror_1               cwi-aom---   2.00g 28.71    mirror_1_mimagetmp_3(0),mirror_1_mimage_3(0)
   [mirror_1_mimage_1]    iwi-aom---   2.00g          /dev/sdi1(0)
   [mirror_1_mimage_2]    iwi-aom---   2.00g          /dev/sde1(0)
   [mirror_1_mimage_3]    Iwi-aom---   2.00g          /dev/sdk1(0)
   [mirror_1_mimagetmp_3] mwi-aom---   2.00g 100.00   mirror_1_mimage_1(0),mirror_1_mimage_2(0)
   [mirror_1_mlog]        lwi-aom---   4.00m          /dev/sdh1(0)



(gdb) info threads 
  2 Thread 0x7f13aa7ae700 (LWP 30785)  0x00007f17b0580c97 in ioctl () from /lib64/libc.so.6
* 1 Thread 0x7f17b14f97a0 (LWP 5550)  0x00007f17b0581743 in select () from /lib64/libc.so.6
(gdb) 


(gdb) thread 2
[Switching to thread 2 (Thread 0x7f13aa7ae700 (LWP 30785))]#0  0x00007f17b0580c97 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007f17b0580c97 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f17b0a82e35 in _do_dm_ioctl (dmt=0x7f17b3aab500, command=3241737480, buffer_repeat_count=<value optimized out>, retry_repeat_count=1, 
    retryable=0x7f13aa7adbac) at ioctl/libdm-iface.c:1757
#2  0x00007f17b0a83ec7 in dm_task_run (dmt=0x7f17b3aab500) at ioctl/libdm-iface.c:1875
#3  0x00007f17b14f3de2 in _event_wait (thread=0x7f17b2ec7db0, task=0x7f13aa7adde8) at dmeventd.c:670
#4  0x00007f17b14f3f82 in _monitor_thread (arg=0x7f17b2ec7db0) at dmeventd.c:808
#5  0x00007f17b0c9f9d1 in start_thread (arg=0x7f13aa7ae700) at pthread_create.c:301
#6  0x00007f17b0588ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115




Errors from dmesg:

device-mapper: raid1: log postsuspend failed
device-mapper: raid1: log postsuspend failed



lvs -a -o +devices

[root@tardis-01 ~]# lvs -a -o +devices
  Couldn't find device with uuid 8kJ9eu-Ni3y-4qmU-BvDW-Lp2X-6X1O-NCX55P.
  LV                  VG           Attr       LSize   Pool Origin Data%  Meta%  Move Log           Cpy%Sync Convert Devices                                                       
  mirror_1            revolution_9 mwi-aom---   2.00g                                mirror_1_mlog 100.00           mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0)
  [mirror_1_mimage_1] revolution_9 iwi-aom---   2.00g                                                               /dev/sdi1(0)                                                  
  [mirror_1_mimage_2] revolution_9 iwi-aom---   2.00g                                                               /dev/sde1(0)                                                  
  [mirror_1_mimage_3] revolution_9 iwi-aom---   2.00g                                                               /dev/sdk1(0)                                                  
  [mirror_1_mlog]     revolution_9 lwi-aom---   4.00m                                                               /dev/sdh1(0)                                                  
  lv_home             vg_tardis01  -wi-ao---- 224.88g                                                               /dev/sda2(12800)                                              
  lv_root             vg_tardis01  -wi-ao----  50.00g                                                               /dev/sda2(0)                                                  
  lv_swap             vg_tardis01  -wi-ao----   4.00g                                                               /dev/sda2(70368)  



[root@tardis-01 ~]# lvm dumpconfig
config {
	checks=1
	abort_on_errors=0
	profile_dir="/etc/lvm/profile"
}
dmeventd {
	mirror_library="libdevmapper-event-lvm2mirror.so"
	snapshot_library="libdevmapper-event-lvm2snapshot.so"
	thin_library="libdevmapper-event-lvm2thin.so"
}
activation {
	checks=0
	udev_sync=1
	udev_rules=1
	verify_udev_operations=0
	retry_deactivation=1
	missing_stripe_filler="error"
	use_linear_target=1
	reserved_stack=64
	reserved_memory=65536
	process_priority=-18
	raid_region_size=512
	readahead="auto"
	raid_fault_policy="allocate"
	mirror_log_fault_policy="allocate"
	mirror_image_fault_policy="allocate"
	snapshot_autoextend_threshold=100
	snapshot_autoextend_percent=20
	thin_pool_autoextend_threshold=100
	thin_pool_autoextend_percent=20
	use_mlockall=0
	monitoring=1
	polling_interval=15
	activation_mode="degraded"
}
global {
	umask=63
	test=0
	units="h"
	si_unit_consistency=1
	suffix=1
	activation=1
	proc="/proc"
	locking_type=1
	wait_for_locks=1
	fallback_to_clustered_locking=1
	fallback_to_local_locking=1
	locking_dir="/var/lock/lvm"
	prioritise_write_locks=1
	abort_on_internal_errors=0
	detect_internal_vg_cache_corruption=0
	metadata_read_only=0
	mirror_segtype_default="mirror"
	raid10_segtype_default="mirror"
	use_lvmetad=0
}
shell {
	history_size=100
}
backup {
	backup=1
	backup_dir="/etc/lvm/backup"
	archive=1
	archive_dir="/etc/lvm/archive"
	retain_min=10
	retain_days=30
}
log {
	verbose=0
	silent=0
	syslog=1
	overwrite=0
	level=0
	indent=1
	command_names=0
	prefix="  "
	debug_classes=["memory", "devices", "activation", "allocation", "lvmetad", "metadata", "cache", "locking"]
}
allocation {
	maximise_cling=1
	use_blkid_wiping=1
	wipe_signatures_when_zeroing_new_lvs=1
	mirror_logs_require_separate_pvs=0
	cache_pool_metadata_require_separate_pvs=0
	thin_pool_metadata_require_separate_pvs=0
}
devices {
	dir="/dev"
	scan="/dev"
	obtain_device_list_from_udev=0
	preferred_names=["^/dev/mpath/", "^/dev/mapper/mpath", "^/dev/[hs]d"]
	cache_dir="/etc/lvm/cache"
	cache_file_prefix=""
	write_cache_state=1
	sysfs_scan=1
	multipath_component_detection=1
	md_component_detection=1
	md_chunk_alignment=1
	data_alignment_detection=1
	data_alignment=0
	data_alignment_offset_detection=1
	ignore_suspended_devices=0
	ignore_lvm_mirrors=1
	disable_after_error_count=0
	require_restorefile_with_uuid=1
	pv_min_size=2048
	issue_discards=0
}
[root@tardis-01 ~]# 



[root@tardis-01 ~]# dmsetup ls
revolution_9-mirror_1	(253:7)
revolution_9-mirror_1_mlog	(253:3)
vg_tardis01-lv_home	(253:2)
revolution_9-mirror_1_mimage_3	(253:8)
revolution_9-mirror_1_mimage_2	(253:6)
revolution_9-mirror_1_mimage_1	(253:5)
vg_tardis01-lv_swap	(253:1)
vg_tardis01-lv_root	(253:0)

[root@tardis-01 ~]# dmsetup info -c
Name                           Maj Min Stat Open Targ Event  UUID                                                                
revolution_9-mirror_1          253   7 L--w    1    1     21 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKFQWucfmilWPZGUFJ2O3GsptS5xu3sdbM
revolution_9-mirror_1_mlog     253   3 L--w    1    1      0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKxwmTyNVteppVjPnAd0ZzeLss3OxNUccU
vg_tardis01-lv_home            253   2 L--w    1    1      0 LVM-9HWIquGoVVoj1o6RMBS7UWJSwpn3iDd8qjeMKzaWDBiUZ2jJJTjgQPAaydfFehCg
revolution_9-mirror_1_mimage_3 253   8 L--w    1    1      0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKzBd9WDy5PKaq1RiAnXfhxumf0noKvTig
revolution_9-mirror_1_mimage_2 253   6 L--w    1    1      0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKNLUTXSPVzmhqOZdlmsP3aD7R6fzGnmPi
revolution_9-mirror_1_mimage_1 253   5 L--w    1    1      0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKKfUAIanlwfLTdlt8mXJqIPwyEWO3Tkt1
vg_tardis01-lv_swap            253   1 L--w    1    1      0 LVM-9HWIquGoVVoj1o6RMBS7UWJSwpn3iDd8xVlLZd172WXAlTOXaxvMBAsQKE71KzM1
vg_tardis01-lv_root            253   0 L--w    1    1      0 LVM-9HWIquGoVVoj1o6RMBS7UWJSwpn3iDd8PFs5Lm7v75cBeAIdrOvWzjiS8ev3RHnE


[root@tardis-01 ~]# dmsetup table
revolution_9-mirror_1: 0 4194304 mirror disk 2 253:3 1024 3 253:5 0 253:6 0 253:8 0 1 handle_errors
revolution_9-mirror_1_mlog: 0 8192 linear 8:113 2048
vg_tardis01-lv_home: 0 471597056 linear 8:2 104859648
revolution_9-mirror_1_mimage_3: 0 4194304 linear 8:161 2048
revolution_9-mirror_1_mimage_2: 0 4194304 linear 8:65 2048
revolution_9-mirror_1_mimage_1: 0 4194304 linear 8:129 2048
vg_tardis01-lv_swap: 0 8388608 linear 8:2 576456704
vg_tardis01-lv_root: 0 104857600 linear 8:2 2048

From /var/log/merssages:

Aug 14 16:42:40 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1_mimagetmp_3 for events.
Aug 14 16:42:40 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:42:40 tardis-01 lvm[5550]: WARNING: Failed to replace 1 of 3 images in volume mirror_1
Aug 14 16:42:40 tardis-01 lvm[5550]: WARNING: Failed to replace 1 of 1 logs in volume mirror_1
Aug 14 16:42:56 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 33.4%
Aug 14 16:43:12 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 66.2%
Aug 14 16:43:27 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 98.4%
Aug 14 16:43:42 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 100.0%
Aug 14 16:43:43 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:43:43 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:43:43 tardis-01 lvm[5550]: Logical volume mirror_1 converted.
Aug 14 16:43:58 tardis-01 lvm[5550]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1_mimagetmp_3 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: dm_task_run failed, errno = 6, No such device or address
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1_mimagetmp_3 disappeared, detaching
Aug 14 16:43:58 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1_mimagetmp_3 for events.
Aug 14 16:44:00 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:44:02 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:44:04 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:44:08 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.




If additional info/tests are needed, please let me know. 
I will leave the computer in the stuck state, so maybe you would try and log in there as well if you want to check additional things.

Comment 6 Jonathan Earl Brassow 2014-08-15 00:51:25 UTC
I have checked out the system and I found nothing blocked, no stray dm devices, and not hung processes.  After that, I unmounted the file system, cycled the LV, and mounted the file system again to ensure some basic functionality with the LV.  There was nothing wrong.

It appears I'll have to get to the problem sooner after creation?  Something is taking longer than expected perhaps?

Comment 8 Nenad Peric 2014-08-21 11:36:41 UTC
I could not reproduce the issue. 
It seems that there was a bug in our tests which introduced the delays and even hangs. Even though there were cases in the beginning that could influence lvm, I think it was rather kernel related. 
I can see that all the lvm commands still work normally, but the tests were still hanging (due to hangs in message handling between driver and client test machine). So the real blocker seemed to be the part of the test engine itself.

I will upload patches I wrote for our tests in order to avoid these situations in the future. 

We can freely close this now I believe. 

Sorry for rocking the boat too much.

Comment 10 Jonathan Earl Brassow 2014-08-21 14:14:59 UTC
thanks for your investigation Nenad.