Bug 1116334
| Summary: | Hang on suspend ioctl call in dmevetnd while trying to repair mirror with allocate policy | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Nenad Peric <nperic> | ||||
| Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> | ||||
| lvm2 sub component: | Mirroring and RAID (RHEL6) | QA Contact: | Cluster QE <mspqa-list> | ||||
| Status: | CLOSED WORKSFORME | Docs Contact: | |||||
| Severity: | medium | ||||||
| Priority: | medium | CC: | agk, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, zkabelac | ||||
| Version: | 6.6 | Keywords: | TestBlocker | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-08-21 14:14:59 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
ignore_lvm_mirrors set to '1' in lvm.conf? I did not change ignore_lvm_mirrors to anything specifically, so it is there as it is by default. I would presume that it would be '1'. I will try and reproduce the problem but it has been a while and I do not have the concrete system I was testing on available. For now I will leave needinfo flag. Current mirror/raid device structure(s):
Couldn't find device with uuid 8kJ9eu-Ni3y-4qmU-BvDW-Lp2X-6X1O-NCX55P.
LV Attr LSize Cpy%Sync Devices
mirror_1 cwi-aom--- 2.00g 28.71 mirror_1_mimagetmp_3(0),mirror_1_mimage_3(0)
[mirror_1_mimage_1] iwi-aom--- 2.00g /dev/sdi1(0)
[mirror_1_mimage_2] iwi-aom--- 2.00g /dev/sde1(0)
[mirror_1_mimage_3] Iwi-aom--- 2.00g /dev/sdk1(0)
[mirror_1_mimagetmp_3] mwi-aom--- 2.00g 100.00 mirror_1_mimage_1(0),mirror_1_mimage_2(0)
[mirror_1_mlog] lwi-aom--- 4.00m /dev/sdh1(0)
(gdb) info threads
2 Thread 0x7f13aa7ae700 (LWP 30785) 0x00007f17b0580c97 in ioctl () from /lib64/libc.so.6
* 1 Thread 0x7f17b14f97a0 (LWP 5550) 0x00007f17b0581743 in select () from /lib64/libc.so.6
(gdb)
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f13aa7ae700 (LWP 30785))]#0 0x00007f17b0580c97 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0 0x00007f17b0580c97 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f17b0a82e35 in _do_dm_ioctl (dmt=0x7f17b3aab500, command=3241737480, buffer_repeat_count=<value optimized out>, retry_repeat_count=1,
retryable=0x7f13aa7adbac) at ioctl/libdm-iface.c:1757
#2 0x00007f17b0a83ec7 in dm_task_run (dmt=0x7f17b3aab500) at ioctl/libdm-iface.c:1875
#3 0x00007f17b14f3de2 in _event_wait (thread=0x7f17b2ec7db0, task=0x7f13aa7adde8) at dmeventd.c:670
#4 0x00007f17b14f3f82 in _monitor_thread (arg=0x7f17b2ec7db0) at dmeventd.c:808
#5 0x00007f17b0c9f9d1 in start_thread (arg=0x7f13aa7ae700) at pthread_create.c:301
#6 0x00007f17b0588ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Errors from dmesg:
device-mapper: raid1: log postsuspend failed
device-mapper: raid1: log postsuspend failed
lvs -a -o +devices
[root@tardis-01 ~]# lvs -a -o +devices
Couldn't find device with uuid 8kJ9eu-Ni3y-4qmU-BvDW-Lp2X-6X1O-NCX55P.
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
mirror_1 revolution_9 mwi-aom--- 2.00g mirror_1_mlog 100.00 mirror_1_mimage_1(0),mirror_1_mimage_2(0),mirror_1_mimage_3(0)
[mirror_1_mimage_1] revolution_9 iwi-aom--- 2.00g /dev/sdi1(0)
[mirror_1_mimage_2] revolution_9 iwi-aom--- 2.00g /dev/sde1(0)
[mirror_1_mimage_3] revolution_9 iwi-aom--- 2.00g /dev/sdk1(0)
[mirror_1_mlog] revolution_9 lwi-aom--- 4.00m /dev/sdh1(0)
lv_home vg_tardis01 -wi-ao---- 224.88g /dev/sda2(12800)
lv_root vg_tardis01 -wi-ao---- 50.00g /dev/sda2(0)
lv_swap vg_tardis01 -wi-ao---- 4.00g /dev/sda2(70368)
[root@tardis-01 ~]# lvm dumpconfig
config {
checks=1
abort_on_errors=0
profile_dir="/etc/lvm/profile"
}
dmeventd {
mirror_library="libdevmapper-event-lvm2mirror.so"
snapshot_library="libdevmapper-event-lvm2snapshot.so"
thin_library="libdevmapper-event-lvm2thin.so"
}
activation {
checks=0
udev_sync=1
udev_rules=1
verify_udev_operations=0
retry_deactivation=1
missing_stripe_filler="error"
use_linear_target=1
reserved_stack=64
reserved_memory=65536
process_priority=-18
raid_region_size=512
readahead="auto"
raid_fault_policy="allocate"
mirror_log_fault_policy="allocate"
mirror_image_fault_policy="allocate"
snapshot_autoextend_threshold=100
snapshot_autoextend_percent=20
thin_pool_autoextend_threshold=100
thin_pool_autoextend_percent=20
use_mlockall=0
monitoring=1
polling_interval=15
activation_mode="degraded"
}
global {
umask=63
test=0
units="h"
si_unit_consistency=1
suffix=1
activation=1
proc="/proc"
locking_type=1
wait_for_locks=1
fallback_to_clustered_locking=1
fallback_to_local_locking=1
locking_dir="/var/lock/lvm"
prioritise_write_locks=1
abort_on_internal_errors=0
detect_internal_vg_cache_corruption=0
metadata_read_only=0
mirror_segtype_default="mirror"
raid10_segtype_default="mirror"
use_lvmetad=0
}
shell {
history_size=100
}
backup {
backup=1
backup_dir="/etc/lvm/backup"
archive=1
archive_dir="/etc/lvm/archive"
retain_min=10
retain_days=30
}
log {
verbose=0
silent=0
syslog=1
overwrite=0
level=0
indent=1
command_names=0
prefix=" "
debug_classes=["memory", "devices", "activation", "allocation", "lvmetad", "metadata", "cache", "locking"]
}
allocation {
maximise_cling=1
use_blkid_wiping=1
wipe_signatures_when_zeroing_new_lvs=1
mirror_logs_require_separate_pvs=0
cache_pool_metadata_require_separate_pvs=0
thin_pool_metadata_require_separate_pvs=0
}
devices {
dir="/dev"
scan="/dev"
obtain_device_list_from_udev=0
preferred_names=["^/dev/mpath/", "^/dev/mapper/mpath", "^/dev/[hs]d"]
cache_dir="/etc/lvm/cache"
cache_file_prefix=""
write_cache_state=1
sysfs_scan=1
multipath_component_detection=1
md_component_detection=1
md_chunk_alignment=1
data_alignment_detection=1
data_alignment=0
data_alignment_offset_detection=1
ignore_suspended_devices=0
ignore_lvm_mirrors=1
disable_after_error_count=0
require_restorefile_with_uuid=1
pv_min_size=2048
issue_discards=0
}
[root@tardis-01 ~]#
[root@tardis-01 ~]# dmsetup ls
revolution_9-mirror_1 (253:7)
revolution_9-mirror_1_mlog (253:3)
vg_tardis01-lv_home (253:2)
revolution_9-mirror_1_mimage_3 (253:8)
revolution_9-mirror_1_mimage_2 (253:6)
revolution_9-mirror_1_mimage_1 (253:5)
vg_tardis01-lv_swap (253:1)
vg_tardis01-lv_root (253:0)
[root@tardis-01 ~]# dmsetup info -c
Name Maj Min Stat Open Targ Event UUID
revolution_9-mirror_1 253 7 L--w 1 1 21 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKFQWucfmilWPZGUFJ2O3GsptS5xu3sdbM
revolution_9-mirror_1_mlog 253 3 L--w 1 1 0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKxwmTyNVteppVjPnAd0ZzeLss3OxNUccU
vg_tardis01-lv_home 253 2 L--w 1 1 0 LVM-9HWIquGoVVoj1o6RMBS7UWJSwpn3iDd8qjeMKzaWDBiUZ2jJJTjgQPAaydfFehCg
revolution_9-mirror_1_mimage_3 253 8 L--w 1 1 0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKzBd9WDy5PKaq1RiAnXfhxumf0noKvTig
revolution_9-mirror_1_mimage_2 253 6 L--w 1 1 0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKNLUTXSPVzmhqOZdlmsP3aD7R6fzGnmPi
revolution_9-mirror_1_mimage_1 253 5 L--w 1 1 0 LVM-T22JK51bI0izpkF0oX1pvg1lZylcm9cKKfUAIanlwfLTdlt8mXJqIPwyEWO3Tkt1
vg_tardis01-lv_swap 253 1 L--w 1 1 0 LVM-9HWIquGoVVoj1o6RMBS7UWJSwpn3iDd8xVlLZd172WXAlTOXaxvMBAsQKE71KzM1
vg_tardis01-lv_root 253 0 L--w 1 1 0 LVM-9HWIquGoVVoj1o6RMBS7UWJSwpn3iDd8PFs5Lm7v75cBeAIdrOvWzjiS8ev3RHnE
[root@tardis-01 ~]# dmsetup table
revolution_9-mirror_1: 0 4194304 mirror disk 2 253:3 1024 3 253:5 0 253:6 0 253:8 0 1 handle_errors
revolution_9-mirror_1_mlog: 0 8192 linear 8:113 2048
vg_tardis01-lv_home: 0 471597056 linear 8:2 104859648
revolution_9-mirror_1_mimage_3: 0 4194304 linear 8:161 2048
revolution_9-mirror_1_mimage_2: 0 4194304 linear 8:65 2048
revolution_9-mirror_1_mimage_1: 0 4194304 linear 8:129 2048
vg_tardis01-lv_swap: 0 8388608 linear 8:2 576456704
vg_tardis01-lv_root: 0 104857600 linear 8:2 2048
From /var/log/merssages:
Aug 14 16:42:40 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1_mimagetmp_3 for events.
Aug 14 16:42:40 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:42:40 tardis-01 lvm[5550]: WARNING: Failed to replace 1 of 3 images in volume mirror_1
Aug 14 16:42:40 tardis-01 lvm[5550]: WARNING: Failed to replace 1 of 1 logs in volume mirror_1
Aug 14 16:42:56 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 33.4%
Aug 14 16:43:12 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 66.2%
Aug 14 16:43:27 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 98.4%
Aug 14 16:43:42 tardis-01 lvm[5550]: revolution_9/mirror_1: Converted: 100.0%
Aug 14 16:43:43 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:43:43 tardis-01 lvm[5550]: Monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:43:43 tardis-01 lvm[5550]: Logical volume mirror_1 converted.
Aug 14 16:43:58 tardis-01 lvm[5550]: Repair of mirrored device revolution_9-mirror_1 finished successfully.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1_mimagetmp_3 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1 is now in-sync.
Aug 14 16:43:58 tardis-01 lvm[5550]: dm_task_run failed, errno = 6, No such device or address
Aug 14 16:43:58 tardis-01 lvm[5550]: revolution_9-mirror_1_mimagetmp_3 disappeared, detaching
Aug 14 16:43:58 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1_mimagetmp_3 for events.
Aug 14 16:44:00 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:44:02 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:44:04 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
Aug 14 16:44:08 tardis-01 lvm[5550]: No longer monitoring mirror device revolution_9-mirror_1 for events.
If additional info/tests are needed, please let me know.
I will leave the computer in the stuck state, so maybe you would try and log in there as well if you want to check additional things.
I have checked out the system and I found nothing blocked, no stray dm devices, and not hung processes. After that, I unmounted the file system, cycled the LV, and mounted the file system again to ensure some basic functionality with the LV. There was nothing wrong. It appears I'll have to get to the problem sooner after creation? Something is taking longer than expected perhaps? I could not reproduce the issue. It seems that there was a bug in our tests which introduced the delays and even hangs. Even though there were cases in the beginning that could influence lvm, I think it was rather kernel related. I can see that all the lvm commands still work normally, but the tests were still hanging (due to hangs in message handling between driver and client test machine). So the real blocker seemed to be the part of the test engine itself. I will upload patches I wrote for our tests in order to avoid these situations in the future. We can freely close this now I believe. Sorry for rocking the boat too much. thanks for your investigation Nenad. |
Created attachment 914655 [details] debug output of dmeventd, outputs of dmsetup and lvs. Description of problem: When tests with mirrors+allocate are being run every once in a while lvm would get locked up waiting for dmeventd, which in turn waits on ioctl (suspend) to finish. When this happens no other LVM commands can be issued and the whole thing ends up in a deadlock. Version-Release number of selected component (if applicable): kernel -> 2.6.32-488.el6.x86_64 lvm2-2.02.107-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 lvm2-libs-2.02.107-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 lvm2-cluster-2.02.107-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 udev-147-2.55.el6 BUILT: Wed Jun 18 13:30:21 CEST 2014 device-mapper-1.02.86-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 device-mapper-libs-1.02.86-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 device-mapper-event-1.02.86-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 device-mapper-event-libs-1.02.86-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 device-mapper-persistent-data-0.3.2-1.el6 BUILT: Fri Apr 4 15:43:06 CEST 2014 cmirror-2.02.107-1.el6 BUILT: Mon Jun 23 16:44:45 CEST 2014 How reproducible: Around 80% of the time, on different tests with allocate repair policy Steps to Reproduce: easily reproduced with tests which fail mirror legs with allocate policy. helter_skelter and revolution_9 are both affected. Actual results: Repair does not finish, dmeventd and lvm get locked up. Expected results: Repair to finish successfully. Additional info: Will attach bt of the stuck thread of dmeventd and output of dmsetup and lvs commands.