Description of problem: While trying the mirror_sanity test case looping_mirror_log_converts, which converts from a core log to a disk log and back 20 times, I found that the lvconvert commands were taking a very long time. secs command line 2136 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 4779 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 7582 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 79 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 460 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 1160 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 3405 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 2931 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 4856 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 109 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 149 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert On an x86_64 system I found the following run times: 22 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 2 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 26 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 2 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 21 lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 2 lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert crash shows the following stack trace on the lvconvert process. strace shows nothing. PID: 26689 TASK: c000000000e1ac00 CPU: 0 COMMAND: "lvconvert" #0 [c000000042bc3670] .schedule at c0000000003d07b8 #1 [c000000042bc3780] __sched_text_start at c0000000003cf9fc #2 [c000000042bc3840] .mirror_presuspend at d000000000250708 #3 [c000000042bc3910] .suspend_targets at d00000000027285c #4 [c000000042bc39a0] .dm_suspend at d000000000271fc4 #5 [c000000042bc3a70] .dev_suspend at d000000000276798 #6 [c000000042bc3b10] .ctl_ioctl at d0000000002773c4 #7 [c000000042bc3d00] .dm_compat_ctl_ioctl at d000000000277464 #8 [c000000042bc3d70] .compat_sys_ioctl at c000000000139f0c #9 [c000000042bc3e30] syscall_exit at c0000000000086a4 Version-Release number of selected component (if applicable): kernel-2.6.18-150.el5 lvm2-2.02.46-2.el5 lvm2-cluster-2.02.46-1.el5 device-mapper-1.02.32-1.el5 How reproducible: Every time Steps to Reproduce: 1. lvcreate -m 1 -n mirror_log_convert --corelog -l 139009 mirror_sanity /dev/sdc3 /dev/sdb1 /dev/sdb2 /dev/sdc1 2. time lvconvert -m 1 /dev/mirror_sanity/mirror_log_convert 3. time lvconvert -m 1 --corelog /dev/mirror_sanity/mirror_log_convert 4. go to 2 Additional info:
After further investigation it appears that when the mirror is recovering, lvm commands go very slowly. This includes lvchange attempts and even lvs. If I add --nosync to the lvcreate command, lvconvert runtimes return to a consistent 44 seconds or so.
It is not regression, this behavious should be in all ppc kernel/lvm2 versions (I can reproduce similar behaviour on RHEL5.3.) The ppc64 kernel have 64k PAGESIZE, x86_64 have only 4k PAGESIZE. So we are probably submitting more work (copy work is submitted in pages) on ppc than on x86_64 and (pre)suspend call have to wait till all submitted kcopy jobs are done. We should probably limit submitted jobs according to pages size too. (Need more testing, but this seems to me like kernel only problem.)
I ran into another issue on ppc which may be related. I had a local mirror created and I was attempting to change the minor device number when the lvchange command line hung. Aug 25 11:36:48 newport-p1 qarshd[8098]: Running cmdline: lvchange -ay -f -My --major 255 --minor 253 /dev/mirror_1_1707/mirror_1_17070 Aug 25 11:36:48 newport-p1 lvm[10214]: No longer monitoring mirror device mirror_1_1707-mirror_1_17070 for events The mirror was created with this command: # lvcreate -l 834334 -n mirror_1_17070 -m 1 -R 16 --corelog mirror_1_1707 Kernel stack traces: kmirrord S 0000000000000000 10944 4434 139 4438 1564 (L-TLB) Call Trace: [C00000000FD8BA80] [D000000000274204] .dm_table_unplug_all+0x6c/0xd8 [dm_mod] (unreliable) [C00000000FD8BC50] [C000000000010B0C] .__switch_to+0x124/0x148 [C00000000FD8BCE0] [C0000000003D2BE8] .schedule+0xc08/0xdbc [C00000000FD8BDF0] [C000000000083020] .worker_thread+0xf8/0x19c [C00000000FD8BEE0] [C000000000087B58] .kthread+0x128/0x178 [C00000000FD8BF90] [C0000000000275C8] .kernel_thread+0x4c/0x68 kcopyd S 0000000000000000 9664 4438 139 4434 (L-TLB) Call Trace: [C000000000E5FA80] [C000000076FA9540] 0xc000000076fa9540 (unreliable) [C000000000E5FC50] [C000000000010B0C] .__switch_to+0x124/0x148 [C000000000E5FCE0] [C0000000003D2BE8] .schedule+0xc08/0xdbc [C000000000E5FDF0] [C000000000083020] .worker_thread+0xf8/0x19c [C000000000E5FEE0] [C000000000087B58] .kthread+0x128/0x178 [C000000000E5FF90] [C0000000000275C8] .kernel_thread+0x4c/0x68 lvchange D 000000000ff19688 10448 8099 8098 (NOTLB) Call Trace: [C00000003340B440] [0000000024024482] 0x24024482 (unreliable) [C00000003340B610] [C000000000010B0C] .__switch_to+0x124/0x148 [C00000003340B6A0] [C0000000003D2BE8] .schedule+0xc08/0xdbc [C00000003340B7B0] [C0000000003D1E2C] .__down+0x6c/0xf0 [C00000003340B870] [D000000000250708] .mirror_presuspend+0x6c/0x164 [dm_mirror] [C00000003340B940] [D00000000027288C] .suspend_targets+0x60/0x98 [dm_mod] [C00000003340B9D0] [D000000000271290] .dm_put+0xd0/0x220 [dm_mod] [C00000003340BA70] [D000000000276D80] .dev_remove+0xcc/0x100 [dm_mod] [C00000003340BB10] [D000000000277464] .ctl_ioctl+0x29c/0x318 [dm_mod] [C00000003340BD00] [D000000000277504] .dm_compat_ctl_ioctl+0x24/0x34 [dm_mod] [C00000003340BD70] [C000000000139598] .compat_sys_ioctl+0x158/0x3b4 [C00000003340BE30] [C0000000000086A4] syscall_exit+0x0/0x40
I am closing this bug WONTFIX. If you wish to reopen for consideration, please answer the following: 1) If the mirror has completed resync before starting the flip-flopping conversion, does the slowness persist? 2) Is this bug present on other releases (e.g. RHEL6)?