Bug 1090471

Summary: `vgreduce --removemissing` while there are failed mirror LV legs hangs any sync to LV
Product: Red Hat Enterprise Linux 7
Reporter: Marian Csontos <mcsontos>
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
CC: agk, cmarthal, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, zkabelac
Version: 7.0
Target Milestone: rc
Target Release: 7.8
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-02-28 20:19:16 UTC
Type: Bug
Bug Blocks: 1307111, 1320730
Attachments: SysRq

Description Marian Csontos 2014-04-23 12:10:16 UTC
Description of problem:
I have a VM with 4 PVs.
Created mirrored LV with "-m 1".

While testing mirror LV recovery, I removed the secondary leg, then removed the replacement leg twice more, each time while it was still synchronizing.

In order to continue testing by removing further legs, I ran `vgreduce --removemissing $VG` and the system hung.

Version-Release number of selected component (if applicable):
lvm2-2.02.105-14.el7
kernel-3.10.0-121.el7

How reproducible:
100%

Steps to Reproduce:
0. ROOT FS on mirrored LV
1. remove secondary leg (echo 1 > /sys/block/sdX/device/delete), repeat if necessary until there are no more replacement legs and the mirror is partial
2. vgreduce --removemissing $VG
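The steps above can be consolidated into one script. This is a sketch of the reproduction, not the reporter's exact setup: the device names (/dev/vdb ... /dev/vde), the LV size, and the VG/LV names (vg_stacked, lv_master, taken from the output below) are illustrative assumptions. It requires root and disposable disks in a VM.

```shell
# Assumed setup: 4 spare PVs in a VM; names are illustrative.
vgcreate vg_stacked /dev/vdb /dev/vdc /dev/vdd /dev/vde
lvcreate --type mirror -m 1 -L 4G -n lv_master vg_stacked

# Drop the PV backing the secondary leg out from under the mirror.
# Repeat (for the replacement leg) until no spare legs remain and
# the mirror is partial.
echo 1 > /sys/block/vdc/device/delete

# With the root FS on the mirrored LV, this command hangs:
vgreduce --removemissing vg_stacked
```

Note that the hang only manifests when /etc (and hence /etc/lvm) lives on the partial mirror, as described in comment 7.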

Actual results:
Command hangs after output:

    PV $UUID1 not recognised. Is the device missing?
    PV $UUID2 not recognised. Is the device missing?
    PV $UUID1 not recognised. Is the device missing?
    PV $UUID2 not recognised. Is the device missing?
    WARNING: Partial LV lv_master needs to be repaired or removed.
    WARNING: Partial LV lv_master_mimage_1 needs to be repaired or removed.
    There are still partial LVs in VG vg_stacked.
    To remove them unconditionally use: vgreduce --removemissing --force.
    Proceeding to remove empty missing PVs.

Stack:

    [<ffffffffa019b195>] jbd2_log_wait_commit+0xc5/0x150 [jbd2]
    [<ffffffffa019d884>] jbd2_complete_transaction+0x54/0xa0 [jbd2]
    [<ffffffffa01b070f>] ext4_sync_file+0x1df/0x330 [ext4]
    [<ffffffff811df8b5>] do_fsync+0x65/0xa0
    [<ffffffff811dfbb0>] SyS_fsync+0x10/0x20
    [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

I can still ssh to the VM, but anything that syncs to the LV hangs uninterruptibly: I tried `yum install gdb` (to get a better trace) and `sync`, and both hung.

The devices are not suspended.

Expected results:
Command may fail but the system should be usable.

Additional info:

Comment 2 Marian Csontos 2014-04-23 12:33:13 UTC
Created attachment 888887 [details]
SysRq

Comment 3 RHEL Program Management 2014-05-01 05:47:44 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 7 Marian Csontos 2016-04-08 11:50:52 UTC
Heinz, were you able to reproduce?

On 7.2, this time using XFS, the result is the same.

Leaving dmeventd to handle the situation works fine.

But calling `vgreduce --removemissing VG` while there is an unknown device in the mirror hangs and cannot be killed.

The command attempts to write to /etc/lvm/archive or /etc/lvm/backup, which reside on the incomplete volume. With both archive and backup disabled, the freeze does not occur.
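For reference, the archive/backup writes can be disabled in lvm.conf. The `backup` section and these option names are standard lvm.conf settings, but this is only a sketch of the workaround; verify against the installed lvm.conf(5) for the release in question.

```
# /etc/lvm/lvm.conf -- stop LVM writing metadata copies under /etc/lvm
backup {
    backup = 0    # do not write /etc/lvm/backup after metadata changes
    archive = 0   # do not write /etc/lvm/archive before metadata changes
}
```

As noted below, disabling these is not a recommended long-term configuration; it merely avoids the write to the frozen filesystem.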

In the case of a mirror we have at least 2 PVs, each of which should carry an MDA, so backup/archive are (in my opinion) less important and the sync could be skipped.

Another possible solution is to use RAID1 for the root FS.

Comment 10 Heinz Mauelshagen 2017-05-10 14:02:47 UTC
Marian,

I assume the kernel fix as of https://bugzilla.redhat.com/show_bug.cgi?id=1383444 addresses this?
Can you please retest?

Comment 12 Marian Csontos 2017-06-07 11:51:02 UTC
With the latest(?) 7.4 kernel (... Not tainted 3.10.0-675.el7.x86_64 ...) it is still hanging:

[  840.228644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.232227] jbd2/dm-3-8     D ffffffff816a36e0     0   505      2 0x00000000
[  840.234337]  ffff880034d0f9f0 0000000000000046 ffff8800347f8fb0 ffff880034d0ffd8
[  840.236553]  ffff880034d0ffd8 ffff880034d0ffd8 ffff8800347f8fb0 ffff88007fc16cc0
[  840.241330]  0000000000000000 7fffffffffffffff ffff88007ff661e8 ffffffff816a36e0
[  840.245547] Call Trace:
[  840.248134]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.251430]  [<ffffffff816a55b9>] schedule+0x29/0x70
[  840.253534]  [<ffffffff816a30c9>] schedule_timeout+0x239/0x2c0
[  840.255586]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[  840.257674]  [<ffffffff810e922c>] ? ktime_get_ts64+0x4c/0xf0
[  840.259645]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[  840.261684]  [<ffffffff810e922c>] ? ktime_get_ts64+0x4c/0xf0
[  840.263638]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.265501]  [<ffffffff816a4c3d>] io_schedule_timeout+0xad/0x130
[  840.267493]  [<ffffffff816a4cd8>] io_schedule+0x18/0x20
[  840.269344]  [<ffffffff816a36f1>] bit_wait_io+0x11/0x50
[  840.271180]  [<ffffffff816a3215>] __wait_on_bit+0x65/0x90
[  840.273041]  [<ffffffff81181a61>] wait_on_page_bit+0x81/0xa0
[  840.274967]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
[  840.276927]  [<ffffffff81181b91>] __filemap_fdatawait_range+0x111/0x190
[  840.279025]  [<ffffffff812f6bc0>] ? submit_bio+0x70/0x150
[  840.280929]  [<ffffffff8123a7f5>] ? bio_alloc_bioset+0x115/0x310
[  840.282915]  [<ffffffff81181c24>] filemap_fdatawait_range+0x14/0x30
[  840.284935]  [<ffffffff81181c67>] filemap_fdatawait+0x27/0x30
[  840.286939]  [<ffffffffc01f5ac1>] jbd2_journal_commit_transaction+0xa81/0x19a0 [jbd2]
[  840.289273]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
[  840.291197]  [<ffffffffc01fba79>] kjournald2+0xc9/0x260 [jbd2]
[  840.293191]  [<ffffffff810b1910>] ? wake_up_atomic_t+0x30/0x30
[  840.295168]  [<ffffffffc01fb9b0>] ? commit_timeout+0x10/0x10 [jbd2]
[  840.297213]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[  840.299002]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[  840.301032]  [<ffffffff816b1018>] ret_from_fork+0x58/0x90
[  840.302953]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[  840.305074] INFO: task vgreduce:1485 blocked for more than 120 seconds.
[  840.307196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.309504] vgreduce        D ffffffff816a36e0     0  1485   1333 0x00000080
[  840.311761]  ffff880076a8b8f0 0000000000000086 ffff880076e95e20 ffff880076a8bfd8
[  840.314242]  ffff880076a8bfd8 ffff880076a8bfd8 ffff880076e95e20 ffff88007fc16cc0
[  840.316607]  0000000000000000 7fffffffffffffff ffff88007ff5df50 ffffffff816a36e0
[  840.319037] Call Trace:
[  840.320494]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.322430]  [<ffffffff816a55b9>] schedule+0x29/0x70
[  840.324320]  [<ffffffff816a30c9>] schedule_timeout+0x239/0x2c0
[  840.326371]  [<ffffffff810cb5ec>] ? set_next_entity+0x3c/0xe0
[  840.328399]  [<ffffffff810295da>] ? __switch_to+0x15a/0x510
[  840.330391]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[  840.332474]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.334367]  [<ffffffff816a4c3d>] io_schedule_timeout+0xad/0x130
[  840.336341]  [<ffffffff816a4cd8>] io_schedule+0x18/0x20
[  840.338183]  [<ffffffff816a36f1>] bit_wait_io+0x11/0x50
[  840.339978]  [<ffffffff816a3215>] __wait_on_bit+0x65/0x90
[  840.341818]  [<ffffffff8123524a>] ? bh_lru_install+0x18a/0x1e0
[  840.343718]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.345522]  [<ffffffff816a32c1>] out_of_line_wait_on_bit+0x81/0xb0
[  840.347495]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
[  840.349428]  [<ffffffffc01f3cc5>] do_get_write_access+0x285/0x4c0 [jbd2]
[  840.351472]  [<ffffffff8123582c>] ? __find_get_block+0xbc/0x120
[  840.353423]  [<ffffffffc01f3f27>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[  840.355646]  [<ffffffffc0247d2b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
[  840.357890]  [<ffffffffc0212547>] __ext4_new_inode+0x447/0x12c0 [ext4]
[  840.359945]  [<ffffffffc0224038>] ext4_create+0xd8/0x190 [ext4]
[  840.361915]  [<ffffffff8120ca3d>] vfs_create+0xcd/0x130
[  840.363737]  [<ffffffff8120dd8f>] do_last+0xbff/0x1280
[  840.365578]  [<ffffffff812109c2>] path_openat+0xc2/0x490
[  840.367450]  [<ffffffff81210eed>] ? putname+0x3d/0x60
[  840.369274]  [<ffffffff8121218b>] do_filp_open+0x4b/0xb0
[  840.371134]  [<ffffffff8121f23a>] ? __alloc_fd+0x8a/0x130
[  840.372986]  [<ffffffff811ff0b3>] do_sys_open+0xf3/0x1f0
[  840.374815]  [<ffffffff811ff1ce>] SyS_open+0x1e/0x20
[  840.376595]  [<ffffffff816b10c9>] system_call_fastpath+0x16/0x1b

As a mitigation, one should either:

- keep dmeventd on (this should be default with mirrors anyway) [1]
- use RAID for root FS
- disable archive/backup of LVM metadata (not recommended)

[1] However, it can still go wrong if anything calls `vgreduce --removemissing VG` while the mirror is partial, as that will hang the system.
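To check the first mitigation in practice, dmeventd monitoring status can be queried and enabled per LV. A sketch, reusing the vg_stacked/lv_master names from the original report; the `seg_monitor` reporting field and `lvchange --monitor` exist in lvm2 of this era, but confirm with `lvs -o help` and lvchange(8) on the target system.

```shell
# Show whether dmeventd is monitoring the mirror segment;
# "monitored" is the expected value for the mitigation to apply.
lvs -o lv_name,seg_monitor vg_stacked/lv_master

# (Re-)enable monitoring for the LV if it is not monitored.
lvchange --monitor y vg_stacked/lv_master
```

With monitoring active, dmeventd repairs the mirror on device failure, so no one should need to run `vgreduce --removemissing` by hand while the mirror is partial.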

Comment 13 Jonathan Earl Brassow 2017-06-07 14:34:52 UTC
If LVM is writing the archive/backup while the mirror device is dead, it is likely doing it in the wrong place - it should do it after the repair.

This bug is not severe enough to assign blocker status.