Bug 1090471

Summary: `vgreduce --removemissing` while there are failed mirror LV legs hangs any sync to LV
Product: Red Hat Enterprise Linux 7
Reporter: Marian Csontos <mcsontos>
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
CC: agk, cmarthal, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, zkabelac
Version: 7.0
Target Milestone: rc
Target Release: 7.8
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-02-28 20:19:16 UTC
Type: Bug
Bug Blocks: 1307111, 1320730
Attachments: SysRq

Description Marian Csontos 2014-04-23 12:10:16 UTC
Description of problem:
I have a VM with 4 PVs.
Created mirrored LV with "-m 1".

While testing mirror LV recovery, I removed the secondary leg, then removed the replacement leg twice more, each time while it was still synchronizing.

In order to continue testing by removing further legs, I ran `vgreduce --removemissing $VG` and the system hung.

Version-Release number of selected component (if applicable):
lvm2-2.02.105-14.el7
kernel-3.10.0-121.el7

How reproducible:
100%

Steps to Reproduce:
0. ROOT FS on mirrored LV
1. remove secondary leg (echo 1 > /sys/block/sdX/device/delete), repeat if necessary until there are no more replacement legs and the mirror is partial
2. vgreduce --removemissing $VG
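The steps above can be consolidated into one script. This is a sketch of the reproduction, not the reporter's exact setup: the device names (/dev/vdb ... /dev/vde), the LV size, and the VG/LV names (vg_stacked, lv_master, taken from the output below) are illustrative assumptions. It requires root and disposable disks in a VM.

```shell
# Assumed setup: 4 spare PVs in a VM; names are illustrative.
vgcreate vg_stacked /dev/vdb /dev/vdc /dev/vdd /dev/vde
lvcreate --type mirror -m 1 -L 4G -n lv_master vg_stacked

# Drop the PV backing the secondary leg out from under the mirror.
# Repeat (for the replacement leg) until no spare legs remain and
# the mirror is partial.
echo 1 > /sys/block/vdc/device/delete

# With the root FS on the mirrored LV, this command hangs:
vgreduce --removemissing vg_stacked
```

Note that the hang only manifests when /etc (and hence /etc/lvm) lives on the partial mirror, as described in comment 7.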

Actual results:
Command hangs after output:

    PV $UUID1 not recognised. Is the device missing?
    PV $UUID2 not recognised. Is the device missing?
    PV $UUID1 not recognised. Is the device missing?
    PV $UUID2 not recognised. Is the device missing?
    WARNING: Partial LV lv_master needs to be repaired or removed.
    WARNING: Partial LV lv_master_mimage_1 needs to be repaired or removed.
    There are still partial LVs in VG vg_stacked.
    To remove them unconditionally use: vgreduce --removemissing --force.
    Proceeding to remove empty missing PVs.

Stack:

    [<ffffffffa019b195>] jbd2_log_wait_commit+0xc5/0x150 [jbd2]
    [<ffffffffa019d884>] jbd2_complete_transaction+0x54/0xa0 [jbd2]
    [<ffffffffa01b070f>] ext4_sync_file+0x1df/0x330 [ext4]
    [<ffffffff811df8b5>] do_fsync+0x65/0xa0
    [<ffffffff811dfbb0>] SyS_fsync+0x10/0x20
    [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

I can still ssh to the VM, but anything that syncs to the LV hangs uninterruptibly: I tried `yum install gdb` (to get a better trace) and `sync`, and both hung.

The devices are not suspended.

Expected results:
Command may fail but the system should be usable.

Additional info:

Comment 2 Marian Csontos 2014-04-23 12:33:13 UTC
Created attachment 888887 [details]
SysRq

Comment 3 RHEL Program Management 2014-05-01 05:47:44 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 7 Marian Csontos 2016-04-08 11:50:52 UTC
Heinz, were you able to reproduce?

On 7.2, this time using XFS, the result is the same.

Leaving dmeventd to handle the situation works fine.

But calling `vgreduce --removemissing VG` while there is an unknown device in the mirror hangs and cannot be killed.

The command attempts to write to /etc/lvm/archive or /etc/lvm/backup, which reside on the incomplete volume. With both archive and backup disabled, the freeze does not occur.
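For reference, the archive/backup writes can be disabled in lvm.conf. The `backup` section and these option names are standard lvm.conf settings, but this is only a sketch of the workaround; verify against the installed lvm.conf(5) for the release in question.

```
# /etc/lvm/lvm.conf -- stop LVM writing metadata copies under /etc/lvm
backup {
    backup = 0    # do not write /etc/lvm/backup after metadata changes
    archive = 0   # do not write /etc/lvm/archive before metadata changes
}
```

As noted below, disabling these is not a recommended long-term configuration; it merely avoids the write to the frozen filesystem.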

In the case of a mirror we have at least 2 PVs, each of which should carry an MDA, so backup/archive are (in my opinion) less important and the sync could be skipped.

Another possible solution is to use RAID1 for the root FS.

Comment 10 Heinz Mauelshagen 2017-05-10 14:02:47 UTC
Marian,

I assume the kernel fix as of https://bugzilla.redhat.com/show_bug.cgi?id=1383444 addresses this?
Can you please retest?

Comment 12 Marian Csontos 2017-06-07 11:51:02 UTC
With the latest(?) 7.4 kernel (... Not tainted 3.10.0-675.el7.x86_64 ...) it is still hanging:

[  840.228644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.232227] jbd2/dm-3-8     D ffffffff816a36e0     0   505      2 0x00000000
[  840.234337]  ffff880034d0f9f0 0000000000000046 ffff8800347f8fb0 ffff880034d0ffd8
[  840.236553]  ffff880034d0ffd8 ffff880034d0ffd8 ffff8800347f8fb0 ffff88007fc16cc0
[  840.241330]  0000000000000000 7fffffffffffffff ffff88007ff661e8 ffffffff816a36e0
[  840.245547] Call Trace:
[  840.248134]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.251430]  [<ffffffff816a55b9>] schedule+0x29/0x70
[  840.253534]  [<ffffffff816a30c9>] schedule_timeout+0x239/0x2c0
[  840.255586]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[  840.257674]  [<ffffffff810e922c>] ? ktime_get_ts64+0x4c/0xf0
[  840.259645]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[  840.261684]  [<ffffffff810e922c>] ? ktime_get_ts64+0x4c/0xf0
[  840.263638]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.265501]  [<ffffffff816a4c3d>] io_schedule_timeout+0xad/0x130
[  840.267493]  [<ffffffff816a4cd8>] io_schedule+0x18/0x20
[  840.269344]  [<ffffffff816a36f1>] bit_wait_io+0x11/0x50
[  840.271180]  [<ffffffff816a3215>] __wait_on_bit+0x65/0x90
[  840.273041]  [<ffffffff81181a61>] wait_on_page_bit+0x81/0xa0
[  840.274967]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
[  840.276927]  [<ffffffff81181b91>] __filemap_fdatawait_range+0x111/0x190
[  840.279025]  [<ffffffff812f6bc0>] ? submit_bio+0x70/0x150
[  840.280929]  [<ffffffff8123a7f5>] ? bio_alloc_bioset+0x115/0x310
[  840.282915]  [<ffffffff81181c24>] filemap_fdatawait_range+0x14/0x30
[  840.284935]  [<ffffffff81181c67>] filemap_fdatawait+0x27/0x30
[  840.286939]  [<ffffffffc01f5ac1>] jbd2_journal_commit_transaction+0xa81/0x19a0 [jbd2]
[  840.289273]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
[  840.291197]  [<ffffffffc01fba79>] kjournald2+0xc9/0x260 [jbd2]
[  840.293191]  [<ffffffff810b1910>] ? wake_up_atomic_t+0x30/0x30
[  840.295168]  [<ffffffffc01fb9b0>] ? commit_timeout+0x10/0x10 [jbd2]
[  840.297213]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[  840.299002]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[  840.301032]  [<ffffffff816b1018>] ret_from_fork+0x58/0x90
[  840.302953]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[  840.305074] INFO: task vgreduce:1485 blocked for more than 120 seconds.
[  840.307196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.309504] vgreduce        D ffffffff816a36e0     0  1485   1333 0x00000080
[  840.311761]  ffff880076a8b8f0 0000000000000086 ffff880076e95e20 ffff880076a8bfd8
[  840.314242]  ffff880076a8bfd8 ffff880076a8bfd8 ffff880076e95e20 ffff88007fc16cc0
[  840.316607]  0000000000000000 7fffffffffffffff ffff88007ff5df50 ffffffff816a36e0
[  840.319037] Call Trace:
[  840.320494]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.322430]  [<ffffffff816a55b9>] schedule+0x29/0x70
[  840.324320]  [<ffffffff816a30c9>] schedule_timeout+0x239/0x2c0
[  840.326371]  [<ffffffff810cb5ec>] ? set_next_entity+0x3c/0xe0
[  840.328399]  [<ffffffff810295da>] ? __switch_to+0x15a/0x510
[  840.330391]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[  840.332474]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.334367]  [<ffffffff816a4c3d>] io_schedule_timeout+0xad/0x130
[  840.336341]  [<ffffffff816a4cd8>] io_schedule+0x18/0x20
[  840.338183]  [<ffffffff816a36f1>] bit_wait_io+0x11/0x50
[  840.339978]  [<ffffffff816a3215>] __wait_on_bit+0x65/0x90
[  840.341818]  [<ffffffff8123524a>] ? bh_lru_install+0x18a/0x1e0
[  840.343718]  [<ffffffff816a36e0>] ? bit_wait+0x50/0x50
[  840.345522]  [<ffffffff816a32c1>] out_of_line_wait_on_bit+0x81/0xb0
[  840.347495]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
[  840.349428]  [<ffffffffc01f3cc5>] do_get_write_access+0x285/0x4c0 [jbd2]
[  840.351472]  [<ffffffff8123582c>] ? __find_get_block+0xbc/0x120
[  840.353423]  [<ffffffffc01f3f27>] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[  840.355646]  [<ffffffffc0247d2b>] __ext4_journal_get_write_access+0x3b/0x80 [ext4]
[  840.357890]  [<ffffffffc0212547>] __ext4_new_inode+0x447/0x12c0 [ext4]
[  840.359945]  [<ffffffffc0224038>] ext4_create+0xd8/0x190 [ext4]
[  840.361915]  [<ffffffff8120ca3d>] vfs_create+0xcd/0x130
[  840.363737]  [<ffffffff8120dd8f>] do_last+0xbff/0x1280
[  840.365578]  [<ffffffff812109c2>] path_openat+0xc2/0x490
[  840.367450]  [<ffffffff81210eed>] ? putname+0x3d/0x60
[  840.369274]  [<ffffffff8121218b>] do_filp_open+0x4b/0xb0
[  840.371134]  [<ffffffff8121f23a>] ? __alloc_fd+0x8a/0x130
[  840.372986]  [<ffffffff811ff0b3>] do_sys_open+0xf3/0x1f0
[  840.374815]  [<ffffffff811ff1ce>] SyS_open+0x1e/0x20
[  840.376595]  [<ffffffff816b10c9>] system_call_fastpath+0x16/0x1b

As a mitigation, one should either:

- keep dmeventd on (this should be default with mirrors anyway) [1]
- use RAID for root FS
- disable archive/backup of LVM metadata (not recommended)

[1] However, it can still go wrong if anything calls `vgreduce --removemissing VG` while the mirror is partial, as that will hang the system.
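To check the first mitigation in practice, dmeventd monitoring status can be queried and enabled per LV. A sketch, reusing the vg_stacked/lv_master names from the original report; the `seg_monitor` reporting field and `lvchange --monitor` exist in lvm2 of this era, but confirm with `lvs -o help` and lvchange(8) on the target system.

```shell
# Show whether dmeventd is monitoring the mirror segment;
# "monitored" is the expected value for the mitigation to apply.
lvs -o lv_name,seg_monitor vg_stacked/lv_master

# (Re-)enable monitoring for the LV if it is not monitored.
lvchange --monitor y vg_stacked/lv_master
```

With monitoring active, dmeventd repairs the mirror on device failure, so no one should need to run `vgreduce --removemissing` by hand while the mirror is partial.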

Comment 13 Jonathan Earl Brassow 2017-06-07 14:34:52 UTC
If LVM is writing the archive/backup while the mirror device is dead, it is likely doing it in the wrong place - it should do it after the repair.

This bug is not severe enough to assign blocker status.