Bug 644612
| Summary: | lvm operations deadlock while waiting for user input | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Corey Marthaler <cmarthal> | ||||
| Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> | ||||
| lvm2 sub component: | Command-line tools | QA Contact: | cluster-qe <cluster-qe> | ||||
| Status: | CLOSED WONTFIX | Docs Contact: | |||||
| Severity: | medium | ||||||
| Priority: | medium | CC: | agk, coughlan, dwysocha, heinzm, jbrassow, joe.thornber, lvm-team, mpatocka, msnitzer, nkshirsa, prajnoha, thornber, zkabelac | ||||
| Version: | 8.3 | Keywords: | Triaged | ||||
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | ||||
| Target Release: | 8.4 | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-11-01 03:02:29 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 756082 | ||||||
| Attachments: | |||||||
Oct 19 14:59:37 hayes-02 kernel: lvconvert S ffff88011fc24100 0 32645 32644 0x00000080
Oct 19 14:59:37 hayes-02 kernel: ffff880218e87a88 0000000000000086 0000000000000000 ffffffff8126644c
Oct 19 14:59:37 hayes-02 kernel: ffff880218e87a08 0000000800000082 ffff88021a5a8ab0 0000000100a12629
Oct 19 14:59:37 hayes-02 kernel: ffff880218e590e8 ffff880218e87fd8 0000000000010518 ffff880218e590e8
Oct 19 14:59:37 hayes-02 kernel: Call Trace:
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff8126644c>] ? __bitmap_weight+0x8c/0xb0
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff814c8fc5>] schedule_timeout+0x225/0x2f0
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff81401829>] sk_wait_data+0xd9/0xe0
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff8145045b>] tcp_recvmsg+0x2cb/0xe80
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff812069f1>] ? avc_has_perm+0x71/0x90
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff814007c9>] sock_common_recvmsg+0x39/0x50
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff81400361>] sock_aio_read+0x181/0x190
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff8116c65a>] do_sync_read+0xfa/0x140
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff8120bf0f>] ? selinux_file_permission+0xbf/0x150
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff8116d151>] vfs_read+0x181/0x1a0
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff8116d1c1>] sys_read+0x51/0x90
Oct 19 14:59:37 hayes-02 kernel: [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

Created attachment 454434 [details]
log from hayes-02

Hit this again last night, bumping priority.

I should have known. The cmd isn't hung, it's just waiting for user input.

[root@hayes-01 ~]# lvconvert -m 1 --mirrorlog core centipede/mirror /dev/etherd/e1.1p9
Full resync required to convert inactive mirror mirror to core log. Proceed? [y/n]:

However, should that really lock all other lvm operations in the meantime? Especially if the convert was scripted like in this test? Once in this state how do I get out of it?

[root@hayes-01 ~]# pvs
[STUCK]
[root@hayes-01 ~]# lvs -a -o +devices
[STUCK]

Indeed - locks should not be held while waiting for user input :) Some amount of code restructuring may be required to fix this.

Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Adding QA ack for 6.2. Devel will need to provide unit testing results, however, before this bug can be ultimately verified by QA.

I am moving this one to rhel7 for consideration. It is not high enough priority to target rhel6 anymore.

Hmm - with all the ongoing changes this seems to be falling off the radar. We now seem to convert everything into the process_each_vg loop. However, to handle prompts better, we should rather just pick the 'vg'(s) matching the select rule, then:
Drop lock
Prompt
Reacquire lock - check that the VG being processed has not been changed, and continue with the operation.
Doing this via the 'process_each_vg' loop seems to increase the complexity and hurt the efficiency of such an operation. Passing this to David as he now masters this code.

This will be an incremental project, more an evolution of the code. There are two ways we should go about fixing this:
1. Modify the process_each loops to do two loops, prompting between the two loops without locks held.
This has been done for the new pvcreate (still needing review), which shows how it can be done for other cases.
2. Reconsider if some of the prompts are really needed. In some cases prompts are really warranted, but in other cases we could probably remove the prompt without causing any real problem (the prompt about resync seems like one that could be removed without any harm).

The specific case in the description was 'lvconvert -m1' asking the question:
Full resync required to convert inactive mirror mirror to core log. Proceed? [y/n]:
We will not have process_each_lv reworked to handle a prompting-without-locks phase in the immediate future, but it may be possible to simply remove that prompt. Someone who is more familiar with this specific resync case and the possible consequences of removing the prompt would have to answer that.

(In reply to David Teigland from comment #26)
> The specific case in the description was 'lvconvert -m1' asking the question:
>
> Full resync required to convert inactive mirror mirror to core log. Proceed?
> [y/n]:
>
> We will not have process_each_lv reworked to handle a
> prompting-without-locks phase in the immediate future, but it may be
> possible to simply remove that prompt.
> Someone who is more familiar with this specific resync case and the possible
> consequences of removing the prompt would have to answer that.

When a user switches to a core log, it is going to require a full sync for EVERY activation thereafter - a known limitation of using core log. I'm fine with removing the prompt. I'm also fine closing this bug WONTFIX (legacy mirror implementation), but would prefer the prompt removal.

Side note - this is an outstanding global issue - it's not a mirror bug as such. We don't yet have any solution for the problem when lvm2 detects a 'promptable' issue and the command asks the user for y|n. During such a prompt we hold all the locks to avoid grabbing the system state again.
Dropping WRITE locks while the prompt is shown may get very complicated if any change has happened while the lock was released. And it's not just the state of the VG as such - during a secondary 'rescan' we may discover a different set of devices, which raises more prompts.

Zdenek is correct, this is not a "mirror" issue. Normally I'd be fine closing this, but this bug turns * 10 yrs old * this month. :) Proposing for rhel8 consideration to force the issue, as this still exists with the latest release.

[root@hayes-02 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
POOL cache_sanity -wi-a----- 4.00g
POOL_meta cache_sanity -wi-a----- 12.00m
suspend cache_sanity -wi-a----- 4.00g
[root@hayes-02 ~]# lvconvert --type cache-pool --cachepolicy smq --cachemode writethrough -c 32 --poolmetadata cache_sanity/POOL_meta cache_sanity/POOL
WARNING: Converting cache_sanity/POOL and cache_sanity/POOL_meta to cache pool's data and metadata volumes with metadata wiping.
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert cache_sanity/POOL and cache_sanity/POOL_meta? [y/n]:

### On another term:
[root@hayes-02 ~]# pvscan
[STUCK]

kernel-4.18.0-235.el8 BUILT: Thu Sep 3 13:19:50 CDT 2020
lvm2-2.03.09-5.el8 BUILT: Wed Aug 12 15:51:50 CDT 2020
lvm2-libs-2.03.09-5.el8 BUILT: Wed Aug 12 15:51:50 CDT 2020
device-mapper-1.02.171-5.el8 BUILT: Wed Aug 12 15:51:50 CDT 2020
device-mapper-libs-1.02.171-5.el8 BUILT: Wed Aug 12 15:51:50 CDT 2020

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened. |
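
For readers following the thread, the "drop lock, prompt, reacquire lock, compare" sequence discussed in the comments can be sketched roughly as below. This is a minimal illustration in Python, not lvm2 code: `VolumeGroup`, `convert_with_prompt`, the `ask` and `apply_change` callbacks, and the use of a `seqno` counter to detect changes are all hypothetical names and assumptions chosen for the example. It also ignores the harder case raised above, where a rescan discovers a different set of devices.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class VolumeGroup:
    """Hypothetical stand-in for a VG; seqno is assumed to bump on every change."""
    name: str
    seqno: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


def convert_with_prompt(vg, ask, apply_change):
    """Two-phase operation: the prompt is shown with no lock held.

    ask(message) -> bool and apply_change(vg) are caller-supplied callbacks
    (assumptions for this sketch, not an lvm2 interface).
    """
    # Phase 1: look at the VG under the lock and remember what we saw.
    with vg.lock:
        seen_seqno = vg.seqno

    # Prompt while the lock is released, so other commands are not blocked.
    if not ask(f"Convert volumes in {vg.name}? [y/n]: "):
        return "declined"

    # Phase 2: reacquire the lock and verify nothing changed in between.
    with vg.lock:
        if vg.seqno != seen_seqno:
            return "changed"  # caller would need to re-inspect and re-prompt
        apply_change(vg)
        vg.seqno += 1
        return "done"
```

An interactive caller could pass something like `lambda msg: input(msg).strip().lower() == "y"` as `ask`; a scripted caller could pass a function that always returns `True`. The point of the structure is only that the answer is collected between the two locked phases rather than inside one.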
Description of problem:
While testing the latest 6.0.z rpms, I hit this deadlock.

---------------------------------------------
TEST CASE=23 base legs 1; convert to 1 legs; base log disk; convert to core log; active 0; sync=1; pvs=sufficient
---------------------------------------------
creating a base mirror on hayes-02
Waiting until all mirrors become fully syncd...
0/1 mirror(s) are fully synced: ( 54.29% )
1/1 mirror(s) are fully synced: ( 100.00% )
deactivating base volume before convert on hayes-02
Converting from 1 leg(s) disk log; to 1 leg(s) core log on hayes-02
lvconvert --mirrorlog core -m 1 centipede/centi_base /dev/etherd/e1.1p1
[HANG]

An 'lvs' cmd also hung. I'll post the kernel dump from this machine.

Version-Release number of selected component (if applicable):
2.6.32-71.el6.x86_64
lvm2-2.02.72-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
lvm2-libs-2.02.72-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
lvm2-cluster-2.02.72-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
udev-147-2.29.el6 BUILT: Tue Aug 31 16:44:10 CDT 2010
device-mapper-1.02.53-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
device-mapper-libs-1.02.53-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
device-mapper-event-1.02.53-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
device-mapper-event-libs-1.02.53-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010
cmirror-2.02.72-8.el6_0.1 BUILT: Mon Oct 11 10:45:21 CDT 2010

How reproducible:
Only once so far

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: