Bug 1403836 - pvmove progress polling wrongly returns prematurely
Summary: pvmove progress polling wrongly returns prematurely
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1385242 1403839 1454287
 
Reported: 2016-12-12 13:05 UTC by Liron Aravot
Modified: 2021-09-08 20:28 UTC (History)
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-04 15:22:07 UTC
Target Upstream Version:
Embargoed:
okozina: needinfo-


Attachments
execution log (7.64 KB, text/plain), 2016-12-12 13:05 UTC, Liron Aravot
lvmpolld test - log (5.82 KB, text/plain), 2016-12-14 16:39 UTC, Liron Aravot


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1462792 0 medium CLOSED [BLOCKED on platform bug 1546538] systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is mas... 2023-03-24 13:49:14 UTC

Internal Links: 1462792

Description Liron Aravot 2016-12-12 13:05:09 UTC
Created attachment 1230789 [details]
execution log

Description of problem:
We are attempting to use pvmove/vgreduce for oVirt storage domains.

When attempting to run pvmove (see the exact command in the attached execution log), the command ends prematurely and progress is polled only once.
The pvmove continues to run in the background, and further executions of the same command have the same effect: one polling attempt and then a report of successful completion.
The weird thing is that if I clean the caches and execute the same command again, it is polled as expected (see the attached execution log), although we use use_lvmetad=0 on the command (so the cache should have no effect).
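For reference, a hypothetical invocation of the kind described above (the exact command appears only in the attached execution log; the device path below is made up for illustration):

# move extents off a PV while disabling the lvmetad cache for this command only
pvmove --config 'global {use_lvmetad=0}' /dev/mapper/example-source-pv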


Version-Release number of selected component (if applicable):
libblockdev-lvm-1.6-1.fc24.x86_64
lvm2-2.02.150-2.fc24.x86_64
lvm2-libs-2.02.150-2.fc24.x86_64
Fedora release 24 (Twenty Four)

How reproducible:
Always

Steps to Reproduce:
1. Issue pvmove as specified above.

Actual results:
The pvmove is polled only once.

Expected results:
The command should return only after the pvmove has actually finished.

Additional info:

When attempting to execute with -vvvv, I got the following error:
LVM command executed by lvmpolld failed.
Reloading config files    Wiping internal VG cache';

The pvmove was still ongoing in the background in that case as well.

I'll try to reproduce again with the lvmetad service disabled (just a reminder: we execute with use_lvmetad=0, but I know we had other issues in those areas).

Comment 1 Yaniv Lavi 2016-12-12 13:24:50 UTC
Can you please escalate? This is urgent for the RHV 4.1 release and should be fixed in 7.3.z.

Comment 2 Ondrej Kozina 2016-12-12 14:44:56 UTC
(In reply to Liron Aravot from comment #0)
> (...)we use use_lvmetad=0 on the command (so cache should have no
> effect).

What do you have set for lvmetad in /etc/lvm/lvm.conf?
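For illustration, the effective setting can be checked like this (a sketch, not part of the original exchange):

# print the value LVM actually uses, taking /etc/lvm/lvm.conf into account
lvmconfig global/use_lvmetad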

> 
> Additional info:
> 
> When attempting to execute with -vvvv, I've got the following error -
> LVM command executed by lvmpolld failed.

Long-running operations have been managed by the lvmpolld daemon since RHEL 7.2, IIRC. The pvmove monitoring executed on the daemon side failed for some reason.

Could you have a look at the daemon's log for more info?
(journalctl -u lvm2-lvmpolld)

If it's not of much help, please try to run it in debug mode:
systemctl stop lvm2-lvmpolld
systemctl stop lvm2-lvmpolld.socket
lvmpolld -f -l all

and capture the output when you issue your pvmove cmd in another terminal.

I think that what happened is in fact this:

You started the pvmove with lvmetad disabled, but the lvmpolld daemon runs commands with the default config, therefore probably with use_lvmetad=1, so lvmetad was not aware of the changes to the metadata (mda) made by the pvmove command run with use_lvmetad=0 on the command line (according to your report).

Comment 3 Liron Aravot 2016-12-12 14:57:40 UTC
Thanks Ondrej,
Prior to your comment I made a run with lvm2-lvmetad.service and lvm2-lvmetad.socket disabled, and pvmove/lvmpolld worked as expected from the first execution, so the scenario you describe makes sense.

I'll report back with the additional info you requested.
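A plausible way to disable lvmetad as described above (the exact commands used are not in the report, so this is only a sketch):

# stop and disable both the lvmetad socket and the service
systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket lvm2-lvmetad.service
# and set use_lvmetad = 0 in /etc/lvm/lvm.conf so commands don't expect the daemon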

Comment 5 Liron Aravot 2016-12-14 16:39:20 UTC
Created attachment 1231806 [details]
lvmpolld test - log

Comment 6 Liron Aravot 2016-12-14 16:40:03 UTC
Ondrej, please find the requested log attached.

Thanks, Liron.

Comment 7 Ondrej Kozina 2016-12-22 13:45:09 UTC
Hi Liron,

it's as I suspected earlier. The command initiated on the lvmpolld side runs with use_lvmetad=1 set and therefore doesn't see the pvmove0 mirror LV (it's missing from the lvmetad cache). My recommendation is to turn off lvmetad for good if you have a bad experience with it, and, most importantly, don't mix commands that modify LVM metadata with use_lvmetad=0 and commands run with use_lvmetad=1 while lvmetad is running, because that's what causes the metadata corruption on the lvmetad side.

The only currently supported setup is to stop lvmetad manually after a command modifying metadata with use_lvmetad=0 has run (this is also clearly stated in the logs). There's a plan to have such a harmful command stop lvmetad automatically, but that's a future feature.
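A sketch of the currently supported sequence described above (the pvmove argument is illustrative only):

# run the metadata-modifying command without the lvmetad cache
pvmove --config 'global {use_lvmetad=0}' /dev/mapper/example-source-pv
# then stop lvmetad so its now-stale cache can no longer be consulted
systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service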

Comment 8 Liron Aravot 2016-12-27 17:59:47 UTC
Hi Ondrej,
Thanks for your reply.
Question: when the lvmpolld command is initiated to monitor the pvmove, why isn't the use_lvmetad=0 flag passed to it? Regardless of the option of disabling lvmetad, passing the flag makes sense to me (if the pvmove doesn't use the metadata cache, neither should the lvmpolld command monitoring it).


Nir, can you please review the discussion? As we have other issues with lvmetad enabled: do we feel safe enough to disable it for 4.1, or do we want to do that in 4.2?

Thanks,
Liron

Comment 9 Ondrej Kozina 2017-01-02 10:04:24 UTC
(In reply to Liron Aravot from comment #8)
> Hi Ondrej,
> Thanks for your reply.
> Question - when the lvmpolld command is initiated to monitor the pvmove, why
> doesn't the use_lvmetad=0 flag passed to it? regardless of the option of
> disabling lvmetad, passing the flag makes sense to me (if pvmove doesn't use
> the metadata, so should the lvmpolld command monitoring it).

Sure, we were thinking about passing that option and a few others as well. Unfortunately, in this case any update on the lvmpolld side would make the cache in lvmetad outdated. Some commands run in the lvmpolld environment may update LVM metadata (the pvmove case included). Mixing commands with use_lvmetad=1 and use_lvmetad=0 while lvmetad is actually running leads to problems sooner or later, I'm afraid. Turning lvmetad off (or restarting it after a command with use_lvmetad=0 set has run) is the only reliable option here.
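Roughly, the restart alternative mentioned above could look like this (a sketch; the pvscan step is an assumption, not something stated in the comment):

# after a command run with use_lvmetad=0 has modified metadata, restart lvmetad
systemctl restart lvm2-lvmetad.service
# repopulate the freshly started (empty) cache by rescanning devices
pvscan --cache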

