Created attachment 935324 [details]
md metadata version 1.0 versus lvmetad

When using MD with metadata version 1.0 (which means the MD signature is written at the end of the underlying devices), we can fool LVM tools/lvmetad into trying to look for a device which is normally filtered out by the MD filter (md_component_detection=1, which is used by default). While the filtering itself is OK here, the message "No device found for PV ..." is confusing and should not be issued in this case.

Reproducer:

[0] f20/~ # mdadm --create /dev/md0 --metadata 1.0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mdadm: array /dev/md0 started.

[0] f20/~ # pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created

[0] f20/~ # pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/md0          lvm2 ---  127.94m 127.94m
  /dev/vda2  fedora lvm2 a--    9.50g       0

[0] f20/~ # pvscan --cache
  Found duplicate PV 1xeIn2yuraCkfHupuUdNAqe9DEKsSP8u: using /dev/md0 not /dev/sda
  Found duplicate PV 1xeIn2yuraCkfHupuUdNAqe9DEKsSP8u: using /dev/sdb not /dev/md0

[0] f20/~ # pvs
  No device found for PV 1xeIn2-yura-CkfH-upuU-dNAq-e9DE-KsSP8u.
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.50g     0

Since we're writing the MD signature at the end of the underlying devices, when writing the PV header on the MD device /dev/md0, the PV header ends up written at the beginning of these MD component devices, just as it would when writing to /dev/sda or /dev/sdb directly. This all ends up in confusion where we see 3 PV headers at once in the reproducer above - found on /dev/sda, /dev/sdb and /dev/md0, which is on top of /dev/sda and /dev/sdb. And somehow the interaction between the MD filter and the list of PVs from lvmetad does not work well, which results in the "No device found for PV" message.

(Attaching logs from both pvscan --cache and pvs; this is with upstream code as of c774d9a3f35c312d5b24c4ba13da707ccd975194.)
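For illustration, the PV label ("LABELONE") is written within the first few 512-byte sectors of the device, so it is directly visible on a component device as well. A quick check (the output shown is what one would expect, not taken from the attached logs):

[0] f20/~ # dd if=/dev/sda bs=512 count=4 2>/dev/null | strings | grep LABELONE
LABELONE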
I suppose this may also be the cause of bug #1018852. (But I still need to ask which MD metadata version is used there...)
The source of the problem seems to be that pvscan --cache tries to detect PV header duplicates (from pvscan_cache.log, which is attached):

#cache/lvmetad.c:814     Telling lvmetad to store PV /dev/sda (1xeIn2-yura-CkfH-upuU-dNAq-e9DE-KsSP8u)
#cache/lvmcache.c:1544   Found duplicate PV 1xeIn2yuraCkfHupuUdNAqe9DEKsSP8u: using /dev/md0 not /dev/sda

The pvscan --cache should not do any analysis and should not detect any duplicates - if it sees /dev/sda, /dev/sdb and /dev/md0 containing the same PV header, it should just send all of this to lvmetad. The filtering and analysis should be done only on the client side, when processing the list that lvmetad sent to the LVM tool.
(I suppose the same problem would arise with mpath devices, which have a similar layout - two or more devices underneath and an mpath one on top.)
Well, lvmetad currently only maintains a 1:1 mapping between PV UUIDs and device numbers. The component filters should be (and are) “global” filters: if a device is part of an MD array, you should treat it as “not a PV” regardless of what PV label(s) you find on it, at the pvscan --cache level. We use cmd->lvmetad_filter when creating the iterator in lvmetad_pvscan_all_devs, precisely to skip devices that we don't want to see -- this is the global_filter plus any component-detection and similar filters. Looking at the code, though, this logic is not actually implemented as intended: only the regex global_filter is part of lvmetad_filter. So I am pretty sure that this is the actual problem here.

And yes, the same goes for mpath. It's pretty weird that this didn't pop up earlier, because I don't think it is new behaviour or a regression -- it's been like this for a long while. In other words, we always scan all the component devices and the last one to finish scanning “wins”. Not good at all.

The messages from lvmcache that you see about duplicate UUIDs are there because lvmcache is tangled into the label reading code -- we don't follow its logic, but lvmcache still prints the messages when we read the labels. Those will go away when lvmetad_filter is fixed.
(In reply to Petr Rockai from comment #4)
> Well, lvmetad currently only maintains a 1:1 mapping between PV UUIDs and
> device numbers. The component filters should be (and are) “global” filters:
> if a device is part of an MD array, you should treat it as “not a PV”
> regardless of what PV label(s) you find on it, at the pvscan --cache level.
> We use cmd->lvmetad_filter when creating the iterator in
> lvmetad_pvscan_all_devs, precisely to skip devices that we don't want to
> see -- this is the global_filter plus any component-detection and similar
> filters. Looking at the code, though, this logic is not actually
> implemented as intended: only the regex global_filter is part of
> lvmetad_filter. So I am pretty sure that this is the actual problem here.

Hmm, I'm a bit confused now - shouldn't "global_filter" be just the actual setting provided by users via the "devices/global_filter" lvm.conf setting? (...to filter out devices the user actually knows are not suitable - e.g. if he knows he's making copies where duplicate VG names/UUIDs can appear.)

How does the component filter translate into "global_filter"? Does that mean that the global_filter used in the code is actually the devices/global_filter setting (the part that the user provides) + the outcome of component filtering (the part that LVM does automatically)?
I thought there was a clear distinction between the filters and that this sequence was always followed:

"devices/global_filter"
    for pvscan --cache to fill lvmetad with info

"devices/global_filter -> devices/filter -> (any other component filters - md, mpath, partition table)"
    for any LVM command using lvmetad, to filter the raw info/list provided by lvmetad
Created attachment 935795 [details]
proposed patch

Further testing confirms that this is a filtering problem. I am waiting for some test results before checking in a fix; a tentative patch is attached.
Yes, but the component filters should be part of the global filter, not part of the client-side filter, because they exist precisely to avoid duplicated UUIDs. The invariant for lvmetad is that pvscan --cache should only report one device per PV UUID.

What it did up until now was wrong: pvscan --cache would happily report that /dev/sda was the PV with UUID foo, even though /dev/sda is part of /dev/md0. At that point, lvmetad believes that PV foo lives on /dev/sda. If another pvscan --cache comes around later, looks at /dev/md0 and tells lvmetad, hey, PV foo lives on /dev/md0, lvmetad updates its data about PV foo. All is well at that point, but only because the scan for /dev/md0 came in last.

So the correct (patched) sequence is:

lvmetad_filter = sysfs -> devices/global_filter -> device type filter -> mpath component -> partitioned filter -> md component
client = lvmetad_filter -> devices/filter -> persistent

where persistent is a bit messy, as when lvmetad is in use, it is not persistent in any way (/etc/lvm/.cache is ignored), but it checks for suspended devices and things like that...

Now that I think of it, though, I also have to include the values of the sysfs/mpath/md filter options in the lvmetad token.
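To illustrate where the two user-visible knobs sit in that sequence, a minimal lvm.conf sketch (the regex values are purely illustrative, not a recommendation):

devices {
    # Applied before lvmetad, i.e. by pvscan --cache, together with the
    # component-detection filters listed above:
    global_filter = [ "r|^/dev/cdrom$|" ]

    # Applied only on the client side, after the PV list comes back
    # from lvmetad:
    filter = [ "a|.*|" ]
}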
(Just to clarify one thing about the persistent filter: it sits “on top” of lvmetad_filter + devices/filter, not “behind” -- the magic checking it does happens before the other filters run on a device, so the protection from reading suspended devices should still work, as it won't ask the MD component filter about a suspended device at all.)
TODO: We want to add a warning about using devices/filter in lvm.conf, for people who migrated from older versions and may be affected by the changes in filter semantics -- both this one and the case where they enable lvmetad (or their distro does it for them). We should figure out a good way to trigger that warning without being overly annoying. Candidates are:

1) warn when lvm.conf has filter set, but not global_filter, *and* --config is not in use
2) release notes
3) a script running at upgrade time (would be distro specific)
4) magic upgrade detection (something like: lvm.conf looks like our template but is missing newer bits; probably too complicated and fragile)
Now, if we're calling "pvscan --cache <device>", which should scan just the one device given on the command line, does it execute the filtering pipe too? (The one mentioned in comment #8: lvmetad_filter = sysfs -> devices/global_filter -> device type filter -> mpath component -> partitioned filter -> md component.)

Thing is, the original design was (at least I was under that impression) that lvmetad caches everything and it's only the client that does all the filtering. This really needs to be documented better!

BTW, if we're touching this area at the moment, we should probably also look at bz #1095191, where the global_filter is sometimes ignored for some reason - it happens quite often on my machine - though if I add verbose output, the bug is gone, so it must be some race. Any idea what it could be caused by? There's also another related bz #989548 that's worth looking at now, and maybe also bug #1106425, which falls into a similar category.
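To make the first question above concrete, the single-device invocation I have in mind is e.g. (using the MD device from the reproducer):

[0] f20/~ # pvscan --cache /dev/md0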
That depends on what you mean by filtering. Most of the confusion comes from the fact that filters serve multiple purposes.

It has always been the case, as I have repeatedly said both here and elsewhere, that as far as lvmetad is concerned, UUIDs are *unique* -- it says so right in the name, Universally Unique IDentifier, after all. So if you have multiple devices that all exhibit the same PV label (and thus the same PV UUID), what you are claiming is that it *doesn't matter which device you use to access the PV*. If this is not the case (i.e. MD component devices), you must somehow pick the right device. The various global filters serve this purpose.

In other words, global filters decide whether a thing is a PV at all. On the other hand, client-side filters decide which PVs you want to work with (or don't want to work with). So /dev/sda is *not* a PV if it is part of /dev/md0, which is a PV itself, whether you see a label on it or not. If you tell lvmetad that such a non-PV is in fact a PV, lvmetad is happy to oblige, but the result is not what you want.

So again, this “original design” you mention is filtering of PVs in the cases where filtering means “I only want to work with those 3 PVs right now, even though there's another 2000 in the system”. This is normally achieved with --config, or maybe config file cascading. This is because lvmetad needs a global view of the system, i.e. to see all PVs. However, if the question is “is this thing even a PV”, that needs to be decided before lvmetad. So pvscan --cache is the correct place to do that.
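As a sketch of that kind of client-side narrowing with --config (the device name is purely illustrative):

[0] f20/~ # pvs --config 'devices { filter = [ "a|^/dev/md0$|", "r|.*|" ] }'

The same could just as well live in a cascaded config file instead of on the command line.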
(In reply to Petr Rockai from comment #12)
> In other words, global filters decide whether a thing is a PV at all. On
> the other hand, client-side filters decide which PVs you want to work with
> (or don't want to work with). So /dev/sda is *not* a PV if it is part of
> /dev/md0, which is a PV itself, whether you see a label on it or not. If
> you tell lvmetad that such a non-PV is in fact a PV, lvmetad is happy to
> oblige, but the result is not what you want.

Sure, I understand the reasoning here. Thing is that even I thought it was only devices/global_filter that was applied to the PV list passed to lvmetad. IOW, it's not clear from the documentation (I mean the comments in lvm.conf, which are currently the only documentation on this filtering matter) that any other filtering is applied as well. So we definitely need to add a few more words on this subject.

> So again, this “original design” you mention is filtering of PVs in the
> cases where filtering means “I only want to work with those 3 PVs right
> now, even though there's another 2000 in the system”. This is normally
> achieved with --config, or maybe config file cascading.

(--config was not even documented until I added a few words on it not long ago, anyway... :) )

> This is because lvmetad needs a global view of the system, i.e. to see all
> PVs. However, if the question is “is this thing even a PV”, that needs to
> be decided before lvmetad. So pvscan --cache is the correct place to do
> that.

Well, OK then, I'm not against this filtering scheme. The only thing is that I always thought lvmetad was purely a cache - that it just replaces the direct scan on devices - and that the logic to filter out the proper devices is always in the tools. That's what we claimed all the time (at least that was my impression). And the documentation doesn't make it clear at all here, so we need to improve it...

(My suggestion here would be to use nice simple ascii art to document this - with a nice view of the filtering pipe, how it goes to lvmetad and then back to the LVM tools. One picture will replace tons of doc words.)
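Something along these lines, perhaps (just a rough sketch based on the sequence from comment #8, not a final proposal):

  all block devices
        |
        v
  lvmetad_filter:
    sysfs -> devices/global_filter -> device type
          -> mpath component -> partitioned -> md component
        |
        v
  pvscan --cache  --- one device per PV UUID --->  lvmetad
                                                      |
  LVM tool  <------------- PV list -------------------+
        |
        v
  client-side: devices/filter -> persistent
        |
        v
  PVs the command actually works with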
(In reply to Petr Rockai from comment #10)
> TODO: We want to add a warning about using devices/filter in lvm.conf, for
> people who migrated from older versions and may be affected by the changes
> in filter semantics -- both this one and the case where they enable lvmetad
> (or their distro does it for them). We should figure out a good way to
> trigger that warning without being overly annoying. Candidates are:
>
> 1) warn when lvm.conf has filter set, but not global_filter, *and* --config
> is not in use

This seems to be the best one, probably. To remove the warning, people would need to state "global_filter = []" or similar (a "blank" global_filter) in case they really mean it.

> 2) release notes

This works well for RHEL (or any enterprise-level distribution), but not for Fedora (or any other desktop/home-use distribution).

> 3) a script running at upgrade time (would be distro specific)

Other distros may forget to handle this properly.

> 4) magic upgrade detection (something like: lvm.conf looks like our
> template but is missing newer bits; probably too complicated and fragile)

We can already do this by making use of "lvm dumpconfig --type missing", but what would we do with that information anyway? Add a warning on package upgrade? If there are 1000 upgrades at a time, people won't read it.

So it seems 1) is really the best option here.
Yes, lvmetad *is* a pure cache. Filtering happens in tools, but on both sides of lvmetad: global filters before (in pvscan), local filters after (in everything else). It's precisely because lvmetad does not have any logic to decide which devices are better PV candidates than others that we need the before-lvmetad filters (which live in pvscan, i.e. the tool that feeds data into lvmetad).

If we decided to make lvmetad smarter (smarter than a dumb cache daemon, that is), then we would have the option of avoiding global filters altogether. We could also allow UUID non-uniqueness at the lvmetad level and sort out which candidate device is the right one on the client side, post-lvmetad. That has the substantial disadvantage of having to open and read from the devices, defeating the no-disk-access workflow currently possible with lvmetad for many tools.
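(For completeness, whether the tools consult lvmetad at all is governed by a single lvm.conf switch; shown here just to frame the trade-off above:

global {
    # 1: tools get PV/VG info from lvmetad, avoiding per-command disk scans;
    # 0: tools scan devices directly on each invocation (no lvmetad).
    use_lvmetad = 1
}
)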
OK, so let's document this - I'd go with a paragraph in man lvm/lvm.conf describing this relation to lvmetad better. I can try to write something up... The current comments/doc lines are a bit confusing as they stand.
I have updated the comments in example.conf. Along with the other filter patches, this should now be fixed. (We now also have a related BZ, see bug 1147217; it should be fixed by the same patches.)