Created attachment 935324 [details]
md metadata version 1.0 versus lvmetad

When using MD with metadata version 1.0 (which means the MD signature is written at the end of the underlying devices), we can fool LVM tools/lvmetad into trying to look for a device which is normally filtered out by the MD filter (md_component_detection=1, which is used by default). While the filtering itself is OK here, the message "No device found for PV ..." is confusing and should not be issued in this case.

Reproducer:

[0] f20/~ # mdadm --create /dev/md0 --metadata 1.0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mdadm: array /dev/md0 started.

[0] f20/~ # pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created

[0] f20/~ # pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/md0          lvm2 ---  127.94m 127.94m
  /dev/vda2  fedora lvm2 a--    9.50g       0

[0] f20/~ # pvscan --cache
  Found duplicate PV 1xeIn2yuraCkfHupuUdNAqe9DEKsSP8u: using /dev/md0 not /dev/sda
  Found duplicate PV 1xeIn2yuraCkfHupuUdNAqe9DEKsSP8u: using /dev/sdb not /dev/md0

[0] f20/~ # pvs
  No device found for PV 1xeIn2-yura-CkfH-upuU-dNAq-e9DE-KsSP8u.
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.50g     0

Since we're writing the MD signature at the end of the underlying devices, when writing the PV header on the MD device /dev/md0, the PV header ends up written at the beginning of these MD component devices, just as it would when writing to /dev/sda or /dev/sdb directly. This all ends up in confusion where we see 3 PV headers at once in the reproducer above - found on /dev/sda, /dev/sdb and /dev/md0, which is on top of /dev/sda and /dev/sdb. And somehow the interaction between the MD filter and the list of PVs from lvmetad does not work well, which results in the "No device found for PV" message.

(Attaching logs from both pvscan --cache and pvs; this is with upstream code as of c774d9a3f35c312d5b24c4ba13da707ccd975194.)
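For illustration, the PV label ("LABELONE") is written within the first few 512-byte sectors of the device, so it is directly visible on a component device as well. A quick check (the output shown is what one would expect, not taken from the attached logs):

[0] f20/~ # dd if=/dev/sda bs=512 count=4 2>/dev/null | strings | grep LABELONE
LABELONE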
I suppose this may also be the cause of bug #1018852. (But I still need to ask which MD metadata version is used there...)
The source of the problem seems to be that pvscan --cache tries to detect PV header duplicates (from pvscan_cache.log, which is attached):

#cache/lvmetad.c:814     Telling lvmetad to store PV /dev/sda (1xeIn2-yura-CkfH-upuU-dNAq-e9DE-KsSP8u)
#cache/lvmcache.c:1544   Found duplicate PV 1xeIn2yuraCkfHupuUdNAqe9DEKsSP8u: using /dev/md0 not /dev/sda

The pvscan --cache should not do any analysis and should not detect any duplicates - if it sees /dev/sda, /dev/sdb and /dev/md0 containing the same PV header, it should just send all of this to lvmetad. The filtering and analysis should be done only on the client side, when processing the list that lvmetad sent to the LVM tool.
(I suppose the same problem would arise with mpath devices, which have a similar layout - two or more devices underneath and an mpath one on top.)
Well, lvmetad currently only maintains a 1:1 mapping between PV UUIDs and device numbers. The component filters should be (and are) “global” filters: if a device is part of an MD array, you should treat it as “not a PV” regardless of what PV label(s) you find on it, at the pvscan --cache level. We use cmd->lvmetad_filter when creating the iterator in lvmetad_pvscan_all_devs, precisely to skip devices that we don't want to see -- this is the global_filter plus any component-detection and similar filters. Looking at the code, though, this logic is not actually implemented as intended: only the regex global_filter is part of lvmetad_filter. So I am pretty sure that this is the actual problem here.

And yes, the same goes for mpath. It's pretty weird that this didn't pop up earlier, because I don't think it is new behaviour or a regression -- it's been like this for a long while. In other words, we always scan all the component devices and the last one to finish scanning “wins”. Not good at all.

The messages from lvmcache that you see about duplicate UUIDs are there because lvmcache is tangled into the label reading code -- we don't follow its logic, but lvmcache still prints the messages when we read the labels. Those will go away when lvmetad_filter is fixed.
(In reply to Petr Rockai from comment #4)
> Well, lvmetad currently only maintains a 1:1 mapping between PV UUIDs and
> device numbers. The component filters should be (and are) “global” filters:
> if a device is part of an MD array, you should treat it as “not a PV”
> regardless of what PV label(s) you find on it, at the pvscan --cache level.
> We use cmd->lvmetad_filter when creating the iterator in
> lvmetad_pvscan_all_devs, precisely to skip devices that we don't want to
> see -- this is the global_filter plus any component-detection and similar
> filters. Looking at the code, though, this logic is not actually
> implemented as intended: only the regex global_filter is part of
> lvmetad_filter. So I am pretty sure that this is the actual problem here.

Hmm, I'm a bit confused now - shouldn't "global_filter" be just the actual setting provided by users via the "devices/global_filter" lvm.conf setting? (...to filter out devices the user actually knows are not suitable - e.g. if he knows he's making copies where duplicate VG names/UUIDs can appear.)

How does the component filter translate into "global_filter"? Does that mean that the global_filter used in the code is actually the devices/global_filter setting (the part that the user provides) + the outcome of component filtering (the part that LVM does automatically)?
I thought there was a clear distinction between the filters and that this sequence was always followed:

"devices/global_filter"
    for pvscan --cache to fill lvmetad with info

"devices/global_filter -> devices/filter -> (any other component filters - md, mpath, partition table)"
    for any LVM command using lvmetad, to filter the raw info/list provided by lvmetad
Created attachment 935795 [details]
proposed patch

Further testing confirms that this is a filtering problem. I am waiting for some test results before checking in a fix; a tentative patch is attached.
Yes, but the component filters should be part of the global filter, not part of the client-side filter, because they exist precisely to avoid duplicated UUIDs. The invariant for lvmetad is that pvscan --cache should only report one device per PV UUID.

What it did up until now was wrong: pvscan --cache would happily report that /dev/sda was the PV with UUID foo, even though /dev/sda is part of /dev/md0. At that point, lvmetad believes that PV foo lives on /dev/sda. If another pvscan --cache comes around later, looks at /dev/md0 and tells lvmetad, hey, PV foo lives on /dev/md0, lvmetad updates its data about PV foo. All is well at that point, but only because the scan for /dev/md0 came in last.

So the correct (patched) sequence is:

lvmetad_filter = sysfs -> devices/global_filter -> device type filter -> mpath component -> partitioned filter -> md component
client = lvmetad_filter -> devices/filter -> persistent

where persistent is a bit messy, as when lvmetad is in use, it is not persistent in any way (/etc/lvm/.cache is ignored), but it checks for suspended devices and things like that...

Now that I think of it, though, I also have to include the values of the sysfs/mpath/md filter options in the lvmetad token.
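To illustrate where the two user-visible knobs sit in that sequence, a minimal lvm.conf sketch (the regex values are purely illustrative, not a recommendation):

devices {
    # Applied before lvmetad, i.e. by pvscan --cache, together with the
    # component-detection filters listed above:
    global_filter = [ "r|^/dev/cdrom$|" ]

    # Applied only on the client side, after the PV list comes back
    # from lvmetad:
    filter = [ "a|.*|" ]
}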
(Just to clarify one thing about the persistent filter: it sits “on top” of lvmetad_filter + devices/filter, not “behind” -- the magic checking it does happens before the other filters run on a device, so the protection from reading suspended devices should still work, as it won't ask the MD component filter about a suspended device at all.)
TODO: We want to add a warning about using devices/filter in lvm.conf, for people who migrated from older versions and may be affected by the changes in filter semantics -- both this one and the case where they enable lvmetad (or their distro does it for them). We should figure out a good way to trigger that warning without being overly annoying. Candidates are:

1) warn when lvm.conf has filter set, but not global_filter, *and* --config is not in use
2) release notes
3) a script running at upgrade time (would be distro specific)
4) magic upgrade detection (something like: lvm.conf looks like our template but is missing newer bits; probably too complicated and fragile)
Now, if we're calling "pvscan --cache <device>", which should scan just the one device given on the command line, does it execute the filtering pipe too? (The one mentioned in comment #8: lvmetad_filter = sysfs -> devices/global_filter -> device type filter -> mpath component -> partitioned filter -> md component.)

Thing is, the original design was (at least I was under that impression) that lvmetad caches everything and it's only the client that does all the filtering. This really needs to be documented better!

BTW, if we're touching this area at the moment, we should probably also look at bz #1095191, where the global_filter is sometimes ignored for some reason - it happens quite often on my machine - though if I add verbose output, the bug is gone, so it must be some race. Any idea what it could be caused by? There's also another related bz #989548 that's worth looking at now, and maybe also bug #1106425, which falls into a similar category.
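To make the first question above concrete, the single-device invocation I have in mind is e.g. (using the MD device from the reproducer):

[0] f20/~ # pvscan --cache /dev/md0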
That depends on what you mean by filtering. Most of the confusion comes from the fact that filters serve multiple purposes.

It has always been the case, as I have repeatedly said both here and elsewhere, that as far as lvmetad is concerned, UUIDs are *unique* -- it says so right in the name, Universally Unique IDentifier, after all. So if you have multiple devices that all exhibit the same PV label (and thus the same PV UUID), what you are claiming is that it *doesn't matter which device you use to access the PV*. If this is not the case (i.e. MD component devices), you must somehow pick the right device. The various global filters serve this purpose.

In other words, global filters decide whether a thing is a PV at all. On the other hand, client-side filters decide which PVs you want to work with (or don't want to work with). So /dev/sda is *not* a PV if it is part of /dev/md0, which is a PV itself, whether you see a label on it or not. If you tell lvmetad that such a non-PV is in fact a PV, lvmetad is happy to oblige, but the result is not what you want.

So again, this “original design” you mention is filtering of PVs in the cases where filtering means “I only want to work with those 3 PVs right now, even though there's another 2000 in the system”. This is normally achieved with --config, or maybe config file cascading. This is because lvmetad needs a global view of the system, i.e. to see all PVs. However, if the question is “is this thing even a PV”, that needs to be decided before lvmetad. So pvscan --cache is the correct place to do that.
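As a sketch of that kind of client-side narrowing with --config (the device name is purely illustrative):

[0] f20/~ # pvs --config 'devices { filter = [ "a|^/dev/md0$|", "r|.*|" ] }'

The same could just as well live in a cascaded config file instead of on the command line.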
(In reply to Petr Rockai from comment #12)
> In other words, global filters decide whether a thing is a PV at all. On
> the other hand, client-side filters decide which PVs you want to work with
> (or don't want to work with). So /dev/sda is *not* a PV if it is part of
> /dev/md0, which is a PV itself, whether you see a label on it or not. If
> you tell lvmetad that such a non-PV is in fact a PV, lvmetad is happy to
> oblige, but the result is not what you want.

Sure, I understand the reasoning here. Thing is that even I thought it was only devices/global_filter that was applied to the PV list passed to lvmetad. IOW, it's not clear from the documentation (I mean the comments in lvm.conf, which are currently the only documentation on this filtering matter) that any other filtering is applied as well. So we definitely need to add a few more words on this subject.

> So again, this “original design” you mention is filtering of PVs in the
> cases where filtering means “I only want to work with those 3 PVs right
> now, even though there's another 2000 in the system”. This is normally
> achieved with --config, or maybe config file cascading.

(--config was not even documented until I added a few words on it not long ago, anyway... :) )

> This is because lvmetad needs a global view of the system, i.e. to see all
> PVs. However, if the question is “is this thing even a PV”, that needs to
> be decided before lvmetad. So pvscan --cache is the correct place to do
> that.

Well, OK then, I'm not against this filtering scheme. The only thing is that I always thought lvmetad was purely a cache - that it just replaces the direct scan on devices - and that the logic to filter out the proper devices is always in the tools. That's what we claimed all the time (at least that was my impression). And the documentation doesn't make it clear at all here, so we need to improve it...

(My suggestion here would be to use nice simple ascii art to document this - with a nice view of the filtering pipe, how it goes to lvmetad and then back to the LVM tools. One picture will replace tons of doc words.)
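Something along these lines, perhaps (just a rough sketch based on the sequence from comment #8, not a final proposal):

  all block devices
        |
        v
  lvmetad_filter:
    sysfs -> devices/global_filter -> device type
          -> mpath component -> partitioned -> md component
        |
        v
  pvscan --cache  --- one device per PV UUID --->  lvmetad
                                                      |
  LVM tool  <------------- PV list -------------------+
        |
        v
  client-side: devices/filter -> persistent
        |
        v
  PVs the command actually works with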
(In reply to Petr Rockai from comment #10)
> TODO: We want to add a warning about using devices/filter in lvm.conf, for
> people who migrated from older versions and may be affected by the changes
> in filter semantics -- both this one and the case where they enable lvmetad
> (or their distro does it for them). We should figure out a good way to
> trigger that warning without being overly annoying. Candidates are:
>
> 1) warn when lvm.conf has filter set, but not global_filter, *and* --config
> is not in use

This seems to be the best one, probably. To remove the warning, people would need to state "global_filter = []" or similar (a "blank" global_filter) in case they really mean it.

> 2) release notes

This works well for RHEL (or any enterprise-level distribution), but not for Fedora (or any other desktop/home-use distribution).

> 3) a script running at upgrade time (would be distro specific)

Other distros may forget to handle this properly.

> 4) magic upgrade detection (something like: lvm.conf looks like our
> template but is missing newer bits; probably too complicated and fragile)

We can already do this by making use of "lvm dumpconfig --type missing", but what would we do with that information anyway? Add a warning on package upgrade? If there are 1000 upgrades at a time, people won't read it.

So it seems 1) is really the best option here.
Yes, lvmetad *is* a pure cache. Filtering happens in tools, but on both sides of lvmetad: global filters before (in pvscan), local filters after (in everything else). It's precisely because lvmetad does not have any logic to decide which devices are better PV candidates than others that we need the before-lvmetad filters (which live in pvscan, i.e. the tool that feeds data into lvmetad).

If we decided to make lvmetad smarter (smarter than a dumb cache daemon, that is), then we would have the option of avoiding global filters altogether. We could also allow UUID non-uniqueness at the lvmetad level and sort out which candidate device is the right one on the client side, post-lvmetad. That has the substantial disadvantage of having to open and read from the devices, defeating the no-disk-access workflow currently possible with lvmetad for many tools.
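(For completeness, whether the tools consult lvmetad at all is governed by a single lvm.conf switch; shown here just to frame the trade-off above:

global {
    # 1: tools get PV/VG info from lvmetad, avoiding per-command disk scans;
    # 0: tools scan devices directly on each invocation (no lvmetad).
    use_lvmetad = 1
}
)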
OK, so let's document this - I'd go with a paragraph in man lvm/lvm.conf describing this relation to lvmetad better. I can try to write something up... The current comments/doc lines are a bit confusing as they stand.
I have updated the comments in example.conf. Along with the other filter patches, this should now be fixed. (We now also have a related BZ, see bug 1147217; it should be fixed by the same patches.)