Bug 1334063 - lvm2 stumbles on itself during an update
Summary: lvm2 stumbles on itself during an update
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: David Teigland
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1335561 (view as bug list)
Depends On:
Blocks: 1340146
TreeView+ depends on / blocked
 
Reported: 2016-05-07 20:58 UTC by Michal Jaegermann
Modified: 2016-06-06 10:14 UTC (History)
14 users (show)

Fixed In Version: lvm2-2.02.155-1.fc25
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1340146 (view as bug list)
Environment:
Last Closed: 2016-06-06 10:14:43 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
lvm2app test code (509 bytes, text/plain)
2016-05-17 07:18 UTC, Peter Rajnoha
no flags Details

Description Michal Jaegermann 2016-05-07 20:58:56 UTC
Description of problem:

While running updates I see the following message:

  Upgrading   : lvm2-2.02.152-1.fc25.x86_64                             xxx/yyy
Failed to execute operation: Connection timed out
Failed to start lvm2-lvmetad.socket: Connection timed out
See system logs and 'systemctl status lvm2-lvmetad.socket' for details.
Failed to execute operation: Connection timed out


So far so good; only following the advice give I see the following from 'systemctl -l status lvm2-lvmetad.socket':

● lvm2-lvmetad.socket - LVM2 metadata daemon socket
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.socket; enabled; vendor 
   Active: active (running) since Sat 2016-05-07 11:21:42 MDT; 2h 3min ago
     Docs: man:lvmetad(8)
   Listen: /run/lvm/lvmetad.socket (Stream)

In other words lvm2-lvmetad.socket was definitely running in times of updates.  What gives?

Version-Release number of selected component (if applicable):
lvm2-2.02.152-1.fc25

Comment 1 Yanko Kaneti 2016-05-14 12:08:14 UTC
Something to do with systemd reloading its config. Here it happens during:
 /usr/bin/systemctl daemon-reload     as called by trhe rpm scriptlets

This ps excerpt seems to show what gets stuck

24269 ?        S      0:00 /usr/lib/systemd/systemd --switched-root --system --deserialize 23
24270 ?        Z      0:00  \_ [vsftpd-generato] <defunct>
24273 ?        S      0:00  \_ /usr/lib/systemd/system-generators/lvm2-activation-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late
24277 ?        Z      0:00  \_ [systemd-debug-g] <defunct>
24280 ?        Z      0:00  \_ [systemd-gpt-aut] <defunct>
24281 ?        Z      0:00  \_ [systemd-fstab-g] <defunct>
24282 ?        Z      0:00  \_ [systemd-cryptse] <defunct>

Comment 2 Hussam Al-Tayeb 2016-05-16 08:49:52 UTC
*** Bug 1335561 has been marked as a duplicate of this bug. ***

Comment 3 Peter Rajnoha 2016-05-16 09:09:39 UTC
Does it hang if you run "/usr/lib/systemd/system-generators/lvm2-activation-generator /run/systemd/generator /run/systemd/generator.early /run/systemd" manually on command line? If yes, can you please try running it through strace like this:

  strace /usr/lib/systemd/system-generators/lvm2-activation-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late

Thanks.

Comment 4 Michal Jaegermann 2016-05-16 15:43:09 UTC
(In reply to Peter Rajnoha from comment #3)
> Does it hang if you run
> "/usr/lib/systemd/system-generators/lvm2-activation-generator
> /run/systemd/generator /run/systemd/generator.early /run/systemd" manually
> on command line?

My guess would be that you meant "/usr/lib/systemd/system-generators/lvm2-activation-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late" both with strace and without. Regardless, when tried :commands return immediately (time reported "real	0m0.013s").

'systemctl daemon-reload' does not hang either; at least when run from a command line.

Comment 5 Yanko Kaneti 2016-05-16 15:58:37 UTC
Here the lvm2-activation-generator gets stuck wating for the lvmetad daemon (and timeouts) on systemctl daemon-reload  but only if the lvmetad daemon is not running at the time of the daemon-reload. 
You can try reproducing by systemctl stop lvm2-lvmetad.service  followed by systemctl daemon-reload.  

Since the lvmetad service daemon has -t 3600 on its command line I assume its expected for it to exit after one hour. 
One more oddity on my system is the service status:   
Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled; vendor preset: enabled)
Why the preset is enabled if the socket is enabled too.. ? I've probably disabled the service at some point but generally don't see a reason for it to be enabled all the time.

Comment 6 Michal Jaegermann 2016-05-16 16:39:40 UTC
(In reply to Yanko Kaneti from comment #5)

I just run updates which included lvm2-2.02.153-2.fc25.x86_64 and that went through without any delays and/or "Connection timed out" messages.  It looks like lvm2-lvmetad.service was running.

> Here the lvm2-activation-generator gets stuck wating for the lvmetad daemon
.....
> You can try reproducing by systemctl stop lvm2-lvmetad.service

That replies with:
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
  lvm2-lvmetad.socket

> followed by systemctl daemon-reload.

That indeed hangs for a while before printing "Failed to execute operation: Connection timed out" but not for an hour; time gives "real	0m25.085s" before that returns.  As a side effect lvm2-lvmetad.service gets reactivated so the next invocation of daemon-reload does not hang.
 
> One more oddity on my system is the service status:   
> Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled;
> vendor preset: enabled)

Actually "Loaded:" line on my system looks exactly the same but lvm2-lvmetad.service is "active (running)" anyway.  Something seems to be starting it although I did not try to figure out exactly what.  Well, probably enabled by default lvm2-lvmetad.socket but the original reported failure was "Failed to start lvm2-lvmetad.socket: Connection timed out" for some reasons unclear to me.

Comment 7 Peter Rajnoha 2016-05-17 04:35:06 UTC
(In reply to Yanko Kaneti from comment #5)
> Here the lvm2-activation-generator gets stuck wating for the lvmetad daemon
> (and timeouts) on systemctl daemon-reload  but only if the lvmetad daemon is
> not running at the time of the daemon-reload. 
> You can try reproducing by systemctl stop lvm2-lvmetad.service  followed by
> systemctl daemon-reload.  
> 

Yes, I reproduced, thanks!

> Since the lvmetad service daemon has -t 3600 on its command line I assume
> its expected for it to exit after one hour. 
> One more oddity on my system is the service status:   
> Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled;
> vendor preset: enabled)

That's perfectly OK as lvm2-lvmetad.service is supposed to be socket activated. As such, only lvm2-lvmetad.socket is enabled/running and then lvm2-lvmetad.service gets instantiated automatically by first access to that socket. If there's no request on that socket for an hour, lvmetad shuts down itself and if there's access to the socket again later on, it gets instantiated on demand again...

> Why the preset is enabled if the socket is enabled too.. ? I've probably
> disabled the service at some point but generally don't see a reason for it
> to be enabled all the time.

It's actually because the /usr/lib/systemd/system-preset/90-default.preset has
"enable lvm2-lvmetad.*". Indeed, it should be just lvm2-lvmetad.socket. But it's not such an issue as rpm scriptlet calls "systemctl preset lvm2-lvmetad.socket" anyway (of course, if someone calls systemctl present lvm2-lvmetad.service, it gets enabled based on that rule).

Comment 8 Peter Rajnoha 2016-05-17 07:16:12 UTC
This is regression caused by recent changes in the way lvmetad connections is initialized. The lvm2-activation-generator uses lvm2app to read /etc/lvm/lvm.conf file.

Before these changes, the _init_lvmetad that is called within create_toolcontext didn't include lvmetad_connect call - that was delayed up until it was needed:


Test run with lvm2app test code, lvm2 v2.02.150:

(gdb) bt
#0  lvmetad_open (socket=0x7ffff7b668f3 "/run/lvm/lvmetad.socket") at ../include/lvmetad-client.h:67
#1  0x00007ffff7b4ab24 in _lvmetad_connect () at cache/lvmetad.c:129
#2  0x00007ffff7b4abbf in lvmetad_connect_or_warn () at cache/lvmetad.c:143
#3  0x00007ffff7b4accf in lvmetad_active () at cache/lvmetad.c:168
#4  0x00007ffff7b00cec in _vg_read (cmd=0x602030, vgname=0x7fffffffe5ff "vg", vgid=0x0, warn_flags=1, consistent=0x7fffffffe128, precommitted=0) at metadata/metadata.c:4117
#5  0x00007ffff7b02990 in vg_read_internal (cmd=0x602030, vgname=0x7fffffffe5ff "vg", vgid=0x0, warn_flags=1, consistent=0x7fffffffe128) at metadata/metadata.c:4719
#6  0x00007ffff7b04ee1 in _vg_lock_and_read (cmd=0x602030, vg_name=0x7fffffffe5ff "vg", vgid=0x0, lock_flags=33, status_flags=0, read_flags=0, lockd_state=0) at metadata/metadata.c:5743
#7  0x00007ffff7b052bc in vg_read (cmd=0x602030, vg_name=0x7fffffffe5ff "vg", vgid=0x0, read_flags=0, lockd_state=0) at metadata/metadata.c:5846
#8  0x00007ffff7a9047c in _lvm_vg_open (libh=0x602030, vgname=0x7fffffffe5ff "vg", mode=0x4009e7 "r", flags=0) at lvm_vg.c:221
#9  0x00007ffff7a904fa in lvm_vg_open (libh=0x602030, vgname=0x7fffffffe5ff "vg", mode=0x4009e7 "r", flags=0) at lvm_vg.c:238
#10 0x00000000004008d0 in main (argc=2, argv=0x7fffffffe348) at a.c:22

Test run with lvm2app test code, lvm2 v2.02.151 and higher (this is with current upstream HEAD, that is ba9b7b69d9200d9d977089a0b813f508961fcbee):

(gdb) bt
#0  lvmetad_open (socket=0x7ffff7b6556b "/run/lvm/lvmetad.socket") at ../include/lvmetad-client.h:72
#1  0x00007ffff7b48295 in lvmetad_connect (cmd=0x602030) at cache/lvmetad.c:137
#2  0x00007ffff7a9e994 in _init_lvmetad (cmd=0x602030) at commands/toolcontext.c:1730
#3  0x00007ffff7a9ea7f in init_connections (cmd=0x602030) at commands/toolcontext.c:1762
#4  0x00007ffff7a9f29f in create_toolcontext (is_long_lived=0, system_dir=0x0, set_buffering=0, threaded=0, set_connections=1, set_filters=1) at commands/toolcontext.c:1947
#5  0x00007ffff7a884eb in _lvm_init (system_dir=0x0) at lvm_base.c:48
#6  0x00007ffff7a88580 in lvm_init (system_dir=0x0) at lvm_base.c:88
#7  0x0000000000400884 in main (argc=2, argv=0x7fffffffe348) at a.c:17


So now the lvmetad_connect is part of create_toolcontext which is part of lvm2app library initialization too. That's why we have lvm2-activation-generator trying to access lvmetad even though it's not needed.

Actual backtrace from lvm2-activation-generator using lvm2app with recent code and changed (since v2.02.151):

(gdb) bt
#0  0x00007f012a98d6f0 in __read_nocancel () from /lib64/libc.so.6
#1  0x00007f012afb67e4 in buffer_read (fd=4, buffer=0x7ffeb2a9ae68) at daemon-io.c:38
#2  0x00007f012afb6061 in daemon_send (h=..., rq=...) at daemon-client.c:122
#3  0x00007f012afb61da in daemon_send_simple_v (h=..., id=0x7f012aff0a72 "hello", ap=0x7ffeb2a9af98) at daemon-client.c:158
#4  0x00007f012afb62c6 in daemon_send_simple (h=..., id=0x7f012aff0a72 "hello") at daemon-client.c:170
#5  0x00007f012afb59da in daemon_open (i=...) at daemon-client.c:53
#6  0x00007f012af9cd4b in lvmetad_open (socket=0x7f012afba56b "/run/lvm/lvmetad.socket") at ../include/lvmetad-client.h:80
#7  0x00007f012af9d295 in lvmetad_connect (cmd=0x55e8e14b5030) at cache/lvmetad.c:137
#8  0x00007f012aef3994 in _init_lvmetad (cmd=0x55e8e14b5030) at commands/toolcontext.c:1730
#9  0x00007f012aef3a7f in init_connections (cmd=0x55e8e14b5030) at commands/toolcontext.c:1762
#10 0x00007f012aef429f in create_toolcontext (is_long_lived=0, system_dir=0x0, set_buffering=0, threaded=0, set_connections=1, set_filters=1) at commands/toolcontext.c:1947
#11 0x00007f012aedd4eb in _lvm_init (system_dir=0x0) at lvm_base.c:48
#12 0x00007f012aedd580 in lvm_init (system_dir=0x0) at lvm_base.c:88
#13 0x000055e8e0f21eca in lvm_get_use_lvmetad_and_lvmpolld (use_lvmetad=0x7ffeb2a9b384, use_lvmpolld=0x7ffeb2a9b380) at lvm2_activation_generator_systemd_red_hat.c:75
#14 0x000055e8e0f22341 in main (argc=4, argv=0x7ffeb2a9b488) at lvm2_activation_generator_systemd_red_hat.c:186

This causes a problem because at the time of systemd reload, the lvmetad service (and hence lvmetad daemon) can't be instantiated based on socket access because systemd is locked at that time due to the reload (systemd generators are reevaluated on systemd reload).

So we either need to move the actual lvmetad connection till it's needed as it was before or we need to have a way to force use_lvmetad=0 for lvm2app - currently we can't override config without having library handler initialized first (...and library handler intialization causes lvmetad connection to become active - the library handler is actually the cmd_context structure coming from create_toolcontext).

Comment 9 Peter Rajnoha 2016-05-17 07:18:49 UTC
Created attachment 1158195 [details]
lvm2app test code

lvm2app test code used in comment #8.

Comment 10 David Teigland 2016-05-17 17:25:32 UTC
Here's one approach that should solve the problem:
https://git.fedorahosted.org/cgit/lvm2.git/log/?h=dev-dct-config-context

It removes the need for a full lvm context just to read a config setting. The liblvm2app function to check a config setting only reads lvm.conf, and doesn't do any other lvm command setup.  (We do a similar thing for the 'lvm config' command which skips some command setup steps that are not used.)

It would be nice if lvm2-activation-generator could avoid using liblvm2app altogether, but Peter should decide which approach to take.

Comment 11 Zdenek Kabelac 2016-05-17 18:33:27 UTC
I'd say I'm a bit scarred we rely on lvm2app in some 'real' tool.

lvm2app is missing a lot of initialization from lvm.conf compared with standard lvm command.

So the code should really be well audited whether all the used variables are really using proper lvm.conf settings and eventually fix the missing init part
(as lots of them are rather 'compile-time' presets in case of lvm2app)

Comment 12 David Teigland 2016-05-17 19:00:12 UTC
While auditing liblvm2app users, we should take a little extra time to plan a migration path away from using liblvm2app altogether.  One option is to switch to the dbus API (e.g. yum should probably do this).  In some cases, like lvm2-activation-generator, lvm isn't actually needed at all, but rather a simple way to just read/query lvm.conf fields (that's mostly libdevmapper anyway.)

Comment 13 Peter Rajnoha 2016-05-18 06:41:11 UTC
(In reply to David Teigland from comment #12)
> In some cases,
> like lvm2-activation-generator, lvm isn't actually needed at all, but rather
> a simple way to just read/query lvm.conf fields (that's mostly libdevmapper
> anyway.)

...the most important thing here is that we read the configuration properly, that means the whole config chain - we have several sources of config which we merge together to give the overall result, like tag configs etc. Unfortunately, this is at LVM level, not libdevmapper - so this LVM config reading part needs to be extracted and it needs to be a part of another (better) library then, like liblvm2cmd(?).

Comment 14 Peter Rajnoha 2016-05-18 06:43:44 UTC
(In reply to David Teigland from comment #10)
> Here's one approach that should solve the problem:
> https://git.fedorahosted.org/cgit/lvm2.git/log/?h=dev-dct-config-context
> 
> It removes the need for a full lvm context just to read a config setting.
> The liblvm2app function to check a config setting only reads lvm.conf, and
> doesn't do any other lvm command setup.  (We do a similar thing for the 'lvm
> config' command which skips some command setup steps that are not used.)
> 
> It would be nice if lvm2-activation-generator could avoid using liblvm2app
> altogether, but Peter should decide which approach to take.

Yeah, that patch you did looks feasible and quick - so I'm for that one. I haven't tried that one though, but looks pretty straightforward to me at first glance...

Comment 15 Peter Rajnoha 2016-05-18 06:45:48 UTC
(The reason lvm2app was used was for the activation generator to be a simple lightweight program, just using lvm2app that had the interface to get the information needed quickly. And at that time, it worked so there was no reason to not use it...)


Note You need to log in before you can comment on or make changes to this bug.