Bug 1600699

Summary: gi.overrides.BlockDev.LVMError: Waiting for 'VgCreate' method of the '/com/redhat/lvmdbus1/Manager' object to finish failed: Failed to get Complete property of the / object: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: Method "Get" with ...
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: lvm2Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: agk, amulhern, anaconda-maint-list, anprice, blivet-maint-list, bmarzins, bmr, cfeist, heinzm, jkonecny, jonathan, kellin, kzak, lvm-team, mcsontos, mkolman, msnitzer, prajnoha, prockai, rvykydal, sbueno, vanmeeuwen+fedora, vponcova, vtrefny, wwoods, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: abrt_hash:78853eda99d9d29c1bb32c355a8d5d8d17e9cae2edc4ffc17db2ea882b3ea55f;
Fixed In Version: lvm2-2.02.180-1.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1608263 (view as bug list) Environment:
Last Closed: 2018-07-25 08:31:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
File: anaconda-tb
none
File: anaconda.log
none
File: dbus.log
none
File: dnf.librepo.log
none
File: environ
none
File: hawkey.log
none
File: lorax-packages.log
none
File: lsblk_output
none
File: lvm.log
none
File: nmcli_dev_list
none
File: os_info
none
File: program.log
none
File: storage.log
none
File: syslog
none
File: ifcfg.log
none
File: packaging.log none

Description Adam Williamson 2018-07-12 19:46:03 UTC
Description of problem:
Just hit this one (which openQA runs into quite often) in manual testing, just running an all-defaults install from a recent Rawhide netinst image.

Version-Release number of selected component:
anaconda-29.19

The following was filed automatically by anaconda:
anaconda 29.19 exception report
Traceback (most recent call first):
  File "/usr/lib64/python3.7/site-packages/gi/overrides/BlockDev.py", line 963, in wrapped
    raise transform[1](msg)
  File "/usr/lib/python3.7/site-packages/blivet/devices/lvm.py", line 245, in _create
    blockdev.lvm.vgcreate(self.name, pv_list, self.pe_size)
  File "/usr/lib/python3.7/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/blivet/devices/storage.py", line 457, in create
    self._create()
  File "/usr/lib/python3.7/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/blivet/deviceaction.py", line 327, in execute
    self.device.create()
  File "/usr/lib/python3.7/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/blivet/actionlist.py", line 327, in process
    action.execute(callbacks)
  File "/usr/lib/python3.7/site-packages/blivet/actionlist.py", line 48, in wrapped_func
    return func(obj, *args, **kwargs)
  File "/usr/lib/python3.7/site-packages/blivet/blivet.py", line 135, in do_it
    self.devicetree.actions.process(callbacks=callbacks, devices=self.devices)
  File "/usr/lib/python3.7/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/storage/osinstall.py", line 1223, in do_it
    super().do_it(callbacks=callbacks)
  File "/usr/lib/python3.7/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/storage/osinstall.py", line 2144, in turn_on_filesystems
    storage.do_it(callbacks)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/installation_tasks.py", line 438, in run_task
    self._task(*self._task_args, **self._task_kwargs)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/installation_tasks.py", line 472, in start
    self.run_task()
  File "/usr/lib64/python3.7/site-packages/pyanaconda/installation_tasks.py", line 304, in start
    item.start()
  File "/usr/lib64/python3.7/site-packages/pyanaconda/installation_tasks.py", line 304, in start
    item.start()
  File "/usr/lib64/python3.7/site-packages/pyanaconda/installation.py", line 372, in doInstall
    installation_queue.start()
  File "/usr/lib64/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.7/site-packages/pyanaconda/threading.py", line 286, in run
    threading.Thread.run(self)
gi.overrides.BlockDev.LVMError: Waiting for 'VgCreate' method of the '/com/redhat/lvmdbus1/Manager' object to finish failed: Failed to get Complete property of the / object: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist


Additional info:
addons:         com_redhat_docker, com_redhat_kdump
blivet-gui-utils.log: 
cmdline:        /usr/bin/python3  /sbin/anaconda
cmdline_file:   BOOT_IMAGE=vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=Fedora-E-dvd-x86_64-rawh quiet
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         4.18.0-0.rc4.git1.1.fc29.x86_64
product:        Fedora
release:        Cannot get release name.
type:           anaconda
version:        rawhide

Comment 1 Adam Williamson 2018-07-12 19:46:09 UTC
Created attachment 1458525 [details]
File: anaconda-tb

Comment 2 Adam Williamson 2018-07-12 19:46:10 UTC
Created attachment 1458526 [details]
File: anaconda.log

Comment 3 Adam Williamson 2018-07-12 19:46:11 UTC
Created attachment 1458527 [details]
File: dbus.log

Comment 4 Adam Williamson 2018-07-12 19:46:12 UTC
Created attachment 1458528 [details]
File: dnf.librepo.log

Comment 5 Adam Williamson 2018-07-12 19:46:13 UTC
Created attachment 1458529 [details]
File: environ

Comment 6 Adam Williamson 2018-07-12 19:46:15 UTC
Created attachment 1458530 [details]
File: hawkey.log

Comment 7 Adam Williamson 2018-07-12 19:46:16 UTC
Created attachment 1458531 [details]
File: lorax-packages.log

Comment 8 Adam Williamson 2018-07-12 19:46:17 UTC
Created attachment 1458532 [details]
File: lsblk_output

Comment 9 Adam Williamson 2018-07-12 19:46:19 UTC
Created attachment 1458533 [details]
File: lvm.log

Comment 10 Adam Williamson 2018-07-12 19:46:20 UTC
Created attachment 1458534 [details]
File: nmcli_dev_list

Comment 11 Adam Williamson 2018-07-12 19:46:21 UTC
Created attachment 1458535 [details]
File: os_info

Comment 12 Adam Williamson 2018-07-12 19:46:22 UTC
Created attachment 1458536 [details]
File: program.log

Comment 13 Adam Williamson 2018-07-12 19:46:24 UTC
Created attachment 1458537 [details]
File: storage.log

Comment 14 Adam Williamson 2018-07-12 19:46:25 UTC
Created attachment 1458538 [details]
File: syslog

Comment 15 Adam Williamson 2018-07-12 19:46:26 UTC
Created attachment 1458539 [details]
File: ifcfg.log

Comment 16 Adam Williamson 2018-07-12 19:46:27 UTC
Created attachment 1458540 [details]
File: packaging.log

Comment 17 Jiri Konecny 2018-07-16 16:09:34 UTC
This is happening in our storage library. Switching components.

Comment 22 Vojtech Trefny 2018-07-17 11:52:40 UTC
Workaround for this is to force blivet to use the "old" libblockdev LVM plugin that doesn't use the DBus API. It is a simple patch (https://github.com/vojtechtrefny/blivet/commit/2f90040ff66eacc9715e370cd49ffb72d8d1f36f), I'll do a new blivet build with this for rawhide today.

Comment 23 Vojtech Trefny 2018-07-17 12:38:40 UTC
python-blivet-3.1.0-0.4.b1.fc29 should fix this, but original problem is in lvmdbus -- changing component to lvm2

Comment 24 Marian Csontos 2018-07-19 17:17:02 UTC
Martin, could you please try to run anaconda testsuite with 2.02.180 being built soon landing in rawhide repo?

Comment 25 Martin Kolman 2018-07-19 17:25:00 UTC
(In reply to Marian Csontos from comment #24)
> Martin, could you please try to run anaconda testsuite with 2.02.180 being
> built soon landing in rawhide repo?

Sure, I'll try to do that tomorrow - but I guess I'll have to coordinate with Vojta to use a custom Blivet package with the workaround reverted so that the LVM-DBUS plugin is used again.

Comment 26 Martin Kolman 2018-07-20 15:59:08 UTC
Looks like the race condition is still present in some form even with the new LVM package:

GLib.GError: g-io-error-quark: Failed to call the 'PvCreate' method on the '/com/redhat/lvmdbus1/Manager' object: Timeout was reached (24)

There might be a difference though - before I was getting also LvCreate & VgCreate issues, but now I see consistently only PvCreate. I'll ask Vojta on Monday to take a look at the detailed test results & to check the test setup was valid (eq. correct versions of LVM & Blivet were used).

Comment 27 Marian Csontos 2018-07-23 12:34:17 UTC
To me this looks like different issue. In the first issue the return code was a mess. This looks just like an ordinary error.

Comment 28 Martin Kolman 2018-07-24 10:51:05 UTC
After looking at the results with vtrefny, it looks like the new LVM version was indeed correctly used. Also all the failures now consistently look like this:

15:08:30,560 INFO anaconda:blivet: executing action: [136] create format lvmpv on mdarray pv.01 (id 131)
15:08:30,571 INFO anaconda:anaconda: progress: Creating lvmpv on /dev/md/pv.01
15:08:30,623 DEBUG anaconda:blivet:                     MDRaidArrayDevice.setup: pv.01 ; orig: False ; status: True ; controllable: True ;
15:08:30,669 DEBUG anaconda:blivet:                     LVMPhysicalVolume.create: device: /dev/md/pv.01 ; type: lvmpv ; status: False ;
15:08:30,728 DEBUG anaconda:blivet:                       LVMPhysicalVolume._create: device: /dev/md/pv.01 ; type: lvmpv ; status: False ;
15:08:30,738 INFO anaconda:program: [26] Calling the 'com.redhat.lvmdbus1.Manager.PvCreate' method on the '/com/redhat/lvmdbus1/Manager' object with the following parameters: '('/dev/md/pv.01', 1, {'--config': <' devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } log {level=7 file=/tmp/lvm.log syslog=0}'>})'
15:08:30,751 INFO dbus-daemon:[system] Activating via systemd: service name='com.redhat.lvmdbus1' unit='lvm2-lvmdbusd.service' requested by ':1.8' (uid=0 pid=2971 comm="/usr/bin/python3 /sbin/anaconda " label="system_u:system_r:initrc_t:s0")
15:08:30,900 INFO systemd:Starting LVM2 D-Bus service...
15:08:34,021 INFO dbus-daemon:[system] Successfully activated service 'com.redhat.lvmdbus1'
15:08:34,031 NOTICE audit:SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=lvm2-lvmdbusd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
15:08:34,032 INFO systemd:Started LVM2 D-Bus service.
15:08:35,760 INFO anaconda:program: [26] Done.
15:08:35,766 INFO anaconda:program: [26] Got error: Failed to call the 'PvCreate' method on the '/com/redhat/lvmdbus1/Manager' object: Timeout was reached


Basically:
- PV creation has been requested
- the operation times out after 5 seconds

This looks like a bug as according to vtrefny the LVM DBUS API should either finish in <1 second or return a job object if the operation takes >1 seconds.

Comment 29 mulhern 2018-07-24 13:20:27 UTC
Maybe:
  1. dbus call is made and blivet/libblockdev D-Bus timeout clock starts ticking
  2. plenty of time passes while lvm D-Bus service gets started (approx 4 seconds)
  2.5 There occurs a brief hiatus while LVM D-Bus service is getting its act together and before it actually receives the message
  3. LVM D-Bus service gets a D-Bus message. Its 1 second clock starts ticking, but before that limit can be reached
  4. blivet/libblockdev reaches its five second timeout
?

Comment 30 Vojtech Trefny 2018-07-25 08:18:15 UTC
I didn't notice it takes 4 seconds to start the lvmdbusd service (and we have a different bug for Fedora and it takes 8 seconds there). So this is definitely a different issue and we can close this one. I'll create a separate issue with this for libblockdev.

Comment 31 Marian Csontos 2018-07-25 08:30:09 UTC
I was going to write the same. Check the load/free memory on the test system.

I checked the code (not that I fully understand all the magic), and this really should not happen, but!

Call, once in lvmdbusd, should really return result or job in 1 second, but! There is no guarantee the message will be delivered immediately, especially when either system or D-Bus is under high load.

Is not the system somewhat unstable or overloaded? (swapping?) It takes over 3 seconds to start lvmdbusd, so it may be possible the lvmdbusd just does not get enough time to process anything.