Bug 1624578 - Gluster node reboot fails after gdeploy configuration if brick filesystems have VDO beneath
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gdeploy
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Sahina Bose
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-01 18:44 UTC by Giuseppe Ragusa
Modified: 2020-07-24 06:20 UTC
CC List: 5 users

Fixed In Version: gdeploy-2.0.9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-24 06:19:43 UTC
Embargoed:


Attachments
Sample gdeploy configuration file (8.74 KB, text/plain)
2018-09-01 18:44 UTC, Giuseppe Ragusa

Description Giuseppe Ragusa 2018-09-01 18:44:04 UTC
Created attachment 1480237
Sample gdeploy configuration file

Description of problem:
Gdeploy does not add the required option (x-systemd.requires=vdo.service) to the brick filesystem entries in /etc/fstab, nor does it enable the vdo service in systemd, when VDO is enabled in gdeploy.conf. Tested with LVM thin volumes on top of VDO volumes.

Version-Release number of selected component (if applicable):
gdeploy-2.0.8-1.el7.noarch

How reproducible:
Always - see attached gdeploy.conf

Steps to Reproduce:
1. Create a gdeploy.conf with bricks on VDO-backed LVM volumes
2. Run gdeploy
3. Reboot Gluster nodes

Actual results:
Boot fails (it stops early at the emergency prompt asking for the root password or Ctrl-D) because the brick filesystems cannot be mounted.

Expected results:
Boot succeeds.

Additional info:
Tested on community GlusterFS 3.12.13-1
Not tested without LVM (bricks directly on top of VDO), but that setup could be affected too.
Adding the aforementioned fstab option (x-systemd.requires=vdo.service) to every brick filesystem line and rebooting makes the problem go away (I also rebuilt the initramfs, but that should not be needed).
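The workaround above (appending x-systemd.requires=vdo.service to each brick line in /etc/fstab) can be sketched in Python as below. This is a minimal illustration, not gdeploy's code: the function name `add_mount_option` and the `/gluster_bricks` prefix (inferred from the `gluster_bricks-...-brick1.mount` unit names in the logs later in this report) are assumptions. It only transforms text, so back up /etc/fstab before writing the result over it.

```python
def add_mount_option(fstab_text: str, mount_prefix: str, option: str) -> str:
    """Append `option` to the options field of every fstab entry whose mount
    point is `mount_prefix` or lies below it. Entries that already carry the
    option are left unchanged, so running it twice is harmless."""
    prefix_dir = mount_prefix.rstrip("/") + "/"
    out = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Keep comments and blank lines exactly as they are.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        # fstab fields: device, mount point, fs type, options, dump, pass
        fields = stripped.split()
        if len(fields) >= 4 and (fields[1] == mount_prefix
                                 or fields[1].startswith(prefix_dir)):
            opts = fields[3].split(",")
            if option not in opts:
                opts.append(option)
            fields[3] = ",".join(opts)
            # Matching lines are re-joined with tabs; other lines keep their spacing.
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, `add_mount_option(open("/etc/fstab").read(), "/gluster_bricks", "x-systemd.requires=vdo.service")` returns the edited text for review.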

Comment 2 Sachidananda Urs 2018-09-02 14:15:28 UTC
(In reply to Giuseppe Ragusa from comment #0)
> Created attachment 1480237
> Sample gdeploy configuration file
> 
> Description of problem:
> Gdeploy does not add the required options in /etc/fstab
> (x-systemd.requires=vdo.service) for the brick filesystems nor does it
> enable the vdo service in systemd when VDO is being enabled inside
> gdeploy.conf - tested with LVM thin volumes on top of VDO volumes.
> 
> Version-Release number of selected component (if applicable):
> gdeploy-2.0.8-1.el7.noarch

We released gdeploy-2.0.9 a couple of days ago to address this
issue. The build can be found at:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1140092

Comment 3 Giuseppe Ragusa 2018-09-19 00:07:52 UTC
Hi, please note that further testing of the workaround I proposed above shows that an additional fstab "_netdev" option is also needed to avoid a dependency loop in systemd.

I haven't had time to check the gdeploy-2.0.9 package yet, so it may already include that option.

Comment 4 Sachidananda Urs 2018-09-19 01:09:07 UTC
(In reply to Giuseppe Ragusa from comment #3)
> Hi, please note that, per my further testing on my above proposed
> workaround, an additional fstab "_netdev" option is needed too to avoid a
> dependency loop in systemd.
> 
> I still didn't have time to check to gdeploy-2.0.9 package, so maybe it
> already includes that option too.

We add the options: inode64,noatime,nodiratime,x-systemd.requires=vdo.service
We have tested with these fstab entries and reboots work fine.

We haven't added _netdev; can you try with 2.0.9? I'm not sure _netdev is strictly necessary to avoid hangs during reboot.
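To check whether an existing fstab entry already carries the options listed above, a helper like the following could be used. The function name and the tuple of expected options are illustrative assumptions, copied from the option list in this comment rather than from gdeploy's source.

```python
# Options that this comment says gdeploy 2.0.9 writes for brick filesystems.
GDEPLOY_209_OPTS = ("inode64", "noatime", "nodiratime",
                    "x-systemd.requires=vdo.service")


def missing_options(fstab_line: str, required=GDEPLOY_209_OPTS) -> list:
    """Return the required mount options absent from one fstab entry,
    in the order they appear in `required`."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("not a complete fstab entry: %r" % fstab_line)
    present = set(fields[3].split(","))  # 4th field holds the options
    return [opt for opt in required if opt not in present]
```

An empty list means the entry matches what gdeploy 2.0.9 is described as generating; "_netdev" can be passed in `required` to test for the extra option discussed in comment #3.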

Comment 5 Giuseppe Ragusa 2018-09-19 22:55:35 UTC
I think there's no point in trying, since I already had an Ansible task that edited fstab to add the x-systemd.requires option, and that was not enough in my setup (VDO with LVM thin on top, then GlusterFS with CTDB/Samba/Gluster-NFS and oVirt on top).

This is the problem I hit (an ordering cycle, not a dependency cycle; my mistake):

Sep 14 16:29:07 pinkiepie systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Sep 14 16:29:07 pinkiepie systemd: Found ordering cycle on basic.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on sockets.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on dbus.socket/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on gluster_bricks-vmstoredomain-brick1.mount/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on vdo.service/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on basic.target/start
Sep 14 16:29:07 pinkiepie systemd: Breaking ordering cycle by deleting job sockets.target/start
Sep 14 16:29:07 pinkiepie systemd: Job sockets.target/start deleted to break ordering cycle starting with basic.target/start
Sep 14 16:29:07 pinkiepie systemd: Found ordering cycle on basic.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on gluster_bricks-vmstoredomain-brick1.mount/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on vdo.service/start
Sep 14 16:29:07 pinkiepie systemd: Found dependency on basic.target/start
Sep 14 16:29:07 pinkiepie systemd: Breaking ordering cycle by deleting job local-fs.target/start
Sep 14 16:29:07 pinkiepie systemd: Job local-fs.target/start deleted to break ordering cycle starting with basic.target/start

Each time, systemd deleted some job to break the cycle, and something stopped working further down the line.
Adding "_netdev" has reliably solved the ordering cycle problem so far.
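The journal excerpt above can be reduced to the chain of units in each reported cycle with a short parser. This is a sketch that assumes the message wording shown here ("Found ordering cycle on ..." followed by "Found dependency on ..." lines); the function names are hypothetical. Intersecting the cycles shows which units every cycle passes through.

```python
import re

CYCLE_RE = re.compile(r"Found ordering cycle on (\S+)")
DEP_RE = re.compile(r"Found dependency on (\S+)")


def _unit(job: str) -> str:
    # "basic.target/start" -> "basic.target" (drop the job type)
    return job.rsplit("/", 1)[0]


def parse_cycles(log_text: str) -> list:
    """Return one list of unit names per 'Found ordering cycle' block."""
    cycles = []
    for line in log_text.splitlines():
        m = CYCLE_RE.search(line)
        if m:
            cycles.append([_unit(m.group(1))])
            continue
        m = DEP_RE.search(line)
        if m and cycles:
            cycles[-1].append(_unit(m.group(1)))
    return cycles


def common_units(cycles: list) -> set:
    """Units that appear in every cycle (empty set if there are none)."""
    if not cycles:
        return set()
    return set.intersection(*(set(c) for c in cycles))
```

For the two cycles above, the common units are basic.target, sysinit.target, local-fs.target, the brick mount and vdo.service, which points at the mount/vdo.service link as the part that closes the loop.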

Comment 6 Sachidananda Urs 2018-09-20 02:57:12 UTC
(In reply to Giuseppe Ragusa from comment #5)
> I think there's no point in trying since I already had an Ansible task that
> was editing fstab to add the x-systemd.requires option and it was not enough
> in my setup (VDO with LVM thin on top - GlusterFS with
> CTDB/Samba/Gluster-NFS and oVirt on top).

The _netdev option is not necessary for mounting VDO devices. It is meant
for network filesystems like NFS, Samba, etc., and has no use for disk
filesystems. See below:


Source: mount(8) man page, "Filesystem Independent Mount Options" section

_netdev
The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount these filesystems until the network has been enabled on the system).
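The effect described in comment #7 can be made concrete. According to the systemd.mount(5) documentation, an fstab entry counts as a network mount if its filesystem type is a network type or it carries _netdev, and such mounts are ordered before remote-fs.target instead of local-fs.target. That removes the brick mount from the local-fs.target link in the cycle from comment #5. The sketch below models only that one decision; `NETWORK_FS_TYPES` is a small illustrative subset, not systemd's real list, and the function name is hypothetical.

```python
# Illustrative subset only; systemd recognises more network filesystem types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "glusterfs", "ceph"}


def pulling_target(fstab_line: str) -> str:
    """Return the target that would pull in this fstab entry, following the
    network-mount rule described in systemd.mount(5)."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("not a complete fstab entry: %r" % fstab_line)
    fstype, opts = fields[2], fields[3].split(",")
    if "_netdev" in opts or fstype in NETWORK_FS_TYPES:
        return "remote-fs.target"
    return "local-fs.target"
```

An XFS brick on VDO stays in local-fs.target unless _netdev is added. The same documentation says network mounts are also ordered after the network, which is the semantic cost comment #6 objects to.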

Comment 7 Giuseppe Ragusa 2018-09-21 00:55:33 UTC
I think you perfectly described the reason why the "_netdev" option breaks the systemd ordering cycle which I was experiencing on basic.target :-)

Now the question is: is this systemd ordering cycle problem (see my /var/log/messages excerpt in comment #5) due to something wrong in my setup, or can it arise from legitimate configurations/use cases on a GlusterFS node?

If the latter is true, there may of course be more "proper" ways to avoid it than using "_netdev" :-)

Many thanks for your help.

Comment 8 Sachidananda Urs 2018-09-21 03:07:10 UTC
(In reply to Giuseppe Ragusa from comment #7)
> I think you perfectly described the reason why the "_netdev" option breaks
> the systemd ordering cycle which I was experiencing on basic.target :-)
> 
> Now the question is: is this systemd ordering cycle problem (see my
> /var/log/messages excerpt in comment #5) due to something wrong in my setup
> or is it a possible condition deriving from legitimate
> configurations/use_cases on a GlusterFS node?
> 
> If the second holds true (possible issue) then there could still be more
> "proper" ways to obviate it than using "_netdev", of course :-)
> 
> Many thanks for your help.

I will try to reproduce this systemd issue. Meanwhile, could you please try
2.0.9 and see whether it already solves the issue? Please try with a fresh setup.

Comment 9 Giuseppe Ragusa 2018-09-24 23:40:18 UTC
I can confirm that gdeploy 2.0.9 adds the expected x-systemd.requires option in fstab (I reinstalled from scratch and reran the new gdeploy; note that the gdeploy 2.0.9 src.rpm linked in comment #2 above does not rebuild as-is on CentOS 7 because of Sphinx problems).

I can also confirm that the systemd ordering cycle problem persists:

Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on rhel-autorelabel.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job rhel-autorelabel.service/start
Sep 25 01:31:57 pinkiepie systemd: Job rhel-autorelabel.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on systemd-journal-catalog-update.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job systemd-journal-catalog-update.service/start
Sep 25 01:31:57 pinkiepie systemd: Job systemd-journal-catalog-update.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on systemd-update-done.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job systemd-update-done.service/start
Sep 25 01:31:57 pinkiepie systemd: Job systemd-update-done.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on systemd-machine-id-commit.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job systemd-machine-id-commit.service/start
Sep 25 01:31:57 pinkiepie systemd: Job systemd-machine-id-commit.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on auditd.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job auditd.service/start
Sep 25 01:31:57 pinkiepie systemd: Job auditd.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on selinux-policy-migrate-local-changes/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job selinux-policy-migrate-local-changes/start
Sep 25 01:31:57 pinkiepie systemd: Job selinux-policy-migrate-local-changes/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on systemd-update-utmp.service/verify-active
Sep 25 01:31:57 pinkiepie systemd: Found dependency on systemd-tmpfiles-setup.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on rhel-import-state.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job systemd-update-utmp.service/verify-active
Sep 25 01:31:57 pinkiepie systemd: Job systemd-update-utmp.service/verify-active deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on systemd-tmpfiles-setup.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on rhel-import-state.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job systemd-tmpfiles-setup.service/start
Sep 25 01:31:57 pinkiepie systemd: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on rhel-import-state.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job rhel-import-state.service/start
Sep 25 01:31:57 pinkiepie systemd: Job rhel-import-state.service/start deleted to break ordering cycle starting with sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found ordering cycle on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on gluster_bricks-enginedomain-brick1.mount/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on vdo.service/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on basic.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sockets.target/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on virtlogd.socket/start
Sep 25 01:31:57 pinkiepie systemd: Found dependency on sysinit.target/start
Sep 25 01:31:57 pinkiepie systemd: Breaking ordering cycle by deleting job local-fs.target/start
Sep 25 01:31:57 pinkiepie systemd: Job local-fs.target/start deleted to break ordering cycle starting with sysinit.target/start
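To see at a glance which jobs systemd discarded in the log above, the "Breaking ordering cycle by deleting job ..." lines can be collected. This is a minimal sketch with a hypothetical function name; matching only that message (and not the paired "Job ... deleted" line) counts each deletion once.

```python
import re

DELETED_RE = re.compile(r"Breaking ordering cycle by deleting job (\S+)")


def deleted_jobs(log_text: str) -> list:
    """Return the jobs systemd deleted to break ordering cycles, in log order."""
    return [m.group(1) for m in DELETED_RE.finditer(log_text)]
```

Applied to this excerpt, the list ends with local-fs.target/start, i.e. the start job for local-fs.target itself was dropped after the other eight deletions did not break the cycle.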

Comment 10 Sachidananda Urs 2018-09-25 04:26:52 UTC
(In reply to Giuseppe Ragusa from comment #9)
> I can confirm that gdeploy 2.0.9 adds the expected x-systemd.requires option
> in fstab (reinstalled from scratch and relaunched with new gdeploy; please
> note that the gdeploy 2.0.9 src.rpm linked to in comment #2 above does not
> rebuild as-is on CentOS7 because of sphinx problems).
> 
> I can also confirm that the systemd ordering cycle problem persists:

That is strange; it seems to work for RHEL users. I will check what causes
this behaviour on CentOS systems. Thanks for the update.

Comment 11 Giuseppe Ragusa 2018-09-25 20:45:38 UTC
Please note that I'm testing on a fully up-to-date CentOS 7.5, but not on a basic "out-of-the-box" setup.
My tests are part of a broader project to automate the setup of a whole hyperconverged infrastructure, as hinted at in comment #5 above.

I'm happy to provide full details/logs on the configuration (I suppose the main things to inspect are the systemd services enabled at boot and their dependencies).
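One way to inspect those dependencies is `systemctl show`, which prints unit properties as Key=value lines. Per systemd.service(5), a service with DefaultDependencies=yes is implicitly ordered After=basic.target; combined with a brick mount that requires vdo.service and is itself part of local-fs.target, that would close the loop seen in the logs. The sketch assumes vdo.service keeps its default dependencies, which it does not verify; the function names and the sample output used with it are illustrative, not taken from the reporter's system.

```python
import subprocess


def parse_systemctl_show(text: str) -> dict:
    """Turn 'Key=value1 value2' lines from `systemctl show` into
    {'Key': ['value1', 'value2']}."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value.split()
    return props


def show_unit(unit: str) -> dict:
    """Query the After= and Requires= properties of a unit on the live system."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "After", "-p", "Requires", unit],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_systemctl_show(out)
```

On an affected node, `"basic.target" in show_unit("vdo.service").get("After", [])` being true would match the cycle reported above.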

Comment 12 Sachidananda Urs 2018-09-26 04:11:29 UTC
(In reply to Giuseppe Ragusa from comment #11)
> Please note that I'm testing on CentOS 7.5 fully up-to-date, but not on a
> basic "out-of-the-box" setup.
> My tests are part of a wide project to automate the setup of a whole
> hyperconverged infrastructure, as hinted to in comment #5 above.
> 
> I'm happy to provide full details/logs on the configuration (mainly systemd
> services enabled at boot and their dependencies should be inspected, I
> suppose).

Could you help with a couple of questions?

1. What is the VDO version you are using?
2. How did you install VDO?

Is the attached gdeploy config file the latest?

I'll try to reproduce this issue.

Comment 13 Giuseppe Ragusa 2018-09-26 07:25:45 UTC
1. Here are the package versions:
vdo-6.1.0.168-18.x86_64
kmod-kvdo-6.1.0.181-17.el7_5.x86_64

2. I have two different setups (both exhibit the same issue):
  a. CentOS 7.5.1804 host with packages manually installed (yum -y install vdo kmod-kvdo)
  b. oVirt NGN "appliance" with packages already installed upstream

The attached gdeploy.conf is the latest (it is actually generated automatically by Ansible from a Jinja2 template, but the template has not changed).

Let me restate that this is a test environment, so I can provide sosreports, logs, etc.

Many thanks.

