Bug 1726612

Summary: Stratisd starts whit fail messages after reboot
Product: [Fedora] Fedora Reporter: Florian van Oudgaarden <florianvanoudgaarden>
Component: stratisdAssignee: John Baublitz <jbaublitz>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 30CC: amulhern, andy, bgurney, igor.raits, jbaublitz, rust-sig, yaneti
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 1.0.6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-15 13:21:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
SOS Report from Stratisd Not Starting at boot none

Description Florian van Oudgaarden 2019-07-03 09:39:02 UTC
Description of problem:
After reboot the stratisd.service fails to start after a systemctl start stratisd.service it starts normal. wen i reboot the system again it goes back to failed 

Version-Release number of selected component (if applicable): 
# stratis --version
1.0.0

How reproducible:
Reboot system

Steps to Reproduce:
1.install fedora 30 minimum install. And update it to newest available packages
2.Install Stratisd and stratis-cli
3.Enable startisd and start stratisd whit systemctl
4.Configure some disks
5. reboot system. 

Actual results:
[root@f30-2 ~]# systemctl status stratisd.service
● stratisd.service - A daemon that manages a pool of block devices to create flexible file systems
   Loaded: loaded (/usr/lib/systemd/system/stratisd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-07-03 10:50:27 CEST; 22min ago
     Docs: man:stratisd(8)
  Process: 559 ExecStart=/usr/libexec/stratisd --debug (code=exited, status=1/FAILURE)
 Main PID: 559 (code=exited, status=1/FAILURE)

Jul 03 10:50:27 f30-2.test.org stratisd[559]: IO error: No such file or directory (os error 2)
[root@f30-2 ~]# systemctl start stratisd.service
[root@f30-2 ~]# systemctl status stratisd.service
● stratisd.service - A daemon that manages a pool of block devices to create flexible file systems
   Loaded: loaded (/usr/lib/systemd/system/stratisd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-07-03 11:13:25 CEST; 4s ago
     Docs: man:stratisd(8)
 Main PID: 22319 (stratisd)
    Tasks: 1 (limit: 4915)
   Memory: 3.7M
   CGroup: /system.slice/stratisd.service
           └─22319 /usr/libexec/stratisd --debug

Jul 03 11:13:25 f30-2.test.org stratisd[22319]:         }
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:     },
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:     incomplete_pools: {},
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:     watched_dev_last_event_nrs: {}
Jul 03 11:13:25 f30-2.test.org stratisd[22319]: }
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:  INFO stratisd: D-Bus API is available
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:  INFO libstratis::engine::strat_engine::thinpool::thinpool: Data tier percent used: 21
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:  INFO libstratis::engine::strat_engine::thinpool::thinpool: Data tier percent used: 21
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:  INFO libstratis::engine::strat_engine::thinpool::thinpool: Data tier percent used: 21
Jul 03 11:13:25 f30-2.test.org stratisd[22319]:  INFO libstratis::engine::strat_engine::thinpool::thinpool: Data tier percent used: 21
[root@f30-2 ~]#

Expected results:
A normal started service

Additional info:

Comment 1 mulhern 2019-07-03 12:56:39 UTC
Note: the information provided by "stratis --version" may be incorrect due to https://bugzilla.redhat.com/show_bug.cgi?id=1652124. This was fixed in https://github.com/stratis-storage/stratis-cli/pull/265, and included in version 1.0.3. So, the actual version of stratis-cli being run is probably 1.0.2.

Comment 2 mulhern 2019-07-03 13:28:11 UTC
Note: We can rule out the problem of the missing application binary due to ordering problems as the error message would unambiguously identify this problem. So, the error "Jul 03 10:50:27 f30-2.test.org stratisd[559]: IO error: No such file or directory (os error 2)" refers definitely to something else.

Comment 3 mulhern 2019-07-03 13:30:45 UTC
We will attempt to reproduce this problem.

Comment 4 mulhern 2019-07-03 13:32:36 UTC
Re-assigning to stratis team member, as this seems unlikely to be a packaging problem.

Comment 5 Bryan Gurney 2019-07-03 15:10:38 UTC
I have a couple of questions:

How often is this issue reproducible?  Is it reproducible 100% of the time?  Most of the time?  Sometimes?

(In the Bugzilla template, this is the question being asked by the "How reproducible:" field.)

Also, in your reproduction step "4.Configure some disks", can you elaborate on the approximate setup of the pools and/or filesystems?

On a test VM, I set up Fedora 30 Workstation, with the following package versions:
stratisd-1.0.4-1.module_f30+4201+e7a8c4f2.x86_64
stratis-cli-1.0.2-1.module_f30+4065+aaaad0b9.x86_64

After installing stratis-cli and stratisd, and running "systemctl enable stratisd" and "systemctl start stratisd", I created one filesystem in one pool, with the following commands:

stratis pool create spool1 /dev/vdb1
stratis fs create spool1 sfs1

After restarting, stratisd was running, and the stratis device was running.

To test the same pool with automatic mounting, I added this line to /etc/fstab:
/stratis/spool1/sfs1 /mnt/sfs1/                                 xfs     defaults        0 0

After restarting, stratisd was running, and the filesystem was mounted on /mnt/sfs1.

For your tests (where stratisd is failing to start) are there any additional steps that are different from the steps in my test?

Comment 6 Florian van Oudgaarden 2019-07-04 15:46:02 UTC
I have one pool 
# stratis pool list
Name           Total Physical Size  Total Physical Used
greaterpool                  8 GiB             1.65 GiB

in that pool i made three filesystem points
# stratis filesystem list
Pool Name    Name  Used     Created            Device                     UUID
greaterpool  log2  546 MiB  Jul 02 2019 14:54  /stratis/greaterpool/log2  3a781428dea149ada3c2c2b86f7d88df
greaterpool  log1  546 MiB  Jul 02 2019 14:54  /stratis/greaterpool/log1  558eebd593d7449a802eb177d8cd5d00
greaterpool  log3  546 MiB  Jul 02 2019 14:54  /stratis/greaterpool/log3  acb1b3c3fa9f40ac8e3db4d9096839db

And i added the following lines to my /etc/fstab
UUID=acb1b3c3-fa9f-40ac-8e3d-b4d9096839db /mount/log3 xfs defaults 0 0
UUID=3a781428-dea1-49ad-a3c2-c2b86f7d88df /mount/log2 xfs defaults 0 0
UUID=558eebd5-93d7-449a-802e-b177d8cd5d00 /mount/log1 xfs defaults 0 0

Strange thing is wen I remove the lines from my fs tab the stratis deamon starts normal after reboot. 
wen I add them, and do the mount -a command the mount points work fine. 
wen I reboot again the boot log will give the following message :

Jul 04 17:11:04 f30-2.test.org systemd[1]: dev-disk-by\x2duuid-558eebd5\x2d93d7\x2d449a\x2d802e\x2db177d8cd5d00.device: Job dev-disk-by\x2duuid-558eebd5\x2d93d7\x2d449a\x2d802e\x2db177d8cd5d00.device/start timed out.
Jul 04 17:11:04 f30-2.test.org systemd[1]: Timed out waiting for device /dev/disk/by-uuid/558eebd5-93d7-449a-802e-b177d8cd5d00.
Jul 04 17:11:04 f30-2.test.org systemd[1]: Dependency failed for /mount/log1.
Jul 04 17:11:04 f30-2.test.org systemd[1]: Dependency failed for Local File Systems.
Jul 04 17:11:04 f30-2.test.org systemd[1]: Dependency failed for Mark the need to relabel after reboot.
Jul 04 17:11:04 f30-2.test.org systemd[1]: selinux-autorelabel-mark.service: Job selinux-autorelabel-mark.service/start failed with result 'dependency'.
Jul 04 17:11:04 f30-2.test.org systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jul 04 17:11:04 f30-2.test.org systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jul 04 17:11:04 f30-2.test.org systemd[1]: mount-log1.mount: Job mount-log.mount/start failed with result 'dependency'.
Jul 04 17:11:04 f30-2.test.org systemd[1]: dev-disk-by\x2duuid-558eebd5\x2d93d7\x2d449a\x2d802e\x2db177d8cd5d00.device: Job dev-disk-by\x2duuid-558eebd5\x2d93d7\x2d449a\x2d802e\x2db177d8cd5d00.device/start failed with result 'timeout'.
Jul 04 17:11:04 f30-2.test.org systemd[1]: dev-disk-by\x2duuid-3a781428\x2ddea1\x2d49ad\x2da3c2\x2dc2b86f7d88df.device: Job dev-disk-by\x2duuid-3a781428\x2ddea1\x2d49ad\x2da3c2\x2dc2b86f7d88df.device/start timed out.
Jul 04 17:11:04 f30-2.test.org systemd[1]: Timed out waiting for device /dev/disk/by-uuid/3a781428-dea1-49ad-a3c2-c2b86f7d88df.
Jul 04 17:11:04 f30-2.test.org systemd[1]: Dependency failed for /mount/log2.
Jul 04 17:11:04 f30-2.test.org systemd[1]: mount-log2.mount: Job mount-log2.mount/start failed with result 'dependency'.
Jul 04 17:11:04 f30-2.test.org systemd[1]: dev-disk-by\x2duuid-3a781428\x2ddea1\x2d49ad\x2da3c2\x2dc2b86f7d88df.device: Job dev-disk-by\x2duuid-3a781428\x2ddea1\x2d49ad\x2da3c2\x2dc2b86f7d88df.device/start failed with result 'timeout'. 

After that I come in the emergency mode.   

So only wen I have my mount points configured in the fstab the failure whit the stratis deamon appears.

Got the same configuration and the same packages :
[root@f30-2 ~]# rpm -q stratisd
stratisd-1.0.4-1.module_f30+4201+e7a8c4f2.x86_64
[root@f30-2 ~]# rpm -q stratis-cli
stratis-cli-1.0.2-1.module_f30+4065+aaaad0b9.x86_64

Comment 7 Bryan Gurney 2019-07-08 19:32:46 UTC
What was the base block device used for creating the Stratis pool?

I used a partition on an NVMe solid state drive, with the following commands:

# stratis pool create spool1 /dev/nvme0n1p5
# stratis fs create spool1 sfs1
# stratis fs create spool1 sfs2
# stratis fs create spool1 sfs3

I used the following /etc/fstab lines:

Test 1 (obtained via grepping for the relevant lines in /proc/mounts, and then replacing the xfs mount parameters with "defaults"):
/dev/mapper/stratis-1-60027c704fc14cbaaf7254da329ce044-thin-fs-2087ea789725457a9b8a1a6c751e0fa4 /mnt/sfs1 xfs defaults 0 0
/dev/mapper/stratis-1-60027c704fc14cbaaf7254da329ce044-thin-fs-1ad9e5d623124ccbb21ae0003033fd42 /mnt/sfs2 xfs defaults 0 0
/dev/mapper/stratis-1-60027c704fc14cbaaf7254da329ce044-thin-fs-9ae62c03ec4e44578df51b45d4cda605 /mnt/sfs3 xfs defaults 0 0

Test 2 (obtained by running "blkid /stratis/spool1/sfs1", etc.):
UUID=2087ea78-9725-457a-9b8a-1a6c751e0fa4 /mnt/sfs1 xfs defaults 0 0
UUID=1ad9e5d6-2312-4ccb-b21a-e0003033fd42 /mnt/sfs2 xfs defaults 0 0
UUID=9ae62c03-ec4e-4457-8df5-1b45d4cda605 /mnt/sfs3 xfs defaults 0 0

In both cases, my test system booted successfully, without any problems.

Comment 8 Bryan Gurney 2019-07-08 19:54:32 UTC
To clarify my question about the block device being used for the Stratis pool: do you know if it's a device that may take longer to initialize than a standard SATA or NVMe storage device?  (Even if it isn't, this may be a useful data point.)

Comment 9 Florian van Oudgaarden 2019-07-10 10:44:39 UTC
The block devices that are used are virtual block devices run whit virtual box on my apple notebook. they are vdi files used by virtualbox and the block device that is buildup whit lvm has no problems while booting. I will test this on my kvm environment on my centos 7 server tonight (I'm in the CET timezone) so i can rule out Virtualbox. never thought of that to be honest.

Comment 10 Dustin 2019-09-12 02:02:22 UTC
Seeing the same thing on RHEL 8.0

Fails on Boot, then when I login and start stratisd normally from the command line. It starts fine.

I have `noauto,x-systemd.automount,defaults,x-systemd.requires=stratisd.service` in the `/etc/fstab file`
To let it boot without going to emergency.target

Else with normal fstab options `defaults,x-systemd.requires=stratisd.service 0 0`
it drops to emergency.target. After failing to mount.


# journalctl -xb

...

Sep 11 15:42:32 rhel8-9999.local stratisd[676]: IO error: No such file or directory (os error 2)
Sep 11 15:42:32 rhel8-9999.local systemd[1]: stratisd.service: Main process exited, code=exited, status=1/FAILURE
Sep 11 15:42:32 rhel8-9999.local systemd[1]: stratisd.service: Failed with result 'exit-code'.
...
Sep 11 15:42:54 rhel8-9999.local dracut[3102]: dracut module 'stratis' will not be installed, because command 'stratisd-init' could not be found!



# [devops@rhel8-9999 ~]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.0 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.0"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.0:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.0"
[devops@rhel8-9999 ~]$


// =================================================

[devops@rhel8-9999 ~]$ rpm -qi stratis-cli stratisd
Name        : stratis-cli
Version     : 1.0.2
Release     : 1.el8
Architecture: noarch
Install Date: Fri 06 Sep 2019 09:59:12 AEST
Group       : Unspecified
Size        : 129260
License     : ASL 2.0
Signature   : RSA/SHA256, Fri 14 Dec 2018 15:35:00 AEDT, Key ID 199e2f91fd431d51
Source RPM  : stratis-cli-1.0.2-1.el8.src.rpm
Build Date  : Wed 12 Dec 2018 10:26:04 AEDT
Build Host  : x86-vm-08.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/stratis-storage/stratis-cli
Summary     : Command-line tool for interacting with the Stratis daemon
Description :
stratis provides a command-line interface (CLI) for
interacting with the Stratis daemon, stratisd. stratis
interacts with stratisd via D-Bus.
Name        : stratisd
Version     : 1.0.3
Release     : 1.el8
Architecture: x86_64
Install Date: Fri 06 Sep 2019 09:59:11 AEST
Group       : Unspecified
Size        : 4215640
License     : MPLv2.0
Signature   : RSA/SHA256, Tue 15 Jan 2019 07:33:12 AEDT, Key ID 199e2f91fd431d51
Source RPM  : stratisd-1.0.3-1.el8.src.rpm
Build Date  : Fri 11 Jan 2019 15:50:20 AEDT
Build Host  : x86-vm-07.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/stratis-storage/stratisd
Summary     : Daemon that manages block devices to create filesystems
Description :
Daemon that manages block devices to create filesystems.
[devops@rhel8-9999 ~]$

// =================================================

[devops@rhel8-9999 ~]$ cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Fri Sep  6 09:42:49 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rhel-root   /                       xfs     defaults        0 0
UUID=2fd90641-d8a1-4cb2-b3f9-397cd8e3a8d1 /boot                   xfs     defaults        0 0
/dev/mapper/rhel-home   /home                   xfs     defaults        0 0
/dev/mapper/rhel-var    /var                    xfs     defaults        0 0
/dev/mapper/rhel-swap   swap                    swap    defaults        0 0

#UUID=22ef2306-2bd4-406a-aefa-e1b71c918341 /mnt/data/1  xfs     nofail,defaults,x-systemd.requires=stratisd.service 0 0
#UUID=5dd9b7b5-4589-457f-ac49-f51d3aa7cd56 /mnt/data/2  xfs     noauto,x-systemd.automount,defaults,x-systemd.requires=stratisd.service 0 0

UUID=22ef2306-2bd4-406a-aefa-e1b71c918341 /mnt/data/1   xfs     noauto,x-systemd.automount,defaults,x-systemd.requires=stratisd.service 0 0
UUID=5dd9b7b5-4589-457f-ac49-f51d3aa7cd56 /mnt/data/2   xfs     noauto,x-systemd.automount,defaults,x-systemd.requires=stratisd.service 0 0

// =================================================

[devops@rhel8-9999 ~]$ sudo stratis blockdev list
Pool Name     Device Node    Physical Size   State  Tier
stratis_test  /dev/sda4           1.48 TiB  In-use  Data
stratis_test  /dev/sda5           1.48 TiB  In-use  Data
[devops@rhel8-9999 ~]$


// ==================================================

[devops@rhel8-9999 ~]$ df -h
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs                16G     0   16G   0% /dev
tmpfs                   16G     0   16G   0% /dev/shm
tmpfs                   16G  8.8M   16G   1% /run
tmpfs                   16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/rhel-root   50G  2.2G   48G   5% /
/dev/sda2             1014M  232M  783M  23% /boot
/dev/mapper/rhel-home  200G  2.0G  198G   1% /home
/dev/mapper/rhel-var   1.0T   11G 1013G   2% /var
tmpfs                  3.1G     0  3.1G   0% /run/user/1000

[devops@rhel8-9999 ~]$ sudo mount /stratis/stratis_test/
fs_howto     my_precious
[devops@rhel8-9999 ~]$ sudo mount /stratis/stratis_test/fs_howto /mnt/data/1/
[devops@rhel8-9999 ~]$ df -h
Filesystem                                                                                       Size  Used Avail Use% Mounted on
devtmpfs                                                                                          16G     0   16G   0% /dev
tmpfs                                                                                             16G     0   16G   0% /dev/shm
tmpfs                                                                                             16G  8.8M   16G   1% /run
tmpfs                                                                                             16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                                                             50G  2.2G   48G   5% /
/dev/sda2                                                                                       1014M  232M  783M  23% /boot
/dev/mapper/rhel-home                                                                            200G  2.0G  198G   1% /home
/dev/mapper/rhel-var                                                                             1.0T   11G 1013G   2% /var
tmpfs                                                                                            3.1G     0  3.1G   0% /run/user/1000
/dev/mapper/stratis-1-b165796c3eff42cb8decc0083fdad600-thin-fs-5dd9b7b54589457fac49f51d3aa7cd56  1.0T   14G 1011G   2% /mnt/data/1
[devops@rhel8-9999 ~]$ sudo mount /stratis/stratis_test/my_precious /mnt/data/2
[devops@rhel8-9999 ~]$ df -h
Filesystem                                                                                       Size  Used Avail Use% Mounted on
devtmpfs                                                                                          16G     0   16G   0% /dev
tmpfs                                                                                             16G     0   16G   0% /dev/shm
tmpfs                                                                                             16G  8.8M   16G   1% /run
tmpfs                                                                                             16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                                                             50G  2.2G   48G   5% /
/dev/sda2                                                                                       1014M  232M  783M  23% /boot
/dev/mapper/rhel-home                                                                            200G  2.0G  198G   1% /home
/dev/mapper/rhel-var                                                                             1.0T   11G 1013G   2% /var
tmpfs                                                                                            3.1G     0  3.1G   0% /run/user/1000
/dev/mapper/stratis-1-b165796c3eff42cb8decc0083fdad600-thin-fs-5dd9b7b54589457fac49f51d3aa7cd56  1.0T   14G 1011G   2% /mnt/data/1
/dev/mapper/stratis-1-b165796c3eff42cb8decc0083fdad600-thin-fs-22ef23062bd4406aaefae1b71c918341  1.0T  7.2G 1017G   1% /mnt/data/2
[devops@rhel8-9999 ~]$

// =========================================================================================

I have an sosreport if required.

Comment 11 Bryan Gurney 2019-09-12 15:19:24 UTC
Yes, please do send an sosreport, if you can.  The output from "sos_commands/systemd/systemd-analyze*" may help to diagnose any timing-related issues.

One other question: before the stratisd error of "No such file or directory", was there a systemd error with the text "failed with result 'timeout'"?

Comment 12 mulhern 2019-09-12 19:27:36 UTC
We can not rule out the possibility that this occurs due to failure to open the pid file. The pid file itself is opened with the create flag, but if the directory "/var/run" for some reason does not exist at the time that the attempt is made to open the pid file, then the IO error that we see will result.

Comment 13 Dustin 2019-09-12 20:40:53 UTC
Created attachment 1614657 [details]
SOS Report from Stratisd Not Starting at boot

SOS Report from Stratisd Not Starting at boot

Comment 14 John Baublitz 2019-09-19 17:22:21 UTC
https://github.com/stratis-storage/stratisd/issues/1630 is the issue on Github that we'll be using to track the resolution of this problem.

Comment 15 mulhern 2020-04-15 13:21:53 UTC
stratisd >= 1.0.6 is available in Fedora >= 32