Bug 1296464 - Create Cluster: ceph-disk activate-all fails in non deterministic way because of missing device file
Summary: Create Cluster: ceph-disk activate-all fails in non deterministic way because of missing device file
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: Ceph
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 2
Assignee: Shubhendu Tripathi
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-07 10:34 UTC by Daniel Horák
Modified: 2016-07-20 17:45 UTC
CC List: 8 users

Fixed In Version: rhscon-core-0.0.34-1.el7scon.x86_64 rhscon-ceph-0.0.33-1.el7scon.x86_64 rhscon-ui-0.0.47-1.el7scon.noarch
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-20 17:45:54 UTC
Embargoed:


Attachments
excerpt from /var/log/salt/minion of node1 (5.91 KB, text/plain)
2016-01-14 18:13 UTC, Martin Bukatovic


Links:
Gerrithub.io 261132 (last updated 2016-01-25 13:02:49 UTC)
Red Hat Bugzilla 1299997, CLOSED: not clear how to recover from failure of "Create Cluster" operation (last updated 2021-02-22 00:41:40 UTC)

Internal Links: 1299997

Description Daniel Horák 2016-01-07 10:34:16 UTC
Description of problem:
  Cluster creation sometimes fails with a rather unspecific error (see Bug 1296215). It might be related to ERRORs in /var/log/salt/minion on one of the nodes.

    <<truncated>>
  2016-01-07 10:45:42,664 [salt.loaded.int.module.cmdmod][DEBUG   ][16495] stderr: partx: specified range <1:0> does not make sense
  libust[16596/16596]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
  partx: /dev/vdb: error adding partition 2
  partx: /dev/vdb: error adding partitions 1-2
  partx: /dev/vdb: error adding partitions 1-2
  2016-01-07 10:45:42,664 [salt.state       ][INFO    ][16495] {'pid': 16556, 'retcode': 0, 'stderr': 'partx: specified range <1:0> does not make sense\nlibust[16596/16596]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\npartx: /dev/vdb: error adding partition 2\npartx: /dev/vdb: error adding partitions 1-2\npartx: /dev/vdb: error adding partitions 1-2', 'stdout': 'Creating new GPT entries.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nCreating new GPT entries.\nThe operation has completed successfully.\nThe operation has completed successfully.\nThe operation has completed successfully.\nmeta-data=/dev/vdb1              isize=2048   agcount=4, agsize=2555839 blks\n         =                       sectsz=512   attr=2, projid32bit=1\n         =                       crc=0        finobt=0\ndata     =                       bsize=4096   blocks=10223355, imaxpct=25\n         =                       sunit=0      swidth=0 blks\nnaming   =version 2              bsize=4096   ascii-ci=0 ftype=0\nlog      =internal log           bsize=4096   blocks=4991, version=2\n         =                       sectsz=512   sunit=0 blks, lazy-count=1\nrealtime =none                   extsz=4096   blocks=0, rtextents=0\nWarning: The kernel is still using the old partition table.\nThe new table will be used at the next reboot.\nThe operation has completed successfully.'}
  2016-01-07 10:45:42,665 [salt.state       ][INFO    ][16495] Completed state [ceph-disk prepare --cluster TestCluster01 --cluster-uuid b917c0c8-740b-49be-9488-f0447630172d --fs-type xfs --zap-disk /dev/vdb] at time 10:45:42.664719
  2016-01-07 10:45:42,665 [salt.state       ][DEBUG   ][16495] File /var/cache/salt/minion/accumulator/55341904 does not exist, no need to cleanup.
  2016-01-07 10:45:42,667 [salt.minion      ][INFO    ][16495] Returning information for job: 20160107104536156506
  2016-01-07 10:45:42,667 [salt.crypt       ][DEBUG   ][16495] Re-using SAuth for ('/etc/salt/pki/minion', 'usm3-node1.novalocal', 'tcp://172.16.61.12:4506')
  2016-01-07 10:45:42,693 [salt.minion      ][INFO    ][13034] User root Executing command cmd.run_all with jid 20160107104542689550
  2016-01-07 10:45:42,693 [salt.minion      ][DEBUG   ][13034] Command details {'tgt_type': 'list', 'jid': '20160107104542689550', 'tgt': {'usm3-node1.novalocal': {'public_ip': '172.16.61.10', 'cluster_ip': '172.16.61.10', 'devices': {'/dev/vdb': 'xfs'}}}, 'ret': '', 'user': 'root', 'arg': ['ceph-disk activate-all'], 'fun': 'cmd.run_all'}
  2016-01-07 10:45:42,701 [salt.minion      ][INFO    ][16812] Starting a new job with PID 16812
  2016-01-07 10:45:42,705 [salt.utils.lazy  ][DEBUG   ][16812] LazyLoaded cmd.run_all
  2016-01-07 10:45:42,708 [salt.loaded.int.module.cmdmod][INFO    ][16812] Executing command 'ceph-disk activate-all' in directory '/root'
  2016-01-07 10:45:44,390 [salt.loaded.int.module.cmdmod][ERROR   ][16812] Command 'ceph-disk activate-all' failed with return code: 1
  2016-01-07 10:45:44,390 [salt.loaded.int.module.cmdmod][ERROR   ][16812] stderr: error: /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.700058d0-33f3-46ca-a418-7dbd66b119f6: No such file or directory
  ceph-disk: Cannot discover filesystem type: device /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.700058d0-33f3-46ca-a418-7dbd66b119f6: Command '/sbin/blkid' returned non-zero exit status 2
  ceph-disk: Error: One or more partitions failed to activate
  2016-01-07 10:45:44,391 [salt.loaded.int.module.cmdmod][ERROR   ][16812] retcode: 1
    <<truncated>>

Version-Release number of selected component (if applicable):
USM server:
  # rpm -qa | grep -e rhscon -e salt | sort 
    rhscon-ceph-0.0.3-0.1.alpha1.el7.x86_64
    rhscon-core-0.0.5-0.1.alpha1.el7.x86_64
    rhscon-ui-0.0.3-0.1.alpha1.el7.noarch
    salt-2015.5.5-1.el7.noarch
    salt-master-2015.5.5-1.el7.noarch

USM nodes:
  # rpm -qa | grep -e rhscon -e salt | sort 
    rhscon-agent-0.0.3-0.1.alpha1.el7.noarch
    salt-2015.5.5-1.el7.noarch
    salt-minion-2015.5.5-1.el7.noarch

How reproducible:
  approximately 30%

Steps to Reproduce:
1. prepare 1 USM server and 4 USM nodes according to the documentation
2. accept all discovered hosts and try to create a cluster
3. if the cluster creation fails, check /var/log/salt/minion on all nodes (see the sketch below)
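
For step 3, something along these lines is enough (a minimal sketch; the hostnames are just examples matching this setup):

~~~
# check the salt minion log for errors on every node (hostnames are examples)
for node in usm3-node1 usm3-node2 usm3-node3 usm3-node4; do
    echo "=== ${node} ==="
    ssh "root@${node}" "grep -E 'ERROR|CRITICAL' /var/log/salt/minion | tail -n 20"
done
~~~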

Actual results:
  Cluster creation sometimes fails.

Expected results:
  Cluster creation passes.

Additional info:

Comment 1 Shubhendu Tripathi 2016-01-07 12:08:01 UTC
The error message "Error: One or more partitions failed to activate" indicates that the issue is caused by improper cleanup of a disk that was used for OSD creation earlier.
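
A quick way to check for such leftovers (just an example, assuming /dev/vdb is the data disk):

~~~
# list devices/partitions as seen by ceph-disk
ceph-disk list
# show partition table entries and filesystem signatures on the data disk
lsblk -f /dev/vdb
~~~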

Comment 2 Daniel Horák 2016-01-07 12:48:13 UTC
It happened on freshly created nodes (in OpenStack), so I think they should be clean. When I notice it again, I'll ping you on IRC or send an email with the login details.

Comment 3 Shubhendu Tripathi 2016-01-07 18:19:42 UTC
Cleaning the disks resolves the issue.
I tried it in the provided setup and cluster creation was successful.

Comment 4 Daniel Horák 2016-01-14 07:50:12 UTC
It's probably not 100% reproducible, but it happened quite often on different clusters (though created and installed by the same install script and ansible playbook).
The whole cluster is always freshly installed and the data disk (/dev/vdb, clean but formatted to ext3 and mounted to /mnt) is cleaned via "dd if=/dev/zero of=/dev/vdb bs=1MB count=10" and then the machine is rebooted (this is done before the USM cluster installation/configuration).
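
For reference, the cleanup boils down to something like this (a simplified sketch of what the install script does, assuming the disk is still mounted on /mnt):

~~~
# unmount the data disk, wipe the beginning of it and reboot
umount /mnt || true
dd if=/dev/zero of=/dev/vdb bs=1MB count=10
reboot
~~~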

Comment 5 Martin Bukatovic 2016-01-14 18:13:07 UTC
Created attachment 1114907 [details]
excerpt from /var/log/salt/minion of node1

(In reply to Shubhendu Tripathi from comment #1)
> The error message "Error: One or more partitions failed to activate"
> indicates that the issue is happening due to im-proper disk cleanup, which
> was used for OSD creation earlier.

I disagree here.

You can see in the /var/log/salt/minion excerpt I'm attaching to this BZ
(I hit this issue today) that node1's salt minion did the following:

 * create a new GPT table on /dev/vdb (with 2 partitions)
 * create an xfs filesystem on /dev/vdb1 (one of the new partitions)
 * hit some partx issues
 * fail on 'ceph-disk activate-all'

This means that it doesn't matter what state device /dev/vdb was in,
because the salt minion created a completely new partition table here
and formatted its 1st partition. Whatever device /dev/vdb contained
before is long gone, since the salt minion reclaimed the device.

I verified this by checking the state on the node1 myself:

~~~
# fdisk -l /dev/vdb
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/vdb: 34.4 GB, 34359738368 bytes, 67108864 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1      2099200     67108830     31G  unknown         ceph data
 2         2048      2097152   1023M  unknown         ceph journal
~~~

This means that the partition table has been created successfully. Going a
little further:

~~~
# mkdir /mnt/foo
# mount /dev/vdb1 /mnt/foo
# findmnt /mnt/foo
TARGET   SOURCE    FSTYPE OPTIONS
/mnt/foo /dev/vdb1 xfs    rw,relatime,seclabel,attr2,inode64,noquota
~~~

This means that the 1st partition has been created successfully.

So if the salt minion managed to create a new partition table and 2 new
partitions with a filesystem on the 1st one, but *failed* to run
`ceph-disk activate-all`, it means that the salt automation (as it's
currently part of rhscon) is broken.

(In reply to Shubhendu Tripathi from comment #3)
> Cleaning the disks resolves the issue.
> Tried in the setup provided in cluster creation was successful.

But what you cleaned here was the configuration created by salt, so if you
need to clean it up and rerun manually, that's just more evidence that the
disk setup as implemented in rhscon using salt is broken.

So let's go even deeper and check the failure:


~~~
stderr: 2016-01-14 16:26:14.815515 7fc2cde44780 -1 did not load config file, using default settings.
2016-01-14 16:26:14.832013 7f08b27e8780 -1 did not load config file, using default settings.
mount: special device /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.95d49259-f0e8-4825-9ad6-d4c704d2801a does not exist
ceph-disk: Mounting filesystem failed: Command '['/usr/bin/mount', '-t', 'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.95d49259-f0e8-4825-9ad6-d4c704d2801a', '/var/lib/ceph/tmp/mnt.Qa
5BZm']' returned non-zero exit status 32
~~~

In this piece of the log, we can see that the mount of some partition fails.
Let's check which one it is:

~~~
# ls -l /dev/disk/by-parttypeuuid/
total 0
lrwxrwxrwx. 1 root root 10 Jan 14 16:26 45b0969e-9b03-4f30-b4c6-b4b80ceff106.b0cd1b9a-d345-4255-a90d-aa105acf8d2f -> ../../vdb2
lrwxrwxrwx. 1 root root 10 Jan 14 16:26 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.95d49259-f0e8-4825-9ad6-d4c704d2801a -> ../../vdb1
~~~

Oh, so it's actually /dev/vdb1, which I already mounted successfully.

But just to be sure:

~~~
# umount /mnt/foo/
# mount /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.95d49259-f0e8-4825-9ad6-d4c704d2801a /mnt/foo/
# findmnt /mnt/foo
TARGET   SOURCE    FSTYPE OPTIONS
/mnt/foo /dev/vdb1 xfs    rw,relatime,seclabel,attr2,inode64,noquota
~~~

It just works as expected.

To sum this up: it seems that the device file didn't exist yet when
ceph-disk activate-all tried to mount it. The salt deployment code needs to
be fixed to either wait for the device to appear, make sure it appears
sooner, or use a different way to work with the device.
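
The first option could look roughly like this (just a sketch, not the actual
salt state; the UUID prefix is the ceph data partition type GUID seen in the
log above):

~~~
# let udev finish processing the new partitions and re-read the partition table
udevadm settle --timeout=30
partprobe /dev/vdb

# wait (up to 30 s) for the by-parttypeuuid symlink of the ceph data partition
for i in $(seq 1 30); do
    ls /dev/disk/by-parttypeuuid/ 2>/dev/null \
        | grep -q '^4fbd7e29-9d25-41b8-afd0-062c0ceff05d' && break
    sleep 1
done

ceph-disk activate-all
~~~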

Comment 6 Martin Bukatovic 2016-01-14 18:16:38 UTC
I'm setting the severity to high, as this bug causes cluster setup to fail in a non-deterministic way.

Comment 7 Kyle Bader 2016-01-29 19:13:54 UTC
Cleaning shouldn't be done with dd; it's much better to use ceph-disk zap. I agree with Martin here: we need to wait for the device to appear, or make it appear sooner. I'm not sure of a way to do the latter, so the former is likely what needs to happen.
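
Something along these lines (just a sketch, assuming /dev/vdb is the disk being recycled):

~~~
# zap the GPT and MBR partition tables instead of dd'ing zeros over the disk
ceph-disk zap /dev/vdb
~~~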

Comment 11 Shubhendu Tripathi 2016-07-20 17:45:54 UTC
As correctly pointed out by Daniel, this code path is no longer applicable now that ceph-installer is used for provisioning the Ceph cluster.

Closing this now.

