Bug 1312099 - All OSDs fail to set up/start during Create Cluster task
Summary: All OSDs fail to set up/start during Create Cluster task
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: core
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 2
Assignee: Nishanth Thomas
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-25 17:56 UTC by Martin Bukatovic
Modified: 2018-11-19 05:30 UTC
CC List: 3 users

Fixed In Version: rhscon-ceph-0.0.23-1.el7scon.x86_64, rhscon-core-0.0.24-1.el7scon.x86_64, rhscon-ui-0.0.39-1.el7scon.noarch
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 05:30:51 UTC
Target Upstream Version:



Description Martin Bukatovic 2016-02-25 17:56:46 UTC
Description of problem
======================

The Create Cluster task fails. This seems to be caused by an incorrect
configuration which makes all OSDs fail.

Version-Release number of selected component
============================================

USM machine:

rhscon-ceph-0.0.6-8.el7.x86_64
rhscon-core-0.0.8-7.el7.x86_64
rhscon-ui-0.0.16-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-2015.5.5-1.el7.noarch
ceph-0.94.3-6.el7cp.x86_64
ceph-common-0.94.3-6.el7cp.x86_64

ceph node (running OSD):

rhscon-agent-0.0.3-2.el7.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
ceph-0.94.3-6.el7cp.x86_64
ceph-common-0.94.3-6.el7cp.x86_64
ceph-osd-0.94.3-6.el7cp.x86_64

How reproducible
================

Unknown (when I tried to deploy a new cluster to reproduce this issue, I ran
into a different issue).

Steps to Reproduce
==================

1. Install skyring on a server and prepare a few hosts for the cluster setup
2. Accept all nodes
3. Start the "Create Cluster" wizard and create a cluster (using the default configuration)

Actual results
==============

The Create Cluster task fails.

The USM web UI reports the following in the "Sub Tasks" table of the task details page:

~~~
1 	Started ceph provider task for cluster creation: 2e730373-017c-4ce0-b1b1-23d44674f2e8 	Feb 25 2016, 11:01:02 AM
2 	Persisting cluster details 	Feb 25 2016, 11:01:02 AM
3 	Adding mons 	Feb 25 2016, 11:01:14 AM
4 	Updating node details for cluster 	Feb 25 2016, 11:01:14 AM
5 	Starting and creating mons 	Feb 25 2016, 11:01:14 AM
6 	Getting updated nodes list for OSD creation 	Feb 25 2016, 11:01:28 AM
7 	Adding OSDs 	Feb 25 2016, 11:01:28 AM
8 	Failed. error: <nil> 	Feb 25 2016, 11:03:26 AM
~~~

Selected relevant events from the `/var/log/skyring/skyring.log` file:

~~~
2016-02-25T11:01:42+0000 INFO     saltwrapper.py:50 saltwrapper.wrapper] rv={'mbukatov-usm1-node1.example.com': {'grains_|-skyring_cluster_name_|-skyring_cluster_name_|-present': {'comment': 'Set grain skyring_cluster_name to mbukatov-usm1-cluster1', 'name': 'skyring_cluster_name', 'start_time': '11:01:29.544243', 'result': True, 'duration': 1251.506, '__run_num__': 0, 'changes': {'skyring_cluster_name': 'mbukatov-usm1-cluster1'}}, 'cmd_|-prepare_/dev/vdb_|-ceph-disk prepare --cluster mbukatov-usm1-cluster1 --cluster-uuid 54ae2c64-1273-43dd-a186-218ee967f187 --fs-type xfs --zap-disk /dev/vdb_|-run': {'comment': 'Command "ceph-disk prepare --cluster mbukatov-usm1-cluster1 --cluster-uuid 54ae2c64-1273-43dd-a186-218ee967f187 --fs-type xfs --zap-disk /dev/vdb" run', 'name': 'ceph-disk prepare --cluster mbukatov-usm1-cluster1 --cluster-uuid 54ae2c64-1273-43dd-a186-218ee967f187 --fs-type xfs --zap-disk /dev/vdb', 'start_time': '11:01:31.996207', 'result': True, 'duration': 10410.46, '__run_num__': 5, 'changes': {'pid': 31007, 'retcode': 0, 'stderr': 'partx: specified range <1:0> does not make sense\nlibust[31049/31049]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\npartx: /dev/vdb: error adding partition 2\npartx: /dev/vdb: error adding partitions 1-2\npartx: /dev/vdb: error adding partitions 1-2', 'stdout': 'Creating new GPT entries.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nCreating new GPT entries.\nThe operation has completed successfully.\nThe operation has completed successfully.\nThe operation has completed successfully.\nmeta-data=/dev/vdb1              isize=2048   agcount=4, agsize=2031551 blks\n         =                       sectsz=512   attr=2, projid32bit=1\n         =                       crc=0        finobt=0\ndata     =                       bsize=4096   blocks=8126203, imaxpct=25\n         =                       sunit=0      swidth=0 blks\nnaming   =version 2              bsize=4096   ascii-ci=0 ftype=0\nlog      =internal log           bsize=4096   blocks=3967, version=2\n         =                       sectsz=512   sunit=0 blks, lazy-count=1\nrealtime =none                   extsz=4096   blocks=0, rtextents=0\nWarning: The kernel is still using the old partition table.\nThe new table will be used at the next reboot.\nThe operation has completed successfully.'}}, 'file_|-/etc/ceph/mbukatov-usm1-cluster1.conf_|-/etc/ceph/mbukatov-usm1-cluster1.conf_|-managed': {'comment': 'File /etc/ceph/mbukatov-usm1-cluster1.conf updated', 'name': '/etc/ceph/mbukatov-usm1-cluster1.conf', 'start_time': '11:01:31.822357', 'result': True, 'duration': 95.023, '__run_num__': 2, 'changes': {'diff': 'New file', 'mode': '0644'}}, 'file_|-/var/lib/ceph/osd_|-/var/lib/ceph/osd_|-directory': {'comment': 'Directory /var/lib/ceph/osd is in the correct state', 'name': '/var/lib/ceph/osd', 'start_time': '11:01:31.987959', 'result': True, 'duration': 2.53, '__run_num__': 4, 'changes': {}}, 'file_|-/var/lib/ceph/bootstrap-osd/mbukatov-usm1-cluster1.keyring_|-/var/lib/ceph/bootstrap-osd/mbukatov-usm1-cluster1.keyring_|-managed': {'comment': 'File /var/lib/ceph/bootstrap-osd/mbukatov-usm1-cluster1.keyring updated', 'name': '/var/lib/ceph/bootstrap-osd/mbukatov-usm1-cluster1.keyring', 'start_time': '11:01:31.917833', 'result': True, 'duration': 69.584, '__run_num__': 3, 'changes': {'diff': 'New file', 'mode': '0644'}}, 
'grains_|-skyring_node_type_|-skyring_node_type_|-present': {'comment': 'Set grain skyring_node_type to osd', 'name': 'skyring_node_type', 'start_time': '11:01:30.796286', 'result': True, 'duration': 1009.128, '__run_num__': 1, 'changes': {'skyring_node_type': 'osd'}}}}
2016-02-25T11:01:42+0000 INFO     saltwrapper.py:48 saltwrapper.wrapper] args=(<salt.client.LocalClient object at 0x358a550>, 'mbukatov-usm1-node1.example.com', 'cmd.run_all', ['ls -l /dev/disk/by-parttypeuuid']), kwargs={}
2016-02-25T11:01:42+0000 INFO     saltwrapper.py:50 saltwrapper.wrapper] rv={'mbukatov-usm1-node1.example.com': {'pid': 31197, 'retcode': 0, 'stderr': '', 'stdout': 'total 0\nlrwxrwxrwx. 1 root root 10 Feb 25 11:01 45b0969e-9b03-4f30-b4c6-b4b80ceff106.6a2aefd7-149e-4562-a876-ba3cf6e41898 -> ../../vdb2\nlrwxrwxrwx. 1 root root 10 Feb 25 11:01 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.176e7593-dd85-47ab-9a01-b0fc7e5c107d -> ../../vdb1'}}
~~~
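For reference, the two `by-parttypeuuid` symlinks above match Ceph's well-known
GPT partition type GUIDs: `4fbd7e29-9d25-41b8-afd0-062c0ceff05d` marks an OSD
data partition and `45b0969e-9b03-4f30-b4c6-b4b80ceff106` a journal partition,
so `ceph-disk prepare` itself succeeded here. A quick way to double-check this
on the node (a sketch assuming `gdisk` is installed and `/dev/vdb` is the
prepared disk):

~~~
# List the prepared partitions together with their GPT type GUIDs.
ls -l /dev/disk/by-parttypeuuid

# Inspect the type GUID of each partition directly; partition 1 should be
# OSD data (4FBD7E29-...) and partition 2 the journal (45B0969E-...).
sgdisk --info=1 /dev/vdb
sgdisk --info=2 /dev/vdb
~~~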

~~~
2016-02-25T11:01:57+0000 INFO     saltwrapper.py:48 saltwrapper.wrapper] args=(<salt.client.LocalClient object at 0x358a550>, {'mbukatov-usm1-node1.example.com': {'public_ip': '172.16.180.10', 'cluster_ip': '172.16.180.10', 'devices': {'/dev/vdb': 'xfs'}}}, 'cmd.run_all', ['ceph-disk activate-all']), kwargs={'expr_form': 'list'}
2016-02-25T11:01:58+0000 INFO     saltwrapper.py:50 saltwrapper.wrapper] rv={'mbukatov-usm1-node1.example.com': {'pid': 31406, 'retcode': 1, 'stderr': "2016-02-25 11:01:57.971441 7f6f108c5780 -1 did not load config file, using default settings.\n2016-02-25 11:01:58.007591 7fe2008fa780 -1 did not load config file, using default settings.\nlibust[31417/31417]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\nceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'mbukatov-usm1-cluster1', 'start', 'osd.0']' returned non-zero exit status 1\nceph-disk: Error: One or more partitions failed to activate", 'stdout': '/etc/init.d/ceph: osd.0 not found (/etc/ceph/mbukatov-usm1-cluster1.conf defines osd.usm1-cluster1-0 mon.a, /var/lib/ceph defines osd.usm1-cluster1-0)'}}
2016-02-25T11:01:58+0000 INFO     saltwrapper.py:48 saltwrapper.wrapper] args=(<salt.client.LocalClient object at 0x358a550>, {'mbukatov-usm1-node1.example.com': {'public_ip': '172.16.180.10', 'cluster_ip': '172.16.180.10', 'devices': {'/dev/vdb': 'xfs'}}}, 'state.single', ['file.managed', '/etc/ceph/mbukatov-usm1-cluster1.conf', 'source=salt://skyring/conf/ceph/mbukatov-usm1-cluster1/mbukatov-usm1-cluster1.conf', 'show_diff=False']), kwargs={'expr_form': 'list'}
2016-02-25T11:01:59+0000 INFO     saltwrapper.py:50 saltwrapper.wrapper] rv={'mbukatov-usm1-node1.example.com': {'file_|-/etc/ceph/mbukatov-usm1-cluster1.conf_|-/etc/ceph/mbukatov-usm1-cluster1.conf_|-managed': {'comment': 'File /etc/ceph/mbukatov-usm1-cluster1.conf updated', 'name': '/etc/ceph/mbukatov-usm1-cluster1.conf', 'start_time': '11:01:59.091946', 'result': True, 'duration': 81.412, '__run_num__': 0, 'changes': {'diff': '<show_diff=False>'}}}}
2016-02-25T11:01:59+0000 ERROR    saltwrapper.py:498 saltwrapper.AddOSD] admin:4b937f55-f6b0-4962-b0be-c07a119cd6fc-add_osd failed. error={'mbukatov-usm1-node1.example.com': {'pid': 31406, 'retcode': 1, 'stderr': "2016-02-25 11:01:57.971441 7f6f108c5780 -1 did not load config file, using default settings.\n2016-02-25 11:01:58.007591 7fe2008fa780 -1 did not load config file, using default settings.\nlibust[31417/31417]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)\nceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'mbukatov-usm1-cluster1', 'start', 'osd.0']' returned non-zero exit status 1\nceph-disk: Error: One or more partitions failed to activate", 'stdout': '/etc/init.d/ceph: osd.0 not found (/etc/ceph/mbukatov-usm1-cluster1.conf defines osd.usm1-cluster1-0 mon.a, /var/lib/ceph defines osd.usm1-cluster1-0)'}}
2016-02-25T11:01:59+0000 INFO     saltwrapper.py:48 saltwrapper.wrapper] args=(<salt.client.LocalClient object at 0x358a550>, {'mbukatov-usm1-node2.example.com': {'public_ip': '172.16.180.78', 'cluster_ip': '172.16.180.78', 'devices': {'/dev/vdb': 'xfs'}}}, ['grains.item', 'network.subnets'], [['ipv4', 'ipv6'], []]), kwargs={'expr_form': 'list'}
2016-02-25T11:01:59+0000 INFO     saltwrapper.py:50 saltwrapper.wrapper] rv={'mbukatov-usm1-node2.example.com': {'grains.item': {'ipv4': ['127.0.0.1', '172.16.180.78'], 'ipv6': ['::1', 'fe80::f816:3eff:fe4f:29f0']}, 'network.subnets': ['172.16.180.0/24']}}
2016-02-25T11:01:59+0000 INFO     saltwrapper.py:48 saltwrapper.wrapper] args=(<salt.client.LocalClient object at 0x358a550>, {'mbukatov-usm1-node2.example.com': {'public_ip': '172.16.180.78', 'cluster_ip': '172.16.180.78', 'devices': {'/dev/vdb': 'xfs'}}}, 'state.sls', ['prepare_ceph_osd']), kwargs={'expr_form': 'list', 'kwarg': {'pillar': {'skyring': {'mbukatov-usm1-node2.example.com': {'cluster_name': 'mbukatov-usm1-cluster1', 'cluster_id': '54ae2c64-1273-43dd-a186-218ee967f187', 'devices': {'/dev/vdb': 'xfs'}}}}}}
~~~
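The interesting part is the name mismatch: activation looks for `osd.0`, while
both the conf and `/var/lib/ceph` apparently define `osd.usm1-cluster1-0`,
i.e. the OSD id got the cluster name (minus its first hyphen-separated
component) glued in front of it. This would be consistent with the
`<cluster>-<id>` naming convention being split on the first hyphen somewhere,
which breaks for cluster names that themselves contain hyphens. A minimal
sketch of that failure mode (an assumption about the parsing, not verified
against the init script source):

~~~
# The exact command that ceph-disk runs and that fails, per the stderr above:
/usr/sbin/service ceph --cluster mbukatov-usm1-cluster1 start osd.0

# OSD data directories are named /var/lib/ceph/osd/<cluster>-<id>; splitting
# that name on the FIRST hyphen misparses hyphenated cluster names:
p="mbukatov-usm1-cluster1-0"
echo "cluster=${p%%-*} id=${p#*-}"
# -> cluster=mbukatov id=usm1-cluster1-0, i.e. "osd.usm1-cluster1-0",
#    instead of cluster=mbukatov-usm1-cluster1 id=0, i.e. "osd.0"
~~~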

Similar blocks of log entries can be found for all other OSD machines in the
cluster (node2, node3, node4).

When OSD setup fails on all nodes, the task fails:

~~~
2016-02-25T11:03:28.359+01:00 ERROR    cluster.go:227 func·001] admin:4b937f55-f6b0-4962-b0be-c07a119cd6fc- Failed to create the cluster mbukatov-usm1-cluster1
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:75 ReleaseLock] Currently Locked:%!(EXTRA map[uuid.UUID]*lock.LockInternal=map[251ae1f8-c9bd-43bf-a78c-10aa5afe7ec9:0xc20b4527c0 de342cd8-2535-4eb2-8583-ce03535c3fe7:0xc20b4529e0 462b1a36-c987-4a2b-806d-2b199e632630:0xc20b452c00 870d6ca1-06ff-45a8-a9e5-aaa8f06bcbb0:0xc20b452e20 fdff7bc2-1c27-4454-b775-f7ec48a52edb:0xc20b453040])
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:76 ReleaseLock] Releasing the locks for:%!(EXTRA map[uuid.UUID]string=map[251ae1f8-c9bd-43bf-a78c-10aa5afe7ec9:POST_Clusters : mbukatov-usm1-node3.example.com de342cd8-2535-4eb2-8583-ce03535c3fe7:POST_Clusters : mbukatov-usm1-node4.example.com 462b1a36-c987-4a2b-806d-2b199e632630:POST_Clusters : mbukatov-usm1-mon1.example.com 870d6ca1-06ff-45a8-a9e5-aaa8f06bcbb0:POST_Clusters : mbukatov-usm1-node1.example.com fdff7bc2-1c27-4454-b775-f7ec48a52edb:POST_Clusters : mbukatov-usm1-node2.example.com])
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=fdff7bc2-1c27-4454-b775-f7ec48a52edb)
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=251ae1f8-c9bd-43bf-a78c-10aa5afe7ec9)
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=de342cd8-2535-4eb2-8583-ce03535c3fe7)
2016-02-25T11:03:28.36+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=462b1a36-c987-4a2b-806d-2b199e632630)
2016-02-25T11:03:28.361+01:00 DEBUG    lockmanager.go:83 ReleaseLock] Lock Released: %!(EXTRA uuid.UUID=870d6ca1-06ff-45a8-a9e5-aaa8f06bcbb0)
2016-02-25T11:03:28.361+01:00 DEBUG    lockmanager.go:86 ReleaseLock] Currently Locked:%!(EXTRA map[uuid.UUID]*lock.LockInternal=map[])
~~~

From the file `/var/log/salt/minion` on node4:

~~~
2016-02-25 11:03:25,500 [salt.minion      ][INFO    ][3789] User root Executing command cmd.run_all with jid 20160225110325492251
2016-02-25 11:03:25,503 [salt.minion      ][DEBUG   ][3789] Command details {'tgt_type': 'list', 'jid': '20160225110325492251', 'tgt': {'mbukatov-usm1-node4.example.com': {'public_ip': '172.16.180.88', 'cluster_ip': '172.16.180.88', 'devices': {'/dev/vdb': 'xfs'}}}, 'ret': '', 'user': 'root', 'arg': ['ceph-disk activate-all'], 'fun': 'cmd.run_all'}
2016-02-25 11:03:25,521 [salt.minion      ][INFO    ][31389] Starting a new job with PID 31389
2016-02-25 11:03:25,525 [salt.utils.lazy  ][DEBUG   ][31389] LazyLoaded cmd.run_all
2016-02-25 11:03:25,528 [salt.loaded.int.module.cmdmod][INFO    ][31389] Executing command 'ceph-disk activate-all' in directory '/root'
2016-02-25 11:03:25,836 [salt.loaded.int.module.cmdmod][ERROR   ][31389] Command 'ceph-disk activate-all' failed with return code: 1
2016-02-25 11:03:25,837 [salt.loaded.int.module.cmdmod][ERROR   ][31389] stdout: /etc/init.d/ceph: osd.3 not found (/etc/ceph/mbukatov-usm1-cluster1.conf defines osd.usm1-cluster1-3 mon.a, /var/lib/ceph defines osd.usm1-cluster1-3)
2016-02-25 11:03:25,837 [salt.loaded.int.module.cmdmod][ERROR   ][31389] stderr: 2016-02-25 11:03:25.653244 7fe2efe24780 -1 did not load config file, using default settings.
2016-02-25 11:03:25.665761 7f1ef057d780 -1 did not load config file, using default settings.
libust[31401/31401]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'mbukatov-usm1-cluster1', 'start', 'osd.3']' returned non-zero exit status 1
ceph-disk: Error: One or more partitions failed to activate
2016-02-25 11:03:25,838 [salt.loaded.int.module.cmdmod][ERROR   ][31389] retcode: 1
2016-02-25 11:03:25,839 [salt.minion      ][INFO    ][31389] Returning information for job: 20160225110325492251
~~~
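Since node4 hits the same activation error for `osd.3`, the on-disk layout
there can be cross-checked against the names in the error message. A
hypothetical check (paths follow ceph-disk's standard `<cluster>-<id>` naming
convention):

~~~
# The OSD data directory created during activation should be <cluster>-<id>:
ls /var/lib/ceph/osd/
# expected: mbukatov-usm1-cluster1-3

# The id the OSD itself recorded during mkfs:
cat /var/lib/ceph/osd/mbukatov-usm1-cluster1-3/whoami
# expected: 3
~~~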

From the file `mbukatov-usm1-cluster1-osd.3.log` on node4 (note that the OSD
mkfs itself completes: the journal fsid warning below is expected for a
freshly created journal, and the object store and keyring are created
successfully, so the failure happens later, at the init-script activation
step):

~~~
2016-02-25 11:03:10.843075 7fb5420db840  0 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 31287
2016-02-25 11:03:10.880282 7fb5420db840  1 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) mkfs in /var/lib/ceph/tmp/mnt.PM3KFe
2016-02-25 11:03:10.880340 7fb5420db840  1 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) mkfs fsid is already set to d7bb7bd5-8963-4fe3-a9af-6fe87866dcf0
2016-02-25 11:03:10.888172 7fb5420db840  0 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) backend xfs (magic 0x58465342)
2016-02-25 11:03:10.912339 7fb5420db840  1 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) leveldb db exists/created
2016-02-25 11:03:10.925635 7fb5420db840  1 journal _open /var/lib/ceph/tmp/mnt.PM3KFe/journal fd 12: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-02-25 11:03:10.926452 7fb5420db840 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected d7bb7bd5-8963-4fe3-a9af-6fe87866dcf0, invalid (someone else's?) journal
2016-02-25 11:03:10.932141 7fb5420db840  1 journal _open /var/lib/ceph/tmp/mnt.PM3KFe/journal fd 12: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-02-25 11:03:10.938211 7fb5420db840  0 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) mkjournal created journal on /var/lib/ceph/tmp/mnt.PM3KFe/journal
2016-02-25 11:03:10.938306 7fb5420db840  1 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) mkfs done in /var/lib/ceph/tmp/mnt.PM3KFe
2016-02-25 11:03:10.938552 7fb5420db840  0 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) backend xfs (magic 0x58465342)
2016-02-25 11:03:10.964306 7fb5420db840  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.PM3KFe) detect_features: FIEMAP ioctl is supported and appears to work
2016-02-25 11:03:10.964390 7fb5420db840  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.PM3KFe) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-02-25 11:03:10.972778 7fb5420db840  0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.PM3KFe) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-02-25 11:03:10.973021 7fb5420db840  0 xfsfilestorebackend(/var/lib/ceph/tmp/mnt.PM3KFe) detect_feature: extsize is disabled by conf
2016-02-25 11:03:10.983448 7fb5420db840  0 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-02-25 11:03:10.993102 7fb5420db840  1 journal _open /var/lib/ceph/tmp/mnt.PM3KFe/journal fd 18: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-02-25 11:03:11.003260 7fb5420db840  1 journal _open /var/lib/ceph/tmp/mnt.PM3KFe/journal fd 18: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-02-25 11:03:11.005829 7fb5420db840 -1 filestore(/var/lib/ceph/tmp/mnt.PM3KFe) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2016-02-25 11:03:11.035974 7fb5420db840  1 journal close /var/lib/ceph/tmp/mnt.PM3KFe/journal
2016-02-25 11:03:11.038748 7fb5420db840 -1 created object store /var/lib/ceph/tmp/mnt.PM3KFe journal /var/lib/ceph/tmp/mnt.PM3KFe/journal for osd.3 fsid 54ae2c64-1273-43dd-a186-218ee967f187
2016-02-25 11:03:11.038892 7fb5420db840 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.PM3KFe/keyring: can't open /var/lib/ceph/tmp/mnt.PM3KFe/keyring: (2) No such file or directory
2016-02-25 11:03:11.040799 7fb5420db840 -1 created new key in keyring /var/lib/ceph/tmp/mnt.PM3KFe/keyring
~~~

The file `/etc/ceph/mbukatov-usm1-cluster1.conf` from node4 (captured after
the task had already failed, so it may differ from the copy present during
OSD activation):

~~~
[global]
fsid = 54ae2c64-1273-43dd-a186-218ee967f187
public network = 172.16.180.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 128
osd pool default pgp num = 128
osd crush chooseleaf type = 1
cluster network = 172.16.180.0/24

[mon]
mon initial members = a

[mon.a]
host = mbukatov-usm1-mon1
mon addr = 172.16.180.87:6789
~~~
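Note that the captured conf contains no `[osd.*]` sections at all, yet the
activation error claims the conf "defines osd.usm1-cluster1-0". Going only by
that message, the section consulted at activation time would have looked
roughly like the first form below, while the sysvinit script was searching
for the second (a reconstruction for illustration, not a captured file):

~~~
# What the error message implies was present at activation time:
[osd.usm1-cluster1-0]
host = mbukatov-usm1-node1

# What `service ceph --cluster ... start osd.0` was looking for:
[osd.0]
host = mbukatov-usm1-node1
~~~

The skyring log above also shows the conf being re-pushed via `state.single
file.managed` right after the AddOSD failure, so the captured copy may simply
postdate whatever sections caused the error.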

Expected results
================

The Create Cluster task doesn't fail.

Comment 5 Nishanth Thomas 2016-03-10 10:03:57 UTC
Is this reproducible?

Comment 6 Martin Bukatovic 2016-03-18 19:03:02 UTC
(In reply to Nishanth Thomas from comment #5)
> Is this reproducible?

Not sure about the current state of this BZ wrt the latest builds, as the
installation and setup were changed. Let's see if I notice this again in the
next few weeks.

That said, have you made any changes to the way the configuration is
generated? It seems that the generated config files were invalid (line from
the BZ description):

~~~
2016-02-25 11:03:25,837 [salt.loaded.int.module.cmdmod][ERROR   ][31389] stdout: /etc/init.d/ceph: osd.3 not found (/etc/ceph/mbukatov-usm1-cluster1.conf defines osd.usm1-cluster1-3 mon.a, /var/lib/ceph defines osd.usm1-cluster1-3)
~~~

Comment 8 Martin Bukatovic 2016-06-01 08:30:11 UTC
I wasn't able to reproduce the issue with the new builds, likely because of
the redesign related to the ceph-installer integration.

Comment 9 Martin Kudlej 2016-08-08 13:25:06 UTC
I haven't seen this issue for a long time, and it is a bug reported before the ceph-installer integration. --> VERIFIED

