Red Hat Bugzilla – Bug 1029211
[Scale] Import 64 Node Cluster - One Host Fails To Successfully Come Up
Last modified: 2015-12-03 12:14:15 EST
Description of problem:
Importing the following cluster fails to import one of the hosts successfully:
64 hosts (all peer-probed)
8 volumes with 64 bricks each (each volume spans 8 hosts, with 8 bricks on each host)
One of the hosts failed to come Up.
- Host list shows the host is in the "Unassigned" state.
- Message log contains:
"State was set to Up for host <host ID>"
"Could not find gluster uuid of server 10.16.159.192 on Cluster <clusterName>"
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create 64 node Gluster Cluster
2. Create 8 volumes with 64 bricks per volume (8 hosts with 8 bricks each)
3. Import the cluster
Expected results: All hosts should come up successfully.
Created attachment 822636
Import - One Host Fails To Successfully Come Up
Note: the same host that failed to come Up had previously been successfully added to a cluster.
Matt, please state in the actual results which node/server did not come up, and attach all the logs for debugging.
After multiple rounds of probing, detaching, and volume creation, I was able to reproduce this issue only once, with a 64-node cluster and 8 volumes containing 512 bricks in total.
One host went into the Unassigned state after the import.
However, at the very next sync-up of the hosts, that unassigned host also came up.
After analysis, I found GlusterHostUUIDNotFoundException entries in vdsm.log. This means fetching the UUID details for the node failed once, but at the next sync-up the same command succeeded, so the host came up.
Can you check whether the same thing happened in your scenario, or whether the host remains in the Unassigned state indefinitely?
Let's test this with Corbett build CB10. If we can still reproduce it with the 5-minute resync, we need to look into it.
We might document this bug in the release notes.
Will retest in Corbett.
On the nodes that failed to install, vdsm failed to start with:
2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:383 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=1
2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:441 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
Stopping ksmtuned: [FAILED]
vdsm: Stop conflicting ksmtuned[FAILED]
Tim, could you look into this?
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/
If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.