Bug 1029211 - [Scale] Import 64 Node Cluster - One Host Fails To Successfully Come Up
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Hardware: Unspecified  OS: Unspecified
Priority: high  Severity: high
Assigned To: Timothy Asir
Depends On:
Reported: 2013-11-11 17:06 EST by Matt Mahoney
Modified: 2015-12-03 12:14 EST (History)
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-12-03 12:14:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
Import - One Host Fails To Successfully Come Up (1.49 MB, application/x-gzip)
2013-11-11 17:09 EST, Matt Mahoney

Description Matt Mahoney 2013-11-11 17:06:44 EST
Description of problem:
Importing the following cluster fails to bring up one of the hosts:

 64 Hosts (all peer probed)
 8 Volumes with 64 bricks each (each volume spanning 8 hosts, with 8 bricks on each host)

One of the hosts failed to come Up.

- Host list shows the host is in "Unassigned" State.
- Message log contains: 
    "State was set to Up for host <host ID>"
    "Could not find gluster uuid of server on Cluster <clusterName>"

Version-Release number of selected component (if applicable):
Big Bend

How reproducible:

Steps to Reproduce:
1. Create 64 node Gluster Cluster
2. Create 8 volumes with 64 bricks each (8 hosts with 8 bricks per host)
3. Import the cluster
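
For reference, steps 1 and 2 correspond roughly to the commands sketched below as a dry run (the hostnames host01..host64 and the brick paths are hypothetical placeholders, not taken from the actual setup; substitute the real ones for your environment):

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps above. Prints the gluster
# commands instead of running them; pipe to `sh` on the first node to
# actually execute (hostnames/brick paths are placeholders).

# Step 1: peer probe the other 63 nodes from the first node.
i=2
while [ "$i" -le 64 ]; do
  printf 'gluster peer probe host%02d\n' "$i"
  i=$((i + 1))
done

# Step 2: 8 volumes, each spanning 8 hosts with 8 bricks per host
# (64 bricks per volume, 512 bricks in all).
v=1
while [ "$v" -le 8 ]; do
  first=$(( (v - 1) * 8 + 1 ))   # first host index for this volume
  bricks=""
  h="$first"
  while [ "$h" -le $((first + 7)) ]; do
    b=1
    while [ "$b" -le 8 ]; do
      bricks="$bricks $(printf 'host%02d:/bricks/vol%d/b%d' "$h" "$v" "$b")"
      b=$((b + 1))
    done
    h=$((h + 1))
  done
  printf 'gluster volume create vol%d%s\n' "$v" "$bricks"
  v=$((v + 1))
done
```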

Actual results:

Expected results:
All hosts should successfully come up.

Additional info:
Comment 1 Matt Mahoney 2013-11-11 17:09:11 EST
Created attachment 822636 [details]
Import - One Host Fails To Successfully Come Up
Comment 2 Matt Mahoney 2013-11-11 17:10:41 EST
Note: the same host that failed to come Up had previously been successfully added to a cluster.
Comment 4 Dusmant 2013-11-12 08:29:48 EST
Matt, please mention which node/server did not come up in the actual results, and also attach all the logs for debugging purposes.
Comment 5 Shubhendu Tripathi 2013-11-26 04:24:53 EST

After multiple rounds of probing/detaching/volume creation, I was able to see this issue only once, with a 64-node cluster and a total of 8 volumes with 512 bricks in all.
One host went to the Unassigned state after the import.

But on the very next sync-up of the hosts, the one unassigned host also came up.

After analysis I find a GlusterHostUUIDNotFoundException in vdsm.log, which means that fetching the UUID details for the node failed once; at the next sync-up the same command succeeded, and so the host came up.

Please check whether the same was the case in your scenario as well, or whether the host remains in the Unassigned state forever.
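
For reference, the UUID the engine fails to resolve is the one glusterd records in /var/lib/glusterd/glusterd.info on the host (the same value `gluster system:: uuid get` reports). A quick way to pull it out, sketched here against a sample file since reading the real one needs root on the affected node:

```shell
#!/bin/sh
# Sketch: extract the gluster host UUID the way one would check it
# manually on the affected node. A sample file stands in here for
# /var/lib/glusterd/glusterd.info; the UUID value is made up.

info=$(mktemp)
cat > "$info" <<'EOF'
UUID=3a5f1c2e-0b7d-4e9a-9c1f-2d8e6b4a7c10
operating-version=2
EOF

# Print lines starting with "UUID=" with the prefix stripped.
uuid=$(sed -n 's/^UUID=//p' "$info")
echo "gluster host UUID: $uuid"
rm -f "$info"
```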
Comment 7 Dusmant 2013-12-10 11:36:47 EST
Let's try to test this using Corbett build CB10; if we can reproduce it even with the 5-minute resync, then we need to take a look at it.
Comment 8 Dusmant 2013-12-10 11:38:35 EST
We might document this bug in the release notes.
Comment 9 Matt Mahoney 2013-12-10 11:39:45 EST
Will retest in Corbett.
Comment 13 Sahina Bose 2013-12-19 00:43:33 EST
On the nodes that failed to install, vdsm failed to start with:

2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:383 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=1
2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:441 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
Stopping ksmtuned: [FAILED]
vdsm: Stop conflicting ksmtuned[FAILED]
vdsm start[FAILED]
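
The log above shows vdsm's init script failing because it could not stop the conflicting ksmtuned service. A manual recovery sequence worth trying on the failed node might be the one sketched below (a guess from the log, not a confirmed fix; RHEL 6 style `service` commands, printed as a dry run so nothing is executed here; pipe the output to `sh` as root to actually run it):

```shell
#!/bin/sh
# Dry-run sketch of manually clearing the ksmtuned conflict before
# retrying the vdsm start that failed. The command list is printed
# rather than executed.
recovery='service ksmtuned stop
service ksm stop
service vdsmd start
service vdsmd status'
printf '%s\n' "$recovery"
```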

Tim, could you look into this?
Comment 18 Vivek Agarwal 2015-12-03 12:14:15 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
