Bug 1029211 - [Scale] Import 64 Node Cluster - One Host Fails To Successfully Come Up
Summary: [Scale] Import 64 Node Cluster - One Host Fails To Successfully Come Up
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Timothy Asir
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-11 22:06 UTC by Matt Mahoney
Modified: 2015-12-03 17:14 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-03 17:14:15 UTC
Embargoed:


Attachments (Terms of Use)
Import - One Host Fails To Successfully Come Up (1.49 MB, application/x-gzip)
2013-11-11 22:09 UTC, Matt Mahoney

Description Matt Mahoney 2013-11-11 22:06:44 UTC
Description of problem:
Importing the following cluster fails to bring up one of the hosts:

 64 Hosts (all peer probed)
 8 Volumes with 64 bricks each (each volume spanning 8 hosts, with 8 bricks on each host)

One of the hosts failed to come Up.

- Host list shows the host is in "Unassigned" State.
- Message log contains: 
    "State was set to Up for host <host ID>"
    "Could not find gluster uuid of server 10.16.159.192 on Cluster <clusterName>


Version-Release number of selected component (if applicable):
Big Bend

How reproducible:


Steps to Reproduce:
1. Create 64 node Gluster Cluster
2. Create 8 volumes with 64 bricks each (8 hosts per volume, 8 bricks per host; see the scripted sketch after these steps)
3. Import the cluster
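
A minimal sketch of steps 1 and 2 from one of the nodes, using the gluster CLI; the host names, brick paths, and volume names below are placeholders, not the ones used in this report:

    # reproduce_setup.py - hypothetical script approximating steps 1 and 2.
    # All host names, brick paths, and volume names are placeholders.
    import subprocess

    HOSTS = ["node%02d.example.com" % i for i in range(1, 65)]  # 64 nodes

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # Step 1: peer probe the other 63 nodes from the first one.
    for host in HOSTS[1:]:
        run(["gluster", "peer", "probe", host])

    # Step 2: 8 volumes, each spanning a distinct group of 8 hosts,
    # with 8 bricks per host (64 bricks per volume).
    for v in range(8):
        group = HOSTS[v * 8:(v + 1) * 8]
        bricks = ["%s:/bricks/vol%d/brick%d" % (h, v, b)
                  for h in group for b in range(8)]
        run(["gluster", "volume", "create", "vol%d" % v] + bricks)
        run(["gluster", "volume", "start", "vol%d" % v])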

Actual results:


Expected results:
All hosts should successfully come up.

Additional info:

Comment 1 Matt Mahoney 2013-11-11 22:09:11 UTC
Created attachment 822636 [details]
Import - One Host Fails To Successfully Come Up

Comment 2 Matt Mahoney 2013-11-11 22:10:41 UTC
Note: the same host that failed to come Up had previously been successfully added to a cluster.

Comment 4 Dusmant 2013-11-12 13:29:48 UTC
Matt, please mention which node/server did not come up in the actual results, and also attach all the logs for debugging purposes.

Comment 5 Shubhendu Tripathi 2013-11-26 09:24:53 UTC
Matt,

After multiple rounds of probing/detaching/volume creation, I was able to see this issue only once, with a 64-node cluster and a total of 8 volumes with 512 bricks in all.
One host went to the Unassigned state after the import.

But on the very next sync-up of the hosts, the unassigned host also came up.

On analysis I found GlusterHostUUIDNotFoundException entries in vdsm.log, which means fetching the UUID details for the node failed once; at the next sync-up the same command succeeded, so the host came up.
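
The behaviour described here (one failed UUID fetch that succeeds on the next periodic sync) can be illustrated with a simple retry loop. This is a minimal sketch only; fetch_host_uuid() and the 5-minute interval are assumptions standing in for the engine's actual sync job and vdsm call:

    # sync_retry_sketch.py - illustrative only; fetch_host_uuid() is hypothetical.
    import time

    SYNC_INTERVAL = 5 * 60  # assumed periodic sync interval of 5 minutes

    class GlusterHostUUIDNotFound(Exception):
        pass

    def fetch_host_uuid(host, _state={"calls": 0}):
        # Simulated transient failure: the first call fails, later calls succeed.
        # In the real flow this would be the engine asking vdsm for the UUID.
        _state["calls"] += 1
        if _state["calls"] == 1:
            raise GlusterHostUUIDNotFound(host)
        return "00000000-0000-0000-0000-000000000000"

    def wait_for_uuid(host, attempts=3, interval=SYNC_INTERVAL):
        # A transient failure on one sync leaves the host Unassigned;
        # success on a later sync moves it back to Up.
        for attempt in range(attempts):
            try:
                return fetch_host_uuid(host)
            except GlusterHostUUIDNotFound:
                if attempt == attempts - 1:
                    raise
                time.sleep(interval)

    if __name__ == "__main__":
        # Use a short interval here just to demonstrate the retry behaviour.
        print(wait_for_uuid("10.16.159.192", interval=1))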

Please check whether the same was the case in your scenario as well, or whether the host remains in the Unassigned state forever.

Comment 7 Dusmant 2013-12-10 16:36:47 UTC
Let's try to test this using Corbett build CB10; if we can reproduce it even with the 5-minute resync, then we need to take a look at it.

Comment 8 Dusmant 2013-12-10 16:38:35 UTC
We might document this bug in the release notes.

Comment 9 Matt Mahoney 2013-12-10 16:39:45 UTC
Will retest in Corbett.

Comment 13 Sahina Bose 2013-12-19 05:43:33 UTC
On the nodes that failed to install, vdsm failed to start with:

2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:383 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=1
2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:441 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
Stopping ksmtuned: [FAILED]
vdsm: Stop conflicting ksmtuned[FAILED]
vdsm start[FAILED]
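
The failure above shows vdsmd refusing to start because it could not stop the conflicting ksmtuned service. As a quick manual check on an affected node, the state of both services can be inspected with the same /sbin/service commands seen in the otopi log; the wrapper script below is a hypothetical sketch, not part of the installer:

    # check_services.py - hypothetical diagnostic wrapper around /sbin/service.
    import subprocess

    def service_running(name):
        # "service <name> status" exits 0 when the service is running.
        return subprocess.call(["/sbin/service", name, "status"]) == 0

    if __name__ == "__main__":
        for svc in ("ksmtuned", "vdsmd"):
            state = "running" if service_running(svc) else "not running"
            print("%s: %s" % (svc, state))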


Tim, could you look into this?

Comment 18 Vivek Agarwal 2015-12-03 17:14:15 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

