Bug 1029211

Summary: [Scale] Import 64 Node Cluster - One Host Fails To Successfully Come Up
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: high
Priority: high
Reporter: Matt Mahoney <mmahoney>
Assignee: Timothy Asir <tjeyasin>
QA Contact: storage-qa-internal <storage-qa-internal>
Docs Contact:
CC: dpati, mmahoney, rhs-bugs, sabose, vagarwal
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 17:14:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: Import - One Host Fails To Successfully Come Up

Description Matt Mahoney 2013-11-11 22:06:44 UTC
Description of problem:
Importing the following cluster fails to bring up one of the hosts:

 64 Hosts (all peer probed)
 8 Volumes with 64 bricks each (each volume spanning 8 hosts, with 8 bricks on each host)

One of the hosts failed to come Up.

- Host list shows the host is in "Unassigned" State.
- Message log contains: 
    "State was set to Up for host <host ID>"
    "Could not find gluster uuid of server 10.16.159.192 on Cluster <clusterName>"


Version-Release number of selected component (if applicable):
Big Bend

How reproducible:


Steps to Reproduce:
1. Create 64 node Gluster Cluster
2. Create 8 Volumes with 64 Bricks each (each volume spanning 8 hosts with 8 bricks per host)
3. Import the cluster

Actual results:


Expected results:
All hosts should successfully come up.

Additional info:

Comment 1 Matt Mahoney 2013-11-11 22:09:11 UTC
Created attachment 822636 [details]
Import - One Host Fails To Successfully Come Up

Comment 2 Matt Mahoney 2013-11-11 22:10:41 UTC
Note: the same host that failed to come Up had previously been successfully added to a cluster.

Comment 4 Dusmant 2013-11-12 13:29:48 UTC
Matt, please mention in the actual results which node/server did not come up, and attach all the logs for debugging purposes.

Comment 5 Shubhendu Tripathi 2013-11-26 09:24:53 UTC
Matt,

After multiple rounds of probing/detaching/volume creation, I was able to reproduce this issue only once, with a 64-node cluster and 8 volumes totaling 512 bricks.
One host went to the Unassigned state after the import.

BUT on the very next sync-up of the hosts, the unassigned host also came up.

On analysis I found GlusterHostUUIDNotFoundException entries in vdsm.log, which means fetching the UUID details for the node failed once, but at the next sync-up the same command succeeded, so the host came up.

Could you check whether the same was the case in your scenario, or does the host remain in the Unassigned state forever?
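The behavior described above (a failed UUID fetch that succeeds on the next sync cycle) amounts to a retry loop. A minimal sketch of that idea in shell, assuming the `gluster system:: uuid get` CLI is available on the node; this is illustrative, not the engine's actual sync code:

```shell
#!/bin/sh
# Illustrative sketch only: retry a command a few times, the way the
# engine's periodic sync effectively retries the UUID fetch.
retry() {
    n=$1; shift          # number of attempts
    i=1
    while [ "$i" -le "$n" ]; do
        if "$@"; then
            return 0     # command succeeded
        fi
        echo "attempt $i of $n failed: $*" >&2
        i=$((i + 1))
        sleep 1          # back off briefly before the next attempt
    done
    return 1             # all attempts failed
}

# Usage on an affected node (guarded so the sketch is safe to run anywhere):
if command -v gluster >/dev/null 2>&1; then
    retry 3 gluster system:: uuid get
fi
```

A transient failure on one cycle would then be absorbed by the next attempt, matching the "came up on the next sync" observation.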

Comment 7 Dusmant 2013-12-10 16:36:47 UTC
Let's try to test this using Corbett build CB10; if we can reproduce it even with a 5 min resync, then we need to take a look at it.

Comment 8 Dusmant 2013-12-10 16:38:35 UTC
We might document this bug in the release note.

Comment 9 Matt Mahoney 2013-12-10 16:39:45 UTC
Will retest in Corbett.

Comment 13 Sahina Bose 2013-12-19 05:43:33 UTC
On the nodes that failed to install, vdsm failed to start with

2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:383 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=1
2013-12-17 17:18:43 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:441 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
Stopping ksmtuned: [FAILED]
vdsm: Stop conflicting ksmtuned[FAILED]
vdsm start[FAILED]


Tim, could you look into this?
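The [FAILED] lines above point at ksmtuned rather than at vdsm itself (the vdsmd init script stops the conflicting ksmtuned service before starting). A hypothetical first triage step on such a host, assuming a sysvinit-era RHEL node as in the log; nothing here is from the bug's actual debugging:

```shell
#!/bin/sh
# Hypothetical triage for "vdsm: Stop conflicting ksmtuned [FAILED]".
# Check each service individually to see which one is actually wedged.
for svc in ksmtuned vdsmd; do
    if command -v service >/dev/null 2>&1; then
        # A non-zero status here just means the service is stopped or failed.
        service "$svc" status || echo "$svc: not running (or init script error)"
    else
        echo "$svc: 'service' command not available on this machine"
    fi
done
```

Once the blocking service is identified, /var/log/vdsm/vdsm.log and the ksmtuned init script are the next places to look.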

Comment 18 Vivek Agarwal 2015-12-03 17:14:15 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.