Description of problem:
A failed run: https://build.gluster.org/job/centos6-regression/6678/
Error reported:
Triggered by Gerrit: https://review.gluster.org/17851
ERROR: Issue with creating launcher for agent slave20.cloud.gluster.org. The agent is being disconnected
Complete exception log in the attachment.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1334089 [details] Centos regression failure.
slave27 is also affected. I am rebooting it after taking slave20 out of rotation, and I am inspecting the others.
Might be related to https://review.gluster.org/#/c/17789/ since this patch ran on slave20 and slave27, which are broken, and on slave25, which I am trying to investigate right now.
So I re-enabled slave27 and rebooted slave25. I will try to dig a bit more on slave20 to find out what the error is, but I will likely reboot it once I conclude I have no way to debug what is going on with it.
The processes on slave20 seem to have been started by https://review.gluster.org/18271 :

# (for i in $(ps fax |grep gluster |awk '{ print $1}' ); do cat /proc/$i/environ |sed 's/=/\n/g' |grep -a -A 1 GERRIT_CHANGE_URL |sed 's/SUDO_COMMAND//' ; done;) |grep -a http | sort -u
https://review.gluster.org/18271

I am going to reboot it and keep an eye on this one. If any other builder fails, please ping me on IRC.
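As a side note, the one-liner above splits /proc/PID/environ with sed, but environ entries are NUL-separated, so a tr-based split is more reliable. A minimal sketch of the same idea as a reusable function (the name gerrit_urls is mine, not from the report):

```shell
# Hypothetical helper, not the exact command from the report: print the
# GERRIT_CHANGE_URL recorded in the environment of each given PID.
# /proc/PID/environ is NUL-separated, so tr is safer than a sed-based split.
gerrit_urls() {
  for pid in "$@"; do
    tr '\0' '\n' < "/proc/$pid/environ" 2>/dev/null \
      | sed -n 's/^GERRIT_CHANGE_URL=//p'
  done | sort -u
}

# Example usage: inspect all gluster-related processes.
# gerrit_urls $(pgrep -f gluster)
```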
So the slave27 issue was different. For some reason I had to wipe /home/jenkins/root to make it work again. I checked the rpm content, SELinux, and disk usage: no issue with any of them. I am a bit puzzled.
So today it is slave22, with a ton of broken processes after running regressions for https://review.gluster.org/#/c/18271/
For the record, I did reboot slave22, but forgot to write it down, sorry about that.
Slave22 wasn't fully recovered; I suspect we have a second issue on our hands. I terminated all Java processes on that server and restarted the agent. The logs say nothing useful from where I am looking.
This issue needed a full node restart and an agent disconnect/reconnect. We don't see this problem anymore.