Bug 1348074

Summary: netbsd regressions are failing on nbslave7h
Product: [Community] GlusterFS Reporter: Ravishankar N <ravishankar>
Component: project-infrastructure Assignee: Nigel Babu <nigelb>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs, gluster-infra, nigelb
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-22 04:59:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ravishankar N 2016-06-20 06:06:58 UTC
Description of problem:

Something seems to be wrong with nbslave7h. The last 7 or 8 netbsd regressions triggered on it have failed with the following message:

-------------------------------------------------------------------
22:56:38 Triggered by Gerrit: http://review.gluster.org/14764
22:56:38 Building remotely on nbslave7h.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
22:56:38 java.io.IOException: remote file operation failed: /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at hudson.remoting.Channel@289a5bf1:nbslave7h.cloud.gluster.org: hudson.remoting.ChannelClosedException: channel is already closed
22:56:38 	at hudson.FilePath.act(FilePath.java:986)
22:56:38 	at hudson.FilePath.act(FilePath.java:968)
22:56:38 	at hudson.FilePath.mkdirs(FilePath.java:1151)
22:56:38 	at hudson.model.AbstractProject.checkout(AbstractProject.java:1267)
22:56:38 	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
22:56:38 	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
22:56:38 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
22:56:38 	at hudson.model.Run.execute(Run.java:1738)
22:56:38 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
22:56:38 	at hudson.model.ResourceController.execute(ResourceController.java:98)
22:56:38 	at hudson.model.Executor.run(Executor.java:410)
22:56:38 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
22:56:38 	at hudson.remoting.Channel.send(Channel.java:578)
22:56:38 	at hudson.remoting.Request.call(Request.java:130)
22:56:38 	at hudson.remoting.Channel.call(Channel.java:780)
22:56:38 	at hudson.FilePath.act(FilePath.java:979)
22:56:38 	... 10 more
22:56:38 Caused by: java.io.IOException
22:56:38 	at hudson.remoting.Channel.close(Channel.java:1163)
22:56:38 	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
22:56:38 	at hudson.remoting.PingThread.ping(PingThread.java:126)
22:56:38 	at hudson.remoting.PingThread.run(PingThread.java:85)
22:56:38 Caused by: java.util.concurrent.TimeoutException: Ping started at 1466189026283 hasn't completed by 1466189266283
22:56:38 	... 2 more
22:56:38 Finished: FAILURE
-------------------------------------------------------------------
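A note on the two numbers in the TimeoutException line: they look like Java epoch timestamps in milliseconds (that is an assumption; the log does not say so). Under that assumption, a minimal Python sketch to decode them follows. The helper name ping_window is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Message copied from the trace above.
MESSAGE = "Ping started at 1466189026283 hasn't completed by 1466189266283"

PING_RE = re.compile(r"Ping started at (\d+) hasn't completed by (\d+)")


def ping_window(message):
    """Return (ping start in UTC, seconds between start and deadline).

    Assumes both numbers are milliseconds since the Unix epoch.
    """
    match = PING_RE.search(message)
    if match is None:
        raise ValueError("not a ping timeout message")
    start_ms, deadline_ms = (int(group) for group in match.groups())
    # Whole seconds only; the millisecond part is dropped for display.
    start = datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc)
    elapsed_s = (deadline_ms - start_ms) / 1000
    return start, elapsed_s


start, elapsed_s = ping_window(MESSAGE)
print(start.isoformat(), elapsed_s)
```

If the assumption holds, the ping window is 240 seconds (4 minutes), and the ping started on 2016-06-17 at 18:43:46 UTC, more than two days before this bug was filed. That would mean each failing build is reporting an old channel failure rather than a fresh timeout.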

Some of the runs:

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17641/console

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17640/console

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17639/console

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17638/console


https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17637/console

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17636/console 

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17635/console

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/17634/console

Comment 1 Nigel Babu 2016-06-20 06:14:06 UTC
23:56:59 tset: standard error: Inappropriate ioctl for device
23:56:59 Another regression is already running on this host (Jenkins bug).
23:56:59 Abort regression.
23:56:59 + RET=1
23:56:59 + '[' 1 = 0 ']'
23:56:59 + V=-1
23:56:59 + R=0
23:56:59 + VERDICT=FAILED


This seems to be the issue: the host thinks another regression is already running on it. I'm going to reboot it.

Comment 2 Nigel Babu 2016-06-20 06:25:11 UTC
Rebooted the machine only to get this:

11:48:46 Retriggered by user nigelb for Gerrit: http://review.gluster.org/14764
11:48:46 Building remotely on nbslave7h.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
11:48:46 java.io.IOException: remote file operation failed: /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at hudson.remoting.Channel@289a5bf1:nbslave7h.cloud.gluster.org: hudson.remoting.ChannelClosedException: channel is already closed
11:48:46 	at hudson.FilePath.act(FilePath.java:986)
11:48:46 	at hudson.FilePath.act(FilePath.java:968)
11:48:46 	at hudson.FilePath.mkdirs(FilePath.java:1151)
11:48:46 	at hudson.model.AbstractProject.checkout(AbstractProject.java:1267)
11:48:46 	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
11:48:46 	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
11:48:46 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
11:48:46 	at hudson.model.Run.execute(Run.java:1738)
11:48:46 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
11:48:46 	at hudson.model.ResourceController.execute(ResourceController.java:98)
11:48:46 	at hudson.model.Executor.run(Executor.java:410)
11:48:46 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
11:48:46 	at hudson.remoting.Channel.send(Channel.java:578)
11:48:46 	at hudson.remoting.Request.call(Request.java:130)
11:48:46 	at hudson.remoting.Channel.call(Channel.java:780)
11:48:46 	at hudson.FilePath.act(FilePath.java:979)
11:48:46 	... 10 more
11:48:46 Caused by: java.io.IOException
11:48:46 	at hudson.remoting.Channel.close(Channel.java:1163)
11:48:46 	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
11:48:46 	at hudson.remoting.PingThread.ping(PingThread.java:126)
11:48:46 	at hudson.remoting.PingThread.run(PingThread.java:85)
11:48:46 Caused by: java.util.concurrent.TimeoutException: Ping started at 1466189026283 hasn't completed by 1466189266283
11:48:46 	... 2 more
11:48:46 Finished: FAILURE


It turns out that after a reboot, you need to disconnect the node from Jenkins and relaunch the slave agent. This should be fixed now. I'm retrying a job to confirm.
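The trace from before the reboot and the one from after it name the same channel object (Channel@289a5bf1) and the same ping start time, which is consistent with Jenkins still holding the old, closed channel until the node is disconnected and relaunched. A rough Python sketch of that comparison is below. The excerpts are shortened to the relevant fragments of the two logs, and the helper name channel_fingerprint is hypothetical.

```python
import re

# Relevant fragments of the two traces in this bug (shortened, not full lines).
BEFORE_REBOOT = (
    "22:56:38 at hudson.remoting.Channel@289a5bf1:nbslave7h.cloud.gluster.org: "
    "Ping started at 1466189026283 hasn't completed by 1466189266283"
)
AFTER_REBOOT = (
    "11:48:46 at hudson.remoting.Channel@289a5bf1:nbslave7h.cloud.gluster.org: "
    "Ping started at 1466189026283 hasn't completed by 1466189266283"
)

CHANNEL_RE = re.compile(r"hudson\.remoting\.Channel@([0-9a-f]+):([^:\s]+)")
PING_RE = re.compile(r"Ping started at (\d+)")


def channel_fingerprint(log_text):
    """Return (channel id, node name, ping start) found in a log excerpt."""
    channel = CHANNEL_RE.search(log_text)
    ping = PING_RE.search(log_text)
    if channel is None or ping is None:
        raise ValueError("excerpt has no channel id or ping timestamp")
    return channel.group(1), channel.group(2), int(ping.group(1))


# Identical fingerprints suggest both builds hit the same stale channel.
same_channel = channel_fingerprint(BEFORE_REBOOT) == channel_fingerprint(AFTER_REBOOT)
print("same stale channel:", same_channel)
```

A matching fingerprint does not prove the cause, but it is a quick check that a failure after a reboot is the old channel being reused rather than a new problem on the node.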

Comment 3 Nigel Babu 2016-06-20 08:12:05 UTC
The tests seem to be running now and at least not failing in the same place. Closing this bug as resolved.