Bug 1619838 - Jenkins connection issues failing tests
Summary: Jenkins connection issues failing tests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Nigel Babu
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-21 21:17 UTC by Yaniv Kaul
Modified: 2018-10-03 04:13 UTC (History)
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-10-03 04:13:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Yaniv Kaul 2018-08-21 21:17:28 UTC
Description of problem:
See https://build.gluster.org/job/smoke/43221/console for example:
23:39:34 Triggered by Gerrit: https://review.gluster.org/20776
23:39:35 Building remotely on builder24.int.rht.gluster.org (smoke7 rpm7) in workspace /home/jenkins/root/workspace/smoke
23:39:35 Wiping out workspace first.
23:39:35 Cloning the remote Git repository
23:39:35 Cloning repository git://review.gluster.org/glusterfs.git
23:39:35  > git init /home/jenkins/root/workspace/smoke # timeout=10
23:39:35 Fetching upstream changes from git://review.gluster.org/glusterfs.git
23:39:35  > git --version # timeout=10
23:39:35  > git fetch --tags --progress git://review.gluster.org/glusterfs.git +refs/heads/*:refs/remotes/origin/*
23:39:35 ERROR: Error cloning remote repo 'origin'
23:39:35 hudson.plugins.git.GitException: Command "git fetch --tags --progress git://review.gluster.org/glusterfs.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
23:39:35 stdout: 
23:39:35 stderr: fatal: read error: Connection reset by peer
23:39:35 
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2016)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1735)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:72)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:420)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:629)
23:39:35 	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
23:39:35 	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
23:39:35 	at hudson.remoting.UserRequest.perform(UserRequest.java:212)
23:39:35 	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
23:39:35 	at hudson.remoting.Request$2.run(Request.java:369)
23:39:35 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
23:39:35 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
23:39:35 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
23:39:35 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
23:39:35 	at java.lang.Thread.run(Thread.java:748)
23:39:35 	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to builder24.int.rht.gluster.org
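
The failing step can be reproduced by hand with the same commands the Jenkins git plugin runs (taken from the log above); whether it actually fails depends on hitting the same transient condition:

  # on a builder, mirror what the smoke job does
  git init /tmp/smoke-repro && cd /tmp/smoke-repro
  git fetch --tags --progress git://review.gluster.org/glusterfs.git '+refs/heads/*:refs/remotes/origin/*'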

Comment 1 Nigel Babu 2018-08-22 04:55:30 UTC
The bit that's failing here is the git clone, so we need to check the following:

* Did the clone attempt get logged by the git daemon? (a quick way to check this is sketched below)
* Did the clone attempt get interrupted by a network event between the node and the Gerrit server?
* Did the clone attempt get interrupted by something on the Gerrit server itself?
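
For example, something along these lines on the Gerrit host (the daemon is run through xinetd, so its messages should end up in the system journal; the unit name and time window here are assumptions):

  # look for git daemon activity/failures around the time of the failed run
  journalctl -u xinetd --since "2018-08-21 20:00" | grep -i git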

I'm taking this bug to look at the Gerrit-side of things, but if it's not Gerrit, I'm going to bounce this one over to you, Michael.

Comment 2 Yaniv Kaul 2018-08-22 05:58:43 UTC
(In reply to Nigel Babu from comment #1)
> The bit that's failing here is the git clone, so we need to check the
> following bits:
> 
> * Did the clone attempt get logged by git daemon.

Can we take the opportunity and look at using shallow clones?
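
For illustration, a shallow clone only fetches the most recent history and would cut the transfer a lot; whether it's done via the git CLI or via the Jenkins git plugin's clone options is an open question:

  # fetch only the latest commit instead of the full history
  git clone --depth 1 git://review.gluster.org/glusterfs.git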

Comment 3 Yaniv Kaul 2018-08-22 06:05:11 UTC
BTW, it doesn't happen on just a single host. For example:
https://build.gluster.org/job/smoke/43219/console 

23:39:01 	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to builder23.int.rht.gluster.org
23:39:01 		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
23:39:01 		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)

Comment 4 M. Scherer 2018-08-22 08:33:56 UTC
So, not wanting to say the network is perfect, but this started after the upgrade to the new Gerrit, no? Could it be some issue where Gerrit drops connections if there are too many clients, or something like that?

Comment 5 M. Scherer 2018-08-22 08:36:49 UTC
[root@gerrit-new logs]# grep 'reset by peer' error_log |wc -l
18


Seems it happens quite often :/

Comment 6 M. Scherer 2018-08-22 08:39:05 UTC
And it has been happening for a long time too, so it's unlikely to be the upgrade, if the error in the log is the same as the one reported.

Comment 7 Yaniv Kaul 2018-08-22 09:09:29 UTC
(In reply to M. Scherer from comment #4)
> So not wanting to say network is perfect, but this started since the upgrade
> to new gerrit, no ? Could it be some issue where gerrit drop if there is too
> much client or something like this ?

I wonder if it happens when I 'flood' it with multiple patches (all in the same topic).

Comment 8 M. Scherer 2018-08-22 09:11:35 UTC
Yeah, I wanted to explore that road too by doing *cough* load testing of the git server on the staging env, but there's no git port open (or I am not awake enough).
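
Roughly what I had in mind, once a git port is reachable (the hostname and counts here are made up, just to show the idea):

  # fire off 20 clones in parallel against the staging git daemon
  for i in $(seq 1 20); do
      git clone git://gerrit-stage.rht.gluster.org/glusterfs.git /tmp/loadtest-$i &
  done
  wait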

Comment 9 Yaniv Kaul 2018-08-22 09:19:08 UTC
(In reply to Yaniv Kaul from comment #7)
> (In reply to M. Scherer from comment #4)
> > So not wanting to say network is perfect, but this started since the upgrade
> > to new gerrit, no ? Could it be some issue where gerrit drop if there is too
> > much client or something like this ?
> 
> I wonder if it happens when I 'flood' it with multiple patches (all in the
> same topic).

Can we look at Gerrit logs?

Comment 10 M. Scherer 2018-08-22 09:23:26 UTC
I did, and didn't find anything that seemed relevant. I may have missed something, however, and Nigel is also looking. It's a transient issue, so not easy to diagnose.

Comment 11 M. Scherer 2018-08-22 10:37:00 UTC
So, Nigel pointed out that git is served by xinetd, and the log shows nothing except some IPv6 errors. While it might be related, I think it is not, especially since that's only for the Rackspace builder, not the internal one.

Comment 12 M. Scherer 2018-08-22 10:45:20 UTC
Mhhh:

août 21 20:39:36 gerrit-new.rht.gluster.org xinetd[16437]: FAIL: git per_source_limit from=::ffff:8.43.85.181

I suspect that this might be the cause. 8.43.85.181 is the firewall IP.

Several solutions:
- have a way to use the internal IP
- add an exception for that IP.

The 2nd is easier, the 1st is cleaner. I will start with the 2nd.
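
For reference, the limit being hit is xinetd's per_source setting on the git service; a rough sketch of what the stanza in /etc/xinetd.d/git could look like with the limit raised (the values and paths below are assumptions, not the actual config):

  service git
  {
      disable      = no
      socket_type  = stream
      wait         = no
      user         = nobody
      server       = /usr/libexec/git-core/git-daemon
      server_args  = --inetd --export-all --base-path=/var/lib/git --syslog
      # raise the per-client limit so all the builders behind the one
      # firewall IP (8.43.85.181) don't trip "FAIL: git per_source_limit"
      per_source   = 64
      instances    = 128
  }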

Comment 13 M. Scherer 2018-08-22 10:55:22 UTC
So I pushed a fix; it should deploy shortly (unless the wifi breaks on my train).

So the good news is that we can claim we made so much productivity progress that we hit the limit, so that's positive :p

Comment 14 Nigel Babu 2018-10-03 04:13:01 UTC
The deploy seems to have fixed it. Closing this bug.

