Description of problem:

See https://build.gluster.org/job/smoke/43221/console for example:

23:39:34 Triggered by Gerrit: https://review.gluster.org/20776
23:39:35 Building remotely on builder24.int.rht.gluster.org (smoke7 rpm7) in workspace /home/jenkins/root/workspace/smoke
23:39:35 Wiping out workspace first.
23:39:35 Cloning the remote Git repository
23:39:35 Cloning repository git://review.gluster.org/glusterfs.git
23:39:35  > git init /home/jenkins/root/workspace/smoke # timeout=10
23:39:35 Fetching upstream changes from git://review.gluster.org/glusterfs.git
23:39:35  > git --version # timeout=10
23:39:35  > git fetch --tags --progress git://review.gluster.org/glusterfs.git +refs/heads/*:refs/remotes/origin/*
23:39:35 ERROR: Error cloning remote repo 'origin'
23:39:35 hudson.plugins.git.GitException: Command "git fetch --tags --progress git://review.gluster.org/glusterfs.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
23:39:35 stdout:
23:39:35 stderr: fatal: read error: Connection reset by peer
23:39:35
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2016)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1735)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:72)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:420)
23:39:35 	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:629)
23:39:35 	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
23:39:35 	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
23:39:35 	at hudson.remoting.UserRequest.perform(UserRequest.java:212)
23:39:35 	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
23:39:35 	at hudson.remoting.Request$2.run(Request.java:369)
23:39:35 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
23:39:35 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
23:39:35 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
23:39:35 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
23:39:35 	at java.lang.Thread.run(Thread.java:748)
23:39:35 	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to builder24.int.rht.gluster.org
The bit that's failing here is the git clone, so we need to check the following bits:

* Did the clone attempt get logged by git daemon?
* Did the clone attempt get interrupted by a network event between the node and the Gerrit server?
* Did the clone attempt get interrupted by something on the Gerrit server itself?

I'm taking this bug to look at the Gerrit side of things, but if it's not Gerrit, I'm going to bounce this one over to you, Michael.
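A quick way to check the first point on the Gerrit host would be to look for the clone attempt around the failure time; something like the following (a sketch only — the log locations and time window are assumptions, and it assumes git daemon logs through syslog/journald):

[root@gerrit-new ~]# grep -i 'git' /var/log/messages | tail -50
[root@gerrit-new ~]# journalctl --since "2018-08-21 23:30" | grep -i 'git-daemon\|xinetd'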
(In reply to Nigel Babu from comment #1)
> The bit that's failing here is the git clone, so we need to check the
> following bits:
>
> * Did the clone attempt get logged by git daemon?

Can we take the opportunity and look at shallow clones?
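For reference, a shallow clone/fetch on the smoke job side would look roughly like this (a sketch; the depth value is arbitrary and how the Jenkins git plugin would be configured for it is not shown):

git clone --depth 1 git://review.gluster.org/glusterfs.git
# or, closer to what the job currently runs:
git fetch --depth 1 --tags --progress git://review.gluster.org/glusterfs.git +refs/heads/*:refs/remotes/origin/*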
BTW, it doesn't happen on just a single host. For example, https://build.gluster.org/job/smoke/43219/console:

23:39:01 	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to builder23.int.rht.gluster.org
23:39:01 		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
23:39:01 		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
So, not wanting to say the network is perfect, but this started since the upgrade to the new Gerrit, no? Could it be some issue where Gerrit drops connections if there are too many clients, or something like that?
[root@gerrit-new logs]# grep 'reset by peer' error_log | wc -l
18

Seems it happens quite often :/
And it has been happening for a long time too, so it's unlikely to be the upgrade, if the error in the log is the same as the one reported.
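To back that up, a rough way to see how far back the resets go (assuming Gerrit's error_log lines start with a bracketed date, and that older logs haven't been rotated away) would be:

[root@gerrit-new logs]# grep 'reset by peer' error_log | awk '{print $1}' | sort | uniq -c

That groups the matches by date, which should show whether they predate the upgrade.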
(In reply to M. Scherer from comment #4)
> So, not wanting to say the network is perfect, but this started since the
> upgrade to the new Gerrit, no? Could it be some issue where Gerrit drops
> connections if there are too many clients, or something like that?

I wonder if it happens when I 'flood' it with multiple patches (all in the same topic).
Yeah, I wanted to explore that road too by doing *cough* load testing of the git server on the staging env, but there's no git port exposed there (or I am not awake enough).
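For the record, a crude version of that load test could just be parallel shallow clones from one source IP, e.g. (a sketch; the staging hostname is made up and it assumes the git port is reachable from wherever this runs):

seq 1 20 | xargs -P 20 -I{} git clone --depth 1 git://gerrit-stage.rht.gluster.org/glusterfs.git /tmp/clone-{}

If connections start getting reset around some concurrency level, that would point at a per-source cap rather than the network.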
(In reply to Yaniv Kaul from comment #7)
> (In reply to M. Scherer from comment #4)
> > So, not wanting to say the network is perfect, but this started since the
> > upgrade to the new Gerrit, no? Could it be some issue where Gerrit drops
> > connections if there are too many clients, or something like that?
>
> I wonder if it happens when I 'flood' it with multiple patches (all in the
> same topic).

Can we look at the Gerrit logs?
I did, and I didn't find anything that seemed relevant. I may have missed something, however, and Nigel is also looking. It's a transient issue, so it's not easy to diagnose.
So, Nigel pointed out that git is served by xinetd, and the log shows nothing except some IPv6 errors. While those might be related, I think they are not, especially since that's only for the Rackspace builders, not the internal ones.
Mhhh:

Aug 21 20:39:36 gerrit-new.rht.gluster.org xinetd[16437]: FAIL: git per_source_limit from=::ffff:8.43.85.181

I suspect that this is the cause. 8.43.85.181 is the firewall IP. Several solutions:

- have a way to use the internal IP
- add an exception for that IP

The 2nd is easier, the 1st is cleaner. I will start with the 2nd.
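For context, that FAIL comes from xinetd's per_source limit, which caps simultaneous connections from a single source IP. A sketch of the relevant service definition follows (the file path, server location and values here are assumptions, not the actual config on gerrit-new):

# /etc/xinetd.d/git (sketch)
service git
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = nobody
        server          = /usr/libexec/git-core/git-daemon
        server_args     = --inetd --export-all --base-path=/var/lib/git
        # All internal builders reach us through the firewall as 8.43.85.181,
        # so they share one per_source bucket and trip the limit together.
        per_source      = 10
}

Raising per_source, or otherwise exempting the firewall IP, is the kind of exception meant above.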
So I pushed a fix; it should deploy shortly (unless the wifi breaks in my train). The good news is that we can claim we made so much productivity progress that we hit the limit, so that's positive :p
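Once the change is live, one way to confirm it from the Gerrit side (log source assumed) is to watch for new per_source failures, e.g.:

[root@gerrit-new ~]# journalctl -t xinetd --since today | grep per_source_limit

No new FAIL lines for 8.43.85.181 should show up while smoke jobs are running.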
The deploy seems to have fixed it. Closing this bug.