Bug 1619838
Summary: | Jenkins connection issues failing tests | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Yaniv Kaul <ykaul>
Component: | project-infrastructure | Assignee: | Nigel Babu <nigelb>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | mainline | CC: | bugs, gluster-infra, mscherer, nigelb
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-10-03 04:13:01 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Yaniv Kaul
2018-08-21 21:17:28 UTC
The bit that's failing here is the git clone, so we need to check the following bits:

* Did the clone attempt get logged by git daemon.
* Did the clone attempt get interrupted by a network event between the node and the Gerrit server.
* Did the clone attempt get interrupted by something on the Gerrit server itself.

I'm taking this bug to look at the Gerrit side of things, but if it's not Gerrit, I'm going to bounce this one over to you, Michael.

(In reply to Nigel Babu from comment #1)
> The bit that's failing here is the git clone, so we need to check the
> following bits:
>
> * Did the clone attempt get logged by git daemon.

Can we take the opportunity and look at shallow clone?

BTW, it doesn't happen on a single host. For example:
https://build.gluster.org/job/smoke/43219/console

23:39:01 Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to builder23.int.rht.gluster.org
23:39:01 at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
23:39:01 at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)

So not wanting to say network is perfect, but this started since the upgrade to new gerrit, no ? Could it be some issue where gerrit drop if there is too much client or something like this ?

[root@gerrit-new logs]# grep 'reset by peer' error_log | wc -l
18

Seems it happens quite often :/ And it has been happening for a long time too, so it's unlikely to be the upgrade, if the error in the log is the same as the one reported.

(In reply to M. Scherer from comment #4)
> So not wanting to say network is perfect, but this started since the upgrade
> to new gerrit, no ? Could it be some issue where gerrit drop if there is too
> much client or something like this ?

I wonder if it happens when I 'flood' it with multiple patches (all in the same topic).

Yeah, I wanted to explore that road too by doing *cough* load testing of the git server on the staging env, but there is no git port (or I am not awake enough).

(In reply to Yaniv Kaul from comment #7)
> (In reply to M. Scherer from comment #4)
> > So not wanting to say network is perfect, but this started since the upgrade
> > to new gerrit, no ? Could it be some issue where gerrit drop if there is too
> > much client or something like this ?
>
> I wonder if it happens when I 'flood' it with multiple patches (all in the
> same topic).

Can we look at Gerrit logs?

I did, and didn't find anything that seemed relevant. I may have missed something, however, and Nigel is also looking. It's a transient issue, so not easy to diagnose.

So, Nigel pointed out that git is served by xinetd, and the log shows nothing except some IPv6 errors. While that might be related, I think it is not, especially since that's only for the Rackspace builder, not the internal one.

Mhhh:

août 21 20:39:36 gerrit-new.rht.gluster.org xinetd[16437]: FAIL: git per_source_limit from=::ffff:8.43.85.181

I suspect that might be the cause. 8.43.85.181 is the firewall IP. Two possible solutions:

- have a way to use the internal IP
- add an exception for that IP

The 2nd is easier, the 1st is cleaner. I will start with the 2nd.

So I pushed a fix; it should deploy (unless the wifi breaks in my train). The good news is that we can claim we made so much productivity progress that we hit the limit, so that's positive :p

The deploy seems to have fixed it. Closing this bug.
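
For context on the per_source_limit failure above: xinetd logs "FAIL: <service> per_source_limit from=<addr>" when a single source address already holds more concurrent connections to a service than its per_source setting allows, which is exactly what happens when many Jenkins builders reach Gerrit through one NATed firewall IP. Below is a minimal, hypothetical sketch of the kind of xinetd stanza involved and of relaxing that cap; the file path, server arguments, and numbers are illustrative assumptions, not the actual change that was deployed.

```
# /etc/xinetd.d/git -- hypothetical sketch, not the actual gluster-infra change.
# xinetd rejects a connection and logs "FAIL: git per_source_limit from=<ip>"
# once one source IP already holds per_source concurrent connections; all
# builders behind the firewall appear as 8.43.85.181, so they share one bucket.
service git
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = nobody
        server          = /usr/libexec/git-core/git-daemon
        server_args     = --base-path=/var/lib/git --export-all --inetd --verbose
        log_on_failure  += USERID
        # Raise the per-source cap (or set it to UNLIMITED) so NATed builders
        # are not turned away; the value 50 is an assumption for illustration.
        per_source      = 50
        instances       = 100
}
```

The other option raised in the thread, letting internal builders reach the git port directly instead of going through the firewall, would spread connections across real source IPs and leave the limit untouched, at the cost of routing changes on the builders.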