Bug 1355931

Summary: not able to git pull the source
Product: [Community] GlusterFS Reporter: Atin Mukherjee <amukherj>
Component: project-infrastructure Assignee: Nigel Babu <nigelb>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs, gluster-infra, mscherer, nigelb
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-02 12:26:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Atin Mukherjee 2016-07-13 04:29:02 UTC
Description of problem:

git pull fails, and review.gluster.org is not responding either.

Comment 1 Nigel Babu 2016-07-13 04:51:59 UTC
Restarted gerrit to fix the issue.

Comment 2 Nigel Babu 2016-07-13 05:10:25 UTC
This looks like the problem:

org.apache.sshd.common.channel.WindowClosedException: Already closed
        at org.apache.sshd.common.channel.Window.waitForSpace(Window.java:163)
        at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:116)
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:84)
        at java.io.OutputStream.write(OutputStream.java:75)
        at org.eclipse.jgit.transport.PacketLineOut.writePacket(PacketLineOut.java:119)
        at org.eclipse.jgit.transport.PacketLineOut.writeString(PacketLineOut.java:103)
        at org.eclipse.jgit.transport.RefAdvertiser$PacketLineOutRefAdvertiser.writeOne(RefAdvertiser.java:81)
        at org.eclipse.jgit.transport.RefAdvertiser.advertiseId(RefAdvertiser.java:294)
        at org.eclipse.jgit.transport.RefAdvertiser.advertiseAny(RefAdvertiser.java:258)
        at org.eclipse.jgit.transport.RefAdvertiser.send(RefAdvertiser.java:202)
        at org.eclipse.jgit.transport.UploadPack.sendAdvertisedRefs(UploadPack.java:901)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:715)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:666)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:80)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:437)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:377)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2016-07-12 01:01:15,928] [NioProcessor-1] WARN

Comment 3 Nigel Babu 2016-07-13 05:23:49 UTC
The solution seems to be to edit the config to set a timeout for cache diff as explained here: https://bugs.chromium.org/p/gerrit/issues/detail?id=3940

I'm not doing anything right now. But if this happens again, I'll modify the configuration.
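
For reference, a minimal sketch of what that configuration change might look like, assuming the option in question is Gerrit's `cache.diff.timeout` setting in `etc/gerrit.config`. The value shown is a placeholder for illustration, not a setting that was applied on this server or a recommendation from the linked issue:

```ini
# etc/gerrit.config (fragment)
# Assumption: the "timeout for cache diff" refers to cache.diff.timeout.
# The value below is illustrative only.
[cache "diff"]
	timeout = 15 seconds
```

Gerrit would need a restart to pick up a change like this.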

Comment 4 Nigel Babu 2016-07-13 06:27:06 UTC
This has happened again. For some reason, the connection to formicary and the VM console is particularly slow. misc, do you know why that's happening?

Comment 5 Nigel Babu 2016-07-13 07:02:18 UTC
Kaushal has pointed out a good deal of packet loss at RDU which is probably related to an ongoing outage in RDU.
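
A rough way to put a number on packet loss like this is to run `ping` and read its summary line. The sketch below is only an illustration, not how the loss was actually measured here: the helper names are hypothetical, and it assumes the Linux iputils `ping` output format ("N% packet loss") and its `-c` count flag.

```python
import re
import subprocess

# Matches the summary line printed by iputils ping, e.g.
# "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output: str) -> float:
    """Extract the packet-loss percentage from ping's text output."""
    match = _LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def measure_loss(host: str, count: int = 10) -> float:
    """Ping `host` `count` times and return the reported loss percentage.

    Hypothetical helper: assumes a `ping` binary that accepts `-c`.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when packets are lost
    )
    return packet_loss_percent(result.stdout)
```

For example, `measure_loss("review.gluster.org")` would report the loss seen from the machine running the script, which can differ from the loss seen from other networks.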

Comment 6 M. Scherer 2016-07-13 07:58:35 UTC
So the network is back: RH IT switched to a different upstream network route, and as of 07:20 UTC this seems to be fine. Can we confirm things are back to normal now, and close the ticket?

And for people asking how long it took to diagnose: it was seen quite fast, but switching providers involves a BGP change, which takes a while to propagate around the internet, like DNS. IT is still dealing with Time Warner (i.e., waiting on their support) to fix the primary route and network, but this should not impact us in any way.

Comment 7 M. Scherer 2016-07-13 10:30:47 UTC
grmblb, so it seems that the current network setup wasn't switched for some reason, and since the issue comes and goes, it was not seen right away. They are looking into it now.

Comment 8 M. Scherer 2016-07-13 10:45:14 UTC
So there is a single link for those servers, hence no switch; TWC is aware and working on it.

Comment 9 Nigel Babu 2016-07-15 03:02:42 UTC
The immediate issue is now fixed. This warrants a discussion about how we can avoid this in the future.

Comment 10 M. Scherer 2016-07-15 08:25:56 UTC
Having HA on gerrit could be a solution. Not sure what this entails, or whether it would really fix a major part of the issue. It would also mean having a second hosting provider.