Bug 1355931 - not able to git pull the source
Summary: not able to git pull the source
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nigel Babu
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-13 04:29 UTC by Atin Mukherjee
Modified: 2016-08-02 12:26 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-08-02 12:26:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2016-07-13 04:29:02 UTC
Description of problem:

git pull is failing, and review.gluster.org itself is not responding either.

Comment 1 Nigel Babu 2016-07-13 04:51:59 UTC
Restarted gerrit to fix the issue.

Comment 2 Nigel Babu 2016-07-13 05:10:25 UTC
This looks like the problem:

org.apache.sshd.common.channel.WindowClosedException: Already closed
        at org.apache.sshd.common.channel.Window.waitForSpace(Window.java:163)
        at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:116)
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:84)
        at java.io.OutputStream.write(OutputStream.java:75)
        at org.eclipse.jgit.transport.PacketLineOut.writePacket(PacketLineOut.java:119)
        at org.eclipse.jgit.transport.PacketLineOut.writeString(PacketLineOut.java:103)
        at org.eclipse.jgit.transport.RefAdvertiser$PacketLineOutRefAdvertiser.writeOne(RefAdvertiser.java:81)
        at org.eclipse.jgit.transport.RefAdvertiser.advertiseId(RefAdvertiser.java:294)
        at org.eclipse.jgit.transport.RefAdvertiser.advertiseAny(RefAdvertiser.java:258)
        at org.eclipse.jgit.transport.RefAdvertiser.send(RefAdvertiser.java:202)
        at org.eclipse.jgit.transport.UploadPack.sendAdvertisedRefs(UploadPack.java:901)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:715)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:666)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:80)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:437)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:377)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2016-07-12 01:01:15,928] [NioProcessor-1] WARN

Comment 3 Nigel Babu 2016-07-13 05:23:49 UTC
The solution seems to be to edit the Gerrit config to set a timeout for the diff cache, as explained here: https://bugs.chromium.org/p/gerrit/issues/detail?id=3940

I'm not changing anything right now, but if this happens again, I'll modify the configuration.
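
For reference, a minimal sketch of what such a change might look like. This assumes the option in question is Gerrit's cache.diff.timeout setting in etc/gerrit.config under the Gerrit site directory; the thread does not say which option was meant, and the value shown is purely illustrative, not one recommended here or in the linked issue.

```
# etc/gerrit.config (path relative to the Gerrit site directory)
# Assumption: cache.diff.timeout is the setting being referred to.
# The value below is a placeholder; pick one that suits the server.
[cache "diff"]
	timeout = 15 seconds
```

Gerrit reads this file at startup, so a restart would be needed for the change to take effect.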

Comment 4 Nigel Babu 2016-07-13 06:27:06 UTC
This has happened again. For some reason, the connection to formicary and the VM console is particularly slow. misc, do you know why that's happening?

Comment 5 Nigel Babu 2016-07-13 07:02:18 UTC
Kaushal has pointed out a good deal of packet loss at RDU, which is probably related to an ongoing outage there.

Comment 6 M. Scherer 2016-07-13 07:58:35 UTC
So the network is back: RH IT switched to a different upstream network route, and as of 07:20 UTC this seems to be fine. Can we confirm things are back to normal now and close the ticket?

And for people asking how long it took to diagnose: it was seen quite fast, but switching providers involves a BGP change, which takes a while to propagate around the internet, like DNS. IT is still dealing with Time Warner (i.e., waiting on their support) to fix the primary route and network, but this shouldn't impact us in any way.

Comment 7 M. Scherer 2016-07-13 10:30:47 UTC
grmblb, so it seems that the current network setup wasn't switched for some reason, and since the issue comes and goes, it was not seen right away. They are looking into it now.

Comment 8 M. Scherer 2016-07-13 10:45:14 UTC
So there is only a single link for those servers, so no switch was possible; TWC is aware and working on it.

Comment 9 Nigel Babu 2016-07-15 03:02:42 UTC
The immediate issue is now fixed. This warrants a discussion about how we can avoid this in the future.

Comment 10 M. Scherer 2016-07-15 08:25:56 UTC
Having HA on Gerrit could be a solution. Not sure what this entails, or whether it would really fix a major part of the issue. This also means having a second hosting provider.

