Description of problem: git pull is hanging, and review.gluster.org itself is not responding either.
Restarted Gerrit to fix the issue.
This looks like the problem:
org.apache.sshd.common.channel.WindowClosedException: Already closed
    at org.apache.sshd.common.channel.Window.waitForSpace(Window.java:163)
    at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:116)
    at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:84)
    at java.io.OutputStream.write(OutputStream.java:75)
    at org.eclipse.jgit.transport.PacketLineOut.writePacket(PacketLineOut.java:119)
    at org.eclipse.jgit.transport.PacketLineOut.writeString(PacketLineOut.java:103)
    at org.eclipse.jgit.transport.RefAdvertiser$PacketLineOutRefAdvertiser.writeOne(RefAdvertiser.java:81)
    at org.eclipse.jgit.transport.RefAdvertiser.advertiseId(RefAdvertiser.java:294)
    at org.eclipse.jgit.transport.RefAdvertiser.advertiseAny(RefAdvertiser.java:258)
    at org.eclipse.jgit.transport.RefAdvertiser.send(RefAdvertiser.java:202)
    at org.eclipse.jgit.transport.UploadPack.sendAdvertisedRefs(UploadPack.java:901)
    at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:715)
    at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:666)
    at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:80)
    at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
    at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
    at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
    at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:437)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:377)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2016-07-12 01:01:15,928] [NioProcessor-1] WARN
The solution seems to be to edit the Gerrit config and set a timeout for the diff cache, as explained here:
https://bugs.chromium.org/p/gerrit/issues/detail?id=3940
I'm not changing anything right now, but if this happens again, I'll modify the configuration.
This has happened again. For some reason, the connection to formicary and the VM console are particularly slow. misc, do you know why that's happening?
Kaushal has pointed out significant packet loss at RDU, which is probably related to an ongoing outage there.
So the network is back: RH IT switched to a different upstream route, and as of 07:20 UTC things seem to be fine. Can we confirm that everything is back to normal and close the ticket?
For people asking how long it took to diagnose: the problem was spotted quite fast, but switching providers involves a BGP change, and that takes a while to propagate across the internet, much like DNS. IT is still working with Time Warner (i.e., waiting on their support) to fix the primary route and network, but this should not impact us in any way.
Grmbl, so it seems the network setup wasn't actually switched for some reason, and since the issue comes and goes, that wasn't noticed right away. They are looking into it now.
So there is only a single link for those servers, so there is nothing to switch to, and TWC is aware and working on it.
The immediate issue is now fixed. This warrants a discussion about how we can avoid this in the future.
Having HA for Gerrit could be a solution. I'm not sure what this entails, or whether it would really fix a major part of the issue. It would also mean having a second hosting provider.