Bug 1620243 - Gerrit is non-responsive (503)
Summary: Gerrit is non-responsive (503)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-22 19:30 UTC by Yaniv Kaul
Modified: 2018-08-29 14:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 14:34:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Yaniv Kaul 2018-08-22 19:30:07 UTC
Description of problem:

I'm getting 503 Service Unavailable from it.


Might have been after I've uploaded a rather large patchset:
remote: New Changes:        
remote:   https://review.gluster.org/#/c/glusterfs/+/20894 glfs-fops.c, glfs.c: strncpy() -> sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20895 {cli-cmd-parser|cli-rpc-ops||cli-xml-output}.c: strncpy()->sprintf(), reduce ...        
remote:   https://review.gluster.org/#/c/glusterfs/+/20896 {mount-common|fusermount|mount_darwin|umountd}.c: strncpy()->sprintf(), ...        
remote:   https://review.gluster.org/#/c/glusterfs/+/20897 extras/geo-rep/gsync-sync-gfid.c: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20898 multiple files: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20899 multiple files: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20900 bit-rot xlator: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20901 changelog xlator: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20902 changetimerecoder xlator: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20903 xlators: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20904 NFS server (mount3.c, nfs-inodes.c): strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20905 multiple xlators: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20906 multiple xlators: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20907 multiple xlators (mgmt): strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20908 multiple xlators (storage/posix): strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20909 Various files: strncpy()->sprintf(), reduce strlen()'s        
remote: 
remote: Pushing to refs/publish/* is deprecated, use refs/for/* instead.        
To ssh://review.gluster.org/glusterfs
 * [new branch]          HEAD -> refs/publish/master/remove_strncpy2

Comment 1 Vijay Bellur 2018-08-22 20:57:09 UTC
Encountering the same problem:

Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Comment 2 M. Scherer 2018-08-22 21:36:25 UTC
Looking at it.

Comment 3 M. Scherer 2018-08-22 21:47:44 UTC
So Nigel did restart Gerrit, and this seems to be working now.

Comment 4 Yaniv Kaul 2018-08-23 06:23:17 UTC
(In reply to M. Scherer from comment #3)
> So Nigel did restart Gerrit, and this seems to be working now.

Any RCA? My commits are not there. Should I re-submit? Now?

Comment 5 Nigel Babu 2018-08-23 06:48:07 UTC
From gerrit's sshd_log:

[2018-08-22 19:07:52,728 +0000] 78847570 mykaul a/1001977 LOGIN FROM 127.0.0.1
[2018-08-22 19:08:07,718 +0000] 78847570 mykaul a/1001977 git-upload-pack./glusterfs 2ms 14372ms 0
[2018-08-22 19:08:08,088 +0000] 78847570 mykaul a/1001977 LOGOUT
[2018-08-22 19:08:15,459 +0000] 1873f98b mykaul a/1001977 LOGIN FROM 127.0.0.1
[2018-08-22 19:08:21,644 +0000] 1873f98b mykaul a/1001977 git-receive-pack./glusterfs 2ms 5568ms 0 git/2.17.1
[2018-08-22 19:08:21,845 +0000] 1873f98b mykaul a/1001977 LOGOUT
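
For readers who want to pull the timings out of an excerpt like this, here is a minimal sketch. It assumes the two millisecond fields after the command are the queue wait time and the execution time, followed by the exit status, which is how Gerrit's sshd_log is usually laid out; the regular expression and the function name `parse_commands` are illustrative, not part of Gerrit.

```python
import re

# Lines copied from the sshd_log excerpt above.
SSHD_LOG = """\
[2018-08-22 19:07:52,728 +0000] 78847570 mykaul a/1001977 LOGIN FROM 127.0.0.1
[2018-08-22 19:08:07,718 +0000] 78847570 mykaul a/1001977 git-upload-pack./glusterfs 2ms 14372ms 0
[2018-08-22 19:08:08,088 +0000] 78847570 mykaul a/1001977 LOGOUT
[2018-08-22 19:08:15,459 +0000] 1873f98b mykaul a/1001977 LOGIN FROM 127.0.0.1
[2018-08-22 19:08:21,644 +0000] 1873f98b mykaul a/1001977 git-receive-pack./glusterfs 2ms 5568ms 0 git/2.17.1
[2018-08-22 19:08:21,845 +0000] 1873f98b mykaul a/1001977 LOGOUT
"""

# Assumed field order: timestamp, session, user, account, command,
# wait time, execution time, exit status. LOGIN/LOGOUT lines carry no
# timings and therefore do not match.
COMMAND = re.compile(
    r"\[(?P<ts>[^\]]+)\] (?P<session>\S+) (?P<user>\S+) (?P<account>\S+) "
    r"(?P<op>\S+) (?P<wait_ms>\d+)ms (?P<exec_ms>\d+)ms (?P<status>-?\d+)"
)


def parse_commands(text):
    """Return one dict per log line that records a git command with timings."""
    commands = []
    for line in text.splitlines():
        m = COMMAND.search(line)
        if m is None:
            continue
        commands.append({
            "timestamp": m["ts"],
            "user": m["user"],
            "op": m["op"],
            "wait_ms": int(m["wait_ms"]),
            "exec_ms": int(m["exec_ms"]),
            "status": int(m["status"]),
        })
    return commands


for cmd in parse_commands(SSHD_LOG):
    print(f"{cmd['timestamp']} {cmd['op']}: exec {cmd['exec_ms']} ms, status {cmd['status']}")
```

Under that reading, both the fetch and the push finished with status 0 before the OOM kill was logged.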

From /var/log/messages
Aug 22 19:09:05 gerrit-new kernel: git invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
.
.
.
.
Aug 22 19:09:07 gerrit-new kernel: Out of memory: Kill process 11061 (java) score 388 or sacrifice child
Aug 22 19:09:07 gerrit-new kernel: Killed process 11061 (java) total-vm:3814660kB, anon-rss:1503912kB, file-rss:0kB, shmem-rss:0kB


It looks like pushing so many patches in one go caused Gerrit and git to consume large amounts of memory, which led to Gerrit being OOM-killed. It also looks like we don't have any swap space on this box. I've just added 1G of swap to reduce the chance of this happening again.
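
As a rough illustration of how to read the kernel lines quoted above, the sketch below extracts the killed process and converts its memory figures from kB to GiB. The regular expression and the function name `parse_oom_kills` are illustrative assumptions; the exact wording of OOM-killer messages varies between kernel versions, so the pattern is only known to match the excerpt in this bug.

```python
import re

# Lines copied from the /var/log/messages excerpt above.
MESSAGES = """\
Aug 22 19:09:07 gerrit-new kernel: Out of memory: Kill process 11061 (java) score 388 or sacrifice child
Aug 22 19:09:07 gerrit-new kernel: Killed process 11061 (java) total-vm:3814660kB, anon-rss:1503912kB, file-rss:0kB, shmem-rss:0kB
"""

# Matches the "Killed process" line format seen in this excerpt only.
KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

KB_PER_GIB = 1024 ** 2


def parse_oom_kills(text):
    """Return one dict per 'Killed process' line found in text."""
    kills = []
    for m in KILLED.finditer(text):
        kills.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "total_vm_gib": int(m["total_vm"]) / KB_PER_GIB,
            "anon_rss_gib": int(m["anon_rss"]) / KB_PER_GIB,
        })
    return kills


for kill in parse_oom_kills(MESSAGES):
    print(f"{kill['name']} (pid {kill['pid']}): "
          f"virtual {kill['total_vm_gib']:.2f} GiB, "
          f"resident {kill['anon_rss_gib']:.2f} GiB")
```

The `anon-rss` figure is the one to compare against the RAM on the box, since `total-vm` counts address space that was reserved but not necessarily in use.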

Yaniv, can you try pushing again?

Comment 7 Nigel Babu 2018-08-23 07:12:33 UTC
Alright, closing this bug as fixed, both the issue and the problem that caused it. If we get OOM-killed again, the next course of action is to increase the RAM available on this box. Currently we're at 4G, and we may just need 6G.

Comment 8 Yaniv Kaul 2018-08-23 07:38:29 UTC
Note, Gerrit is very slow. Commands take a lot of time. The UI also seems sluggish.

Comment 9 M. Scherer 2018-08-23 07:47:26 UTC
If swap is added, can it also be added in Ansible?

Comment 10 Yaniv Kaul 2018-08-23 07:55:46 UTC
It died again.

Comment 11 Yaniv Kaul 2018-08-23 07:56:18 UTC
Do you really want to submit the above commits?
Type 'yes' to confirm, other to cancel: yes
remote: 
remote: Processing changes: (\)
remote: Processing changes: updated: 15 (|)
remote: Processing changes: updated: 15 (/)
remote: Processing changes: updated: 15 (-)
remote: Processing changes: updated: 15 (\)
remote: Processing changes: updated: 15 (|)
remote: Processing changes: updated: 15 (/)
remote: Processing changes: updated: 15 (-)
remote: Processing changes: updated: 15 (-)
remote: Processing changes: updated: 15, done            
remote: (W) 6b1e5f8: commit subject >50 characters; use shorter first paragraph        
remote: (W) 67bda53: commit subject >50 characters; use shorter first paragraph        
remote: (W) 6ab7621: commit subject >50 characters; use shorter first paragraph        
remote: (W) 3064b37: commit subject >50 characters; use shorter first paragraph        
remote: (W) f73257c: commit subject >50 characters; use shorter first paragraph        
remote: (W) 41d1f6a: commit subject >50 characters; use shorter first paragraph        
remote: (W) 2ed19f0: commit subject >50 characters; use shorter first paragraph        
remote: (W) 7125acf: commit subject >50 characters; use shorter first paragraph        
remote: (W) fe85ce3: commit subject >50 characters; use shorter first paragraph        
remote: (W) 2254eed: commit subject >50 characters; use shorter first paragraph        
remote: (W) d15ba41: commit subject >50 characters; use shorter first paragraph        
remote: 
remote: Updated Changes:        
remote:   https://review.gluster.org/#/c/glusterfs/+/20919 {cli-cmd-parser|cli-rpc-ops||cli-xml-output}.c: strncpy()->sprintf(), reduce ...        
remote:   https://review.gluster.org/#/c/glusterfs/+/20920 {mount-common|fusermount|mount_darwin|umountd}.c: strncpy()->sprintf(), ...        
remote:   https://review.gluster.org/#/c/glusterfs/+/20921 extras/geo-rep/gsync-sync-gfid.c: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20922 multiple files: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20923 multiple files: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20924 bit-rot xlator: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20925 changelog xlator: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20926 changetimerecoder xlator: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20927 xlators: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20928 NFS server (mount3.c, nfs-inodes.c): strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20929 multiple xlators: move from strlen() to sizeof()        
remote:   https://review.gluster.org/#/c/glusterfs/+/20930 multiple xlators: strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20931 multiple xlators (mgmt): strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20932 multiple xlators (storage/posix): strncpy()->sprintf(), reduce strlen()'s        
remote:   https://review.gluster.org/#/c/glusterfs/+/20933 Various files: strncpy()->sprintf(), reduce strlen()'s        
remote: 
remote: Pushing to refs/publish/* is deprecated, use refs/for/* instead.        
To ssh://review.gluster.org/glusterfs
 * [new branch]          HEAD -> refs/publish/master/remove_strncpy2

Comment 12 Nigel Babu 2018-08-23 08:06:47 UTC
Alright, so 1 GB of swap isn't enough. Michael, can you give the VM 4 more GB of RAM? Please add another 10 Gig of disk space as well so we can have a larger swap partition.

For each patch, there's a git process started, there's at least one email sent out, and Gerrit triggers 5 to 6 Jenkins jobs for smoke. When this is done all at once for 10+ patches, this consumes quite a bit of memory. I remember we used to give Gerrit a lot of RAM and consciously cut it down since it was using it all the time.
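
To put numbers on that fan-out, here is a back-of-the-envelope sketch. The per-patch figures (one git process, at least one email, 5 to 6 Jenkins smoke jobs) are taken from the comment above, and the patch counts are the 16 new changes and 15 updated changes listed in the two pushes in this bug. The function name `push_fanout` is hypothetical, and the sketch counts work items only; it does not estimate memory.

```python
def push_fanout(patches, jobs_per_patch=(5, 6), emails_per_patch=1):
    """Estimate the work triggered by pushing `patches` changes at once.

    jobs_per_patch is a (low, high) range of Jenkins smoke jobs per patch;
    emails_per_patch is a lower bound, so the email count is a minimum.
    """
    low, high = jobs_per_patch
    return {
        "git_processes": patches,
        "min_emails": patches * emails_per_patch,
        "jenkins_jobs": (patches * low, patches * high),
    }


# First push listed 16 new changes, the retry listed 15 updated changes.
for patches in (16, 15):
    estimate = push_fanout(patches)
    low, high = estimate["jenkins_jobs"]
    print(f"{patches} patches: {estimate['git_processes']} git processes, "
          f">= {estimate['min_emails']} emails, {low} to {high} Jenkins jobs")
```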

Comment 13 M. Scherer 2018-08-23 08:10:13 UTC
It would need a reboot for that.

Comment 14 M. Scherer 2018-08-23 08:22:53 UTC
Also, I can't increase the root partition for some reason, and I need to figure out why, because LVM does say there is enough space.

I did change the configuration to allow a maximum of 8G of RAM, so a reboot is needed (not just a reboot from inside the system, but one from outside, i.e. a destroy/start of the VM in virsh).

Comment 16 Nigel Babu 2018-08-29 14:34:35 UTC
After the increase in RAM, this seems to work way better. Going to close the bug.

