Bug 1627718

Summary: freebsd-smoke jobs failing with no-space-left error
Product: [Community] GlusterFS
Reporter: Amar Tumballi <atumball>
Component: project-infrastructure
Assignee: Nigel Babu <nigelb>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs, gluster-infra, mscherer, nigelb, srakonde
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-10-05 06:19:43 UTC
Type: Bug

Description Amar Tumballi 2018-09-11 10:16:14 UTC
Description of problem:

The freebsd-smoke jobs are failing with a no-space-left error; see:
https://build.gluster.org/job/freebsd-smoke/30671/console 

Version-Release number of selected component (if applicable):
master

How reproducible:
100% (2/2)

Steps to Reproduce:
1. Submit a patch and wait for the smoke job to trigger.

Actual results:
The job fails.

Expected results:
The job should pass.

Comment 1 Nigel Babu 2018-09-11 10:58:49 UTC
*** Bug 1627719 has been marked as a duplicate of this bug. ***

Comment 2 Nigel Babu 2018-09-11 11:01:15 UTC
This is now fixed.

The root cause is a postfix email that kept looping. misc, can we stop running a postfix server on the build servers? I don't think it is needed, and it is likely to cause more problems than it solves.

Comment 3 M. Scherer 2018-09-11 11:19:22 UTC
I would rather keep postfix running so that it alerts us if a cron job fails in the future. However, I do not think the FreeBSD builder is set up for that.

So the issue is that /var was full because the logs kept growing?

I can see that this loop has been going on for a few days, but I can't tell exactly what happened. Since the mail has been removed from the queue (from what I see, /var/spool/clientmqueue was cleaned), I can't find out much about what is going on :/
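For illustration only (nothing like this is described in the bug), a small Python check could warn when the mail queue grows or the filesystem holding it runs low on space, which is the kind of condition behind the no-space-left failures here. The helper names and both thresholds are hypothetical; /var/spool/clientmqueue is the queue directory mentioned above.

```python
import os
import shutil
from typing import List


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of files (excluding symlinks) under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # Skip symlinks so a link is not counted as a separate file.
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Queue files can disappear while the MTA is processing them.
                pass
    return total


def free_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def queue_alert(queue_dir: str, max_queue_bytes: int, min_free: float) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged.

    Both thresholds are hypothetical and would need tuning for each builder.
    """
    warnings = []
    size = dir_size_bytes(queue_dir)
    if size > max_queue_bytes:
        warnings.append(f"{queue_dir} holds {size} bytes of queued mail")
    free = free_fraction(queue_dir)
    if free < min_free:
        warnings.append(f"only {free:.1%} free on the filesystem holding {queue_dir}")
    return warnings


if __name__ == "__main__":
    queue = "/var/spool/clientmqueue"  # client queue directory named in this bug
    if os.path.isdir(queue):
        # Hypothetical limits: warn above 100 MiB of queued mail or below 10% free.
        for warning in queue_alert(queue, 100 * 1024 * 1024, 0.10):
            print("WARNING:", warning)
```

Run from cron, such a check would only help if its own output can be delivered somewhere, which is the same mail-delivery question raised in this comment.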

Comment 4 M. Scherer 2018-09-11 11:24:01 UTC
OK, so I stopped sendmail on FreeBSD, and I now have enough in the mail queue to see what is going on.

Comment 5 M. Scherer 2018-09-11 11:26:33 UTC
It seems to be a cron job for saving entropy:

Subject: Cron <operator@freebsd0> /usr/libexec/save-entropy

And there is an error message:

Deferred: Operation timed out with [127.0.0.1]

Not sure what it is about :/

Comment 6 M. Scherer 2018-09-11 11:43:08 UTC
So:

/var/db is owned by jenkins:jenkins, which is likely why various things fail on the builder.

I do not know why or when this happened, but I strongly think we should remove the jenkins user's sudo access if that caused the problem. I am going to fix the permissions and see what breaks.
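A minimal Python sketch for spotting this kind of ownership drift, assuming /var/db itself is normally owned by root (the usual FreeBSD layout, but an assumption here). The helper names are hypothetical, and the bug does not record exactly how the permissions were restored.

```python
import os
import pwd
from typing import List, Tuple


def owner_name(uid: int) -> str:
    """Map a UID to a user name, falling back to the numeric UID."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def unexpected_owners(path: str, expected_uid: int = 0) -> List[Tuple[str, str]]:
    """Return (path, owner) for `path` and its direct children whose owner UID
    differs from `expected_uid` (0 is root). Symlinks are checked, not followed."""
    candidates = [path] + [os.path.join(path, n) for n in sorted(os.listdir(path))]
    mismatches = []
    for candidate in candidates:
        try:
            st = os.lstat(candidate)
        except FileNotFoundError:
            continue  # entry was removed between listdir() and lstat()
        if st.st_uid != expected_uid:
            mismatches.append((candidate, owner_name(st.st_uid)))
    return mismatches


if __name__ == "__main__":
    target = "/var/db"
    if os.path.isdir(target):
        try:
            for entry, owner in unexpected_owners(target):
                print(f"{entry} is owned by {owner}, expected root")
        except PermissionError as exc:
            print(f"cannot inspect {target}: {exc}")
```

The check only reports; it does not change anything. Some children of /var/db may legitimately belong to other accounts (the entropy cron job above runs as operator, for instance), so a hit is a prompt to investigate rather than proof of a problem.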

Comment 7 M. Scherer 2018-09-11 11:44:12 UTC
This seems to date back to 26 July, around 14:22.

Comment 8 Nigel Babu 2018-09-17 07:18:30 UTC
Please don't remove sudo access across the board. We depend on that for other tests. However, once distributed regressions are running, sudo access should be safe to remove.