Bug 1627718

Summary:	freebsd-smoke jobs failing with "no space left" error
Product:	[Community] GlusterFS	Reporter:	Amar Tumballi <atumball>
Component:	project-infrastructure	Assignee:	Nigel Babu <nigelb>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	bugs, gluster-infra, mscherer, nigelb, srakonde
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-10-05 06:19:43 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Amar Tumballi 2018-09-11 10:16:14 UTC

Description of problem:

freebsd-smoke jobs are failing with a "no space left" error; see:
https://build.gluster.org/job/freebsd-smoke/30671/console

Version-Release number of selected component (if applicable):
master

How reproducible:
100% (2/2)

Steps to Reproduce:
1. Submit a patch and wait for the smoke job to trigger.

Actual results:
The job fails.

Expected results:
The job should pass.

Additional info:
https://build.gluster.org/job/freebsd-smoke/30671/console

Comment 1 Nigel Babu 2018-09-11 10:58:49 UTC

*** Bug 1627719 has been marked as a duplicate of this bug. ***

Comment 2 Nigel Babu 2018-09-11 11:01:15 UTC

This is now fixed.

The root cause is a postfix email that kept looping. misc, can we stop running a postfix server on the build servers? I don't think it is needed, and it is more likely to cause problems than to solve them.

Comment 3 M. Scherer 2018-09-11 11:19:22 UTC

I would rather keep postfix running, so that it can alert us if a cronjob fails in the future. However, I do not think the FreeBSD builder is set up for that.

So the issue is that /var was full because the logs kept growing?

I see that this loop has been going on for a few days, but I can't see exactly what happened. Since the mail has been removed from the queue (from what I see, /var/spool/clientmqueue was cleaned), I can't find out much about what is going on :/
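
A check like the one below could have flagged the full /var earlier. It is a minimal Python sketch and reports two things: how full the filesystem holding /var is, and how many files are sitting in /var/spool/clientmqueue. Both paths come from this report. The 90% threshold and the function names are hypothetical choices for illustration. They do not come from the builders' actual monitoring.

```python
import os
import shutil
from typing import List

# Hypothetical alert threshold, chosen only for illustration.
THRESHOLD_PERCENT = 90.0


def percent_used(path: str) -> float:
    """Percentage of the filesystem containing `path` that is currently used."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def queued_files(queue_dir: str) -> int:
    """Count regular files in a mail queue directory; 0 if the directory is absent.

    Reading a real client mail queue may require elevated privileges.
    """
    if not os.path.isdir(queue_dir):
        return 0
    with os.scandir(queue_dir) as entries:
        return sum(1 for e in entries if e.is_file(follow_symlinks=False))


def check_builder(var_dir: str = "/var",
                  queue_dir: str = "/var/spool/clientmqueue") -> List[str]:
    """Return human-readable warnings about disk usage and the client mail queue."""
    warnings = []
    used = percent_used(var_dir)
    if used >= THRESHOLD_PERCENT:
        warnings.append(f"{var_dir} is {used:.1f}% full")
    count = queued_files(queue_dir)
    if count:
        warnings.append(f"{count} message file(s) queued in {queue_dir}")
    return warnings
```

Run periodically, `check_builder()` would surface a growing queue before the disk fills. Because it only reads, it would not destroy the queue contents the way a cleanup would.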

Comment 4 M. Scherer 2018-09-11 11:24:01 UTC

OK, so I stopped sendmail on the FreeBSD builder, and I now have enough in the mail queue to see what is going on.

Comment 5 M. Scherer 2018-09-11 11:26:33 UTC

It seems to be a cronjob for saving entropy:

Subject: Cron <operator@freebsd0> /usr/libexec/save-entropy

And there is an error message:

Deferred: Operation timed out with [127.0.0.1]

Not sure what it is about :/
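
The "Deferred: Operation timed out with [127.0.0.1]" line most likely means sendmail could not complete a connection to a mail server on the local host. In that case the cron mail stays queued and keeps being retried. One way to test this hypothesis is to check whether anything answers on 127.0.0.1 at port 25. Port 25 is the standard SMTP port, but it is an assumption here that the builder submits to that port. The following is a minimal Python sketch, and the function names are hypothetical:

```python
import socket


def smtp_port_reachable(host: str = "127.0.0.1", port: int = 25,
                        timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts (socket.timeout is an OSError).
        return False


def smtp_banner(host: str = "127.0.0.1", port: int = 25,
                timeout: float = 5.0) -> str:
    """Return the first line a server sends after connecting (SMTP greets with '220 ...')."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(512)
    if not data:
        return ""
    return data.decode("ascii", errors="replace").splitlines()[0]
```

If `smtp_port_reachable()` returns False on the builder, the local MTA is not listening where the client expects it. In that case, every cron mail would be deferred and retried.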

Comment 6 M. Scherer 2018-09-11 11:43:08 UTC

So:

/var/db is owned by jenkins:jenkins, which is likely why various things fail on the builder.

I do not know why or when this happened, but I strongly think we should remove jenkins' sudo access if that caused the problem. I am going to fix the permissions and see what breaks.
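
A wrong owner on a system directory can be detected mechanically. The sketch below is minimal Python and assumes the expected owner of /var/db is root (uid 0), as on a stock install. That assumption should be confirmed for these builders. The function names are hypothetical, and the code only reports: it does not change any permissions.

```python
import grp
import os
import pwd


def owner_of(path: str) -> tuple:
    """Return (user, group) names for `path`, falling back to numeric ids."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


def unexpected_owners(paths, expected_uid: int = 0) -> dict:
    """Map each existing path whose owner uid differs from `expected_uid` to (user, group)."""
    result = {}
    for path in paths:
        if os.path.exists(path) and os.stat(path).st_uid != expected_uid:
            result[path] = owner_of(path)
    return result
```

On the affected builder, `unexpected_owners(["/var/db"])` would be expected to list /var/db with owner jenkins:jenkins. Running such a check after each job could help narrow down when the ownership changed.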

Comment 7 M. Scherer 2018-09-11 11:44:12 UTC

It seems to date back to 26 July, around 14:22.

Comment 8 Nigel Babu 2018-09-17 07:18:30 UTC

Please don't remove sudo access across the board. We depend on it for other tests. However, once distributed regressions are running, sudo access should be safe to remove.