Bug 1627718 - freebsd-smoke jobs failing with "no space left" error
Summary: freebsd-smoke jobs failing with "no space left" error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nigel Babu
QA Contact:
URL:
Whiteboard:
Duplicates: 1627719
Depends On:
Blocks:
Reported: 2018-09-11 10:16 UTC by Amar Tumballi
Modified: 2018-10-05 06:19 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-10-05 06:19:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Amar Tumballi 2018-09-11 10:16:14 UTC
Description of problem:

The freebsd-smoke jobs are failing with a "no space left" error. Reference:
https://build.gluster.org/job/freebsd-smoke/30671/console 

Version-Release number of selected component (if applicable):
master

How reproducible:
100% (2/2)

Steps to Reproduce:
1. Submit a patch and wait for the smoke job to trigger.

Actual results:
job fails

Expected results:
job should pass

Additional info:
https://build.gluster.org/job/freebsd-smoke/30671/console

Comment 1 Nigel Babu 2018-09-11 10:58:49 UTC
*** Bug 1627719 has been marked as a duplicate of this bug. ***

Comment 2 Nigel Babu 2018-09-11 11:01:15 UTC
This is now fixed.

The root cause is a postfix email that kept looping. misc, can we stop running a postfix server on the build servers? I don't think it is needed, and it is most likely going to cause more problems than it solves.

Comment 3 M. Scherer 2018-09-11 11:19:22 UTC
I would rather keep postfix running, so that it alerts us if a cronjob fails in the future. However, I do not think the freebsd builder is set up for that.

So the issue is that /var was full because the logs kept growing?

I see that this loop has been going on for a few days, but I can't see exactly what happened. Since the mail has been removed from the queue (from what I see, /var/spool/clientmqueue was cleaned), I can't find much about what is going on :/
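
For reference, here is a minimal sketch of how one might check whether a filesystem such as /var is full and which subdirectories take the most space. It is not taken from the builder itself: the helper names (`usage_percent`, `dir_size`, `largest_subdirs`) are hypothetical, and it uses only the Python standard library.

```python
import os
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def dir_size(path):
    """Sum the sizes of all files under `path`, skipping entries that vanish or are unreadable."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File disappeared or cannot be inspected; ignore it.
                pass
    return total


def largest_subdirs(path, top=5):
    """Return up to `top` (subdirectory, size in bytes) pairs, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Running `usage_percent("/var")` and `largest_subdirs("/var")` as root would show how full the filesystem is and whether the space is going to a log directory or to a mail spool such as /var/spool/clientmqueue.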

Comment 4 M. Scherer 2018-09-11 11:24:01 UTC
OK, so I stopped sendmail on freebsd, and I have enough in the mail queue to see what is going on.

Comment 5 M. Scherer 2018-09-11 11:26:33 UTC
Seems to be a cronjob for saving entropy:

Subject: Cron <operator@freebsd0> /usr/libexec/save-entropy

And there is an error message:

Deferred: Operation timed out with [127.0.0.1]

Not sure what it is about :/

Comment 6 M. Scherer 2018-09-11 11:43:08 UTC
So:

/var/db is owned by jenkins:jenkins, which is likely why various things fail on the builder.

Why and when this happened, I do not know, but I strongly think we should remove jenkins' sudo access if that caused the problem. I am going to fix the permissions and see what breaks.
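
For reference, a minimal sketch of how the ownership of a directory such as /var/db could be checked from a script. The helper names (`owner_of`, `owner_matches`) are hypothetical, and the expected owner in the usage note (root:wheel) is an assumption about a stock FreeBSD install, not something stated in this bug.

```python
import grp
import os
import pwd


def owner_of(path):
    """Return (user, group) names for `path`, falling back to numeric ids as strings."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; report the number instead.
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # No group entry for this gid; report the number instead.
        group = str(st.st_gid)
    return user, group


def owner_matches(path, expected_user, expected_group):
    """Return True if `path` is owned by expected_user:expected_group."""
    return owner_of(path) == (expected_user, expected_group)
```

For example, `owner_matches("/var/db", "root", "wheel")` would return False on a builder where the directory is owned by jenkins:jenkins.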

Comment 7 M. Scherer 2018-09-11 11:44:12 UTC
Seems to date back to 26 July, around 14:22.

Comment 8 Nigel Babu 2018-09-17 07:18:30 UTC
Please don't remove sudo access across the board. We depend on that for other tests. However, once distributed regressions are running, sudo access should be safe to remove.

