1623508 – "archive: No space left on device" error on Jenkins

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	project-infrastructure
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Nigel Babu
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-08-29 14:07 UTC by Yaniv Kaul
Modified:	2018-10-03 04:11 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-10-03 04:11:16 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Yaniv Kaul 2018-08-29 14:07:29 UTC

Description of problem:
https://build.gluster.org/job/strfmt_errors/13347/console :

16:43:52 Building remotely on builder17.int.rht.gluster.org (smoke7 rpm7 regression7) in workspace /home/jenkins/root/workspace/strfmt_errors
...
16:53:02 INFO: Cleaning up build root ('cleanup_on_success=True')
16:53:02 Start: clean chroot
16:53:08 Finish: clean chroot
16:53:08 Finish: run
16:53:08 Archiving artifacts
16:53:08 ERROR: Step ‘Archive the artifacts’ aborted due to exception: 
16:53:08 java.nio.file.FileSystemException: /var/lib/jenkins/jobs/strfmt_errors/builds/13347/archive: No space left on device
16:53:08 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
16:53:08 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
16:53:08 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
16:53:08 	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
16:53:08 	at java.nio.file.Files.createDirectory(Files.java:674)
16:53:08 	at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
16:53:08 	at java.nio.file.Files.createDirectories(Files.java:767)
16:53:08 	at hudson.FilePath.mkdirs(FilePath.java:3102)
16:53:08 	at hudson.FilePath.readFromTar(FilePath.java:2458)
16:53:08 Caused: java.io.IOException: Failed to extract /home/jenkins/root/workspace/strfmt_errors/transfer of 5 files
16:53:08 	at hudson.FilePath.readFromTar(FilePath.java:2474)
16:53:08 	at hudson.FilePath.copyRecursiveTo(FilePath.java:2360)
16:53:08 	at jenkins.model.StandardArtifactManager.archive(StandardArtifactManager.java:61)
16:53:08 	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:235)
16:53:08 	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
16:53:08 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
16:53:08 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
16:53:08 	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
16:53:08 	at hudson.model.Build$BuildExecution.post2(Build.java:186)
16:53:08 	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:635)
16:53:08 	at hudson.model.Run.execute(Run.java:1823)
16:53:08 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
16:53:08 	at hudson.model.ResourceController.execute(ResourceController.java:97)
16:53:08 	at hudson.model.Executor.run(Executor.java:429)
16:53:08 Finished: FAILURE
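
The exception above is raised while creating the archive directory under /var/lib/jenkins, so the full filesystem is the one holding the Jenkins home, not the builder workspace. "No space left on device" can mean either exhausted blocks or exhausted inodes, so a quick check should look at both. The following is a minimal sketch of such a check; the function names and thresholds are illustrative assumptions, not part of the Gluster infrastructure.

```python
import os
import shutil


def space_report(path):
    """Return free bytes and free inodes for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)   # total, used, free (bytes)
    vfs = os.statvfs(path)            # Unix only
    # Some filesystems report 0 total inodes; treat that as "not reported".
    free_inodes = vfs.f_favail if vfs.f_files else None
    return {
        "free_bytes": usage.free,
        "free_fraction": usage.free / usage.total if usage.total else 0.0,
        "free_inodes": free_inodes,
    }


def has_room(path, min_free_bytes, min_free_inodes=1):
    """True when both blocks and inodes are above the given (assumed) limits."""
    report = space_report(path)
    inodes_ok = (report["free_inodes"] is None
                 or report["free_inodes"] >= min_free_inodes)
    return report["free_bytes"] >= min_free_bytes and inodes_ok
```

For example, `has_room("/var/lib/jenkins", 10 * 1024**3)` would return False once less than 10 GiB is free there; the 10 GiB figure is only a placeholder.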

Comment 1 Nigel Babu 2018-08-29 14:16:00 UTC

Huh, I was under the impression we added more space to Jenkins when we did a restart. Seemingly not. Going to correct this right away.

Comment 2 Nigel Babu 2018-08-29 14:33:16 UTC

Root cause of this is that we've run more jobs in the last 2 weeks than we normally do. This has blown away the space estimates that we had. For now I've deleted archives older than 2 weeks.

We need to do some clean up on that server so we have more free space. Once that's done we'll attach more space. Leaving the bug open to track the increased space.
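
The comment does not say how the archives were deleted. As an illustration only, the sketch below lists (and optionally removes) archive directories older than a cutoff. It assumes the jobs/<job>/builds/<number>/archive layout seen in the stack trace, and it uses the directory's modification time as a proxy for the build's age. The function names are hypothetical.

```python
import shutil
import time
from pathlib import Path


def old_archives(jobs_root, max_age_days=14, now=None):
    """Return archive directories whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    # Assumed layout: <jobs_root>/<job>/builds/<number>/archive
    for archive in Path(jobs_root).glob("*/builds/*/archive"):
        if archive.is_dir() and archive.stat().st_mtime < cutoff:
            found.append(archive)
    return sorted(found)


def prune(jobs_root, max_age_days=14, dry_run=True):
    """List old archives; delete them only when dry_run is False."""
    targets = old_archives(jobs_root, max_age_days)
    for archive in targets:
        if not dry_run:
            shutil.rmtree(archive)
    return targets
```

Running `prune("/var/lib/jenkins/jobs")` with the default `dry_run=True` only reports what would go. Jenkins also has a per-job "Discard old builds" setting, which may be a better long-term fit than an external script.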

Comment 3 M. Scherer 2018-08-29 15:05:03 UTC

We did add 40G, since the default was 150G and that's now 190G.

Comment 4 Yaniv Kaul 2018-08-29 15:26:37 UTC

(In reply to Nigel Babu from comment #2)
> Root cause of this is that we've run more jobs in the last 2 weeks than we
> normally do. This has blown away the space estimates that we had. For now
> I've deleted archives older than 2 weeks.
> 
> We need to do some clean up on that server so we have more free space. Once
> that's done we'll attach more space. Leaving the bug open to track the
> increased space.

I wonder if VDO can help...

I also think we need to look at shallow cloning:

[ykaul@ykaul tmp]$ time git clone ssh://mykaul.org/glusterfs
Cloning into 'glusterfs'...
remote: Counting objects: 2933, done
remote: Finding sources: 100% (71/71)
remote: Total 165164 (delta 0), reused 165119 (delta 0)
Receiving objects: 100% (165164/165164), 89.17 MiB | 2.43 MiB/s, done.
Resolving deltas: 100% (102537/102537), done.

real	0m52.042s
user	0m25.482s
sys	0m1.876s


[ykaul@ykaul tmp]$ du -ch glusterfs |grep total
124M	total

[ykaul@ykaul tmp]$ ls -lR glusterfs | wc -l
3764


[ykaul@ykaul tmp]$ time git clone --depth 1 ssh://mykaul.org/glusterfs
Cloning into 'glusterfs'...
remote: Counting objects: 2486, done
remote: Finding sources: 100% (2486/2486)
remote: Total 2486 (delta 86), reused 1325 (delta 86)
Receiving objects: 100% (2486/2486), 4.50 MiB | 1.56 MiB/s, done.
Resolving deltas: 100% (86/86), done.

real	0m10.380s
user	0m0.603s
sys	0m0.352s

[ykaul@ykaul tmp]$ du -ch glusterfs |grep total
35M	total

[ykaul@ykaul tmp]$ ls -lR glusterfs | wc -l
3764
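
To put the two runs above side by side, the figures can be reduced to relative savings. This is a small worked calculation using only the numbers printed in the transcript; the dictionary keys are made-up labels, and a single timing run is of course only indicative.

```python
# Figures copied from the two clone transcripts above.
FULL = {"download_mib": 89.17, "wall_s": 52.042, "on_disk_mib": 124}
SHALLOW = {"download_mib": 4.50, "wall_s": 10.380, "on_disk_mib": 35}


def savings(full, shallow):
    """Fraction saved by the shallow clone for each measured quantity."""
    return {key: 1 - shallow[key] / full[key] for key in full}


if __name__ == "__main__":
    for key, value in savings(FULL, SHALLOW).items():
        print(f"{key}: {value:.0%} less")
```

This works out to roughly 95% less data transferred, 80% less wall-clock time, and 72% less disk used, while `ls -lR` reports the same 3764 lines in both cases, since only history is dropped and the working tree is unchanged.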

Comment 5 M. Scherer 2018-08-29 15:35:22 UTC

VDO is still in beta, and it would require a new partition, and therefore spare space to copy the data onto. I doubt we can work miracles here. It would also be trading space for CPU; I'm not sure that's worthwhile.

The git clones are done on the builders, so I doubt that's going to decrease the space used on the Jenkins server itself.

Comment 6 Yaniv Kaul 2018-08-29 15:58:51 UTC

(In reply to M. Scherer from comment #5)
> VDO is still in beta, and it would require a new partition, and therefore
> spare space to copy the data onto. I doubt we can work miracles here. It
> would also be trading space for CPU; I'm not sure that's worthwhile.

VDO is NOT in beta. It was released as GA in RHEL 7.5 (I'm talking about VDO on our Jenkins hosts).

> 
> The git clones are done on the builders, so I doubt that's going to decrease
> the space used on the Jenkins server itself.

It's going to decrease time and space usage on the builders. That's important too. (In fact, I believe we should check out to /dev/shm, ensure the GCC temp dir is on /dev/shm, and compile there.)
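
The /dev/shm idea could look roughly like the sketch below. GCC places its temporary files in the directory named by the TMPDIR environment variable, so pointing TMPDIR at a RAM-backed directory covers the "GCC temp dir" part. The function name, the free-space threshold, and the fallback behaviour are assumptions for illustration. Keep in mind that anything written to /dev/shm counts against the builder's memory.

```python
import os
import shutil
import tempfile


def tmpfs_build_env(base="/dev/shm", min_free_bytes=1024**3):
    """Return (workdir, env) for building under `base`.

    workdir is None when `base` is missing or too full, in which case the
    caller should fall back to the normal workspace.
    """
    env = dict(os.environ)
    if not os.path.isdir(base) or shutil.disk_usage(base).free < min_free_bytes:
        return None, env
    workdir = tempfile.mkdtemp(prefix="build-", dir=base)
    env["TMPDIR"] = workdir   # GCC writes its temporary files here
    return workdir, env
```

A job would then run its shallow clone and the compile step with `cwd=workdir` and `env=env`, and remove `workdir` afterwards so the memory is released.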

Comment 7 M. Scherer 2018-08-29 16:10:53 UTC

Oh, indeed, I misread the title of the blog post, which was "VDO in RHEL 7.5 beta", as meaning VDO was in beta, when the beta was RHEL 7.5. But we would still need to make some heavy partition changes, and I'd rather avoid doing that right now. Longer term, yeah, compression could help a lot; I'm not sure about dedup.

As for the builders, that is irrelevant to this bug. If you wish, open a new one, but I am not sure we can do that, since the git clone is done by a Jenkins plugin, and maybe it doesn't support that.

But I also think that would be a minimal improvement to the regression run time; maybe reducing the time for a git clone could help by leaving fewer processes on the Gerrit side.

But again, let's not mix bugs, or it is gonna become a mess.

Comment 8 Nigel Babu 2018-10-03 04:11:16 UTC

All of the original problems with space are fixed, so I'm closing this bug now.
