The following machines are inactive or problematic:

nbslave71.cloud.gluster.org
nbslave72.cloud.gluster.org
nbslave74.cloud.gluster.org
nbslave75.cloud.gluster.org
nbslave7g.cloud.gluster.org
nbslave7i.cloud.gluster.org

We should bring up new machines and replace the lot, or even bring up just three to replace them.
The path of least resistance, according to Emmanuel, is to clone nbslave70. misc, what's the process for assigning domain names after I bring up new machines? (That bit can probably go directly into the docs.)
All of these machines have issues due to lack of disk space. Which folders do we normally clear out when the build machines run out of space?
Action items to solve this:

1. A script to clear out old builds and archives.
2. Attach 10-15 GB of block storage to all the netbsd machines so this stops being an issue. The time we spend debugging/fixing this is definitely worth more than the cost of the additional storage.
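A minimal sketch of what item 1 could look like. The retention window and the helper name are my assumptions, not an agreed policy; the directories it would run against are the /build and /archives trees mentioned later in this ticket.

```shell
#!/bin/sh
# cleanup_old_artifacts: delete files under a directory that are older
# than the given number of days, then prune empty subdirectories.
# (Hypothetical helper -- retention window to be agreed on.)
cleanup_old_artifacts() {
    dir=$1
    days=$2
    [ -d "$dir" ] || return 0
    # Remove regular files whose mtime is older than $days days.
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
    # Depth-first so nested empty directories are removed bottom-up.
    find "$dir" -depth -mindepth 1 -type d -empty -exec rmdir {} \;
    return 0
}

# Typical invocation on a slave would be something like:
# cleanup_old_artifacts /build 7
# cleanup_old_artifacts /archives 7
```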
We have lots of space left on our existing machines. From Emmanuel on gluster-infra@:

That is the case. disklabel xbd0 says:

#        size    offset     fstype [fsize bsize cpg/sgs]
 a:  19922881        63     4.2BSD   2048 16384     0  # (Cyl.      0*-   9727)
 b:   4194304  19922944       swap                     # (Cyl.   9728 -  11775)
 c:  20971457        63     unused      0     0        # (Cyl.      0*-  10239)
 d:  83886080         0     unused      0     0        # (Cyl.      0 -  40959)
 e:   8388608  24117248     4.2BSD   2048 16384     0  # (Cyl.  11776 -  15871)

NetBSD has a historic curiosity here: c is the NetBSD partition in the MBR, and d is the whole disk. This means you have 51380224 sectors of 512 bytes left after partition e: 24 GB.

Run disklabel -e xbd0 and add an f line:

 f:  51380161  32505856     4.2BSD   2048 16384     0

While there, it will not hurt to resize c (for the sake of clarity):

 c:  83886017        63     unused      0     0

And still while there, run fdisk -iau xbd0 to adjust the NetBSD partition size in the MBR. Then you can:

newfs /dev/rxbd0f
add /dev/xbd0f to /etc/fstab
mount /dev/xbd0f
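The sector arithmetic behind Emmanuel's f line can be checked directly from the disklabel output above (this is my own worked example, not part of the original mail): the free region starts where partition e ends, and runs to the end of partition d (the whole disk).

```shell
#!/bin/sh
# Verify the free-space figures quoted above, all in 512-byte sectors.
disk_size=83886080                 # d: the whole disk
e_offset=24117248                  # e: offset from the label
e_size=8388608                     # e: size from the label

f_offset=$((e_offset + e_size))    # first free sector after e
free=$((disk_size - f_offset))     # free sectors to the end of the disk
free_gb=$((free * 512 / 1024 / 1024 / 1024))

echo "f offset=$f_offset free=$free sectors (~${free_gb} GB)"
```

This gives offset 32505856 and 51380224 free sectors (~24 GB), matching the mail. The f line's size, 51380161, is 63 sectors smaller, mirroring the 63-sector offset convention already used by partitions a and c in this label.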
I've tried this on nbslave71, and I'm going to apply it to all the machines that are out of the rota for lack of disk space.
The following machines have had disks added:

nbslave71.cloud.gluster.org
nbslave72.cloud.gluster.org
nbslave74.cloud.gluster.org
nbslave75.cloud.gluster.org
nbslave77.cloud.gluster.org
nbslave7g.cloud.gluster.org

I've made /archives and /build symlinks to the new hard disk. I'll leave this open overnight to monitor what's going on on these machines.

Note: I've also changed the build timeout to 120 minutes and turned off concurrent builds (which should never have been enabled).
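The relocation step above could be sketched as a small helper: move a directory onto the newly mounted filesystem and leave a symlink at the old path. This is hypothetical; the ticket doesn't record the actual mountpoint or the exact commands used on the slaves.

```shell
#!/bin/sh
# relocate_to_disk: move $src onto the new disk mounted at $newroot and
# replace $src with a symlink. (Hypothetical helper, see note above.)
relocate_to_disk() {
    src=$1       # e.g. /build or /archives
    newroot=$2   # mountpoint of the new partition
    name=$(basename "$src")
    # Already a symlink: assume it was relocated earlier.
    [ -L "$src" ] && return 0
    mkdir -p "$newroot/$name"
    if [ -d "$src" ]; then
        # Preserve existing contents, then drop the original directory.
        cp -pR "$src/." "$newroot/$name/"
        rm -rf "$src"
    fi
    ln -s "$newroot/$name" "$src"
}
```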
Nithya has pointed out that we depend on the path being /build/install, because we do this:

df -h 2>&1 | sed 's#/build/install##' | grep -e "[[:space:]]/run/gluster/${V0}$" -e "[[:space:]]/var/run/gluster/${V0}$"

This test is marked as a bad test on master, so we haven't noticed mass failures yet. I've made the new disk mount as /build on nbslave77 to see if that solves the problem instead. If that goes well, I'll make the same change across all machines.
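The symlink breaks this because df reports the real mountpoint rather than the symlinked path, so the sed substitution never matches. Mounting the new filesystem directly at /build keeps the literal /build/install prefix in df output. A sketch of the corresponding /etc/fstab line, assuming the f partition from earlier in this ticket (the rw option and dump/fsck pass numbers here are assumptions, not copied from the actual machines):

```
/dev/xbd0f  /build  ffs  rw  1  2
```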
The bug has been fixed in http://review.gluster.org/#/c/14991

I've gone and fixed up the following machines as well:

nbslave79.cloud.gluster.org
nbslave7c.cloud.gluster.org
nbslave7h.cloud.gluster.org
nbslave7i.cloud.gluster.org
nbslave7j.cloud.gluster.org

The only machine left is netbsd7.cloud.gluster.org.
netbsd7.cloud.gluster.org is now deleted.