Description of problem:
atomic-openshift-node service fails to (re)start on infra nodes

Version-Release number of selected component (if applicable):
3.9.14

How reproducible:
Every

Steps to Reproduce:
1. Install openshift
2. restart atomic-openshift-node on type=node nodes
3.

Actual results:
Fails with this error:

-- Unit run-94204.scope has begun starting up.
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal atomic-openshift-node[94192]: I0409 22:14:03.911813 94192 mount_linux.go:208] Detected OS with systemd
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal atomic-openshift-node[94192]: W0409 22:14:03.911981 94192 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal atomic-openshift-node[94192]: I0409 22:14:03.914041 94192 node.go:294] Starting openshift-sdn network plugin
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal atomic-openshift-node[94192]: F0409 22:14:03.920305 94192 node.go:108] Could not set up local quota, /var/lib/origin/openshift.local.volumes is not on a filesystem mounted with the grpquota option
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal dnsmasq[3290]: setting upstream servers from DBus
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal dnsmasq[3290]: using nameserver 172.31.0.2#53
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal systemd[1]: Failed to start OpenShift Node.
-- Subject: Unit atomic-openshift-node.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit atomic-openshift-node.service has failed.
--
-- The result is failed.
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal systemd[1]: Unit atomic-openshift-node.service entered failed state.
Apr 09 22:14:03 ip-172-31-73-199.us-east-2.compute.internal systemd[1]: atomic-openshift-node.service failed.

NB: "Could not set up local quota"
When did this become necessary? It wasn't an issue on <=3.9.7

Expected results:
Succeeds!

Additional info:
The node-config.yaml file:
Adding sdodson, since openshift-ansible should ensure that the grpquota mount option is set in the AMI during provisioning. The GCP provisioner already sets this correctly in roles/openshift_gcp/tasks/configure_gcp_base_image.yml.
We need to check whether /var is a separate mount: if so, set gquota on it; otherwise, set gquota on /. They're doing this for Azure too. I'm not sure we'll have a generic solution for all providers, as the filesystem mounts and types may vary. https://github.com/openshift/openshift-ansible/pull/7783/files#diff-076e901ea50a9e0fc07272be1e6d035dR37
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_gcp/tasks/configure_gcp_base_image.yml#L7
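The "/var if it is a mount, else /" rule described above could be sketched roughly as follows. This is only an illustration in Python, not what openshift-ansible or the linked tasks actually do: the function name add_group_quota and its parameters are hypothetical, it works on fstab text passed in as a string rather than editing /etc/fstab, and it assumes whitespace-separated fstab fields. The edited fstab only takes effect once the filesystem is mounted again.

```python
def add_group_quota(fstab_text: str, option: str = "grpquota") -> str:
    """Append `option` to the /var entry if one exists, otherwise to the / entry.

    Hypothetical helper for illustration; it only transforms text and never
    touches /etc/fstab itself.
    """
    lines = fstab_text.splitlines()

    # Map each mount point to the index of its fstab line, skipping comments.
    entries = {}
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 4:
            entries[fields[1]] = index

    # Prefer /var when it is its own mount, fall back to the root filesystem.
    target = "/var" if "/var" in entries else "/"
    if target not in entries:
        return fstab_text

    index = entries[target]
    fields = lines[index].split()
    options = fields[3].split(",")

    # gquota and grpquota are treated as the same request; do not add a duplicate.
    if not {"gquota", "grpquota"} & set(options):
        options.append(option)
        fields[3] = ",".join(options)
        lines[index] = " ".join(fields)

    trailing_newline = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

Calling it a second time on its own output leaves the text unchanged, so it would be safe to run repeatedly during provisioning.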
We don't specify any publicly available AMI or any publicly available steps for building the base AMI. The installer doesn't have any insight into what is being mounted where. If these options are required at install time, they should be built into the AMI, which I believe we discussed before. GCE's steps are specific to an image which is part of the CI job. Those steps might not be generally useful to people using other images, or at the least their images would have to conform to what the installer is doing. In order to fix this, we need the steps to create the AMI in question published so others can consume our build process, or we need a public AMI.
#
# /etc/fstab
# Created by anaconda on Thu Nov 16 21:00:13 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rootvg-rootvol / xfs defaults 0 0
UUID=a8e4daf2-900b-4719-a468-ea60b590a38c /boot ext4 defaults 1 2
/dev/mapper/rootvg-var /var xfs defaults 0 0

Adding 'grpquota' to the 'defaults' does the right thing and allows atomic-openshift-node to start successfully.
I've just deployed a 3.9.25 cluster and this problem is resolved. atomic-openshift-node restarts fine and the 'grpquota' option on xfs-formatted /var partitions is NOT required:

/dev/mapper/rootvg-var on /var type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/mapper/rootvg-var on /var/lib/docker/containers type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/mapper/rootvg-var on /var/lib/docker/devicemapper type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Tue 2018-05-22 15:24:09 UTC; 2min 39s ago
...
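For anyone who wants to check a node before restarting the service, here is a rough Python sketch of reading mount output like the above and reporting whether the filesystem that holds a given path carries a group quota option. It is an assumption-laden illustration, not how the node process performs its check: the helper names mount_options_for and has_group_quota are made up, and it expects lines in the "device on mountpoint type fstype (options)" form shown in this comment.

```python
import re

# Matches lines such as:
#   /dev/mapper/rootvg-var on /var type xfs (rw,relatime,noquota)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def mount_options_for(path, mount_output):
    """Return (mount_point, options) for the longest mount point containing path.

    Hypothetical helper; returns None when no line of mount_output matches.
    """
    best = None
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        mount_point = match.group(2)
        options = match.group(4).split(",")
        # "/var" must match "/var/lib/..." but not "/variable".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, options)
    return best


def has_group_quota(path, mount_output):
    """True if the filesystem holding path lists gquota or grpquota."""
    found = mount_options_for(path, mount_output)
    return found is not None and bool({"gquota", "grpquota"} & set(found[1]))
```

Fed the mount lines from this comment and the path /var/lib/origin/openshift.local.volumes, has_group_quota returns False, which matches the noquota option shown for /var.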
This needs to be fixed in whatever mechanism is provisioning and mounting the filesystems. This is not handled by openshift-ansible.