Description of problem: The sysroot partition is full, and it is mounted directly on a disk partition (no LVM).

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7.6G     0  7.6G   0% /dev
tmpfs           7.7G     0  7.7G   0% /dev/shm
tmpfs           7.7G  751M  6.9G  10% /run
tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/nvme0n1p3   15G   15G   20K 100% /sysroot
/dev/nvme0n1p2  976M  135M  774M  15% /boot
tmpfs           1.6G     0  1.6G   0% /run/user/1000

[root@ip-10-0-1-182 libexec]# lsblk
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0  16G  0 disk
├─nvme0n1p1 259:1    0   1M  0 part
├─nvme0n1p2 259:2    0   1G  0 part /boot
└─nvme0n1p3 259:3    0  15G  0 part /sysroot

I extended the EBS volume size (the setup is on AWS), then tried `oc debug node/` in order to run `/usr/libexec/coreos-growpart`.

Result: failed.
----------
oc debug node/ip-10-0-1-182.eu-west-3.compute.internal
Starting pod/ip-10-0-1-182eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`

Removing debug pod ...
Error from server (BadRequest): container "container-00" in pod "ip-10-0-1-182eu-west-3computeinternal-debug" is not available

I then SSH'd into the machine and tried to extend the partition there, which also failed:

[root@ip-10-0-1-182 libexec]# ./coreos-growpart /sysroot
mkdir: cannot create directory '/tmp/growpart.19846': No space left on device
FAILED: failed to make temp dir
meta-data=/dev/nvme0n1p3         isize=512    agcount=4, agsize=982848 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931392, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

There is no documentation on docs.openshift.com about how to extend a partition.

This is on OCP 4.1.4.
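A possible workaround sketch (my own, untested on this node): the only failing step is creating the temp directory on the full /tmp, and the cloud-utils growpart script that coreos-growpart wraps reads TMPDIR (an assumption about this RHCOS build), so pointing TMPDIR at a tmpfs that still has room, such as /run, may let the resize complete now that the EBS volume is larger:

    # Assumptions: /run (tmpfs) has free space and growpart honors TMPDIR here
    sudo TMPDIR=/run /usr/libexec/coreos-growpart /sysroot

If TMPDIR is not passed through, freeing even a few megabytes on /sysroot first (for example with `journalctl --vacuum-size=100M`) should let the unmodified script run.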
Thanks for the report.

These lines ...

> mkdir: cannot create directory '/tmp/growpart.19846': No space left on device
> FAILED: failed to make temp dir

seem to indicate that growpart was unsuccessful because there was no space left. Freeing up space first may allow you to run growpart.

Can you provide the deployment information (rpm-ostree status) as well as the size of container storage (sudo du -d 1 -ch /var/lib/containers/)? It's possible some space can be freed up in one of these locations. Checking podman-specific storage (sudo podman system df), looking at the images (sudo podman images) and containers (sudo podman ps -a) may also help find items which can be removed to reclaim space.
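For convenience, the suggested checks gathered into one pass, plus one hedged cleanup option (`podman system prune` is a standard podman subcommand, assuming it exists in the podman version shipped on this node; review what it offers to delete before confirming on a master):

    rpm-ostree status
    sudo du -d 1 -ch /var/lib/containers/
    sudo podman system df
    sudo podman images
    sudo podman ps -a
    # interactively reclaims stopped containers and dangling images
    sudo podman system prune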
How did you end up with a 15G / ?

Looking at the UPI on AWS documentation, it recommends sizing the root volume to 120G at deployment time. 15G is much too small for an OCP deployment.

Resources:
  Master0:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref RhcosAmi
      BlockDeviceMappings:
      - DeviceName: /dev/xvda
        Ebs:
          VolumeSize: "120"
          VolumeType: "gp2"
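Worth noting that the CloudFormation snippet above only sets the size at deployment time; for an already-running instance the EBS volume has to be enlarged in place, as the reporter did. A sketch with the standard AWS CLI, where the volume ID is a placeholder:

    # vol-0123456789abcdef0 is a placeholder; look up the real root volume ID first
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 120
    # the new space must then be claimed in-guest, e.g. via coreos-growpart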
(In reply to Steve Milner from comment #1)
> Thanks for the report.
>
> These lines ...
>
> > mkdir: cannot create directory '/tmp/growpart.19846': No space left on device
> > FAILED: failed to make temp dir
>
> seem to indicate that growpart was unsuccessful because there was no space
> left. Freeing up space first may allow you to run growpart.
>
> Can you provide the deployment information (rpm-ostree status) as well as
> the size of container storage (sudo du -d 1 -ch /var/lib/containers/)? It's
> possible some space can be freed up in one of these locations. Checking
> podman-specific storage (sudo podman system df), looking at the images (sudo
> podman images) and containers (sudo podman ps -a) may also help find items
> which can be removed to reclaim space.

I have lost the master now; I can't SSH into it or debug it. I will gather that information once it is recovered.
(In reply to Ben Breard from comment #2)
> How did you end up with a 15G / ?
>
> Looking at the UPI on AWS documentation, it recommends sizing the root
> volume to 120G at deployment time. 15G is much too small for an OCP
> deployment.
>
> Resources:
>   Master0:
>     Type: AWS::EC2::Instance
>     Properties:
>       ImageId: !Ref RhcosAmi
>       BlockDeviceMappings:
>       - DeviceName: /dev/xvda
>         Ebs:
>           VolumeSize: "120"
>           VolumeType: "gp2"

After checking, something is missing in the master configuration.
Sorry, I see now that the machine configuration already has the 120 GB setting:

apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  selfLink: >-
    /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/silver-56dsw-master-0
  resourceVersion: '27146009'
  name: silver-56dsw-master-0
  uid: 41613ac5-a80f-11e9-bd33-0a5dc4646e7c
  creationTimestamp: '2019-07-16T21:18:28Z'
  generation: 1
  namespace: openshift-machine-api
  finalizers:
    - machine.machine.openshift.io
  labels:
    machine.openshift.io/cluster-api-cluster: silver-56dsw
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      userDataSecret:
        name: master-user-data
      placement:
        availabilityZone: eu-west-3a
        region: eu-west-3
      credentialsSecret:
        name: aws-cloud-credentials
      instanceType: m5.xlarge
      metadata:
        creationTimestamp: null
      publicIp: null
      blockDevices:
        - ebs:
            iops: 0
            volumeSize: 120
            volumeType: gp2
      securityGroups:
        - filters:
            - name: 'tag:Name'
              values:
                - silver-56dsw-master-sg
      kind: AWSMachineProviderConfig
      loadBalancers:
        - name: silver-56dsw-ext
          type: network
        - name: silver-56dsw-int
          type: network
      tags:
        - name: kubernetes.io/cluster/silver-56dsw
          value: owned
        - name: auto_shut_bool
          value: 'True'
      deviceIndex: 0
      ami:
        id: ami-064c1a19b5600d4bb
      subnet:
        filters:
          - name: 'tag:Name'
            values:
              - silver-56dsw-private-eu-west-3a
      apiVersion: awsproviderconfig.openshift.io/v1beta1
      iamInstanceProfile:
        id: silver-56dsw-master-profile
status:
  lastUpdated: '2019-07-16T21:18:49Z'
  providerStatus:
    apiVersion: awsproviderconfig.openshift.io/v1beta1
    conditions:
      - lastProbeTime: '2019-07-16T21:18:49Z'
        lastTransitionTime: '2019-07-16T21:18:49Z'
        message: >-
          error launching instance: error getting subnet IDs: no subnet IDs
          were found
        reason: MachineCreationFailed
        status: 'True'
        type: MachineCreation
    kind: AWSMachineProviderStatus
Based on

> [root@ip-10-0-1-182 libexec]# lsblk
> NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> nvme0n1     259:0    0  16G  0 disk
> ├─nvme0n1p1 259:1    0   1M  0 part
> ├─nvme0n1p2 259:2    0   1G  0 part /boot
> └─nvme0n1p3 259:3    0  15G  0 part /sysroot

it doesn't look like there is a 120G device attached for image storage. The system itself, as noted, is 16G in total (15G for /), which, if it is also used for image storage, could run out of space quickly.
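For completeness, a manual equivalent of coreos-growpart once a little space has been freed and the backing disk has been enlarged (a sketch, assuming the layout shown above: partition 3 of /dev/nvme0n1 holding the XFS filesystem mounted at /sysroot):

    # grow partition 3 to fill the enlarged disk (cloud-utils growpart)
    sudo growpart /dev/nvme0n1 3
    # grow the XFS filesystem to fill the resized partition
    sudo xfs_growfs /sysroot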
The needinfo request(s) on this closed bug have been removed, as they have been unresolved for 1000 days.