Bug 1656927
Summary: | PV(NFS) stop working after upgrade to 3.10.72 in atomic host |
---|---|
Product: | OpenShift Container Platform |
Component: | Storage |
Version: | 3.10.0 |
Status: | CLOSED CURRENTRELEASE |
Severity: | urgent |
Priority: | urgent |
Target Release: | 3.10.z |
Keywords: | Regression |
Hardware: | x86_64 |
OS: | Linux |
Type: | Bug |
Reporter: | Nicolas Nosenzo <nnosenzo> |
Assignee: | Jan Safranek <jsafrane> |
QA Contact: | Wenqi He <wehe> |
CC: | andcosta, aos-bugs, aos-storage-staff, david.schweikert, jokerman, jsafrane, lxia, mmccomas, sreber, thomas.schilling, tsmetana |
Last Closed: | 2019-01-11 10:14:52 UTC |
>> Merged into Origin 3.9, it will be part of the next OSE 3.9.z
> I think you meant *3.10*, just to avoid any confusion.

Sorry, I indeed meant 3.10.z.

> What we need to know is on whether the issue affects only OpenShift Container Platform 3.10 or 3.11

Only 3.10 is affected, 3.11 should be fine.

> if a fix can be pushed to become available within the next OpenShift Container Platform 3.10 Errata.

Yes, as noted above, the patch has been merged into 3.10.z.

The same issue actually also affects our 3.9 installation. Upgrading from 3.9.43 -> 3.9.57 broke it. Is there a chance this could be backported to the 3.9 branch (+ new patch release)?

> Is there a chance this could be backported to the 3.9 branch (+ new patch release)?

Tracked as https://bugzilla.redhat.com/show_bug.cgi?id=1663260

Tested on the version below:
openshift v3.10.97
kubernetes v1.10.0+b81c8f8

# uname -a
Linux ip-172-18-0-253.ec2.internal 3.10.0-862.11.6.el7.x86_64 #1 SMP Fri Aug 10 16:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release
Red Hat Enterprise Linux Atomic Host release 7.5

Pod using NFS volume is running.

# oc get pv -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2019-01-08T08:28:57Z
    finalizers:
    - kubernetes.io/pv-protection
    name: nfs-y49ir
    namespace: ""
    resourceVersion: "40305"
    selfLink: /api/v1/persistentvolumes/nfs-y49ir
    uid: 7117a724-131f-11e9-9a81-0e18e55051b0
  spec:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 5Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: nfsc
      namespace: y49ir
      resourceVersion: "40303"
      uid: 732b270c-131f-11e9-9a81-0e18e55051b0
    nfs:
      path: /
      server: 172.30.227.52
    persistentVolumeReclaimPolicy: Retain
  status:
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
nfs       1/1       Running   0          1h

This bug is fixed in Errata RHBA-2019:0026

Is this bugfix really released?
- RHBA-2019:0026 doesn't mention this bug
- The latest released version seems to be 3.10.89 and the test above was done with 3.10.97

Yes, .89 is the right release:

* Mon Dec 17 2018 AOS Automation Release Team <aos-team-art> 3.10.89-1
- UPSTREAM: 62304: Remove isNotDir error check (jsafrane)

Do you still experience any issues in this area?
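
The verification in the comments reads the PV state by eye from `oc get pv -o yaml`. On a cluster with many volumes, the same check could be scripted. The following is a minimal sketch, assuming the listing is fetched as JSON instead (`oc get pv -o json`) and parsed with `json.load`; the function name `nfs_volumes` and the shape of the returned rows are hypothetical and not part of any OpenShift tooling.

```python
def nfs_volumes(pv_list):
    """Summarize NFS-backed PersistentVolumes from a parsed `oc get pv -o json` listing.

    `pv_list` is the decoded JSON document (a dict with an "items" list).
    The row layout returned here is an illustrative choice, not an official format.
    """
    rows = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        nfs = spec.get("nfs")
        if nfs is None:
            # Not an NFS-backed volume; skip it.
            continue
        claim = spec.get("claimRef")
        rows.append({
            "name": pv["metadata"]["name"],
            "server": nfs.get("server"),
            "path": nfs.get("path"),
            # "namespace/name" of the bound claim, or None when unbound.
            "claim": f'{claim.get("namespace")}/{claim.get("name")}' if claim else None,
            "phase": pv.get("status", {}).get("phase"),
        })
    return rows
```

Applied to the PV shown in the verification comment, this would yield a single row for `nfs-y49ir` on server `172.30.227.52`, bound to claim `y49ir/nfsc`. Note that a `Bound` phase only says the claim is attached to the volume; whether the mount succeeds on the node still has to be confirmed from the pod status or the node journal.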
Created attachment 1512194: node journal logs

Description of problem:
After cluster upgrade to 3.10.72 (from 3.10.66), all the NFS backed PVs fail to be mounted from containers, even though the NFS export can be mounted from nodes without issues.

Error:
Dec 06 11:19:54 node5.example.net atomic-openshift-node[21705]: I1206 11:19:54.606296 21717 reconciler.go:252] operationExecutor.MountVolume started for volume "example-pv" (UniqueName: "kubernetes.io/nfs/52f4d86d-f93f-11e8-94a6-0050569df267-example-pv") pod "mortimer-db-7d595fc8fb-nf2w5" (UID: "52f4d86d-f93f-11e8-94a6-0050569df267")
Dec 06 11:19:54 node5.example.net atomic-openshift-node[21705]: I1206 11:19:54.608059 21717 nsenter.go:151] failed to resolve symbolic links on /var/lib/origin/openshift.local.volumes/pods/52f4d86d-f93f-11e8-94a6-0050569df267/volumes/kubernetes.io~nfs/example-pv: exit status 1
Dec 06 11:19:54 node5.example.net atomic-openshift-node[21705]: E1206 11:19:54.608142 21717 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/nfs/52f4d86d-f93f-11e8-94a6-0050569df267-example-pv\" (\"52f4d86d-f93f-11e8-94a6-0050569df267\")" failed. No retries permitted until 2018-12-06 11:21:56.608117627 +0100 CET m=+1159.964741283 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"example-pv\" (UniqueName: \"kubernetes.io/nfs/52f4d86d-f93f-11e8-94a6-0050569df267-example-pv\") pod \"mortimer-db-7d595fc8fb-nf2w5\" (UID: \"52f4d86d-f93f-11e8-94a6-0050569df267\") : exit status 1"

Version-Release number of selected component (if applicable):
Atomic Host: redhat-release-atomic-host-7.6-20180503.0.atomic.el7.1.x86_64
Docker: docker-1.13.1-84.git07f3374.el7.x86_64 / container-selinux-2.74-1
Ansible playbook used for the update: openshift-ansible-3.10.73-1.git.0.8b65cea.el7.noarch

How reproducible:
Can't reproduce in a RHEL hosted-cluster

Additional info:
- virt_use_nfs is enabled
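
The error text in the journal excerpt above, `MountVolume.SetUp failed for volume ...`, names both the volume and the pod, which makes it possible to list affected volumes across a larger node log. The following is a minimal sketch, assuming journal lines in the format quoted in this report (for example saved from `journalctl -u atomic-openshift-node`); the pattern `MOUNT_FAILURE` and the function `find_mount_failures` are hypothetical names introduced for illustration.

```python
import re

# Matches the kubelet error text quoted in this report. Inside the journal
# line the quotes are backslash-escaped, so an optional backslash is allowed
# before each quote.
MOUNT_FAILURE = re.compile(
    r'MountVolume\.SetUp failed for volume \\?"(?P<volume>[^"\\]+)\\?"'
    r'.*?pod \\?"(?P<pod>[^"\\]+)\\?"'
)


def find_mount_failures(lines):
    """Return (volume, pod) pairs for every mount failure found in `lines`."""
    failures = []
    for line in lines:
        match = MOUNT_FAILURE.search(line)
        if match:
            failures.append((match.group("volume"), match.group("pod")))
    return failures
```

Run over the three log lines in this report, only the `E1206` line would match, giving the pair `example-pv` and `mortimer-db-7d595fc8fb-nf2w5`. The `I1206` lines (mount started, symlink resolution failed) are deliberately ignored, since they do not carry the `MountVolume.SetUp failed` text.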