Created attachment 1713489 [details]
aws cluster with relevant disks added to workers in us-east-2a and us-east-2b

Description of problem:

Not getting any symlinks to /dev/[a-z]+ in /dev/disk/by-id

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.5.3
Server Version: 4.5.6
Kubernetes Version: v1.18.3+002a51f

How reproducible:

2/2 times on default and m5.xlarge nodes

Steps to Reproduce:

#### 1. create an EBS volume for a worker node:

$ aws ec2 create-volume --availability-zone us-east-2b --size 50 --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=test-50-b},{Key=owner,Value=rohan}]'
{
    "AvailabilityZone": "us-east-2b",
    "CreateTime": "2020-09-02T17:19:22.000Z",
    "Encrypted": false,
    "Size": 50,
    "SnapshotId": "",
    "State": "creating",
    "VolumeId": "vol-02891676f4c9044e6",
    "Iops": 150,
    "Tags": [
        {
            "Key": "Name",
            "Value": "test-50-b"
        },
        {
            "Key": "owner",
            "Value": "rohan"
        }
    ],
    "VolumeType": "gp2"
}

#### 2. attach it:

$ aws ec2 attach-volume --volume-id vol-02891676f4c9044e6 --instance-id i-07f2272864f03d12a --device /dev/sdg
{
    "AttachTime": "2020-09-02T17:19:29.127Z",
    "Device": "/dev/sdg",
    "InstanceId": "i-07f2272864f03d12a",
    "State": "attaching",
    "VolumeId": "vol-02891676f4c9044e6"
}

#### 3. get a node shell and observe:

oc debug node/$MY_NODE_NAME
Starting pod/ip-10-0-161-76us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.161.76
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda                         202:0    0   120G  0 disk
|-xvda1                      202:1    0   384M  0 part /boot
|-xvda2                      202:2    0   127M  0 part /boot/efi
|-xvda3                      202:3    0     1M  0 part
`-xvda4                      202:4    0 119.5G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 119.5G  0 dm   /sysroot
xvdg                         202:96   0    50G  0 disk      #### <------ disk exists

#### 4. nothing in by-id:

sh-4.4# ls -l /dev/disk/by-id
total 0
lrwxrwxrwx. 1 root root 10 Sep 2 16:52 dm-name-coreos-luks-root-nocrypt -> ../../dm-0
find: File system loop detected; '/dev/fd/3/ostree/repo/extensions/rpmostree/pkgcache' is part of the same file system loop as '/dev/fd/3/ostree/repo'.
^C
sh-4.4# find -L /dev/ -samefile /dev/xvdg
/dev/xvdg
/dev/block/202:96
find: File system loop detected; '/dev/fd/3/ostree/repo/extensions/rpmostree/pkgcache' is part of the same file system loop as '/dev/fd/3/ostree/repo'.
^C
sh-4.4# find -L /dev/disk -samefile /dev/xvdg
sh-4.4#

Actual results:

No symlink for the attached disk (/dev/xvdg) under /dev/disk/by-id.

Expected results:

A /dev/disk/by-id symlink pointing at the attached disk should be created.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Attaching my openshift-dev cluster's kubeconfig and leaving it alive in case someone wants to RCA.
Using the same aws commands, I had never seen devices named xvda before.
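For triage, one way to narrow down where the link gets lost is to ask udev directly what it recorded for the new device. This is a minimal sketch run from the node shell; the device name /dev/xvdg is taken from the transcript above, and the exact output will differ per node:

```
# Dump udev's database entry for the disk; the S: lines / DEVLINKS property
# list every symlink udev created, and the ID_SERIAL* properties are what the
# by-id links are normally built from.
udevadm info --query=all --name=/dev/xvdg

# Watch udev events while replaying an "add" for just this device, to see
# whether the event fires and whether any rule attaches a by-id link.
udevadm monitor --udev --subsystem-match=block &
udevadm trigger --action=add --sysname-match=xvdg
```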
Spinning down the AWS cluster in 1h (when I go to sleep) if no one comments. This reproduced consistently for me.
These symlinks are being created by udev. I think the rules are owned by the RHCOS team. Reassigning accordingly.
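For reference, the /dev/disk/by-id links for whole disks typically come from the stock systemd persistent-storage rules that RHCOS ships read-only under /usr/lib/udev/rules.d (a specific build may layer extra vendor rules on top). A rough sketch for checking whether those rules matched the device from the original report:

```
# Show the conditions under which a by-id link is generated (typically a
# non-empty ID_SERIAL imported for the block device).
grep -n 'disk/by-id' /usr/lib/udev/rules.d/60-persistent-storage.rules

# Dry-run the rule processing for the device and print only the lines that
# mention by-id links or the serial-number properties.
udevadm test /sys/block/xvdg 2>&1 | grep -iE 'by-id|ID_SERIAL'
```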
I booted a single RHCOS (45.82.202008010929-0) node in AWS using m5.xlarge and performed the same steps for creating a volume and attaching it:

```
### Creating the disk
$ aws --profile rh-dev ec2 create-volume --availability-zone us-east-1d --size 50 --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=test-50-b},{Key=owner,Value=miabbott}]'
{
    "AvailabilityZone": "us-east-1d",
    "CreateTime": "2020-09-04T19:55:30.000Z",
    "Encrypted": false,
    "Size": 50,
    "SnapshotId": "",
    "State": "creating",
    "VolumeId": "vol-08c4f1d683d1906aa",
    "Iops": 150,
    "Tags": [
        {
            "Key": "Name",
            "Value": "test-50-b"
        },
        {
            "Key": "owner",
            "Value": "miabbott"
        }
    ],
    "VolumeType": "gp2"
}

### Attach
$ aws --profile rh-dev ec2 attach-volume --volume-id vol-08c4f1d683d1906aa --instance-id i-0c4483d23f8043f98 --device /dev/xvdg
{
    "AttachTime": "2020-09-04T19:56:49.461Z",
    "Device": "/dev/xvdg",
    "InstanceId": "i-0c4483d23f8043f98",
    "State": "attaching",
    "VolumeId": "vol-08c4f1d683d1906aa"
}

### Inspect node
$ rpm-ostree status
State: idle
Deployments:
● ostree://f9d88d07921009f524c39773d0935a7d1642a02bd37e0d621696bf4f766a0540
              Version: 45.82.202008010929-0 (2020-08-01T09:33:23Z)

[core@ip-172-18-2-114 ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                      259:0    0   16G  0 disk
├─nvme0n1p4                  259:1    0 15.5G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0 15.5G  0 dm   /sysroot
├─nvme0n1p1                  259:5    0  384M  0 part /boot
├─nvme0n1p2                  259:6    0  127M  0 part /boot/efi
└─nvme0n1p3                  259:7    0    1M  0 part

[core@ip-172-18-2-114 ~]$ journalctl -f
-- Logs begin at Fri 2020-09-04 19:50:22 UTC. --
Sep 04 19:53:50 ip-172-18-2-114 rpm-ostree[1816]: Reading config file '/etc/rpm-ostreed.conf'
Sep 04 19:53:50 ip-172-18-2-114 rpm-ostree[1816]: In idle state; will auto-exit in 64 seconds
Sep 04 19:53:50 ip-172-18-2-114 dbus-daemon[1584]: [system] Successfully activated service 'org.projectatomic.rpmostree1'
Sep 04 19:53:50 ip-172-18-2-114 systemd[1]: Started rpm-ostree System Management Daemon.
Sep 04 19:53:50 ip-172-18-2-114 rpm-ostree[1816]: Allowing active client :1.18 (uid 1000)
Sep 04 19:53:50 ip-172-18-2-114 rpm-ostree[1816]: client(id:cli dbus:1.18 unit:session-1.scope uid:1000) added; new total=1
Sep 04 19:53:50 ip-172-18-2-114 rpm-ostree[1816]: client(id:cli dbus:1.18 unit:session-1.scope uid:1000) vanished; remaining=0
Sep 04 19:53:50 ip-172-18-2-114 rpm-ostree[1816]: In idle state; will auto-exit in 60 seconds
Sep 04 19:53:50 ip-172-18-2-114 polkitd[1803]: Unregistered Authentication Agent for unix-process:1799:21153 (system bus name :1.15, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale C.UTF-8) (disconnected from bus)
Sep 04 19:54:51 ip-172-18-2-114 rpm-ostree[1816]: In idle state; will auto-exit in 64 seconds
Sep 04 19:56:49 ip-172-18-2-114 kernel: pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
Sep 04 19:56:49 ip-172-18-2-114 kernel: pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
Sep 04 19:56:49 ip-172-18-2-114 kernel: pci 0000:00:1f.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff]
Sep 04 19:56:49 ip-172-18-2-114 kernel: nvme nvme1: pci function 0000:00:1f.0
Sep 04 19:56:49 ip-172-18-2-114 kernel: nvme 0000:00:1f.0: enabling device (0000 -> 0002)
Sep 04 19:56:49 ip-172-18-2-114 kernel: PCI Interrupt Link [LNKC] enabled at IRQ 10
Sep 04 19:56:49 ip-172-18-2-114 kernel: nvme nvme1: 2/0/0 default/read/poll queues
^C

[core@ip-172-18-2-114 ~]$ dmesg | tail
[   22.557348] IPv6: ADDRCONF(NETDEV_UP): ens5: link is not ready
[   22.564505] IPv6: ADDRCONF(NETDEV_UP): ens5: link is not ready
[   22.569696] IPv6: ADDRCONF(NETDEV_CHANGE): ens5: link becomes ready
[  391.330290] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[  391.332570] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
[  391.335588] pci 0000:00:1f.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff]
[  391.338292] nvme nvme1: pci function 0000:00:1f.0
[  391.340271] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[  391.344109] PCI Interrupt Link [LNKC] enabled at IRQ 10
[  391.454416] nvme nvme1: 2/0/0 default/read/poll queues

[core@ip-172-18-2-114 ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                      259:0    0   16G  0 disk
├─nvme0n1p4                  259:1    0 15.5G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0 15.5G  0 dm   /sysroot
├─nvme0n1p1                  259:5    0  384M  0 part /boot
├─nvme0n1p2                  259:6    0  127M  0 part /boot/efi
└─nvme0n1p3                  259:7    0    1M  0 part
nvme1n1                      259:2    0   50G  0 disk

[core@ip-172-18-2-114 ~]$ ls -l /dev/disk/by-id
total 0
lrwxrwxrwx. 1 root root 10 Sep 4 19:50 dm-name-coreos-luks-root-nocrypt -> ../../dm-0
lrwxrwxrwx. 1 root root 13 Sep 4 19:56 nvme-Amazon_Elastic_Block_Store_vol08c4f1d683d1906aa -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Sep 4 19:50 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 13 Sep 4 19:56 nvme-nvme.1d0f-766f6c3038633466316436383364313930366161-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Sep 4 19:50 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Sep 4 19:50 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part4 -> ../../nvme0n1p4
```

I disconnected the volume and pivoted the node to 4.5.6 (RHCOS 45.82.202008101249-0) and repeated the test:

```
[core@ip-172-18-2-114 ~]$ rpm-ostree status
State: idle
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1bb71228ee1e7ae0522b566a42c2454af309de470a512d54a729e9cd2aae4604
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.82.202008101249-0 (2020-08-10T12:52:59Z)

  ostree://f9d88d07921009f524c39773d0935a7d1642a02bd37e0d621696bf4f766a0540
                   Version: 45.82.202008010929-0 (2020-08-01T09:33:23Z)

[core@ip-172-18-2-114 ~]$ lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                        259:0    0   16G  0 disk
├─nvme0n1p1                    259:1    0  384M  0 part /boot
├─nvme0n1p2                    259:2    0  127M  0 part /boot/efi
├─nvme0n1p3                    259:3    0    1M  0 part
└─nvme0n1p4                    259:4    0 15.5G  0 part
  └─coreos-luks-root-nocrypt   253:0    0 15.5G  0 dm   /sysroot

[core@ip-172-18-2-114 ~]$ journalctl -f
-- Logs begin at Fri 2020-09-04 19:50:22 UTC. --
Sep 04 20:22:32 ip-172-18-2-114 rpm-ostree[1457]: In idle state; will auto-exit in 63 seconds
Sep 04 20:22:32 ip-172-18-2-114 dbus-daemon[1238]: [system] Successfully activated service 'org.projectatomic.rpmostree1'
Sep 04 20:22:32 ip-172-18-2-114 systemd[1]: Started rpm-ostree System Management Daemon.
Sep 04 20:22:32 ip-172-18-2-114 rpm-ostree[1457]: Allowing active client :1.18 (uid 1000)
Sep 04 20:22:32 ip-172-18-2-114 rpm-ostree[1457]: client(id:cli dbus:1.18 unit:session-1.scope uid:1000) added; new total=1
Sep 04 20:22:32 ip-172-18-2-114 rpm-ostree[1457]: client(id:cli dbus:1.18 unit:session-1.scope uid:1000) vanished; remaining=0
Sep 04 20:22:32 ip-172-18-2-114 rpm-ostree[1457]: In idle state; will auto-exit in 63 seconds
Sep 04 20:22:32 ip-172-18-2-114 polkitd[1444]: Unregistered Authentication Agent for unix-process:1440:20757 (system bus name :1.15, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale C.UTF-8) (disconnected from bus)
Sep 04 20:22:38 ip-172-18-2-114 sshd[1462]: Received disconnect from 51.159.29.133 port 55088:11: Normal Shutdown, Thank you for playing [preauth]
Sep 04 20:22:38 ip-172-18-2-114 sshd[1462]: Disconnected from authenticating user root 51.159.29.133 port 55088 [preauth]
Sep 04 20:22:51 ip-172-18-2-114 sshd[1465]: Received disconnect from 51.159.29.133 port 34718:11: Normal Shutdown, Thank you for playing [preauth]
Sep 04 20:22:51 ip-172-18-2-114 sshd[1465]: Disconnected from authenticating user root 51.159.29.133 port 34718 [preauth]
Sep 04 20:22:56 ip-172-18-2-114 kernel: pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
Sep 04 20:22:56 ip-172-18-2-114 kernel: pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
Sep 04 20:22:56 ip-172-18-2-114 kernel: pci 0000:00:1f.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff]
Sep 04 20:22:56 ip-172-18-2-114 kernel: nvme nvme1: pci function 0000:00:1f.0
Sep 04 20:22:56 ip-172-18-2-114 kernel: nvme 0000:00:1f.0: enabling device (0000 -> 0002)
Sep 04 20:22:56 ip-172-18-2-114 kernel: PCI Interrupt Link [LNKC] enabled at IRQ 10
Sep 04 20:22:57 ip-172-18-2-114 kernel: nvme nvme1: 2/0/0 default/read/poll queues
^C

[core@ip-172-18-2-114 ~]$ dmesg | tail
[   10.161935] RPC: Registered rdma transport module.
[   10.167100] RPC: Registered rdma backchannel transport module.
[   10.474044] IPv6: ADDRCONF(NETDEV_UP): ens5: link is not ready
[  231.441283] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[  231.443563] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
[  231.446684] pci 0000:00:1f.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff]
[  231.449500] nvme nvme1: pci function 0000:00:1f.0
[  231.451255] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[  231.455150] PCI Interrupt Link [LNKC] enabled at IRQ 10
[  232.362089] nvme nvme1: 2/0/0 default/read/poll queues

[core@ip-172-18-2-114 ~]$ lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1                        259:0    0   16G  0 disk
├─nvme0n1p1                    259:1    0  384M  0 part /boot
├─nvme0n1p2                    259:2    0  127M  0 part /boot/efi
├─nvme0n1p3                    259:3    0    1M  0 part
└─nvme0n1p4                    259:4    0 15.5G  0 part
  └─coreos-luks-root-nocrypt   253:0    0 15.5G  0 dm   /sysroot
nvme1n1                        259:5    0   50G  0 disk

[core@ip-172-18-2-114 ~]$ ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx. 1 root root 10 Sep 4 20:19 dm-name-coreos-luks-root-nocrypt -> ../../dm-0
lrwxrwxrwx. 1 root root 13 Sep 4 20:22 nvme-Amazon_Elastic_Block_Store_vol08c4f1d683d1906aa -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Sep 4 20:19 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-Amazon_Elastic_Block_Store_vol0dc27fe9aab93e0bc-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 13 Sep 4 20:22 nvme-nvme.1d0f-766f6c3038633466316436383364313930366161-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Sep 4 20:19 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 nvme-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Sep 4 20:19 wwn-nvme.1d0f-766f6c3064633237666539616162393365306263-416d617a6f6e20456c617374696320426c6f636b2053746f7265-00000001-part4 -> ../../nvme0n1p4
```

So I'm unable to reproduce what you have experienced in a quick set of tests. I'm not sure what is happening with your node; could you attach the full journal output of the node that is having issues?
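For completeness, one quick way to capture that journal from the cluster side is sketched below; the node name is the one from the original report, so substitute as appropriate:

```
# Dump the affected node's full journal to a local file so the udev activity
# around the attach-volume call can be reviewed.
oc debug node/ip-10-0-161-76.us-east-2.compute.internal -- \
  chroot /host journalctl --no-pager > node-journal.txt
```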
Marking for an `UpcomingSprint` while we wait for additional info and triage.
Conservatively targeting 4.7 with low priority until more information is provided.
It did not reproduce when I tried again on another day :)
@Rohan thank you for responding. I will close this BZ. Please reopen if you encounter it again.