This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September, as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue can be found under "Links", has a little "two-footprint" icon next to it, and directs you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link is available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2227180 - [Azure] Encryption at host breaks mkfs.xfs
Summary: [Azure] Encryption at host breaks mkfs.xfs
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: xfsprogs
Version: 8.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-07-28 08:50 UTC by Klaas Demter
Modified: 2023-09-23 12:03 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-23 12:03:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
xfs_metadump /dev/sdb1 xfs_metadata.dump (19.47 KB, application/x-xz)
2023-07-28 14:28 UTC, Klaas Demter
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   RHEL-7987 0 None Migrated None 2023-09-23 12:03:30 UTC
Red Hat Issue Tracker RHELPLAN-163753 0 None None None 2023-07-28 08:51:50 UTC

Description Klaas Demter 2023-07-28 08:50:40 UTC
Description of problem:
After enabling encryption at host, the mkfs.xfs invoked by cloud-init no longer produces a mountable filesystem.


Version-Release number of selected component (if applicable):
xfsprogs-5.0.0-11.el8_8.x86_64
kernel 4.18.0-477.15.1.el8_8.x86_64


How reproducible:
After booting a current RHEL 8.8 system that was deallocated, cloud-init initializes the ephemeral disk. It calls mkfs.xfs on a partition it created according to my cloud-init user data:

#cloud-config
disk_setup:
  ephemeral0:
    table_type: gpt
    layout: [66, [33,82]]
    overwrite: true
fs_setup:
  - device: ephemeral0.1
    filesystem: xfs
    overwrite: true
  - device: ephemeral0.2
    filesystem: swap
mounts:
  - ["ephemeral0.1", "/mnt/resource"]
  - ["ephemeral0.2", "none", "swap", "sw", "0", "0"]



Steps to Reproduce:
1. Create a VM in Azure with the above user-data (an illustrative az CLI invocation is sketched below).
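
For illustration, a rough az CLI sketch of such a VM. The resource group, VM name, size, and user-data file name are placeholders; the image URN corresponds to the imageReference quoted under Additional info, and the EncryptionAtHost feature is assumed to already be registered for the subscription:

az feature register --namespace Microsoft.Compute --name EncryptionAtHost   # one-time per subscription
az vm create \
  --resource-group my-rg \
  --name rhel88-encathost-repro \
  --image RedHat:RHEL:8-lvm-gen2:latest \
  --size Standard_D4s_v3 \
  --encryption-at-host true \
  --custom-data cloud-init-userdata.yaml

Flipping --encryption-at-host between true and false is what toggles the behavior described below.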

Actual results:
Depending on the encryption at host setting, the resulting filesystem is either mountable or not.

Expected results:
mkfs.xfs produces a mountable filesystem both with encryption at host enabled and with it disabled.


Additional info:
I use the pay-as-you-go image:
      "imageReference": {
        "id": "",
        "offer": "RHEL",
        "publisher": "RedHat",
        "sku": "8-lvm-gen2",
        "version": "latest"
fully updated to all released errata as of today

Associated Red Hat support case (includes sos reports of both states): 03572027
Microsoft Support Case: 2307280050000960

Comment 1 Klaas Demter 2023-07-28 08:52:51 UTC
Actual mount error:
[root@hostname ~]# mount /mnt/resource/
mount: /mnt/resource: mount(2) system call failed: Structure needs cleaning.

Comment 2 Klaas Demter 2023-07-28 08:57:54 UTC
The command called by cloud-init during boot is:
2023-07-28 08:11:50,351 - subp.py[DEBUG]: Running command ['/usr/sbin/mkfs.xfs', '/dev/sdc1', '-f'] with allowed return codes [0] (shell=False, capture=True)
It succeeds in both cases (encryption at host enabled and disabled).
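
As an aside, the invocation above can be located on an affected system with something like the following (assuming cloud-init's default log location):

grep mkfs /var/log/cloud-init.log   # shows the subp.py "Running command" line quoted above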

Comment 3 Klaas Demter 2023-07-28 11:47:34 UTC
I can get it to mount by forcing log zeroing:

[root@hostname ~]# xfs_repair -v /dev/sdc1
Phase 1 - find and verify superblock...
        - block cache size set to 753200 entries
Phase 2 - using internal log
        - zero log...
totally zeroed log
zero_log: head block 0 tail block 0
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Fri Jul 28 11:36:25 2023

Phase           Start           End             Duration
Phase 1:        07/28 11:36:22  07/28 11:36:22
Phase 2:        07/28 11:36:22  07/28 11:36:25  3 seconds
Phase 3:        07/28 11:36:25  07/28 11:36:25
Phase 4:        07/28 11:36:25  07/28 11:36:25
Phase 5:        07/28 11:36:25  07/28 11:36:25
Phase 6:        07/28 11:36:25  07/28 11:36:25
Phase 7:        07/28 11:36:25  07/28 11:36:25

Total run time: 3 seconds
done
[root@hostname ~]# mount /mnt/resource/
mount: /mnt/resource: mount(2) system call failed: Structure needs cleaning.
[root@hostname ~]# xfs_repair -v -L /dev/sdc1
Phase 1 - find and verify superblock...
        - block cache size set to 753200 entries
Phase 2 - using internal log
        - zero log...
totally zeroed log
zero_log: head block 0 tail block 0
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Fri Jul 28 11:42:31 2023

Phase           Start           End             Duration
Phase 1:        07/28 11:42:28  07/28 11:42:28
Phase 2:        07/28 11:42:28  07/28 11:42:31  3 seconds
Phase 3:        07/28 11:42:31  07/28 11:42:31
Phase 4:        07/28 11:42:31  07/28 11:42:31
Phase 5:        07/28 11:42:31  07/28 11:42:31
Phase 6:        07/28 11:42:31  07/28 11:42:31
Phase 7:        07/28 11:42:31  07/28 11:42:31

Total run time: 3 seconds
done
[root@hostname ~]# mount /mnt/resource/
[root@hostname ~]#

Comment 4 Eric Sandeen 2023-07-28 14:13:47 UTC
Can you please provide the dmesg when mount fails, as well as an xfs_metadump of the problematic device created immediately after mkfs.xfs? (you can compress the metadump file so that it is hopefully small enough to attach.)

My first thought was that perhaps the encrypted device is not honoring FALLOC_FL_ZERO_RANGE, which we use for efficient zeroing, but we use that same mechanism when zeroing the log from xfs_repair.

The "totally zeroed log" message from repair also indicates that the log was in fact already completely zeroed out before xfs_repair tried to do it again.

So we need to know what was actually wrong with the filesystem which caused mount to fail; dmesg and metadump will hopefully give us what we need.
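
For reference, a rough sketch of capturing both pieces of information right after a failing boot (the device name is the one seen elsewhere in this report; adjust to the actual ephemeral partition):

dmesg -T > mount-failure-dmesg.txt         # kernel messages from the failed mount attempt
xfs_metadump /dev/sdc1 xfs_metadata.dump   # metadata-only dump, no file contents
xz xfs_metadata.dump                       # compress so it is small enough to attach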

Comment 7 Klaas Demter 2023-07-28 14:28:18 UTC
Created attachment 1980463 [details]
xfs_metadump /dev/sdb1 xfs_metadata.dump

Comment 8 Klaas Demter 2023-07-28 14:30:32 UTC
This has to be specific to cloud-init calling it, or to something happening during boot. If I run mkfs.xfs myself on the running system, it works and I can mount the result.
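
For comparison, the manual run that works looks roughly like this (same flags as the cloud-init invocation; /dev/sdb1 is the ephemeral partition in the dmesg below, and device letters can shift between boots):

/usr/sbin/mkfs.xfs /dev/sdb1 -f   # run by hand on the booted system
mount /dev/sdb1 /mnt/resource     # succeeds, unlike with the filesystem created during boot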

dmesg of failed mount:
[Jul28 14:23] XFS (sdb1): Mounting V5 Filesystem
[  +0.003285] XFS (sdb1): totally zeroed log
[  +6.971766] XFS (sdb1): Internal error head_block >= tail_block || head_cycle != tail_cycle + 1 at line 1656>
[  +0.009456] CPU: 2 PID: 8569 Comm: mount Kdump: loaded Not tainted 4.18.0-477.15.1.el8_8.x86_64 #1
[  +0.000004] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release >
[  +0.000001] Call Trace:
[  +0.000004]  dump_stack+0x41/0x60
[  +0.000007]  xfs_corruption_error+0x8b/0x90 [xfs]
[  +0.000068]  ? xlog_clear_stale_blocks+0x177/0x1c0 [xfs]
[  +0.000062]  ? xlog_verify_head+0xd4/0x190 [xfs]
[  +0.000060]  xlog_clear_stale_blocks+0x1a1/0x1c0 [xfs]
[  +0.000061]  ? xlog_clear_stale_blocks+0x177/0x1c0 [xfs]
[  +0.000060]  xlog_find_tail+0x20f/0x350 [xfs]
[  +0.000060]  xlog_recover+0x2b/0x160 [xfs]
[  +0.000060]  xfs_log_mount+0x28c/0x2b0 [xfs]
[  +0.000060]  xfs_mountfs+0x45e/0x8e0 [xfs]
[  +0.000063]  xfs_fs_fill_super+0x36c/0x6a0 [xfs]
[  +0.000060]  ? xfs_mount_free+0x30/0x30 [xfs]
[  +0.000060]  get_tree_bdev+0x18f/0x270
[  +0.000006]  vfs_get_tree+0x25/0xc0
[  +0.000003]  do_mount+0x2e9/0x950
[  +0.000005]  ? memdup_user+0x4b/0x80
[  +0.000002]  ksys_mount+0xbe/0xe0
[  +0.000003]  __x64_sys_mount+0x21/0x30
[  +0.000003]  do_syscall_64+0x5b/0x1b0
[  +0.000004]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  +0.000003] RIP: 0033:0x7f7985dc435e
[  +0.000003] Code: 48 8b 0d 2d 4b 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f >
[  +0.000002] RSP: 002b:00007ffc841c8358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  +0.000003] RAX: ffffffffffffffda RBX: 00005645dd8de5d0 RCX: 00007f7985dc435e
[  +0.000001] RDX: 00005645dd8e0970 RSI: 00005645dd8e1510 RDI: 00005645dd8e1530
[  +0.000001] RBP: 00007f7986c35184 R08: 0000000000000000 R09: 00005645dd8d9016
[  +0.000002] R10: 00000000c0ed0000 R11: 0000000000000246 R12: 0000000000000000
[  +0.000001] R13: 00000000c0ed0000 R14: 00005645dd8e1530 R15: 00005645dd8e0970
[  +0.000002] XFS (sdb1): Corruption detected. Unmount and run xfs_repair
[  +0.003330] XFS (sdb1): failed to locate log tail
[  +0.000001] XFS (sdb1): log mount/recovery failed: error -117
[  +0.000149] XFS (sdb1): log mount failed

The xfs_metadump output is attached; I also attached a full dd image of the complete filesystem to case 03572027.

Comment 9 Klaas Demter 2023-07-28 14:39:46 UTC
The metadata dump is xz-compressed.

Comment 10 Klaas Demter 2023-07-28 14:52:03 UTC
I just noticed I cut some lines in the dmesg output:
[Jul28 14:23] XFS (sdb1): Mounting V5 Filesystem
[  +0.003285] XFS (sdb1): totally zeroed log
[  +6.971766] XFS (sdb1): Internal error head_block >= tail_block || head_cycle != tail_cycle + 1 at line 1656 of file fs/xfs/xfs_log_recover.c.  Caller xlog_clear_stale_blocks+0x177/0x1c0 [xfs]
[  +0.009456] CPU: 2 PID: 8569 Comm: mount Kdump: loaded Not tainted 4.18.0-477.15.1.el8_8.x86_64 #1
[  +0.000004] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[  +0.000001] Call Trace:
[  +0.000004]  dump_stack+0x41/0x60
[  +0.000007]  xfs_corruption_error+0x8b/0x90 [xfs]
[  +0.000068]  ? xlog_clear_stale_blocks+0x177/0x1c0 [xfs]
[  +0.000062]  ? xlog_verify_head+0xd4/0x190 [xfs]
[  +0.000060]  xlog_clear_stale_blocks+0x1a1/0x1c0 [xfs]
[  +0.000061]  ? xlog_clear_stale_blocks+0x177/0x1c0 [xfs]
[  +0.000060]  xlog_find_tail+0x20f/0x350 [xfs]
[  +0.000060]  xlog_recover+0x2b/0x160 [xfs]
[  +0.000060]  xfs_log_mount+0x28c/0x2b0 [xfs]
[  +0.000060]  xfs_mountfs+0x45e/0x8e0 [xfs]
[  +0.000063]  xfs_fs_fill_super+0x36c/0x6a0 [xfs]
[  +0.000060]  ? xfs_mount_free+0x30/0x30 [xfs]
[  +0.000060]  get_tree_bdev+0x18f/0x270
[  +0.000006]  vfs_get_tree+0x25/0xc0
[  +0.000003]  do_mount+0x2e9/0x950
[  +0.000005]  ? memdup_user+0x4b/0x80
[  +0.000002]  ksys_mount+0xbe/0xe0
[  +0.000003]  __x64_sys_mount+0x21/0x30
[  +0.000003]  do_syscall_64+0x5b/0x1b0
[  +0.000004]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  +0.000003] RIP: 0033:0x7f7985dc435e
[  +0.000003] Code: 48 8b 0d 2d 4b 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 4a 38 00 f7 d8 64 89 01 48
[  +0.000002] RSP: 002b:00007ffc841c8358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  +0.000003] RAX: ffffffffffffffda RBX: 00005645dd8de5d0 RCX: 00007f7985dc435e
[  +0.000001] RDX: 00005645dd8e0970 RSI: 00005645dd8e1510 RDI: 00005645dd8e1530
[  +0.000001] RBP: 00007f7986c35184 R08: 0000000000000000 R09: 00005645dd8d9016
[  +0.000002] R10: 00000000c0ed0000 R11: 0000000000000246 R12: 0000000000000000
[  +0.000001] R13: 00000000c0ed0000 R14: 00005645dd8e1530 R15: 00005645dd8e0970
[  +0.000002] XFS (sdb1): Corruption detected. Unmount and run xfs_repair
[  +0.003330] XFS (sdb1): failed to locate log tail
[  +0.000001] XFS (sdb1): log mount/recovery failed: error -117
[  +0.000149] XFS (sdb1): log mount failed

Comment 11 Klaas Demter 2023-07-28 14:56:04 UTC
And just to make sure I got this across properly: this is not a one-off example. It happens 100% of the time, and I have this issue on hundreds of RHEL 8.8 VMs.

Comment 12 Klaas Demter 2023-07-28 15:01:10 UTC
Well, to be more precise: 100% of the cloud-init-initiated mkfs.xfs commands during boot :D But you get what I mean.

Comment 13 Eric Sandeen 2023-07-28 22:02:30 UTC
Klaas, I'm going to let support work with you from here on this; they scale better than I do for initial triage of customer problems.

My gut feeling is that something is misbehaving with the block device, not mkfs.xfs, but I'll let them see if they can work it out.

Thanks,
-Eric

Comment 14 Eric Sandeen 2023-08-04 15:38:40 UTC
I will note that after restoring the provided metadump to a filesystem image with xfs_mdrestore, it mounts (via loopback) without problem for me on the same RHEL 8 kernel version. This likely indicates a problem with the block device or environment, not with mkfs.xfs or the filesystem itself.
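
For reference, that check looks roughly like this (file names are placeholders):

xz -d xfs_metadata.dump.xz                        # decompress the attached metadump
xfs_mdrestore xfs_metadata.dump restored-fs.img   # rebuild a filesystem image from the metadata
mount -o loop restored-fs.img /mnt/test           # mounts without errors on the same RHEL 8 kernel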

Comment 15 Klaas Demter 2023-08-30 04:26:16 UTC
Microsoft got back to me with some more information; it seems Red Hat is not the only vendor affected:

Ubuntu 23: 6.2.0-1009-azure --> fails
Ubuntu 22: 5.15.0-1037-azure --> fails
Ubuntu 20: 5.15.0-1042-azure --> ok
Ubuntu 18: 5.4.0-1109-azure --> ok
Red Hat 9.2: 5.14.0-284.18.1.el9_2.x86_64 --> fails
Red Hat 8.8: 4.18.0-477.10.1.el8_8.x86_64 --> fails
Red Hat 7.7 SAP: 3.10.0-1062.52.2.el7.x86_64 --> ok
Red Hat 7.6 raw: 3.10.0-957.72.1.el7.x86_64 --> ok
SUSE 15 SP4: 5.14.21-150400.14.46-azure --> fails
SUSE 12 SP5: 4.12.14-16.139-azure --> ok

I am guessing this needs a very in-depth analysis that includes the Microsoft team that owns the encryption at host feature :) I have convinced Red Hat support to open a collaboration request with Microsoft via TSAnet. Let's see if we can finally get some results.

Comment 16 RHEL Program Management 2023-09-23 12:02:24 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 17 RHEL Program Management 2023-09-23 12:03:36 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to the RH Jira admins at rh-issues@redhat.com. You can also visit https://access.redhat.com/articles/7032570 for general account information.

