Bug 1664876 - cloud-init Storage-Management Functionality Is Erasing Filesystems
Summary: cloud-init Storage-Management Functionality Is Erasing Filesystems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: cloud-init
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Otubo
QA Contact: Huijuan Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1687799
 
Reported: 2019-01-09 21:33 UTC by Thomas Jones
Modified: 2022-12-09 01:34 UTC
CC List: 12 users

Fixed In Version: cloud-init-18.2-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1687799 (view as bug list)
Environment:
Last Closed: 2020-04-15 12:14:00 UTC
Target Upstream Version:
Embargoed:



Description Thomas Jones 2019-01-09 21:33:42 UTC
Description of problem:

We use cloud-init as the provisioning-automation bootstrap for AWS-hosted resources. When we launch instances that include provisioning of secondary EBS volumes, cloud-init recently started to reformat the secondary EBS volumes on every system boot, whereas previously it would only do so on the first launch of an instance from a given AMI. This is obviously a major problem for us.


Version-Release number of selected component (if applicable):

cloud-init-18.2-1.el7_6.1

How reproducible: 100%


Steps to Reproduce:
1. Create an AMI with cloud-init-18.2-1.el7_6.1 pre-installed
2. Launch an instance from the AMI with a secondary EBS volume attached
3. Make use of a cloud-config block within the instance's UserData that contains storage-configuration directives similar to:

     bootcmd:
     - cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1
     mounts:
     - [ /dev/nvme1n1, /opt/gitlab ]

4. Use further automation to install application into the storage formatted via cloud-init
5. Reboot system
6. Log in to find that not only has the application failed to start (and the associated systemd unit is faulted), but the filesystem controlled by cloud-init has been reformatted during the reboot

Note that if the system is rebooted an arbitrary number of times, the filesystem managed by cloud-init will be reformatted with each and every reboot.
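One way to confirm the reformat from inside the instance (a sketch, assuming the ext4 filesystem created by the cloud-config above):

    # If mkfs ran again during the reboot, the superblock's creation
    # timestamp will have changed; compare the value before and after a reboot.
    tune2fs -l /dev/nvme1n1 | grep 'Filesystem created'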


Actual results:

Filesystem(s) controlled by cloud-init will have been reformatted at every reboot

Expected results:

Filesystem(s) should be formatted only at initial launch/boot (per the "instance" argument to "cloud-init-per", above)

Additional info:

Found that, coincident with this new behavior being exhibited, the previously missing ability to use cloud-init's "disk_setup" and "fs_setup" directives to format storage now works.

Comment 3 Vitaly Kuznetsov 2019-01-15 18:42:42 UTC
Hi Thomas,

I'm checking 'cloud-init-per' manually and it seems to work as expected:

ssh -i ...

[root@ip-10-1-152-147 ~]# cloud-init-per instance myecho echo 'RUN!' >> /var/lib/c-i-p
[root@ip-10-1-152-147 ~]# cat /var/lib/c-i-p
RUN!
[root@ip-10-1-152-147 ~]# cloud-init-per instance myecho echo 'RUN!' >> /var/lib/c-i-p
[root@ip-10-1-152-147 ~]# cat /var/lib/c-i-p
RUN!
(no second line, good)

[root@ip-10-1-152-147 ~]# reboot 

ssh -i ...

[root@ip-10-1-152-147 ~]# cloud-init-per instance myecho echo 'RUN!' >> /var/lib/c-i-p
[root@ip-10-1-152-147 ~]# cat /var/lib/c-i-p
RUN!
(again, no second line appeared - good).

Could you please do a few things to assist debugging:
1) Check what happened after the first and second runs, to be precise:

# ls -laZ /var/lib/cloud/instance/sem/bootper.*
# cat /var/lib/cloud/instance/sem/bootper.*

2) Try replacing 'instance' with 'once' to see if it makes a difference? (I know AWS instance ids don't change across reboots under normal circumstances but let's check regardless).

3) Try invoking 'cloud-init-per' from your userdata directly, just like an ordinary shell command and not as a cloud-config block? (See the sketch after this list.)

4) Attach your complete UserData script so we can try to replicate your environment?
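For item 3, something along these lines would do (untested sketch; the device and the 'mkfs-appvolume' label are copied from your cloud-config above):

    #!/bin/bash
    # Plain shell-script user-data that calls cloud-init-per directly,
    # bypassing the cloud-config bootcmd handling.
    cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1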

Thanks,

Comment 4 Frank Liang 2019-01-16 04:44:47 UTC
I am wondering whether this bug existed before.
Below are my steps; I could not reproduce this bug. The 2nd disk was only formatted once, at the first launch.

1. Create an AMI with the latest released RHEL 7.6.z, or only update cloud-init-18.2-1.el7_6.1
2. Add a 2nd standard EBS volume when launching a t3.large instance.
3. Add the below to the user data while launching the instance:
#cloud-config
bootcmd:
- cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1
mounts:
- [ /dev/nvme1n1, /opt/gitlab ]
4. Skipped installing the application and only wrote some data to /opt/gitlab.
[root@ip-10-116-1-61 gitlab]# lsblk
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:1    0  10G  0 disk 
├─nvme0n1p1 259:2    0   1M  0 part 
└─nvme0n1p2 259:3    0  10G  0 part /
nvme1n1     259:0    0   8G  0 disk /opt/gitlab

5. Reboot system
6. The data still exists in /opt/gitlab after a reboot or stop/start.
[root@ip-10-116-1-61 gitlab]# ls
lost+found  test2
[root@ip-10-116-1-61 gitlab]# lsblk
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:1    0  10G  0 disk 
├─nvme0n1p1 259:2    0   1M  0 part 
└─nvme0n1p2 259:3    0  10G  0 part /
nvme1n1     259:0    0   8G  0 disk /opt/gitlab

Comment 5 Thomas Jones 2019-01-16 20:25:35 UTC
Just wanted to reply back to prevent an auto-close:

Had a doctor's appointment today (thus unable to test) and am at a customer site all day tomorrow (thus unable to test). I'll see what I can do about providing further data points on Friday.

Comment 6 Thomas Jones 2019-01-18 16:30:29 UTC
Ok, looks like there's something in how the updated version of cloud-init parses the bootcmd (or, more likely, the cloud-init-per) logic:

    bootcmd:
    - cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1

Fails, but:

    bootcmd:
    - cloud-init-per instance appvolume mkfs -t ext4 /dev/nvme1n1

Works.

Comment 7 Vitaly Kuznetsov 2019-01-21 10:33:20 UTC
Thank you,

I see the issue now. 'cloud-init-per' is not compatible with the 'migrator' module in cloud-init, which renames all legacy semaphores in /var/lib/cloud/instance/sem/, replacing dashes with underscores, so 'cloud-init-per' fails to find its semaphore afterwards.
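Roughly what that looks like on disk (illustrative sketch; the exact file name assumes the usual bootper.<name>.<frequency> pattern):

    # First boot: cloud-init-per records that the dashed name has run.
    ls /var/lib/cloud/instance/sem/
    #   ... bootper.mkfs-appvolume.instance ...
    #
    # The 'migrator' module then rewrites dashes to underscores:
    #   bootper.mkfs_appvolume.instance
    #
    # Next boot: cloud-init-per checks for the dashed name again, finds no
    # semaphore, and re-runs the mkfs command.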

To avoid the issue you can also remove 'migrator' from the 'cloud_init_modules:' list in /etc/cloud/cloud.cfg.
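A quick way to confirm whether an instance is affected before applying that change (sketch; paths are the defaults):

    # If the bootper.* semaphores show underscores while the bootcmd uses
    # dashes, you are hitting the incompatibility described above.
    grep -n migrator /etc/cloud/cloud.cfg
    ls -la /var/lib/cloud/instance/sem/bootper.*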

I plan to submit a pull request upstream.

Comment 8 Vitaly Kuznetsov 2019-02-19 13:59:15 UTC
My pull request got merged upstream: https://code.launchpad.net/~vkuznets/cloud-init/+git/cloud-init/+merge/362024

Comment 9 Thomas Jones 2019-02-28 12:47:53 UTC
Awesome. Thanks for the update (and fix)!

Comment 12 Miroslav Rezanina 2019-03-05 17:17:06 UTC
Fix included in cloud-init-18.2-3.el7

Comment 18 Huijuan Zhao 2020-03-17 03:17:50 UTC
Hi Thomas,

As we did not reproduce the issue in comment 4 and the bug has not been updated for a long time, could I know if the issue still occurs in your environment?

@Vitaly, for the general issue you found in comment 7, I wonder if it is the same issue as the one in the reporter's description? If they are the same issue, we should have reproduced it in comments 3 and 4. If it is not the same issue, should we open a new bug to track the general issue from comment 7?

Thanks!

Comment 19 Vitaly Kuznetsov 2020-03-31 14:40:28 UTC
I'm failing to remember all the details, but AFAIR you need to provide a cloud-init userdata script
with dashes in the command name:

bootcmd:
     - cloud-init-per instance my-test-command /run/my/test/command

AND

have 'migrator' module enabled in cloud-init configuration.

Pre-patch, this was executed on every boot, which is unexpected ('cloud-init-per instance' is
supposed to execute only once).

Maybe something has changed since.
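To re-check on a current image, something like this should do (sketch; the fixed build is taken from comment 12):

    # Is the installed cloud-init at least cloud-init-18.2-3.el7, and is the
    # 'migrator' module still listed in cloud_init_modules?
    rpm -q cloud-init
    grep -n migrator /etc/cloud/cloud.cfg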

Comment 25 Thomas Jones 2020-06-29 22:56:28 UTC
We've since updated all of our automation to avoid the offending tokens, so we haven't had the issue since making those modifications part of our standard processes. If no code updates have occurred, presumably the problem is still present; we just aren't running into it any more.

Comment 26 rasrivas 2020-07-01 06:27:13 UTC
Hello Thomas,

I think you were not able to read my last comment as it was marked as private. I am from the Insights rule development team and am planning to write a customer-facing rule for this RHBZ. I tried to reproduce the above issue but was not able to reproduce it successfully. Can you please help review the below steps which I followed while trying to reproduce the issue?

I have followed below steps:

Step 1: 

# history 
 1 ip ad
 2 lsblk 
 3 cat /etc/fstab 
 4 fdisk /dev/sdb
 5 fdisk /dev/xvdb 
 6 lsblk 
 7 mkdir /opt/gitlab
 8 history
 
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 10G 0 disk 
├─xvda1 202:1 0 1M 0 part 
└─xvda2 202:2 0 10G 0 part /
xvdb 202:16 0 8G 0 disk 
└─xvdb1 202:17 0 8G 0 part 

# ls -la /opt/gitlab
total 0
drwxr-xr-x. 2 root root 6 May 27 10:05 .
drwxr-xr-x. 3 root root 20 May 27 10:05 ..


Step 2: stopped the instance and added the below parameters

bootcmd:
- cloud-init-per always mkfs-appvolume mkfs -t ext4 /dev/xvdb1
mounts:
- [ /dev/xvdb1, /opt/gitlab ]


Step 3: started the instance

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 10G 0 disk
├─xvda1 202:1 0 1M 0 part
└─xvda2 202:2 0 10G 0 part /
xvdb 202:16 0 8G 0 disk
└─xvdb1 202:17 0 8G 0 part
[ec2-user@ip-172-31-1-187 ~]$


Then I collected the logs which contain data related to "/dev/xvdb1".

Log related to /dev/xvdb1:


2020-05-27 10:14:07,759 - util.py[DEBUG]: Peeking at /dev/xvdb1 (max_bytes=512)

2020-05-27 10:14:07,760 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpMhL8OV'] with allowed return codes [0] (shell=False, capture=True)

2020-05-27 10:14:07,783 - util.py[DEBUG]: Failed mount of '/dev/xvdb1' as 'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpMhL8OV']

Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdb1,
               missing codepage or helper program, or other error
------------------------------------------------------------------------------------------------------------------------------------------

2020-05-27 10:14:07,784 - DataSourceOVF.py[DEBUG]: /dev/xvdb1 not mountable as iso9660

2020-05-27 10:14:36,674 - util.py[DEBUG]: Peeking at /dev/xvdb1 (max_bytes=512)

2020-05-27 10:14:36,675 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpme664c'] with allowed return codes [0] (shell=False, capture=True)

2020-05-27 10:14:36,704 - util.py[DEBUG]: Failed mount of '/dev/xvdb1' as 'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpme664c']

Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdb1,
               missing codepage or helper program, or other error

2020-05-27 10:14:36,704 - DataSourceOVF.py[DEBUG]: /dev/xvdb1 not mountable as iso9660


Thanks and Regards,

Rahul Srivastava

Comment 27 Thomas Jones 2020-07-01 12:15:04 UTC
I'm trying to dig through my commit-history for one of (the many) CloudFormation templates that had the offending logic (that was subsequently changed to work around the issue). Given the amount of time that's passed, I can't even remember which project I was working on that originally triggered this problem.

