Bug 1664876
| Field | Value |
|---|---|
| Summary | cloud-init Storage-Management Functionality Is Erasing Filesystems |
| Product | Red Hat Enterprise Linux 7 |
| Component | cloud-init |
| Version | 7.6 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | urgent |
| Priority | high |
| Reporter | Thomas Jones <redhat> |
| Assignee | Eduardo Otubo <eterrell> |
| QA Contact | Huijuan Zhao <huzhao> |
| CC | eterrell, huzhao, jgreguske, linl, loren, rasrivas, ribarry, vkuznets, wshi, xiachen, xiliang, yacao |
| Target Milestone | rc |
| Target Release | --- |
| Fixed In Version | cloud-init-18.2-3.el7 |
| Clones | 1687799 (view as bug list) |
| Bug Blocks | 1687799 |
| Type | Bug |
| Last Closed | 2020-04-15 12:14:00 UTC |
Hi Thomas, I'm checking 'cloud-init-per' manually and it seems to work as expected:

ssh -i ...
[root@ip-10-1-152-147 ~]# cloud-init-per instance myecho echo 'RUN!' >> /var/lib/c-i-p
[root@ip-10-1-152-147 ~]# cat /var/lib/c-i-p
RUN!
[root@ip-10-1-152-147 ~]# cloud-init-per instance myecho echo 'RUN!' >> /var/lib/c-i-p
[root@ip-10-1-152-147 ~]# cat /var/lib/c-i-p
RUN!
(no second line - good)
[root@ip-10-1-152-147 ~]# reboot

ssh -i ...
[root@ip-10-1-152-147 ~]# cloud-init-per instance myecho echo 'RUN!' >> /var/lib/c-i-p
[root@ip-10-1-152-147 ~]# cat /var/lib/c-i-p
RUN!
(again, no second line appeared - good)

Could you please do a few things to assist debugging:
1) Check what happened after the first and second runs; to be precise:
# ls -laZ /var/lib/cloud/instance/sem/bootper.*
# cat /var/lib/cloud/instance/sem/bootper.*
2) Try replacing 'instance' with 'once' to see if it makes a difference. (I know AWS instance IDs don't change across reboots under normal circumstances, but let's check regardless.)
3) Try invoking 'cloud-init-per' from your userdata directly, as an ordinary shell command rather than from a cloud-config block.
4) Attach your complete UserData script so we can try to replicate your environment.
Thanks,

Thanks. I wonder whether this bug existed before. Below are my steps; I cannot reproduce the bug - the second disk was formatted only once, at the first launch.
1. Create an AMI with the latest released RHEL 7.6.z, or update only cloud-init-18.2-1.el7_6.1.
2. Add a second standard EBS volume when launching a t3.large instance.
3. Add the following to the user data while launching the instance:
#cloud-config
bootcmd:
- cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1
mounts:
- [ /dev/nvme1n1, /opt/gitlab ]
4. Skip installing the application; just write some data to /opt/gitlab.
[root@ip-10-116-1-61 gitlab]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 10G 0 disk
├─nvme0n1p1 259:2 0 1M 0 part
└─nvme0n1p2 259:3 0 10G 0 part /
nvme1n1 259:0 0 8G 0 disk /opt/gitlab
5. Reboot the system.
6. The data still exists in /opt/gitlab after a reboot or a stop/start:
[root@ip-10-116-1-61 gitlab]# ls
lost+found test2
[root@ip-10-116-1-61 gitlab]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 10G 0 disk
├─nvme0n1p1 259:2 0 1M 0 part
└─nvme0n1p2 259:3 0 10G 0 part /
nvme1n1 259:0 0 8G 0 disk /opt/gitlab

Just wanted to reply back to prevent an auto-close: I had a doctor's appointment today (thus unable to test) and am at a customer site all day tomorrow (thus unable to test). I'll see what I can do about providing further data points on Friday.

OK, it looks like there's something in how the updated version of cloud-init parses the bootcmd (or, more likely, the cloud-init-per) logic:
bootcmd:
- cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1
Fails, but:
bootcmd:
- cloud-init-per instance appvolume mkfs -t ext4 /dev/nvme1n1
Works.
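As background for the fails/works contrast above: 'cloud-init-per' decides whether to run by checking for a semaphore file, and the paths below match the bootper.* files inspected in the earlier debugging steps. This is only an illustrative sketch - sem_path and should_run are my own names, not cloud-init functions, and the 'once' frequency (stored in a different, cross-instance location) is omitted:

```python
import os

# /var/lib/cloud/instance is a symlink to the current instance's state dir,
# so 'instance'-frequency semaphores reset when the instance-id changes
# but persist across reboots of the same instance.
SEM_DIR = "/var/lib/cloud/instance/sem"

def sem_path(name, freq):
    # e.g. bootper.mkfs-appvolume.instance -- the files listed by
    # 'ls -laZ /var/lib/cloud/instance/sem/bootper.*' above
    return os.path.join(SEM_DIR, "bootper.%s.%s" % (name, freq))

def should_run(name, freq):
    if freq == "always":
        return True  # 'always' is never skipped
    # 'instance': run only if no semaphore was dropped on a previous boot
    return not os.path.exists(sem_path(name, freq))

print(sem_path("myecho", "instance"))
# /var/lib/cloud/instance/sem/bootper.myecho.instance
```

On this model, a command name containing dashes produces a dashed semaphore filename, which is exactly what the rename described below interferes with.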
Thank you, I see the issue now. 'cloud-init-per' is not compatible with the 'migrator' module in cloud-init: 'migrator' renames all legacy semaphores in /var/lib/cloud/instance/sem/, replacing dashes with underscores, and 'cloud-init-per' then fails to find its semaphore. To avoid the issue you can also remove 'migrator' from 'cloud_init_modules:' in /etc/cloud/cloud.cfg. I plan to submit a pull request upstream.

My pull request got merged upstream: https://code.launchpad.net/~vkuznets/cloud-init/+git/cloud-init/+merge/362024

Awesome. Thanks for the update (and fix)!

Fix included in cloud-init-18.2-3.el7

Hi Thomas, as we did not reproduce the issue in comment 4 and the bug has not been updated for a long time, could you tell us whether the issue still occurs in your environment? @Vitaly, regarding the general issue you found in comment 7, I wonder whether it is the same issue the reporter describes. If it were the same, we should have reproduced it in comments 3 and 4. If it is not, should we open a new bug to track the general issue from comment 7? Thanks!

I'm failing to remember all the details, but AFAIR you need to provide a cloud-init userdata script
with dashes in command name:
bootcmd:
- cloud-init-per instance my-test-command /run/my/test/command
AND
have 'migrator' module enabled in cloud-init configuration.
Pre-patch, such a command was executed on every boot, which is unexpected ('cloud-init-per instance' is
supposed to execute only once).
Maybe something has changed since.
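The interaction described above can be sketched as a tiny simulation. This is a hypothetical stand-in for the real scripts, not cloud-init's code: the semaphore naming follows the bootper.* paths mentioned earlier, and migrator_rename/cloud_init_per are illustrative helpers:

```python
import os
import tempfile

def migrator_rename(sem_dir):
    # The 'migrator' module rewrites legacy semaphore names,
    # replacing dashes with underscores.
    for name in os.listdir(sem_dir):
        fixed = name.replace("-", "_")
        if fixed != name:
            os.rename(os.path.join(sem_dir, name), os.path.join(sem_dir, fixed))

def cloud_init_per(sem_dir, name, freq, ran):
    # Pre-patch, cloud-init-per looked for the semaphore under the literal
    # (dashed) name, so it missed the renamed file and re-ran the command.
    sem = os.path.join(sem_dir, "bootper.%s.%s" % (name, freq))
    if os.path.exists(sem):
        return False            # already ran for this instance; skip
    ran.append(name)            # simulate running the command (e.g. mkfs)
    open(sem, "w").close()      # drop the semaphore
    return True

sem_dir = tempfile.mkdtemp()
ran = []
cloud_init_per(sem_dir, "mkfs-appvolume", "instance", ran)  # first boot: runs
migrator_rename(sem_dir)                                    # migrator mangles the name
cloud_init_per(sem_dir, "mkfs-appvolume", "instance", ran)  # next boot: runs AGAIN
print(ran)  # ['mkfs-appvolume', 'mkfs-appvolume'] -- the filesystem-erasing bug
```

With a dash-free name (or with 'migrator' disabled), the second call finds the semaphore and skips the command, matching the "Works." observation above.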
We've since updated all of our automation to avoid the offending tokens, so we haven't had the issue since making those modifications part of our standard processes. If no code updates have occurred, the problem is presumably still present; we just aren't running into it any more.

Hello Thomas,
I think you were not able to read my last comment as it was marked private. I am from the Insights rule-development team and am planning to write a customer-facing rule for this RHBZ. I tried to reproduce the above issue but was not successful; could you please review the steps below that I used?
I followed these steps:
Step 1:
# history
1 ip ad
2 lsblk
3 cat /etc/fstab
4 fdisk /dev/sdb
5 fdisk /dev/xvdb
6 lsblk
7 mkdir /opt/gitlab
8 history
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 10G 0 disk
├─xvda1 202:1 0 1M 0 part
└─xvda2 202:2 0 10G 0 part /
xvdb 202:16 0 8G 0 disk
└─xvdb1 202:17 0 8G 0 part
# ls -la /opt/gitlab
total 0
drwxr-xr-x. 2 root root 6 May 27 10:05 .
drwxr-xr-x. 3 root root 20 May 27 10:05 ..
Step 2: stopped the instance and added the following parameters:
bootcmd:
- cloud-init-per always mkfs-appvolume mkfs -t ext4 /dev/xvdb1
mounts:
- [ /dev/xvdb1, /opt/gitlab ]
Step 3: started the instance
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 10G 0 disk
├─xvda1 202:1 0 1M 0 part
└─xvda2 202:2 0 10G 0 part /
xvdb 202:16 0 8G 0 disk
└─xvdb1 202:17 0 8G 0 part
[ec2-user@ip-172-31-1-187 ~]$
Then I collected the logs which contain data related to "/dev/xvdb1".
Log related to /dev/xvdb1:
2020-05-27 10:14:07,759 - util.py[DEBUG]: Peeking at /dev/xvdb1 (max_bytes=512)
2020-05-27 10:14:07,760 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpMhL8OV'] with allowed return codes [0] (shell=False, capture=True)
2020-05-27 10:14:07,783 - util.py[DEBUG]: Failed mount of '/dev/xvdb1' as 'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpMhL8OV']
Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdb1,
missing codepage or helper program, or other error
------------------------------------------------------------------------------------------------------------------------------------------
2020-05-27 10:14:07,784 - DataSourceOVF.py[DEBUG]: /dev/xvdb1 not mountable as iso9660
2020-05-27 10:14:36,674 - util.py[DEBUG]: Peeking at /dev/xvdb1 (max_bytes=512)
2020-05-27 10:14:36,675 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpme664c'] with allowed return codes [0] (shell=False, capture=True)
2020-05-27 10:14:36,704 - util.py[DEBUG]: Failed mount of '/dev/xvdb1' as 'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdb1', '/tmp/tmpme664c']
Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdb1,
missing codepage or helper program, or other error
2020-05-27 10:14:36,704 - DataSourceOVF.py[DEBUG]: /dev/xvdb1 not mountable as iso9660
Thanks and Regards,
Rahul Srivastava
I'm trying to dig through my commit history for one of the (many) CloudFormation templates that had the offending logic (it was subsequently changed to work around the issue). Given the amount of time that's passed, I can't even remember which project I was working on when this problem originally appeared.
Description of problem:
We use cloud-init as a provisioning-automation bootstrap for AWS-hosted resources. When we launch instances that include provisioning of secondary EBS volumes, cloud-init recently started reformatting the secondary EBS volumes on every system boot where, previously, cloud-init would do so only on the first launch of an instance from a given AMI. This is obviously a major problem for us.

Version-Release number of selected component (if applicable):
cloud-init-18.2-1.el7_6.1

How reproducible:
100%

Steps to Reproduce:
1. Create an AMI with cloud-init-18.2-1.el7_6.1 pre-installed.
2. Launch an instance from the AMI with a secondary EBS volume attached.
3. Make use of a cloud-config block within the instance's UserData that contains storage-configuration directives similar to:
bootcmd:
- cloud-init-per instance mkfs-appvolume mkfs -t ext4 /dev/nvme1n1
mounts:
- [ /dev/nvme1n1, /opt/gitlab ]
4. Use further automation to install an application into the storage formatted via cloud-init.
5. Reboot the system.
6. Log in to find that not only has the application failed to start (and the associated systemd unit is faulted), but the filesystem controlled by cloud-init has been reformatted at the reboot.

Note that if the system is rebooted an arbitrary number of times, the filesystem managed by cloud-init will be reformatted with each and every reboot.

Actual results:
Filesystem(s) controlled by cloud-init will have been reformatted at every reboot.

Expected results:
Filesystem(s) should be formatted only at initial launch/boot (per the "instance" argument to "cloud-init-per", above).

Additional info:
We found that, coincident with this new behavior appearing, the previously missing ability to use cloud-init's "disk_setup" and "fs_setup" to format storage now works.
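For completeness, the idea behind the upstream fix (merged per the Launchpad link above) is to tolerate the migrator rename. I have not reproduced the exact patch here; this is just one hedged way to express that idea, with fixed_should_skip as an illustrative name:

```python
import os
import tempfile

def fixed_should_skip(sem_dir, name, freq):
    # Sketch of the fix's intent (the real patch may differ): accept the
    # semaphore whether or not 'migrator' rewrote dashes to underscores.
    candidates = [
        "bootper.%s.%s" % (name, freq),
        "bootper.%s.%s" % (name.replace("-", "_"), freq),
    ]
    return any(os.path.exists(os.path.join(sem_dir, c)) for c in candidates)

sem_dir = tempfile.mkdtemp()
# Simulate a semaphore that 'migrator' already renamed on a prior boot:
open(os.path.join(sem_dir, "bootper.mkfs_appvolume.instance"), "w").close()
print(fixed_should_skip(sem_dir, "mkfs-appvolume", "instance"))  # True -> mkfs is skipped
```

Under this check, a dashed command name like mkfs-appvolume no longer re-runs after the rename, so the secondary filesystem is formatted only on first boot as expected.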