Bug 2023092
| Summary: | mdadm.service fails to start: mdmonitor.service: Can't open PID file /run/mdadm/mdadm.pid (yet?) after start: Operation not permitted | | |
| --- | --- | --- | --- |
| Product: | [Fedora] Fedora | Reporter: | Jan Vesely <jan.vesely> |
| Component: | mdadm | Assignee: | XiaoNi <xni> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 37 | CC: | agk, dave, dledford, evansj3000, jes.sorensen, rhel, xni |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-12-05 21:02:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Vesely
2021-11-14 16:56:52 UTC
Hi Jan,

I tried to reproduce this in my environment and could not reproduce it. The steps I did are:

1. Install f35
2. Create some loop devices and create a raid1
3. mdadm -Es > /etc/mdadm.conf (the mdmonitor service needs the config file)
4. systemctl start mdmonitor
5. systemctl status mdmonitor

● mdmonitor.service - Software RAID monitoring and management
     Loaded: loaded (/usr/lib/systemd/system/mdmonitor.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-11-15 20:10:36 EST; 5s ago
    Process: 81616 ExecStart=/sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid (code=exited, status=0/>
   Main PID: 81617 (mdadm)
      Tasks: 1 (limit: 14247)
     Memory: 456.0K
        CPU: 6ms
     CGroup: /system.slice/mdmonitor.service
             └─81617 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid

The mdadm.pid file is created automatically after running `systemctl start mdmonitor`.

How about restarting your mdmonitor service? Does that succeed?

Thanks
Xiao
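(As a reference for step 2 above, a raid1 on loop devices can be set up roughly like this. This is a minimal sketch only; the backing-file paths, sizes, and the /dev/md0 name are arbitrary choices, not taken from this report.)

# Back two loop devices with sparse files and build a raid1 from them
truncate -s 1G /var/tmp/loop0.img /var/tmp/loop1.img
sudo losetup -f --show /var/tmp/loop0.img   # prints the device it attached, e.g. /dev/loop0
sudo losetup -f --show /var/tmp/loop1.img   # e.g. /dev/loop1
# Use the device names losetup printed; answer 'y' to mdadm's confirmation prompt, or add --run
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1
# Write the array description to the config file mdmonitor needs (step 3)
sudo sh -c 'mdadm -Es > /etc/mdadm.conf'
sudo systemctl start mdmonitor
systemctl status mdmonitor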
From the opening post:

"the service starts ok when restarted manually after the machine boot completes"

so yes, manually restarting the mdmonitor service works.

(In reply to Jan Vesely from comment #2)
> From the opening post:
>
> "the service starts ok when restarted manually after the machine boot
> completes"
>
> so yes, manually restarting the mdmonitor service works.

How about lsblk after boot? Then I can try to reproduce this problem on my machine.

NAME                                            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                                               8:0    0 931.5G  0 disk
├─md126                                           9:126  0   1.8T  0 raid0
│ ├─md126p1                                     259:4    0   260M  0 part
│ ├─md126p2                                     259:5    0    16M  0 part
│ ├─md126p3                                     259:6    0 325.8G  0 part
│ ├─md126p4                                     259:7    0   995M  0 part
│ └─md126p5                                     259:8    0   1.5T  0 part
│   └─luks-f3e4f35e-5a93-4c2a-a1b2-56da7dc057b6
│                                               253:4    0   1.5T  0 crypt /mnt/big-data
└─md127                                           9:127  0     0B  0 md
sdb                                               8:16   0 931.5G  0 disk
├─md126                                           9:126  0   1.8T  0 raid0
│ ├─md126p1                                     259:4    0   260M  0 part
│ ├─md126p2                                     259:5    0    16M  0 part
│ ├─md126p3                                     259:6    0 325.8G  0 part
│ ├─md126p4                                     259:7    0   995M  0 part
│ └─md126p5                                     259:8    0   1.5T  0 part
│   └─luks-f3e4f35e-5a93-4c2a-a1b2-56da7dc057b6
│                                               253:4    0   1.5T  0 crypt /mnt/big-data
└─md127                                           9:127  0     0B  0 md
zram0                                           252:0    0  15.5G  0 disk  [SWAP]
nvme0n1                                         259:0    0 238.5G  0 disk
├─nvme0n1p1                                     259:1    0   200M  0 part  /boot/efi
├─nvme0n1p2                                     259:2    0     1G  0 part  /boot
└─nvme0n1p3                                     259:3    0 237.3G  0 part
  └─luks-a99fc9de-32dd-43f7-9133-699710a861ef  253:0    0 237.3G  0 crypt
    ├─fedora-root                               253:1    0    50G  0 lvm   /
    ├─fedora-swap                               253:2    0  15.7G  0 lvm   [SWAP]
    └─fedora-home                               253:3    0 171.6G  0 lvm   /home

sorry for the delay

I have this same problem on a virtual private server (VPS) at OVH that I just upgraded from Fedora 34 to 35.

NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda         8:0    0  1.8T  0 disk
├─sda1      8:1    0  511M  0 part
├─sda2      8:2    0  1.8T  0 part
│ └─md2     9:2    0  3.6T  0 raid0 /var/lib/containers/storage/overlay
│                                   /
├─sda3      8:3    0  512M  0 part  [SWAP]
└─sda4      8:4    0    2M  0 part
sdb         8:16   0  1.8T  0 disk
├─sdb1      8:17   0  511M  0 part  /boot/efi
├─sdb2      8:18   0  1.8T  0 part
│ └─md2     9:2    0  3.6T  0 raid0 /var/lib/containers/storage/overlay
│                                   /
└─sdb3      8:19   0  512M  0 part  [SWAP]

Still fails the same way in f36.

I see similar failures in at least one more service:

dnsmasq[1675]: chown of PID file /run/nm-dnsmasq-wlo1.pid failed: Operation not permitted

but in the case of dnsmasq it doesn't result in service failure. It looks like the "chown of PID file" message is coming from systemd rather than from the mdadm daemon.

I tried to follow the "yet?" part and changed the mdmonitor service file to include:

ExecStart=sh -c "/sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid && sleep 5"

This instead makes the issue 100% reproducible even when starting mdmonitor manually. The experiment points to the issue being a race between systemd accessing the file set up in "PIDFile=" and mdadm deleting the pid file after exiting.

Removing the "PIDFile=" part of mdmonitor.service file works around the issue.
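(For reference, that workaround can be applied without modifying the packaged unit file by using a systemd drop-in. This is a minimal sketch; the drop-in file name no-pidfile.conf is an arbitrary choice, and it assumes that an empty PIDFile= assignment resets the directive from the original unit, which is systemd's usual reset behaviour for single-value settings.)

# Create a drop-in override instead of editing the packaged unit file;
# the empty PIDFile= line clears the value set by the original unit.
sudo mkdir -p /etc/systemd/system/mdmonitor.service.d
sudo tee /etc/systemd/system/mdmonitor.service.d/no-pidfile.conf <<'EOF'
[Service]
PIDFile=
EOF
sudo systemctl daemon-reload
sudo systemctl restart mdmonitor

(Alternatively, `sudo systemctl edit mdmonitor` without --full opens an editor on exactly such a drop-in.)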
This message is a reminder that Fedora Linux 35 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 35 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.

Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13. Fedora Linux 35 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field.

If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.

Reopening, the issue is still present in Fedora 37.

This message is a reminder that Fedora Linux 37 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 37 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.

Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05. Fedora Linux 37 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field.

If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.

Hello all. I am writing in this ticket as I have experienced this issue today in Fedora 41.
I am running on a Mac Pro 5,1 and have four 1 TB HDDs, one in each SATA2 port.
>>> Removing the "PIDFile=" part of mdmonitor.service file works around the issue.
Jan's advice still holds to this day. There seems to be a race condition when the service starts: systemd apparently tries to read the PID file before mdadm has written it, so the service fails.
systemd[1]: mdmonitor.service: Can't open PID file /run/mdadm/mdadm.pid (yet?) after start: No such file or directory
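(For context, the two directives involved can be inspected on the affected machine. This is a sketch; the ExecStart and PIDFile values shown are the ones quoted earlier in this bug, not verified against the Fedora 41 package.)

# Show the unit file systemd actually loaded for mdmonitor
systemctl cat mdmonitor.service
# The directives this bug is about (as quoted earlier in the thread):
#   ExecStart=/sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
#   PIDFile=/run/mdadm/mdadm.pid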
Here are the steps to reproduce:
# Create the RAID
sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=4 /dev/sd[a-d]1
# Create ext4 fs
sudo mkfs.ext4 /dev/md0
# Create /etc/mdadm.conf from assembled array
sudo mdadm --detail --scan --verbose | sudo tee -a /etc/mdadm.conf
# Add to /etc/fstab and set nofail, otherwise feel pain
sudo blkid /dev/md0
# Append an entry like this (using the UUID from blkid) to /etc/fstab:
UUID=36149ec5-d782-4ad7-9dc0-962537b9870b /mnt/data ext4 defaults,nofail 0 0
# Create the mountpoint and mount it
sudo mkdir -p /mnt/data
sudo mount -a
# Make it mine :)
sudo chown $(whoami) /mnt/data
# I am unsure whether this really helps, but I've seen that it should be done
sudo dracut --force --add mdraid
# Reboot
sudo systemctl reboot
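(After the reboot, the array, the service, and the mount can be checked with something like the following. This is a sketch that assumes the array keeps the /dev/md0 name recorded in /etc/mdadm.conf and the /mnt/data mountpoint used above.)

# Confirm the array assembled
cat /proc/mdstat
sudo mdadm --detail /dev/md0
# Confirm the monitor service and the mount
systemctl status mdmonitor
findmnt /mnt/data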
Intended result: the 'ConditionPathExists=/etc/mdadm.conf' line in the service config should let the service start at the next boot, and therefore the RAID should mount.
Problem: the service does start now that the config is there, but it dies:
Feb 15 16:07:07 hostname systemd[1]: Starting mdmonitor.service - Software RAID monitoring and management...
Feb 15 16:07:07 hostname systemd[1]: mdmonitor.service: Can't open PID file /run/mdadm/mdadm.pid (yet?) after start: No such file or directory
Feb 15 16:07:07 hostname systemd[1]: mdmonitor.service: Failed with result 'protocol'.
Feb 15 16:07:07 hostname systemd[1]: Failed to start mdmonitor.service - Software RAID monitoring and management.
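(A quick way to confirm this is the timing issue discussed above rather than a broken config: restart the unit by hand once the system is fully up; per the earlier comments, the PID file is then created and the service keeps running. A sketch:)

# After boot has finished, restart the unit manually
sudo systemctl restart mdmonitor
systemctl status mdmonitor
# The PID file the unit points at should now exist
ls -l /run/mdadm/mdadm.pid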
Workaround/Solution:
sudo systemctl edit --full mdmonitor.service
Comment out this line:
#PIDFile=/run/mdadm/mdadm.pid
The service will start on the next boot and the RAID will be mounted at the correct mountpoint. :D
However, if something goes wrong I assume systemd not knowing the PID will cause trouble.
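(If that is a concern: assuming the unit is Type=forking, which the status output earlier in this bug suggests, systemd will usually still determine a main PID on its own even without PIDFile=, because GuessMainPID= defaults to yes and the forked mdadm daemon is the only process left in the unit's control group. This is general systemd behaviour, not something verified for this specific unit; it can be checked with, for example:)

# Show which PID systemd considers the main process for the unit
systemctl show -p MainPID -p Type mdmonitor.service
# The CGroup section of status should list the single mdadm process
systemctl status mdmonitor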