Bug 1003654
Summary: | Segfault in lvmetad during boot | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | rainman3d2002
Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 19 | CC: | agk, bmarzins, bmr, dwysocha, gansalmon, heinzm, itamar, jonathan, kernel-maint, lvm-team, madhu.chinakonda, marcelo.barbosa, mcsontos, michele, msnitzer, prajnoha, prockai, rainman3d2002, zkabelac
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | lvm2-2.02.98-13.fc19 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-11-03 04:32:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Hi,

can you remove "quiet rhgb" from the grub command line and take a picture of the system messages when it hangs with 3.10.10-200? Just to confirm: 3.10.9 works correctly, yes?

thanks,
Michele

Now none of them work right. I had earlier rebooted 3.10.9 back into itself, and it had worked. To get these pictures, I booted 3.10.10, which didn't freeze the system, but it still didn't boot right. When I rebooted back into 3.10.9, it didn't work either. I've tried them both with those two things removed from the grub command line. It doesn't make a difference.

3.10.9 was the first kernel in F19 that actually booted correctly all the way into KDM. All the rest fail to activate most, but not all, of my logical volumes, freak out, and drop me into maintenance mode, where I have to type "vgchange -ay". When that's done, I press Ctrl-D, which finishes booting, but it won't start KDM. Instead, it freezes at the animated F. I then press Alt-F2 to get the console, log in and type "startx" to get KDE to start.

I had never been able to boot F17 into run-level-5, so eventually I gave up and switched it to run-level-3 instead. But it never failed to activate all of my logical volumes the way F19 does every time. Well, you know, not until my hardware blew up. :)

It seems to work when you first install the OS, but the process usually breaks with the very first kernel upgrade. It was like that in F17, too. I don't know about F18, as I never installed that version. My F19 install, by the way, is a fresh one. I simply replaced the OS, leaving my non-system logical data volumes intact.

I will post pictures as soon as my cell phone syncs up with Google. It seems to be having issues at the moment.

Is it important that something called "rngd" has failed? I've never noticed this one before.
[root@System45 ~]# systemctl status rngd.service
rngd.service - Hardware RNG Entropy Gatherer Daemon
   Loaded: loaded (/usr/lib/systemd/system/rngd.service; enabled)
   Active: failed (Result: exit-code) since Thu 2013-09-05 00:40:46 ADT; 28min ago
  Process: 1225 ExecStart=/sbin/rngd -f (code=exited, status=1/FAILURE)

Kernel issues photos:
https://plus.google.com/u/0/photos/115171266703960759122/albums/5920683260650026113?authkey=CMu13u78sZywwwE

Hi,

thanks for the photos. This is not a kernel issue per se. It's an lvm issue. You get:

lvmetad[....]: segfault at ....

in your boot logs, which is the reason systemd is failing to mount your /dev/mapper/LVM3-home LV.

So this is likely either https://bugzilla.redhat.com/show_bug.cgi?id=1000894 or https://bugzilla.redhat.com/show_bug.cgi?id=1003278

I suggest you look at the troubleshooting tips in those BZs, try to understand which one affects you, and then mark this bug as a duplicate of that lvm bug.

hth,
Michele

Okay, but why do some kernels boot normally without experiencing this issue, while others don't? I'm talking about all the kernels released in F19.

Could be timing, could be kernel changes triggering the lvmetad crash (which should not happen anyway), could be random. With the evidence here, we need to fix lvmetad, not the kernel.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

I installed 3.11.1-200 this morning as part of the normal update, and it booted fine, activating all of my logical volumes AND booting into runlevel 5.
But is this issue actually fixed, though? I have seen this numerous times before. One kernel will suddenly work while the 2 or 3 following won't, then the next one will, and so on. It started immediately upon installing F19 for the first time. It booted normally, but as soon as I did the first system update, including the kernel update, this pattern began.

Again, this case needs to be closed or filed against lvm. The kernel has nothing to do with it. Your issue is that lvmetad is crashing, and it should not, no matter what kernel version you have.

FYI for whoever at the LVM team is looking at this: having to reboot today to fix a network issue has brought back the original problem, in that my logical volumes (except, it seems, the ones named with default values like "vg_system45_lv??") aren't reactivating on boot and I'm forced to enter Maintenance Mode, type in "vgchange -ay" and then press Ctrl-D to finish boot. This doesn't boot into runlevel 5. Instead, I press Alt-F2, log in at the console and use the startx command to get into KDE.

Hi, add more details please, at least the following:

1. what's the LVM2 version? rpm -q lvm2
2. output of: systemctl status lvm2-lvmetad.service
3. attach the file produced by 'lvmdump' after a successful boot
4. if there is a coredump (e.g. in /var/spool/abrt/), post the stack trace

(In reply to Michele Baldessari from comment #5)
> Hi,
>
> thanks for the photos. This is not a kernel issue per se. It's an lvm issue.
> You get:
> lvmetad[....]: segfault at ....
>
> in your boot logs which is the reason systemd is failing to mount your
> /dev/mapper/LVM3-home LV
>
> So this is likely either https://bugzilla.redhat.com/show_bug.cgi?id=1000894
> or https://bugzilla.redhat.com/show_bug.cgi?id=1003278
>
> I suggest you look at the troubleshooting tips in those BZ and try to
> understand which one affects you and then duplicate this bug against the lvm
> bug.

Stop trying with different kernels and accept it is not a kernel issue but a timing bug in lvmetad.
If there is one kernel where it happens more frequently, boot that. Then try to find out whether it is any of the above BZs, as asked by Michele. So far you have provided us with as few clues as possible, and without them there is nothing else we can do but close the bug with INSUFFICIENT_DATA as resolution.

>
> hth,
> Michele

Created attachment 802739 [details]
lvmdump file

1.
[root@System45 ~]# rpm -q lvm2
lvm2-2.02.98-12.fc19.x86_64

2.
[root@System45 ~]# systemctl status lvm2-lvmetad.service
lvm2-lvmetad.service - LVM2 metadata daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-lvmetad.service; disabled)
   Active: active (running) since Sat 2013-09-21 09:23:04 ADT; 3 days ago
     Docs: man:lvmetad(8)
 Main PID: 1172 (lvmetad)
   CGroup: name=systemd:/system/lvm2-lvmetad.service
           `-1172 /usr/sbin/lvmetad

Sep 21 09:23:04 System45.localdomain systemd[1]: Stopped LVM2 metadata daemon.
Sep 21 09:23:04 System45.localdomain systemd[1]: Starting LVM2 metadata daemon...

3. See attachment

4. There is no core dump file

The reason I keep trying different kernels is that some of them seem to work, some don't, and some seem to work for a while before forcing me to reactivate my logical volumes manually with each reboot.

Thanks for the info.

So to summarize for prajnoha: just 2 local disks with 16 + 24 partitions (GPT), and there are about as many LVs as there are PVs.

It seems you are adding work for yourself by using a partition per LV and then extending an LV by adding partitions to the VG. Reading the metadata areas from the dozens of PVs you have must slow things down somewhat at boot time, but it still should not stop the system from activating LVs.

But in the messages file attached in the lvmdump I see dozens of fsck processes running at once, which definitely may cause problems. After all the checks start there is nothing in the log for 14 minutes, which may be a udev timeout.

Sep 25 08:22:02 System45 systemd[1]: Started File System Check on /dev/LVM3/home.
Sep 25 08:36:18 System45 dbus-daemon[1259]: dbus[1259]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'

What filesystems are on those volumes? Checking some of them is quite an expensive operation. How much memory does the system have? Post the output of `free`, please.

If there are any filesystems with useless data (HoldSpc? Are those just phony filesystems to hold space?), just remove them.
If there are filesystems you do not need, do not automount them (remove auto from their entries in /etc/fstab).

             total       used       free     shared    buffers     cached
Mem:       7918048    1634544    6283504          0      63672     703244
-/+ buffers/cache:     867628    7050420
Swap:      8388600          0    8388600

Just rebooted, didn't have any issues this time.

My partition layout is a holdover from older days when I had limited disk space. I guess I never really thought about rearranging things. I will need another hard drive for that, though. The Hold* partitions really do hold my data; I just never bothered giving them real names. You're right, though: I can probably deactivate at least HoldStuff, or maybe move it to an external hard drive.

It must be pointed out, however, that my physical volume/logical volume/partition layout hasn't really changed since F17, so I don't see why I should suddenly have these problems under F19 when I never did under F17. F17 had its own share of problems, but never this.

This bug more or less sounds like a duplicate of bug #1016322.

lvm2-2.02.98-13.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/lvm2-2.02.98-13.fc19

Package lvm2-2.02.98-13.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lvm2-2.02.98-13.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20436/lvm2-2.02.98-13.fc19
then log in and leave karma (feedback).

lvm2-2.02.98-13.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

I now have 2.02.98-13. On the first reboot after the update, it failed again and I had to do "vgchange -ay" so it would boot, but as soon as I rebooted again, it worked properly. Will update after next reboot.

Rebooted this morning for a kernel update, booted normally without issues.
Hopefully this new tradition will continue... :)

Happened again this morning. Had to reboot for a kernel update and got dumped into maintenance mode, where I did 'vgchange -ay', but when I pressed Ctrl-D, it finished booting normally into runlevel 5.

It's still doing it, only this time when I did 'vgchange -ay', it didn't boot into runlevel 5, only into runlevel 3. This happened immediately after a kernel update to 3.11.10-200.fc19.x86_64.

As of the kernel 3.12.5-200 update 4 days ago, it booted correctly.

LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.26.0

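The 14-minute silence in the messages file noted earlier in this thread (between the fsck start at 08:22:02 and the next entry at 08:36:18) can be located mechanically rather than by eye. The following is only a minimal sketch, not part of lvmdump or any LVM tool: the function name `largest_log_gap` is made up for illustration, and it assumes traditional syslog lines that start with a 15-character "Mon DD HH:MM:SS" timestamp without a year, so the year is passed in as an assumption.

```python
from datetime import datetime, timedelta


def largest_log_gap(lines, year=2013):
    """Return (gap, earlier_line, later_line) for the longest silence between
    consecutive timestamped lines, or None if fewer than two lines parse.

    Hypothetical helper: assumes classic syslog timestamps such as
    "Sep 25 08:22:02" in the first 15 characters; the year is an assumption
    because that format does not record it.
    """
    stamped = []
    for line in lines:
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        stamped.append((ts, line))

    best = None
    # Compare each timestamped line with the one that follows it.
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


# The two log lines quoted in this bug report.
SAMPLE = [
    "Sep 25 08:22:02 System45 systemd[1]: Started File System Check on /dev/LVM3/home.",
    "Sep 25 08:36:18 System45 dbus-daemon[1259]: dbus[1259]: [system] "
    "Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'",
]

gap, before, after = largest_log_gap(SAMPLE)
print(gap)  # 0:14:16
```

Run against the full messages file from the lvmdump, the same function would point at the last line logged before the stall, which is usually the most useful clue about what the system was waiting for.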
Created attachment 792888 [details]
CPU & Memory Info

Description of problem:

Version-Release number of selected component (if applicable):
3.10.10-200

How reproducible:
Reboot

Steps to Reproduce:
1. Upgrade from 3.10.9-200
2. Reboot

Actual results:
System fails to boot. The screen flashes as the animated circle reaches about 3/4 and the whole thing comes to a halt. Nothing else displays, the keyboard is inoperative, I can't press Esc to see the boot log, and I can't press Alt-F2 to open a console.

Expected results:
System boots

Additional info:
Had to reboot using kernel 3.10.9-200. Attached is the CPU and memory info; if you need anything else, just ask.
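
One suggestion made in the discussion above was to stop auto-mounting filesystems that are not needed at boot by removing auto from their /etc/fstab entries, which in practice means marking them noauto. The following is a rough sketch of that edit, assuming the usual six whitespace-separated fstab fields. The function name `set_noauto`, the mount point /mnt/holdstuff, the ext4 type and the option strings are all hypothetical; the real entries on the reporter's system were not posted.

```python
def set_noauto(fstab_text, mount_points):
    """Return fstab_text with 'noauto' set on entries for the given mount points.

    Hypothetical helper: works on text only and never touches /etc/fstab.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 4 and not fields[0].startswith("#")
        if is_entry and fields[1] in mount_points:
            # Field 4 holds the comma-separated mount options.
            opts = [o for o in fields[3].split(",") if o not in ("auto", "noauto")]
            opts.append("noauto")
            fields[3] = ",".join(opts)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up entries for illustration only.
SAMPLE_FSTAB = (
    "/dev/mapper/LVM3-home /home ext4 defaults 1 2\n"
    "/dev/mapper/LVM3-HoldStuff /mnt/holdstuff ext4 defaults 1 2\n"
)

result = set_noauto(SAMPLE_FSTAB, {"/mnt/holdstuff"})
print(result)
```

The sketch returns the modified text instead of writing the file, so the result can be reviewed and the original backed up before anything on the system changes.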