Description of problem:
Anaconda fails to recognize local standard disk drives that have been secure-erased (in this case, a SATA SSD and a SATA HDD) when installing from a Live USB (DVD ISO image).

Version-Release number of selected component (if applicable):
F21 Alpha-1 KDE live DVD (image run from a USB drive)

How reproducible:
Always

Steps to Reproduce:
1. Boot into the F21-Alpha-1 KDE Live DVD environment.
2. yum install hdparm
3. Use hdparm's secure-erase to purge existing data from the drives and prepare them for installation:
   hdparm --user-master u --security-set-pass $password /dev/sdX
   hdparm --user-master u --security-erase $password /dev/sdX
4. Start Anaconda within the Live DVD environment to perform an install to disk.

Actual results:
Anaconda fails to recognize the freshly secure-erased SATA drives, and no drives are available to select for installation. This occurs even though the drives have just been secure-erased within the KDE Live DVD environment and the KDE Live DVD recognizes them as freshly secure-erased. The drives are NOT frozen; Anaconda simply does not seem to recognize a drive that has just been secure-erased.

For reference, I am refreshing the drives with secure-erase rather than reformatting them, to avoid imposing unnecessary wear on the SSD. Unfortunately, once secure-erase finishes, Anaconda won't recognize the drive, even though the KDE Live DVD environment and hdparm both recognize it as an unformatted drive.

Expected results:
Normal install with local standard disk recognition.

Additional info:
This happens with BOTH SATA drives installed in the system: an Intel SSD, initialized by the live DVD as /dev/sda, and a WDC WD800 SATA HDD, initialized by the live DVD as /dev/sdb.
My objective is to perform a hybrid SSD/HDD install, placing infrequently modified partitions (/boot and root) on the SSD and partitions with frequently modified files (/var, swap and /home) on the HDD. The KDE Live DVD OS recognizes a drive that has just been secure-erased and appears empty (recognition succeeds even after a reboot), but Anaconda never recognizes the secure-erased drives, either before or after a reboot. After exiting Anaconda following its failure to recognize and partition the drives, the drives are recognized by the OS but have been LOCKED (presumably by Anaconda, or by the BIOS during the reboot), and they need to be hotplugged/power-cycled to reset them to an unlocked state. (AHCI is enabled.)

Unfortunately, falling back to an F20 install isn't an option for me, because the F20 installation media will not work on my hardware. These F20 failures are why I'm trying to use the F21-Alpha-1 install media. The F20 installs are effectively blocked by:
1) anaconda's failure to properly recognize the motherboard's AMD 970/SB950 chipset and NIC (Bug 1148617),
2) anaconda's failure to boot the kernel on an AMD FX system, due to a Nouveau bug causing AMD-Vi page faults with nVidia cards (Bug 1047637), and
3) the Nouveau driver rendering all windows fully transparent on nVidia cards (Bug 1047197).

There seem to be lots of problems with support for AMD chipsets and nVidia video cards. Both F20 and F21 seem exceptionally difficult, if not impossible, to install on my new hardware:
MB: Asus M5A97LE R2.0 with AMD 970 / SB950 chipset
CPU: AMD-FX-8350
SSD: Intel 330
HDD: WDC WD800
NIC: Realtek 8111F GBit
Video: Nvidia Quadro 400, Quadro 600, GK-110 Titan GTX

Before this attempt, I had successfully installed F21-Alpha-1 on this box using the same hardware. In those cases, I secure-erased the drives and allowed anaconda to perform automatic partitioning.
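Because this report hinges on whether a drive is locked or frozen, it may help to check the Security section of `hdparm -I` (the same command used for the hdparm-ssd.txt and hdparm-hdd.txt attachments) before starting the installer. The helper below is hypothetical, not part of hdparm or anaconda, and it assumes the usual hdparm layout in which each state appears on its own tab-indented line as either `locked` or `not locked`.

```python
def parse_hdparm_security(text):
    """Return the enabled/locked/frozen flags from the Security: section of
    `hdparm -I` output (hypothetical helper). States that never appear are omitted."""
    states = ("enabled", "locked", "frozen")
    flags = {}
    in_security = False
    for line in text.splitlines():
        if line.startswith("Security:"):
            in_security = True
            continue
        if not in_security:
            continue
        # A non-indented line starts the next top-level section of the report.
        if line and not line[0].isspace():
            break
        words = line.split()
        for state in states:
            if words == [state]:
                flags[state] = True
            elif words == ["not", state]:
                flags[state] = False
    return flags


sample = (
    "Security: \n"
    "\tMaster password revision code = 65534\n"
    "\t\tsupported\n"
    "\tnot\tenabled\n"
    "\tnot\tlocked\n"
    "\tnot\tfrozen\n"
    "\tnot\texpired: security count\n"
    "Logical Unit WWN Device Identifier: 0000000000000000\n"
)
print(parse_hdparm_security(sample))
# {'enabled': False, 'locked': False, 'frozen': False}
```

On a live system the input would come from running `hdparm -I /dev/sdX` as root; a drive reported as `locked` after exiting the installer would match the state described in this comment.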
On this install, following the secure-erase, I selected anaconda's options to perform user-defined partitioning and device encryption. These selections seem to have rendered the drives unrecognizable by anaconda, even after repeated power-cycles/unlocks. What's odd is that the KDE Live DVD recognizes the drives without any problem. The bug seems to be within anaconda. Aargh.
Created attachment 948200 hdparm -I /dev/sda > hdparm-ssd.txt
Created attachment 948201 hdparm -I /dev/sdb > hdparm-hdd.txt
F20 DVD install medium / anaconda successfully identifies and initializes both drives. F21-alpha-1 DVD install medium / anaconda cannot identify or initialize either drive.
Please attach the log files from the installer, available in /tmp, to this bug as individual text/plain attachments.
Sorry for the delay. I got frustrated with the problem and decided to install another distribution that allowed me to perform a command-line install, manually partitioning the disks before compiling the OS. Everything is working fine on the box. To assist you with this bug report, I just put the F21 Live / Installer USB back into the working system, and anaconda continues to fail to recognize the drives (no drives appear available for install). Before I post logs for you, I have a couple of questions:
1. Which log files do you want? Everything in /tmp? There are 10 logs there, some of which appear not to be germane.
2. Is there any identifying information in those logs? I am using your Live USB on a live system that has data on it. Please let me know which anaconda log files contain unique/identifying information, such as IP addresses, MAC addresses, UUIDs, etc., so that the personally identifying information can be screened out before the logs are made public.
storage.log is the most important. anaconda.log, packaging.log and program.log would also be nice. storage.log will contain device serial numbers and filesystem UUIDs, so if you are concerned about that please remove the identifying information before posting the logs.
Created attachment 949241 storage.log
Created attachment 949242 program.log
The other requested logs are empty.
12:58:48,035 INFO program: Running... multipath -c /dev/sda
12:58:48,040 INFO program: /dev/sda is a valid multipath device path
12:58:48,040 DEBUG program: Return code: 0
12:58:48,064 INFO program: Running... multipath -c /dev/sdb
12:58:48,069 INFO program: /dev/sdb is a valid multipath device path
12:58:48,069 DEBUG program: Return code: 0

Your hard drives are being detected as multipath devices. Is this the case? If so, they should show up in the multipath tab when you click "Add a disk..." under Specialized and Network Disks. If they're not multipath, then let's reassign to device-mapper-multipath.
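To see at a glance which disks `multipath -c` claimed, the program.log lines above can be reduced to a per-device verdict. This is a hypothetical helper, not part of anaconda. It assumes the `Running... multipath -c <dev>` / `Return code: N` line pairs shown above, and it treats exit status 0 as "valid multipath path", which matches the messages in this log.

```python
import re

# Any command anaconda logs as "Running... <command>".
RUNNING = re.compile(r"Running\.\.\. (?P<cmd>.+)$")
RETURN_CODE = re.compile(r"Return code: (?P<rc>-?\d+)")


def multipath_verdicts(log_text):
    """Map each device passed to `multipath -c` to True (exit status 0,
    i.e. accepted as a multipath path) or False (any other status)."""
    verdicts = {}
    pending = None  # device whose `multipath -c` return code we are waiting for
    for line in log_text.splitlines():
        run = RUNNING.search(line)
        if run:
            parts = run.group("cmd").split()
            # Track only `multipath -c <dev>`; any other command resets the state.
            if parts[:2] == ["multipath", "-c"] and len(parts) == 3:
                pending = parts[2]
            else:
                pending = None
            continue
        rc = RETURN_CODE.search(line)
        if rc and pending is not None:
            verdicts[pending] = int(rc.group("rc")) == 0
            pending = None
    return verdicts
```

Fed the six log lines quoted above, this returns `{'/dev/sda': True, '/dev/sdb': True}`: both plain SATA disks were accepted as multipath paths.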
My disks are plain-jane SATA disk drives. One is an SSD, the other is an HDD. Each connects with a single SATA cable to a plain-jane SATA port on the motherboard. There is nothing fancy going on here: no RAID on the SATA ports, no fault-tolerant multiport implementations. Because the system has no PATA devices, AHCI *is* enabled on the SATA ports. Just to reiterate: the F20 install media/anaconda detect the drives properly as plain SATA drives, as does the other Linux distro. It's only the F21-A workstation media/F21A anaconda that has buggered the drive detection.
I have a question that I hope you can answer about the F21 release schedule. I'm looking at the release schedule, and it looks like we're beyond the "beta freeze" deadline for F21-Beta. I'm not sure how these deadlines work, so I'd like to ask if there is any chance that this installer/drive detection problem could be resolved in time to make it into the F21 final release's install media? I've been using Fedora for a decade now, and because I can't get either F20 or F21 to install on my new box, it looks like I'm going to be forced to find another OS product. I'm more than happy to continue to help debugging the pre-release F21 software, but if it's not possible for the fix to make it into the F21 final release, then I won't be able to plan on using Fedora on this box. :( Please let me know -- if I can't get a version of Fedora to install on this new box then I'm going to have to bite a turd sandwich and start looking for another distribution that will work. I really don't want to go back to compiling everything. Thanks.
Changes can still go in after beta, though after freeze they have to go through the Fedora blocker/freeze exception process. You can propose this bug as a blocker using the app at https://qa.fedoraproject.org/blockerbugs/propose_bug. This bug seems to fit the criteria of "The installer must be able to complete an installation using any supported locally connected storage interface.", https://fedoraproject.org/wiki/Fedora_21_Alpha_Release_Criteria#Storage_interfaces
I've tried logging in to propose the bug as you suggested, but I have no credentials for the site: https://id.fedoraproject.org/authenticate/fedoauth.auth.fas.Auth_FAS/ Neither my Fedora forum credentials nor my Red Hat Bugzilla credentials are accepted. I guess someone else will have to do the submission.
I just learned how to post this as an F21BetaBlocker directly through Bugzilla. Thanks again.
Update: although the attached log files show the drives being detected as multipath devices, when I run the installer the drives are not displayed anywhere in the GUI. They do not appear in the Local Standard Disks tab, nor in the Specialized / Network Disk / Multipath tab. No drives show up anywhere.
For the record, -1 Blocker from me. The "The installer must be able to complete an installation using any supported locally connected storage interface" implies (to me) physical storage that has not been manually manipulated before handing over to Anaconda. It sounds like the secure erase has a bug that is leaving the device inaccessible to the system, but that does not mean that the installer wouldn't work with hardware that had not been broken. As this can only be hit through user action outside of the installer, I'd rather add this to the Common Bugs page and not block Beta release. That being said, if at today's Go/No-Go meeting we decide to slip a week, I'd be fine with this being a Freeze Exception.
I have some additional information that I'd like to supply to help clarify things. I think I was premature to blame the problem on secure erase. The only reason I attributed the problem to secure-erase is that I had indeed deviated from the normal install by using it.

Since filing the original report, I have installed various Linux distributions on this machine, all of which work without problems. Each of them successfully recognized, formatted and installed to the disks, and after installing each one I attempted to install F21A on top of it. It failed every time. F21A just won't install. I expected it to recognize an existing Linux installation and allow me to install over it, but F21A's installer still refuses to recognize the drives. For this reason I don't agree with the statement in Comment 16 that secure erase is "breaking" the device or "leaving it inaccessible to the system". I think the recognition problem is a multipath recognition error:

* The F20 installation media recognizes the drives -- either following secure erase, or after another distribution has been installed on them -- and is capable of completing the partitioning process.
* The Gentoo minimal installation media recognizes the drives following secure erase, and is capable of completing the partitioning process and building a fully functional system.
* The Knoppix Live CD works as above.
* The Ubuntu Live CD works as above.
* The F21A Live DVD works somewhat, in that the live environment recognizes the drives following secure erase, lets me complete the partitioning process manually, and recognizes the drives after another Linux distribution has been installed on them.
* The only software that can't recognize the drives and complete the partitioning process following secure erase is the F21-A Anaconda installer, because the device-mapper-multipath it uses has an error that causes it to fail to identify the drives. It doesn't matter whether the drives have been secure erased or have another Linux OS installed on them: F21A's installer *STILL* won't properly recognize the drives, even when I don't "deviate" from the normal installation procedure by using secure-erase.

To summarize: I can pop in a handful of Linux Live DVDs, and all of them will recognize the securely-erased drive and complete an install on it. The only one that won't recognize the securely-erased drive is the Install-to-Disk application on F21. And not only will F21A fail to install on a system that has had a "deviant" secure erase performed on it, F21A's anaconda won't even recognize the disks of an existing Linux installation when no "deviation" from the standard installation procedure has been taken. This isn't really a problem that's attributable to secure-erase, and I apologize for having misled everyone by prematurely blaming it in my initial report.

To further clarify the problem: the disks are no longer secure-erased. A complete, functional Linux system is installed on the drives, and the Anaconda installer on the F21A DVD *STILL* will not recognize them. This bug is no longer about the installer failing to recognize secure-erased drives. The installer is still failing to recognize the drives on a working Linux system, even when I don't "manually manipulate the storage medium before handing it over to anaconda."
The fail-to-recognize problem in the bug report persists on the system even if I don't deviate from a standard install, i.e. just booting the system and choosing install-to-disk to install on top of an existing Linux installation. This is definitely a blocker, IMHO, as I'm now following the prescribed installation procedure to install over an existing Linux installation and the installer still won't properly recognize the drives. Thanks for your time.
Discussed at 2014-10-24 Go/No-Go meeting: http://meetbot.fedoraproject.org/fedora-meeting-2/2014-10-24/f21_beta_gono-go_meeting.2014-10-24-17.01.log.txt . We agreed to delay decision on the status of this one for some input from anaconda devs and more debugging. There's clearly something odd going on in this specific case for the drives to be identified as a multipath set when apparently they aren't, but we haven't had any other reports of this case yet that we know of, so it seems quite unusual. The criterion cited means that in general the installer must be capable of installing to all the typical drive interface types - SATA, PATA, SCSI etc. When a bug is like this - only seems to affect a specific configuration - we need to figure out in more detail the circumstances that trigger the bug so we can determine if it's likely to be widespread enough to justify blocking the release for. anaconda folks, any idea what's going on here yet? Thanks!
Ben Marzinski - ping? this is a potential Fedora 21 Beta release blocker, but it'd really help to have some idea of what the problem is and how common it's likely to be. thanks!
Some additional info about the hardware, in case it helps answer some questions that came up in the meeting:

The system is built around an 8000-series 8-core AMD CPU, specifically the 4.0 GHz AMD FX-8350, which is considered a high-end desktop platform. AMD's flagship product isn't really an "uncommon" hardware product, as someone suggested in the meeting notes; I'd say the AMD FX line of processors is a fairly common platform.

The motherboard is a new ASUS M5A97LE R2.0. This motherboard has been out for 2 years, and it's widely used and stable. I have been using one to run a mathematical computing server on Fedora for 2 solid years now, with no drive recognition problems until I tried F21Alpha on the new board. Once I got F20 installed on the first board I was content with its performance, so I bought a second one to build another 8-core box. Although F20 works OK, I'm having problems installing F21A on the new box. I don't dare take the F21A media near the other box, which has been working fine on F20 for a really long time. I don't want to tempt fate.

This new box could qualify as a high-end gaming system, though I'll be deploying it in a mathematical modeling/GPU-based supercomputing role. Its video (GPU) subsystem currently uses a single nVidia GeForce GTX "Titan" / Kepler architecture mathematical processor card, though two more cards will be installed once the box is up and running F21. (The GTX Titan is the brand-new, state-of-the-art offering used for mathematical supercomputing tasks. It is also a "common" garden-variety $1000 card that gamers would consider state of the art: a mainstream high-end product. In addition to being deployed as a supercomputing SLI-type GPU, three cards of this type can be used with SLI to create video-wall and "wrap-around" gaming systems using 6 monitors.)
To confirm that a firmware issue on the brand-new motherboard isn't preventing SATA drive recognition, I installed a new BIOS yesterday, so the system now has the latest BIOS, v. 2501. Updating the BIOS had no effect on the drive recognition problem.

To address the points about "uncommon hardware", a suspected "firmware issue", and "whether or not the bug affects a lot of people and whether you should have heard something by now":

1. The AMD FX 8000 series CPUs, the AMD 970 chipset motherboards, and the nvidia Titan GTX video subsystems are common, current-production, state-of-the-art hardware products.
2. The motherboard firmware has been updated to the latest BIOS. Other Linux distributions work fine. It's just Fedora 21's multipath device recognition that seems to be the problem.
3. The problem may not be commonly reported because some AMD FX users may have moved to other distributions. Coincidentally, getting Fedora 18/19/20 installed on AMD FX chipset motherboards has been a royal PITA, because the rollout of those distribution media coincided with well-documented nouveau + nvidia bugs in the F18/F19/F20 media kernels. These have included kernel failures that prevented the F20 installation media from booting on AMD FX systems, due to kernel IOMMU compatibility bugs with nvidia cards and the nouveau driver. Examples: Bug 1047637, Bug 1154225.

Perhaps the reason you aren't hearing about the problem (I hate to say it out loud) is that people got so sick and tired of not being able to get the F18/F19/F20 distribution media working with AMD chipsets + nvidia cards + nouveau that they have learned from their mistakes and aren't even trying to install F21A. IMO they would be wrong in thinking that, as the nouveau+IOMMU problems on the AMD FX + nvidia platform have recently been resolved.
It's just too bad that this drive recognition problem is throwing a wet towel on the F21 release, which otherwise fixes those longstanding problems with the F18/F19/F20 distribution media. The unfavorable trend for AMD FX has persisted through 3 releases, long enough that it might have driven people away from Fedora, which might explain why you aren't hearing common reports about the bug. Just a thought.

As a Linux power user (and former developer for another distribution), I'm probably smarter than the average bear when it comes to working around these installation problems. For the past several releases the video bugs have forced me to install via SSH or VNC. It would be refreshing to be able to install F21 on an AMD box with an nvidia card without any major headaches, something I haven't been able to do for years now. The persistent nature of this problem may have given people the impression that Fedora is just a bad choice for AMD FX + nvidia; it's been that way since F18. Right now I'm elated that these longstanding problems have been fixed after years of dysfunction in the F18/F19/F20 install media. The only thing keeping me from jumping for joy is that this pesky drive recognition problem is throwing a wet towel on the situation, effectively blocking me from benefiting from the fix on my AMD FX platform.

My point is that just because nobody else is complaining doesn't mean the problem isn't affecting lots of people. The chirping crickets could indicate that the people we might expect to be complaining have simply moved on.
Here is the output of dmesg and lspci when booting the system from the Fedora 21 Alpha live DVD. (I've pulled out the expensive GPU and replaced it with a less expensive workstation-type card for this example.) See attachments. I hope this information helps.
Created attachment 950551 dmesg > dmesg.txt
Created attachment 950552 lspci -vvv > lspci.txt
I think you're extrapolating trends from insufficient data. You have *one* system. It's dangerous to draw conclusions about "AMD FX + nvidia" from your single system. The most relevant bit of hardware information here is likely to be your disk controller. The CPU is almost certainly entirely irrelevant.
The CPU is entirely relevant in that it alone defines the disk controller in use, as there is only *one* chipset that will fully support the AMD FX series processors, and that's the AMD 9-series chipsets: http://www.amd.com/en-us/products/chipsets/9-series# These chipsets, in turn, rely upon the AMD SB950 for providing SATA support. I spoke of the CPU because the CPU alone is sufficient to define the SATA chipset that is in play here. FYI here is the SB950 errata: http://support.amd.com/TechDocs/49645.pdf#search=sb950
It would still have been a lot more straightforward to just ...say what the disk controller is. Or an lspci output would've done.
Sorry if I was unclear or obfuscating. I thought everyone who specializes in this area was aware that the only southbridge for *all* AMD FX motherboards is the SB920/950. I previously posted the output of "lspci -vvv" in Comment 23. Please let me know if any other information is needed. Although I still cannot install Fedora on the box, I'm happy to continue to assist you.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1114770 and its dupe https://bugzilla.redhat.com/show_bug.cgi?id=1114783 - cmurf also ran into cases of device-mapper-multipath deciding his disk is a multipath config, and sometimes being unable to install as a consequence (though for him sometimes the multipath configuration actually somehow succeeded and he could install to the 'multipath' device.)
Re: Blocker bug, I'd say if the multipathd confusion doesn't persist after rebooting following a completed ATA Security Erase, and installation is possible, then I'm -1 blocker. If the confusion persists after a reboot, then I'm +1 blocker. I don't follow the use case that requires successful installation in the same boot environment immediately following an ATA Security Erase.

FWIW: I did not use ATA Security Erase in my case, and this problem is new in F21; I never had it in F20. And I haven't run into it with any of the F21 post-alpha/pre-beta builds.

Not related to whether this should block: Security Erase does cause the drive to vanish, although I don't know the specifics. An ATA Security Erase issued to a drive MUST complete; you can't cancel it. If the computer is rebooted or even powered off, the drive will not reappear. I'm not sure whether the spec requires the erase to resume automatically, or whether the drive just stays locked waiting for the erase command to be reissued. But it definitely isn't usable as a block device until the command completes. I've done bunches of uninterrupted ATA Security Erases, and upon completion the drive reappears, similar to a hot-plug event as far as the kernel is concerned. I wonder if hot-plug removing and then re-adding the drive causes the same multipathd confusion? If so, that'd seem to be a bug, but again not necessarily a blocking one.
The reporter already stated several comments back that the issue persists long after the erase, including after installing a different distro to the disk. I'm gonna edit that out of the subject because all it's doing now is generating unnecessary word count.
The F21A live DVD is intermittently throwing an error when booting:

dmesg | grep multipath
[ 21.873214] device-mapper: multipath: version 1.7.0 loaded
[ 27.393719] device-mapper: multipath service-time: version 0.2.0 loaded
[ 38.914076] device-mapper: table: 253:7: multipath: error getting device
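The last dmesg line identifies the failing map only by its major:minor pair. On a typical system the device-mapper minor number N corresponds to the node /dev/dm-N, so a small filter can translate such table errors into device nodes worth inspecting. The helper name and output shape are hypothetical; the regular expression assumes the `[timestamp] device-mapper: table: MAJ:MIN: target: message` layout shown above.

```python
import re

# e.g. "[ 38.914076] device-mapper: table: 253:7: multipath: error getting device"
DM_TABLE_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+device-mapper: table: "
    r"(?P<major>\d+):(?P<minor>\d+): (?P<target>[^:\s]+): (?P<msg>.+)$"
)


def dm_table_errors(dmesg_text):
    """Collect device-mapper table errors, mapping MAJ:MIN to /dev/dm-MIN."""
    errors = []
    for line in dmesg_text.splitlines():
        m = DM_TABLE_ERROR.match(line.strip())
        if m:
            errors.append({
                "time": float(m.group("ts")),
                "dm_node": "/dev/dm-" + m.group("minor"),
                "target": m.group("target"),
                "message": m.group("msg"),
            })
    return errors
```

Applied to the three lines above, only the last one matches, yielding a single entry for /dev/dm-7 with target `multipath`; the two "version ... loaded" lines are ignored.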
(In reply to Adam Williamson (Red Hat) from comment #30)
I just did an ATA Security Erase and install of F21 beta, so I don't think this problem is Security Erase related; it's just coincidence. I think it's the same bug as the one I reported. I was also getting bogus lines like this one reported in anaconda's program.log, with unrevealing information in dmesg to explain it:

12:58:48,051 INFO blivet: type detected on 'sda' is 'multipath_member'

Has anything newer than Fedora 21 Alpha been tested? The posted dmesg says it's kernel 3.16.1, which was a while ago; we're up to 3.17.1 now, and I haven't had multipath confusion in over a month (with builds more recent than alpha). Can we get a Fedora 21 Beta RC1 tested on this hardware? And if there's a reproducer, the journal file from the installer environment's /tmp directory would be more useful than dmesg.
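A similar filter can pull blivet's format-detection verdicts out of the installer logs, so that every disk tagged `multipath_member` is listed in one place. This is a sketch: the function names are hypothetical, and the pattern assumes the `type detected on '<dev>' is '<type>'` wording of the log line quoted above.

```python
import re

TYPE_DETECTED = re.compile(r"type detected on '(?P<dev>[^']+)' is '(?P<fmt>[^']+)'")


def detected_formats(log_text):
    """Return {device name: detected format type} from blivet log lines."""
    formats = {}
    for line in log_text.splitlines():
        m = TYPE_DETECTED.search(line)
        if m:
            # A later detection for the same device overwrites the earlier one.
            formats[m.group("dev")] = m.group("fmt")
    return formats


def multipath_members(log_text):
    """Devices that blivet classified as multipath_member, sorted by name."""
    return sorted(dev for dev, fmt in detected_formats(log_text).items()
                  if fmt == "multipath_member")
```

On an unaffected system of this kind, one would expect `multipath_members()` to return an empty list, since none of the disks are actually multipathed.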
Those are both good points - bob, could you test with Beta RC1: https://dl.fedoraproject.org/pub/alt/stage/21_Beta_RC1/ and let us know if you still have the problem, and provide the output Chris mentioned if so? Thanks!
F21 Beta RC1 works for me. I put the .iso on a USB thumb drive and did a successful install.* It would seem that Chris Murphy is right: the problem seems to have vanished in builds more recent than alpha. -1 F21BetaBlocker from me. :)

* For all of you LVM guys: Anaconda F21B RC1 still has the annoying problem of not letting the user place "/" and swap on separate volume groups on separate devices. Before you ask: yes, I'm aware of how to create additional VGs on different devices using the anaconda menus. The problem is that when I change the device target for the VG, Anaconda forces "/", swap and "/var" onto the same partitions. To get the partitions separated onto the SSD and HDD in the manner I mentioned previously, I had to choose not to install to LVM. I've filed bug reports on this before, and I keep getting told that LVM works fine when it doesn't.
Rather than -1 blocker, let's say it's fixed. We can re-open if the issue re-occurs. Chris, are you going to close your copy too?
Reopening. Sorry. My previous report of the bug being resolved was a fluke; the bug remains intermittent.

1. On my first try of the F21Beta RC1 installation media (yesterday), both SATA drives (SSD and HDD) were properly detected, and I was able to partition them suitably using standard partitions instead of LVM. (I had a problem with LVM not accepting my desired partitioning scheme, reported in Bug 1157990.) Using standard partitions the install went fine, and F21Beta RC1 installed without any problems. I've been using F21Beta on the system for 2 days now. No problems.

2. To satisfy a Needinfo request for Bug 1157990, I used the same F21Beta RC1 installation media to boot the machine that has a working F21BRC1 system installed on its hard disks. The objective was to document the installer's failed responses to LVM partitioning requests. Result: "NO DISKS DETECTED. Please shut down the computer, connect at least one disk, and restart to complete installation."

I tried rebooting the system several times, alternating between the existing F21 Beta RC1 install on the hard disks, which works fine, and the F21BRC1 install medium. The installed version of F21BRC1 works fine, always recognizing the disks. The F21BRC1 install medium will either:
A) recognize the disks, as it did yesterday;
B) not recognize any disks (today); or
C) mis-recognize the SATA SSD as a Specialized Network Disk (multipath) instead of as a Local Standard Disk, while failing to recognize the local HDD (today).

It looks like the bug has to be reopened and the +1 F21BetaBlocker status resumed. Sorry.
(In reply to bob from comment #36)
> Result: "NO DISKS DETECTED. Please shut down the computer, connect at least
> one disk, and restart to complete installation."

We need the journal to know what's going on. Once you get the above screen, please switch to a shell and go to /tmp. There might be a journal file in there that you can scp or copy to a USB stick to attach to this bug. If not, while still in /tmp, use 'journalctl -b -o short-monotonic > journal.txt' and attach that to this bug. Thanks.
Actually it might be useful to get *separate* journals for case B: no disks detected, and case C: incorrectly treated as multipath device.
So, no one else is reporting this and we've had a pretty wide variety of people testing in the last few days. I'm prepared to invoke the "not affecting many people" excuse and vote -1 Blocker here. I'd be +1 as a Freeze Exception if 1) a fix is identified and 2) we end up slipping and respinning again. Frankly, given that the reporter has shown that it is intermittent (and that a reboot or two may get past it), I'd prefer to call this a Common Bug for Beta and proceed.
Created attachment 951932 journalctl output
Created attachment 951944 Test Case 1
Test Case 1: no disks found.
Created attachment 951945 Test Case 2
Test Case 2: No disks selected; no disks appeared in either menu, though the two SATA drives were discoverable as multipath devices by clicking on the multipath tab.
Created attachment 951946 Test Case 3
Test Case 3: No disks selected. The local SATA SSD was improperly recognized as a multipath device and appeared in the multipath window of the drive selection page. The local SATA HDD did not appear in the multipath window, but was discoverable as a multipath device.
I have a question. After you boot, what happens if you run

# multipath -c <devname>

When anaconda runs, it should set multipath.conf to include

find_multipaths yes

It certainly does in RHEL7. With this set, multipath will only detect that a device is a valid multipath path in two cases:

1. Its wwid is listed in /etc/multipath/wwids. (This file should be blank when anaconda runs. I can't imagine that anaconda would ever create it non-empty, and multipath only writes to it when it creates a multipath device.)
2. There is currently at least one other device that has the same wwid.

Multipath gets the wwids from udev, specifically the ID_SERIAL udev environment variable. You can also check this for your devices by running

# udevadm info <devname> | grep ID_SERIAL

Now, if find_multipaths isn't set, multipath will accept any non-blacklisted device as multipathed. This was how things were in RHEL6, and anaconda had to do a lot of work to determine which devices it should mark as multipathed. All that got pulled out when it switched to using find_multipaths. So if for some reason find_multipaths wasn't set in /etc/multipath.conf when anaconda runs

multipath -c <devname>

it will just accept every device as multipathed. That makes this a tempting place to look first, with just one tiny problem: I don't see how this could be intermittent. Multipath will always fail if the config file doesn't exist at all, and I assume that once the config file exists, it either should or shouldn't always have find_multipaths.

Actually, when you drop to the shell in the installer, you should try running the multipath and udevadm commands, as well as checking /etc/multipath.conf and /etc/multipath/wwids.
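The decision rules described in this comment can be modeled in a few lines, which makes it easier to see why a missing `find_multipaths yes` would turn every local disk into a "valid multipath path". This is an illustrative model of the rules as stated here, not the multipath-tools implementation: the function and class names are hypothetical, and the blacklist is simplified to a set of device names (real blacklists match on devnode patterns, wwids and vendor/product). `ID_SERIAL` is read from `udevadm info` output, whose property lines start with `E: `.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


def id_serial(udevadm_output: str) -> Optional[str]:
    """Pull ID_SERIAL out of `udevadm info <devname>` output ("E: KEY=value" lines)."""
    for line in udevadm_output.splitlines():
        if line.startswith("E: ID_SERIAL="):  # skips ID_SERIAL_SHORT
            return line.split("=", 1)[1]
    return None


@dataclass
class MultipathConfig:
    exists: bool           # does /etc/multipath.conf exist at all?
    find_multipaths: bool  # is "find_multipaths yes" set in defaults?
    blacklist: FrozenSet[str] = field(default_factory=frozenset)  # simplified: device names


def is_valid_path(dev: str, wwid_by_dev: Dict[str, str],
                  cfg: MultipathConfig, wwids_file: FrozenSet[str]) -> bool:
    """Hypothetical model of the `multipath -c <dev>` decision described in this bug."""
    if not cfg.exists:
        return False       # no config file: multipath always fails
    if dev in cfg.blacklist:
        return False
    if not cfg.find_multipaths:
        return True        # any non-blacklisted device is accepted
    wwid = wwid_by_dev.get(dev)
    if wwid is None:
        return False
    if wwid in wwids_file:
        return True        # case 1: wwid already listed in /etc/multipath/wwids
    # case 2: at least one other current device shares the same wwid
    return any(other != dev and w == wwid for other, w in wwid_by_dev.items())
```

With two disks that have distinct serials and an empty wwids file, `is_valid_path` returns True for both when `find_multipaths` is False and False for both when it is True, which is the contrast this comment is pointing at.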
Ben: thanks for the input. As you describe it, I'd also kinda expect that if the problem were in multipath.conf, a lot more people would be hitting this a lot more often?
Created attachment 951963: pre-anaconda tests
Created attachment 951964: post-anaconda tests
(In reply to Ben Marzinski from comment #44)
> When anaconda runs, it should set multipath.conf to include
>
> find_multipaths yes

Not happening.

> 1. It's wwid is listed in /etc/multipath/wwids (this file should be blank
> when anaconda runs. I can't imagine that anaconda would ever create it
> non-empty, and multipath only writes to it when it creates a multipath
> device).

The WWIDs of the local SATA disk drives are listed in /etc/multipath/wwids, even though the file should be blank.

> 2. There actually are currently at least one other device that has the same
> wwid.

Not applicable. udev is accurately providing the ID_SERIAL environment variable.

> Now, if find_multipaths isn't set, multipath will accept any non-blacklisted
> device as multipathed.

I think you've found the problem.

> Now if for some reason, find_multipaths wasn't set in /etc/multipath.conf
> when anaconda runs
>
> multipath -c <devname>
>
> It will just accept every device as multipathed.

This appears to be the case. Please, no more tasks unless you want to add me to the Red Hat payroll.
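To cross-check the two observations above, one could compare the WWIDs recorded in /etc/multipath/wwids against each disk's ID_SERIAL. A minimal Python sketch, assuming the wwids file uses "#" comment lines and one slash-wrapped WWID per entry line, and that `udevadm info <devname>` prints udev properties as "E: KEY=value" lines; the function names and sample serials are hypothetical.

```python
def parse_wwids(text):
    """Return the WWIDs listed in an /etc/multipath/wwids file.

    Assumes '#' starts a comment line and each entry looks like '/<wwid>/'.
    """
    wwids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            wwids.add(line.strip("/"))
    return wwids


def parse_id_serial(udevadm_text):
    """Return ID_SERIAL from 'udevadm info <devname>' output, or None."""
    for line in udevadm_text.splitlines():
        # 'E: ID_SERIAL_SHORT=...' does not match this prefix.
        if line.startswith("E: ID_SERIAL="):
            return line.split("=", 1)[1]
    return None


# Made-up sample data; on a live system, read the real file and command output.
wwids_text = "# Multipath wwids, Version : 1.0\n/INTEL_SSD_SERIAL/\n"
udev_sda = "N: sda\nE: ID_SERIAL=INTEL_SSD_SERIAL\nE: ID_SERIAL_SHORT=SSD_SERIAL\n"
print(parse_id_serial(udev_sda) in parse_wwids(wwids_text))  # True
```

Per the description quoted above, the file should be blank when anaconda starts, so a True result for a local single-path disk on a fresh installer boot would be the anomaly worth reporting.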
bob: have you tried a non-live image at any point? I just did a couple of quick boot tests, and /etc/multipath.conf has 'find_multipaths yes' in defaults {} when booting a non-live image (DVD or netinst) but not when booting a live image. still, my disks don't show up as multipath when booting live. I tested F20 to compare, and when booting F20 live it doesn't have a /etc/multipath.conf at all, and there's nothing in /etc/multipath. It does print this in /tmp/program.log: /etc/multipath.conf does not exist, blacklisting all devices. so I guess there's a difference between F20 and F21 live there? F21 live has a /etc/multipath.conf which does not specify 'find_multipaths yes', so it doesn't get that 'blacklist all' behaviour...
In the context of deploying F20, I have been using CD net install images and a local mirror. In the context of testing F21A1 and F21BRC1, I have been using the KDE Live DVD image on USB.
so, I know we've asked you to do a lot of testing :), but it'd be really great if you could test with a 21 Beta RC4 (or RC1 or RC2, really, doesn't matter a whole lot) non-live image: network installer or Server DVD. I suspect you may find the bug doesn't happen there, because of the difference in multipath.conf . Still, I'm a bit curious because Ben's comment seems to indicate this should happen much *more* often with live installs...
Discussed at 2014-10-30 Go/No-Go meeting: http://meetbot.fedoraproject.org/fedora-meeting-2/2014-10-30/f21_beta_gono-go_meeting.2014-10-30-17.00.log.txt . Rejected as a blocker. We can see there's definitely a possibility things have changed in live installs between 20 and 21 such that there's more of a chance of people encountering this, but empirically, quite a few people have tested live installs of 21 Alpha and Beta composes on quite a lot of configurations and not hit this. It'll require further investigation and we should probably see if we can change the live environment's /etc/multipath.conf somehow, but we think the evidence tends to indicate this isn't sufficiently common to block the Beta release. We will document this and the workarounds - just reboot and try again until it works, or try editing multipath.conf before launching the installer, or use a non-live installer - in Common Bugs.
> just reboot and try again until it works

FYI, drive ID worked *ONCE* and only *ONCE* for me -- when I performed the initial install over a Gentoo installation. Since I put F21B on the disks, the installer has *NEVER* recognized the local drives as local drives. If it recognizes them at all, it only recognizes them as multipath devices.
> 17:34:51 <nirik> and even reporter doesn't hit it always
> 17:34:52 <adamw> right now we have one confirmed reporter who's only testing
> with live images and who hits the bug 'sometimes', i kinda like those odds

I understand your eagerness to release, but FWIW these statements don't reflect an accurate understanding of the bug. I successfully got drive recognition *ONCE* -- during the initial install over a Gentoo installation. Since then, I hit the bug *every* *time* I boot the live media on a system that already has an F21 install on it. The only variation or "intermittent" behavior is whether the drives appear in the multipath box, or whether they don't appear in the multipath box and you have to scan for multipath devices to find them. They're never detected as local disks.

Just wondering -- maybe the reason that nobody else is seeing this bug is that nobody is trying to use the Live DVD to install F21 over an existing F21 installation on their disks? Might be worth trying that.
we've done that lots of times. I did it at least four times just this morning.
(In reply to bob from comment #54)
> Just wondering -- maybe the reason that nobody else is seeing this bug is
> because nobody is trying to use the Live DVD to install F21 over and
> existing F21 installation on your disks? Might be worth trying that.

I have done this many times and haven't encountered this problem (reported in bug 1114770) in 1-2 months.

Fedora-Server-netinst-x86_64-21_Beta.iso (this should be RC4; it was downloaded just now but doesn't have -4 in the filename):
/etc/multipath/wwids is present but only contains comments
/etc/multipath.conf is present and contains 'find_multipaths yes'
multipath -c /dev/sda says the device is not a valid multipath device

Fedora-Live-Workstation-x86_64-21_Beta-4.iso:
/etc/multipath is empty
/etc/multipath.conf is not present
# multipath -c /dev/sda
Oct 30 14:14:59 | DM multipath kernel driver not loaded
Oct 30 14:14:59 | /etc/multipath.conf does not exist, blacklisting all devices.
Oct 30 14:14:59 | A default multipath.conf file is located at
Oct 30 14:14:59 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Oct 30 14:14:59 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
Oct 30 14:14:59 | DM multipath kernel driver not loaded

A not-so-great workaround for this bug, maybe for Common Bugs, is to create install media with an overlay, modify /etc/multipath.conf to blacklist the affected devices, and then reboot. I did this while bug 1114770 was happening for me, and it allowed me to proceed with other testing.
cmurf: huh, I thought I booted WS live and saw that /etc/multipath.conf did exist. have to take another look.
My two cents as a user:

* All of the installation media should behave identically, consistently creating the same files when encountering the same local physical devices. If that's not happening, then you have a design problem in need of attention.
* I hope you're not suggesting that end users should have to create install media with overlays when they encounter this bug. That's an unduly burdensome expectation. Users don't want to fight with an installer. Users want to pop in a CD and have things "just work." If the F21 installer doesn't work for a user, the path of least resistance is to go distro-hopping, not to spend a lot of time ferreting out a solution to an installer bug.
* No end user in his right mind would commit as much time to this problem as I have. When encountering recalcitrant installation media, most users would just move on to installation media that doesn't give them a headache. I hate to say this to you guys, but ease of installation is why a lot of people have defected to Mint.
We released a *Beta*. Betas are pre-releases. They don't have to work perfectly.

What you posted is a nice, lovely ideal world, yes. The real world is this:

https://apps.fedoraproject.org/packages/anaconda/bugs/all
https://fedoraproject.org/wiki/Releases/21/Schedule

You cannot achieve the lovely, nice ideal world in practice. It does not happen. Not ever. And no, this is not limited to Fedora. All distributions with installers have bugs, *especially* installer storage bugs; it is a notoriously impossible area to perfect. It took me all of thirty seconds to find a bug in Mint's installer storage: it's the 9th most recently reported bug in Mint's Launchpad product - https://bugs.launchpad.net/linuxmint/+bug/1381170 . There are 30 bugs there with the 'installer' tag - https://bugs.launchpad.net/linuxmint/+bugs?field.tag=installer . And that's not even the main place where bugs in Ubiquity are filed, which is https://bugs.launchpad.net/ubuntu/+source/ubiquity , where you will find such bugs as "Installer doesn't show encrypted partitions" and "In UEFI mode, installer crashes if OS is installed on 2nd drive and grub installs successfully but ubiquity doesn't think so".

Installers are hard. Storage in installers is really, really, really hard. It gets harder the more capabilities you try to cover; hence you are being tripped up by Fedora's multipath support, which sucks, but for some of our users, multipath is a major feature.

The point of a Beta release is to provide something that's broadly viable for deployment by people who want to do beta testing for the final release. This bug was not judged to be common or severe enough to violate that requirement. It's not that we're saying it isn't a bug; we're saying it doesn't make sense to stop the Beta release from going out until this bug is fixed.
(In reply to bob from comment #58)
> my two cents as a user:
>
> * all of the installation media should behave identically,

No one's suggesting the current behavior is proper or intended. The nature of the problem isn't clear. The only other person who's reproduced it so far is me, and I haven't been able to do that at all post-Alpha, and as far as I recall not even in any Alpha TCs or RCs.

> * I hope you're not suggesting that end users should have to create install
> media with overlays when they encounter this bug problem.

I'm not. I'm suggesting a workaround, thus far for one reproducer. It makes no sense to hold up the Beta for this, especially since the problem's cause hasn't been identified, let alone a fix found. The context under discussion is the Beta, not the final release.

> * No end user in his right mind would commit as much time to this problem as
> I have. When encountering recalcitrant installation media, most users would
> just move on to installation media that doesn't give them a headache. I
> hate to say this to you guys, but ease of installation is why a lot of
> people have defected to Mint.

I'm an end user. I don't code, I only test, and I don't get paid for it. I don't appreciate lengthy off-topic messages in bug reports; it gives me a headache and makes me not want to help when I have to wade through extraneous posts. This report should be 10% of its current length. Anyone else arriving at this bug report will have an inordinate number of posts containing unhelpful and unrelated melodrama to wade through. Complaints about process go to the devel@ list; feedback on the installer goes to the anaconda@ list. Facts that progress *this* bug go here; it's not a forum message board. If you really want to fix this bug, either find exactly what the problem is and fix it, or just answer the questions you're asked. Thus far you're the only person who can consistently reproduce the bug, which is why you alone are being asked to test and report.
So, how does anaconda create /etc/multipath.conf? Does it write the file itself, or does it call

# mpathconf --enable

mpathconf should enable find_multipaths by default. However, it does this by starting with a template file that contains these parameters. This template needs to be located at

/usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf

If not, it will start with a blank file. The template file starts with the following comment:

# This is a basic configuration file with some examples, for device mapper
# multipath.
#
# For a complete list of the default configuration values, run either
# multipath -t
# or
# multipathd show config
#
# For a list of configuration options with descriptions, see the multipath.conf
# man page

So if that exists at the start of your multipath.conf file, you're using mpathconf and it found the template (or I suppose anaconda could be writing a file that's based on the template file). If there's no header, either anaconda wrote the file itself, or mpathconf didn't find the template.

If anaconda is calling mpathconf, it could always try calling it with

# mpathconf --enable --find_multipaths y

to make sure it forces find_multipaths to be enabled, regardless of the template (or lack of one). In fact, if you do

# mpathconf --enable --find_multipaths y --user_friendly_names y

you will get the same configuration regardless of whether it finds the current template, since those are the only two things it sets.
In c56 for the live install, I hadn't launched anaconda. Kinda dumb.

Fedora-Live-Workstation-x86_64-21_Beta-4.iso
# cat /etc/multipath.conf
blacklist {
}
defaults {
    user_friendly_names yes
}
# grep mpathconf /tmp/program.log
mpathconf --user_friendly_names y --with_multipathd y

There is no /usr/share/doc/device-mapper-multipath-0.4.9 directory, but /usr/share/doc/device-mapper-multipath/multipath.conf exists and contains what you describe. However, its pile of commented text is not in the /etc/multipath.conf file as created by anaconda.

If I then run:
# rm -f /etc/multipath.conf
# mpathconf --enable --find_multipaths y --user_friendly_names y
# cat /etc/multipath.conf
blacklist {
}
defaults {
    user_friendly_names yes
    find_multipaths yes
}

So it looks like anaconda isn't calling mpathconf correctly on lives?
Fedora-Server-netinst-x86_64-21_Beta.iso
# cat /etc/multipath.conf
defaults {
    find_multipaths yes
    user_friendly_names yes
}
blacklist {
}
# grep mpathconf /tmp/program.log
mpathconf --user_friendly_names y --with_multipathd y

There is no /usr/share/doc/device-mapper-multipath* directory at all.

So the same mpathconf command runs in both netinst and live, but produces a different /etc/multipath.conf: live is missing 'find_multipaths yes'.
So, it looks like one issue here is that in the rpm spec files, %doc is set to /usr/share/doc/device-mapper-multipath in Fedora, and /usr/share/doc/device-mapper-multipath-0.4.9 in RHEL 7. I should probably check both places in mpathconf. However, that doesn't explain:

1. Why there is no template at all in Fedora-Server-netinst-x86_64-21_Beta.iso.
2. Where "find_multipaths yes" is coming from in Fedora-Server-netinst-x86_64-21_Beta.iso.

I assume that the same version of device-mapper-multipath is running in both cases. If that's so, and neither has the template file where mpathconf is looking for it, then they should both output the same file (since they are being called with the same options, and neither is finding the template).

On Fedora-Server-netinst-x86_64-21_Beta.iso, could you try

# rm -f /etc/multipath.conf
# mpathconf --user_friendly_names y --with_multipathd y
# cat /etc/multipath.conf

just to make sure that it really is setting find_multipaths in this version with no template? The only way that seems possible is if there already was an /etc/multipath.conf when mpathconf was run. If /etc/multipath.conf already exists, it just modifies it, instead of creating a file from scratch. If something created an /etc/multipath.conf file before mpathconf was running, that would explain it.
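The difference between the two files above comes down to one line in the defaults section. A rough Python sketch that extracts the top-level defaults options from a multipath.conf and checks for 'find_multipaths yes'. It assumes one "key value" option per line and section braces at line ends, as in the mpathconf output shown in this bug; it is a hypothetical helper, not a full parser of the multipath.conf grammar.

```python
def defaults_options(conf_text):
    """Return key/value pairs from the top-level 'defaults { ... }' section."""
    options = {}
    depth = 0
    section = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            if depth == 0:
                section = line[:-1].strip()  # e.g. 'defaults', 'blacklist'
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0:
                section = None
        elif depth == 1 and section == "defaults":
            parts = line.split(None, 1)  # key, then the rest as the value
            options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


def has_find_multipaths(conf_text):
    return defaults_options(conf_text).get("find_multipaths") == "yes"


live = "blacklist {\n}\ndefaults {\n\tuser_friendly_names yes\n}\n"
netinst = "defaults {\n\tfind_multipaths yes\n\tuser_friendly_names yes\n}\nblacklist {\n}\n"
print(has_find_multipaths(live), has_find_multipaths(netinst))  # False True
```

Nested subsections (for example a device block inside devices) are skipped because only depth-1 lines inside defaults are collected.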
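The lookup order described above (an existing /etc/multipath.conf is modified in place; otherwise mpathconf starts from the template, or from a blank file if none is found), together with the idea of checking both docdirs, can be sketched in Python. This is a hypothetical model for illustration, not mpathconf's shell implementation; the function name and the order of the candidate paths are assumptions.

```python
import tempfile
from pathlib import Path

# Template locations named in this bug: Fedora's unversioned docdir and
# RHEL 7's versioned one. Checking both is the change proposed above.
TEMPLATE_CANDIDATES = (
    Path("/usr/share/doc/device-mapper-multipath/multipath.conf"),
    Path("/usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf"),
)


def starting_config(conf_path=Path("/etc/multipath.conf"),
                    templates=TEMPLATE_CANDIDATES):
    """Hypothetical model: return the text mpathconf would start editing."""
    if conf_path.exists():
        # An existing file is modified, not recreated, so anything it
        # already contains (e.g. 'find_multipaths yes') survives.
        return conf_path.read_text()
    for template in templates:
        if template.exists():
            return template.read_text()
    return ""  # no template found: start from a blank file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp, "multipath.conf")      # does not exist yet
        versioned = Path(tmp, "versioned.conf")  # missing, as on Fedora
        unversioned = Path(tmp, "unversioned.conf")
        unversioned.write_text("defaults {\n\tfind_multipaths yes\n}\n")
        # Checking only the versioned path yields a blank start ...
        print(repr(starting_config(conf, [versioned])))  # ''
        # ... while checking both finds the unversioned template.
        text = starting_config(conf, [versioned, unversioned])
        print("find_multipaths yes" in text)  # True
```

In this model the netinst result is explained by the first branch: a pre-existing file already containing find_multipaths is edited rather than rebuilt.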
(In reply to Ben Marzinski from comment #64)
> On Fedora-Server-netinst-x86_64-21_Beta.iso, could you try
>
> # rm -f /etc/multipath.conf

Before doing this, ls -l on /etc reveals multipath.conf and multipath.conf.old with the same date and time stamp. Both have the same contents.

> # mpathconf --user_friendly_names y --with_multipathd y
> # cat /etc/multipath.conf

blacklist {
}
defaults {
    user_friendly_names yes
}

Looks like multipath.conf is baked into the ISO and already contains 'find_multipaths yes'?

>
> Just to make sure that it really is setting find_multipaths in this version
> with no template... The only why that seems possible is if there already was
> an /etc/multipath.conf when mpathconf was run. If /etc/multipath.conf already
> exists, it just modifies it, instead of creating a file from scratch.
>
> If something created an /etc/multipath.conf file before mpathconf was
> running, that would explain it.

Looks likely.
Mystery solved! So, is there already an /etc/multipath.conf file when mpathconf gets run in Fedora-Live-Workstation-x86_64-21_Beta-4.iso? If so, you'll need to fix this there, or make anaconda use "--find_multipaths y". If not, I can take this bug and make mpathconf search for the template file in both places, so that it will use the template, and you'll get find_multipaths set.
OK, after mounting the squashfs and then the rootfs images on netinst, I'm finding there is already a multipath.conf, which contains:

defaults {
    find_multipaths yes
    user_friendly_names yes
}

What's the exact command anaconda should call? I'm seeing it use:

mpathconf --user_friendly_names y --with_multipathd y

but it seems it should be:

mpathconf --enable --find_multipaths y --user_friendly_names y

It should include --enable, correct?

I'll leave it up to Adam whether this should be set back to anaconda and fixed universally for netinst and lives there, or if it's a compose thing to include a correct multipath.conf.

So this fixes a tangent of this bug. It doesn't actually fix the bug as reported by the OP, where in either the netinst or live case, his SATA drives are either wrongly seen as multipath or simply not present at all in the installer. Maybe it's some kind of race condition, since it's behaving non-deterministically. Also, FWIW, the two systems that have experienced this anomaly are EFI systems (UEFI and MacEFI respectively).
"So, it looks like the one issue here is that in the rpm spec files, %doc is set to /usr/share/doc/device-mapper-multipath in fedora, and /usr/share/doc/device-mapper-multipath-0.4.9 in RHEL7"

I don't believe that's a bug anywhere, just a change in how we do docdirs in Fedora. They're no longer versioned, because all versioning them does is cause problems.

At present there is no /etc/multipath.conf in the Workstation live before anaconda is run, but I'd feel much more comfortable with a solution that does the right thing whether that file exists at a given time or not. It's hard to guarantee that will never change in any future live image. David Shea, would it be hard to have anaconda just specify the desired option explicitly, as given in c#66?
(In reply to Chris Murphy from comment #67)
> OK after mounting the squashfs, and then the rootfs images on netinst I'm
> finding there is already a multipath.conf which contains:
> defaults {
> find_multipaths yes
> user_friendly_names yes
> }
>
> What's the exact command anaconda should call? I'm seeing it use:
> mpathconf --user_friendly_names y --with_multipathd y
> but that it should be:
> mpathconf --enable --find_multipaths y --user_friendly_names y
> It should include --enable, correct?

The --enable isn't actually necessary unless you first disabled multipath with

# mpathconf --disable

which blacklists everything; --enable will undo that. Unless you already have an /etc/multipath.conf file where multipathing is disabled, you will just get an enabled config file. But it can't hurt to have it.

# mpathconf --enable --find_multipaths y --user_friendly_names y

will force mpathconf to give you a correctly working config file, regardless of what was there before.

> I'll leave it up to Adam if this should be set back to anaconda and fixed
> universally for netinst & lives there, or if it's a compose thing to include
> a correct multipath.conf.
>
> So this fixes a tangent of this bug. It doesn't actually fix the bug as
> reported by the OP, where in either netinst or live case, his SATA drives
> are either wrongly seen as multipath or simply not present at all in the
> installer. Maybe some kind of race condition since it's behaving
> non-deterministically. Also, FWIW, the two systems that have experienced
> this anomaly are EFI systems (UEFI and MacEFI respectively).
(In reply to Adam Williamson (Red Hat) from comment #68)
> "So, it looks like the one issue here is that in the rpm spec files, %doc is
> set to /usr/share/doc/device-mapper-multipath in fedora, and
> /usr/share/doc/device-mapper-multipath-0.4.9 in RHEL7"
>
> I don't believe that's a bug anywhere, just a change in how we did docdirs
> in Fedora. They're no longer versioned, because all versioning them does is
> cause problems.

Well, there is a bug insofar as mpathconf is checking in the wrong place on Fedora. That needs to get fixed. But I'm certainly happy to see the doc versioning go.
(In reply to Adam Williamson (Red Hat) from comment #68)
> At present there is no /etc/multipath.conf in the Workstation live before
> anaconda is run, but I'd feel much more comfortable with a solution that
> DTRT whether that file exists at a given time or not. It's hard to guarantee
> that will never change in any future live image. David Shea, would it be
> hard to have anaconda just specify the desired option explicitly, as given
> in c#66?

I guess not; it'd just be a string in blivet.
So for now let's assign back to anaconda and see if we can use an mpathconf invocation that will always do what we want for anaconda purposes. I believe this should be: mpathconf --enable --find_multipaths y --user_friendly_names y as specified in c#69.
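For illustration, the invocation proposed above could be wrapped like this in Python. This is a hypothetical helper, not blivet's actual code; the constant and function names are assumptions, and whether the shipped fix also keeps the existing --with_multipathd y option isn't stated in this comment, so it is left out. Spelling out both settings means the resulting file no longer depends on whether a template or a pre-existing /etc/multipath.conf is present.

```python
import subprocess

# The invocation proposed in this bug: every setting anaconda cares about is
# passed explicitly, so the result doesn't depend on a template or on a
# pre-existing /etc/multipath.conf.
MPATHCONF_CMD = [
    "mpathconf", "--enable",
    "--find_multipaths", "y",
    "--user_friendly_names", "y",
]


def write_multipath_conf(run=subprocess.run):
    """Hypothetical helper: run mpathconf, raising if it exits non-zero.

    'run' is injectable so the call can be checked without root access
    or device-mapper-multipath installed.
    """
    return run(list(MPATHCONF_CMD), check=True)


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    write_multipath_conf(run=lambda cmd, check: print(" ".join(cmd)))
```

With the real subprocess.run, check=True makes a failing mpathconf raise CalledProcessError instead of silently leaving a partial config behind.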
I'd like to ask if the proposed fix would provide the desired functionality in the following scenario: User downloads a Live DVD image, uses liveusb-creator to burn the image to a bootable USB stick, while specifying the "persistent media" option. It would seem reasonable to expect that instead of burning DVDs, some users are installing via recyclable USB, and that they'd use persistent media when deploying the USB installer to perform installs on multiple boxes.
Yes. It would. (though in my experience very few people use the media persistence stuff any more anyway, it mostly was relevant when USB sticks cost a lot more than they do now.)
So this is specific to live media. I only had to read 73 comments to gather that. We already put find_multipaths yes in /etc/multipath.conf via lorax, but that's only for non-live media. I wasn't thinking that we have to do everything twice because of how live media is generated. We want multipath devices to be detected and activated before anaconda starts, even on live media. I don't think the proposed solution gets us there. What we need is for the live media to be set up correctly in the first place. With that done, blivet doesn't need to be changed at all IIUC.
I just looked over it again and saw that I was wrong in my last comment -- changing the mpathconf call in blivet should fix it for both types of media.
A couple of questions:

1. Is the update going to bypass the freeze and make it into F21 live media?
2. Do you want me to test beforehand?
(In reply to bob from comment #77)
> A couple of questions:
>
> 1. is the update going to bypass the freeze, and make it into F21 live media?

Yes.

> 2. do you want me to test beforehand?

Please do. Add updates=https://dshea.fedorapeople.org/1154347.img to the boot command line to get the fix.
I'm not sure that I understand. Comment 78 sounds like you want me to use the existing F21 Beta Live DVD (known to be defective) to download a patch/overlay at boot time, which requires the testbed to have network connectivity. I was offering to test an updated build of the Live DVD that actually contains the fix.
(In reply to bob from comment #79)
> I'm not sure that I understand. Comment 78 sounds like you want me to use
> the existing F21 Beta Live DVD (known to be defective) to download a
> patch/overlay at boot time, which requires the testbed to have network
> connectivity.

That is exactly what I'm saying, yes. There are some other, non-network-based means of adding an updates image, described at https://git.fedorahosted.org/cgit/anaconda.git/tree/docs/boot-options.txt#n45 (this links to the description of inst.repo, which uses the same format for the argument). HTTP is usually the most straightforward, though. I don't know why you're emphasizing that the F21 Beta live DVD is defective. That's the whole point of the update.

> I was offering to test an updated build of the Live DVD that actually
> contains the fix.

Ok.
anaconda-21.48.15-1.fc21, python-blivet-0.61.10-1.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/python-blivet-0.61.10-1.fc21,anaconda-21.48.15-1.fc21
Package anaconda-21.48.15-1.fc21, python-blivet-0.61.10-1.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-21.48.15-1.fc21 python-blivet-0.61.10-1.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-15420/python-blivet-0.61.10-1.fc21,anaconda-21.48.15-1.fc21
then log in and leave karma (feedback).
Package python-blivet-0.61.10-1.fc21, anaconda-21.48.16-1.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing python-blivet-0.61.10-1.fc21 anaconda-21.48.16-1.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-15420/anaconda-21.48.16-1.fc21,python-blivet-0.61.10-1.fc21
then log in and leave karma (feedback).
python-blivet-0.61.10-1.fc21, anaconda-21.48.16-1.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
I have encountered a possibly related problem. I had a hard time upgrading from F20 to F21 due to issues with multipath. I have a laptop with both a SATA and an mSATA disk (one is encrypted at the BIOS level); I'm not using multipath. Booting fails because the kernel cannot find the root disk.

x220 kernel: snd_hda_intel 0000:00:1b.0: irq 31 for MSI/MSI-X
x220 multipathd[415]: mpathc: failed in domap for addition of new path sdb
x220 systemd-udevd[457]: error: /dev/sda1: No such file or directory

As a temporary fix, I delete the dm-multipath and multipath kernel modules before booting, and then it is fine.
Mark: can you please file a new bug against device-mapper-multipath ? thanks!