Created attachment 998288 [details]
Error_detail_Upgrade.jpg

Description of problem:
Upgrading from RHEV-H 7.0 GA to the latest RHEV-H 7.1 via TUI, the command line, or RHEV-M fails: after the upgrade finishes, the host cannot reboot successfully. The error is shown in Error_detail_Upgrade.jpg.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20150127.0
ovirt-node-3.2.1-6.el7.noarch
rhev-hypervisor7-7.1-20150226.0.el7ev
ovirt-node-3.2.1-9.el7.noarch

How reproducible:
100%

QA Whiteboard: upgrade

Steps to Reproduce:
1. TUI install rhev-hypervisor7-7.0-20150127.0
2. Upgrade RHEV-H 7.0 to RHEV-H 7.1 in three ways:
   1) TUI
   2) CMD
   3) RHEVM 3.5 -- Red Hat Enterprise Virtualization Manager Version: 3.5.0-0.33.el6ev

Actual results:
1. After the upgrade finishes, the host cannot reboot successfully, with an error like the following:
   systemd-readahead[812]: Failed to open pack file: Read-only file system

Expected results:
1. The upgrade from RHEV-H 7.0 to RHEV-H 7.1 completes and RHEV-H 7.1 can be logged into.

Additional info:
Created attachment 998291 [details] All log files in /var/log
Created attachment 998303 [details] sosreport
I can not reproduce this bug in a plain VM.

Shang, does this bug appear on only one machine?

Can you please boot the machine with "rd.debug systemd.log_level=debug", and remove the "quiet" and "rhgb" arguments from the command line?
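For reference, an edited kernel line for such a debug boot might look roughly like the fragment below. This is an illustrative sketch only: the vmlinuz0/initrd paths and the "root=live:LABEL=Root" argument are assumptions, not taken from this host, and the exact entry will differ per install.

```
# before (press "e" at the boot menu to edit the entry):
linux /vmlinuz0 root=live:LABEL=Root ro quiet rhgb
# after ("quiet" and "rhgb" dropped, debug arguments appended):
linux /vmlinuz0 root=live:LABEL=Root ro rd.debug systemd.log_level=debug
```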
(In reply to Fabian Deutsch from comment #5)
> I can not reproduce this bug in a plain VM.
>
> Shang, does this bug appear on only one machine?

As far as I know, all Virt QE physical machines hit this issue, and it reproduces 100% of the time.
Created attachment 998507 [details]
oops after an upgrade from 7.0 to 7.1

I was able to capture this oops in 50% of the cases on reboot:

[26634.453675] SQUASHFS error: unable to read inode lookup table
[26634.467530] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[26634.468108] IP: [<ffffffffa02f6fab>] dm_exception_store_set_chunk_size+0x7b/0x120 [dm_snapshot]
[26634.468760] PGD 0
[26634.468962] Oops: 0000 [#1] SMP
[26634.469203] Modules linked in: dm_snapshot dm_bufio ext4 mbcache jbd2 squashfs dm_service_time sd_mod crc_t10dif sr_mod cdrom ata_generic pata_acpi virtio_net virtio_balloon crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel bochs_drm syscopyarea sysfillrect sysimgblt drm_kms_helper ttm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ahci libahci ata_piix virtio_pci virtio_ring virtio drm i2c_core libata sunrpc dm_mirror dm_region_hash dm_log loop dm_multipath dm_mod
[26634.472715] CPU: 0 PID: 638 Comm: dmsetup Not tainted 3.10.0-230.el7.x86_64 #1
[26634.473231] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[26634.473861] task: ffff880075a15b00 ti: ffff880034b30000 task.ti: ffff880034b30000
[26634.474399] RIP: 0010:[<ffffffffa02f6fab>] [<ffffffffa02f6fab>] dm_exception_store_set_chunk_size+0x7b/0x120 [dm_snapshot]
[26634.475175] RSP: 0018:ffff880034b33b68 EFLAGS: 00010246
[26634.475518] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001
[26634.475987] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8800359eb600
[26634.476476] RBP: ffff880034b33b80 R08: 0000000000000000 R09: 0000000000000001
[26634.476928] R10: 000000000000000a R11: f000000000000000 R12: ffff8800353e8d80
[26634.477420] R13: ffffc900003b6088 R14: ffff880034b33c04 R15: ffff8800353e8d80
[26634.477880] FS: 00007f319f9df800(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[26634.478438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26634.478803] CR2: 0000000000000098 CR3: 0000000034415000 CR4: 00000000001406f0
[26634.479295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[26634.479746] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[26634.480242] Stack:
[26634.480367] ffffc900003b6040 ffff8800359eb600 ffff8800353e8a90 ffff880034b33be0
[26634.480883] ffffffffa02f721f 0000000000000018 ffffffffa02fb3e0 ffff8800359eb728
[26634.481430] 00000008353e83d8 00000000c7c4fd00 ffffc900003b6040 ffff8800353e8a80
[26634.481934] Call Trace:
[26634.482092] [<ffffffffa02f721f>] dm_exception_store_create+0x1cf/0x240 [dm_snapshot]
[26634.482631] [<ffffffffa02f539b>] snapshot_ctr+0x14b/0x630 [dm_snapshot]
[26634.483076] [<ffffffffa0005638>] ? dm_split_args+0x68/0x170 [dm_mod]
[26634.483512] [<ffffffffa00058b7>] dm_table_add_target+0x177/0x460 [dm_mod]
[26634.483968] [<ffffffffa0008e57>] table_load+0x157/0x380 [dm_mod]
[26634.484382] [<ffffffffa0008d00>] ? retrieve_status+0x1c0/0x1c0 [dm_mod]
[26634.484826] [<ffffffffa0009ac5>] ctl_ioctl+0x255/0x500 [dm_mod]
[26634.485231] [<ffffffffa0009d83>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[26634.485625] [<ffffffff811d9bd5>] do_vfs_ioctl+0x2e5/0x4c0
[26634.485988] [<ffffffff8126f0ae>] ? file_has_perm+0xae/0xc0
[26634.486368] [<ffffffff811d9e51>] SyS_ioctl+0xa1/0xc0
[26634.486701] [<ffffffff8160ed99>] ? do_async_page_fault+0x29/0xe0
[26634.487093] [<ffffffff81613ea9>] system_call_fastpath+0x16/0x1b
[26634.487512] Code: 14 06 00 00 66 85 c0 74 15 66 c1 e8 09 31 d2 0f b7 c8 89 d8 f7 f1 85 d2 0f 85 8a 00 00 00 49 8b 7c 24 08 e8 58 d0 ff ff 48 8b 00 <48> 8b 80 98 00 00 00 48 8b 80 60 03 00 00 48 85 c0 74 1d 0f b7
[26634.489314] RIP [<ffffffffa02f6fab>] dm_exception_store_set_chunk_size+0x7b/0x120 [dm_snapshot]
[26634.489906] RSP <ffff880034b33b68>
[26634.490131] CR2: 0000000000000098
[26634.490397] ---[ end trace 3bb6c95d82d638f6 ]---
[26634.490709] Kernel panic - not syncing: Fatal exception
[26634.491151] drm_kms_helper: panic occurred, switching back to text console
Created attachment 998521 [details]
An even longer stack trace.

My current assumption is that something goes wrong with the squashfs decompression, but I am not sure how to debug such an issue.

Virt QE, can you please remove the "quiet" keyword from the kernel command line on boot? You should then get the stack trace that is causing the failed boot for you.
Created attachment 998522 [details]
Stack trace including rd.debug

An additional log, including rd.debug data.
As a note, I was unable to reproduce this on VMs or on two physical systems tested (both Dell, one R300 and one Optiplex 9020). Seconding the request for logs without "quiet rhgb".
Virt QE, can you please try the following things:
1. Remove "quiet" and boot, to see if you get a stack trace
2. Boot in permissive mode (enforcing=0), to see if the bug goes away
(In reply to Fabian Deutsch from comment #8)
> Created attachment 998521 [details]
> An even longer stack trace.
>
> My current assumption is that something goes wrong with the squashfs
> uncompression.

Unlikely to be a bug in Squashfs; a corrupted filesystem is much more likely. People constantly misinterpret the error messages from Squashfs.

"SQUASHFS error: unable to read inode lookup table"

does not mean an error *in* Squashfs, merely that Squashfs experienced an error caused elsewhere which prevented it from continuing. In this case, this error is printed at mount time if Squashfs cannot read the inode lookup table. 99% of the time this is because the file has been truncated.

Two points here:
1. This will cause the Squashfs filesystem mount to fail. This failure to mount should be checked, but it evidently isn't.
2. If you didn't mount the Squashfs filesystem, then you can't mount the embedded ext3/ext4 filesystem.

> But I am not sure how to debug such an issue.

First you need to check that the Squashfs filesystems in the rpm are correct. I in fact did that, and they are. Second, when the Squashfs filesystem fails to mount, the system needs to fall back to a bash prompt, which will allow you to verify the filesystem: is it the correct length, does the checksum match, etc. There is probably a debug option to do this in dmsquash-live.

In any case, this is likely a truncation/file corruption issue; it is not a bug in Squashfs.

> Virt QE, can you please just remove the quiet keyword from the kernel
> commandline on booting, then you should get the stack trace which is causing
> the failed boot for you.
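As a toy illustration of the length check described above, a truncated image can be detected by comparing its actual size against an expected size recorded at build time. This is a sketch only: a stand-in temp file is used instead of a real squashfs.img, and the expected size is invented for the example; it is not the dmsquash-live logic.

```shell
# Sketch: detect truncation by comparing actual vs. expected file size.
# A 4-byte stand-in file is used here instead of a real squashfs image.
img=$(mktemp)
printf 'hsqs' > "$img"          # "hsqs" is the squashfs magic; 4 bytes
expected=4                      # stand-in for a size recorded at build time
actual=$(wc -c < "$img")
if [ "$actual" -eq "$expected" ]; then
    status=ok
else
    status=truncated
fi
echo "$status"
rm -f "$img"
```

The same comparison against the size (or checksum) shipped in the rpm would tell you whether the on-disk image was damaged during the upgrade.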
Created attachment 998657 [details]
Log of a boot with the "quiet" and "rhgb" keywords removed from the kernel command line.
Whether booting with or without enforcing=0, I still can't upgrade from RHEV-H 7.0 to RHEV-H 7.1. If the upgrade from RHEV-H 7.0 to RHEV-H 7.1 fails, you must re-install RHEV-H 7.0 before the next upgrade. If you don't re-install RHEV-H 7.0, the upgrade will succeed.
Checked comment 16, and in my testing upgrading rhevh 7.0 to rhevh 7.1 with enforcing=0 still failed. So let's disregard comment 15.
* with enforcing=0
Test steps:
1. Installed rhevh 7.0
2. Upgraded to rhevh 7.1 with enforcing=0
3. The first upgrade hangs; see screenshot: upgrade_hang.png
4. After rebooting and upgrading to rhevh 7.1 _again_, the second upgrade is successful.
Created attachment 998933 [details] upgrade_upgrade.png for comment 18
Created attachment 998934 [details] varlog for comment 18
Created attachment 998935 [details] sosreport for comment 18
* without enforcing=0
Test steps:
1. Totally clean installed rhevh 7.0 (uninstall, then firstboot install)
2. Upgraded to rhevh 7.1 _without_ enforcing=0
3. The first upgrade hangs; same screenshot as upgrade_hang.png in comment 18.
4. After rebooting and upgrading to rhevh 7.1 _again_, the second upgrade is successful.
Created attachment 998945 [details] sosreport for comment 22
Created attachment 998946 [details] varlog for comment 22
Created attachment 999435 [details] console_output for comment 22
In the log from comment 22 I see nothing special. Could you please provide the logs of a failed boot with the following kargs: systemd.log_level=debug rd.debug debug
Created attachment 999504 [details] console_output_for_comment 27
(In reply to Ying Cui from comment #22)
> * without enforcing=0
> Test steps:
> 1. totally clean installed rhevh 7.0(uninstall, then firstboot install.)
> 2. upgrade to rhevh 7.1 _without_ enforcing=0
> 3. first time upgrade hang. the same screenshot: upgrade_hang.png in comment 18.
> 4. then reboot and upgrade rhevh 7.1 _again_, the second upgrade is successful.

Does the double-upgrade also work with SELinux in enforcing mode?

I haven't been able to reproduce this (though I'm going to try again today), and I don't see anything obvious in the console output, but knowing whether SELinux is involved would be helpful.
(In reply to Ryan Barry from comment #29)
> (In reply to Ying Cui from comment #22)
> > * without enforcing=0
> > Test steps:
> > 1. totally clean installed rhevh 7.0(uninstall, then firstboot install.)
> > 2. upgrade to rhevh 7.1 _without_ enforcing=0
> > 3. first time upgrade hang. the same screenshot: upgrade_hang.png in comment 18.
> > 4. then reboot and upgrade rhevh 7.1 _again_, the second upgrade is successful.
>
> Does the double-upgrade also work with SELinux in enforcing mode?

Yes, the second upgrade also works with SELinux in enforcing mode.

> I haven't been able to reproduce this (though I'm going to try again today),
> and I don't see anything obvious in the console output, but knowing whether
> SELinux is involved would be helpful.
Created attachment 999755 [details]
console_output for comment 30

Test machines:
1. dell r210 server - local disk - 5 times
2. dell 9010 desktop - local disk - 3 times
3. hp 5808 desktop - local disk - 3 times

The first upgrade _always_ hangs with SELinux in enforcing mode, and the second upgrade with SELinux in enforcing mode is successful.

Test steps:
1. Clean install rhevh 7.0 (rhev-hypervisor7-7.0-20150127.0) on a local disk (uninstall first, then TUI install rhevh)
2. First upgrade rhevh 7.0 to rhevh 7.1 (rhev-hypervisor7-7.1-20150309.28.iso); tested both TUI upgrade and cmdline with upgrade kargs, and the behavior is the same: hang!
3. After the upgrade hangs, reboot rhevh
4. Upgrade again (second upgrade) via TUI or cmdline
5. The second upgrade is successful and rhevh 7.1 can be logged into.

Additional info:
Upgrading rhevh-7.1-20150304.0.el7ev.iso to rhev-hypervisor7-7.1-20150309.28.iso succeeds the first time; both are rhevh 7.1.
Thanks for the very precise information.

Maybe the 7.1 kernel (or some component) has an issue being installed alongside 7.0.

Could you please try the following:
1. Install RHEV-H 7.0
2. Upgrade to 7.1
After 2: Please check that the reboot into 7.1 fails after installation
(Now the important part:)
3. Boot the installation CD into the installer, but do NOT install
4. Drop to a shell using F2
5. Run: blkid -L RootBackup to find the partition with the 7.0 image (after installation, 7.0 is on the backup partition)
   example: /dev/sda3
6. Run parted, then in parted: "rm 3" (where the 3 comes from sda3), then "q" to quit
7. Reboot, and try rebooting into 7.1
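Steps 5 and 6 above can be sketched in shell like this. It is a non-destructive sketch only: blkid and parted are not actually executed here, and the "dev" value is a stand-in for the output of "blkid -L RootBackup" on a real host.

```shell
# Sketch of steps 5-6: find the RootBackup partition and derive the
# partition number that parted's "rm" would need.
dev=/dev/sda3                               # stand-in for: dev=$(blkid -L RootBackup)
disk=$(echo "$dev" | sed 's/[0-9]*$//')     # strip trailing digits -> /dev/sda
num=$(echo "$dev" | sed 's/.*[^0-9]//')     # keep trailing digits  -> 3
echo "would run: parted $disk rm $num"
```

On a real system the echo would be replaced by the actual parted invocation; keeping it as an echo first is a cheap way to double-check the derived partition number before deleting anything.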
(In reply to Fabian Deutsch from comment #35)
> Could you please try the following:
>
> 1. Install RHEV-H 7.0
> 2. Upgrade to 7.1
> After 2: Please check that the reboot into 7.1 fails after installation

Fabian, I am not clear on this comment: do you mean the first upgrade or the second upgrade?

If the first upgrade: during the upgrade hang, we reboot rhevh manually, and then there is only rhevh _7.0_, with no rhevh 7.1 installation. Based on this, the "important part" below cannot be tried.

If the second upgrade: the rhevh 7.1 upgrade process succeeds, and after reboot rhevh 7.1 can be logged into, so we do not need to do the "important part" below.

Any thoughts? Thanks.

> (Now the important part:)
> 3. Boot the installation CD, into the installer, but do NOT install
> 4. Drop to shell using F2
> 5. Run: blkid -L RootBackup to find the partition with the 7.0 image (after
> installation 7.0 is on backup)
> example: /dev/sda3
> 6. Run parted, then in parted: "rm 3" (where the 3 comes from sda3), then
> "q" to quit.
> 7. Reboot, and try rebooting into 7.1
(In reply to Ying Cui from comment #36)
…
> Fabian, I am not clearly enough on this comment, here is it the first time
> upgrade? or second time upgrade?
> If the first time upgrade, during upgrade hang, reboot the rhevh manually,
> then there is rhevh _7.0_ only, no 7.1 rhevh installation. Based on this,
> the following trying for important part is invalid.

Okay, it sounds like there are misunderstandings. Some questions:

1. In comment 26 there is a console output: from which boot is this?
2. When exactly does the machine hang?
3. After the first upgrade, does a boot entry appear?
(In reply to Fabian Deutsch from comment #37)
> Okay, it sounds like there are missunderstandings, some questions:
>
> 1. In comment 26 there is a console output: From which boot is this?

It is from the boot into the upgrade process; the hang happens during the upgrade. Checking the screen output.log in comment 26, the hang happens after the plymouth steps, at least judging from output like this:

<snip>
[ 78.662208] systemd[1]: Starting Terminate Plymouth Boot Screen...
[ 78.669886] systemd[1]: About to execute: /usr/bin/plymouth quit
Startin[ 78.676237] systemd[1]: Forked /usr/bin/plymouth as 1770
[ 78.676683] systemd[1770]: Executing: /usr/bin/plymouth quit
g Terminate Plym[ 78.688558] systemd[1]: plymouth-quit.service changed dead -> start
[ 78.696232] systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
outh Boot Screen[ 78.703221] systemd[1]: About to execute: /usr/bin/plymouth --wait
... S[ 78.710832] systemd[1]: Forked /usr/bin/plymouth as 1771
[ 78.711246] systemd[1771]: Executing: /usr/bin/plymouth --wait
tarting Wait for[ 78.723386] systemd[1]: plymouth-quit-wait.service changed dead -> start
</snip>

> 2. When does the machine hang exactly?

The upgrade process hangs, but if you press ctrl+alt+del, rhevh can be rebooted manually.

> 3. After the first upgrade, does a boot entry appear?

No boot entry for rhevh 7.1 appears after the first upgrade, only the 7.0 boot entry, and after the first upgrade rhevh _7.0_ can still be logged into.
After the clarifications I can now reproduce it as well. It would have helped if someone had mentioned that this bug appears when booting _into_ the upgrade process; it sounded like the boot _after_ the upgrade goes wrong.

I also do not understand how this bug can be reproduced with RHEV-M 3.5. Please clarify the exact steps here.

The steps to reproduce:
1. Install RHEV-H 7.0 (install using regular qemu)
2. Reboot with the RHEV-H 7.1 ISO (booted with qemu in snapshot mode now, to not touch the virtual disk)
2.a During the boot of the RHEV-H 7.1 ISO, the boot process hangs and does not enter the TUI installer

As noted in several comments, this only happens the first time; on the second boot of the RHEV-H 7.1 ISO, the boot succeeds. This indicates that some on-disk change was necessary to boot.

Using qemu to install 7.0, and then using qemu -snapshot to boot 7.1, will help to reproduce the error on every boot.
In production the workaround can be to just reboot the installer once it hangs.
Commands used to reproduce:

Prepare the disk:
$ qemu-img create -f qcow2 dst.img 20G

Install 7.0 and shut down:
$ qemu-kvm -m 2048 -cdrom rhev-hypervisor7-7.0-*.iso -serial stdio -hda dst.img

Boot into the 7.1 installer (which hangs while booting into it):
$ qemu-kvm -m 2048 -cdrom rhev-hypervisor7-7.1-20150304.0.iso -serial stdio -hda dst.img -snapshot -boot d
No solution today. There are messages from logind and dbus which look like errors, but those also appear on a successful boot. I put together a scratch build with a systemd debug shell enabled so I can look through systemd-analyze and see what's hanging up, hopefully. Judging from console output, it appears to be ovirt-early, but systemd does not warn about any hung jobs or jobs still starting, which makes me wonder if it's a systemd issue. Will continue looking tomorrow.
Based on my testing today, this issue is a regression. Upgrading from rhev-hypervisor7-7.0-20150127.0 to rhev-hypervisor7-7.1-20150213.0 succeeds, with no hang during the first upgrade. So I added the Regression keyword to this bug; the regression happened between rhev-hypervisor7-7.1-20150213.0 and rhev-hypervisor7-7.1-20150226.0.
This is blocked by bugs 1275956 and 1263648. I will verify this issue after bugs 1275956 and 1263648 are fixed.
Why is this bug blocked by bug 1263648? The issue in bug 1263648 only affects an optional flow. Please drop the dependency if you agree.
Fabian, I have dropped the bug 1263648 dependency in comment 50.
Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20151015.0.el7ev
ovirt-node-3.2.3-23.el7.noarch
rhev-hypervisor7-7.2-20151112.1.el7ev
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch

Test Steps:
1. TUI install rhev-hypervisor7-7.1-20151015.0.el7ev
2. Upgrade RHEV-H 7.1-20151015.0 to rhev-hypervisor7-7.2-20151112.1.el7ev in three ways:
   1) TUI
   2) CMD
   3) RHEVM 3.5 -- Red Hat Enterprise Virtualization Manager Version: 3.5.6.2-0.1.el6ev

Test results:
1. RHEV-H 7.1 can be upgraded to RHEV-H 7.2, and rhevh 7.2 can be logged into successfully.

So this bug is fixed in rhev-hypervisor7-7.2-20151112.1.el7ev; I will change the status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0378.html