Bug 1731069

Summary: After reboot during system installation, the system cannot find the boot option written to the disk [Power8]
Product: Red Hat Enterprise Linux 7 Reporter: Ping Zhang <pizhang>
Component: grub2 Assignee: Bootloader engineering team <bootloader-eng-team>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Release Test Team <release-test-team-automation>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.7 CC: bugproxy, didoming, fmartine, fnovak, hannsj_uhl, iranna.ankad, jkachuck, kzhang, pizhang
Target Milestone: rc   
Target Release: 7.7   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-14 13:48:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1689150, 1689420, 1776446    
Attachments:
Description Flags
the full failed console.log
none
successful console.log
none
Successful_Petitboot-v1.4.4-91eed07_grub.cfg
none
Successful-Petitboot-v1.4.4-91eed07.grub.cfg
none
Part1 of RH-Bug1731069.tar.gz
none
Part2 of RH-Bug1731069.tar.gz
none
Part3 of RH-Bug1731069.tar.gz
none
Part4 of RH-Bug1731069.tar.gz
none

Description Ping Zhang 2019-07-18 09:33:43 UTC
Created attachment 1591734 [details]
the full failed console.log

Description of problem:
When I ran my tests for RHEL-7.7 on P8 systems, the tests always failed because the system could not be installed successfully. After the reboot it always boots from the network interface instead of from the disk where the boot option was written, as shown below:
 Petitboot (v1.4.4-e1658ec)
 ------------------------------------------------------------------------------
  System information
  System configuration
  System status log
  Language
  Rescan devices
  Retrieve config from URL
 *
  Exit to shell
 ------------------------------------------------------------------------------
  Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
 Welcome to Petitboot  Info: Waiting for device discovery

8286-42A 103519V
 [enP3p5s0f0] Configuring with DHCP
Processing DHCP lease response (ip: 10.19.15.81)
Requesting config tftp://10.19.42.13/bootloader/netqe-p8-02.knqe.

[Network: enP3p5s0f0 / 40:f2:e9:5a:44:fc]

netboot enP3p5s0f0 (pxelinux.0)
   1 downloads in progress...
  [enP3p5s0f0] Failed to download tftp://10.19.42.13/bootloader/netqe-p8-02.knqe.
[-- MARK -- Wed May 22 19:25:00 2019]
[-- MARK -- Wed May 22 19:30:00 2019]
What's more, it will cause the system to hang.

However, when I ran the tests for RHEL 8.1.0 and RHEL 7.6 on these systems,
they worked well and ran smoothly on the same P8 systems.

Version-Release number of selected component (if applicable):
RHEL 7.7; even the latest RHEL 7.7 build hits this problem.

How reproducible:
Run a multihost test on two P8 systems, or just provision two P8 systems via Beaker XML.
Here is my multihost job:
https://beaker.engineering.redhat.com/jobs/3599302
Sometimes a single-host job also reproduces this issue:
https://beaker.engineering.redhat.com/jobs/3614795

Steps to Reproduce:
1. Submit a job with some multihost tasks.
2. Check the system installation on the two systems.

Actual results:
The system cannot find the boot option on disk after the reboot during system installation.
As a result, 5 of 6 multihost test runs for RHEL 7.7 failed.

Expected results:
The system can boot from disk after the reboot during system installation.

Additional info:
hosts:
netqe-p8-01.knqe.lab.eng.bos.redhat.com 
netqe-p8-02.knqe.lab.eng.bos.redhat.com

Comment 2 Ping Zhang 2019-07-18 09:34:32 UTC
Created attachment 1591735 [details]
successful console.log

Comment 3 Javier Martinez Canillas 2019-07-18 15:15:29 UTC
Can you please share the /boot/grub2/grub.cfg and /boot/grub2/grubenv files for the successful and failing cases?

Also, I noticed that you are using a different Petitboot version:

 - Petitboot (v1.4.4-91eed07) in the successful case.
 - Petitboot (v1.4.4-e1658ec) in the failing case.

Could you please test with the same machine and Petitboot version, just to make sure that the problem is not in the OPAL firmware? On ppc64le PowerNV (Non-Virtualized), the grub.cfg and grubenv files are parsed by Petitboot and not by grub2. It could still be, though, that the grub tools are not generating a correct grub config file in the failing case.
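
A minimal sketch of how the two requested files could be checked on an installed or mounted root. The helper name `check_boot_files` is hypothetical, and the only assumption about the file format is that grubenv stores the default entry on a `saved_entry=` line, as in the RHEL 7 layout discussed in this bug.

```shell
#!/bin/sh
# check_boot_files ROOT
# Hypothetical helper: reports whether the GRUB files that Petitboot
# parses exist (and are non-empty) under ROOT, which defaults to "/".
check_boot_files() {
    root="${1:-/}"
    for f in boot/grub2/grub.cfg boot/grub2/grubenv; do
        path="${root%/}/$f"
        if [ -s "$path" ]; then
            echo "present: $path"
        else
            echo "missing: $path"
        fi
    done
    # Print the default boot entry recorded in grubenv, if the file is readable.
    env_file="${root%/}/boot/grub2/grubenv"
    if [ -r "$env_file" ]; then
        grep '^saved_entry=' "$env_file" || echo "saved_entry not set in $env_file"
    fi
}
```

For example, `check_boot_files /` inspects the running system, while passing another directory inspects a root filesystem mounted there (for instance from a rescue environment).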

Comment 4 IBM Bug Proxy 2019-08-01 20:20:18 UTC
------- Comment From diegodo.com 2019-08-01 16:13 EDT-------
Hi,

Could you please provide the files requested in the previous comment?

Thanks

Comment 5 IBM Bug Proxy 2019-08-14 18:40:27 UTC
------- Comment From mbringm.com 2019-08-14 14:39 EDT-------
RedHat:
We need more information here.  We do not have access to the beaker environment
for replication or debugging.

* Can you generate a sosreport for the platform?
* Firmware versions
* PowerNV or PowerVM configuration
* Adapter configuration

Also, there was no response to Frank's question/request regarding the 2 different versions of petitboot
that were observed.
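
One possible way to answer the PowerNV or PowerVM question from the list above, assuming a ppc64le kernel that reports a `platform` line in /proc/cpuinfo (typically `PowerNV` on OPAL systems and `pSeries` under PowerVM). The helper name `platform_type` is hypothetical and is only a sketch.

```shell
#!/bin/sh
# platform_type [CPUINFO_FILE]
# Hypothetical helper: prints the value of the first "platform" line
# from a cpuinfo-style file (default: /proc/cpuinfo).
platform_type() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Strip the "platform :" key and any whitespace after the colon.
    sed -n 's/^platform[[:space:]]*:[[:space:]]*//p' "$cpuinfo" | head -n 1
}
```

Running `platform_type` with no argument reads the live system. On architectures without a `platform` line it prints nothing.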

Comment 6 IBM Bug Proxy 2019-08-15 14:32:47 UTC
------- Comment From mbringm.com 2019-08-14 14:40 EDT-------
RedHat:
Since RHEL 7.7 has gone to GA, is this still an issue?

Comment 7 Ping Zhang 2019-08-16 09:53:55 UTC
Created attachment 1604322 [details]
Successful_Petitboot-v1.4.4-91eed07_grub.cfg

Comment 8 Ping Zhang 2019-08-16 10:06:54 UTC
Created attachment 1604324 [details]
Successful-Petitboot-v1.4.4-91eed07.grub.cfg

Comment 9 Ping Zhang 2019-08-16 10:15:40 UTC
(In reply to Javier Martinez Canillas from comment #3)
> Can you please share the /boot/grub2/grub.cfg and /boot/grub2/grubenv files
> for the successful and failing cases?
> 
> Also, I noticed that you are using a different Petitboot version:
> 
>  - Petitboot (v1.4.4-91eed07) in the successful case.
>  - Petitboot (v1.4.4-e1658ec) in the failing case.
> 
> Could you please test with the same machine and Petitboot version just to
> make sure that the problem is not in the OPAL firmware? Since the grub.cfg
> and grubenv files are parsed by Petitboot and not grub2 for ppc64le PowerNV
> (Non-Virtualized). It still could be though that the grub tools are not
> generating a correct grub config file in the failing case.

For the successful cases, I uploaded two grub.cfg files, and the grubenv is as below:
cat /boot/grub2/grubenv
# GRUB Environment Block
saved_entry=Red Hat Enterprise Linux Server (3.10.0-1062.el7.ppc64le) 7.7 (Maipo)
#####################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################

For the failed cases, I could not find the /boot directory, so there is no grub.cfg or grubenv file.

Comment 10 Ping Zhang 2019-08-16 11:28:01 UTC
(In reply to IBM Bug Proxy from comment #5)
> ------- Comment From mbringm.com 2019-08-14 14:39 EDT-------
> RedHat:
> We need more information here.  We do not have access to the beaker
> environment
> for replication or debugging.
> 
> * Can you generate a sosreport for the platform?
> * Firmware versions
> * PowerNV or PowerVM configuration
> * Adapter configuration
> 
> Also, there was no response to Frank's question/request regarding the 2
> different versions of petitboot
> that were observed.

I generated sosreports for these two systems, but cannot upload them to Bugzilla.
I have not found a good way to share these files with you.

Comment 11 IBM Bug Proxy 2019-08-16 14:50:41 UTC
------- Comment From chavez.com 2019-08-16 10:45 EDT-------
Hello,

Please try https://testcase.software.ibm.com and login with anonymous and no password. Navigate to the /toibm/linux directory and upload the files there. Once done, please add a comment here with the name of the file(s) uploaded.

Comment 12 Ping Zhang 2019-08-20 11:03:31 UTC
(In reply to IBM Bug Proxy from comment #11)
> ------- Comment From chavez.com 2019-08-16 10:45 EDT-------
> Hello,
> 
> Please try https://testcase.software.ibm.com and login with anonymous and no
> password. Navigate to the /toibm/linux directory and upload the files there.
> Once done, please add a comment here with the name of the file(s) uploaded.

I created a tar file named RH-Bug1731069.tar, which includes the sosreports of
these two systems. I think I uploaded it to the /toibm/linux directory,
but I am not sure, because I cannot read that directory.

Comment 13 IBM Bug Proxy 2019-08-20 15:42:51 UTC
------- Comment From mbringm.com 2019-08-20 11:35 EDT-------
(In reply to comment #16)
> (In reply to IBM Bug Proxy from comment #11)
> > Hello,
> >
> > Please try https://testcase.software.ibm.com and login with anonymous and no
> > password. Navigate to the /toibm/linux directory and upload the files there.
> > Once done, please add a comment here with the name of the file(s) uploaded.
> I create a tar file named RH-Bug1731069.tar which includes two sosreports of
> these two systems. And I think i uploaded it two the /toibm/linux directory,
> but i am not sure, because i can not read that directory.

I just checked that directory, but do not see any file with the name RH-Bug1731069.tar.
Did you select the file locally on your system with 'Browse' before using the 'Upload (binary)' button?

Comment 14 Ping Zhang 2019-08-21 02:09:32 UTC
(In reply to IBM Bug Proxy from comment #13)
> ------- Comment From mbringm.com 2019-08-20 11:35 EDT-------
> (In reply to comment #16)
> > (In reply to IBM Bug Proxy from comment #11)
> > > Hello,
> > >
> > > Please try https://testcase.software.ibm.com and login with anonymous and no
> > > password. Navigate to the /toibm/linux directory and upload the files there.
> > > Once done, please add a comment here with the name of the file(s) uploaded.
> > I create a tar file named RH-Bug1731069.tar which includes two sosreports of
> > these two systems. And I think i uploaded it two the /toibm/linux directory,
> > but i am not sure, because i can not read that directory.
> 
> I just checked that directory, but do not see any file with the name
> RH-Bug1731069.tar.
> Did you select the file locally on your system with 'Browse' before using
> the 'Upload (binary)' button?
Maybe the file was too large. I tried uploading a compressed version of it, and this
time it seemed to succeed, without the Error 403.
You may find a file named RH-Bug1731069.zip or RH-Bug1731069.tar.gz in this
directory.

Comment 15 IBM Bug Proxy 2019-08-21 14:20:27 UTC
------- Comment From mbringm.com 2019-08-21 10:15 EDT-------
> Maybe the file is too large, I try to upload the compressed version of it,
> it seems successful at first, without the Error 403.
> Maybe you can found the file named RH-Bug1731069.zip or RH-Bug1731069.tar.gz
> in this directory.

Don't see either file.  Investigating problem.

Comment 16 IBM Bug Proxy 2019-08-21 16:31:09 UTC
------- Comment From chavez.com 2019-08-21 12:26 EDT-------
While we figure out what is going on with testcase, if it is just sosreports you want to provide and they exceed the attachment size limit, consider using the split command, e.g.

split -b 4M sosreport.tar.xz

and let us know the order of the pieces and we can re-assemble them with

cat part1 part2 part3 > sosreport.tar.xz
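
The split-and-reassemble approach above can be sketched a little more defensively, assuming GNU coreutils (`split`, `sha256sum`) on both ends. The function names, the `.part-` prefix, and the optional size argument are illustrative only and are not something used in this bug.

```shell
#!/bin/sh
# split_for_upload FILE [SIZE]
# Records a checksum of FILE, then splits it into SIZE-byte pieces
# (default 4M) named FILE.part-aa, FILE.part-ab, ...
split_for_upload() {
    src="$1"
    size="${2:-4M}"
    sha256sum "$src" > "$src.sha256"
    split -b "$size" "$src" "$src.part-"
}

# reassemble FILE
# Concatenates FILE.part-* in name order (which matches the order split
# created them in) and verifies the result against FILE.sha256.
reassemble() {
    src="$1"
    cat "$src".part-* > "$src"
    sha256sum -c "$src.sha256"
}
```

With named pieces the glob keeps the parts in order, so the receiver does not need to be told the sequence, and the checksum confirms that no part was lost or truncated during upload.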

Comment 17 Ping Zhang 2019-08-30 09:19:37 UTC
Created attachment 1609811 [details]
Part1 of RH-Bug1731069.tar.gz

Comment 18 Ping Zhang 2019-08-30 09:20:27 UTC
Created attachment 1609812 [details]
Part2 of RH-Bug1731069.tar.gz

Comment 19 Ping Zhang 2019-08-30 09:22:06 UTC
Created attachment 1609813 [details]
Part3 of RH-Bug1731069.tar.gz

Comment 20 Ping Zhang 2019-08-30 09:23:36 UTC
Created attachment 1609815 [details]
Part4 of RH-Bug1731069.tar.gz

Comment 21 IBM Bug Proxy 2019-09-09 20:00:44 UTC
------- Comment From diegodo.com 2019-09-09 15:50 EDT-------
Hi RedHat

I think we should put some RHEL installer maintainer in CC of this bug.

Per the previous messages, we don't have the /boot dir in the failing cases, which suggests that it could be the result of a failure during the installation, possibly due to some grub issue.

Is it possible to have the complete Anaconda installer log for the failing case? Maybe we could get some hint about what is going wrong in this step.

Thanks

Comment 22 IBM Bug Proxy 2019-09-23 13:31:24 UTC
------- Comment From diegodo.com 2019-09-23 09:28 EDT-------
Hi,

what are the next steps here?

I think we should consider adding the installer maintainers here, so we can try to understand why we don't have the /boot dir after the installation.

Thanks

Comment 23 Javier Martinez Canillas 2019-10-14 13:42:49 UTC
(In reply to Javier Martinez Canillas from comment #3)

[snip]

> 
>  - Petitboot (v1.4.4-91eed07) in the successful case.
>  - Petitboot (v1.4.4-e1658ec) in the failing case.
> 
> Could you please test with the same machine and Petitboot version just to
> make sure that the problem is not in the OPAL firmware? Since the grub.cfg
> and grubenv files are parsed by Petitboot and not grub2 for ppc64le PowerNV
> (Non-Virtualized). It still could be though that the grub tools are not
> generating a correct grub config file in the failing case.

There was never an answer to this question as far as I can tell. Since the bootloader is not controlled by the OS for ppc64le OPAL, it would be good to test using the same Petitboot version to make sure that the problem is not in the bootloader.

By the /boot directory not found, do you mean that the directory does not exist at all or that the boot partition can't be mounted on that directory (do you have a boot partition or only a root partition with a /boot directory)?
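
The three cases in the question above (no directory at all, a separately mounted boot partition, or a plain /boot directory on the root filesystem) could be told apart roughly as follows. This is only a sketch: `boot_layout` is a hypothetical helper, and it assumes the `mountpoint` utility from util-linux is available.

```shell
#!/bin/sh
# boot_layout [DIR]
# Hypothetical helper: classifies DIR (default /boot) as missing,
# a mount point of its own, or a plain directory on its parent filesystem.
boot_layout() {
    dir="${1:-/boot}"
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif mountpoint -q "$dir"; then
        echo "separate-partition"
    else
        echo "directory-on-parent-filesystem"
    fi
}
```

Note that a boot partition that exists but failed to mount would show up here as an empty "directory-on-parent-filesystem", so the result is best read together with /etc/fstab.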

Comment 24 IBM Bug Proxy 2019-11-18 16:00:52 UTC
------- Comment From mbringm.com 2019-11-18 10:56 EDT-------
RedHat:
Any update on this one?

Comment 25 IBM Bug Proxy 2019-11-20 19:07:49 UTC
------- Comment From diegodo.com 2019-11-20 14:00 EDT-------
(In reply to comment #28)
> (In reply to Javier Martinez Canillas from comment #3)
> [snip]
> >
> >  - Petitboot (v1.4.4-91eed07) in the successful case.
> >  - Petitboot (v1.4.4-e1658ec) in the failing case.
> >
> > Could you please test with the same machine and Petitboot version just to
> > make sure that the problem is not in the OPAL firmware? Since the grub.cfg
> > and grubenv files are parsed by Petitboot and not grub2 for ppc64le PowerNV
> > (Non-Virtualized). It still could be though that the grub tools are not
> > generating a correct grub config file in the failing case.
> There was never an answer to this question as far as I can tell. Since the
> bootloader is not controlled by the OS for ppc64le OPAL, it would be good to
> test using the same Petitboot version to make sure that the problem is not
> in the bootloader.
> By the /boot directory not found, do you mean that the directory does not
> exist at all or that the boot partition can't be mounted on that directory
> (do you have a boot partition or only a root partition with a /boot
> directory)?

I'm assuming the partition can't be mounted, but it would be better to wait for the answer from Ping.

@Ping could you please confirm which scenario we have here?

Thanks!

Comment 26 Ping Zhang 2019-12-19 07:07:08 UTC
(In reply to IBM Bug Proxy from comment #25)
> ------- Comment From diegodo.com 2019-11-20 14:00 EDT-------
> (In reply to comment #28)
> > (In reply to Javier Martinez Canillas from comment #3)
> > [snip]
> > >
> > >  - Petitboot (v1.4.4-91eed07) in the successful case.
> > >  - Petitboot (v1.4.4-e1658ec) in the failing case.
> > >
> > > Could you please test with the same machine and Petitboot version just to
> > > make sure that the problem is not in the OPAL firmware? Since the grub.cfg
> > > and grubenv files are parsed by Petitboot and not grub2 for ppc64le PowerNV
> > > (Non-Virtualized). It still could be though that the grub tools are not
> > > generating a correct grub config file in the failing case.
> > There was never an answer to this question as far as I can tell. Since the
> > bootloader is not controlled by the OS for ppc64le OPAL, it would be good to
> > test using the same Petitboot version to make sure that the problem is not
> > in the bootloader.
> > By the /boot directory not found, do you mean that the directory does not
> > exist at all or that the boot partition can't be mounted on that directory
> > (do you have a boot partition or only a root partition with a /boot
> > directory)?
> 
> I'm assuming the partition can't be mounted, but it would be better to wait
> the answer from Ping.
> 
> @Ping could you please confirm which is the scenario we do have here?
> 
> Thanks!

I am sorry, but I have not had time to change the Petitboot version on these
systems since I hit this problem.
As for the scenario, I only had a root partition with a /boot directory when I encountered
this problem.

Comment 27 IBM Bug Proxy 2020-01-13 19:01:54 UTC
------- Comment From diegodo.com 2020-01-13 13:52 EDT-------
Hi Ping, is the problem still occurring? Could you please check if it works on Petitboot (v1.4.4-91eed07)?

Thank you

Comment 28 IBM Bug Proxy 2020-03-10 16:32:56 UTC
------- Comment From mbringm.com 2020-03-10 12:21 EDT-------
(In reply to comment #35)
> Hi Ping, is the problem still occurring? Could you please check if it works
> on Petitboot (v1.4.4-91eed07)?
> Thank you

Hello, Ping: Are you still observing this issue?

Comment 29 Ping Zhang 2020-04-03 11:01:08 UTC
(In reply to IBM Bug Proxy from comment #28)
> ------- Comment From mbringm.com 2020-03-10 12:21 EDT-------
> (In reply to comment #35)
> > Hi Ping, is the problem still occurring? Could you please check if it works
> > on Petitboot (v1.4.4-91eed07)?
> > Thank you
> 
> Hello, Ping: Are you still observing this issue?

It has not occurred recently, which I think is good news.

Comment 30 IBM Bug Proxy 2020-04-03 14:52:06 UTC
------- Comment From mbringm.com 2020-04-03 10:48 EDT-------
(In reply to comment #37)
> (In reply to IBM Bug Proxy from comment #28)
> > (In reply to comment #35)
> > > Hi Ping, is the problem still occurring? Could you please check if it works
> > > on Petitboot (v1.4.4-91eed07)?
> > > Thank you
> > Hello, Ping: Are you still observing this issue?
> It did not occur recently, I think it is a good news.

Great.  Do you think that there is more to do?

Comment 31 Javier Martinez Canillas 2020-04-09 12:18:14 UTC
Can we close this bug then? It is also reported against grub2, but the system uses OPAL/Petitboot.

Comment 32 IBM Bug Proxy 2020-04-09 15:34:06 UTC
------- Comment From mbringm.com 2020-04-09 11:27 EDT-------
(In reply to comment #39)
> Can we close this bug then? It's also reported to grub2 but the system uses
> OPAL/Petiboot.

Frank: What do you think?

Comment 33 IBM Bug Proxy 2020-04-13 14:01:52 UTC
------- Comment From fnovak.com 2020-04-13 09:54 EDT-------
Looks like this is working now..
Seems like RH doesn't have the time to go back..
I say, let's close this..

Comment 34 IBM Bug Proxy 2020-04-13 14:41:42 UTC
------- Comment From mbringm.com 2020-04-13 10:34 EDT-------
Closing per above comments.

Comment 35 Javier Martinez Canillas 2020-04-14 13:48:52 UTC
(In reply to IBM Bug Proxy from comment #34)
> ------- Comment From mbringm.com 2020-04-13 10:34 EDT-------
> Closing per above comments.

Ok, I'm closing this bug then.