Bug 1580476 - OSP10 - Container configuration generation fails if the host file system is xfs that was created with ftype=0 [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 10.0 (Newton)
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: z9
Target Release: 10.0 (Newton)
Assignee: Carlos Camacho
QA Contact: Artem Hrechanychenko
URL:
Whiteboard:
Depends On: 1564671 1575115 1580463 1580469
Blocks:
Reported: 2018-05-21 14:43 UTC by Carlos Camacho
Modified: 2018-09-17 17:00 UTC (History)
CC List: 27 users

Fixed In Version: openstack-tripleo-validations-5.1.4-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1580469
Environment:
Last Closed: 2018-09-17 16:59:20 UTC
Target Upstream Version:
amcleod: needinfo? (ccamacho)


Links
System ID Priority Status Summary Last Updated
Launchpad 1765121 None None None 2018-05-21 14:43:21 UTC
Red Hat Product Errata RHBA-2018:2671 None None None 2018-09-17 17:00:21 UTC

Comment 1 Marius Cornea 2018-06-08 18:13:09 UTC
I manually installed the fixed-in package and tried running the validation but it doesn't work as expected:


[stack@undercloud-0 ~]$ rpm -q openstack-tripleo-validations
openstack-tripleo-validations-5.1.4-2.el7ost.noarch
[stack@undercloud-0 ~]$ openstack workflow execution create tripleo.validations.v1.run_validation '{"validation_name": "check-ftype"}'  -f json | jq -r -c '.ID'
57b1936b-1e28-49ae-b909-4355acfeffff
[stack@undercloud-0 ~]$ mistral execution-get-output 57b1936b-1e28-49ae-b909-4355acfeffff
{}

@Carlos, can you please advise how I can run this validation on OSP10?

Comment 2 Carlos Camacho 2018-06-12 10:56:03 UTC
Hey Marius,

I don't fully understand the behavior when executing the validations: depending on what you execute, you get either an empty output or the stderr correctly displayed as JSON. (We need to check whether this is a bug in the validations workflows.)

This is what I do:

1) If I don't get `out['status'] == SUCCESS`, I treat the validation as failed. The problem is that there is nothing to debug, because the output is {}.


I deployed an OSP10 environment and simulated the issue. Here is what I did:




(undercloud) [stack@undercloud ~]$ openstack workflow execution create tripleo.validations.v1.run_validation '{"validation_name": "check-ftype"}'  -f json

(undercloud) [stack@undercloud ~]$ mistral execution-get-output 38b2d81d-2c27-4390-8b86-507f24347b84
{
    "status": "SUCCESS", 
    "stderr": "[DEPRECATION WARNING]: DEFAULT_SUDO_FLAGS option, In favor of become which is a\n generic framework . This feature will be removed in version 2.8. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n", 
    "stdout": "Success! The validation passed for all hosts:\n* 192.168.24.13\n* 192.168.24.16\n* localhost\n"
}

# Simulate an affected partition

# From the UC run:
dd if=/dev/zero of=~/wakawaka.img bs=100M count=10
du -sh ~/wakawaka.img 
sudo losetup -fP ~/wakawaka.img
sudo mkfs.xfs -n ftype=0 -m crc=0 ~/wakawaka.img -f
mkdir ~/fs_test
sudo mount ~/wakawaka.img ~/fs_test

(undercloud) [stack@undercloud ~]$ openstack workflow execution create tripleo.validations.v1.run_validation '{"validation_name": "check-ftype"}'  -f json
(undercloud) [stack@undercloud ~]$ mistral execution-get-output 9b579045-3b1c-42e3-9827-ffc0a4879d83
{}



So, as you can see, unless the status is exactly "SUCCESS", I treat the validation as failed.

Does this sound OK to you?
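The rule described in this comment (anything other than a "SUCCESS" status counts as a failure, including an empty `{}`) could be scripted roughly as follows. This is only a sketch: `validation_passed` is a hypothetical helper, not part of openstack-tripleo-validations, and it does a plain text match on the output rather than real JSON parsing.

```shell
#!/bin/bash
# Hypothetical helper: reads the text printed by
# `mistral execution-get-output <ID>` on stdin and succeeds only when that
# text contains "status": "SUCCESS". An empty object ({}) or a FAILED status
# both count as failure. This is a text match, not a JSON parser.
validation_passed() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"SUCCESS"'
}

# Example using the empty output seen after simulating the ftype=0 volume.
if printf '%s\n' '{}' | validation_passed; then
    echo "validation passed"
else
    echo "validation failed or returned no result"
fi
```

On a real undercloud the function would be fed by the command itself, for example `mistral execution-get-output "$ID" | validation_passed`, where `$ID` is the execution ID returned by `openstack workflow execution create`.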

Comment 3 Marius Cornea 2018-06-19 15:55:24 UTC
(In reply to Carlos Camacho from comment #2)


OK, so I managed to get the SUCCESS/FAIL message, but this is not straightforward at all because:

1/ There is no feedback while the validation is running, so the user has to wait indefinitely and keep re-running mistral execution-get-output until some output shows up.

2/ The validation has to be run manually, which makes it no different from SSHing into the nodes and checking the filesystem by hand, and the manual check makes more sense to the operator. I think we need to consider hooking these validations into the upgrade workflow so that they run automatically when the user triggers a part of the upgrade process.

Given this, I recommend that for now we document the manual check per https://access.redhat.com/solutions/3459291 in the FFU workflow documentation and do not re-spin another OSP10 puddle to include the fixed-in-version package.
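For point 1/, the repeated manual calls to `mistral execution-get-output` could be wrapped in a polling loop with a bounded number of attempts. The sketch below is an illustration only: `wait_for_output`, `MAX_TRIES` and `SLEEP_SECS` are hypothetical names, and the loop assumes that "no result yet" shows up as empty output or `{}`. As comment #2 shows, a failed validation may also return `{}`, in which case the loop simply times out.

```shell
#!/bin/bash
# wait_for_output CMD [ARGS...]
# Re-runs CMD until it prints something other than an empty JSON object,
# then prints that output. Gives up after MAX_TRIES attempts.
# MAX_TRIES and SLEEP_SECS are hypothetical knobs for this sketch.
wait_for_output() {
    local tries=0 out
    while [ "$tries" -lt "${MAX_TRIES:-60}" ]; do
        # stderr is discarded so client-side warnings do not pollute the result.
        out=$("$@" 2>/dev/null) || out=""
        # Strip whitespace so "{ }" and "{}" are treated alike.
        if [ -n "$out" ] && [ "$(printf '%s' "$out" | tr -d '[:space:]')" != "{}" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        tries=$((tries + 1))
        sleep "${SLEEP_SECS:-10}"
    done
    echo "timed out waiting for validation output" >&2
    return 1
}

# Usage on the undercloud (EXECUTION_ID comes from
# `openstack workflow execution create ... -f json | jq -r '.ID'`):
# wait_for_output mistral execution-get-output "$EXECUTION_ID"
```

Passing the command as arguments keeps the loop independent of the Mistral client, so the same function works with any command that eventually prints a result.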

Comment 9 Artem Hrechanychenko 2018-08-14 15:10:05 UTC
VERIFIED


[stack@undercloud-0 ~]$ rpm -qa "openstack-tripleo-validations"
openstack-tripleo-validations-5.1.4-3.el7ost.noarch


[stack@undercloud-0 ~]$  mistral execution-get-output 7e95c06b-d615-42d0-a71b-e5181174b1f9
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
{
    "status": "SUCCESS", 
    "stderr": "[DEPRECATION WARNING]: DEFAULT_SUDO_FLAGS option, In favor of become which is a\n generic framework . This feature will be removed in version 2.8. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n", 
    "__task_execution": {
        "id": "6c7cfa8e-40df-417f-8426-895d13aba75f", 
        "name": "send_message"
    }, 
    "stdout": "Success! The validation passed for all hosts.\n"
}
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 24722d9c-24b6-47ac-b8d1-5531f0db1dea | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| 190a7698-8929-4c55-ab52-9cddba022173 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
[stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.16
The authenticity of host '192.168.24.16 (192.168.24.16)' can't be established.
ECDSA key fingerprint is SHA256:Adt4A+NI8HY3BmZIAXOBRaee+UMNDFXeDeKLLj4ePeQ.
ECDSA key fingerprint is MD5:9e:04:bb:0f:a4:af:a4:b7:28:10:ce:32:f9:82:ad:ff.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.24.16' (ECDSA) to the list of known hosts.
Last login: Tue Aug 14 15:06:45 2018 from 192.168.24.1
[heat-admin@controller-0 ~]$ dd if=/dev/zero of=~/wakawaka.img bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 0.806012 s, 1.3 GB/s
[heat-admin@controller-0 ~]$ du -sh ~/wakawaka.img 
1000M	/home/heat-admin/wakawaka.img
[heat-admin@controller-0 ~]$ sudo sudo losetup -fP ~/wakawaka.img
[heat-admin@controller-0 ~]$ sudo mkfs.xfs -n ftype=0 -m crc=0 ~/wakawaka.img -f
meta-data=/home/heat-admin/wakawaka.img isize=256    agcount=4, agsize=64000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=256000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=853, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[heat-admin@controller-0 ~]$ mkdir ~/fs_test
[heat-admin@controller-0 ~]$ sudo mount ~/wakawaka.img ~/fs_test
[heat-admin@controller-0 ~]$ exit
logout
Connection to 192.168.24.16 closed.
[stack@undercloud-0 ~]$ openstack workflow execution create tripleo.validations.v1.run_validation '{"validation_name": "check-ftype"}'  -f json
{
  "Task Execution ID": "<none>", 
  "Description": "", 
  "Workflow name": "tripleo.validations.v1.run_validation", 
  "Created at": "2018-08-14 15:08:12.687706", 
  "State": "RUNNING", 
  "State info": null, 
  "Updated at": "2018-08-14 15:08:12.691316", 
  "ID": "d95f7cdc-dd8d-4105-bcf7-51510e532f6e", 
  "Workflow ID": "e1121119-3da5-4a4b-93cc-221dc1fa03b2"
}
[stack@undercloud-0 ~]$ mistral execution-get-output d95f7cdc-dd8d-4105-bcf7-51510e532f6e
{
    "status": "FAILED", 
    "stderr": "[DEPRECATION WARNING]: DEFAULT_SUDO_FLAGS option, In favor of become which is a\n generic framework . This feature will be removed in version 2.8. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n", 
    "__task_execution": {
        "id": "16e2eb33-83ed-407a-a88c-1f468ac6f586", 
        "name": "send_message"
    }, 
    "stdout": "Task 'Check ftype' failed:\nHost: 192.168.24.16\nMessage: XFS volumes formatted using ftype=0 are incompatible with the docker overlayfs driver. Run xfs_info in controller-0.localdomain and fix those volumes before proceeding with the upgrade.\n\n\nFailure! The validation failed for the following hosts:\n* 192.168.24.11\n* 192.168.24.16\n* localhost\n"

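The failure message above asks the operator to run xfs_info on the affected host. A minimal sketch of that manual check follows, assuming the `ftype=` field is printed the way the mkfs.xfs output in this comment shows it. `has_ftype0` and `check_xfs_mounts` are hypothetical helper names, and this is not necessarily how the check-ftype validation itself is implemented.

```shell
#!/bin/bash
# Succeeds when the xfs_info (or mkfs.xfs) text on stdin reports ftype=0.
has_ftype0() {
    grep -Eq '(^|[[:space:]])ftype=0([[:space:]]|$)'
}

# Prints every mounted XFS filesystem that was created with ftype=0.
# Run on the node being checked; xfs_info may need sudo on some hosts.
check_xfs_mounts() {
    local mnt
    findmnt -t xfs -n -o TARGET | while read -r mnt; do
        if xfs_info "$mnt" | has_ftype0; then
            echo "ftype=0: $mnt"
        fi
    done
}

# Usage: check_xfs_mounts
```

With the test volume from this comment mounted at ~/fs_test, `check_xfs_mounts` would be expected to list that mount point.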
Comment 11 Alex McLeod 2018-09-03 08:00:43 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 13 errata-xmlrpc 2018-09-17 16:59:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2671

