Applying 19.19 RU on RHEL8.7 fails with timeout on “kernel:watchdog: BUG: soft lockup – CPU#X stuck for XXs! [modprobe:XXXXXX].”

The 19.19 RU was released last month, and after the issues with 19.18 I was hoping for a stable patch. Instead we got another buggy RU, which fails on RHEL 8.7 (Ootpa) in the postpatch stage.

Following is the log, which shows where it times out in the postpatch stage of roothas.sh.

04-20-2023 05:41:13 Successfully Installed the Oracle Database software.
04-20-2023 05:41:13 Starting execution of /orabin01/app/oraInventory/orainstRoot.sh.
04-20-2023 05:41:13 Info: Skipping execution of /orabin01/app/oraInventory/orainstRoot.sh as it does not exist!
04-20-2023 05:41:13 Starting execution of /orabin01/app/oracle/product/19c/grid/root.sh.
04-20-2023 05:41:14 Executed "/orabin01/app/oracle/product/19c/grid/root.sh" as root.
04-20-2023 05:41:14 Starting execution of /orabin01/app/oracle/product/19c/grid/crs/install/roothas.sh.
<-- Hangs here with soft lockup modprobe.

After this it waits indefinitely, and every few seconds the kernel throws “kernel:watchdog: BUG: soft lockup - CPU#X stuck for XXs! [modprobe:XXXXXX]” on the server console.
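
If you don't have the console in front of you, the same messages can be counted from the kernel log. The following is only a rough sketch: the function name count_modprobe_lockups is my own, and the pattern assumes the message looks like the one quoted above.

```shell
# Count kernel log lines on stdin that report a soft lockup in modprobe.
# Hypothetical helper; the pattern is based on the console message quoted above.
count_modprobe_lockups() {
    # grep -c prints the number of matching lines; "|| true" keeps the exit
    # status at 0 when there are no matches (grep itself returns 1 then).
    grep -c 'soft lockup.*\[modprobe:[0-9]*]' || true
}
```

It can be fed from either "dmesg | count_modprobe_lockups" or "journalctl -k | count_modprobe_lockups", run as root.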

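The cause shows up in the oracleoks kernel module. It sits under the 4.18.0-425.19.2.el8_7 module tree, but its vermagic shows it was built against 4.18.0-425.3.1.el8:

# modinfo /lib/modules/4.18.0-425.19.2.el8_7.x86_64/extra/usm/oracleoks.ko | egrep 'filename|author|rhelversion|name|vermagic'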

filename: /lib/modules/4.18.0-425.19.2.el8_7.x86_64/extra/usm/oracleoks.ko
author: Oracle Corporation
rhelversion: 8.7
name: oracleoks
vermagic: 4.18.0-425.3.1.el8.x86_64 SMP mod_unload modversions

This turned out to be a bug, which is also confirmed by Red Hat support note 7005934: “RHEL8.7 system with the kernel version 4.18.0-425.10.1.el8_7 or higher hangs with soft lockup warnings”.

Resolution as per Red Hat: Contact Oracle support to get an updated version of the [oracleoks] module built on top of RHEL8.7.z kernel version 4.18.0-425.10.1.el8_7 or higher.
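
The condition Red Hat describes can be checked with a small script. This is a minimal sketch, not an official check: the function names vermagic_release and version_ge are mine, and it assumes GNU sort's version ordering (sort -V) is a good enough way to compare kernel release strings.

```shell
# Print the kernel release from the "vermagic:" line of modinfo output on stdin.
vermagic_release() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Succeed if release $1 is at or above release $2.
# Assumption: GNU "sort -V" ordering matches kernel release ordering.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, "built=$(modinfo /lib/modules/$(uname -r)/extra/usm/oracleoks.ko | vermagic_release)" followed by "version_ge "$built" 4.18.0-425.10.1.el8_7 || echo "oracleoks built for older kernel: $built"" would flag the module shown above.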

As you can see, this only impacts RHEL 8.7 and possibly later releases. I could install this RU successfully on RHEL 7.9 (Maipo).

Solution – We need to use one-off patch 35068505 to fix this bug. This patch itself is 800MB+!!

Secondly, we can’t use opatchauto to apply this patch and need to fall back to the “Manual Patch Apply” method. Refer to section “5.1 Manual Steps for Applying or Rolling Back the Patch” in readme.txt for patch 35037840. This will redirect you to another note: Grid Infrastructure Release Update 12.2.0.1.x / 18c /19c (Doc ID 2246888.1).

Following are the steps one will usually use for manual apply –

# <GI_HOME>/crs/install/roothas.sh -prepatch

As the GI home owner execute:

$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<OCW TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<ACFS TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<DBWLM TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<DB RU TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<TOMCAT RU TRACKING BUG>


Here an additional step has to be added to address the soft lockup bug, using patch 35068505.

$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/35068505 <-- Additional step

# <GI_HOME>/rdbms/install/rootadd_rdbms.sh
# <GI_HOME>/crs/install/roothas.sh -postpatch
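
Since the order matters and there are several opatch calls, it can help to print the commands for review before running anything. Below is a minimal dry-run sketch; print_opatch_steps is a hypothetical helper of mine, and the patch directories passed to it must be the real sub-patch paths from the readme, with 35068505 last.

```shell
# Print (do not run) one "opatch apply" command per patch directory.
# $1 is the GI home; the remaining arguments are patch directories, in order.
# Hypothetical helper: it only echoes the commands shown in the steps above.
print_opatch_steps() {
    gi_home=$1
    shift
    for patch_dir in "$@"; do
        printf '%s/OPatch/opatch apply -oh %s -local %s\n' \
            "$gi_home" "$gi_home" "$patch_dir"
    done
}
```

Run it as the GI home owner with the GI home first and then the patch directories, review the output, and execute the printed commands between the roothas.sh -prepatch and rootadd_rdbms.sh steps.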


Once the patch 35068505 is applied, roothas.sh will run successfully.
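
To confirm that the one-off is actually in the inventory, the output of opatch lspatches can be checked. This is a small sketch with a helper name of my own (has_patch); it assumes each lspatches line starts with the patch number followed by a semicolon, so adjust the pattern if your OPatch version prints something different.

```shell
# Succeed if the patch number in $1 appears at the start of a line on stdin.
# Assumption: "opatch lspatches" prints lines of the form "<patch id>;<description>".
has_patch() {
    grep -q "^$1;"
}
```

For example, as the GI home owner: "<GI_HOME>/OPatch/opatch lspatches | has_patch 35068505 && echo "35068505 is applied"".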

Re-validate the kernel object version. The vermagic should now show 4.18.0-425.10.1.el8_7, which addresses the underlying issue.

# modinfo /lib/modules/4.18.0-425.19.2.el8_7.x86_64/extra/usm/oracleoks.ko | egrep 'rhelversion|vermagic'

rhelversion: 8.7
vermagic: 4.18.0-425.10.1.el8_7.x86_64 SMP mod_unload modversions

Metalink Doc ID 2523221.1 Grid Infrastructure 19 Release Updates and Revisions Bugs Fixed Lists
