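OS version: Red Hat Enterprise Linux Server release 7.3 (Maipo)
Database version: 11.2.0.4.0
Architecture: RAC + single-instance Data Guard (DG)
Symptom:
Node 1 host went down.
Node 2 reported the following errors:
alert log: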
Mon Sep 16 02:00:00 2019
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Mon Sep 16 02:20:59 2019
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Mon Sep 16 02:20:59 2019
Mon Sep 16 02:20:59 2019
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 16 02:20:59 2019
 LMS 3: 1 GCS shadows cancelled, 1 closed, 0 Xw survived
Mon Sep 16 02:20:59 2019
Mon Sep 16 02:20:59 2019
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 16 02:20:59 2019
 LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
Mon Sep 16 02:20:59 2019
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:4 new-inc#:4
 Post SMON to start 1st pass IR
Mon Sep 16 02:20:59 2019
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 579 KB redo, 187 data blocks need recovery
Mon Sep 16 02:21:03 2019
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p000_92487.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Mon Sep 16 02:21:03 2019
Mon Sep 16 02:21:03 2019
Mon Sep 16 02:21:03 2019
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p025_92901.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p027_92905.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p022_92893.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Mon Sep 16 02:21:03 2019
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p026_92903.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
--grid log
2019-05-08 13:42:25.391:
[crsd(7600)]CRS-2769:Unable to failover resource 'ora.test.db'.
2019-09-16 02:20:43.924:
[cssd(5487)]CRS-1612:Network communication with node test1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.870 seconds
2019-09-16 02:20:51.925:
[cssd(5487)]CRS-1611:Network communication with node test1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.870 seconds
2019-09-16 02:20:55.925:
[cssd(5487)]CRS-1610:Network communication with node test1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.870 seconds
2019-09-16 02:20:58.797:
[cssd(5487)]CRS-1632:Node test1 is being removed from the cluster in cluster incarnation 452719139
2019-09-16 02:20:58.801:
[cssd(5487)]CRS-1601:CSSD Reconfiguration complete. Active nodes are test2 .
2019-09-16 02:20:58.802:
[crsd(7600)]CRS-5504:Node down event reported for node 'test1'.
2019-09-16 02:21:01.997:
[crsd(7600)]CRS-2773:Server 'test1' has been removed from pool 'Generic'.
2019-09-16 02:21:01.997:
[crsd(7600)]CRS-2773:Server 'test1' has been removed from pool 'ora.test'.
2019-09-16 03:28:29.268:
[cssd(5487)]CRS-1601:CSSD Reconfiguration complete. Active nodes are test1 test2 .
2019-09-16 03:29:14.720:
[crsd(7600)]CRS-2772:Server 'test1' has been assigned to pool 'Generic'.
2019-09-16 03:29:14.720:
[crsd(7600)]CRS-2772:Server 'test1' has been assigned to pool 'ora.test'.
[grid@test2 test2]$
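The three CSS warnings (CRS-1612, CRS-1611, CRS-1610) each report a timestamp and the seconds remaining before eviction. As a quick sanity check of the timeline, the sketch below adds the two together to project the eviction deadline. The function name `eviction_deadline` is made up for illustration, and the input values are copied by hand from the messages above.

```shell
#!/bin/sh
# Project the eviction deadline from each CSS warning: timestamp + seconds remaining.
# Input lines have the form "<HH:MM:SS.mmm> <seconds remaining>".
# eviction_deadline is a hypothetical helper name, not an Oracle tool.
eviction_deadline() {
    awk '{
        split($1, t, ":")
        total = t[1] * 3600 + t[2] * 60 + t[3] + $2   # seconds since midnight
        h = int(total / 3600)
        m = int((total - h * 3600) / 60)
        s = total - h * 3600 - m * 60
        printf "%02d:%02d:%06.3f\n", h, m, s
    }'
}

# Values taken from the CRS-1612, CRS-1611 and CRS-1610 lines.
printf '%s\n' '02:20:43.924 14.870' '02:20:51.925 6.870' '02:20:55.925 2.870' | eviction_deadline
```

All three warnings project to 02:20:58.794 or 02:20:58.795, which agrees with the CRS-1632 removal message logged at 02:20:58.797.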
---Host log
Sep 16 02:21:01 test2 systemd: Started Session 1138864 of user root.
Sep 16 02:21:01 test2 systemd: Starting Session 1138864 of user root.
Sep 16 02:21:01 test2 systemd: Started Session 1138865 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138865 of user oracle.
Sep 16 02:21:01 test2 systemd: Started Session 1138866 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138866 of user oracle.
Sep 16 02:21:01 test2 systemd: Started Session 1138867 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138867 of user oracle.
Sep 16 02:21:01 test2 systemd: Started Session 1138868 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138868 of user oracle.
Sep 16 02:21:01 test2 su: (to oracle) root on none
Sep 16 02:21:01 test2 su: (to oracle) root on none
Sep 16 02:21:01 test2 systemd: Removed slice user-0.slice.
Sep 16 02:21:01 test2 systemd: Stopping user-0.slice.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Registering new address record for 1.3.10.8 on bond0.IPv4.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Withdrawing address record for 1.3.10.8 on bond0.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Registering new address record for 1.3.10.8 on bond0.IPv4.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Withdrawing address record for 1.3.10.8 on bond0.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Registering new address record for 1.3.10.8 on bond0.IPv4.
Sep 16 02:22:01 test2 systemd: Created slice user-0.slice.
Sep 16 02:22:01 test2 systemd: Starting user-0.slice.
Sep 16 02:22:01 test2 systemd: Started Session 1138869 of user root.
--Node 1 logs:
--alert log
Mon Sep 16 02:00:00 2019
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Mon Sep 16 03:29:22 2019
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = UNLIMITED
--ASM alert log---ORA-27090 had already been occurring long before this incident
Tue May 21 17:21:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_208122.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Additional information: 128
Tue May 21 22:15:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_259760.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Additional information: 128
Additional information: 1
Tue May 21 22:21:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_269758.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
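To see how long ORA-27090 has been recurring, the alert log can be summarised per day. The following is a minimal sketch, assuming the alert log on disk has one entry per line with timestamp lines of the form `Tue May 21 17:21:02 2019`. The function name `count_ora27090_by_day` is hypothetical, and the path of the alert log file is left to the caller.

```shell
#!/bin/sh
# Count ORA-27090 occurrences per calendar day in an Oracle alert log.
# Assumption: each timestamp line precedes the error lines that belong to it.
# count_ora27090_by_day is a hypothetical helper name.
count_ora27090_by_day() {
    awk '
        # Remember the day of the most recent timestamp line (weekday month day time year).
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] +[0-9]+ [0-9:]+ [0-9]+$/ {
            day = $2 " " $3 " " $5
            next
        }
        # Attribute every ORA-27090 line to that day.
        /ORA-27090/ { count[day]++ }
        END { for (d in count) print d ": " count[d] }
    ' "$1"
}
```

It would be called as `count_ora27090_by_day <path to alert log>`. The order of the output lines is not guaranteed, so pipe it through `sort` if needed.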
---MOS solution:
--Apply the change on both nodes
vi /etc/sysctl.conf
fs.aio-max-nr = 3145728
sysctl -p
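Linux error 11 (EAGAIN) during asynchronous I/O setup generally means the request would exceed the system-wide limit `fs.aio-max-nr`. Before and after changing it, it can help to compare the current usage in `/proc/sys/fs/aio-nr` with the limit in `/proc/sys/fs/aio-max-nr`. The sketch below does that. The function name `aio_usage` is made up for illustration, and the optional arguments exist only so that saved numbers can be checked offline.

```shell
#!/bin/sh
# Report how much of the system-wide AIO limit is currently in use.
# Both arguments are optional and default to the live /proc values.
# aio_usage is a hypothetical helper name.
aio_usage() {
    used=${1:-$(cat /proc/sys/fs/aio-nr)}
    limit=${2:-$(cat /proc/sys/fs/aio-max-nr)}
    pct=$(( used * 100 / limit ))   # integer percentage, rounded down
    echo "aio-nr=$used aio-max-nr=$limit used=${pct}%"
}
```

Running `aio_usage` with no arguments on each node reads the live counters. A usage figure close to 100% suggests the limit is too low for the number of Oracle processes on the host.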
---Verify that the change took effect
--Run as the grid user
cluvfy comp sys -n all -p crs -verbose
--Output:
Check: Kernel parameter for "aio-max-nr"
  Node Name         Current       Configured    Required      Status        Comment
  ----------------  ------------  ------------  ------------  ------------  ------------
  test2             3145728       3145728       1048576       passed
  test1             3145728       3145728       1048576       passed
Result: Kernel parameter check passed for "aio-max-nr"
Reference: MOS Doc ID 579108.1
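For a quick check on a single node without running cluvfy, the same three columns (Current, Configured, Required) can be approximated by hand. This is only a sketch: the helper names `configured_aio_max_nr` and `aio_max_nr_ok` are made up for illustration, the configuration file is assumed to be `/etc/sysctl.conf` as edited above, and the required value 1048576 is taken from the cluvfy output.

```shell
#!/bin/sh
# Print the last fs.aio-max-nr value found in a sysctl configuration file.
# The file argument is optional and defaults to /etc/sysctl.conf.
# Both function names are hypothetical helpers, not part of cluvfy.
configured_aio_max_nr() {
    awk -F= '
        /^[ \t]*fs\.aio-max-nr[ \t]*=/ {
            gsub(/[ \t]/, "", $2)   # strip blanks around the value
            value = $2
        }
        END { if (value != "") print value }
    ' "${1:-/etc/sysctl.conf}"
}

# Succeed only when the running value equals the configured one
# and is at least the required minimum.
aio_max_nr_ok() {
    running=$1; configured=$2; required=$3
    [ "$running" -eq "$configured" ] && [ "$running" -ge "$required" ]
}
```

A possible invocation is `aio_max_nr_ok "$(sysctl -n fs.aio-max-nr)" "$(configured_aio_max_nr)" 1048576 && echo passed`, run on each node in turn.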