Once again I ended up with a client's database swapping. Why? After a quick investigation, I could see that HugePages were not used at the last restart of the database.
oracle@myvm1:./trace/ [oracle19] grep -B1 -A4 PAGESIZE alert*.log
2020-04-14T04:36:34.601494+02:00
 PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
2020-04-14T04:36:34.601550+02:00
       4K       Configured              10               10  NONE
2020-04-14T04:36:34.601642+02:00
    2048K           247816            8193             8193  NONE
--
2020-10-13T22:59:28.856763+02:00
 PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
2020-10-13T22:59:28.856818+02:00
       4K       Configured              10          4186122  NONE
2020-10-13T22:59:28.856925+02:00
    2048K           202479            8193               17  NONE
Why was that? I did use a normal start command:
oracle@myvm1:./trace/ [oracle19] srvctl start database -db mydb
Some context: this is an Oracle Restart server, with role separation between the oracle and grid users.
The database used HugePages in the past, so they are configured (I skip the check here, but I did it).
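For readers who want to repeat the skipped check, here is a minimal sketch. It assumes a Linux host that exposes the usual `HugePages_Total`, `HugePages_Free` and `Hugepagesize` lines in `/proc/meminfo`; the function name `hugepages_summary` is made up for this example.

```shell
# Summarise HugePages usage from a meminfo-style file.
# Defaults to /proc/meminfo; another path can be passed for testing.
hugepages_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            # in_use is simply total minus free
            printf "total=%d free=%d in_use=%d pagesize_kb=%d\n",
                   total, free, total - free, size_kb
        }' "$meminfo"
}
```

Calling `hugepages_summary` without arguments reads the live values. A total of 0 would mean HugePages are not configured at all, which was not the case here.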
I’ve looked around and could see that ulimits were correctly set for both users:
oracle@myvm1:./trace/ [oracle19] grep 'memlock' /etc/security/limits.d/99-grid-oracle-limits.conf
oracle soft memlock 1342177280
oracle hard memlock 1342177280
grid soft memlock 1342177280
grid hard memlock 1342177280
I see the file was changed after the previous DB restart. The server was not restarted in-between.
But in the alert log I see that the memlock limit seen by the database changed compared to the previous restart:
oracle@myvm1:./trace/ [oracle19] grep -h -B1 "Per proc" alert*.log
2020-04-14T04:36:34.301642+02:00
Per process system memlock (soft) limit = 1280G
--
2020-10-13T22:59:28.756763+02:00
Per process system memlock (soft) limit = 64K
I decided to stop the database and start it again, but this time with sqlplus. And HugePages were used!
So, where could be the problem?
srvctl uses oraagent.bin to start databases, so it is important to check which limits this process is running with:
grid@myvm1:~/ [grid19] grep memory /proc/$(pgrep oraagent.bin)/limits
Max locked memory         65536                65536                bytes
Here is the problem: oraagent.bin was started before the limits file was changed, and new limits only apply to new sessions.
When we open a new session and start a DB with sqlplus, the instance gets that session's limits. But if we use srvctl, it gets the limits that were in place when oraagent.bin started.
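The difference between the two start paths can be made visible with a small helper. This is only a sketch: the function name `locked_memory_soft` is made up here, and it assumes the standard `/proc/<pid>/limits` layout, where the fourth field of the "Max locked memory" line is the soft limit.

```shell
# Print the soft "Max locked memory" value from a /proc/<pid>/limits style file.
# The value is in bytes, or the word "unlimited".
locked_memory_soft() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Example usage (left commented out, as it needs a running oraagent.bin):
# for pid in $(pgrep oraagent.bin); do
#     echo "oraagent.bin pid $pid: $(locked_memory_soft "/proc/$pid/limits") bytes"
# done
# echo "this session: $(ulimit -l) KB"   # ulimit -l reports kilobytes
```

The loop over `pgrep` output is a precaution in case more than one oraagent.bin is running. Note the unit mismatch when comparing: the file under /proc reports bytes, while `ulimit -l` reports kilobytes.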
We are on a production server, so restarting the whole clusterware stack would be quite annoying. What is nice is that we can simply “kill” the oraagent.bin process and it will respawn. Even better is to do it with ohasd.bin itself: it respawns as well, while ASM and the database keep running.
grid@myvm1:~/ [grid19] ps -ef | grep ohasd.bin
grid 2854 1 0 Sep02 ? 03:23:19 /u00/app/grid/product/19/bin/ohasd.bin reboot
grid@myvm1:~/ [grid19] kill -9 2854
grid@myvm1:~/ [grid19] ps -ef | grep ohasd.bin
grid 185684 1 17 14:22 ? 00:00:01 /u00/app/grid/product/19/bin/ohasd.bin restart
grid@myvm1:~/ [grid19] psg pmon
grid 6458 1 0 Sep02 ? 00:03:38 asm_pmon_+ASM
oracle 7069 1 0 Oct13 ? 00:05:18 ora_pmon_mydb
And now the new limits are correctly set for the oraagent.bin:
grid@myvm1:~/ [grid19] grep memory /proc/$(pgrep oraagent.bin)/limits
Max locked memory         1374389534720        1374389534720        bytes
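The three numbers seen so far are the same limit in different units: limits.conf takes memlock in KB, /proc/<pid>/limits reports bytes, and the alert log prints GB. A quick sketch of the conversion, using the value from the limits file above (the helper names are made up for this example):

```shell
# memlock in limits.conf is expressed in KB.
kb_to_bytes() { echo $(( $1 * 1024 )); }
kb_to_gb()    { echo $(( $1 / 1024 / 1024 )); }

memlock_kb=1342177280
kb_to_bytes "$memlock_kb"   # 1374389534720, as shown in /proc/<pid>/limits
kb_to_gb "$memlock_kb"      # 1280, the "1280G" from the alert log
```

The same arithmetic explains the bad case: the 65536 bytes seen on the old oraagent.bin are the 64K reported in the alert log.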
And further restarts with srvctl are using HugePages.
Lesson: check that the ulimits are correctly applied to oraagent.bin.
If you have a file with a list of servers separated by spaces or newlines, you can use a loop like the following to check all of them:
for server in $(grep -v '^#' server_list.txt); do
  echo "=== $server"
  ssh -o ConnectTimeout=1 -o ConnectionAttempts=1 -o BatchMode=yes "$server" \
    "echo '99-grid - ' \$(grep 'grid soft memlock' /etc/security/limits.d/99-grid-oracle-limits.conf); echo 'oraagent - ' \$(grep memory /proc/\$(pgrep oraagent.bin)/limits)"
done