ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 128 bytes of shared memory
ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed]
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-04030: out of process memory when trying to allocate 552 bytes (kxs-heap-p,kxfqlobf)
I started to dig and troubleshoot.
prstat -t output below shows total memory usage by user on Solaris. Look at the RSS column below it clearly shows 144GB of memory being used by the oracle user. I was surprised to see this.
NPROC USERNAME SWAP RSS MEMORY TIME CPU
85 root 2520M 2481M 1.3% 87:40:45 0.2%
5462 oracle 183G 144G 75% 197:27:56 0.2%
51 grid 17G 12G 6.4% 34:47:13 0.0%
1 smmsp 4272K 10M 0.0% 0:00:06 0.0%
7 daemon 9384K 12M 0.0% 0:00:54 0.0%
1 netcfg 4344K 7240K 0.0% 0:00:24 0.0%
3 netadm 46M 46M 0.0% 0:00:45 0.0%
I then wanted to see a list of process sorted by memory usage I also added cpu time, the process id and the command and sorted by the memory column from largest usage to lowest as shown in the following command. The output showed that the largest processes were oracle and all of them were about the same size based on the pmem column which shows the output in terms of a %.
# ps -eo pmem,pcpu,pid,comm | sort -k 1 -nr |wc -l
I also wanted to see a total process count on the server so I ran the following command it returned about 6000 processes, so many.
# ps -ef| wc -l
..And the plot thickens then I thought about checking to see if we have too many parallel process threads running and I ran the following command to check and verify this.The following command returned about 4800 rows! Now we are on to something this clearly shows that about 80% of the processes running on the VM are all parallel threads, this is way to much.
ps -eo pmem,pcpu,pid,comm | sort -k 1 -nr |grep -i _p0|wc -l
The parallel_max_servers database init parameter in 12c is a culprit for this and it should be limited. If this parameter is not set it will take the default value which is based upon the settings from CPU_COUNT, PARALLEL_THREADS_PER_CPU, and PGA_AGGREGATE_TARGET.
The CPU_COUNT database init parameter should also be limited as well. If the default value is taken then the database will take upon the value reported by the total cpu threads reported from the OS. In my case we have 20 CPU cores assigned to the VM and on the M7 Supercluster each CPU has 32 cores with 8 threads each a total of 256 threads per core.So 20 cores assigned to the VM is 160 threads and can set CPU_COUNT accordingly based on the workload need.
The PARALLEL_THREADS_PER_CPU is normally OS dependent and usually 2. This setting is fine.
Once I appropriately set the CPU_COUNT and PARALLEL_MAX_SERVERS parameters on each of the 16 databases and restarted each one I was happy to see the overall memory utilization dramatically decreased. As shown below in the prstat command.
# prstat -t
NPROC USERNAME SWAP RSS MEMORY TIME CPU
764 oracle 80G 56G 29% 14:33:54 0.1%
94 root 2525M 2493M 1.3% 88:14:08 0.1%
53 grid 8239M 6759M 3.4% 34:26:20 0.0%
1 smmsp 4272K 10M 0.0% 0:00:06 0.0%
7 daemon 9384K 12M 0.0% 0:00:55 0.0%
1 netcfg 4344K 7240K 0.0% 0:00:24 0.0%
3 netadm 46M 46M 0.0% 0:00:45 0.0%
In Summary the following two parameters should be set appropriately on every new Oracle Database build. As a best practice please limit these parameters as the default settings could hurt the overall performance of your system and even impact the available memory as it did in our case. I have even see many cases where the PARALLEL_MAX_SERVERS was not set and it completely hung an Exadata or other types of database servers so please be mindful of these two parameters.
- CPU_COUNT
- PARALLEL_MAX_SERVERS