How to configure JVM memory in a cloud native context

Background

Some time ago, a business R&D colleague reported that their application’s memory usage was very high and was causing frequent restarts, and asked me to check what was going on.

I hadn’t paid much attention to this kind of problem before, so I recorded the investigation and analysis process here.

First I checked the Pod monitoring in the monitoring panel:

The Pod’s memory was indeed almost full, but the application’s JVM heap usage was only about 30%. This means it was not the application’s heap filling up and causing a JVM OOM; rather, the Pod’s memory was exhausted, and the Pod was killed by k8s.

To maintain the desired number of replicas of the application, k8s then restarts the Pod, which is why the application appeared to restart after running for a while.



This application’s JVM is configured with 8G, and the memory requested by the container is 16G, so at first glance the Pod’s memory usage should only be about 50%.

The principle of containers

Before solving this problem, it is worth briefly understanding how containers work, because all applications in k8s run in containers, and containers ultimately run as processes on the host.

But when we use Docker, it feels as if the applications started in each container do not interfere with each other: the file system, network, CPU, and memory are all isolated, just like two applications running on different servers.

In fact, this is not black magic at all. Linux has supported namespace isolation since version 2.6.x, and with namespaces two processes can be completely isolated from each other.

Merely isolating resources is not enough; you also need to limit resource usage, such as CPU, memory, disk, and bandwidth. This is what cgroups are for.

cgroups can limit the resources of a given process. For example, if the host has a 4-core CPU and 8G of memory, then in order to protect the other containers, a container can be given an upper limit of 1 CPU core and 2G of memory.

This picture clearly shows the roles of namespaces and cgroups in container technology. Simply put:

  • namespace is responsible for isolation
  • cgroups are responsible for restrictions

There is a corresponding configuration in k8s:

  resources:
    requests:
      memory: 1024Mi
      cpu: 0.1
    limits:
      memory: 1024Mi
      cpu: 4

This resource list indicates that the container needs to be allocated at least 0.1 CPU cores and 1024Mi of memory, and that it may use at most 4 CPU cores and 1024Mi of memory.
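
To make the difference between the two sections concrete, here is a minimal sketch (the values are illustrative, not from the original incident): requests is what the scheduler uses when placing the Pod, while limits is the hard cap enforced through cgroups, and exceeding the memory limit gets the container killed.

  resources:
    requests:          # used by the scheduler when placing the Pod
      memory: 1024Mi
      cpu: 0.1
    limits:            # hard cap enforced via cgroups; exceeding memory gets the container OOM killed
      memory: 2048Mi
      cpu: 4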

Different OOM

Going back to the problem at hand, we can confirm that the container exceeded its memory limit (OOM) and was restarted by k8s; that is exactly what the limits configuration is for.

When a container is killed for exceeding its memory limit, k8s records an event with exit code 137 (OOMKilled).
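
For reference, this is roughly how it shows up in the Pod’s status (a sketch with illustrative values, e.g. as seen via kubectl get pod -o yaml):

  status:
    containerStatuses:
      - restartCount: 3            # illustrative value
        lastState:
          terminated:
            exitCode: 137          # 128 + 9, i.e. killed by SIGKILL
            reason: OOMKilled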

In this case, the application’s JVM heap and the container’s memory limit were configured to the same size, 8GB. But a Java application also uses memory that is not managed by the heap, such as off-heap memory, so the container’s total memory usage can easily exceed the 8G limit, which leads to the container-level memory overflow.
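
A minimal sketch of that problematic setup (the container name, image, and exact flags are assumptions, not taken from the real deployment): the heap is set to the full container limit, leaving no headroom for Metaspace, thread stacks, direct buffers, and other off-heap memory:

  containers:
    - name: demo-app                  # hypothetical name
      image: demo-app:latest          # hypothetical image
      env:
        - name: JAVA_TOOL_OPTIONS     # picked up automatically by the JVM
          value: "-Xms8g -Xmx8g"      # heap == container limit, no room left for off-heap memory
      resources:
        limits:
          memory: 8Gi
          cpu: 4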

Optimization in a cloud native context

Because the application itself does not use much memory, I recommended limiting the heap to 4GB. This keeps the container’s total memory below the limit and solves the problem.

Of course, we will also add a hint to the application configuration page in the future: it is recommended to configure the JVM heap to less than 2/3 of the container’s memory limit, leaving some memory in reserve.
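
Following that rule of thumb, the adjusted configuration would look roughly like this (again a sketch with assumed names and values, not the exact production manifest):

  containers:
    - name: demo-app                  # hypothetical name
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-Xms4g -Xmx4g"      # heap capped at half of the 8Gi limit, within the 2/3 guideline
      resources:
        limits:
          memory: 8Gi                 # the remaining ~4Gi covers off-heap memory
          cpu: 4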

In essence, the development model has not changed. Traditional Java application development is not even aware of the container’s memory size, because in the past everyone’s applications were deployed on virtual machines with plenty of memory, so there was no need to care about container memory limits.

As a result, the two are mistakenly assumed to be equivalent, and this is especially visible in Java applications because of the extra JVM layer. In older JDK versions, if the heap size is not set explicitly, the JVM cannot detect the container’s memory limit and derives a default Xmx from the host’s memory instead, which can be larger than the container’s memory and cause an OOM kill.
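
Newer JDKs can work this out on their own. To the best of my knowledge, -XX:+UseContainerSupport (enabled by default since JDK 10 and backported to JDK 8u191) makes the JVM read the container’s cgroup limit, and -XX:MaxRAMPercentage controls how much of that limit the default heap may use; a hedged sketch:

  containers:
    - name: demo-app                  # hypothetical name
      env:
        - name: JAVA_TOOL_OPTIONS
          # No explicit -Xmx: let the JVM size the heap from the container limit
          # (UseContainerSupport is already on by default on recent JDKs).
          value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=66.0"
      resources:
        limits:
          memory: 8Gi                 # default heap becomes ~2/3 of this, about 5.3Gi
          cpu: 4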