Two days ago, a cloud vendor we use had an outage that lasted most of the day. Our services depended heavily on theirs, so we went down with them, and there was nothing to do but wait for recovery. During the incident review I heard them mention that a Redis hot key was involved. I happened to be responsible for my department's Redis cluster at my previous company, where I dealt with plenty of Redis hotspot issues. So let's talk it through: what is a Redis hotspot? Why does a hotspot drag down the performance of the whole cluster? And how do you avoid, troubleshoot, and resolve hotspot issues?
What is a Redis hotspot?
I have mentioned the word locality many times in past blog posts (see my earlier post on the principle of locality). A data hotspot is locality of data access made visible: one key in Redis is accessed far more frequently than all the others. There is a saying that captures the imbalance well: some die of drought while others die of flood.
Why does Redis suffer from hotspots? It comes down to how Redis works. Redis stores KV data, and in cluster mode it maps every key to one of 16384 hash slots based on the CRC16 value of the key, then distributes those slots across the machines in the cluster so that data is stored as evenly as possible. But even storage does not mean even access. Sometimes requests for a single key account for a large share of total traffic, concentrating load on one Redis instance and exhausting its capacity. At that point none of the other data on that instance can be accessed normally either, which means every service that depends on that data has a problem.
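The key-to-slot mapping can be sketched in a few lines of Python. The Redis Cluster spec uses the CRC16-CCITT (XMODEM) checksum modulo 16384; the sketch below ignores hash tags (the `{...}` syntax real clusters honor) for simplicity:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots (hash tags ignored)."""
    return crc16(key.encode()) % 16384

print(key_slot("XXX_KEY"))  # every client computes the same slot for the same key
```

Because every key deterministically lands in exactly one slot, and a slot lives on exactly one master, all traffic for a hot key necessarily converges on a single instance; no amount of cluster resizing changes that.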
To be clear, "problem" here does not mean Redis crashes outright. The core of Redis runs in a single thread, so it processes all requests serially; when too many arrive, they queue up. From the application's perspective, Redis calls become very slow. Since the application uses Redis as a cache and calls it synchronously, its own request handling slows down in turn, requests back up at the application layer, and eventually the whole service becomes unavailable.
A simple example: anyone who follows gossip on Weibo has seen this. When a big story breaks, users flood in to search for it and hammer the same post (the same data). That post is now hot data, and if it gets "hot" enough, Weibo goes down. Weibo has in fact crashed many times, not because its engineering is poor, but because hotspot problems are that nasty.
How does a Redis hotspot bring down other services?
A Redis hotspot does not just break a single service; it breaks every service that depends on the Redis cluster. In the figure above, Server1, Server2, and Server3 hammer XXX_KEY, making the RedisServer2 instance unavailable. Because Server4 depends on Key7, which also lives on RedisServer2, Server4 cannot serve its users either, even though RedisServer1, 3, 4, and 5 are all perfectly healthy.
Some will ask: if RedisServer2 is down, can't it just be removed and replaced with a new machine? In cluster mode Redis does automatically replace a failed instance. But here, even a fresh replacement would be flooded by the same burst of requests and overwhelmed immediately. Against hotspots, restarting or swapping instances is useless; the problem can only be solved on the request side.
This is a classic bucket model: how much water a bucket holds is determined by its shortest stave. When an application uses a Redis cluster, the cluster's performance ceiling is not simply the ceiling of a single instance multiplied by the number of instances; the moment any one instance has a problem, the upper layers perceive the whole cluster as having a problem.
How to avoid Redis hot spots?
As mentioned above, hotspots are fundamentally a locality problem, and locality is very hard to avoid; almost every distributed system is affected by it. There is no absolute way to prevent hotspots. The best you can do is analyze the access characteristics of your data in advance and take countermeasures; to put it bluntly, it comes down to experience. If you have other good ideas, feel free to discuss them in the comments.
How to troubleshoot hotspot issues?
Redis hotspot issues are actually easy to detect; monitoring the CPU usage and QPS of each Redis instance is enough. If some instances in the cluster show exceptionally high load and QPS while the others sit nearly idle, it is a hotspot, no question. The next step is to identify the specific hot key and trace where the traffic is coming from.
Finding the hot key itself is simple: sample some access logs and count them. The harder part is finding the source of the traffic. At my previous company, for example, one Redis cluster was shared by many teams, but Redis access was not part of our full-link tracing, so the most direct way to find the caller was to ask around in the group chat. It sounds primitive, but there was no better way.
How to solve hotspot issues?
The best fix is always prevention, but as I said, Redis hotspots are hard to avoid. Hotspots exist in every business; they just don't always end in disaster. And discovering a hotspot does not have to wait for an incident: regular inspection in daily operations lets you kill a budding hotspot before it grows.
As for how to resolve a hotspot once it is found, here are two solutions of mine for discussion:
Application layer cache
A common approach is a local cache (LocalCache) inside the application, which adds another caching layer in front of Redis. For truly hot data, the application will find the value in its local cache with high probability; only the small fraction of requests that arrive just as the LocalCache entry expires leak through to Redis. Requests for hot data are thus absorbed inside the application layer, greatly reducing the pressure on Redis.
With this approach only the data-reading side needs changes; the writing side is untouched. But the shortcomings are also obvious:
- Every client must implement it itself, increasing application-layer development and maintenance costs.
- Extra storage space is consumed on every client.
- It requires targeted, case-by-case development and is not suitable for large-scale rollout.
Add data copies
Since a hotspot is a single key being hammered, we can split the requests for that key. Say the hot key is XXX_KEY: when writing, we write the same value 10 times under different keys, such as XXX_KEY_01, XXX_KEY_02, and so on, differing only in the suffix. Readers then pick a suffix at random, dispersing the requests across the copies. If you want the load spread even thinner, store more copies.
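A sketch of the write/read pair, using a plain dict as a stand-in for the cluster (in a real cluster the different suffixes hash to different slots, and therefore usually to different instances):

```python
import random

N_COPIES = 10  # more copies = thinner spread; tune to the hotspot's QPS

def write_with_copies(store, key, value):
    """Writer duplicates the hot value under suffixed keys; different
    suffixes land in different hash slots, hence on different instances."""
    for i in range(1, N_COPIES + 1):
        store[f"{key}_{i:02d}"] = value

def read_any_copy(store, key):
    """Reader picks a random copy, spreading its load across instances."""
    return store[f"{key}_{random.randint(1, N_COPIES):02d}"]
```

Note the consistency cost this creates: the 10 writes are not atomic, so a reader can briefly observe a stale copy mid-update, which is exactly the drawback discussed below.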
The advantage of this solution is that the cost on the reading side is low (though not zero), while the demands on the writing side are much higher: it must write multiple copies and also deal with consistency across them. Since both ends need changes, it looks more troublesome; is it therefore worse than the first option? Not necessarily. Typically the readers are numerous and scattered, so changing them is expensive and frequent changes are unrealistic; it pays to push the work onto the more centralized end.
Both solutions trade storage for performance; the main difference is who does the work, the client in the first case and the server in the second, each with its pros and cons. Could we instead offer something that speaks pure Redis protocol to the outside world yet solves the hotspot problem for everyone? As Lu Xun... that is, David Wheeler once said, all problems in computer science can be solved by another level of indirection, and hotspots are no exception. We can insert a middle layer between the application and Redis: either a real service, such as a unified data access layer, or a purpose-built Redis client.
The middle layer can attach a local cache to specific keys so that hotspots never reach Redis. As for which keys to cache, the middle layer can analyze recent request patterns in real time and decide on its own; the simplest version is just an LRU or LFU cache.
The second solution, data copies, can also live in the middle layer: once a hotspot is detected, the middle layer actively replicates the hot data, intercepts and rewrites all requests for it, and spreads them out. Make the middle layer smart enough and the whole cycle, from hotspot discovery to resolution, can be automated with no human involvement at all.
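The "hotspot discovery" half can be as simple as a windowed frequency counter. This sketch (my own illustration, with made-up window and threshold defaults) flags any key whose share of recent traffic crosses a threshold, so the middle layer can start caching or copying it:

```python
from collections import Counter

class HotKeyDetector:
    """Middle-layer sketch: count accesses per window and flag keys whose
    share of traffic exceeds a threshold, so they can be cached or copied."""

    def __init__(self, window_size=1000, threshold=0.2):
        self._window = window_size      # accesses per counting window
        self._threshold = threshold     # traffic share that counts as "hot"
        self._counts = Counter()
        self._seen = 0

    def record(self, key):
        self._counts[key] += 1
        self._seen += 1
        if self._seen >= self._window:  # roll over to a fresh window
            self._counts.clear()
            self._seen = 0

    def hot_keys(self):
        if self._seen == 0:
            return []
        return [k for k, c in self._counts.items()
                if c / self._seen >= self._threshold]
```

A production version would use a sliding window or decayed counters rather than a hard reset, but the shape is the same: cheap bookkeeping on every request, and a list of hot keys the middle layer can act on automatically.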
That's it for today's article. If you found it useful, give it a like; if you liked it, follow along. And if you have opinions or experience with Redis hotspot problems, leave a comment and let's discuss.