Last week, a friend in our group asked about the relationship between Docker and iptables. Let's go through it in detail here.
Docker's powerful and flexible networking relies largely on iptables. In everyday use you may never notice iptables at all, because Docker configures the relevant rules for you automatically.
(MoeLove) ➜ ~ dockerd --help | grep iptables
      --iptables                                Enable addition of iptables rules (default true)
The Docker daemon has an --iptables flag that controls whether it adds iptables rules automatically. It defaults to true, which is why we usually don't pay much attention to what it does.
To avoid interference from the host environment, this article uses a Docker-in-Docker (dind) environment, which can be started as follows:
(MoeLove) ➜ ~ docker run --rm -d --privileged docker:dind
f323aef7b532ba6d575ca6f9444a08f1a55f2447afec2e853954694c034e6ae0
iptables basics
iptables is a tool for configuring the Linux kernel firewall. It can inspect, modify, forward, redirect, and drop IPv4 packets. It relies on the kernel's ip_tables facility, so it requires Linux kernel 2.4 or later.
For ease of management, iptables organizes rules into multiple tables by purpose. Each table contains a number of predefined chains; each chain holds rules that are traversed in order; and each rule defines a match (which packets it applies to) and a target (the action to take when it matches).
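To make the table, chain, and rule hierarchy concrete, here is a minimal Go model. It is a simplified sketch, not how the kernel or iptables is implemented: the Packet, Rule, Chain, and Table types are hypothetical, only terminating targets (ACCEPT and DROP) are modeled, and jumps between chains are left out. It shows the two properties described above: rules are checked in order, and the first match decides, with the chain's policy as the fallback.

```go
package main

import "fmt"

// Packet holds only the fields this toy model inspects.
type Packet struct {
	Proto   string
	DstPort int
}

// Rule pairs a match (which packets it applies to) with a target (what to do).
type Rule struct {
	Match  func(Packet) bool
	Target string // "ACCEPT" or "DROP" in this sketch
}

// Chain is an ordered list of rules plus the policy used when none match.
type Chain struct {
	Rules  []Rule
	Policy string
}

// Table groups chains by name, e.g. "INPUT", "FORWARD", "OUTPUT".
type Table map[string]*Chain

// Verdict walks the rules in order; the first matching rule decides.
// If no rule matches, the chain's policy applies.
func (c *Chain) Verdict(p Packet) string {
	for _, r := range c.Rules {
		if r.Match(p) {
			return r.Target
		}
	}
	return c.Policy
}

func main() {
	filter := Table{
		"INPUT": {
			Rules: []Rule{
				{Match: func(p Packet) bool { return p.Proto == "tcp" && p.DstPort == 22 }, Target: "ACCEPT"},
			},
			Policy: "DROP",
		},
	}
	fmt.Println(filter["INPUT"].Verdict(Packet{Proto: "tcp", DstPort: 22})) // ACCEPT
	fmt.Println(filter["INPUT"].Verdict(Packet{Proto: "tcp", DstPort: 80})) // DROP
}
```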
For users, what we usually interact with are chains and rules.
A classic diagram illustrates the main iptables workflow:
Image source: https://www.frozentux.net/ipt…
In the diagram, the lowercase words are tables and the uppercase words are chains. Every IP packet that comes in on any network interface passes through the diagram from top to bottom.
- Quoted from ArchWiki
That said, iptables itself is not the focus of this article, so I won't expand on it here. If you'd like a complete article on iptables, leave a comment and I can write one later.
Docker networking and iptables
Next, let's look at what actually changes when Docker's iptables support is turned on versus off.
Turn off Docker’s iptables support
As mentioned at the beginning, the Docker daemon's --iptables flag controls whether it uses iptables. The following command starts a Docker daemon with iptables support turned off:
(MoeLove) ➜ ~ docker run --rm -d --privileged docker:dind dockerd --iptables=false
7135a54c913af5e9ce69a45a0819475503ea9e3c5c673d62d9d38f0f0896179d
Enter this container and view all its iptables rules:
(MoeLove) ➜ ~ docker exec -it $(docker ps -ql) sh
/ # iptables-save
# Generated by iptables-save v1.8.8 on Mon Dec 12 01:46:38 2022
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2:80]
COMMIT
# Completed on Mon Dec 12 01:46:38 2022
As you can see, when the Docker daemon is started with --iptables=false, there are no rules at all by default, only the empty built-in chains of the filter table.
Enable Docker’s iptables support
Now start a Docker daemon with the command below. No --iptables flag is passed explicitly, since it defaults to true.
(MoeLove) ➜ ~ docker run --rm -d --privileged docker:dind
c464c5c08ecdf9129afbf217c6462236089fe0a1d11dfe7700c2985a04d8d216
View its iptables rules:
(MoeLove) ➜ ~ docker exec -it $(docker ps -ql) sh
/ # iptables-save
# Generated by iptables-save v1.8.8 on Mon Dec 12 14:48:16 2022
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [1:40]
:POSTROUTING ACCEPT [1:40]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.18.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Dec 12 14:48:16 2022
# Generated by iptables-save v1.8.8 on Mon Dec 12 14:48:16 2022
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2:80]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Dec 12 14:48:16 2022
Compared with the run where iptables support was turned off, there are now several additional chains:
- DOCKER
- DOCKER-ISOLATION-STAGE-1
- DOCKER-ISOLATION-STAGE-2
- DOCKER-USER
Several forwarding rules have also been added; each is explained in detail below.
DOCKER-USER chain
Of these new chains, let's start with DOCKER-USER, since it is the first one to take effect.
*filter
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
...
-A DOCKER-USER -j RETURN
These rules live in the filter table:
- The first rule, -A FORWARD -j DOCKER-USER, means that traffic entering the FORWARD chain jumps straight into the DOCKER-USER chain.
- The last rule, -A DOCKER-USER -j RETURN, means that once traffic has passed through DOCKER-USER without any other rule deciding its fate, it is RETURNed to the calling chain (FORWARD), where matching continues with the next rule.
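The jump and RETURN behavior described above can be modeled with a small recursive evaluator. This Go sketch is illustrative only: the Ruleset type and its helpers are hypothetical, the second FORWARD rule is a stand-in for Docker's real FORWARD rules, and the FORWARD policy is set to DROP purely to make the fall-through visible (the dind output above shows ACCEPT).

```go
package main

import "fmt"

// Packet carries only the fields the toy rules look at.
type Packet struct {
	InIface, OutIface string
}

// Rule applies Target when Match is true. Target is a verdict (ACCEPT/DROP),
// RETURN, or the name of another chain to jump to.
type Rule struct {
	Match  func(Packet) bool
	Target string
}

// Ruleset maps chain names to their ordered rules.
type Ruleset map[string][]Rule

func matchAll(Packet) bool { return true }

// eval walks a chain in order. It returns a verdict, or "" when the chain
// hits RETURN or runs off its end, so the caller resumes with its next rule.
func (rs Ruleset) eval(chain string, p Packet) string {
	for _, r := range rs[chain] {
		if !r.Match(p) {
			continue
		}
		switch r.Target {
		case "ACCEPT", "DROP":
			return r.Target
		case "RETURN":
			return ""
		default: // jump: evaluate the target chain, resume here if it returns
			if v := rs.eval(r.Target, p); v != "" {
				return v
			}
		}
	}
	return ""
}

// Verdict evaluates a built-in chain and falls back to its policy.
func (rs Ruleset) Verdict(chain, policy string, p Packet) string {
	if v := rs.eval(chain, p); v != "" {
		return v
	}
	return policy
}

func main() {
	rs := Ruleset{
		"FORWARD": {
			{matchAll, "DOCKER-USER"}, // -A FORWARD -j DOCKER-USER
			// Hypothetical stand-in for Docker's later FORWARD rules.
			{func(p Packet) bool { return p.OutIface == "docker0" }, "ACCEPT"},
		},
		"DOCKER-USER": {
			{matchAll, "RETURN"}, // -A DOCKER-USER -j RETURN
		},
	}
	// DOCKER-USER returns immediately, so FORWARD's next rule decides.
	fmt.Println(rs.Verdict("FORWARD", "DROP", Packet{InIface: "eth0", OutIface: "docker0"})) // ACCEPT
	fmt.Println(rs.Verdict("FORWARD", "DROP", Packet{InIface: "eth0", OutIface: "eth1"}))    // DROP
}
```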
DOCKER-USER is, in effect, a chain that Docker reserves for users to add their own rules.
By default, Docker's forwarding rules allow any client to reach the containers. If your Docker host is exposed to the public network, or you want to keep other clients on the LAN from reaching your containers, this is the chain to add rules to.
For example, to allow access only from 100.84.94.62 and drop traffic from all other clients:
iptables -I DOCKER-USER -i <net interface> ! -s 100.84.94.62 -j DROP
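One detail worth noting is why the example uses -I rather than -A. iptables -I inserts at the top of the chain, while -A appends after Docker's catch-all RETURN rule, where a DROP would never be reached. The Go sketch below models this. The Packet/Rule types and the firstTarget helper are hypothetical, "eth0" stands in for <net interface>, and 203.0.113.7 is an arbitrary documentation address.

```go
package main

import "fmt"

type Packet struct {
	InIface, Src string
}

type Rule struct {
	Match  func(Packet) bool
	Target string
}

// firstTarget returns the target of the first matching rule in a
// user-defined chain; falling off the end behaves like RETURN.
func firstTarget(chain []Rule, p Packet) string {
	for _, r := range chain {
		if r.Match(p) {
			return r.Target
		}
	}
	return "RETURN"
}

func main() {
	// -i eth0 ! -s 100.84.94.62 -j DROP   ("eth0" stands in for <net interface>)
	dropOthers := Rule{
		Match:  func(p Packet) bool { return p.InIface == "eth0" && p.Src != "100.84.94.62" },
		Target: "DROP",
	}
	// -A DOCKER-USER -j RETURN, as created by Docker.
	docker := []Rule{{Match: func(Packet) bool { return true }, Target: "RETURN"}}

	inserted := append([]Rule{dropOthers}, docker...)           // iptables -I: goes to the top
	appended := append(append([]Rule{}, docker...), dropOthers) // iptables -A: goes after RETURN

	stranger := Packet{InIface: "eth0", Src: "203.0.113.7"}
	allowed := Packet{InIface: "eth0", Src: "100.84.94.62"}
	fmt.Println(firstTarget(inserted, stranger)) // DROP
	fmt.Println(firstTarget(inserted, allowed))  // RETURN
	fmt.Println(firstTarget(appended, stranger)) // RETURN: the DROP rule is never reached
}
```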
In addition, Docker cleans up and rebuilds its iptables rules during operations such as restarts, but rules in the DOCKER-USER chain are preserved and not affected.
The implementation lives in docker/libnetwork. Here is the code related to the DOCKER-USER chain:
const userChain = "DOCKER-USER"

func arrangeUserFilterRule() {
	// Nothing to do without a controller or when iptables support is disabled.
	if ctrl == nil || !ctrl.iptablesEnabled() {
		return
	}
	iptable := iptables.GetIptable(iptables.IPv4)
	// Create the DOCKER-USER chain in the filter table.
	_, err := iptable.NewChain(userChain, iptables.Filter, false)
	if err != nil {
		logrus.Warnf("Failed to create %s chain: %v", userChain, err)
		return
	}
	// Make sure the chain contains the "-j RETURN" rule.
	if err = iptable.AddReturnRule(userChain); err != nil {
		logrus.Warnf("Failed to add the RETURN rule for %s: %v", userChain, err)
		return
	}
	// Make sure FORWARD jumps to DOCKER-USER.
	err = iptable.EnsureJumpRule("FORWARD", userChain)
	if err != nil {
		logrus.Warnf("Failed to ensure the jump rule for %s: %v", userChain, err)
	}
}
As you can see, the chain name is hard-coded. The function creates the chain, ensures it has a RETURN rule, and ensures that FORWARD jumps to it.
DOCKER-ISOLATION-STAGE-1/2 chains
The DOCKER-ISOLATION-STAGE-1 and DOCKER-ISOLATION-STAGE-2 chains have related functions, so we'll cover them together.
*filter
...
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
...
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
...
Together, these two chains isolate bridge networks from one another in two stages. A bridge network usually means a network that Docker creates on a bridge interface, such as the default docker0:
/ # ifconfig docker0
docker0   Link encap:Ethernet  HWaddr 02:42:11:31:97:0D
          inet addr:172.18.0.1  Bcast:172.18.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
Here is an example.
First, create a network named moelove, start a container on it, and check the container's IP:
➜ ~ docker network create moelove
0d3d76dcf81fcf4b9d76ab5a7dec22737b115dddd593c73b27d27f0114cec1e2
➜ ~ docker run --rm -it --network moelove alpine
/ # hostname -i
172.22.0.2
Next, ping that container's IP from two new containers: one on the default network and one on the moelove network.
➜ ~ docker run --rm -it alpine ping -c1 -w2 172.22.0.2
PING 172.22.0.2 (172.22.0.2): 56 data bytes

--- 172.22.0.2 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
➜ ~ docker run --rm -it --network moelove alpine ping -c1 -w2 172.22.0.2
PING 172.22.0.2 (172.22.0.2): 56 data bytes
64 bytes from 172.22.0.2: seq=0 ttl=64 time=0.092 ms

--- 172.22.0.2 ping statistics ---
1 packet transmitted, 1 packet received, 0% packet loss
round-trip min/avg/max = 0.092/0.092/0.092 ms
So containers on the same network can ping each other, but containers on different networks cannot.
DOCKER-ISOLATION-STAGE-1 first matches packets that enter from a bridge network's bridge interface and leave through a different interface. Matching packets jump to DOCKER-ISOLATION-STAGE-2; everything else returns to the calling chain.
DOCKER-ISOLATION-STAGE-2 then matches packets whose output interface is a bridge network's bridge. A match means the packet came from one bridge network and is headed to another, so it is DROPped. Anything that does not match returns to the calling chain.
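The two-stage logic can be traced with a short Go function that evaluates the same rule shapes shown above for a list of bridges. This is a sketch under assumptions: isolate is a hypothetical helper, and br-0d3d76dcf81f is an assumed bridge name for the moelove network (user-defined bridge networks get their own bridge interface rather than docker0). The first call corresponds to the failed ping above.

```go
package main

import "fmt"

// isolate models DOCKER-ISOLATION-STAGE-1/2 for a packet forwarded from
// inIface to outIface. bridges lists the bridge interfaces Docker manages.
// It returns "DROP", or "RETURN" meaning FORWARD continues with later rules.
func isolate(bridges []string, inIface, outIface string) string {
	for _, br := range bridges {
		// STAGE-1: -i <br> ! -o <br> -j DOCKER-ISOLATION-STAGE-2
		if inIface != br || outIface == br {
			continue
		}
		for _, other := range bridges {
			// STAGE-2: -o <other> -j DROP
			if outIface == other {
				return "DROP"
			}
		}
		// STAGE-2 fell through to RETURN; no other STAGE-1 rule can match,
		// because the packet entered through exactly one interface.
	}
	return "RETURN"
}

func main() {
	bridges := []string{"docker0", "br-0d3d76dcf81f"}
	fmt.Println(isolate(bridges, "docker0", "br-0d3d76dcf81f"))         // DROP: default network -> moelove
	fmt.Println(isolate(bridges, "br-0d3d76dcf81f", "br-0d3d76dcf81f")) // RETURN: same network
	fmt.Println(isolate(bridges, "docker0", "eth0"))                    // RETURN: leaving via a non-bridge interface
}
```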
At this point, you may ask: why isolate in two stages? Couldn't a single chain do it?
It could. That is exactly what Docker did in its early versions.
But back then, with more than 30 networks, Docker started very slowly. So we later made this optimization, reducing the complexity of this part from O(N^2) to O(2N), and Docker no longer starts slowly.
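To see where O(N^2) versus O(2N) comes from, the Go sketch below generates both rule layouts and counts them. It assumes, for illustration only, that the single-chain design needed one DROP rule for every ordered pair of distinct bridges; the DOCKER-ISOLATION rule text, the bridgeNames helper, and the br0, br1, ... names are hypothetical.

```go
package main

import "fmt"

// singleChain builds one DROP rule per ordered pair of distinct bridges:
// N*(N-1) rules in total.
func singleChain(bridges []string) []string {
	var rules []string
	for _, a := range bridges {
		for _, b := range bridges {
			if a != b {
				rules = append(rules, fmt.Sprintf("-A DOCKER-ISOLATION -i %s -o %s -j DROP", a, b))
			}
		}
	}
	return rules
}

// twoStage builds one STAGE-1 rule and one STAGE-2 rule per bridge:
// 2N rules in total (the two trailing RETURN rules are not counted).
func twoStage(bridges []string) []string {
	var rules []string
	for _, b := range bridges {
		rules = append(rules, fmt.Sprintf("-A DOCKER-ISOLATION-STAGE-1 -i %s ! -o %s -j DOCKER-ISOLATION-STAGE-2", b, b))
	}
	for _, b := range bridges {
		rules = append(rules, fmt.Sprintf("-A DOCKER-ISOLATION-STAGE-2 -o %s -j DROP", b))
	}
	return rules
}

// bridgeNames returns hypothetical interface names br0, br1, ...
func bridgeNames(n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("br%d", i)
	}
	return names
}

func main() {
	for _, n := range []int{2, 10, 30, 100} {
		b := bridgeNames(n)
		fmt.Printf("N=%d: single chain %d rules, two stages %d rules\n", n, len(singleChain(b)), len(twoStage(b)))
	}
}
```

With 30 networks, this yields 870 rules for the single-chain layout versus 60 for the two-stage layout.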
DOCKER chain
Finally, let's look at the DOCKER chain. It is the chain Docker uses most often and the one with the most rules, but it is easy to understand.
If you accidentally delete this chain's contents, containers may run into network problems; you can fix this by re-adding the rules manually or by restarting Docker.
Let's start a container with a port mapping and see what changes.
(MoeLove) ➜ ~ docker exec -it $(docker ps -ql) sh
/ # docker run -p 6379:6379 --rm -d redis:alpine
Unable to find image 'redis:alpine' locally
alpine: Pulling from library/redis
c158987b0551: Pull complete
1a990ecc86f0: Pull complete
f2520a938316: Pull complete
ae8c5b65b255: Pull complete
1f2628236ae0: Pull complete
329dd56817a5: Pull complete
Digest: sha256:518c024ec78b3074917bad2d40863e882e5297d65587e6d7c6e0b7281d9b8270
Status: Downloaded newer image for redis:alpine
6bf21bd3de78ce32617bf64a6a730c0fb50e304509a2ec3ef05ceae648334294
/ # docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS         PORTS                    NAMES
6bf21bd3de78   redis:alpine   "docker-entrypoint.s…"   9 seconds ago   Up 8 seconds   0.0.0.0:6379->6379/tcp   friendly_spence
Then run iptables-save again and compare the output with the previous run:
*filter
+-A DOCKER -d 172.18.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 6379 -j ACCEPT
*nat
+-A POSTROUTING -s 172.18.0.2/32 -d 172.18.0.2/32 -p tcp -m tcp --dport 6379 -j MASQUERADE
+-A DOCKER ! -i docker0 -p tcp -m tcp --dport 6379 -j DNAT --to-destination 172.18.0.2:6379
Docker has added rules to both the filter and nat tables. Here is what they mean:
The new rule in the filter table means: in the custom DOCKER chain, TCP traffic whose destination is 172.18.0.2 on port 6379, and which did not enter through docker0 but leaves through docker0, is accepted.
Put simply, it allows TCP traffic destined for 172.18.0.2:6379 to flow out through docker0 to the container.
The two new rules in the nat table mean the following:
- MASQUERADE (which you can think of simply as SNAT here) is applied to TCP traffic from 172.18.0.2 to 172.18.0.2 on port 6379, that is, the container reaching its own published port.
- In the custom DOCKER chain, TCP traffic to port 6379 that did not enter through docker0 gets a DNAT action that rewrites its destination to 172.18.0.2:6379. Put simply, this rule is what gives Docker containers port forwarding: traffic to the host's local port 6379 is redirected to 172.18.0.2:6379.
Of course, full access also depends on these rules working together with the other rules listed earlier.
Also note that Docker has several network drivers, and the rules differ somewhat in other network modes.
containerd and iptables
With dockershim completely removed from Kubernetes, many people have switched their container runtime to containerd, and some even want to replace every Docker environment with containerd.
But there are some caveats. For example, the port mapping (port publishing) from the example above is not available in containerd out of the box.
With containerd, you can start the same container using a command similar to the docker one above (the image must be pulled first, for example with ctr image pull docker.io/library/redis:alpine):
$ ctr run docker.io/library/redis:alpine redis-1
But ctr has no -p or -P flags: port publishing is a capability that Docker itself provides.
So what if you really need this feature?
One way is to manage the iptables rules yourself, but that is rather cumbersome.
Another way is to use nerdctl, a Docker CLI-compatible tool built specifically for containerd. It provides capabilities far richer than the default ctr tool.
For example:
$ nerdctl run -d --name redis-1 -p 6379:6379 redis:alpine
Look up its IP (192.168.40.9 here), then check the iptables rules:
$ iptables -t nat -L | grep '192.168.40.9'
CNI-66888846605aa0cf860a0834  all  --  192.168.40.9         anywhere
DNAT       tcp  --  anywhere             anywhere             tcp dpt:redis to:192.168.40.9:6379
Similar rules have been created, so the container can be reached normally.
Summary
This article examined the relationship between Docker and iptables, walking through the iptables rules Docker creates at startup and what they mean. It also used examples to explain how Docker port mapping actually works, and how to do port mapping with containerd using nerdctl.
Container networking covers a lot of ground, but the principles are the same, and Kubernetes has similar mechanisms.
That's all for this article.