Persistent POD memory leak in some K8S workloads

July 7, 2020, by yuer

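Our production K8S cluster runs a small number of PHP services whose POD memory climbs steadily until the POD is OOM-killed. This is presumably tied to some unusual code path, so it needs a proper analysis.

I chose to start from how POD memory is monitored and work out where the memory is actually going.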

Analysis

I have split the analysis into steps; this is also the order in which I actually worked through it.

step 1

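Containers rely on Cgroup to limit memory, and Docker also collects a container's memory usage through Cgroup, so Cgroup has to be understood first. Its core principle (for cgroup v1) is as follows:

cgroup first requires building trees (hierarchies, which are really just directories). The operating system can have multiple trees, and each tree can be associated with N subsystems (cpu, mem, io, ...), but each kind of subsystem can appear in only one tree across the whole system, never in several.

Put simply, if Cgroup had just three subsystems (cpu, mem and io), then across the whole system:

1) At most 3 Cgroup trees can be mounted, each managing exactly 1 subsystem.

2) At minimum, 1 Cgroup tree is mounted, managing all 3 subsystems.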
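
To see which layout a given host actually uses, one option is to group the controllers listed in /proc/cgroups by hierarchy ID. The following is a minimal Python sketch, assuming cgroup v1 and the usual four-column /proc/cgroups format; the SAMPLE text and the helper name `hierarchies` are illustrative and were not taken from the hosts in this post:

```python
from collections import defaultdict

# Hypothetical sample in the /proc/cgroups format (cgroup v1):
# subsystem name, hierarchy ID, number of cgroups, enabled flag.
SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpu 2 7 1
cpuacct 2 7 1
memory 5 7 1
blkio 3 7 1
"""

def hierarchies(proc_cgroups_text):
    """Map hierarchy ID -> sorted list of subsystems attached to that tree."""
    trees = defaultdict(list)
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        name, hier_id, _num_cgroups, _enabled = line.split()
        trees[int(hier_id)].append(name)
    return {hier_id: sorted(names) for hier_id, names in trees.items()}

print(hierarchies(SAMPLE))
```

Each subsystem shows up under exactly one hierarchy ID, while one ID may carry several subsystems (cpu and cpuacct in this sample).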

step 2

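In practice, the standard Cgroup approach is to give each subsystem its own tree (hierarchy) and then create child cgroups inside that tree to limit resources.

CentOS creates N such trees by default, each managing one subsystem (a few closely related ones, such as cpu,cpuacct and net_cls,net_prio, share a tree), and K8S uses Cgroup by creating subdirectories inside these trees.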

[root@10-42-53-112 ~]# ll /sys/fs/cgroup/
total 0
dr-xr-xr-x 7 root root  0 Jul  6 10:26 blkio
lrwxrwxrwx 1 root root 11 May 17 17:05 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 May 17 17:05 cpuacct -> cpu,cpuacct
dr-xr-xr-x 7 root root  0 Jul  6 10:26 cpu,cpuacct
dr-xr-xr-x 5 root root  0 Jul  6 10:26 cpuset
dr-xr-xr-x 7 root root  0 Jul  6 10:26 devices
dr-xr-xr-x 5 root root  0 Jul  6 10:26 freezer
dr-xr-xr-x 5 root root  0 Jul  6 10:26 hugetlb
dr-xr-xr-x 7 root root  0 Jul  6 10:26 memory
lrwxrwxrwx 1 root root 16 May 17 17:05 net_cls -> net_cls,net_prio
dr-xr-xr-x 5 root root  0 Jul  6 10:26 net_cls,net_prio
lrwxrwxrwx 1 root root 16 May 17 17:05 net_prio -> net_cls,net_prio
dr-xr-xr-x 5 root root  0 Jul  6 10:26 perf_event
dr-xr-xr-x 7 root root  0 Jul  6 10:26 pids
dr-xr-xr-x 2 root root  0 Jul  6 10:26 rdma
dr-xr-xr-x 7 root root  0 Jul  6 10:26 systemd

step 3

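Taking memory as an example: we know a POD can set a resource limit, but how does that actually work?

1) First, run docker ps to find the containers that belong to the target pod. There are at least 2: a pause container and an application container.

2) Take the application container's container id and run docker inspect; the labels include the pod's unique identifier (uid):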

"io.kubernetes.pod.uid": "931369e9-2a87-4090-a304-dd02122e7acc",

The container's own ID, which names its cgroup directory in the later steps, is:

7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad

The labels also give the ID of the pause container in the same POD:

 "io.kubernetes.sandbox.id": "dc9b09ac63191180ac5dca2836ebd15c82add818424ccf23417ebd16c0587a1d",
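
The label lookup above can also be scripted. Below is a minimal Python sketch that pulls the three identifiers out of `docker inspect` JSON output. The SAMPLE is a heavily trimmed, hand-written stand-in that reuses the IDs quoted in this post; real output has many more fields, and the helper name `pod_identity` is hypothetical:

```python
import json

# Hand-written, trimmed stand-in for `docker inspect <container>` output.
# Only the fields used below are kept; the IDs are the ones quoted in this post.
SAMPLE = """
[{"Id": "7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad",
  "Config": {"Labels": {
    "io.kubernetes.pod.uid": "931369e9-2a87-4090-a304-dd02122e7acc",
    "io.kubernetes.sandbox.id": "dc9b09ac63191180ac5dca2836ebd15c82add818424ccf23417ebd16c0587a1d"
  }}}]
"""

def pod_identity(inspect_json):
    """Return (container_id, pod_uid, sandbox_id) from `docker inspect` JSON."""
    container = json.loads(inspect_json)[0]  # docker inspect prints a JSON array
    labels = container["Config"]["Labels"]
    return (
        container["Id"],
        labels["io.kubernetes.pod.uid"],
        labels["io.kubernetes.sandbox.id"],
    )

container_id, pod_uid, sandbox_id = pod_identity(SAMPLE)
print(pod_uid, sandbox_id[:12])
```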

3) K8S creates a child cgroup named kubepods. Again taking memory as the example:

[root@10-42-53-112 ~]# ll /sys/fs/cgroup/memory/kubepods/
total 0
drwxr-xr-x 4 root root 0 Jul  6 10:26 besteffort
drwxr-xr-x 3 root root 0 Jul  6 10:26 burstable
-rw-r--r-- 1 root root 0 Jul  6 10:26 cgroup.clone_children
--w--w--w- 1 root root 0 Jul  6 10:26 cgroup.event_control
-rw-r--r-- 1 root root 0 Jul  6 10:26 cgroup.procs
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.failcnt
--w------- 1 root root 0 Jul  6 10:26 memory.force_empty
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.failcnt
-rw-r--r-- 1 root root 0 May 17 17:05 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.tcp.failcnt
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.kmem.usage_in_bytes
-rw-r--r-- 1 root root 0 May 17 17:05 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.memsw.failcnt
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.memsw.usage_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.numa_stat
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.oom_control
---------- 1 root root 0 Jul  6 10:26 memory.pressure_level
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.soft_limit_in_bytes
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.stat
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.swappiness
-r--r--r-- 1 root root 0 Jul  6 10:26 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jul  6 10:26 memory.use_hierarchy
-rw-r--r-- 1 root root 0 Jul  6 10:26 notify_on_release
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod07ddb571-fbf5-496a-a391-938d1a5bdfef
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod0d9d11d6-ce6f-41b5-9d89-31803fe050c6
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod56c2f6e1-24fd-43f2-91f3-928f5f221f57
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod5f3ea4c8-e39b-41e0-a729-20a5e98d6f7a
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod79b470b0-2fa9-403e-b4b7-6e4878a5ac49
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod8bad5a6c-3523-47c5-81ff-f641030d85b0
drwxr-xr-x 4 root root 0 Jul  6 10:26 pod931369e9-2a87-4090-a304-dd02122e7acc
-rw-r--r-- 1 root root 0 Jul  6 10:26 tasks

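K8S resource limits are enforced at the POD level, so under this cgroup K8S also creates a child memory cgroup per POD to apply the POD-specific limits.

Before going deeper into the POD-level cgroup, let's look at the memory limit at the kubepods level: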

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/memory.limit_in_bytes  
32457519104

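The total memory limit for all PODs is 30.23G (32457519104 bytes). The host has 32G of memory; the remaining 1G or so stays outside this cgroup because of the memory reservation configured in kubelet.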
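
The 30.23G figure is simply the byte count divided by 2^30. A small Python sketch of that arithmetic follows; the `reserved` line assumes the host has exactly 32 GiB, which is a simplification, since the post only says "32G":

```python
GIB = 2 ** 30  # bytes per GiB

def to_gib(num_bytes):
    """Convert a byte count (e.g. from memory.limit_in_bytes) to GiB."""
    return num_bytes / GIB

kubepods_limit = 32457519104  # value of kubepods/memory.limit_in_bytes above
print(f"{to_gib(kubepods_limit):.2f} GiB")  # prints 30.23 GiB

# Assumption: exactly 32 GiB of host memory; the real total is likely a bit lower.
reserved = 32 * GIB - kubepods_limit
print(f"{to_gib(reserved):.2f} GiB outside the kubepods cgroup")
```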

step 4

With the POD uid found above, we can locate the POD-level cgroup:

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/memory.limit_in_bytes 
2147483648

The whole POD is limited to 2G, which matches the Deployment YAML.
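
Locating the POD-level directory can be scripted as well. The sketch below assumes the cgroupfs layout seen in this post, where the pod directory sits directly under kubepods, and additionally checks the burstable and besteffort subdirectories shown in the earlier listing; with a different cgroup driver the paths will differ. The function names are hypothetical:

```python
from pathlib import Path

def find_pod_cgroup(pod_uid, root="/sys/fs/cgroup/memory/kubepods"):
    """Return the pod-level memory cgroup directory for pod_uid, or None.

    Assumes directories are named pod<uid> and live either directly under
    the kubepods root or under its burstable/ or besteffort/ subdirectory.
    """
    root = Path(root)
    for parent in (root, root / "burstable", root / "besteffort"):
        candidate = parent / f"pod{pod_uid}"
        if candidate.is_dir():
            return candidate
    return None

def read_limit(cgroup_dir):
    """Read memory.limit_in_bytes from a cgroup directory as an int."""
    return int((Path(cgroup_dir) / "memory.limit_in_bytes").read_text())
```

On the host from this post, `read_limit(find_pod_cgroup("931369e9-2a87-4090-a304-dd02122e7acc"))` would be expected to return the 2147483648 shown above.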

step 5

One level below the POD is the container cgroup. What is the memory limit there?

cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad/memory.limit_in_bytes 
2147483648

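It appears to inherit the POD-level limit. The POD only has that much memory anyway, so a single container inside it can use at most that amount.

Why have a container-level cgroup at all? At the very least, it lets us inspect memory usage per container: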

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad/memory.usage_in_bytes 
1949036544

The application container is using about 1.8G, close to filling the POD's memory limit. (docker stats also shows a container's memory usage.)
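
To put "close to filling the limit" into numbers, here is a short Python sketch using the two values read above; the helper name `usage_ratio` is illustrative:

```python
def usage_ratio(usage_bytes, limit_bytes):
    """Fraction of the cgroup memory limit that is currently charged."""
    return usage_bytes / limit_bytes

usage = 1949036544  # memory.usage_in_bytes of the application container
limit = 2147483648  # memory.limit_in_bytes (2 GiB)
print(f"{usage / 2**30:.2f} GiB used, {usage_ratio(usage, limit):.1%} of the limit")
```

The container is at roughly nine tenths of its limit, which is why the POD keeps ending up OOM-killed as usage continues to grow.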

Using the sandbox container ID found earlier (which is the pause container), check its memory usage:

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/dc9b09ac63191180ac5dca2836ebd15c82add818424ccf23417ebd16c0587a1d/memory.usage_in_bytes 
1089536

It uses only about 1M, so the pause container's memory usage can be ignored.
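
Rather than reading each container's file by hand, the per-container breakdown of a POD can be collected in one pass. This is a minimal sketch that assumes every child directory of the pod cgroup is a container cgroup exposing memory.usage_in_bytes; the function name is hypothetical:

```python
from pathlib import Path

def container_usages(pod_cgroup_dir):
    """Map container ID -> memory.usage_in_bytes for each child cgroup of a pod."""
    usages = {}
    for child in Path(pod_cgroup_dir).iterdir():
        usage_file = child / "memory.usage_in_bytes"
        # Only subdirectories are container cgroups; plain files are the
        # pod-level control files and are skipped.
        if child.is_dir() and usage_file.is_file():
            usages[child.name] = int(usage_file.read_text())
    return usages
```

For the POD in this post the result would have two entries: the application container at 1949036544 bytes and the pause container at 1089536 bytes.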

step 6

But is the application container really using 1.8G? Let's look at its detailed memory statistics:

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad/memory.stat 
cache 60653568
rss 325496832
rss_huge 0
shmem 26628096
mapped_file 27844608
dirty 540672
writeback 946176
swap 0
pgpgin 3729620103
pgpgout 3729526233
pgfault 5994964305
pgmajfault 0
inactive_anon 27070464
active_anon 324997120
inactive_file 24436736
active_file 8134656
unevictable 0
hierarchical_memory_limit 2147483648
hierarchical_memsw_limit 2147483648
total_cache 60653568
total_rss 325496832
total_rss_huge 0
total_shmem 26628096
total_mapped_file 27844608
total_dirty 540672
total_writeback 946176
total_swap 0
total_pgpgin 3729620103
total_pgpgout 3729526233
total_pgfault 5994964305
total_pgmajfault 0
total_inactive_anon 27070464
total_active_anon 324997120
total_inactive_file 24436736
total_active_file 8134656
total_unevictable 0

total_rss and total_cache add up to only a little over 300MB (about 368 MiB). Where did the rest of the memory go?
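
The sum can be reproduced by parsing memory.stat, which is one "key value" pair per line. A minimal Python sketch follows; SAMPLE contains only the two counters discussed, copied from the output above, and the parser name is illustrative:

```python
def parse_memory_stat(text):
    """Parse memory.stat content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats

# The two counters discussed above, copied from the memory.stat output.
SAMPLE = """\
total_cache 60653568
total_rss 325496832
"""

stats = parse_memory_stat(SAMPLE)
user_bytes = stats["total_rss"] + stats["total_cache"]
print(user_bytes, f"{user_bytes / 2**20:.0f} MiB")
```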

step 7

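After some research: besides rss and cache, cgroup's memory.usage_in_bytes also counts kmem, i.e. memory used by the kernel on behalf of the cgroup. Let's check the actual kmem usage: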

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad/memory.kmem.usage_in_bytes 
1564602368

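Sure enough, it is about 1.5G, and together with rss that comes to roughly 1.8G. Why is most of this application container's memory being used by the kernel, and for what?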
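
As a sanity check on this accounting, the pieces read so far can be added up and compared with memory.usage_in_bytes. A small Python sketch using the values quoted in this post; the helper name `unexplained` is illustrative:

```python
usage = 1949036544  # memory.usage_in_bytes
kmem = 1564602368   # memory.kmem.usage_in_bytes
rss = 325496832     # total_rss from memory.stat
cache = 60653568    # total_cache from memory.stat

def unexplained(usage_bytes, *parts):
    """Bytes of usage not covered by the parts (negative if the parts exceed usage)."""
    return usage_bytes - sum(parts)

gap = unexplained(usage, kmem, rss, cache)
print(gap, f"{abs(gap) / 2**20:.1f} MiB")
```

The three parts land within about 2 MiB of the reported usage. The files were read at different moments, so a small mismatch is expected.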

step 8

kmem shows up as kernel slab allocations, and the application container's slabinfo can be read directly:

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad/memory.kmem.slabinfo

For a container with high memory usage, the slabinfo looks like this:

[root@10-42-53-112 ~]# cat /sys/fs/cgroup/memory/kubepods/pod931369e9-2a87-4090-a304-dd02122e7acc/7e75c3921b2157ccecc5cff5055940c782f02cb8227ae080874220bb06124dad/memory.kmem.slabinfo 
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
tw_sock_TCP          544    544    240   34    2 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-8192          64     64   8192    4    8 : tunables    0    0    0 : slabdata     16     16      0
hugetlbfs_inode_cache      0      0    624   52    8 : tunables    0    0    0 : slabdata      0      0      0
UDPv6                 25     25   1280   25    8 : tunables    0    0    0 : slabdata      1      1      0
TCPv6                 28     28   2304   14    8 : tunables    0    0    0 : slabdata      2      2      0
TCP                  240    240   2176   15    8 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-16          2560   2560     16  256    1 : tunables    0    0    0 : slabdata     10     10      0
radix_tree_node     5208   5208    584   56    8 : tunables    0    0    0 : slabdata     93     93      0
kmalloc-96           672    672     96   42    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-2048         256    256   2048   16    8 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-1024         512    512   1024   32    8 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-192          672    672    192   42    2 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-8           8192   8192      8  512    1 : tunables    0    0    0 : slabdata     16     16      0
xfs_inode           3618   4114    960   34    8 : tunables    0    0    0 : slabdata    121    121      0
ovl_inode           3300   3504    680   48    8 : tunables    0    0    0 : slabdata     73     73      0
kmalloc-32          2048   2048     32  128    1 : tunables    0    0    0 : slabdata     16     16      0
eventpoll_pwq        896    896     72   56    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-64          1024   1024     64   64    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-4096         168    168   4096    8    8 : tunables    0    0    0 : slabdata     21     21      0
pde_opener          1632   1632     40  102    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-512          576    576    512   32    4 : tunables    0    0    0 : slabdata     18     18      0
skbuff_head_cache    640    640    256   32    2 : tunables    0    0    0 : slabdata     20     20      0
uts_namespace          0      0    440   37    4 : tunables    0    0    0 : slabdata      0      0      0
inode_cache          864    864    600   54    8 : tunables    0    0    0 : slabdata     16     16      0
pid                  608    608    128   32    1 : tunables    0    0    0 : slabdata     19     19      0
signal_cache         510    510   1088   30    8 : tunables    0    0    0 : slabdata     17     17      0
sighand_cache        255    255   2112   15    8 : tunables    0    0    0 : slabdata     17     17      0
files_cache          736    736    704   46    8 : tunables    0    0    0 : slabdata     16     16      0
task_struct          173    200   7808    4    8 : tunables    0    0    0 : slabdata     50     50      0
UNIX                 576    576   1024   32    8 : tunables    0    0    0 : slabdata     18     18      0
sock_inode_cache     736    736    704   46    8 : tunables    0    0    0 : slabdata     16     16      0
mm_struct            512    512   1024   32    8 : tunables    0    0    0 : slabdata     16     16      0
cred_jar            2394   2394    192   42    2 : tunables    0    0    0 : slabdata     57     57      0
shmem_inode_cache    414    414    704   46    8 : tunables    0    0    0 : slabdata      9      9      0
proc_inode_cache    4419   4512    672   48    8 : tunables    0    0    0 : slabdata     94     94      0
dentry            7900536 7900536    192   42    2 : tunables    0    0    0 : slabdata 188108 188108      0
filp                5344   5344    256   32    2 : tunables    0    0    0 : slabdata    167    167      0
anon_vma            7200   7452     88   46    1 : tunables    0    0    0 : slabdata    162    162      0
anon_vma_chain     17088  17088     64   64    1 : tunables    0    0    0 : slabdata    267    267      0
vm_area_struct     11486  11560    200   40    2 : tunables    0    0    0 : slabdata    289    289      0

For a container with low memory usage, the slabinfo looks like this:

[root@10-10-67-233 ~]# cat /sys/fs/cgroup/memory/kubepods/pod180b0a55-7c9a-45d3-a13b-01b654dce11a/c4d6ac72bfb3c98cb901a9a4c0d6a39408b16f3b68c0a142dbc507f07e1366ec/memory.kmem.slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
tw_sock_TCP          238    238    240   34    2 : tunables    0    0    0 : slabdata      7      7      0
kmalloc-8192          64     64   8192    4    8 : tunables    0    0    0 : slabdata     16     16      0
hugetlbfs_inode_cache      0      0    624   52    8 : tunables    0    0    0 : slabdata      0      0      0
UDPv6                  0      0   1280   25    8 : tunables    0    0    0 : slabdata      0      0      0
TCPv6                  0      0   2304   14    8 : tunables    0    0    0 : slabdata      0      0      0
TCP                  240    240   2176   15    8 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-16          3840   3840     16  256    1 : tunables    0    0    0 : slabdata     15     15      0
kmalloc-96           672    672     96   42    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-2048         256    256   2048   16    8 : tunables    0    0    0 : slabdata     16     16      0
radix_tree_node     1400   1400    584   56    8 : tunables    0    0    0 : slabdata     25     25      0
kmalloc-1024         512    512   1024   32    8 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-192          672    672    192   42    2 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-8           8192   8192      8  512    1 : tunables    0    0    0 : slabdata     16     16      0
xfs_inode            578    578    960   34    8 : tunables    0    0    0 : slabdata     17     17      0
ovl_inode           1056   1056    680   48    8 : tunables    0    0    0 : slabdata     22     22      0
kmalloc-32          2048   2048     32  128    1 : tunables    0    0    0 : slabdata     16     16      0
eventpoll_pwq        896    896     72   56    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-64          1024   1024     64   64    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-4096         136    136   4096    8    8 : tunables    0    0    0 : slabdata     17     17      0
pde_opener          1632   1632     40  102    1 : tunables    0    0    0 : slabdata     16     16      0
kmalloc-512          512    512    512   32    4 : tunables    0    0    0 : slabdata     16     16      0
skbuff_head_cache    512    512    256   32    2 : tunables    0    0    0 : slabdata     16     16      0
uts_namespace          0      0    440   37    4 : tunables    0    0    0 : slabdata      0      0      0
inode_cache          864    864    600   54    8 : tunables    0    0    0 : slabdata     16     16      0
pid                  544    544    128   32    1 : tunables    0    0    0 : slabdata     17     17      0
signal_cache         480    480   1088   30    8 : tunables    0    0    0 : slabdata     16     16      0
sighand_cache        255    255   2112   15    8 : tunables    0    0    0 : slabdata     17     17      0
files_cache          736    736    704   46    8 : tunables    0    0    0 : slabdata     16     16      0
task_struct          208    212   7808    4    8 : tunables    0    0    0 : slabdata     53     53      0
UNIX                 512    512   1024   32    8 : tunables    0    0    0 : slabdata     16     16      0
sock_inode_cache     736    736    704   46    8 : tunables    0    0    0 : slabdata     16     16      0
mm_struct            512    512   1024   32    8 : tunables    0    0    0 : slabdata     16     16      0
cred_jar            2016   2016    192   42    2 : tunables    0    0    0 : slabdata     48     48      0
shmem_inode_cache    368    368    704   46    8 : tunables    0    0    0 : slabdata      8      8      0
proc_inode_cache    2409   2592    672   48    8 : tunables    0    0    0 : slabdata     54     54      0
dentry            1642116 1642116    192   42    2 : tunables    0    0    0 : slabdata  39098  39098      0
filp                4928   4928    256   32    2 : tunables    0    0    0 : slabdata    154    154      0
anon_vma           11914  11914     88   46    1 : tunables    0    0    0 : slabdata    259    259      0
anon_vma_chain     10816  10816     64   64    1 : tunables    0    0    0 : slabdata    169    169      0
vm_area_struct      9974  10200    200   40    2 : tunables    0    0    0 : slabdata    255    255      0

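The biggest difference is in the memory used by dentry. 7900536*192 comes to roughly 1.4G, which indeed matches the memory share. So what is it used for? Roughly speaking, it is a cache of directory entries (file path lookups); see https://zhuanlan.zhihu.com/p/43133085 for details.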
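
The same num_objs * objsize estimate can be applied to every row of a slabinfo dump to rank the caches. A minimal Python sketch follows; SAMPLE holds three rows copied from the high-memory container's output with the header line shortened, and the function name is illustrative. It is only an estimate, since it ignores per-slab overhead:

```python
def slab_bytes(slabinfo_text):
    """Map cache name -> num_objs * objsize for each row of a slabinfo dump."""
    sizes = {}
    for line in slabinfo_text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column header
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        sizes[name] = num_objs * objsize
    return sizes

# Three rows copied from the high-memory container's slabinfo (header shortened).
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> ...
xfs_inode           3618   4114    960   34    8 : tunables    0    0    0 : slabdata    121    121      0
dentry            7900536 7900536    192   42    2 : tunables    0    0    0 : slabdata 188108 188108      0
filp                5344   5344    256   32    2 : tunables    0    0    0 : slabdata    167    167      0
"""

sizes = slab_bytes(SAMPLE)
top = max(sizes, key=sizes.get)
print(top, sizes[top], f"{sizes[top] / 2**30:.2f} GiB")
```

Applied to the low-memory container's dentry row (1642116 objects of 192 bytes), the same formula gives about 0.3 GiB, which accounts for most of the gap between the two containers.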

step 9

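The container above uses 7.9 million dentry objects, taking 1.4G of memory; running slabtop on the host shows 30 million dentry objects allocated machine-wide, taking about 6G.

Only a few of our applications show this memory leak, so we suspected some unusual code behaviour. I tried running strace against php-fpm to see whether a large number of file operations was driving up the dentry count: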

[root@zfilter-api-smzdm-com-5657c49d7d-cgzzm ~]# strace -p 48 2>&1 |grep open
open("/tmp/phpn1quPl", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phphHcgMJ", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpp6Oq1f", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpPRfPXO", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpzms77o", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpDps111", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpHmbTnH", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpl7hv2u", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpRYE3jq", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpfGUqxn", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phpVodQrn", O_RDWR|O_CREAT|O_EXCL, 0600) = 5

It really is creating temporary files non-stop.
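
To quantify this from a saved strace log rather than by eye, the exclusive-create calls under /tmp can be counted. A minimal Python sketch: the first two SAMPLE lines are copied from the output above, the third is a made-up non-matching line for contrast, and the regex and function name are illustrative:

```python
import re

# Matches strace lines for open() calls that exclusively create a file under /tmp.
TMP_CREATE = re.compile(r'^open\("(/tmp/[^"]+)", [^)]*O_CREAT\|O_EXCL')

def created_tmp_files(strace_text):
    """Return the /tmp paths created via open(..., O_CREAT|O_EXCL, ...)."""
    paths = []
    for line in strace_text.splitlines():
        match = TMP_CREATE.match(line.strip())
        if match:
            paths.append(match.group(1))
    return paths

SAMPLE = """\
open("/tmp/phpn1quPl", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/tmp/phphHcgMJ", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
open("/etc/hosts", O_RDONLY) = 6
"""

paths = created_tmp_files(SAMPLE)
print(len(paths), "creates,", len(set(paths)), "distinct names")
```

In the captured output every call uses a fresh random name, so the number of distinct names equals the number of creates.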

Next, save a full strace log to find the HTTP request that creates the /tmp files:

strace -p 48 -s 2048 > ./debug.log 2>&1

From debug.log it is clear that the endpoint creating the temporary files is /comment/bgm_bulk_index, with a POST body of 102633 bytes and content type application/x-www-form-urlencoded:

read(4, "\0178SCRIPT_FILENAME/data/webroot/phpsrc/zfilter-api-smzdm-com/api/index.php\f\0QUERY_STRING\16\4REQUEST_METHODPOST\f!CONTENT_TYPEapplication/x-www-form-urlencoded\16\6CONTENT_LENGTH102633\v\nSCRIPT_NAME/index.php\v\27REQUEST_URI/comment/bgm_bulk_index\f\nDOCUMENT_URI/index.php\r.DOCUMENT_ROOT/data/webroot/phpsrc/zfilter-api-smzdm-com/api\17\10SERVER_PROTOCOLHTTP/1.1\21\7GATEWAY_INTERFACECGI/1.1\17\tSERVER_SOFTWAREnginx/1.7\v\fREMOTE_ADDR10.42.53.112\v\5REMOTE_PORT49028\v\fSERVER_ADDR10.42.130.74\v\3SERVER_PORT809\v\25SERVER_NAMEzfilter-api.smzdm.com\17\3REDIRECT_STATUS200\t\31HTTP_HOSTzfilter-api.smzdm.com:809\17\22HTTP_USER_AGENTSMZDM PHP CURL 1.0\v\3HTTP_ACCEPT*/*\27VHTTP__CATCALLFROMMETHOD/data/webroot/phpsrc/phpjob-comments-job/job/index.php daemon refresh_comment_es 0 8 0\25\36HTTP__CATCALLERDOMAINphpjob.phpjob-comments-job.job\25SHTTP__CATCALLERMETHODhttp%3a%2f%2fzfilter%2dapi%2esmzdm%2ecom%3a%38%30%39%2fcomment%2fbgm%5fbulk%5findex\27-HTTP__CATCHILDMESSAGEIDzfilter-api.smzdm.com-0a2abcbf-442781-5682431\0305HTTP__CATPARENTMESSAGEIDphpjob.phpjob-comments-job.job-0a2abcbf-442781-275909\0265HTTP__CATROOTMESSAGEIDphpjob.phpjob-comments-job.job-0a2abcbf-442781-275909\23\6HTTP_CONTENT_LENGTH102633\21!HTTP_CONTENT_TYPEapplication/x-www-form-urlencoded\v\fHTTP_EXPECT100-continue\0\0", 1224) = 1224
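
The buffer above is hard to read because FastCGI packs each parameter as length-prefixed name-value pairs: `\017` is the name length 15 of SCRIPT_FILENAME, and `8` (byte 0x38) is its value length 56. Below is a minimal Python decoder sketch, assuming the buffer is the body of a single PARAMS record and that a zero-length name marks trailing padding. SAMPLE reproduces a few of the pairs with strace's octal escapes rewritten as hex; the function names are illustrative:

```python
def _read_length(data, pos):
    """Read one FastCGI length: 1 byte if < 128, else 4 bytes with the top bit set."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    length = int.from_bytes(data[pos:pos + 4], "big") & 0x7FFFFFFF
    return length, pos + 4

def decode_fcgi_params(data):
    """Decode FastCGI name-value pairs into a dict of str -> str."""
    params = {}
    pos = 0
    while pos < len(data):
        name_len, pos = _read_length(data, pos)
        if name_len == 0:  # assumption: a zero-length name is trailing padding
            break
        value_len, pos = _read_length(data, pos)
        name = data[pos:pos + name_len]
        pos += name_len
        value = data[pos:pos + value_len]
        pos += value_len
        params[name.decode("latin-1")] = value.decode("latin-1")
    return params

# A few pairs from the read() buffer above, followed by its trailing "\0\0".
SAMPLE = (
    b"\x0f\x38SCRIPT_FILENAME/data/webroot/phpsrc/zfilter-api-smzdm-com/api/index.php"
    b"\x0c\x00QUERY_STRING"
    b"\x0e\x04REQUEST_METHODPOST"
    b"\x0c\x21CONTENT_TYPEapplication/x-www-form-urlencoded"
    b"\x0e\x06CONTENT_LENGTH102633"
    b"\x0b\x17REQUEST_URI/comment/bgm_bulk_index"
    b"\x00\x00"
)

params = decode_fcgi_params(SAMPLE)
print(params["REQUEST_METHOD"], params["REQUEST_URI"], params["CONTENT_LENGTH"])
```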

It first reads 16384 bytes from the socket:

read(4, "doc_arr=%5B%7B%22comment_id%22%3A18677409%2C%22channel_id%22%3A3%2C%22article_id%22%3A2375099%2C%22receive_user_id%22%3A0%2C%22at_user_ids%22%3A%221604147%22%2C%22content1%22%3A%22%40%5Cu81ea%5Cu7531%5Cu843d%5Cu4f53+%5Cu80fd%5Cu52a0%5Cu4f60%5Cu597d%5Cu53cb%5Cu5417%5Cuff1f++%22%2C%22root_id%22%3A0%2C%22parent_id%22%3A0%2C%22parent_ids%22%3A%22%22%2C%22content2%22%3Anull%2C%22content3%22%3Anull%2C%22content4%22%3Anull%2C%22content5%22%3Anull%2C%22ip%22%3A%22180.156.213.241%22%2C%22remote_ip%22%3A%22%22%2C%22user_agent%22%3A%22%5Cu4ec0%5Cu4e48%5Cu503c%5Cu5f97%5Cu4e70HD+2.2.4+rv%3A3+%28iPad%3B+iPhone+OS+8.1.1%3B+zh_CN%29%22%2C%22comment_from%22%3A%22%22%2C%22reply_from%22%3A0%2C%22creation_date%22%3A%222014-12-09+20%3A45%3A27%22%2C%22card_num%22%3A0%2C%22up_num%22%3A0%2C%22down_num%22%3A0%2C%22sort_v1%22%3A0%2C%22sort_v2%22%3A0%2C%22sort_v3%22%3A0%2C%22sort_v4%22%3A0%2C%22sort_v5%22%3A0%2C%22children_ids_1%22%3A%22%22%2C%22children_ids_2%22%3A%22%22%2C%22children_ids_3%22%3A%22%22%2C%22children_ids_4%22%3A%22%22%2C%22children_ids_5%22%3A%22%22%2C%22status%22%3A1%2C%22is_locked%22%3A0%2C%22comment_card_list%22%3A%5B%5D%2C%22report%22%3A%7B%22report_latest_time%22%3A%220000-00-00+00%3A00%3A00%22%2C%22report_count%22%3A0%2C%22report_count2%22%3A0%7D%2C%22have_read%22%3A1%2C%22origin_status%22%3A0%2C%22report_log_list%22%3A%5B%5D%2C%22user_info%22%3A%7B%22user_id%22%3A2772343%2C%22nickname%22%3A%22mini%5Cu6768%5Cu4e3d%22%7D%2C%22admin_log_list%22%3A%5B%7B%22id%22%3A7061455%2C%22editor_id%22%3A187%2C%22editor_name%22%3A%22wangtao%22%2C%22ctype%22%3A3%2C%22description%22%3A%22%22%2C%22op_module%22%3A%22%22%2C%22creation_date%22%3A%222015-03-10+11%3A18%3A26%22%7D%5D%2C%22admin_first_check_time%22%3A%222015-03-10+11%3A18%3A26%22%2C%22ploy_risk_type%22%3A%22EMPTY%22%2C%22ploy_description%22%3A%22%22%2C%22risk_type_tencent%22%3A%22EMPTY%22%2C%22tencent_moderation_description%22%3A%22%22%7D%2C%7B%22comment_id%22%3A18677417%2C%22channel_id%22%3A3%2C%22article_id%22%3A23757
75%2C%22receive_user_id%22%3A0%2C%22at_user_ids%22%3A%22%"..., 16384) = 16384

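Only then does it create a temporary file and start writing the data into it (note that the first write begins at doc_arr= again, so the bytes already read go into the file as well):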

getcwd("/data/webroot/phpsrc/zfilter-api-smzdm-com/api", 4096) = 47
open("/tmp/phpxdEznL", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
write(5, "doc_arr=%5B%7B%22comment_id%22%3A18677409%2C%22channel_id%22%3A3%2C%22article_id%22%3A2375099%2C%22receive_user_id%22%3A0%2C%22at_user_ids%22%3A%221604147%22%2C%22content1%22%3A%22%40%5Cu81ea%5Cu7531%5Cu843d%5Cu4f53+%5Cu80fd%5Cu52a0%5Cu4f60%5Cu597d%5Cu53cb%5Cu5417%5Cuff1f++%22%2C%22root_id%22%3A0%2C%22parent_id%22%3A0%2C%22parent_ids%22%3A%22%22%2C%22content2%22%3Anull%2C%22content3%22%3Anull%2C%22content4%22%3Anull%2C%22content5%22%3Anull%2C%22ip%22%3A%22180.156.213.241%22%2C%22remote_ip%22%3A%22%22%2C%22user_agent%22%3A%22%5Cu4ec0%5Cu4e48%5Cu503c%5Cu5f97%5Cu4e70HD+2.2.4+rv%3A3+%28iPad%3B+iPhone+OS+8.1.1%3B+zh_CN%29%22%2C%22comment_from%22%3A%22%22%2C%22reply_from%22%3A0%2C%22creation_date%22%3A%222014-12-09+20%3A45%3A27%22%2C%22card_num%22%3A0%2C%22up_num%22%3A0%2C%22down_num%22%3A0%2C%22sort_v1%22%3A0%2C%22sort_v2%22%3A0%2C%22sort_v3%22%3A0%2C%22sort_v4%22%3A0%2C%22sort_v5%22%3A0%2C%22children_ids_1%22%3A%22%22%2C%22children_ids_2%22%3A%22%22%2C%22children_ids_3%22%3A%22%22%2C%22children_ids_4%22%3A%22%22%2C%22children_ids_5%22%3A%22%22%2C%22status%22%3A1%2C%22is_locked%22%3A0%2C%22comment_card_list%22%3A%5B%5D%2C%22report%22%3A%7B%22report_latest_time%22%3A%220000-00-00+00%3A00%3A00%22%2C%22report_count%22%3A0%2C%22report_count2%22%3A0%7D%2C%22have_read%22%3A1%2C%22origin_status%22%3A0%2C%22report_log_list%22%3A%5B%5D%2C%22user_info%22%3A%7B%22user_id%22%3A2772343%2C%22nickname%22%3A%22mini%5Cu6768%5Cu4e3d%22%7D%2C%22admin_log_list%22%3A%5B%7B%22id%22%3A7061455%2C%22editor_id%22%3A187%2C%22editor_name%22%3A%22wangtao%22%2C%22ctype%22%3A3%2C%22description%22%3A%22%22%2C%22op_module%22%3A%22%22%2C%22creation_date%22%3A%222015-03-10+11%3A18%3A26%22%7D%5D%2C%22admin_first_check_time%22%3A%222015-03-10+11%3A18%3A26%22%2C%22ploy_risk_type%22%3A%22EMPTY%22%2C%22ploy_description%22%3A%22%22%2C%22risk_type_tencent%22%3A%22EMPTY%22%2C%22tencent_moderation_description%22%3A%22%22%7D%2C%7B%22comment_id%22%3A18677417%2C%22channel_id%22%3A3%2C%22article_id%22%3A2375
775%2C%22receive_user_id%22%3A0%2C%22at_user_ids%22%3A%22%"..., 8192) = 8192
write(5, "249%22%2C%22user_agent%22%3A%22Mozilla%5C%2F5.0+%28Windows+NT+6.3%3B+WOW64%29+AppleWebKit%5C%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%5C%2F37.0.2062.120+Safari%5C%2F537.36%22%2C%22comment_from%22%3A%22%22%2C%22reply_from%22%3A0%2C%22creation_date%22%3A%222014-12-09+20%3A46%3A04%22%2C%22card_num%22%3A0%2C%22up_num%22%3A0%2C%22down_num%22%3A0%2C%22sort_v1%22%3A0%2C%22sort_v2%22%3A0%2C%22sort_v3%22%3A0%2C%22sort_v4%22%3A0%2C%22sort_v5%22%3A0%2C%22children_ids_1%22%3A%22%22%2C%22children_ids_2%22%3A%22%22%2C%22children_ids_3%22%3A%22%22%2C%22children_ids_4%22%3A%22%22%2C%22children_ids_5%22%3A%22%22%2C%22status%22%3A1%2C%22is_locked%22%3A0%2C%22comment_card_list%22%3A%5B%5D%2C%22report%22%3A%7B%22report_latest_time%22%3A%220000-00-00+00%3A00%3A00%22%2C%22report_count%22%3A0%2C%22report_count2%22%3A0%7D%2C%22have_read%22%3A1%2C%22origin_status%22%3A0%2C%22report_log_list%22%3A%5B%5D%2C%22user_info%22%3A%7B%22user_id%22%3A755583%2C%22nickname%22%3A%22%5Cu5b59%5Cu5c0f%5Cu9c81%22%7D%2C%22admin_log_list%22%3A%5B%7B%22id%22%3A7061445%2C%22editor_id%22%3A187%2C%22editor_name%22%3A%22wangtao%22%2C%22ctype%22%3A3%2C%22description%22%3A%22%22%2C%22op_module%22%3A%22%22%2C%22creation_date%22%3A%222015-03-10+11%3A18%3A26%22%7D%5D%2C%22admin_first_check_time%22%3A%222015-03-10+11%3A18%3A26%22%2C%22ploy_risk_type%22%3A%22EMPTY%22%2C%22ploy_description%22%3A%22%22%2C%22risk_type_tencent%22%3A%22EMPTY%22%2C%22tencent_moderation_description%22%3A%22%22%7D%2C%7B%22comment_id%22%3A18677457%2C%22channel_id%22%3A3%2C%22article_id%22%3A5309159%2C%22receive_user_id%22%3A0%2C%22at_user_ids%22%3A%22%22%2C%22content1%22%3A%22%5Cu521a%5Cu4e70%5Cu7684%5Cuff0c%5Cu5c31%5Cu964d%5Cu4ef7%5Cu4e86%5Cuff0c%5Cu5509%5Cuff01%5Cu4e0d%5Cu8fc7%5Cu786e%5Cu5b9e%5Cu633a%5Cu597d%5Cuff0c%5Cu633a%5Cu6d41%5Cu7545%5Cuff0c%5Cu8fd9%5Cu4e2a%5Cu4ef7%5Cu4f4d%5Cu5f88%5Cu503c%5Cu4e86%5Cu3002%22%2C%22root_id%22%3A0%2C%22parent_id%22%3A0%2C%22parent_ids%22%3A%22%22%2C%22content2%22%3Anull%2C%22content3%22%3Anull%2C%2
2content4%22%3Anull%2C%22content5%22%3Anull%2C%22ip%22%3A%"..., 8192) = 8192

Finally, it reads all the data back from the temporary file into memory, and only then enters the PHP script's processing logic.

step 10

I polled the /tmp directory at high frequency, caught one temporary file, and looked at its contents:

while true; do cp /tmp/php* . 2>/dev/null; done

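The content turned out to be exactly the POST body of the /comment/bgm_bulk_index endpoint, so I suspected that PHP-FPM spills an overly large POST body to a temporary file.

In the PHP source file SAPI.c, the function sapi_read_standard_form_data reads the POST form data: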

SAPI_API SAPI_POST_READER_FUNC(sapi_read_standard_form_data)
{
	...

	SG(request_info).request_body = php_stream_temp_create_ex(TEMP_STREAM_DEFAULT, SAPI_POST_BLOCK_SIZE, PG(upload_tmp_dir));

	if (sapi_module.read_post) {
		size_t read_bytes;

		for (;;) {
			char buffer[SAPI_POST_BLOCK_SIZE];

			read_bytes = sapi_read_post_block(buffer, SAPI_POST_BLOCK_SIZE);

			if (read_bytes > 0) {
				if (php_stream_write(SG(request_info).request_body, buffer, read_bytes) != read_bytes) {
				....

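When FPM handles a POST form, it apparently uses php_stream_temp_create_ex to create the request_body buffer that holds the incoming body. The second argument is a memory threshold; once the buffered data exceeds it, the stream switches to a temporary file.

It then loops, reading data and writing it into this buffer. Since the POST body in the case above is about 100 KB in total, it exceeded the memory threshold and a temporary file was written.

The SAPI_POST_BLOCK_SIZE memory threshold is defined in hexadecimal; it is actually 16384: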

#define SAPI_POST_BLOCK_SIZE 0x4000

It is a compile-time constant, so the only way to raise it is to modify the PHP source and recompile.
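To make the arithmetic concrete, here is a small sketch that checks a body size against the threshold. The helper names post_body_spills and post_read_blocks are hypothetical. I am assuming the spill happens as soon as the buffered size reaches the threshold, which fits the strace output above, where the very first 16384-byte read is followed by the temp file being created.

```shell
#!/bin/sh
# SAPI_POST_BLOCK_SIZE as defined in the PHP source quoted above.
SAPI_POST_BLOCK_SIZE=$((0x4000))   # 0x4000 hex = 16384 bytes

# Hypothetical helper: succeeds (exit status 0) when a POST body of $1 bytes
# is expected to be spilled to a temporary file.
# Assumption: the spill triggers once the buffered size reaches the threshold.
post_body_spills() {
    [ "$1" -ge "$SAPI_POST_BLOCK_SIZE" ]
}

# Hypothetical helper: minimum number of block-sized reads needed for $1 bytes
# (ceiling division; real reads may be shorter, so there can be more).
post_read_blocks() {
    echo $(( ($1 + SAPI_POST_BLOCK_SIZE - 1) / SAPI_POST_BLOCK_SIZE ))
}
```

For the 102633-byte body in this case, `post_body_spills 102633` succeeds and `post_read_blocks 102633` prints 7.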

step 11

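Finally, on the node hosting the high-memory POD, I dropped the slab cache once (writing 2 frees reclaimable slab objects such as dentries and inodes) to see whether the POD's memory would fall: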

echo 2 > /proc/sys/vm/drop_caches

POD memory fell from 1.8G to 346M, which roughly matches the actual RSS usage, so the kmem portion was indeed released.
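A node-wide way to watch the same effect is the dentry counter in /proc/sys/fs/dentry-state, whose first two fields are the total and unused dentry counts. The sketch below is only an illustration; dentry_counts is a name I made up, and the optional path argument exists purely so it can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: print "nr_dentry nr_unused" from a dentry-state file.
# Defaults to the live kernel file; pass a path to read a saved copy instead.
dentry_counts() {
    local file="${1:-/proc/sys/fs/dentry-state}"
    local nr_dentry nr_unused _rest
    # The first two whitespace-separated fields are the total and unused counts.
    read -r nr_dentry nr_unused _rest < "$file"
    echo "$nr_dentry $nr_unused"
}
```

Running `dentry_counts` before and after the drop_caches command above should show both numbers falling sharply if the dentry cache is really what was holding the memory.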

step 12

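The PHP endpoint above creates temporary files frequently, but it also deletes them when the request ends, so how can the slab cache accumulate millions of cached dentry objects? Shouldn't they be reclaimed and reused after deletion? Or do the entries of deleted files also need to be cached, so that a stat system call can immediately report that the file does not exist? It is hard to say.

After some searching (link), I found that the kernel does indeed cache the dentries of deleted files:

Negative state:

The inode associated with the dentry no longer exists, either because the corresponding on-disk inode has been deleted or because the dentry object was created by resolving the pathname of a nonexistent file. The d_inode field of the dentry object is set to NULL, but the object still remains in the dentry cache so that later lookups of the same file pathname can be resolved quickly. The term "negative" is somewhat misleading, since no negative value is involved.

So when PHP keeps creating and deleting files, new dentry objects keep being allocated, and the old ones pile up until the system has no more memory available and only then starts evicting the cache.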
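The create-then-delete pattern can be reproduced outside PHP with a few lines of shell. This is a sketch under my own assumptions: churn_temp_files is a hypothetical name, and whether every deletion really leaves a negative dentry behind depends on the kernel version and filesystem.

```shell
#!/bin/sh
# Hypothetical helper: create and immediately delete COUNT uniquely named files
# in DIR, mimicking how PHP's temp files come and go within one request.
# Prints the number of files that were created and removed.
churn_temp_files() {
    local dir="$1" count="$2" i=0 f
    while [ "$i" -lt "$count" ]; do
        # mktemp replaces the trailing XXXXXX with random characters.
        f=$(mktemp "$dir/phpXXXXXX") || return 1
        rm -f "$f"
        i=$((i + 1))
    done
    echo "$i"
}
```

Running something like `churn_temp_files /tmp 100000` on a test machine while watching the dentry slab should show whether the count grows even though the directory stays empty.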

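Summary

The lesson from this case is that docker counting kmem toward a cgroup's memory usage by default is a real pitfall: slab objects are charged to whichever cgroup created them, which is somewhat unreasonable.

So perhaps stopping docker from counting kmem in memory usage would be the better approach? There is plenty of discussion online, so I won't go into it here.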
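Short of changing the accounting itself, the two numbers can at least be separated when reading the metrics. The sketch below assumes a cgroup v1 memory directory that exposes memory.usage_in_bytes and memory.kmem.usage_in_bytes, with the kernel part included in the total. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print memory.usage_in_bytes minus
# memory.kmem.usage_in_bytes for the cgroup v1 memory directory given as $1.
# Assumption: the total usage counter includes the kmem charge.
cgroup_usage_without_kmem() {
    local dir="$1" usage kmem
    usage=$(cat "$dir/memory.usage_in_bytes") || return 1
    kmem=$(cat "$dir/memory.kmem.usage_in_bytes") || return 1
    echo $((usage - kmem))
}
```

Pointed at a POD's directory under /sys/fs/cgroup/memory, this gives a figure that leaves out the dentry cache and should sit much closer to what the processes themselves are using.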
