Persistent Memory Leaks in Some Business Pods on K8S: Postscript

Posted on July 8, 2020 by yuer

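In the previous post, "Persistent Memory Leaks in Some Business Pods on K8S", I analyzed one scenario in which a pod's memory keeps leaking. If you haven't read it, I suggest reading it first.

This post covers two more scenarios in which a pod's memory keeps growing. The cause is the same: the slab dentry cache keeps climbing. I hope it helps anyone who is stuck on the same problem.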

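Scenario 1: nginx reverse proxy

The problem showed up on an nginx + php-fpm stack, but it is not limited to that setup.

The relevant nginx configuration is as follows: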

    location / {
        index index.php;

        if (!-e $request_filename){
            rewrite ^(.*)$ /index.php last;
            break;
        }
    }

    location ~ \.php$ {
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_connect_timeout 300;
        fastcgi_send_timeout 300;
        fastcgi_read_timeout 300;
        fastcgi_buffer_size 128k;
        fastcgi_buffers 32 32k;
        include fastcgi_params;
        port_in_redirect off;
        fastcgi_pass unix:/var/run/php/phpfpm.sock;
    }

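When a URL is requested, it is matched as follows:

1. It matches location /. The if block checks whether the requested file exists on disk. If it does, nginx serves the static file; otherwise the request is rewritten to /index.php and matched again.
2. It matches location ~ \.php$, and the request is passed on to PHP-FPM.

In other words, every URL triggers a file lookup on disk, whether or not the file exists.

That means there are as many slab dentry cache entries as there are distinct URLs, because the kernel also caches failed lookups (as negative dentries).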

This becomes a problem with pretty URLs, for example when the article ID is part of the URL:

/articles/detail/134543

/articles/detail/881929

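The number of such URLs is effectively unbounded. nginx first looks each one up on disk, which leaves a dentry in the cache, and only then forwards the request to php-fpm, so millions of dentry objects inevitably end up cached.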
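To make the growth concrete, here is a minimal Python sketch of which file paths the existence check would end up looking up for a batch of requests. It is a simplification, not nginx's actual implementation: the function name lookup_paths, the document root /var/www/html, and the ?from=feed query string are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def lookup_paths(request_uris, document_root="/var/www/html"):
    """Return the distinct file paths an existence check would look up.

    Simplified model: the path part of the URI is appended to the
    document root (a hypothetical value here), and the query string
    is ignored.
    """
    paths = set()
    for uri in request_uris:
        url_path = urlsplit(uri).path  # drop the query string
        paths.add(document_root.rstrip("/") + url_path)
    return paths


requests = [
    "/articles/detail/134543",
    "/articles/detail/881929",
    "/articles/detail/881929?from=feed",  # same path, different query
]
distinct = lookup_paths(requests)
print(len(distinct))  # 2: one lookup path per distinct URL path
```

In this model, each distinct path is one more name for the kernel to remember, so a site with millions of article IDs would produce millions of lookups for files that never exist.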

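Similar cases are easy to extrapolate. The try_files directive, for example, also checks for files on disk first, so it falls into the same trap.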

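Scenario 2: the web framework

This case is more specific to my setup, but I'm including it as another line of investigation.

After I turned off the nginx configuration that checks for a file before proxying, the dentry count was still climbing fast, so I took a closer look at the strace output of php-fpm.

It showed that on every request php-fpm looks in the web framework's cache directory for a file whose name looks like an MD5 hash. Had the framework's cache feature been turned on?

I read through the framework code and found that its implementation is indeed a bit flawed: even with the cache feature disabled, it still tries to load a cache file from the cache directory: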

	function _display_cache(&$CFG, &$URI)
	{
		$cache_path = ($CFG->item('cache_path') == '') ? APPPATH.'cache/' : $CFG->item('cache_path');

		// Build the file path.  The file name is an MD5 hash of the full URI
		$uri =	$CFG->item('base_url').
				$CFG->item('index_page').
				$URI->uri_string;

		$filepath = $cache_path.md5($uri);

		if ( ! @file_exists($filepath))
		{
			return FALSE;
		}

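Because the file name is the MD5 of the URL, it varies endlessly with the query string, so every request for a URL that has not been seen before creates one more dentry cache entry.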
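The effect is easy to reproduce outside the framework. The following Python sketch mimics only the file-name computation from the excerpt above; the helper cache_file_name and the values of base_url, index_page, and the ?page= query strings are hypothetical, and the framework's real URI handling may differ.

```python
import hashlib


def cache_file_name(base_url, index_page, uri_string):
    """Mimic the excerpt: the file name is the MD5 of the concatenated URI."""
    uri = base_url + index_page + uri_string
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


# Hypothetical configuration values, for illustration only.
base_url = "http://example.com/"
index_page = "index.php"

# 1000 requests that differ only in the query string.
names = {
    cache_file_name(base_url, index_page, f"/articles/detail/1?page={n}")
    for n in range(1000)
}
print(len(names))  # number of distinct file names that get probed
```

Every distinct name is a separate existence check in the cache directory, and each failed check can leave its own dentry behind.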

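Summary

Once I had eliminated all of the problems described above, I checked the cgroup again: its slab dentry cache had become completely stable and barely changed.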
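For anyone who wants to watch the same number, here is a rough Python sketch that pulls the dentry line out of slabinfo-formatted text and estimates its size. The SAMPLE text and its figures are made up for illustration. On a real node the text would typically come from /proc/slabinfo (which usually requires root); where per-cgroup figures can be read depends on the kernel and cgroup version, so treat that part as something to check on your own system.

```python
def dentry_slab_usage(slabinfo_text):
    """Return (num_objs, approx_bytes) for the dentry cache, or None."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "dentry":
            num_objs = int(fields[2])  # <num_objs> column
            objsize = int(fields[3])   # <objsize> column, in bytes
            return num_objs, num_objs * objsize
    return None


# Made-up sample in the slabinfo 2.1 layout, for illustration only.
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry            1000000 1000020    192   21    1 : tunables    0    0    0 : slabdata  47620  47620      0
"""

print(dentry_slab_usage(SAMPLE))  # (1000020, 192003840)
```

Sampling this value every few minutes while traffic is flowing is enough to tell whether the dentry cache has levelled off or is still growing.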

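So the question comes back around: can cgroup kmem accounting be turned off so that slab memory is not charged to the cgroup? Is that risky? I'll write a new post to share that experience when I have time.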

