Thumbnail request causing OOM in PHP
Sentry Issue: ENGINE-HF
ErrorException: Allowed memory size of 134217728 bytes exhausted (tried to allocate 116526568 bytes)
File "/lib/elgglib.php", line 947, in fatalErrorShutdownHandler
\Sentry\captureLastError();
File "[internal]", line 0
added scoped labels
- Developer
Example request:
https://www.minds.com/fs/v1/thumbnail/1026508015016828928/xlarge
Link to media page:
https://www.minds.com/media/1026508015016828928
- Developer
Offending section of code in /api/v1/thumbnail.php:

try {
    $finfo = new \finfo(FILEINFO_MIME);
    $contentType = $finfo->buffer($contents) ?: 'image/jpeg';
} catch (\Exception $e) {
    error_log($e);
    $contentType = 'image/jpeg';
}
The issue appears related to finfo under PHP 7.3. Running a referenced test script shows a clear performance degradation between our 7.1 and 7.3 builds when calling finfo.
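This is not the actual test.php, just a minimal sketch of a comparable timing script. It assumes a local sample image at /tmp/sample.jpg and an ITERS environment variable, and it uses finfo::file(), which is the call the later comments identify as the slow path:

<?php
// Hypothetical timing harness; the file path and iteration default are illustrative.
$iters = (int) (getenv('ITERS') ?: 100);
$finfo = new finfo(FILEINFO_MIME);

$start = microtime(true);
for ($i = 0; $i < $iters; $i++) {
    // finfo::file() re-inspects the file on every iteration
    $finfo->file('/tmp/sample.jpg');
}
echo 'Took: ' . (microtime(true) - $start) . PHP_EOL;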
- Developer
While I'm not able to reproduce the OOM condition in our sandbox, I am able to demonstrate the performance degradation.
- Developer
PHP 7.1
/var/www/Minds/engine # ITERS=100 php test.php
Took: 0.094574928283691
/var/www/Minds/engine # ITERS=1000 php test.php
Took: 0.91444492340088
PHP 7.3
/var/www/Minds/engine # ITERS=100 php test.php
Took: 0.42086601257324
/var/www/Minds/engine # ITERS=1000 php test.php
Took: 4.215017080307
- Developer
Likely this is a good place to start looking for the OOM issue.
- Developer
The referenced performance issue relates to finfo::file(), whereas we are using finfo::buffer(). Using finfo::buffer() does not cause a slowdown. Continuing by instrumenting some mock code to show memory usage before and after finfo::buffer().
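Roughly the kind of instrumentation meant here (a sketch, not the actual mock code; the local file path stands in for the S3 read):

<?php
// Measure memory just before and after the finfo::buffer() call, plus script totals.
$requestStart = memory_get_usage();

$contents = file_get_contents('/tmp/sample.jpg'); // stand-in for the S3 read
$before = memory_get_usage();

$finfo = new finfo(FILEINFO_MIME);
$contentType = $finfo->buffer($contents) ?: 'image/jpeg';

$after = memory_get_usage();
echo 'finfo usage: ' . ($after - $before) . PHP_EOL;
echo 'after usage: ' . $after . PHP_EOL;
echo 'total usage: ' . ($after - $requestStart) . PHP_EOL;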
- Developer
There is also the possibility that something prior to this call is using too much memory and this final call tips it over the limit where it would not have done previously.
- Developer
The use of $finfo->buffer() to get the mime type has an impact on the memory overhead of the script. The difference between PHP 7.1 and PHP 7.3 is minimal. I suspect the real issue is that the entire image file is read into memory from S3 and then passed around before being written to the HTTP response.
After a couple of requests these values have stabilised:

finfo usage: 3585240
after usage: 3782552
total usage: 3782552

finfo is the memory delta just before and after the finfo call.
total is the delta from the start of the request to the end.
after is the memory usage at the end of the script.

The finfo call accounts for most of the total memory usage of the script, though this is probably just where the memory for the file is assigned.
- Developer
With an S3 backend we already set the mime type in the metadata, using the same finfo call, when the file was written, so we shouldn't need to repeat this process again. Looking to see if we can extend the S3 calls to fetch the metadata...
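Something along these lines, assuming the SDK's HeadObject call; the bucket, key and region below are placeholders rather than the real config:

<?php
// Sketch: read back the ContentType stored at write time instead of
// re-running finfo over the full body.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1', // placeholder
]);

$head = $s3->headObject([
    'Bucket' => 'example-bucket',  // placeholder
    'Key'    => 'path/to/object',  // placeholder
]);

$contentType = $head['ContentType'] ?? 'image/jpeg';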
- Developer
The filestore interface does not provide a method for reading the mime type.
Attempts to reduce the memory usage of the API request in sandbox by just reading the header did not make much difference.
- Developer
The memory delta is an unreliable measure. I edited out the finfo call, just returning a string constant for the mime type, and it still recorded the same memory delta without touching the file content buffer.
- Developer
libmagic version is different between PHP versions
7.3:

fileinfo
fileinfo support => enabled
libmagic => 533

7.1:

fileinfo
fileinfo support => enabled
version => 1.0.5
libmagic => 522
- Developer
Alternatively, since we are already using chunked file sending, we could do chunked reading of the file, which would require a smaller memory buffer but more round trips to S3???
- Developer
While the abstract ElggFilestore does support offset and length reads, we are using a modified Service interface and the S3 service has no seek implementation. The S3 lib does have a stream interface that could be used: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-stream-wrapper.html
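A sketch of what that could look like with the stream wrapper (bucket, key and region are placeholders); the point is the object is read and emitted in small chunks rather than held in one string:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1', // placeholder
]);
$s3->registerStreamWrapper(); // enables fopen('s3://...')

$in = fopen('s3://example-bucket/path/to/object', 'r');
while (!feof($in)) {
    echo fread($in, 8192); // 8KB per round trip instead of the whole object
}
fclose($in);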
added 7h of time spent at 2019-11-12
- Developer
Got the source file referenced in the first comment.
I can now recreate the issue during upload of the source image.
2019/11/13 09:27:17 [warn] 8#8: *1137 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000005, client: 192.168.13.110, server: _, request: "POST /api/v1/media HTTP/1.1", host: "fix-1119-image-regress.minds.io", referrer: "https://fix-1119-image-regress.minds.io/newsfeed/subscriptions"
2019/11/13 09:27:33 [error] 8#8: *1137 FastCGI sent in stderr: "PHP message: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 146703512 bytes) in /var/www/Minds/engine/Core/Storage/Services/S3.php on line 67" while reading response header from upstream, client: 192.168.13.110, server: _, request: "POST /api/v1/media HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "fix-1119-image-regress.minds.io", referrer: "https://fix-1119-image-regress.minds.io/newsfeed/subscriptions"
2019/11/13 09:27:33 [error] 8#8: *1137 FastCGI sent in stderr: "PHP message: Fatal error: Array<br /> (<br /> [type] => 1<br /> [message] => Allowed memory size of 134217728 bytes exhausted (tried to allocate 146703512 bytes)<br /> [file] => /var/www/Minds/engine/Core/Storage/Services/S3.php<br /> [line] => 67<br /> )<br /> PHP message: PHP WARNING: 2019-11-13 09:27:33 (UTC): "Cannot modify header information - headers already sent by (output started at /var/www/Minds/engine/Core/Storage/Services/S3.php:67)" in file /var/www/Minds/engine/lib/elgglib.php (line 942) PHP message: PHP WARNING: 2019-11-13 09:27:33 (UTC): "file_get_contents(/var/www/Minds/errors/500.html): failed to open stream: No such file or directory" in file /var/www/Minds/engine/lib/elgglib.php (line 944)" while reading upstream, client: 192.168.13.110, server: _, request: "POST /api/v1/media HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "fix-1119-image-regress.minds.io", referrer: "https://fix-1119-image-regress.minds.io/newsfeed/subscriptions"
Edited by Guy Thouret - Developer
Offending code for previous error message:
public function write($data) {
    //TODO: check mime performance here
    $finfo = new \finfo(FILEINFO_MIME_TYPE);
    $mimeType = $finfo->buffer($data);

    $write = $this->s3->putObject([
        // 'ACL' => 'public-read',
        'Bucket' => Config::_()->aws['bucket'],
        'Key' => $this->filepath,
        'ContentType' => $mimeType,
        'ContentLength' => strlen($data),
        //'ContentLength' => filesize($file),
        'Body' => $data,
    ]);
Edited by Guy Thouret - Developer
In total isolation:
<?php
$startMem = memory_get_usage();
$finfo = new finfo(FILEINFO_MIME);
$buffer = file_get_contents('/Users/guythouret/Downloads/master.gif');
$type = $finfo->buffer($buffer);
$endMem = memory_get_usage();
echo 'type: ' . print_r($type, true) . PHP_EOL;
$usage = $endMem - $startMem;
echo 'usage: ' . $usage . PHP_EOL;
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 146703512 bytes) in /Users/guythouret/Library/Preferences/PhpStorm2019.2/scratches/scratch.php on line 5
Without the call to $finfo->buffer($buffer) it outputs usage: 18342712, which is not much more than the size of the source file.
Edited by Guy Thouret - Developer
finfo exceeds the maximum script memory limit of 128M when given a 12.5M file.
dd if=/dev/zero of=/tmp/12.5M bs=1024 count=12500
<?php
$finfo = new finfo(FILEINFO_MIME);
$buffer = file_get_contents('/tmp/12.5M');
$type = $finfo->buffer($buffer);
When using a 12M file, it returns ok.
Edited by Guy Thouret - Developer
Doubling the limit to 256M with ini_set('memory_limit', '256M'); only allowed the file size to go up to 27M. The overhead of finfo is not a fixed value and seems to be in proportion to the size of the buffer passed to it.
- Developer
This is resolved by only passing the header bytes to finfo to keep it within limits.
<?php
$finfo = new finfo(FILEINFO_MIME);
$buffer = file_get_contents('/tmp/100M');
$header = substr($buffer, 0, 16);
$type = $finfo->buffer($header);
Files in excess of 100M are now accepted while still returning the correct mime type.
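For context, this is roughly how the same change might look back in the storage service's write() path where the reproduced OOM pointed (a stripped-down sketch, not the actual class; only the finfo line differs from the original write()):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

class S3StoreSketch
{
    /** @var S3Client */
    private $s3;
    private $bucket;
    private $filepath;

    public function write($data)
    {
        $finfo = new \finfo(FILEINFO_MIME_TYPE);
        // Only a small prefix of the blob is handed to finfo (length illustrative).
        $mimeType = $finfo->buffer(substr($data, 0, 1024)) ?: 'application/octet-stream';

        return $this->s3->putObject([
            'Bucket'        => $this->bucket,
            'Key'           => $this->filepath,
            'ContentType'   => $mimeType,
            'ContentLength' => strlen($data),
            'Body'          => $data,
        ]);
    }
}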
added scoped label and automatically removed label
- Developer
With the fix deployed, PHP is bombing out with the following:
[13-Nov-2019 11:54:45] WARNING: [pool www] child 16, script '/var/www/Minds/engine/index.php' (request: "POST /index.php") executing too slow (1.062722 sec), logging
[13-Nov-2019 11:54:45] NOTICE: child 16 stopped for tracing
[13-Nov-2019 11:54:45] NOTICE: about to trace 16
[13-Nov-2019 11:54:45] ERROR: failed to open /proc/16/mem: Permission denied (13)
[13-Nov-2019 11:54:45] NOTICE: finished trace of 16
127.0.0.1 - 13/Nov/2019:11:54:47 +0000 "POST /index.php" 200
[13-Nov-2019 11:54:47] WARNING: [pool www] child 16 exited on signal 9 (SIGKILL) after 1762.599356 seconds from start
- Developer
This is likely related to resource constraints in the sandbox environment. Captured the following from dmesg within the php-fpm pod when a failing request is made:

[11639952.932291] php-fpm invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=984
[11639952.940242] php-fpm cpuset=5fd6f4700aa2e4b35f3323fdcce4dda438dae29068fc04f280456492f17da530 mems_allowed=0
[11639952.947273] CPU: 6 PID: 15268 Comm: php-fpm Not tainted 4.14.123-111.109.amzn2.x86_64 #1
[11639952.953651] Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
[11639952.958004] Call Trace:
[11639952.960684]  dump_stack+0x5c/0x82
[11639952.963699]  dump_header+0x94/0x229
[11639952.966804]  oom_kill_process+0x223/0x420
[11639952.970109]  out_of_memory+0x2af/0x4d0
[11639952.973296]  mem_cgroup_out_of_memory+0x49/0x80
[11639952.976832]  mem_cgroup_oom_synchronize+0x2ed/0x330
[11639952.980515]  ? mem_cgroup_css_online+0x30/0x30
[11639952.984270]  pagefault_out_of_memory+0x32/0x77
[11639952.987744]  __do_page_fault+0x4b4/0x4c0
[11639952.991031]  ? async_page_fault+0x2f/0x50
[11639952.994324]  async_page_fault+0x45/0x50
[11639952.997547] RIP: 00a0: (null)
[11639953.000793] RSP: 157ceec0:000055ca157d4b20 EFLAGS: 00000001
[11639953.000868] Task in /kubepods/burstable/poda2a57466-060f-11ea-84fb-02f67eb6b970/5fd6f4700aa2e4b35f3323fdcce4dda438dae29068fc04f280456492f17da530 killed as a result of limit of /kubepods/burstable/poda2a57466-060f-11ea-84fb-02f67eb6b970/5fd6f4700aa2e4b35f3323fdcce4dda438dae29068fc04f280456492f17da530
[11639953.021460] memory: usage 1048576kB, limit 1048576kB, failcnt 0
[11639953.025539] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 3957
[11639953.029922] kmem: usage 10492kB, limit 9007199254740988kB, failcnt 0
[11639953.034225] Memory cgroup stats for /kubepods/burstable/poda2a57466-060f-11ea-84fb-02f67eb6b970/5fd6f4700aa2e4b35f3323fdcce4dda438dae29068fc04f280456492f17da530: cache:31872KB rss:1006212KB rss_huge:0KB shmem:31864KB mapped_file:31868KB dirty:0KB writeback:0KB swap:0KB inactive_anon:31864KB active_anon:1006212KB inactive_file:4KB active_file:0KB unevictable:0KB
[11639953.054422] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[11639953.061224] [15182] 0 15182 542863 10458 36 4 0 984 php-fpm
[11639953.067852] [15263] 82 15263 556973 12098 52 4 0 984 php-fpm
[11639953.074473] [15264] 82 15264 556965 11612 52 4 0 984 php-fpm
[11639953.081112] [15265] 82 15265 556965 11800 52 4 0 984 php-fpm
[11639953.087739] [15267] 82 15267 556963 11560 52 4 0 984 php-fpm
[11639953.094425] [15268] 82 15268 795258 247747 519 5 0 984 php-fpm
[11639953.101099] [15269] 82 15269 556965 11572 52 4 0 984 php-fpm
[11639953.107733] [15270] 82 15270 556987 12515 52 4 0 984 php-fpm
[11639953.114362] [15271] 82 15271 556965 11333 52 4 0 984 php-fpm
[11639953.120977] [15272] 82 15272 556965 11577 52 4 0 984 php-fpm
[11639953.127627] [15279] 82 15279 556985 11924 52 4 0 984 php-fpm
[11639953.134264] [15864] 0 15864 437 292 5 3 0 984 sh
[11639953.140718] Memory cgroup out of memory: Kill process 15268 (php-fpm) score 1930 or sacrifice child
[11639953.147494] Killed process 15268 (php-fpm) total-vm:3181032kB, anon-rss:945016kB, file-rss:15400kB, shmem-rss:30572kB
[11639953.205124] oom_reaper: reaped process 15268 (php-fpm), now anon-rss:0kB, file-rss:0kB, shmem-rss:30572kB
- Developer
Kubernetes killed my proc
- Developer
Confirmed this issue also occurs under PHP 7.1 in the sandbox.
- Developer
Unable to test the fix in sandbox because of the sandbox config.
- Developer
helm upgrade --reuse-values --recreate-pods --set phpfpm.max_children=4 --set phpfpm.max_requests=2000 --set phpfpm.resources.limits.memory=2Gi fix-1120-thumbnail-oom ./minds
The sandbox needs a config change, otherwise the pod's hard memory limit is reached and the process is killed. The process needs to be able to flex up and use memory when needed.
Edited by Guy Thouret - Developer
New error now when the breaking gif is uploaded:
2019/11/14 11:44:25 [warn] 8#8: *244 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000003, client: 192.168.13.110, server: _, request: "POST /api/v1/media HTTP/1.1", host: "fix-1120-thumbnail-oom.minds.io", referrer: "https://fix-1120-thumbnail-oom.minds.io/newsfeed/subscriptions"
2019/11/14 11:44:54 [error] 8#8: *244 FastCGI sent in stderr: "PHP message: [Search/Events/search:index:queue] RuntimeException: Instances of Aws\S3\S3Client cannot be serialized" while reading response header from upstream, client: 192.168.13.110, server: _, request: "POST /api/v1/media HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "fix-1120-thumbnail-oom.minds.io", referrer: "https://fix-1120-thumbnail-oom.minds.io/newsfeed/subscriptions"
- Developer
api/v1/media is now causing the memory limit to be hit. During upload, the file is cached by nginx then passed to the PHP script. I observed memory ballooning over the 2G hard limit before the PHP process was terminated for exceeding limits.
- Developer
Hitting that limit because of the way the file is uploaded. Nginx caches it to a file, then passes it to PHP, which also caches it to a file (in tmpfs i.e. RAM). If the PHP request fails, it doesn't get cleaned up. When I checked, there were 200M+ of failed upload files sitting there eating memory.
What is also weird is that there were two copies of the file cached by PHP for one request.
/tmp # ls -l
total 50616
drwxrwxrwx 2 root root 6 Nov 13 12:16 minds-cache
drwxr-xr-x 5 root root 47 Nov 6 11:07 pear
-rw------- 1 www-data www-data 18337938 Nov 14 13:21 phpfahFGm
-rw------- 1 www-data www-data 18337938 Nov 14 13:21 phpmNAAOf
Log output for the upload only accounts for one tmp file:
2019/11/14 13:22:00 [error] 8#8: *859 FastCGI sent in stderr: "PHP message: upload: Array
(
    [file] => Array
        (
            [name] => master.gif
            [type] => image/gif
            [tmp_name] => /tmp/phpfahFGm
            [error] => 0
            [size] => 18337938
        )
)
- Developer
I suspect the second tmp file is for the file being uploaded to S3.
- Developer
Partial stack trace before it was truncated...
PHP message: [Search/Events/search:index:queue] RuntimeException: Instances of Aws\S3\S3Client cannot be serialized
#0 [internal function]: Aws\AwsClient->__sleep()
#1 /var/www/Minds/engine/Core/Search/Queue.php(24): serialize(Object(Minds\Entities\Image))
#2 /var/www/Minds/engine/Core/Search/Events.php(62): Minds\Core\Search\Queue->queue(Object(Minds\Entities\Image))
#3 [internal function]: Minds\Core\Search\Events->Minds\Core\Search\{closure}(Object(Minds\Core\Events\Event))
#4 /var/www/Minds/engine/Core/Events/Dispatcher.php(145): call_user_func_array(Object(Closure), Array)
#5 /var/www/Minds/engine/Core/Events/EventsDispatcher.php(49): Minds\Core\Events\Dispatcher::trigger('search:index:qu...', 'all', Array, NULL)
#6 /var/www/Minds/engine/Core/Search/Events.php(41): Minds\Core\Events\EventsDispatcher->trigger('search:index:qu...', 'all', Array)
#7 [internal function]: Minds\Core\Search\Events->Minds\Core\Search\{closure}(Object(Minds\Core\Events\Event))
#8 /var/www/Minds/engine/Core/Events/Dispatcher.php(145): call_user_func_array(Object(Closure), Array)
#9 /var/www/Minds/engine/Core/Events/EventsDispatcher.php(49): Minds\Core\Events\Dispatcher::trigger('search:index', 'object:image', Array, NULL)
#10 /var/www/Minds/engine/Core/Search/Events.php(209): Minds\Core\Events\EventsDispatcher->trigger('search:index', 'object:image', Array)
#11 [internal function]: Minds\Core\Search\Events->Minds\Core\Search\{closure}('update', 'object', Object(Minds\Entities\Image))
#12 /var/www/Minds/engine/Core/Events/Dis
Edited by Guy Thouret - Developer
Serializing a Minds\Entities\Image object that contains an S3Client object???
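For reference, the failure mode can be reproduced in isolation; the entity class below is an illustrative stand-in, not the real Minds\Entities\Image:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

class FakeImageEntity
{
    public $guid = '1026508015016828928';
    public $client; // holds the S3 client, which refuses to be serialized
}

$entity = new FakeImageEntity();
$entity->client = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

try {
    serialize($entity); // Aws\AwsClient::__sleep() throws here
} catch (\RuntimeException $e) {
    echo $e->getMessage() . PHP_EOL; // "Instances of Aws\S3\S3Client cannot be serialized"
}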