High Memory Usage Using Python Multiprocessing
I have seen a couple of posts on memory usage with the Python multiprocessing module, but the questions don't seem to answer the problem I have here. I am posting my analysis in the hope that someone can help me.
Issue
I am using multiprocessing to perform tasks in parallel, and I noticed that the memory consumption of the worker processes grows indefinitely. I have a small standalone example that should replicate what I notice.
    import multiprocessing as mp
    import time

    def calculate(num):
        l = [num*num for num in range(num)]
        s = sum(l)
        del l       # delete lists as an option
        return s

    if __name__ == "__main__":
        pool = mp.Pool(processes=2)
        time.sleep(5)
        print "launching calculation"
        num_tasks = 1000
        tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
        for f in tasks:
            print f.get(5)
        print "calculation finished"
        time.sleep(10)
        print "closing pool"
        pool.close()
        print "closed pool"
        print "joining pool"
        pool.join()
        print "joined pool"
        time.sleep(5)
System
I am running Windows and use Task Manager to monitor the memory usage. I am running Python 2.7.6.
Observation
I have summarized the memory consumption of the 2 worker processes below.
    +---------------+----------------------+----------------------+
    |   num_tasks   |   memory with del    |  memory without del  |
    |               |  proc_1  |  proc_2   |  proc_1  |  proc_2   |
    +---------------+----------------------+----------------------+
    |     1000      |   4884   |   4694    |   4892   |   4952    |
    |     5000      |   5588   |   5596    |   6140   |   6268    |
    |     10000     |   6528   |   6580    |   6640   |   6644    |
    +---------------+----------------------+----------------------+
In the table above, I tried changing the number of tasks and observing the memory consumed at the end of the calculation, before join-ing the pool. The 'del' and 'without del' options refer to whether I un-comment or comment out the del l line inside the calculate(num) function, respectively. Before the calculation, the memory consumption is around 4400.
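For reference, the same measurement can also be taken programmatically instead of through Task Manager. The snippet below is only a rough sketch and assumes the third-party psutil package is installed; it reads the resident memory of every live child process of the main script (which, while the pool is alive, are the worker processes):

    import multiprocessing as mp
    import psutil  # third-party package, assumed to be installed

    def report_worker_memory():
        # active_children() returns the live child processes, i.e. the pool workers
        for child in mp.active_children():
            rss_kb = psutil.Process(child.pid).memory_info().rss / 1024
            print "worker %d uses %d KB resident" % (child.pid, rss_kb)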
- It looks like manually clearing out the lists results in lower memory usage for the worker processes. I thought the garbage collector would have taken care of this. Is there a way to force garbage collection? (See the sketch after this list.)
- It is puzzling that as the number of tasks increases, the memory usage keeps growing in both cases. Is there a way to limit the memory usage?
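To follow up on the first question above, here is a minimal sketch (not from the original post) of forcing a collection explicitly by calling gc.collect() inside the worker function. Whether this actually brings the numbers in the table down is not verified here, since the deleted list is ordinary reference-counted garbage rather than the cyclic garbage that the collector targets:

    import gc

    def calculate(num):
        l = [num*num for num in range(num)]
        s = sum(l)
        del l
        gc.collect()   # explicitly run a collection pass inside the worker
        return s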
I have a process that is based on this example and is meant to run long term. I observe that the worker processes are hogging lots of memory (~4 GB) after an overnight run. Doing a join to release the memory is not an option, and I am trying to figure out a way to do this without join-ing.
This seems a little mysterious. Has anyone encountered something similar? How can I fix this issue?
I did a lot of research and couldn't find a solution that fixes the problem per se. But there is a decent workaround that prevents the memory blowout for a small cost, worth it especially for long-running, server-side code.
The solution is to restart individual worker processes after a fixed number of tasks. The Pool class in Python takes a maxtasksperchild argument. You can specify maxtasksperchild=1000, thus limiting each child process to 1000 tasks; after reaching the maxtasksperchild count, the pool replaces that child with a fresh process. By choosing a prudent number for the maximum tasks, one can balance the maximum memory consumed against the start-up cost of restarting the back-end processes. The Pool construction is done as:
    pool = mp.Pool(processes=2, maxtasksperchild=1000)
I am putting my full solution here so it can be of use to others!
    import multiprocessing as mp
    import time

    def calculate(num):
        l = [num*num for num in range(num)]
        s = sum(l)
        del l       # delete lists as an option
        return s

    if __name__ == "__main__":

        # fix is in the following line #
        pool = mp.Pool(processes=2, maxtasksperchild=1000)

        time.sleep(5)
        print "launching calculation"
        num_tasks = 1000
        tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
        for f in tasks:
            print f.get(5)
        print "calculation finished"
        time.sleep(10)
        print "closing pool"
        pool.close()
        print "closed pool"
        print "joining pool"
        pool.join()
        print "joined pool"
        time.sleep(5)
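As a quick, purely illustrative check (not part of the original solution) that the pool really does recycle its workers, one can print the worker PID inside calculate and watch it change once a worker has handled maxtasksperchild tasks; with a small value such as maxtasksperchild=10 the rotation is easy to see in the output:

    import os

    def calculate(num):
        # the PID changes whenever the pool replaces a worker that hit maxtasksperchild
        print "task %d handled by worker pid %d" % (num, os.getpid())
        l = [num*num for num in range(num)]
        s = sum(l)
        del l
        return s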