High Memory Usage Using Python Multiprocessing


I have seen a couple of posts on memory usage with the Python multiprocessing module. However, those questions don't seem to answer the problem I have here. I am posting my analysis in the hope that someone can help me.

Issue

I am using multiprocessing to perform tasks in parallel, and I noticed that the memory consumption of the worker processes grows indefinitely. I have a small standalone example that should replicate what I see.

    import multiprocessing as mp
    import time

    def calculate(num):
        l = [x * x for x in range(num)]
        s = sum(l)
        del l       # delete the list explicitly (the 'del' option)
        return s

    if __name__ == "__main__":
        pool = mp.Pool(processes=2)
        time.sleep(5)
        print("launching calculation")
        num_tasks = 1000
        tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
        for f in tasks:
            print(f.get(5))   # wait up to 5 seconds for each result
        print("calculation finished")
        time.sleep(10)
        print("closing pool")
        pool.close()
        print("closed pool")
        print("joining pool")
        pool.join()
        print("joined pool")
        time.sleep(5)

System

I am running Windows and use Task Manager to monitor the memory usage. I am running Python 2.7.6.

Observation

I have summarized the memory consumption of the 2 worker processes below.

    +---------------+----------------------+----------------------+
    |  num_tasks    |  memory with del     |  memory without del  |
    |               | proc_1   | proc_2    | proc_1   | proc_2    |
    +---------------+----------------------+----------------------+
    | 1000          | 4884     | 4694      | 4892     | 4952      |
    | 5000          | 5588     | 5596      | 6140     | 6268      |
    | 10000         | 6528     | 6580      | 6640     | 6644      |
    +---------------+----------------------+----------------------+

In the table above, I changed the number of tasks and observed the memory consumed at the end of the calculation, before joining the pool. The 'with del' and 'without del' options correspond to un-commenting or commenting out, respectively, the del l line inside the calculate(num) function. Before the calculation, the memory consumption is around 4400.
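To record these numbers from inside the workers instead of reading them off Task Manager, each worker can report its own peak resident set size along with its PID. This is a minimal sketch, assuming a Unix-like system and Python 3: the standard-library `resource` module is not available on Windows (where the question was run), and `calculate_with_rss` is a hypothetical variant of `calculate()`, not part of the original code.

```python
import multiprocessing as mp
import os
import resource  # Unix-only; not available on Windows


def calculate_with_rss(num):
    """Hypothetical variant of calculate() that also reports the worker's peak RSS."""
    l = [x * x for x in range(num)]
    s = sum(l)
    del l
    # ru_maxrss is the peak resident set size of the calling process:
    # kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return os.getpid(), s, peak


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        results = pool.map(calculate_with_rss, range(1000))
    peaks = {}
    for pid, _, peak in results:
        # keep the largest peak value reported by each worker process
        peaks[pid] = max(peaks.get(pid, 0), peak)
    for pid, peak in sorted(peaks.items()):
        print(pid, peak)
```

Because ru_maxrss is a high-water mark, it shows how far a worker has grown over its lifetime rather than its current usage, which fits the "grows indefinitely" question.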

  1. It looks like manually clearing out the lists results in lower memory usage by the worker processes. I thought the garbage collector would have taken care of this. Is there a way to force garbage collection?
  2. It is puzzling that, as the number of tasks increases, the memory usage keeps growing in both cases. Is there a way to limit the memory usage?
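On question 1: the standard-library `gc` module exposes `gc.collect()`, which forces a full collection and returns the number of unreachable objects it found. Keep in mind that CPython already frees a list as soon as its reference count drops to zero, so `gc.collect()` only reclaims extra memory when reference cycles exist, and memory freed by the interpreter is not necessarily handed back to the operating system, so Task Manager may not show a drop. A minimal sketch, assuming Python 3 (for `with mp.Pool(...)`); `calculate_and_collect` is a hypothetical name:

```python
import gc
import multiprocessing as mp


def calculate_and_collect(num):
    """Hypothetical variant of calculate() that forces a full GC pass per task."""
    l = [x * x for x in range(num)]
    s = sum(l)
    # Dropping the last reference frees the list immediately via reference
    # counting; gc.collect() only matters if reference cycles are present.
    del l
    unreachable = gc.collect()  # number of unreachable objects found
    return s, unreachable


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        results = pool.map(calculate_and_collect, range(1000))
    print("tasks:", len(results))
    print("unreachable objects found by gc:", sum(u for _, u in results))
```

If the reported count stays near zero, the growth is unlikely to be caused by uncollected cycles, and forcing collection will not bound the workers' memory.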

I have a process based on this example that is meant to run long term. I observe the worker processes hogging lots of memory (~4GB) after an overnight run. Doing a join to release the memory is not an option, and I am trying to figure out a way to do this without joining.

This seems a little mysterious. Has anyone encountered something similar? How can I fix this issue?

I did a lot of research and couldn't find a solution that fixes the problem per se. However, there is a decent workaround that prevents the memory blowout at a small cost, which is worth it especially for long-running server-side code.

The solution is to restart individual worker processes after a fixed number of tasks. The Pool class in Python takes a maxtasksperchild argument. You can specify maxtasksperchild=1000, limiting each child process to 1000 tasks. After a child reaches the maxtasksperchild limit, the pool replaces it with a fresh process. By choosing a prudent maximum number of tasks, one can balance the maximum memory consumed against the start-up cost of restarting the back-end process. The Pool construction is done as follows:

    pool = mp.Pool(processes=2, maxtasksperchild=1000)
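To see the worker recycling happen, each task can return the PID of the process that ran it. A minimal sketch, assuming Python 3; `worker_pid` is a hypothetical helper, and the tiny limit of 2 tasks per child is chosen only to make the effect visible:

```python
import multiprocessing as mp
import os


def worker_pid(_):
    """Return the PID of the worker process that ran this task."""
    return os.getpid()


if __name__ == "__main__":
    # With chunksize=1 every item is a separate task, so each worker exits
    # after 2 tasks and the pool starts a fresh process in its place.
    with mp.Pool(processes=2, maxtasksperchild=2) as pool:
        pids = pool.map(worker_pid, range(10), chunksize=1)
    print("distinct worker PIDs:", len(set(pids)))
```

Since no worker may run more than 2 of the 10 tasks, at least 5 different worker PIDs appear. Note that maxtasksperchild counts tasks, not items: Pool.map splits its iterable into chunks and submits each chunk as one task, so with a larger chunksize a worker can process many more items before it is replaced. apply_async, as used in the question, submits one task per call.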

I am putting the full solution here so it can be of use to others!

    import multiprocessing as mp
    import time

    def calculate(num):
        l = [x * x for x in range(num)]
        s = sum(l)
        del l       # delete the list explicitly (the 'del' option)
        return s

    if __name__ == "__main__":

        # the fix is in the following line
        pool = mp.Pool(processes=2, maxtasksperchild=1000)

        time.sleep(5)
        print("launching calculation")
        num_tasks = 1000
        tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
        for f in tasks:
            print(f.get(5))   # wait up to 5 seconds for each result
        print("calculation finished")
        time.sleep(10)
        print("closing pool")
        pool.close()
        print("closed pool")
        print("joining pool")
        pool.join()
        print("joined pool")
        time.sleep(5)
