multithreading - When, why, and how to call thread.join() in Python?


I have this Python threading code:

    import threading

    def sum(value):
        # add up the integers 0..value
        sum = 0
        for i in range(value + 1):
            sum += i
        print("I'm done %d - %d\n" % (value, sum))
        return sum

    r = range(500001, 500000 * 2, 100)

    ts = []
    for u in r:
        t = threading.Thread(target=sum, args=(u,))
        ts.append(t)
        t.start()

    # wait for every worker to finish
    for t in ts:
        t.join()

When I execute this, I have hundreds of threads working.


However, when I move t.join() to right after t.start(), I have only 2 threads working.

    for u in r:
        t = threading.Thread(target=sum, args=(u,))
        ts.append(t)
        t.start()
        t.join()


I also tested the code without invoking t.join() at all, and it seems to work fine?

So when, why, and how should I use thread.join()?

Short answer: this one:

    for t in ts:
        t.join()

is the idiomatic way to start a small number of threads. Doing .join means your main thread waits until the given thread finishes before proceeding in execution. You generally do this after you've started all of the threads.
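A minimal sketch of that pattern with a small, fixed number of workers. The worker `add_up`, the shared `results` dict, and the four inputs are hypothetical, chosen only for illustration. Every thread is started first, and only then does the main thread join each one, so the workers can all run concurrently and the main thread continues only once the last one has finished.

```python
import threading

results = {}  # filled in by the workers; each thread writes its own key

def add_up(value):
    """Hypothetical worker: sum the integers 0..value and store the result."""
    total = 0
    for i in range(value + 1):
        total += i
    results[value] = total

inputs = [10, 100, 1000, 10000]
threads = [threading.Thread(target=add_up, args=(v,)) for v in inputs]

# Phase 1: start every thread so they can all run at the same time.
for t in threads:
    t.start()

# Phase 2: block the main thread until each worker has finished.
for t in threads:
    t.join()

# Safe to read results now: join() has returned for every worker.
print(results)
```

The insertion order of `results` depends on which thread finishes first, but its contents do not.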

Longer answer:

    In [1]: len(list(range(500001, 500000*2, 100)))
    Out[1]: 5000

You're trying to start 5000 threads at once. It's miraculous your computer is still in one piece!

Your method of .join-ing inside the loop that dispatches the workers is never going to have more than 2 threads (i.e. only 1 worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer meltdown, but your code is going to be way slower than if you'd never used threading in the first place!
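To see the difference concretely, here is a small sketch that counts how many workers are alive at the same moment under each pattern. The helper names `run` and `worker` and the 0.2-second sleep standing in for real work are assumptions made only for illustration. Joining inside the dispatch loop caps the peak at one worker; starting all workers before joining lets them overlap.

```python
import threading
import time

def run(n_workers, join_inside_loop):
    """Start n_workers threads and return the peak number running at once."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker():
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.2)  # stand-in for real work
        with lock:
            state["active"] -= 1

    threads = []
    for _ in range(n_workers):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()
        if join_inside_loop:
            t.join()  # main thread blocks here, so workers run one at a time

    for t in threads:
        t.join()  # joining an already-finished thread returns immediately
    return state["peak"]

print("join inside loop:", run(5, join_inside_loop=True))
print("join after starting all:", run(5, join_inside_loop=False))
```

The first call always reports 1 and takes about 5 × 0.2 s; the second normally reports 5 and takes about 0.2 s.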

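At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need is to cap thread creation at a reasonable limit (i.e. more than one, less than 5000) with a thread pool. There are various ways to do this. You could roll your own; this is very simple with a threading.Semaphore. You could use 3.2+'s concurrent.futures package. You could use some 3rd-party solution. Up to you; each is going to have a different API, so I can't really discuss that further.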
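Two minimal sketches of the first two options, assuming a cap of 8 concurrent workers. `MAX_WORKERS`, `sum_to`, `run_with_semaphore`, and `run_with_pool` are illustrative names, not from the question. The semaphore version acquires a slot in the main thread before starting each worker and releases it when the worker finishes, so no more than `MAX_WORKERS` workers are ever running. The `concurrent.futures.ThreadPoolExecutor` version creates at most `max_workers` threads and hands tasks to them. The input range is shortened from the question's 5000 values to keep the example quick.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed cap: more than one, far fewer than 5000

def sum_to(value):
    """Add the integers 0..value (inclusive)."""
    total = 0
    for i in range(value + 1):
        total += i
    return total

# Option 1: roll your own limit with a semaphore.
def run_with_semaphore(values):
    slots = threading.Semaphore(MAX_WORKERS)
    results = {}
    threads = []

    def worker(v):
        try:
            results[v] = sum_to(v)
        finally:
            slots.release()  # free a slot so the main thread can start another worker

    for v in values:
        slots.acquire()  # blocks while MAX_WORKERS workers are already running
        t = threading.Thread(target=worker, args=(v,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return [results[v] for v in values]

# Option 2: let concurrent.futures (Python 3.2+) manage a fixed pool of threads.
def run_with_pool(values):
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # map() yields results in input order; leaving the with-block waits for all tasks.
        return list(pool.map(sum_to, values))

values = range(1, 5001, 100)  # 50 small inputs
print(run_with_pool(values)[:3])  # [1, 5151, 20301]
```

Either way the number of live threads stays bounded. Because `sum_to` is CPU-bound, neither version will be faster than a plain loop in CPython; that is the GIL issue set aside above.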


Obligatory GIL discussion

CPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing Python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O-bound tasks, such as retrieving a bunch of URLs.
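A rough timing sketch of the I/O-bound case, assuming `time.sleep` as a stand-in for waiting on the network. No real URLs are fetched, and `fake_fetch` and `timed` are hypothetical helpers. While a thread sleeps (as while it waits on a socket), the GIL is released, so ten threads can wait at the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(i):
    """Hypothetical I/O-bound task: sleep as if waiting on a network response."""
    time.sleep(0.1)  # the GIL is released while sleeping
    return i

def timed(fn):
    """Return how many seconds fn() took to run."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

tasks = range(10)

sequential = timed(lambda: [fake_fetch(i) for i in tasks])

def threaded_run():
    with ThreadPoolExecutor(max_workers=10) as pool:
        list(pool.map(fake_fetch, tasks))

threaded = timed(threaded_run)

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Expect roughly 1 s sequentially versus roughly 0.1 s threaded. Replacing the sleep with a pure-Python loop would erase that gap.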

multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes. This isn't free: data transfer between processes is expensive, and a lot of care needs to be taken not to write workers that depend on shared state.
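A minimal sketch of the multiprocessing route for the question's CPU-bound sums, using `concurrent.futures.ProcessPoolExecutor` from the standard library. `sum_to`, `sum_all`, `max_workers=4`, and the shortened input list are illustrative assumptions. The `if __name__ == "__main__":` guard is needed on platforms that start workers by re-importing the main module (the "spawn" start method, the default on Windows and macOS). Each task sends only an integer to a worker and gets an integer back, which keeps inter-process data transfer small and avoids shared state.

```python
from concurrent.futures import ProcessPoolExecutor

def sum_to(value):
    """CPU-bound work: add the integers 0..value (inclusive)."""
    total = 0
    for i in range(value + 1):
        total += i
    return total

def sum_all(values, max_workers=4):
    # Each task runs in a separate process with its own interpreter and its own GIL,
    # so the sums can run on several cores in parallel.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sum_to, values))

if __name__ == "__main__":
    values = list(range(500001, 500000 * 2, 50000))  # 10 inputs instead of 5000
    results = sum_all(values)
    # Check against the closed form n * (n + 1) / 2.
    assert results == [v * (v + 1) // 2 for v in values]
    print(len(results), "sums computed")
```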

