multithreading - When, why, and how to call thread.join() in Python?
I have some Python threading code.
    import threading

    def sum(value):
        sum = 0
        for i in range(value + 1):
            sum += i
        print("I'm done with %d - %d\n" % (value, sum))
        return sum

    r = range(500001, 500000*2, 100)

    ts = []
    for u in r:
        t = threading.Thread(target=sum, args=(u,))
        ts.append(t)
        t.start()

    for t in ts:
        t.join()
When I execute this, I have hundreds of threads working.
However, when I move t.join() right after t.start(), I have only 2 threads working.
    for u in r:
        t = threading.Thread(target=sum, args=(u,))
        ts.append(t)
        t.start()
        t.join()
I also tested the code without invoking t.join() at all, and it seems to work fine?
So when, why, and how should I use thread.join()?
Short answer: this one:
    for t in ts:
        t.join()
is generally the idiomatic way to start a small number of threads. Calling .join means that your main thread waits until the given thread finishes before proceeding in execution. You generally do this after you've started all of the threads.
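To make the pattern concrete, here's a minimal sketch with just a handful of threads. The worker name `triangular`, the `results` dict, the `run_all` helper and the input values are made up for illustration; they aren't from the question.

```python
import threading

results = {}  # filled in by the workers, keyed by input value


def triangular(value):
    """Sum the integers 0..value and record the result."""
    results[value] = sum(range(value + 1))


def run_all(values):
    threads = [threading.Thread(target=triangular, args=(v,)) for v in values]
    # Start every thread first so they can all run concurrently...
    for t in threads:
        t.start()
    # ...and only then wait for each of them to finish.
    for t in threads:
        t.join()
    return threads


threads = run_all([10, 100, 1000])
# After the joins, every worker is guaranteed to have finished.
print(results)
```

Without the second loop, `results` might still be incomplete when it is printed, which is the practical reason to join before using whatever the workers produced.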
Longer answer:
    In [1]: len(list(range(500001, 500000*2, 100)))
    Out[1]: 5000
You're trying to start 5000 threads at once. It's miraculous that your computer is still in one piece!
Your method of .join-ing in the loop that dispatches workers is never going to be able to have more than 2 threads (i.e. only one worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer meltdown, but your code is going to be way slower than if you'd just never used threading in the first place!
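If you want to see the difference for yourself, here's a rough sketch that counts how many workers overlap in each arrangement. The names `worker` and `peak_workers`, the counters, and the 0.05 s sleep standing in for real work are all assumptions for the demo.

```python
import threading
import time

lock = threading.Lock()
active = 0   # workers currently running
peak = 0     # highest value `active` has reached


def worker():
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # stand-in for real work
    with lock:
        active -= 1


def peak_workers(join_in_loop, count=5):
    """Run `count` workers and report how many overlapped at most."""
    global active, peak
    active = peak = 0
    threads = []
    for _ in range(count):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()
        if join_in_loop:
            t.join()  # blocks until this worker is done
    for t in threads:
        t.join()  # harmless for threads that already finished
    return peak


print("join inside the loop:", peak_workers(join_in_loop=True))
print("join after the loop: ", peak_workers(join_in_loop=False))
```

With the join inside the loop the peak is always 1, because each worker has finished before the next one is started. With the joins after the loop you should normally see several workers overlapping.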
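At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need in order to limit your thread creation to a reasonable number (i.e. more than one, less than 5000) is a thread pool. There are various ways to do this. You could roll your own, which is fairly simple with a threading.Semaphore. You could use the concurrent.futures package from Python 3.2+. You could use some 3rd party solution. It's up to you, but each is going to have a different API, so I can't really discuss that further.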
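For what it's worth, here's a rough sketch of the first two options side by side. `MAX_WORKERS`, `triangular`, `with_executor` and `with_semaphore` are names invented for this example, the limit of 8 is an arbitrary assumption, and the inputs are much smaller than in the question so it finishes quickly.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed limit; tune for your machine and workload


def triangular(value):
    return sum(range(value + 1))


def with_executor(values):
    # The executor keeps at most MAX_WORKERS threads alive and reuses them.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(triangular, values))


def with_semaphore(values):
    # Hand-rolled alternative: a semaphore caps how many workers run at once.
    slots = threading.Semaphore(MAX_WORKERS)
    results = {}

    def worker(value):
        try:
            results[value] = triangular(value)
        finally:
            slots.release()  # free the slot even if the work raised

    threads = []
    for value in values:
        slots.acquire()  # blocks while MAX_WORKERS workers are busy
        t = threading.Thread(target=worker, args=(value,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return [results[v] for v in values]


values = range(1000, 2000, 100)
print(with_executor(values) == with_semaphore(values))
```

The semaphore version still creates one thread per task, it just never lets more than `MAX_WORKERS` of them work at the same time. The executor reuses a fixed set of threads, which is usually the less fiddly choice.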
Obligatory GIL Discussion
CPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing Python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O-bound tasks, such as retrieving a bunch of URLs.
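Here's a small sketch of the I/O-bound case. It assumes `time.sleep` is a fair stand-in for waiting on a network response (both release the GIL while blocked); `fake_download`, `timed`, `sequential` and `threaded` are illustrative names, and no real URLs are fetched.

```python
import threading
import time


def fake_download(delay=0.2):
    # time.sleep releases the GIL, much like waiting on a socket does.
    time.sleep(delay)


def timed(func):
    """Return how many seconds a call to func() takes."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def sequential(n=4):
    for _ in range(n):
        fake_download()


def threaded(n=4):
    threads = [threading.Thread(target=fake_download) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


seq_time = timed(sequential)
thr_time = timed(threaded)
print("sequential: %.2fs, threaded: %.2fs" % (seq_time, thr_time))
```

The threaded run should take roughly as long as one wait instead of four, because the waits overlap. Swap the sleep for a pure-Python loop that adds numbers and you shouldn't expect that gain, for the reasons above.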
multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes. This isn't free: data transfer between processes is expensive, so a lot of care needs to be taken not to write workers that depend on shared state.
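A minimal sketch of what that might look like with `multiprocessing.Pool`, assuming a worker that depends only on its argument. `triangular` and `parallel_sums` are hypothetical names, and two processes is an arbitrary choice.

```python
import multiprocessing


def triangular(value):
    # Pure function of its argument: no shared state to coordinate.
    return sum(range(value + 1))


def parallel_sums(values, processes=2):
    # Arguments and results are pickled and sent between processes,
    # which is where the data-transfer cost comes from.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(triangular, values)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import.
    print(parallel_sums([10, 100, 1000]))
```

Because each worker only receives a number and returns a number, there's nothing to share and very little to transfer. Workers that need large inputs or a common mutable structure are where the cost starts to bite.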