Calculate the performance of a multicore architecture?
Consider a multicore architecture consisting of 10 computing cores: 2 processor cores and 8 coprocessors. Each processor core can deliver 2.0 GFLOPS, while each coprocessor can deliver 1.0 GFLOPS. All computing cores can perform calculations simultaneously. Any instruction can execute on either the processor or the coprocessor cores unless there are explicit restrictions.
If 70% of the dynamic instructions in an application are parallelizable, what is the maximum average performance (FLOPS) you can get in the optimal situation? Please note that the remaining 30% of the instructions can be executed only after the execution of the parallel 70% is over.
Now consider that the application's dynamic instructions can be partitioned into 6 groups (A, B, C, D, E, F) with the following dependencies. For example, A --> C implies that the instructions in A need to be completed before execution of the instructions in C can start. Each of the first 4 groups (A, B, C, D) contains 20% of the dynamic instructions, whereas each of the remaining 2 groups (E, F) contains 10% of the dynamic instructions. The instructions in each group must be executed sequentially on the same processor or coprocessor core. How should you schedule them on the multicore architecture to achieve the best possible performance? What is the maximum average performance (FLOPS) now?
A (20%) --> C (20%) --> E (10%) --> F (10%)
B (20%) --> D (20%) --> E
For the first part, you need to use Amdahl's law, which is:
max speed-up = 1/(1-p+p/n)
where p is the parallelizable fraction and n is the improvement factor in executing the parallel portion.
(Note that the Amdahl's law formula can also be used for first-order estimates of other types of changes. E.g., given a factor-of-n reduction in ALU energy use, with p as the fraction of energy used by the ALU, one can find the improvement in total energy use.)
In this case, since the serial portion is executed on a higher-performance (2 GFLOPS) processor core, n is 6 ([8 coprocessor cores * 1 GFLOPS/core + 2 processor cores * 2 GFLOPS/core] / 2 GFLOPS per processor core).
A quick calculation, 1/(0.3 + 0.7/6) = 1/0.4167, shows that the max speed-up is 2.4 relative to 1 processor core. The maximum FLOPS is therefore the speed-up times the speed at which the whole program would execute serially on 1 processor core, i.e., 2.4 * 2 GFLOPS = 4.8 GFLOPS.
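As a quick check on the arithmetic for the first part, here is a minimal Python sketch that evaluates Amdahl's law with the numbers from the problem. The function name `amdahl_speedup` and the constant names are illustrative choices and do not come from the original answer.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Amdahl's law: max speed-up = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Core counts and per-core throughput from the problem statement (GFLOPS).
PROC_CORES, PROC_GFLOPS = 2, 2.0
COPROC_CORES, COPROC_GFLOPS = 8, 1.0

# Aggregate throughput when every core is busy during the parallel phase.
total_gflops = PROC_CORES * PROC_GFLOPS + COPROC_CORES * COPROC_GFLOPS  # 12.0

# The serial 30% runs on one (faster) processor core, which is the baseline.
n = total_gflops / PROC_GFLOPS        # 6.0
speedup = amdahl_speedup(0.7, n)      # 1 / (0.3 + 0.7/6)
max_gflops = speedup * PROC_GFLOPS    # speed-up times single-core speed

print(f"n = {n}, speed-up = {speedup:.2f}, max = {max_gflops:.2f} GFLOPS")
```

The baseline is a single 2 GFLOPS processor core rather than a coprocessor because the serial 30% should run on the fastest available core.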
For the second part, note that there are 2 independent instruction streams: A -> C and B -> D. Since the system has 2 processor cores, both streams can be executed in parallel on the higher-performance processor cores. Furthermore, because both streams have the same amount of work (40% of the total each) and run on cores with the same performance, they complete at the same time.
Since E depends on the results of both C and D, it can start only after both finish. E and F then execute on a processor core (which one is arbitrary, since E must wait for the tasks running on both processor cores to complete).
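The schedule can also be checked without Amdahl's law by replaying it as a timeline. In this minimal Python sketch, each group starts once its core is idle and all of its predecessors have finished. It assumes the total work is normalized to 1 GFLOP, so times come out in seconds. The core labels `P0`, `P1` (processor cores) and `X0` (one coprocessor) are hypothetical names used for illustration. The `worse` plan is an extra comparison and is not part of the original answer.

```python
# Work per group as a fraction of the total instruction count (from the problem).
WORK = {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2, "E": 0.1, "F": 0.1}
# DEPS[g] lists the groups that must finish before g may start.
DEPS = {"A": [], "B": [], "C": ["A"], "D": ["B"], "E": ["C", "D"], "F": ["E"]}
# Throughput of each core in GFLOPS; P0/P1/X0 are hypothetical labels.
CORE_GFLOPS = {"P0": 2.0, "P1": 2.0, "X0": 1.0}
# A topological order of the groups; each core runs its groups in this order.
TOPO_ORDER = ["A", "B", "C", "D", "E", "F"]


def makespan(assignment):
    """Finish time of the last group, with total work normalized to 1 GFLOP."""
    finish = {}
    core_free = {core: 0.0 for core in CORE_GFLOPS}
    for g in TOPO_ORDER:
        core = assignment[g]
        # A group starts once its core is idle and all predecessors are done.
        start = max([core_free[core]] + [finish[d] for d in DEPS[g]])
        finish[g] = start + WORK[g] / CORE_GFLOPS[core]
        core_free[core] = finish[g]
    return max(finish.values())


# Schedule from the text: A->C on one processor core, B->D on the other,
# then E->F on either processor core (P0 here).
best = {"A": "P0", "C": "P0", "B": "P1", "D": "P1", "E": "P0", "F": "P0"}
# For comparison only: the B->D chain moved to a 1 GFLOPS coprocessor.
worse = dict(best, B="X0", D="X0")

for name, plan in (("best", best), ("worse", worse)):
    t = makespan(plan)
    print(f"{name}: makespan = {t:.3f} s, average = {1 / t:.2f} GFLOPS")
```

Under these assumptions, the schedule from the text finishes in 0.3 s, which is about 3.33 GFLOPS on average. Moving the B -> D chain to a coprocessor stretches the run to 0.5 s, because E has to wait for the slower chain.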
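As you can see, 80% of the program (40% A+C; 40% B+D) can be parallelized by a factor of 2, and the remaining 20% (E+F) is serial. Plugging these numbers into Amdahl's law (p = 0.8, n = 2) gives a max speed-up of 1/(0.2 + 0.8/2) ≈ 1.67 relative to 1 processor core, i.e., a maximum of about 1.67 * 2 GFLOPS ≈ 3.33 GFLOPS.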
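Plugging p = 0.8 and n = 2 into the formula can be sketched in Python as follows. As before, the function and constant names are illustrative and not taken from the original answer.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Amdahl's law: max speed-up = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

PROC_GFLOPS = 2.0  # throughput of one processor core (serial baseline)

# 80% of the work (A+C and B+D) runs on 2 equal processor cores; E+F is serial.
speedup = amdahl_speedup(0.8, 2)    # 1 / (0.2 + 0.4)
max_gflops = speedup * PROC_GFLOPS  # speed-up times single-core speed

print(f"speed-up = {speedup:.2f}, max = {max_gflops:.2f} GFLOPS")
```

Here n = 2 because only the two equal processor cores are used. The coprocessors do not help, since each group must run sequentially on a single core and the processor cores are the faster ones.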