c++ - openmp: recursive task slower than sequential even with depth limit -
i have big binary tree on want expensive calculations. want parallelize these methods using openmp , task pragmas.
as test, have parallelized freetree
function , have created big example test tree. in order prevent spawning many tasks, have limited tasks creation top 2 levels of tree. 4 tasks created.
below minimal working example:
#include <chrono> #include <iostream> class node { public: int data; node* l; node* r; node(node* left, node* right) : l(left), r(right) {} }; node* createrandomtree(int depth) { if (depth == 0) return new node(null, null); return new node(createrandomtree(depth - 1), createrandomtree(depth - 1)); } void freetree(node* tree) { if (tree == null) return; freetree(tree->l); freetree(tree->r); delete tree; } void freetreepar(node* tree, int n = 0) { if (tree == null) return; node *l = tree->l, *r = tree->r; if (n < 2) { #pragma omp task freetreepar(l, n + 1); #pragma omp task freetreepar(r, n + 1); } else { freetree(tree->l); freetree(tree->r); } // taskwait not necessary delete tree; } int main(int argc, char const *argv[]) { std::chrono::time_point<std::chrono::system_clock> start, end; node* tree = createrandomtree(22); start = std::chrono::system_clock::now(); #pragma omp parallel shared(tree) { #pragma omp single nowait freetreepar(tree); } end = std::chrono::system_clock::now(); std::chrono::duration<double> elapsed_seconds = end-start; std::time_t end_time = std::chrono::system_clock::to_time_t(end); std::cout << "finished computation @ " << std::ctime(&end_time) << "elapsed time: " << elapsed_seconds.count() << "s\n"; return 0; }
when run code, takes 0.38 seconds free tree. however, if call freetree(root)
directly, takes 0.2 seconds. though tree big (2^22 elements) , though in particular test case tasks of equal size, cannot performance increase using parallel methods.
am doing wrong? know how can improve code? thanks!
some tasks not parallelizable, because resources accessible 1 thread @ time (thread safety). , case of dynamic memory.
malloc/free thread safe.
=> lock performed around each malloc/free.
so, cannot improve kind of code.
Comments
Post a Comment