c++ - openmp: recursive task slower than sequential even with depth limit -


i have big binary tree on want expensive calculations. want parallelize these methods using openmp , task pragmas.

as test, have parallelized freetree function , have created big example test tree. in order prevent spawning many tasks, have limited tasks creation top 2 levels of tree. 4 tasks created.

below minimal working example:

#include <chrono> #include <iostream>  class node { public:     int data;     node* l;     node* r;      node(node* left, node* right) : l(left), r(right) {} };  node* createrandomtree(int depth) {     if (depth == 0)         return new node(null, null);      return new node(createrandomtree(depth - 1), createrandomtree(depth - 1)); }  void freetree(node* tree) {     if (tree == null) return;      freetree(tree->l);     freetree(tree->r);      delete tree; }  void freetreepar(node* tree, int n = 0) {     if (tree == null) return;      node *l = tree->l, *r = tree->r;      if (n < 2) {         #pragma omp task         freetreepar(l, n + 1);         #pragma omp task         freetreepar(r, n + 1);     } else {         freetree(tree->l);         freetree(tree->r);     }      // taskwait not necessary     delete tree; }  int main(int argc, char const *argv[]) {     std::chrono::time_point<std::chrono::system_clock> start, end;      node* tree = createrandomtree(22);     start = std::chrono::system_clock::now();      #pragma omp parallel shared(tree)     {         #pragma omp single nowait         freetreepar(tree);     }     end = std::chrono::system_clock::now();      std::chrono::duration<double> elapsed_seconds = end-start;     std::time_t end_time = std::chrono::system_clock::to_time_t(end);      std::cout << "finished computation @ " << std::ctime(&end_time)               << "elapsed time: " << elapsed_seconds.count() << "s\n";      return 0; } 

when run code, takes 0.38 seconds free tree. however, if call freetree(root) directly, takes 0.2 seconds. though tree big (2^22 elements) , though in particular test case tasks of equal size, cannot performance increase using parallel methods.

am doing wrong? know how can improve code? thanks!

some tasks not parallelizable, because resources accessible 1 thread @ time (thread safety). , case of dynamic memory.

malloc/free thread safe.

=> lock performed around each malloc/free.

so, cannot improve kind of code.


Comments

Popular posts from this blog

html - Sizing a high-res image (~8MB) to display entirely in a small div (circular, diameter 100px) -

java - IntelliJ - No such instance method -

identifier - Is it possible for an html5 document to have two ids? -