c++ - Is there a penalty for using static variables in C++11?


In C++11, this:

    const std::vector<int>& f() {
        static const std::vector<int> x { 1, 2, 3 };
        return x;
    }

is thread-safe. However, is there a penalty for calling the function after the first time (i.e. once it is initialized) due to this thread-safety guarantee? I am wondering whether the function will be slower than one using a global variable, because it has to acquire a mutex to check whether it's being initialized by another thread every time it is called, or something.

"The best intuition I ever had was 'I should measure this.'" So let's find out:

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <type_traits>
    #include <vector>

    namespace {
    class Timer {
        using hrc = std::chrono::high_resolution_clock;
        hrc::time_point start;

        static hrc::time_point now() {
          // Prevent memory operations from reordering across the
          // time measurement. This is likely overkill, needs more
          // research to determine the correct fencing.
          std::atomic_thread_fence(std::memory_order_seq_cst);
          auto t = hrc::now();
          std::atomic_thread_fence(std::memory_order_seq_cst);
          return t;
        }

    public:
        Timer() : start(now()) {}

        hrc::duration elapsed() const {
          return now() - start;
        }

        template <typename Duration>
        typename Duration::rep elapsed() const {
          return std::chrono::duration_cast<Duration>(elapsed()).count();
        }

        template <typename Rep, typename Period>
        Rep elapsed() const {
          return elapsed<std::chrono::duration<Rep,Period>>();
        }
    };

    // Function-local static: initialized on first call.
    const std::vector<int>& f() {
        static const auto x = std::vector<int>{ 1, 2, 3 };
        return x;
    }

    // Namespace-scope static: initialized before main().
    static const auto y = std::vector<int>{ 1, 2, 3 };
    const std::vector<int>& g() {
        return y;
    }

    const unsigned long long n_iterations = 500000000;

    template <typename Func>
    void test_one(const char* name, Func f) {
      f(); // First call outside the timer.

      using value_type = typename std::decay<decltype(f()[0])>::type;
      std::cout << name << ": " << std::flush;

      auto t = Timer{};
      auto sum = std::uint64_t{};
      for (auto i = n_iterations; i > 0; --i) {
        const auto& vec = f();
        sum += std::accumulate(begin(vec), end(vec), value_type{});
      }
      const auto elapsed = t.elapsed<std::chrono::milliseconds>();
      std::cout << elapsed << " ms (" << sum << ")\n";
    }
    } // anonymous namespace

    int main() {
      test_one("local static", f);
      test_one("global static", g);
    }

Running at Coliru, the local version does 5e8 iterations in 4618 ms, the global version in 4392 ms. So yes, the local version is slower by approximately 0.452 nanoseconds per iteration. Although there's a measurable difference, it's too small to impact observed performance in most situations.


Edit: As an interesting counterpoint, switching from clang++ to g++ changes the result ordering. The g++-compiled binary runs in 4418 ms (global) vs. 4181 ms (local), so local is faster by 474 picoseconds per iteration. It does nonetheless reaffirm the conclusion that the variance between the two methods is small.

Edit 2: After examining the generated assembly, I decided to convert from function pointers to function objects for better inlining. Timing indirect calls through function pointers isn't really characteristic of the code in the OP. So I used this program:

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <type_traits>
    #include <vector>

    namespace {
    class Timer {
        using hrc = std::chrono::high_resolution_clock;
        hrc::time_point start;

        static hrc::time_point now() {
          // Prevent memory operations from reordering across the
          // time measurement. This is likely overkill.
          std::atomic_thread_fence(std::memory_order_seq_cst);
          auto t = hrc::now();
          std::atomic_thread_fence(std::memory_order_seq_cst);
          return t;
        }

    public:
        Timer() : start(now()) {}

        hrc::duration elapsed() const {
          return now() - start;
        }

        template <typename Duration>
        typename Duration::rep elapsed() const {
          return std::chrono::duration_cast<Duration>(elapsed()).count();
        }

        template <typename Rep, typename Period>
        Rep elapsed() const {
          return elapsed<std::chrono::duration<Rep,Period>>();
        }
    };

    // Function object returning a function-local static.
    class F {
    public:
        const std::vector<int>& operator()() {
            static const auto x = std::vector<int>{ 1, 2, 3 };
            return x;
        }
    };

    // Function object returning a static data member ("global").
    class G {
        static const std::vector<int> x;
    public:
        const std::vector<int>& operator()() {
            return x;
        }
    };

    const std::vector<int> G::x{ 1, 2, 3 };

    const unsigned long long n_iterations = 500000000;

    template <typename Func>
    void test_one(const char* name, Func f) {
      f(); // First call outside the timer.

      using value_type = typename std::decay<decltype(f()[0])>::type;
      std::cout << name << ": " << std::flush;

      auto t = Timer{};
      auto sum = std::uint64_t{};
      for (auto i = n_iterations; i > 0; --i) {
        const auto& vec = f();
        sum += std::accumulate(begin(vec), end(vec), value_type{});
      }
      const auto elapsed = t.elapsed<std::chrono::milliseconds>();
      std::cout << elapsed << " ms (" << sum << ")\n";
    }
    } // anonymous namespace

    int main() {
      test_one("local static", F());
      test_one("global static", G());
    }

Not surprisingly, the runtimes were faster under both g++ (3803 ms local, 2323 ms global) and clang (4183 ms local, 3253 ms global). The results affirm our intuition that the global technique should be faster than the local one, with deltas of 2.96 nanoseconds (g++) and 1.86 nanoseconds (clang) per iteration.
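One way to picture why the local static can cost a little extra on every call: each call has to check whether initialization has already happened, while a global is simply read. The sketch below writes such a guard by hand using double-checked locking. It is only an illustration under assumptions, not what any particular compiler actually emits, and the names `f_by_hand`, `g_instance`, `g_init_mutex` and `g_init_count` are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Hypothetical hand-written stand-in for a thread-safe function-local static.
std::atomic<const std::vector<int>*> g_instance{nullptr};  // null until initialized
std::mutex g_init_mutex;                                    // used only on the slow path
int g_init_count = 0;  // how many times construction ran (for demonstration only)

const std::vector<int>& f_by_hand() {
    // Fast path: one acquire load and a branch, no lock taken.
    const std::vector<int>* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        // Slow path: serialize threads that race to initialize.
        std::lock_guard<std::mutex> lock(g_init_mutex);
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            // Intentionally never freed in this sketch.
            p = new std::vector<int>{1, 2, 3};
            ++g_init_count;
            g_instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

In this sketch, every call after the first pays only for the load and the branch on the fast path; the mutex is touched only while the object has not been constructed yet. That would be consistent with a per-call difference on the order of a nanosecond or two, though the measured numbers depend on the compiler and on inlining.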

