An adaptive clock management scheme exploiting instruction-based dynamic timing slack for general-purpose graphic processor unit with deep pipeline and out-of-order execution