Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor