Competing speaker count estimation on the fusion of the spectral and spatial embedding space