NCPU: An embedded neural CPU architecture on resource-constrained low power devices for real-time end-to-end performance