Although it may be out of scope, it would be nice to have an example of computing with 4-bit and 8-bit tensors to save memory bandwidth.
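As a rough sketch of what such an example might cover, here is plain Python with no framework. It assumes symmetric per-tensor quantization (one common scheme, chosen here for illustration). The helper names are hypothetical and not any library's API. The sketch only shows the storage side: 8-bit values take a quarter of the bytes of float32, and 4-bit values packed two per byte take an eighth. Real kernels would also compute directly on the packed data.

```python
import struct

def quantize_int8(values):
    """Hypothetical helper: symmetric int8 quantization with one scale per tensor."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    # Store each signed value as one unsigned byte (two's complement).
    return bytes(x & 0xFF for x in q), scale

def dequantize_int8(data, scale):
    # Reinterpret each byte as signed, then rescale to float.
    return [((b - 256) if b > 127 else b) * scale for b in data]

def quantize_int4_packed(values):
    """Hypothetical helper: symmetric int4 quantization, two values per byte."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 7.0
    q = [max(-7, min(7, round(v / scale))) for v in values]
    if len(q) % 2:
        q.append(0)  # pad to an even count so every byte holds two nibbles
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        # Low nibble holds the first value, high nibble the second.
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed), scale

def dequantize_int4_packed(data, scale, n):
    out = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            signed = nibble - 16 if nibble > 7 else nibble  # sign-extend 4 bits
            out.append(signed * scale)
    return out[:n]  # drop the padding value, if any

# Usage: compare storage sizes for 1024 values.
values = [0.5, -1.0, 0.25, 0.75] * 256
fp32_bytes = struct.pack(f"<{len(values)}f", *values)
int8_bytes, s8 = quantize_int8(values)
int4_bytes, s4 = quantize_int4_packed(values)
print(len(fp32_bytes), len(int8_bytes), len(int4_bytes))
```

The trade-off is precision: each dequantized value can be off by up to half the scale, which is why 4-bit schemes often use finer-grained (for example per-block) scales rather than one scale per tensor.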