Motivation
When benchmarking or running inference on the QNN NPU (and other EPs), runtime EP options can significantly affect latency, independently of the build-time quantization config. For example, on QNN HTP:
| EP option | Affects compile | Affects runtime | Values |
|---|---|---|---|
| `htp_performance_mode` | ❌ (no-op) | ✅ clock governor | `burst`, `high_performance`, `balanced`, `low_power`, `default` |
| `htp_graph_finalization_optimization_mode` | ✅ changes compiled graph | ✅ | |
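To make the runtime effect concrete, here is a minimal sketch of passing these options to ONNX Runtime per session and timing each setting. The helper names `qnn_provider_options` and `time_mode` are hypothetical, the `backend_path` value is an assumption about the target platform, and the mode names are copied from the table as listed, so they should be checked against the QNN EP documentation before use.

```python
import time
from typing import Dict

# Values listed in the table for htp_performance_mode (copied as-is).
HTP_PERFORMANCE_MODES = ("burst", "high_performance", "balanced", "low_power", "default")


def qnn_provider_options(performance_mode: str, backend_path: str = "QnnHtp.dll") -> Dict[str, str]:
    """Build a QNN EP options dict for one performance mode (hypothetical helper)."""
    if performance_mode not in HTP_PERFORMANCE_MODES:
        raise ValueError(f"unknown htp_performance_mode: {performance_mode!r}")
    return {
        # Assumption: the HTP backend library is selected via backend_path.
        "backend_path": backend_path,
        "htp_performance_mode": performance_mode,
    }


def time_mode(model_path: str, feeds: dict, performance_mode: str, runs: int = 20) -> float:
    """Return the mean latency in milliseconds for one performance mode."""
    # Imported lazily: this needs an onnxruntime build with the QNN EP enabled.
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path,
        providers=[("QNNExecutionProvider", qnn_provider_options(performance_mode))],
    )
    session.run(None, feeds)  # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feeds)
    return (time.perf_counter() - start) * 1000.0 / runs
```

Calling `time_mode` once per entry in `HTP_PERFORMANCE_MODES` with the same model and inputs would give a per-mode latency comparison that is independent of how the model was quantized.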