Qualcomm® AI Engine Direct User Manual (16)
6.2 Model Preparation
Quantization Support
Quantization is supported through the converter interfaces and is performed at conversion time. The only option required to enable quantization during conversion is --input_list, which provides the quantizer with the input data required for the given model (an example input list file is shown after the options below). Each of the converters listed above provides the following options to enable and configure quantization:
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
--input_list INPUT_LIST
Path to a file specifying the input data. This file
should be a plain text file, containing one or more
absolute file paths per line. Each path is expected to
point to a binary file containing one input in the
"raw" format, ready to be consumed by the quantizer
without any further preprocessing. Multiple files per
line separated by spaces indicate multiple inputs to
the network. See documentation for more details. Must
be specified for quantization. All subsequent
quantization options are ignored when this is not
provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The
available optimization algorithms are: "cle" - Cross
layer equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
--bias_bw BIAS_BW Use the --bias_bw option to select the bitwidth to use
when quantizing the biases, either 8 (default) or 32.
--act_bw ACT_BW Use the --act_bw option to select the bitwidth to use
when quantizing the activations, either 8 (default) or
16.
--weight_bw WEIGHT_BW
Use the --weight_bw option to select the bitwidth to
use when quantizing the weights, currently only 8 bit
(default) supported.
--float_bias_bw FLOAT_BIAS_BW
Use the --float_bias_bw option to select the bitwidth to
use when biases are in float, either 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings. Note: Cannot use
--ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
Use per-channel quantization for
convolution-based op weights. Note: This will replace
built-in model QAT encodings when used for a given
weight. Usage "--use_per_channel_quantization" to
enable or "--use_per_channel_quantization false"
(default) to disable
--use_per_row_quantization [USE_PER_ROW_QUANTIZATION [USE_PER_ROW_QUANTIZATION ...]]
Use this option to enable rowwise quantization of Matmul and
FullyConnected op. Usage "--use_per_row_quantization" to enable
or "--use_per_row_quantization false" (default) to
disable. This option may not be supported by all backends.
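For reference, the input list named by --input_list is a plain text file of raw input paths, as described above. A minimal sketch for a single-input network, with one calibration sample per line (the paths are hypothetical), might look like this:
    /data/calib/input_0001.raw
    /data/calib/input_0002.raw
    /data/calib/input_0003.raw
For a network with several inputs, each line would instead carry space-separated paths, one per input tensor.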
The basic command-line usage for converting and quantizing a model with the TF converter is as follows:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
-d <network_input_name> <dims>
--out_node <network_output_name>
-o <optional_output_path>
--allow_unconsumed_nodes # optional, but most likely will be needed for larger models
-p <optional_package_name> # Defaults to "qti.aisw"
--input_list input_list.txt
This quantizes the network using the default quantizers and default bitwidths (8-bit activations, weights, and biases).
For more details on quantization, its options, and the available algorithms, refer to the Quantization section.
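The defaults can be adjusted with the quantizer options listed above. As a sketch, reusing the placeholders from the basic example, quantizing with the enhanced parameter quantizer and 16-bit activations might look like this:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb \
                           -d <network_input_name> <dims> \
                           --out_node <network_output_name> \
                           --input_list input_list.txt \
                           --param_quantizer enhanced \
                           --act_bw 16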
qnn-model-lib-generator
Note
This is intended for developers who want to run the model preparation tools on a Windows PC or on a Qualcomm device running Windows.
For use on a native Windows PC, qnn-model-lib-generator is located in the SDK under /bin/x86_64-windows-msvc.
For developers who want to run qnn-model-lib-generator on a device running Windows, it is located under /bin/aarch64-windows-msvc.
qnn-model-lib-generator will try to use the CMake command available on your platform to generate the libraries.
Please make sure the build tools (the Windows platform build tools) are installed so that CMake works on Windows.
The qnn-model-lib-generator tool compiles QNN model source code into artifacts for specific targets.
usage: qnn-model-lib-generator [-h] [-c <QNN_MODEL>.cpp] [-b <QNN_MODEL>.bin]
[-t LIB_TARGETS ] [-l LIB_NAME] [-o OUTPUT_DIR]
Script compiles provided Qnn Model artifacts for specified targets.
Required argument(s):
-c <QNN_MODEL>.cpp Filepath for the qnn model .cpp file
optional argument(s):
-b <QNN_MODEL>.bin Filepath for the qnn model .bin file
(Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)
-t LIB_TARGETS Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
-l LIB_NAME Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
else generic qnn_model.so
-o OUTPUT_DIR Location for saving output libraries.
Note
For Windows users, please run this tool with python3.
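As a minimal sketch (the model file names and the output directory are hypothetical), compiling a converted model for an Android target might look like this:
$ qnn-model-lib-generator -c qnn_model.cpp \
                          -b qnn_model.bin \
                          -t aarch64-android \
                          -o model_libs
On Windows, the same invocation would be prefixed with python3, as noted above.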
qnn-op-package-generator
The qnn-op-package-generator tool is used to generate the skeleton code of a QNN op package from an XML config file that describes the package attributes. The tool creates the package as a directory containing the skeleton source code and makefiles, which can be compiled to produce a shared library object.
usage: qnn-op-package-generator [-h] --config_path CONFIG_PATH [--debug]
[--output_path OUTPUT_PATH] [-f]
optional arguments:
-h, --help show this help message and exit
required arguments:
--config_path CONFIG_PATH, -p CONFIG_PATH
The path to a config file that defines a QNN Op
package(s).
optional arguments:
--debug Returns debugging information from generating the
package
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the package should be saved
-f, --force-generation
This option will delete the entire existing package
Note appropriate file permissions must be set to use
this option.
--converter_op_package, -cop
Generates Converter Op Package skeleton code needed
by the output shape inference for converters
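As a minimal sketch (the config file name and output path are hypothetical), generating the skeleton for an op package described in ExampleOpPackage.xml might look like this:
$ qnn-op-package-generator --config_path ExampleOpPackage.xml \
                           --output_path ./ExampleOpPackage
The generated directory can then be compiled with the provided makefiles to produce the op package shared library.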
qnn-context-binary-generator
The qnn-context-binary-generator tool is used to create a context binary for a particular backend from a model library built with qnn-model-lib-generator.
usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
--binary_file BINARY_FILE_NAME
[--model_prefix MODEL_PREFIX]
[--output_dir OUTPUT_DIRECTORY]
[--op_packages ONE_OR_MORE_OP_PACKAGES]
[--config_file CONFIG_FILE.json]
[--profiling_level PROFILING_LEVEL]
[--verbose] [--version] [--help]
REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the <qnn_model_name.so> file containing a QNN network.
To create a context binary with multiple graphs, use
comma-separated list of model.so files. The syntax is
<qnn_model_name_1.so>,<qnn_model_name_2.so>.
--backend <FILE> Path to a QNN backend .so library to create the context binary.
--binary_file <VAL> Name of the binary file to save the context binary to.
Saved in the same path as --output_dir option with .bin
as the binary file extension. If not provided, no backend binary
is created.
OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so> file
containing a QNN network. Default: QnnModel.
--output_dir <DIR> The directory to save output to. Defaults to ./output.
--op_packages <VAL> Provide a comma separated list of op packages
and interface providers to register. The syntax is:
op_package_path:interface_provider[,op_package_path:interface_provider...]
--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution.
3. backend: backend-specific profiling level specified
in the backend extension related JSON config file.
--profiling_option <VAL> Set profiling options:
1. optrace: Generates an optrace of the run.
--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions and
context priority. Please refer to SDK documentation
for more details.
--enable_intermediate_outputs Enable all intermediate nodes to be output along with
default outputs in the saved context.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
--set_output_tensors <VAL> Provide a comma-separated list of intermediate output tensor names, for which the outputs
will be written in addition to final graph output tensors.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1
--backend_binary <VAL> Name of the file to save a backend-specific context
binary to.
Saved in the same path as --output_dir option with .bin
as the binary file extension.
--log_level Specifies max logging level to be set. Valid settings:
"error", "warn", "info" and "verbose"
--input_output_tensor_mem_type <VAL> Specifies mem type to be used for input and output tensors during graph creation.
Valid settings:"raw" and "memhandle"
--version Print the QNN SDK version.
--help Show this help message.
For more details on the --op_packages and --config_file options, refer to the qnn-net-run section.
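As a minimal sketch (the file names here are hypothetical, and the backend library depends on the target; the HTP backend library shipped with the SDK is used as an example), serializing a model library into a context binary might look like this:
$ qnn-context-binary-generator --model libqnn_model.so \
                               --backend libQnnHtp.so \
                               --binary_file my_model_context \
                               --output_dir ./output
This would produce ./output/my_model_context.bin in the output directory.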