Qualcomm® AI Engine Direct User Manual (16)
6.2 Model Preparation
Quantization Support
Quantization is supported through the converter interfaces and is performed at conversion time. The only option required to enable quantization during conversion is --input_list, which provides the quantizer with the input data required for the given model (an example input list file is shown after the options below). Each of the converters listed above provides the following options to enable and configure quantization:
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
--input_list INPUT_LIST
Path to a file specifying the input data. This file
should be a plain text file, containing one or more
absolute file paths per line. Each path is expected to
point to a binary file containing one input in the
"raw" format, ready to be consumed by the quantizer
without any further preprocessing. Multiple files per
line separated by spaces indicate multiple inputs to
the network. See documentation for more details. Must
be specified for quantization. All subsequent
quantization options are ignored when this is not
provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The
available optimization algorithms are: "cle" - Cross
layer equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
--bias_bw BIAS_BW Use the --bias_bw option to select the bitwidth to use
when quantizing the biases, either 8 (default) or 32.
--act_bw ACT_BW Use the --act_bw option to select the bitwidth to use
when quantizing the activations, either 8 (default) or
16.
--weight_bw WEIGHT_BW
Use the --weight_bw option to select the bitwidth to
use when quantizing the weights, currently only 8 bit
(default) supported.
--float_bias_bw FLOAT_BIAS_BW
Use the --float_bias_bw option to select the bitwidth to
use when biases are in float, either 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings. Note: Cannot use
--ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
Use per-channel quantization for
convolution-based op weights. Note: This will replace
built-in model QAT encodings when used for a given
weight. Usage "--use_per_channel_quantization" to
enable or "--use_per_channel_quantization false"
(default) to disable
--use_per_row_quantization [USE_PER_ROW_QUANTIZATION [USE_PER_ROW_QUANTIZATION ...]]
Use this option to enable rowwise quantization of Matmul and
FullyConnected op. Usage "--use_per_row_quantization" to enable
or "--use_per_row_quantization false" (default) to
disable. This option may not be supported by all backends.
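For reference, the input list named by --input_list is a plain text file of raw input paths, as described above. A minimal sketch for a single-input network, with one calibration sample per line (the paths are hypothetical), might look like this:
    /data/calib/input_0001.raw
    /data/calib/input_0002.raw
    /data/calib/input_0003.raw
For a network with several inputs, each line would instead carry space-separated paths, one per input tensor.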
The basic command-line usage for converting and quantizing a model with the TF converter is as follows:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
-d <network_input_name> <dims>
--out_node <network_output_name>
-o <optional_output_path>
--allow_unconsumed_nodes # optional, but most likely will be needed for larger models
-p <optional_package_name> # Defaults to "qti.aisw"
--input_list input_list.txt
This quantizes the network using the default quantizers and default bitwidths (8-bit activations, weights, and biases).
For more details on quantization, its options, and the available algorithms, refer to the Quantization section.
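The defaults can be adjusted with the quantizer options listed above. As a sketch, reusing the placeholders from the basic example, quantizing with the enhanced parameter quantizer and 16-bit activations might look like this:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb \
                           -d <network_input_name> <dims> \
                           --out_node <network_output_name> \
                           --input_list input_list.txt \
                           --param_quantizer enhanced \
                           --act_bw 16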
qnn-model-lib-generator
Note
This is intended for developers who want to run the model preparation tools on a Windows PC or on a Qualcomm device running Windows.
For use on a native Windows PC, qnn-model-lib-generator is located in the SDK under /bin/x86_64-windows-msvc.
For developers who want to run qnn-model-lib-generator on a device running Windows, it is located under /bin/aarch64-windows-msvc.
qnn-model-lib-generator will try to use the CMake command available on your platform to generate the libraries.
Please make sure the build tools (the Windows platform build tools) are installed so that CMake works on Windows.
The qnn-model-lib-generator tool compiles QNN model source code into artifacts for specific targets.
usage: qnn-model-lib-generator [-h] [-c <QNN_MODEL>.cpp] [-b <QNN_MODEL>.bin]
[-t LIB_TARGETS ] [-l LIB_NAME] [-o OUTPUT_DIR]
Script compiles provided Qnn Model artifacts for specified targets.
Required argument(s):
-c <QNN_MODEL>.cpp Filepath for the qnn model .cpp file
optional argument(s):
-b <QNN_MODEL>.bin Filepath for the qnn model .bin file
(Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)
-t LIB_TARGETS Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
-l LIB_NAME Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
else generic qnn_model.so
-o OUTPUT_DIR Location for saving output libraries.
Note
For Windows users, please run this tool with python3.
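As a minimal sketch (the model file names and the output directory are hypothetical), compiling a converted model for an Android target might look like this:
$ qnn-model-lib-generator -c qnn_model.cpp \
                          -b qnn_model.bin \
                          -t aarch64-android \
                          -o model_libs
On Windows, the same invocation would be prefixed with python3, as noted above.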
qnn-op-package-generator
The qnn-op-package-generator tool is used to generate the skeleton code of a QNN op package from an XML config file that describes the package attributes. The tool creates the package as a directory containing the skeleton source code and makefiles, which can be compiled to produce a shared library object.
usage: qnn-op-package-generator [-h] --config_path CONFIG_PATH [--debug]
[--output_path OUTPUT_PATH] [-f]
optional arguments:
-h, --help show this help message and exit
required arguments:
--config_path CONFIG_PATH, -p CONFIG_PATH
The path to a config file that defines a QNN Op
package(s).
optional arguments:
--debug Returns debugging information from generating the
package
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the package should be saved
-f, --force-generation
This option will delete the entire existing package
Note appropriate file permissions must be set to use
this option.
--converter_op_package, -cop
Generates Converter Op Package skeleton code needed
by the output shape inference for converters
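As a minimal sketch (the config file name and output path are hypothetical), generating the skeleton for an op package described in ExampleOpPackage.xml might look like this:
$ qnn-op-package-generator --config_path ExampleOpPackage.xml \
                           --output_path ./ExampleOpPackage
The generated directory can then be compiled with the provided makefiles to produce the op package shared library.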
qnn-context-binary-generator
The qnn-context-binary-generator tool is used to create a context binary for a particular backend from a model library built with qnn-model-lib-generator.
usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
--binary_file BINARY_FILE_NAME
[--model_prefix MODEL_PREFIX]
[--output_dir OUTPUT_DIRECTORY]
[--op_packages ONE_OR_MORE_OP_PACKAGES]
[--config_file CONFIG_FILE.json]
[--profiling_level PROFILING_LEVEL]
[--verbose] [--version] [--help]
REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the <qnn_model_name.so> file containing a QNN network.
To create a context binary with multiple graphs, use
comma-separated list of model.so files. The syntax is
<qnn_model_name_1.so>,<qnn_model_name_2.so>.
--backend <FILE> Path to a QNN backend .so library to create the context binary.
--binary_file <VAL> Name of the binary file to save the context binary to.
Saved in the same path as --output_dir option with .bin
as the binary file extension. If not provided, no backend binary
is created.
OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so> file
containing a QNN network. Default: QnnModel.
--output_dir <DIR> The directory to save output to. Defaults to ./output.
--op_packages <VAL> Provide a comma separated list of op packages
and interface providers to register. The syntax is:
op_package_path:interface_provider[,op_package_path:interface_provider...]
--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution.
3. backend: backend-specific profiling level specified
in the backend extension related JSON config file.
--profiling_option <VAL> Set profiling options:
1. optrace: Generates an optrace of the run.
--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions and
context priority. Please refer to SDK documentation
for more details.
--enable_intermediate_outputs Enable all intermediate nodes to be output along with
default outputs in the saved context.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
--set_output_tensors <VAL> Provide a comma-separated list of intermediate output tensor names, for which the outputs
will be written in addition to final graph output tensors.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1
--backend_binary <VAL> Name of the file to save a backend-specific context
binary to.
Saved in the same path as --output_dir option with .bin
as the binary file extension.
--log_level Specifies max logging level to be set. Valid settings:
"error", "warn", "info" and "verbose"
--input_output_tensor_mem_type <VAL> Specifies mem type to be used for input and output tensors during graph creation.
Valid settings:"raw" and "memhandle"
--version Print the QNN SDK version.
--help Show this help message.
For more details on the --op_packages and --config_file options, refer to the qnn-net-run section.
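As a minimal sketch (the file names here are hypothetical, and the backend library depends on the target; the HTP backend library shipped with the SDK is used as an example), serializing a model library into a context binary might look like this:
$ qnn-context-binary-generator --model libqnn_model.so \
                               --backend libQnnHtp.so \
                               --binary_file my_model_context \
                               --output_dir ./output
This would produce ./output/my_model_context.bin in the output directory.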