

news 2025/12/20 20:17:29

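TVM: Compiling and Optimizing a Model with the Python Interface (AutoTVM)

Last time we covered how to build and install TVM from source. In this post we walk through a demo of using TVM's Python interface on the local machine to compile and optimize a model.

TVM is a deep learning compiler framework with many different modules for working with deep learning models and operators. In this tutorial we will learn how to load, compile, and optimize a model with the Python API. Using TVM's Python interface, we will:

- Compile a pretrained ResNet-50-v2 model for the TVM runtime.
- Run a real image through the compiled model and check that the result is correct.
- Tune the model on a CPU with TVM.
- Recompile an optimized model using the tuning data collected by TVM.
- Run the image through the optimized model again, and compare the output and performance of the model before and after optimization.

Importing the required packages

- onnx: for loading and converting the model.
- PIL: the Python Imaging Library, for processing image data.
- numpy: for pre- and post-processing the image data.
- A helper for downloading test data.
- The TVM Relay framework and the TVM Graph Executor.

```python
import onnx
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
import tvm.relay as relay
import tvm
from tvm.contrib import graph_executor
```

Downloading and loading the ONNX model

In this post we use ResNet-50 v2. TVM provides a helper library for downloading pretrained models: given a model URL, a file name, and a model type, it downloads the model and saves it to disk. An ONNX model can then be loaded into memory with `onnx.load`. Incidentally, netron is a very handy tool for inspecting ONNX models.

```python
model_url = "".join(
    [
        "https://github.com/onnx/models/raw/",
        "master/vision/classification/resnet/model/",
        "resnet50-v2-7.onnx",
    ]
)

model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
onnx_model = onnx.load(model_path)
```

Downloading, preprocessing, and loading the test image

We download an image of a kitten from the web to use as the test image.

```python
img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize the image to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# The image is now in HWC layout, but the ONNX model expects CHW, so transpose it
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize with the ImageNet mean and standard deviation
imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev

# Add the batch dimension (the input layout is now NCHW) and cast back to float32
img_data = np.expand_dims(norm_img_data, axis=0).astype("float32")
```

Note: the model and the test image above can be replaced with your own, as long as they are converted to the required format.

Compiling the model with Relay

```python
target = "llvm"
```

Note: define the correct target. Specifying the correct target can have a huge impact on the performance of the compiled module, because it lets the compiler take advantage of the hardware features available on the target. For more information, see the TVM tutorial "Auto-tuning a Convolutional Network for x86 CPU". We recommend identifying which CPU you are running on, along with its optional features, and setting the target accordingly. For example, use `target = "llvm -mcpu=skylake"` for some processors, or `target = "llvm -mcpu=skylake-avx512"` for processors with the AVX-512 vector instruction set.

```python
# Note: input_name may differ between models; use the netron tool mentioned above to look up the input name
input_name = "data"
shape_dict = {input_name: img_data.shape}

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))
```

Running on the TVM runtime

Now that the model is compiled, we can use the TVM runtime to make predictions with it. To run a prediction on the TVM runtime we need two things: the compiled model, which we just built, and a valid input to the model.

```python
dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
```

Collecting basic performance data

We collect some basic performance data for this unoptimized model so that we can compare it with the tuned model later. To reduce the impact of CPU noise, we run the computation in several batches with multiple repetitions, then compute basic statistics: the mean, median, and standard deviation.

```python
import timeit

timing_number = 10
timing_repeat = 10
# repeat() returns the total seconds for each batch of timing_number runs;
# convert to milliseconds per run
unoptimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
unoptimized = {
    "mean": np.mean(unoptimized),
    "median": np.median(unoptimized),
    "std": np.std(unoptimized),
}

print(unoptimized)
```

Output (milliseconds per inference):

```
{'mean': 229.1864895541221, 'median': 228.7280524149537, 'std': 1.0664440211813757}
```

Post-processing the output

As mentioned earlier, different models may produce their output tensors in different ways. In our case, we need some post-processing to render the output of ResNet-50-V2 in a more human-readable form, using the label lookup table provided for the model.

```python
from scipy.special import softmax

# Download the list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

# Convert the output tensor to probabilities and print the top 5 classes
scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
```

Output:

```
class='n02123045 tabby, tabby cat' with probability=0.610551
class='n02123159 tiger cat' with probability=0.367180
class='n02124075 Egyptian cat' with probability=0.019365
class='n02129604 tiger, Panthera tigris' with probability=0.001273
class='n04040759 radiator' with probability=0.000261
```

Tuning the model

The model compiled above runs on the TVM runtime, but it does not include any optimizations for a specific hardware platform. Here we show how to build a model optimized for a particular platform. In some cases, inference with the module we compiled ourselves may not perform as well as expected; we can then use the auto-tuner to find a better configuration for the model and improve its performance.

Tuning in TVM is the process of optimizing a model to run faster on a given target. It differs from training and fine-tuning in that it does not affect the model's accuracy, only its runtime performance. As part of tuning, TVM tries many different possible operator implementations to see which performs best, and stores the results of these runs in a tuning records file. In its simplest form, tuning requires us to specify three things:

- the specification of the target device we want to run the model on;
- the path to the output file in which the tuning records will be stored;
- the path to the model to be tuned.

First, import the libraries we need:

```python
import tvm.auto_scheduler as auto_scheduler
from tvm.autotvm.tuner import XGBTuner
from tvm import autotvm
```

Next, set some basic parameters for the runner. The runner takes the code compiled from a specific set of parameters and measures its performance.

- `number`: how many times the generated code is run per measurement; these runs are averaged and together form one repeat.
- `repeat`: how many measurements (repeats) are taken for each configuration.
- `min_repeat_ms`: the minimum duration of one repeat, in milliseconds; if a repeat finishes faster, `number` is increased automatically. This is required for GPU tuning but not for CPU tuning; setting it to 0 disables it.
- `timeout`: an upper bound on the running time of each configuration test.

```python
number = 10
repeat = 1
min_repeat_ms = 0  # not needed for CPU tuning
timeout = 10  # seconds

# Create a TVM runner
runner = autotvm.LocalRunner(
    number=number,
    repeat=repeat,
    timeout=timeout,
    min_repeat_ms=min_repeat_ms,
    enable_cpu_cache_flush=True,  # flush the CPU cache between repeated measurements
)
```

Then create a simple structure to hold the tuning options.

- `tuner`: we use the XGBoost algorithm to guide the search. In practice you may want to choose another algorithm depending on model complexity, time constraints, and other factors.
- `trials`: for a real project, set the number of trials well above the value of 10 used here: 1500 is recommended for CPUs and 3000-4000 for GPUs. The number of trials needed depends on the particular model and processor, so it is worth spending some time evaluating a range of values to find the best balance between tuning time and model optimization.
- `early_stopping`: stop tuning a task early if no better configuration has been found within this many trials.
- `measure_option`: specifies where the trial code is built and where it is run. In this example we use the `LocalRunner` we just created and a `LocalBuilder`.
- `tuning_records`: specifies the file to which the tuning data is written.

```python
tuning_option = {
    "tuner": "xgb",
    "trials": 10,
    "early_stopping": 100,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
    ),
    "tuning_records": "resnet-50-v2-autotuning.json",
}
```

Note: in this example, to save time, we set the number of trials to 10; since `early_stopping` (100) is larger than that, early stopping never triggers here. Setting these values higher will likely yield more performance improvement, at the cost of longer tuning time. The number of trials required for convergence varies with the specifics of the model and the target platform.

```python
# First, extract the tuning tasks from the ONNX model
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

# Tune the extracted tasks one after another
for i, task in enumerate(tasks):
    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
    tuner_obj = XGBTuner(task, loss_type="rank")
    tuner_obj.tune(
        n_trial=min(tuning_option["trials"], len(task.config_space)),
        early_stopping=tuning_option["early_stopping"],
        measure_option=tuning_option["measure_option"],
        callbacks=[
            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
        ],
    )
```

Output:

```
[Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/10) | 0.00 s
[Task  1/25]  Current/Best:   33.79/  49.04 GFLOPS | Progress: (4/10) | 5.66 s
...
Done.
[Task 25/25]  Current/Best:    3.13/   3.13 GFLOPS | Progress: (4/10) | 3.17 s
[Task 25/25]  Current/Best:    2.48/   3.13 GFLOPS | Progress: (8/10) | 15.43 s
[Task 25/25]  Current/Best:    0.00/   3.13 GFLOPS | Progress: (10/10) | 45.72 s
```

Compiling the optimized model with the tuning data

The tuning process above produced tuning records stored in resnet-50-v2-autotuning.json. The compiler uses these records to generate high-performance code for the model on the specified target. Now that the tuning data has been collected, we can recompile the model with the optimized operators to speed up computation.

```python
with autotvm.apply_history_best(tuning_option["tuning_records"]):
    with tvm.transform.PassContext(opt_level=3, config={}):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))
```

Verify that the optimized model runs and produces the same results as before:

```python
dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()

scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
```

Output:

```
class='n02123045 tabby, tabby cat' with probability=0.610552
class='n02123159 tiger cat' with probability=0.367180
class='n02124075 Egyptian cat' with probability=0.019365
class='n02129604 tiger, Panthera tigris' with probability=0.001273
class='n04040759 radiator' with probability=0.000261
```

The results are indeed the same: the top-5 classes match, and the top-1 probability differs only in the sixth decimal place (0.610552 vs. 0.610551).

Comparing the tuned and untuned models

Here we again collect some basic performance data, this time for the optimized model, to compare it with the unoptimized one. Depending on the underlying hardware, the number of iterations, and other factors, the optimized model should show a performance improvement over the unoptimized one.

```python
import timeit

timing_number = 10
timing_repeat = 10
optimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)}

print("optimized: %s" % (optimized))
print("unoptimized: %s" % (unoptimized))
```

Output (milliseconds per inference):

```
optimized: {'mean': 211.9480087934062, 'median': 211.2688914872706, 'std': 1.1843122740378864}
unoptimized: {'mean': 229.1864895541221, 'median': 228.7280524149537, 'std': 1.0664440211813757}
```

Even with only 10 trials per task, the mean latency drops from about 229.2 ms to about 211.9 ms, a reduction of roughly 7.5% (about a 1.08x speedup).

In this tutorial, we gave a short example of how to compile, run, and tune a model with the TVM Python API. We also discussed the need to pre-process inputs and post-process outputs. After the tuning process, we showed how to compare the performance of the unoptimized and optimized models. This was a simple local example with ResNet-50 v2; TVM supports many more features, including cross-compilation, remote execution, and profiling/benchmarking, which will be covered in future tutorials.

Ref: https://tvm.apache.org/docs/tutorial/autotvm_relay_x86.html#sphx-glr-tutorial-autotvm-relay-x86-py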
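When swapping in your own model or image, the pre- and post-processing steps of this walkthrough are usually what needs adapting. As a minimal sketch, assuming the same ImageNet-style normalization and a single-image classifier that outputs a `(1, num_classes)` logits tensor, they can be factored into NumPy-only helpers that run without TVM. The names `preprocess`, `softmax`, and `top_k` are hypothetical (not TVM APIs), the local `softmax` stands in for `scipy.special.softmax` to avoid the SciPy dependency, and the random image and logits are placeholders for real data.

```python
import numpy as np

# ImageNet per-channel statistics, shaped (3, 1, 1) to broadcast over CHW
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape((3, 1, 1))
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape((3, 1, 1))


def preprocess(hwc_image: np.ndarray) -> np.ndarray:
    """Turn an already-resized HWC RGB image (uint8, 0-255) into a
    normalized float32 NCHW batch of size 1."""
    chw = np.transpose(hwc_image.astype(np.float32), (2, 0, 1))  # HWC -> CHW
    normalized = (chw / 255.0 - IMAGENET_MEAN) / IMAGENET_STD  # ImageNet normalization
    return np.expand_dims(normalized, axis=0).astype(np.float32)  # add batch dim -> NCHW


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k(logits: np.ndarray, k: int = 5):
    """Return (class_index, probability) pairs for the k most likely
    classes of a (1, num_classes) output tensor, highest first."""
    scores = np.squeeze(softmax(logits))
    ranks = np.argsort(scores)[::-1][:k]
    return [(int(r), float(scores[r])) for r in ranks]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder inputs standing in for a real image and a real model output
    fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    batch = preprocess(fake_image)
    print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32

    fake_logits = rng.standard_normal((1, 1000)).astype(np.float32)
    for index, prob in top_k(fake_logits, k=5):
        print(index, round(prob, 6))
```

With a real model, `preprocess(np.asarray(resized_image))` would replace the inline preprocessing, and `top_k(tvm_output)` returns indices that can be looked up in the `labels` list loaded from synset.txt.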

