j8typz posted on 2024-7-28 01:21:03

How to Optimize TensorFlow Serving Performance with NVIDIA TensorRT: A Detailed Walkthrough from Google Engineers


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">By Guangda Lai, Gautam Vasudevan, Abhijit Karmarkar, Smit Hinsu</p>QbitAI, reposted from the TensorFlow WeChat official account<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">How can you combine the TensorFlow Serving system with NVIDIA TensorRT to achieve high-performance deep learning inference?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Engineers on the TensorFlow team recently published a tutorial that walks you through it step by step.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">QbitAI reposts it here with permission.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">TensorFlow Serving is a flexible, high-performance serving system for machine learning models, and NVIDIA TensorRT is a platform for high-performance deep learning inference. Combining the two gives users better performance and makes GPU inference easy to set up.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The TensorFlow team worked with NVIDIA to add the first TensorRT support in TensorFlow v1.7. Since then, we have been working closely together to improve the TensorFlow-TensorRT integration (referred to as TF-TRT). This integration is now available in TensorFlow Serving 1.13, and it will come to TensorFlow 2.0 soon.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_jpg/NkE3uMFiafXGCTTJWibdttqxN7MibIHULv5OiapWEyrTJXRpSpJmFSBXvQBiaWwFibnVoNz1RZibgg3icclKcJPqHPCzHw/640?wx_fmt=jpeg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In a previous article, we showed how to use TensorFlow Serving with Docker. In this article, we show how easy it is to run a TF-TRT converted model in the same way. As before, we deploy a ResNet model for production. All examples below were run on a workstation with a Titan-V GPU.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note: ResNet link</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://github.com/tensorflow/models/tree/master/official/resnet</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Serving ResNet with TensorFlow Serving on GPU</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For this exercise, we simply download a pre-trained ResNet SavedModel:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ mkdir /tmp/resnet</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ ls /tmp/resnet</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1538687457</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note: pre-trained ResNet link</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model</p>
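    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The ls output above shows a single numeric subdirectory, which is the model version. TensorFlow Serving looks for numbered version directories under the model base path and, by default, serves the highest one. The following is a minimal sketch of that lookup for checking a download before starting the server; list_model_versions and latest_version are hypothetical helper names, not part of TensorFlow Serving.</p>

```python
import os
import tempfile


def list_model_versions(model_base_path):
    """Return the numeric version subdirectories under a model base path, ascending."""
    versions = []
    for entry in os.listdir(model_base_path):
        full_path = os.path.join(model_base_path, entry)
        # Only all-digit directory names count as versions in this sketch.
        if entry.isdigit() and os.path.isdir(full_path):
            versions.append(int(entry))
    return sorted(versions)


def latest_version(model_base_path):
    """Return the highest version number, or None if no version directory exists."""
    versions = list_model_versions(model_base_path)
    return versions[-1] if versions else None


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics the layout of /tmp/resnet.
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "1538687457"))
        print(latest_version(base))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Calling latest_version("/tmp/resnet") after the download is a quick way to confirm that the archive was unpacked at the expected depth.</p>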
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the previous article, we showed how to serve a model with the TensorFlow Serving CPU Docker image. Here, we run the GPU Docker image (see the instructions linked in the note below) to serve and test this model on the GPU:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker pull tensorflow/serving:latest-gpu</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker run --rm --runtime=nvidia -p 8501:8501 --name tfserving_resnet \</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">-v /tmp/resnet:/models/resnet -e MODEL_NAME=resnet -t tensorflow/serving:latest-gpu &amp;</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">…</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">… server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 …</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">… server.cc:302] Exporting HTTP/REST API at:localhost:8501 …</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ curl -o /tmp/resnet/resnet_client.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client.py</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ python /tmp/resnet/resnet_client.py</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Prediction class:286, avg latency:18.0469 ms</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note: link to the GPU Docker instructions</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://www.tensorflow.org/serving/docker#serving_with_docker_using_your_gpu</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This docker run command starts the TensorFlow Serving server, which serves the SavedModel downloaded to /tmp/resnet and exposes REST API port 8501 on the host. resnet_client.py sends some images to the server and returns the server's predictions. Now let's stop the TensorFlow Serving container to free the GPU resources it holds.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker kill tfserving_resnet</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note: REST API link</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://www.tensorflow.org/tfx/serving/api_rest</p>
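    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The kind of request resnet_client.py sends can be sketched with the Python standard library alone. This is a minimal sketch, not the client's actual code: it assumes the server from the docker run command is listening on localhost:8501, that the model name is resnet (from MODEL_NAME), and that the model accepts a base64-encoded JPEG under the b64 key of the REST predict API. build_predict_body and predict are hypothetical helper names.</p>

```python
import base64
import json
import urllib.request

# Assumed endpoint: port 8501 and model name "resnet" come from the docker run command.
SERVER_URL = "http://localhost:8501/v1/models/resnet:predict"


def build_predict_body(jpeg_bytes):
    """Build the JSON body for a REST predict call with one base64-encoded image."""
    encoded = base64.b64encode(jpeg_bytes).decode("utf-8")
    return json.dumps({"instances": [{"b64": encoded}]})


def predict(jpeg_bytes, url=SERVER_URL, timeout=10.0):
    """POST one image to the server and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=build_predict_body(jpeg_bytes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With the container running, predict(open("cat.jpg", "rb").read()) would return the server's JSON reply; cat.jpg is a placeholder file name.</p>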
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Converting and serving the model with TF-TRT</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We now have a working model. To get the benefits of TensorRT, we need to convert it into a model that runs its operations with TensorRT. We do this by running the conversion command inside a Docker container (the command below uses the tensorflow/tensorflow:latest-gpu image):</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker pull tensorflow/tensorflow:latest-gpu</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker run --rm --runtime=nvidia -it -v /tmp:/tmp tensorflow/tensorflow:latest-gpu /usr/local/bin/saved_model_cli \</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&nbsp;convert --dir /tmp/resnet/1538687457 --output_dir /tmp/resnet_trt/1538687457 --tag_set serve \</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">tensorrt --precision_mode FP32 --max_batch_size 1 --is_dynamic_op True</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Here we ran the saved_model_cli command-line tool, which has built-in support for TF-TRT conversion. The --dir and --output_dir arguments specify where the SavedModel is located and where to write the converted SavedModel, while --tag_set specifies which graph in the SavedModel to convert. We then pass tensorrt on the command line and specify its configuration, which explicitly tells the tool to run the TF-TRT converter:</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">--precision_mode specifies the precision the converter should use; currently it supports only FP32 and FP16</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">--max_batch_size specifies the upper bound on the input batch size. The converter requires every tensor handled by TensorRT to have the batch dimension as its first dimension, and this argument tells it the largest value that will occur during inference. If the actual maximum batch size during inference is known and matches this value, the converted model is optimal. Note that the converted model cannot handle inputs with a batch size larger than the value specified here, but it can handle smaller batches</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">--is_dynamic_op tells the converter to perform the actual conversion when the model runs. The reason is that TensorRT needs to know all shapes at conversion time. The tensors of the ResNet model used in this example do not have fixed shapes, so we need this argument</p>
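    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because the converted model rejects batches larger than --max_batch_size but accepts smaller ones, a client that has more inputs than the limit can split them before sending. The following is a minimal client-side sketch of that idea; split_into_batches is a hypothetical helper and is not part of TF-TRT or TensorFlow Serving.</p>

```python
def split_into_batches(items, max_batch_size):
    """Split items into consecutive batches of at most max_batch_size elements."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        items[start:start + max_batch_size]
        for start in range(0, len(items), max_batch_size)
    ]


if __name__ == "__main__":
    # With --max_batch_size 1, as in the conversion command, every batch holds one item.
    print(split_into_batches(["img_a", "img_b", "img_c"], 1))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The last batch may be smaller than the limit, which is fine since the converted model can handle smaller batches.</p>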
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note: saved_model_cli link</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://www.tensorflow.org/guide/saved_model#cli_to_inspect_and_execute_savedmodel</p>
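    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When the conversion is scripted, it can help to assemble the saved_model_cli arguments in one place and reject values the article says are unsupported. The sketch below only builds the argument list shown in the command above; build_convert_args is a hypothetical helper, and the check against FP32 and FP16 reflects the two precision modes listed in this article rather than any validation done by the tool itself.</p>

```python
import shlex

# The two precision modes this article lists as supported.
SUPPORTED_PRECISION_MODES = ("FP32", "FP16")


def build_convert_args(input_dir, output_dir, precision_mode="FP32",
                       max_batch_size=1, is_dynamic_op=True, tag_set="serve"):
    """Return the saved_model_cli argument list for a TF-TRT conversion."""
    if precision_mode not in SUPPORTED_PRECISION_MODES:
        raise ValueError("unsupported precision mode: %s" % precision_mode)
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        "saved_model_cli", "convert",
        "--dir", input_dir,
        "--output_dir", output_dir,
        "--tag_set", tag_set,
        "tensorrt",
        "--precision_mode", precision_mode,
        "--max_batch_size", str(max_batch_size),
        "--is_dynamic_op", str(is_dynamic_op),
    ]


if __name__ == "__main__":
    args = build_convert_args("/tmp/resnet/1538687457", "/tmp/resnet_trt/1538687457")
    # Print a shell-safe command line that can be pasted after the docker run prefix.
    print(" ".join(shlex.quote(arg) for arg in args))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The resulting list can be passed to subprocess.run inside the container, or printed and appended to the docker run command used above.</p>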
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Now we can serve the TF-TRT converted model with Docker just by pointing to the correct model directory, which is as easy as before:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker run --rm --runtime=nvidia -p 8501:8501 --name tfserving_resnet \</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&nbsp;-v /tmp/resnet_trt:/models/resnet -e MODEL_NAME=resnet -t tensorflow/serving:latest-gpu &amp;</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">…</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">… server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 …</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">… server.cc:302] Exporting HTTP/REST API at:localhost:8501 …</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Send a request to it:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ python /tmp/resnet/resnet_client.py</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Prediction class:286, avg latency:15.0287 ms</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Finally, we stop the container:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$ docker kill tfserving_resnet</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As we can see, serving a TF-TRT converted model with TensorFlow Serving and Docker is as easy as serving a regular model. Keep in mind that this is a demonstration: the performance numbers apply only to the model we used and the machine that ran this example. They do, however, show the performance benefit of TF-TRT, with average latency falling from 18.0469 ms to 15.0287 ms.</p>
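    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The two average latencies reported by resnet_client.py (18.0469 ms before conversion, 15.0287 ms after) can be turned into a speedup factor and a percentage reduction with a few lines of arithmetic. latency_improvement is a hypothetical helper written for this illustration.</p>

```python
def latency_improvement(baseline_ms, optimized_ms):
    """Return (speedup factor, percentage latency reduction) for two average latencies."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("latencies must be positive")
    speedup = baseline_ms / optimized_ms
    reduction_pct = (baseline_ms - optimized_ms) / baseline_ms * 100.0
    return speedup, reduction_pct


if __name__ == "__main__":
    # Average latencies from the two resnet_client.py runs in this article.
    speedup, reduction_pct = latency_improvement(18.0469, 15.0287)
    print("speedup: %.2fx, latency reduction: %.1f%%" % (speedup, reduction_pct))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For these two runs the result is a speedup of roughly 1.20x, or about a 16.7% drop in average latency, on this particular model and workstation.</p>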
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With the TensorFlow 2.0 release approaching, the TensorFlow team and NVIDIA are working together to make sure TF-TRT runs smoothly in 2.0.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For the latest information, see the TF-TRT GitHub repository: https://github.com/tensorflow/tensorrt</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">End</strong></p>





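nqkk58 posted on 2024-10-8 21:05:51

Your take on this is really original. I learned a lot from it.

nykek5i posted on 2024-10-18 07:59:56

Haha, that cracked me up, too funny.

nykek5i posted on 2024-11-10 16:29:03

Looking back at history, we are filled with emotion; looking to the future, we are full of confidence.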