f9yx0du posted on 2024-9-28 21:19:01

Natural Language Processing (NLP): An Intelligent Question-Answering System Based on Text Semantics


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Blog:</p>https://wenjie.blog.csdn.net/

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Author: 艾文编程</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Douyin: 艾文编程 (follow for short technical tips)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">About: master's degree in computer science, 10+ years of work experience, technical and product lead, currently a front-line technical expert at a BAT-tier company.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">For the accompanying materials, see the end of the article.</span></strong></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/0de945cf64a64cfba927c9fcd49a9e05~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=Ix8vrPcfy60rBfcBoOWuhHpoCvg%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">What you will gain</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Build an intelligent question-answering robot based on text semantics from scratch</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Understand the industrial applications and technical approaches of intelligent question-answering systems</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Learn how text-vector search engines are applied in industry</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Learn how to implement a text semantic matching retrieval system</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Learn how to implement FAQ question answering based on text semantics</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Intended audience:</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">NLP algorithm engineers; practitioners who want to understand intelligent question-answering systems; software developers, and so on</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Goal:</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Be able to build an intelligent question-answering robot based on text semantics from scratch, and to apply the BERT model and the Faiss library in a project</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This project combines </span><strong style="color: blue;"><span style="color: black;">Faiss</span></strong><span style="color: black;">, the clustering and similarity search library open-sourced by Facebook AI Research, with the </span><strong style="color: blue;"><span style="color: black;">BERT</span></strong><span style="color: black;"> model provided by Google to implement an intelligent question-answering system based on text semantics. By building an FAQ question-answering robot from scratch, the project helps readers understand how </span><strong style="color: blue;"><span style="color: black;">text semantic</span></strong><span style="color: black;"> similarity retrieval systems and </span><strong style="color: blue;"><span style="color: black;">FAQ</span></strong><span style="color: black;"> question-answering robots are used in industry.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1 Introduction</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Question answering is a classic problem in natural language processing. A question-answering system answers questions that people pose in natural language, and it has a wide range of applications.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Typical application scenarios include intelligent voice interaction, online customer service, knowledge acquisition, and emotional chit-chat.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Common ways to classify these systems: generative versus retrieval-based; single-turn versus multi-turn; open-domain versus domain-specific.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This article focuses on retrieval-based, domain-specific question-answering systems: </span><strong style="color: blue;"><span style="color: black;">intelligent customer-service robots.</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Traditional customer-service robots</span></strong><span style="color: black;">: how they are built</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The relevant domain knowledge usually has to be converted into a set of </span><strong style="color: blue;"><span style="color: black;">rules and knowledge graphs</span></strong><span style="color: black;">. The construction process </span><strong style="color: blue;"><span style="color: black;">relies heavily on "human" intelligence</span></strong><span style="color: black;">: every new scenario or new customer requires a large amount of repeated manual work.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/df3d8c8d72074995a0df66e805fd2fac~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=D5zzcn0SB9lUBeUqlFjRcl1OFlI%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Deep learning: intelligent question-answering robots</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A deep language model converts questions and </span><strong style="color: blue;"><span style="color: black;">documents into semantic vectors</span></strong><span style="color: black;">, </span><strong style="color: blue;"><span style="color: black;">and uses them to find the best-matching answer</span></strong><span style="color: black;">. This article uses Google's open-source BERT model together with the open-source vector search engine Faiss to quickly build a dialogue robot based on semantic understanding.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Case study: an FAQ question-answering robot</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">FAQ stands for Frequently Asked Questions. Suppose we have a database of common questions and answers, and a user asks a new question. Can we automatically pull the most relevant question and answer from that database to reply? In this project we explore how to build such a question-answering robot.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Core technical points of the project:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Scoring the match between a question and an answer with a deep learning model</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Extracting features with the BERT model and judging question similarity</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Building and querying an index with the Faiss search engine</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Building an online FAQ question-answering system</span></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1-1 What You Will Learn</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The project combines Faiss with a BERT model to implement a Chinese question-answering system. It aims to provide a solution for semantic similarity matching that pairs Faiss with various AI models. The project cases implement two things: a text retrieval system based on semantic similarity, and an FAQ question-answering robot.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/dca03f51fe5f4a74a883d124a4ee0f22~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=uw%2FusULQOID20RKjoaQUH4BPmvY%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1-2 Online System Demo</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Text semantic similarity matching and retrieval</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Text-semantics FAQ question-answering robot</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The system is designed as a platform and is a general-purpose solution. Developers only need to supply data that follows the data specification; the system then runs without any code changes.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2 Application Scenarios</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">CSDN Q&amp;A: https://ask.csdn.net/</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Baidu Zhidao: https://zhidao.baidu.com/</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">360 Wenda:</p>https://wenda.so.com/search/

    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2-1 Overview of Dialogue Systems</h1>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2-2 Text Search Scenarios</h1>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/e523fa98628d4b59a7b575827c59fe9f~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=PVBklpqLBqEbRXjgAfFtZiA%2FZFk%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/60585499116f4c0b8b10aa8dbdc169fc~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=KtzoG9%2Bc%2BoAF%2Fdgpc6f%2Bn4M5SUQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2-3 Forum Similar-Question Answering System</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Autohome (汽车之家) hosts the world's largest automotive community forum. It has accumulated a rich body of user Q&amp;A data that can resolve the questions users run into when browsing, buying, and using cars. For each question a user raises on the platform, the system matches the semantically most similar question and answer from a massive, high-quality Q&amp;A repository.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Text data is expressed in many different ways, uses non-standard wording (for example, car model and series names are heavily abbreviated, shortened, or reordered), and is highly ambiguous (for example, "北京" (Beijing) may refer to a car brand or to the city). These traits pose a major challenge to traditional keyword-matching search. Semantic search is therefore added on top of keyword matching: the questions in the curated Q&amp;A repository are mapped to multi-dimensional vectors and matched semantically, which improves the accuracy of question matching.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/8cc178ddfccc43a889f1e6b96a0a852b~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=bltMA1QNlLl0MjNsbcDlnr4Z8Dw%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2-4 Intelligent Chit-Chat Dialogue System</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Data format: query-answer pairs, as shown below (sample rows kept in the original Chinese)</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不要骂人 好的,听你的就行了</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不要骂人严重的直接禁言 好的,听你的就行了</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不要骂人了吧 好的,听你的就行了</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不要骂人哦 好的,听你的就行了</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不要骂人小心封号啊 好的,听你的就行了</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不认识你不记得你 你当我傻逼啊</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不认识你昂 你当我傻逼啊</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">不认识你老哥了 你当我傻逼啊</span></p>
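    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Rows like these can be read into (query, answer) tuples with a few lines of Python. The sketch below is only an illustration: it assumes each row is "query, whitespace, answer" with no whitespace inside the query, which holds for the samples above but is not confirmed for the full dataset. The function names parse_pairs and group_by_answer are hypothetical.</span></p>

```python
from collections import defaultdict


def parse_pairs(lines):
    """Parse 'query<whitespace>answer' rows into (query, answer) tuples."""
    pairs = []
    for line in lines:
        # Assumption: the first whitespace run separates query from answer.
        parts = line.strip().split(None, 1)
        if len(parts) == 2:  # skip blank or malformed rows
            pairs.append((parts[0], parts[1]))
    return pairs


def group_by_answer(pairs):
    """Map each answer to the list of query variants that share it."""
    groups = defaultdict(list)
    for query, answer in pairs:
        groups[answer].append(query)
    return dict(groups)


rows = [
    "不要骂人 好的,听你的就行了",
    "不要骂人了吧 好的,听你的就行了",
    "不要骂人哦 好的,听你的就行了",
    "",  # blank rows are ignored
]
pairs = parse_pairs(rows)
groups = group_by_answer(pairs)
print(len(pairs), len(groups))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Grouping by answer makes the structure of the data visible: many query variants point at one canned reply, which is exactly what similar-question matching exploits.</span></p>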
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">3 State of Question-Answering Systems</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In retrieval-based FAQ answering, the system takes the user's new query, finds the most suitable answer in the FAQ knowledge base, and returns it to the user.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The retrieval process is shown in the figure below.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/743b89d539d347c787eb8c0651fc8da5~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=P8Dl4D603ri0AU2fNR15NQpg71Q%3D" style="width: 50%; margin-bottom: 20px;"></div>
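    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Here Qi is a standard question in the knowledge base, and Ai is the answer that belongs to that standard question.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The detailed processing flow is:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Build the index over the candidate set offline;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Online, when a user query arrives, recall a batch of candidates as the coarse-ranking result and pass them to the next module for finer ranking;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Use the matching model to compute how well the user query matches the questions or answers in the FAQ knowledge base;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Use the ranking model to rerank the candidates and return the top-k candidate answers.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The matching model performs feature matching on each (query, reply) pair; its matching score usually serves as one feature of the ranking model.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The ranking model does the actual reranking. Its input is the feature vector of each candidate reply, its loss function is chosen according to actual needs (for example pointwise, pairwise, or listwise), and its ranking score is the final basis for ordering the candidate replies.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Some retrieval systems do not draw a clear line between the matching and ranking stages.</span></p>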
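    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The recall, matching, and ranking stages can be sketched in plain Python. This is a toy illustration, not the project's code: character bigrams stand in for real tokenization, Jaccard overlap stands in for the matching model, and a fixed linear blend of two features stands in for a trained ranking model. All function names and the sample FAQ entries are made up for the example.</span></p>

```python
from collections import defaultdict


def bigrams(text):
    """Character bigrams; a crude stand-in for real tokenization."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_index(faq):
    """Offline step: inverted index from bigram to FAQ entry ids."""
    index = defaultdict(set)
    for i, (question, _answer) in enumerate(faq):
        for gram in bigrams(question):
            index[gram].add(i)
    return index


def recall(index, query, limit=50):
    """Online coarse step: ids sharing at least one bigram with the query."""
    hits = defaultdict(int)
    for gram in bigrams(query):
        for i in index.get(gram, ()):
            hits[i] += 1
    return sorted(hits, key=lambda i: (-hits[i], i))[:limit]


def matching_score(query, question):
    """Matching-model stand-in: Jaccard overlap of bigram sets."""
    a, b = bigrams(query), bigrams(question)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ranking_score(query, question):
    """Ranking-model stand-in: the matching score is one feature among others."""
    length_ratio = min(len(query), len(question)) / max(len(query), len(question), 1)
    return 0.8 * matching_score(query, question) + 0.2 * length_ratio


def answer(faq, index, query, top_k=2):
    """Recall candidates, rerank them, and return the top-k (question, answer, score)."""
    scored = [(ranking_score(query, faq[i][0]), i) for i in recall(index, query)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(faq[i][0], faq[i][1], round(s, 3)) for s, i in scored[:top_k]]


faq = [
    ("how do I reset my password", "Use the reset link on the login page."),
    ("how do I change my email", "Edit it under account settings."),
    ("what are the opening hours", "We are open 9:00 to 18:00."),
]
index = build_index(faq)
results = answer(faq, index, "reset password how")
print(results[0][1])
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Keeping recall cheap and broad, and spending the expensive scoring only on the recalled candidates, is what lets this layout scale to large knowledge bases.</span></p>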
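    <h1 style="color: black; text-align: left; margin-bottom: 10px;">3-1 Common Solutions for Intelligent Question Answering</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For a retrieval-based FAQ question-answering system, the usual processing flow is:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Clean the Q&amp;A pair dataset</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Embedding</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Train the model</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Compute text similarity</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Select the question in the Q&amp;A repository that is most similar to the input question</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Return the answer that belongs to that most similar question</span></p>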
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Common ways to implement an FAQ question-answering system:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">An Elasticsearch (ES) based question-answering system (answers are retrieved by keyword matching, similar to keyword recall in e-commerce and news search)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A TF-IDF based approach (compute the TF-IDF weight of every word and, after word segmentation, derive the sentence representation from those weights; TF-IDF is also used for keyword extraction)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A Doc2Vec based model (considers both words and paragraphs; compared with word2vec it adds paragraph information)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Extracting vectors with the deep learning language model BERT, then computing similarity</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Business needs the approach can be extended to (this article presents a general-purpose solution for text semantic matching):</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Semantic matching for intelligent customer service (a retrieval-based question-answering dialogue system in which the answer is in the knowledge base and a single record is returned)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Image search (ResNet vector representations of images)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Similar-text recommendation in the news domain (similar news recommendation and the like)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Retrieval systems based on text semantic matching (ranking by text similarity)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For this class of problems, the key step is to represent the text (or other content) as vectors in some way (word2vec, doc2vec, ELMo, BERT, and so on). These feature vectors are then indexed (Faiss or Milvus) so that an online service can search them, and the search results are finally filtered by a set of rules to obtain the final data.</span></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">3-2 Problems with Traditional Text Matching Methods</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Traditional text matching techniques include BoW, VSM, TF-IDF, BM25, Jaccard, and SimHash. They mainly address lexical (surface-form) similarity.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The difficulty:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Because Chinese is semantically rich, it is usually hard to determine the semantic similarity of two sentences directly from keyword matching or from shallow machine-learning models.</span></p>
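    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A small worked example shows why lexical similarity falls short. The sketch below computes character-level Jaccard similarity, one of the traditional measures named above, on sentences made up for this illustration: a paraphrase pair that shares almost no characters, and a look-alike pair that shares most characters but means something different.</span></p>

```python
def jaccard(a, b):
    """Jaccard similarity of the character sets of two strings."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Same intent ("how do I return goods"), different wording.
paraphrase = jaccard("怎么退货", "如何把买的东西退回去")   # 1 shared char of 13 -> 1/13

# Different intent ("return goods" vs "stock up on goods"), similar wording.
lookalike = jaccard("怎么退货", "怎么进货")               # 3 shared chars of 5 -> 0.6

print(round(paraphrase, 3), lookalike)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The measure ranks the look-alike pair far above the true paraphrase, which is the opposite of what a question-answering system needs.</span></p>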
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">3-3 Deep Learning Text Matching</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Semantic representations of text produced by deep learning models are gradually being adopted in retrieval-based question-answering systems.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Advantages over traditional models:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">They save the large amount of manual effort that hand-crafted feature extraction requires</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">They automatically extract relationships between words from large numbers of samples, and can combine the structural information of phrase matching with the hierarchical nature of text matching, uncovering subtle features hidden in large datasets that traditional models struggle to find</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This article implements an FAQ question-answering system through similar-question matching.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Question: what is similar-question matching?</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Answer: compare the user's question with the questions already in the FAQ knowledge base, find the most similar one, and return its answer as the most accurate answer to the user's question.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Deep semantic matching models fall into two broad categories: representation-based methods and interaction-based methods. </span><span style="color: black;">Here we explore the representation-based method.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/9223349410914eb49f53dc4255c87a04~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=VNCMPPph94q%2BOoJH0XHmSqHyHT8%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This class of algorithms first encodes each of the two objects to be matched with a deep learning model, then computes the similarity between the two representations to produce the matching score. The matching function f(x, y) is usually computed in one of two ways: the cosine function or a multilayer perceptron (MLP).</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/86fb1cfdf30248c3aeb0005cea21c188~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=SRVRVW1ZPGHUOTjalaPXuoMFWAw%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Pros and cons of the two matching methods:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Cosine function: the score comes from a fixed similarity measure, and cosine is the one used most often in practice. It is simple and efficient, and its score range is bounded and easy to interpret.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Multilayer perceptron (MLP): the two vectors are fed into an MLP that is trained on data to fit a matching score. It is more flexible and has more fitting capacity, but it also places higher demands on training.</span></p>
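    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Both forms of f(x, y) fit in a few lines of plain Python. The sketch below is illustrative only: the vectors are toy 3-dimensional examples, and the MLP weights are hypothetical, hand-picked values rather than trained parameters. In a real system the MLP would be trained on labeled (question, question) pairs.</span></p>

```python
import math


def cosine(x, y):
    """Cosine similarity; the result always lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def mlp_score(x, y, w1, b1, w2, b2):
    """One-hidden-layer MLP over the concatenated pair; output lies in (0, 1)."""
    features = list(x) + list(y)
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)  # ReLU
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


q = [1.0, 0.0, 1.0]
d = [1.0, 0.0, 1.0]
u = [0.0, 1.0, 0.0]
same = cosine(q, d)        # identical direction, so close to 1
orthogonal = cosine(q, u)  # no shared components, so 0

# Hypothetical, untrained weights: 2 hidden units over 6 input features.
w1 = [[0.5, 0.0, 0.5, 0.5, 0.0, 0.5],
      [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = -1.0
score = mlp_score(q, d, w1, b1, w2, b2)
print(same, orthogonal, score)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The cosine score needs no parameters at all, while the MLP score is only as meaningful as the data its weights were fitted on.</span></p>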
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">4 Key Technologies of the Question-Answering System</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Google's BERT model has had a huge influence on NLP. It is a general-purpose language representation model that can be applied in many areas. This project combines Faiss with BERT to build a text semantic matching retrieval system: BERT converts the text into vectors, and Faiss, a similarity search engine for feature vectors, quickly finds similar text and returns the desired results.</span></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">4-1 Faiss</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Faiss is a library for clustering and similarity search open-sourced by the Facebook AI team. It provides efficient similarity search and clustering for dense vectors, supports search over billions of vectors, and is one of the most mature approximate nearest neighbor search libraries available. It includes several algorithms for searching vector sets of any size (note: for in-memory indexes the set size is bounded by RAM), along with supporting code for evaluating algorithms and tuning parameters. Faiss is written in C++ and provides a Python interface that works directly with NumPy arrays. In addition, some of the core algorithms have GPU implementations. For an introduction, see 《Faiss:Facebook 开源的相似性搜索类库》 ("Faiss: Facebook's Open-Source Similarity Search Library").</span></p>
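    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a minimal sketch of how such a search looks from Python, the code below indexes unit-length vectors in a faiss.IndexFlatIP (exact inner-product search) and queries the top k neighbors. The choice of index type is an assumption made for illustration; the article does not say which index the project uses. The random vectors are stand-ins for sentence vectors, and when Faiss is not installed the code falls back to an equivalent NumPy brute-force search.</span></p>

```python
import numpy as np

try:
    import faiss  # optional; the fallback below mimics a flat inner-product index
except ImportError:
    faiss = None


def l2_normalize(x):
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return (x / np.maximum(norms, 1e-12)).astype("float32")


def search(base, queries, k):
    """Return (scores, ids) of the k most similar base vectors for each query."""
    base, queries = l2_normalize(base), l2_normalize(queries)
    if faiss is not None:
        index = faiss.IndexFlatIP(base.shape[1])  # exact inner-product search
        index.add(base)                           # ids are the row positions
        return index.search(queries, k)
    scores = queries @ base.T                     # brute-force fallback
    ids = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, ids, axis=1), ids


rng = np.random.default_rng(0)
base = rng.standard_normal((100, 8)).astype("float32")  # stand-in for sentence vectors
noise = 0.01 * rng.standard_normal((3, 8)).astype("float32")
queries = base[:3] + noise                              # slightly perturbed copies
scores, ids = search(base, queries, k=5)
print(ids[:, 0])
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The returned ids are row positions in the indexed array, so a question-answering system still needs a separate mapping from id to answer text.</span></p>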
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/24ff297171cf4cbcae341ee27e07971e~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=iCpFgBhKsaFahyiIiqBEWQc%2BqCU%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">4-2 BERT</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">BERT is a language representation model released by Google. Its full name is Bidirectional Encoder Representations from Transformers. Its strengths show in two respects.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, it replaces complex CNN and RNN networks with simple fully connected layers built around a specially designed attention mechanism. This greatly reduces training time and also improves network performance.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Second, BERT is described as the first pre-trained language representation model to capture contextual semantics in a deeply bidirectional way. This is because BERT uses the Transformer as its main architecture, and the Transformer captures the bidirectional relationships within a sentence more thoroughly.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/875299c75b354d258136eda81352194e~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=hroNriYQGWvvJ8rjYhJh3r%2B9tuM%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Google provides several pre-trained models. The two most basic are BERT-base and BERT-large; their parameters are shown in the table below:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/ebd21f2a3bb0497290ba6caa7df1c034~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=jaUd0z%2Bnt8esW%2F5YUcoJwzDN8O0%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">BERT-base and BERT-large differ in total parameter count and in number of layers. BERT-large takes up more memory, so this project uses BERT-base to convert text into vectors. (Note: the number of layers, that is, Transformer blocks, is denoted L, the hidden size is denoted H, and the number of self-attention heads is denoted A.)</span></p>
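    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The size gap between the two models can be checked with a little arithmetic. The published configurations are L=12, H=768, A=12 for BERT-base and L=24, H=1024, A=16 for BERT-large. The sketch below counts parameters from L and H under stated assumptions: the English uncased vocabulary of 30522 tokens, 512 positions, 2 segment types, a feed-forward size of 4H, and a pooler layer. A checkpoint with a different vocabulary, such as a Chinese one, gives a different total. The function name bert_param_count is hypothetical.</span></p>

```python
def bert_param_count(L, H, vocab_size=30522, max_positions=512, type_vocab=2):
    """Count the weights and biases of a BERT encoder with L layers and hidden size H."""
    layer_norm = 2 * H                                    # scale and shift
    embeddings = (vocab_size + max_positions + type_vocab) * H + layer_norm
    attention = 4 * (H * H + H)                           # Q, K, V, and output projections
    feed_forward = (H * 4 * H + 4 * H) + (4 * H * H + H)  # H -> 4H -> H
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = H * H + H
    return embeddings + L * per_layer + pooler


base = bert_param_count(L=12, H=768)     # roughly 110M
large = bert_param_count(L=24, H=1024)   # roughly 335M
print(base, large, round(large / base, 2))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The head count A does not appear in the formula: the heads split the H-dimensional projections between them, so changing A alone leaves the parameter count unchanged.</span></p>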
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">5 Implementing the Question-Answering System</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Main parameters of main.py</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --help</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">usage: main.py [-h] --task TASK [--load] [--index] [--n_total N_TOTAL]</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">[--search] [--sentence SENTENCE] [--topK TOPK]</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">optional arguments:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">-h, --help show this help message and exit</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--task TASK project task name</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--load load data into db</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--index load data text vector into faiss</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--n_total N_TOTAL take data n_sample ,generate it into faiss</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--search search matched text from faiss</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--sentence SENTENCE query text data</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">--topK TOPK take matched data in topK</span></p>
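    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The help text above could be produced by an argparse definition along the following lines. This is a reconstruction from the usage message, not the project's actual main.py: the argument types (int for --n_total and --topK) and the helper index_name are assumptions made for illustration.</span></p>

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface shown in the help output."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--task", required=True, help="project task name")
    parser.add_argument("--load", action="store_true", help="load data into db")
    parser.add_argument("--index", action="store_true",
                        help="load data text vector into faiss")
    parser.add_argument("--n_total", type=int,
                        help="take data n_sample ,generate it into faiss")
    parser.add_argument("--search", action="store_true",
                        help="search matched text from faiss")
    parser.add_argument("--sentence", help="query text data")
    parser.add_argument("--topK", type=int, help="take matched data in topK")
    return parser


def index_name(task: str, n_total: int) -> str:
    """Hypothetical helper: name an index as task + '_' + record count."""
    return f"{task}_{n_total}"


# Parse the same arguments that the index-building step passes on the shell.
args = build_parser().parse_args(["--task", "medical", "--index", "--n_total", "120000"])
print(args.index, index_name(args.task, args.n_total))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Using store_true for --load, --index and --search matches the usage line, where those options take no value.</span></p>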
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 1: Store the knowledge base &lt;id, answer&gt;</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --task </span><span style="color: black;">medical</span> --load</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 2: Build the index &lt;id, question&gt;</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --task </span><span style="color: black;">medical</span> --index --n_total 120000</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 3: Search by text semantic similarity</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --task </span><span style="color: black;">medical_120000 </span>--search --sentence 得了乙肝怎么治疗</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Note: medical_120000 is the task name combined with the number of indexed records, i.e. ${task}_${number of indexed records}.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">With these functions in place, we can build related applications on top of them for different business needs, for example:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">FAQ-style intelligent question answering</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Semantic matching and recall for news and other content feeds</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A Chinese retrieval system based on text semantics</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 4: FAQ question answering on top of the semantic search service</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --task </span><span style="color: black;">medical_120000 </span>--search --sentence 身上<span style="color: black;">显现</span> --topK 10</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 5: Web service on top of the semantic search service</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Start the service with python app.py --task medical_120000, then visit http://xx.xx.xx.xx:5000/</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Here we use the basic services above to build an FAQ question-answering bot.</span></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-1 Data format</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The project dataset consists of three parts: the question set, the answer set, and a unique identifier linking each question to its answer. The records correspond one to one.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/2c31f2f56f4b4457ad265bb30e77e898~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=G8E3ks6iSB1L50EQgQnO83roWHU%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For a different business system, you only need to supply data in this format, and the template in this article lets you put together a demo quickly. Happy learning.</span></p>
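    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a minimal sketch of that format, the snippet below checks that the three parts line up one to one and builds the two lookups used later: id to question (for the index) and id to answer (for the knowledge base). The function name and the placeholder answers are assumptions for illustration; the real files may be laid out differently.</span></p>

```python
from typing import Dict, Sequence, Tuple


def build_qa_tables(ids: Sequence[int], questions: Sequence[str],
                    answers: Sequence[str]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Validate one-to-one alignment and return (question_by_id, answer_by_id)."""
    if not (len(ids) == len(questions) == len(answers)):
        raise ValueError("ids, questions and answers must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    question_by_id = dict(zip(ids, questions))
    answer_by_id = dict(zip(ids, answers))
    return question_by_id, answer_by_id


# Placeholder records: the answers are dummy strings, not real data.
ids = [1, 2]
questions = ["得了乙肝怎么治疗", "小孩感冒吃什么"]
answers = ["<answer text for question 1>", "<answer text for question 2>"]

question_by_id, answer_by_id = build_qa_tables(ids, questions, answers)
print(question_by_id[1], "->", answer_by_id[1])
```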
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-2 Overall system architecture</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The overall architecture of this semantic matching and search project, built on Faiss and BERT, is shown in the figure:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/24761e0cbff44cb4a28a4aa5c86abe65~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=EPbJCBGC3q1rQTe2COPJ4wWFXHM%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">(Note: the dark blue lines show the data import flow; the orange lines show the user query flow.)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, the project uses the open-source bert-serving with BERT as the sentence encoder. Each title is converted into a fixed-length 768-dimensional feature vector and imported into Milvus or Faiss.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Next, the feature vectors are stored and indexed in Milvus/Faiss. Each original record is given a unique ID, and the ID together with its content is stored in PostgreSQL.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Finally, when a user enters a title, BERT converts it into a feature vector. Milvus/Faiss runs a similarity search on that vector and returns the IDs of similar titles, and the details for those IDs are looked up in the knowledge base (PostgreSQL, MySQL, SQLite, and so on) and returned.</span></p>
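    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The data flow above can be sketched in a few lines of plain Python. This is a toy model of the architecture, not the project's code: toy_encode is a character-bigram stand-in for the BERT encoder, a dictionary scanned by brute force stands in for the Faiss/Milvus index, and a second dictionary stands in for the SQL knowledge base. All names and the placeholder answers are assumptions.</span></p>

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Vector = Dict[str, float]


def toy_encode(text: str) -> Vector:
    """Stand-in for BERT: L2-normalised character-bigram counts."""
    grams = Counter(text[i:i + 2] for i in range(len(text) - 1)) or Counter(text)
    norm = math.sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()}


def similarity(a: Vector, b: Vector) -> float:
    """Inner product; equals cosine similarity because vectors are normalised."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


class ToyQASystem:
    def __init__(self) -> None:
        self.index: Dict[int, Vector] = {}         # stands in for Faiss/Milvus
        self.knowledge_base: Dict[int, str] = {}   # stands in for the SQL table

    def load(self, records: Iterable[Tuple[int, str, str]]) -> None:
        """Import flow: encode each question, index it, store the answer by ID."""
        for record_id, question, answer in records:
            self.index[record_id] = toy_encode(question)
            self.knowledge_base[record_id] = answer

    def search(self, query: str, top_k: int = 3) -> List[Tuple[int, float, str]]:
        """Query flow: encode, rank IDs by similarity, look the answers up."""
        q = toy_encode(query)
        scored = sorted(((similarity(q, v), rid) for rid, v in self.index.items()),
                        reverse=True)
        return [(rid, score, self.knowledge_base[rid]) for score, rid in scored[:top_k]]


qa = ToyQASystem()
qa.load([
    (1, "乙肝怎么治疗", "<answer for record 1>"),
    (2, "小孩感冒吃什么", "<answer for record 2>"),
])
for record_id, score, answer in qa.search("得了乙肝怎么治疗", top_k=1):
    print(record_id, round(score, 3), answer)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Keeping only IDs in the vector index and the text in a separate store is what lets the index and the knowledge base be swapped independently, as the choice of Milvus or Faiss and of PostgreSQL, MySQL or SQLite suggests.</span></p>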
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-3 Text vector service: bert-serving</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Reference material:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">https://github.com/hanxiao/bert-as-service</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">https://github.com/tensorflow/tensorflow</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">https://blog.csdn.net/abc50319/article/details/107171952 (an article on using bert-as-service)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Using the bert-as-service service</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 1: Install TensorFlow</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Python &gt;= 3.5</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Tensorflow &gt;= 1.10 (</span><span style="color: black;">one-point-ten</span><span style="color: black;">)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Ubuntu (GPU): download the offline wheel files below and install them with pip</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">tensorboard-1.15.0-py3-none-any.whl</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">tensorflow_estimator-1.15.1-py2.py3-none-any.whl</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">tensorflow_gpu-1.15.3-cp37-cp37m-manylinux2010_x86_64.whl</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Alternatively, install directly from a mirror as follows (this installs the CPU version)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">pip install tensorflow==1.15.0 --user -i https://pypi.tuna.tsinghua.edu.cn/simple</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">macOS: install the CPU version</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">pip3 install --upgrade https://files.pythonhosted.org/packages/dc/65/a94519cd8b4fd61a7b002cb752bfc0c0e5faa25d1f43ec4f0a4705020126/tensorflow-1.15.0-cp37-cp37m-macosx_10_11_x86_64.whl</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">https://blog.csdn.net/u012359618/article/details/107054741</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">https://www.tensorflow.org/install/pip?hl=zh-cn#system-install</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">http://www.yebaochen.com/deep-learning/install-tensorflow-on-mac</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Verify the installation</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">import tensorflow as tf</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">print(tf.__version__)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Step 2: Set up the bert-serving service</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">(We set up bert-serving on Ubuntu, in the directory /home/ubuntu/teacher/)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The bert-serving service handles the text -&gt; vector conversion for us.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1. Get the code provided on GitHub</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">git clone https://github.com/hanxiao/bert-as-service.git</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2. Install the server and the client</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">pip install bert-serving-server # server</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">pip install bert-serving-client # client, independent of `bert-serving-server`</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">3. Download the pretrained BERT model</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Unzip the model:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">chinese_L-12_H-768_A-12</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">├── bert_config.json</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">├── bert_model.ckpt.data-00000-of-00001</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">├── bert_model.ckpt.index</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">├── bert_model.ckpt.meta</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">└── vocab.txt</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">bert_config.json: BERT model configuration parameters</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">vocab.txt: the vocabulary</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">bert_model.ckpt.*: the pretrained model checkpoint</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">4. Start bert-serving</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">nohup bert-serving-start -model_dir chinese_L-12_H-768_A-12 -num_worker 1 -max_seq_len 64 &gt;start_bert_serving.log 2&gt;&amp;1 &amp;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">(Both CPU and GPU modes work.)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">What each parameter means:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">num_worker: the number of worker processes, i.e. how many requests are handled concurrently (the command above uses 1; -num_worker 4 would handle four at a time)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">model_dir: the directory of the pretrained model</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">max_seq_len: the maximum sentence length, chosen from the sentence lengths in your business data (64 in the command above)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Shut down the service</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">bert-serving-terminate -port 5555</span></p>
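    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">5. Test the text -&gt; vector result</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">from bert_serving.client import BertClient</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">bc = BertClient()  # connects to the bert-serving server on localhost by default</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">result = bc.encode(["First do it"])  # the input is a list of strings; one vector per sentence</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">print(result)</span></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-4 Vector similarity search engine</h1>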
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Product learning manual</span></strong></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-5 Knowledge base storage</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Knowledge base: it can be stored in MongoDB, PostgreSQL or MySQL; choose according to your data volume.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In this article the data is stored in MySQL.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Note: for how to install MySQL, please look it up online. (root,12345678)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">If you run into any problems while learning, you can leave a message on the site (or contact aiwen2100).</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">create database faiss_qa;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">use faiss_qa;</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">CREATE TABLE `answer_info` (</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">`id` int(11) NOT NULL AUTO_INCREMENT,</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">`answer` mediumtext COLLATE utf8mb4_bin,</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">PRIMARY KEY (`id`),</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">KEY `answer_info_index_id` (`id`)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;</span></p>
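    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To show how a table like answer_info is used, here is a small sketch that stores &lt;id, answer&gt; rows and fetches answers for a ranked list of IDs. It deliberately uses Python's built-in sqlite3 with an in-memory database as a stand-in so that it runs without a MySQL server; the function names and sample rows are assumptions, and with MySQL the driver, connection setup and placeholder syntax would differ.</span></p>

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_answer_table(conn: sqlite3.Connection) -> None:
    """Create a simplified version of the answer_info table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answer_info (id INTEGER PRIMARY KEY, answer TEXT)"
    )


def load_answers(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Insert <id, answer> pairs using a parameterised statement."""
    conn.executemany("INSERT INTO answer_info (id, answer) VALUES (?, ?)", rows)
    conn.commit()


def fetch_answers(conn: sqlite3.Connection, ids: Iterable[int]) -> List[Tuple[int, str]]:
    """Return answers in the same order as ids (the search ranking)."""
    ids = list(ids)
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    cursor = conn.execute(
        f"SELECT id, answer FROM answer_info WHERE id IN ({placeholders})", ids
    )
    found = dict(cursor)
    # SQL does not guarantee row order, so re-apply the ranking here.
    return [(i, found[i]) for i in ids if i in found]


conn = sqlite3.connect(":memory:")
create_answer_table(conn)
load_answers(conn, [(1, "answer one"), (2, "answer two"), (3, "answer three")])
ranked = fetch_answers(conn, [3, 1])
print(ranked)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Re-ordering the rows after the query matters because the IDs come back from the vector search sorted by similarity, and an IN query alone would not preserve that order.</span></p>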
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-6 Building the index</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --task </span><span style="color: black;">medical</span> --index --n_total 120000</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-7 Text semantic similarity search</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ python main.py --task medical --search --sentence 小安信贷</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Let's see how small differences in the input change the results:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/ecb3c5469d214eee82f6986332d7dca3~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=9p0dl9T0FplG4nrah8JxNO8dipo%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-8 Text-semantic FAQ bot: API</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, start the service: python app.py</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Then send a request to the API endpoint to see the search results:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">$ curl -H "Content-Type: application/json" -X POST --data '{"query": "乙肝怎么治疗"}' http://localhost:5000/api/v1/search | jq</span></p>
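    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The same request can be made from Python with only the standard library. This is a sketch under assumptions: the endpoint path and the "query" field are taken from the curl command above, while the function names, the default base URL argument and the timeout are illustrative. The shape of the JSON response is not documented here, so the helper simply returns whatever the server sends back.</span></p>

```python
import json
import urllib.request


def build_search_request(query: str,
                         base_url: str = "http://localhost:5000") -> urllib.request.Request:
    """Build the POST request for /api/v1/search without sending it."""
    # ensure_ascii=False keeps Chinese text readable; the body is UTF-8 encoded.
    body = json.dumps({"query": query}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, base_url: str = "http://localhost:5000", timeout: float = 10.0):
    """Send the request and return the decoded JSON response (needs app.py running)."""
    request = build_search_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a running server.
req = build_search_request("乙肝怎么治疗")
print(req.get_method(), req.full_url)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Calling search("乙肝怎么治疗") performs the actual request once the service is up.</span></p>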
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">5-9 Text-semantic FAQ bot: Web interface</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Start the service with python app.py, then visit http://xx.xx.xx.xx:5000/</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Even when users phrase a question in slightly different ways, semantic matching can generally find the best answer and return it to the user.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example, compare the results for two phrasings:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Sentence 1: 小孩儿感冒吃什么 (what should a child with a cold eat)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Sentence 2: 小孩儿感冒不可吃什么 (what should a child with a cold not eat)</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">These two sentences clearly mean different things, and the semantic approach tells them apart well, as the screenshots show.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/872dbae6cb6f48649224388860cd9909~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=BSPvaq7iJBXT47Vzwc3mikk6a68%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/45ebf22032bd4a25a163ed1a37b909fb~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=n2L2H4FU5aQ8OFJTp3sIZ2wbXVg%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Now let's look at the data on the backend service side:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/392e118babe545749d3e14e938490271~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1727628762&amp;x-signature=C7tAMnLW%2FsjGk7RtL6XSCM9sJYY%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Note: the FAQ system depends on the bert-serving service, so make sure it is running properly.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Check http://xx.xx.xx.xx:5000/status. When everything is working, the response has the following format:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">{</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">"status":"success",</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">"ip":"127.0.0.1",</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">"port":5555,</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">"identity":"cbc94483-1cd6-406d-b170-0cb04e77725bb"</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">}</span></p>
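    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A health check can key off the "status" field in that response. The sketch below is an assumption-laden illustration: the function name is made up, and because the format of a failing response is not shown, it treats anything other than a JSON object with "status" equal to "success" as unhealthy.</span></p>

```python
import json


def bert_service_ok(payload: str) -> bool:
    """Return True only if the /status body reports status == "success"."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        # Not JSON at all (for example an HTML error page): treat as unhealthy.
        return False
    return isinstance(data, dict) and data.get("status") == "success"


# The sample body shown above.
sample = """
{
  "status": "success",
  "ip": "127.0.0.1",
  "port": 5555,
  "identity": "cbc94483-1cd6-406d-b170-0cb04e77725bb"
}
"""
print(bert_service_ok(sample))
```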
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">6 Summary and outlook</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">With AI developing rapidly, deep learning models can be used to process all kinds of unstructured data, such as images, text, video and speech. In this project, a BERT model turns unstructured text into feature vectors, and Faiss then computes over those vectors to support analysis and retrieval of the unstructured data.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The FAQ question-answering system built with Faiss in this article is just one scenario that shows how Faiss can be applied to unstructured data. You are welcome to import your own data and build your own FAQ system (or a text search system, an intelligent customer service system, and so on). Faiss is a vector similarity search engine designed to scale to billions of vectors, and with a suitable index and hardware, queries can return in milliseconds. You can use Faiss to explore more AI use cases!</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">For the detailed related materials, please leave a private message and they will be shared with you.</span></strong></p>




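4zhvml8 posted on 2024-10-5 03:20:28

Looking back on history, we are filled with emotion; looking to the future, we are full of confidence.

1fy07h posted on 2024-10-25 01:22:15

Looking back on history, we are filled with emotion; looking to the future, we are full of confidence.

4zhvml8 posted on 2024-10-28 16:15:01

I completely agree with your view and look forward to discussing this question in more depth.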