m5k1umn posted on 2024-10-4 13:29:44

南大通用 GBase 8c Fault Analysis and Handling in Distributed Scenarios: Installation & Runtime Issues


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Original link: https://www.gbase.cn/community/post/4295</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">More content is available in the 南大通用 GBase technical community. 南大通用 aims to be the database product vendor that users trust most.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">南大通用 GBase 8c is a high-performance, enterprise-grade distributed database with multi-model, multi-form characteristics. It supports row, column, and in-memory storage modes, and standalone, primary/standby, and distributed deployment forms. The GBase 8c installation package is available from the 南大通用 official website at https://www.gbase.cn/download/gbase-8c?category=INSTALL_PACKAGE</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a high-performance distributed database management system, GBase 8c is widely used in big data processing and real-time analytics. In practice, however, faults can occur for a variety of reasons. This article analyzes the common faults encountered while installing and running GBase 8c in distributed scenarios and gives a handling method for each, as a reference for GBase 8c users.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1. Common faults during installation and how to handle them</h1>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1.1 Installation error: Failed to start instance</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: while installing GBase 8c, the error "Failed to start instance. Error: Please check the gs_ctl log for failure details." may appear.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Check the system configuration: check whether kernel.shmmax in /etc/sysctl.conf is set too low. If it is, add or modify the line kernel.shmmax = 18446744073692774399 and run sysctl -p to apply it.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Handle port occupation: if processes left over after a cluster was uninstalled are still occupying ports, consider rebooting the machine or terminating those processes with a command such as kill.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1.2 Failed to initialize instance</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: instance initialization fails.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Check the configuration file: go through gbase.yml carefully and make sure the format is correct, with two spaces per indentation level. An online YAML editor can be used to validate it.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Check mutual trust: make sure mutual trust between the nodes has been configured successfully. This can be tested with the ssh command.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Before configuring mutual trust, set the machines' hostnames in advance to avoid errors during the trust configuration.</p>
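    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a rough local alternative to an online validator, the two-space indentation rule can be checked with a short script. The helper below, check_yaml_indent, is a hypothetical name and not part of GBase 8c. It only flags tabs and odd numbers of leading spaces, so it is a quick sanity check under that assumption, not a full YAML parser.</p>

```shell
# Hypothetical helper (not shipped with GBase 8c): report lines whose
# indentation contains a tab or is not a multiple of two spaces.
# Usage: check_yaml_indent gbase.yml
check_yaml_indent() {
    awk '
        /^[ \t]*$/ { next }                      # ignore blank lines
        /^ *\t/ {                                # tab inside the indentation
            printf "%d: tab in indentation\n", NR
            bad = 1
            next
        }
        {
            line = $0
            sub(/[^ ].*$/, "", line)             # keep only the leading spaces
            n = length(line)
            if (n % 2 != 0) {
                printf "%d: %d leading spaces (not a multiple of 2)\n", NR, n
                bad = 1
            }
        }
        END { exit bad }                         # non-zero status if anything was flagged
    ' "$1"
}
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The function prints one line per suspicious line number and returns a non-zero status when it finds any, so it can be chained as check_yaml_indent gbase.yml &amp;&amp; echo OK.</p>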
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1.3 Port already in use</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: the installer reports that a port is already in use.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution: run lsof -i:PORT (where PORT is the port number) to see which process is occupying the port, then stop that process or use a different port.</p>
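    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When lsof lists many sockets for the same process, its output can be condensed to one line per process. The sketch below uses a hypothetical helper name, port_holders, and assumes lsof's default column layout (COMMAND in column 1, PID in column 2). The port 5432 in the usage comment is only an example value.</p>

```shell
# Hypothetical helper: read lsof output on stdin and print one
# "PID COMMAND" line per distinct process, skipping the header row.
# Example usage (port number is illustrative):
#   lsof -i:5432 | port_holders
port_holders() {
    awk 'NR > 1 && !seen[$2]++ { print $2, $1 }'
}
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Each PID printed this way is a candidate to stop before rerunning the installation. Confirm what the process is before terminating it.</p>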
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1.4 Configuration file errors</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: the pre-check or the installation fails because of errors in a configuration file.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Check that the cluster_config.xml configuration file is complete, in particular whether the &lt;ROOT&gt; tag is missing. Make sure the IP addresses and hostnames in /etc/hosts match those in the configuration file, especially in dual-NIC environments or after the cluster configuration has changed.</p>
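    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The /etc/hosts comparison can be scripted per node. The sketch below uses a hypothetical helper, check_hosts_entry, that takes a hosts file, a hostname, and the IP address expected from the cluster configuration. The hostname and address in the usage comment are placeholders, and extracting them from the configuration file is left to the reader.</p>

```shell
# Hypothetical helper: verify that HOSTNAME appears in HOSTS_FILE and
# that every line mentioning it maps it to the expected IP.
# Usage (values are placeholders): check_hosts_entry /etc/hosts node1 10.0.0.1
check_hosts_entry() {
    awk -v host="$2" -v ip="$3" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # stop at a trailing comment
                if ($i == host) {
                    seen = 1
                    if ($1 != ip) {
                        printf "line %d maps %s to %s, expected %s\n", NR, host, $1, ip
                        bad = 1
                    }
                }
            }
        }
        END {
            if (!seen) {
                printf "%s not found\n", host
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In a dual-NIC environment this makes a hostname bound to the wrong interface address visible, because every conflicting line is reported.</p>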
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">1.5 Error when installing the database on Ubuntu</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Error message: the error from a real installation is shown below:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="//q6.itc.cn/images01/20240815/3a15721fbaed437795964d994471c8c9.png" style="width: 50%; margin-bottom: 20px;"></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Switch the system shell to bash with the following command:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">sudo dpkg-reconfigure dash</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When prompted, select No and press Enter. After you exit, the system shell switches to bash automatically.</p>
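    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To confirm the switch took effect, one option is to resolve where /bin/sh points. The helper sh_is_bash below is a hypothetical name. It assumes a readlink that supports -f (as on Ubuntu) and that the system shell is exposed through the /bin/sh symlink.</p>

```shell
# Hypothetical helper: succeed if the given path (default /bin/sh)
# resolves to a file whose name starts with "bash".
# Usage: sh_is_bash && echo "sh is bash" || echo "sh is not bash"
sh_is_bash() {
    target=$(readlink -f "${1:-/bin/sh}") || return 2
    case "${target##*/}" in                      # compare the file name only
        bash*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If the check still fails after reconfiguring, rerun the command and make sure No was selected.</p>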
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2. Common faults at runtime and how to handle them</h1>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2.1 Rpc request failed</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: at runtime an RPC request may fail with an error such as "Rpc request failed:Coordinator cnl start failed".</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Check memory and disk space: use free -m to check whether there is enough memory, and check whether there is enough disk space. If either is short, free up space or add memory.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Check the database logs: read the database runtime logs in detail to find the cause of the error, which may be insufficient disk space or another resource limit.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2.2 Insufficient permissions</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: switching users or running certain commands fails because of insufficient permissions.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Make sure the ownership and permissions of directories and files are set correctly. For example, if the gbase user cannot access the /var/log/gbase directory, change its ownership with chown gbase:gbase -R /var/log/gbase/.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When switching users, prefer su - gbase over su gbase so that the gbase user's environment variables are loaded.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2.3 Cluster already installed</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Problem description: the install command reports that the cluster is already installed.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Clean up the environment variables: check the ~/.bashrc file on every node and confirm whether the GAUSS_ENV environment variable is set incorrectly. Reset or delete it, then rerun the install command.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">env|grep GAUSS_ENV</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Uninstall the installed cluster: use the gha_ctl uninstall and gha_ctl destroy dcs commands.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">2.4 current transaction is aborted</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Error message: the error reported in a real environment is shown below:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="//q8.itc.cn/images01/20240815/245b7f0b1a66412eafc34c6a5edcb439.png" style="width: 50%; margin-bottom: 20px;"></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This error means that the preceding transaction has failed and must be rolled back before any new statement can be executed.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Solution:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">There are two possible causes:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A transaction was started manually earlier with begin. In this case, run rollback manually to end the transaction.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">autocommit was set to off earlier in the session, so the database started a transaction implicitly. In this case a manual rollback is also required.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">3. Summary</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">GBase 8c can run into a variety of faults during installation and operation, but most of them can be resolved with careful analysis and an appropriate handling method. This article has covered common installation errors and common runtime problems, with an analysis and a handling method for each, in the hope that it helps GBase 8c users.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When handling a fault, follow these principles:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1) Read the error message carefully: it often provides the key clue to solving the problem.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2) Check system configuration and dependencies: make sure the system configuration and dependent libraries are correct.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">3) Check the log files: read the relevant log files in detail to locate the problem more precisely.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">4) Use the right commands and tools: when resolving permission problems, take care to use the correct commands and tools.</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Editor: user submission</p>




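4zhvml8 posted on 2024-10-16 16:02:14

Excuse me, hello, looking for an answer, does anyone know, and so on.

4lqedz posted on 2024-10-28 02:51:53

You have inspired me deeply; your words are what drives me forward.

nykek5i posted on 2024-10-30 01:15:17

Thanks to the original poster for sharing! I learned a lot.

m5k1umn posted 3 days ago

Your words are like twinkling stars, lighting up the night sky in my heart.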