Speed Up Pandas Using Modin
<img src="https://p3-sign.toutiaoimg.com/pgc-image/15264580353771934ab31bc~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=HA%2BVkAK79W43Rs3neXXPKCIYHZA%3D" style="width: 50%; margin-bottom: 20px;"><p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">I recently stumbled upon a small library called Modin, which claims to make pandas run faster. One of the sentences they use to describe the project is:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">"Speed up your pandas workflows by changing a single line of code"</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">That sounds very interesting, and if it is true, it is a big deal.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To use modin, you only need to import modin in place of pandas; you do not need to change your existing code.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">One caveat:</strong> modin currently uses pandas 0.20.3 (at least, pandas 0.20 is what gets installed when you install modin with pip install modin). If you are using the latest version of pandas and need features that do not exist in earlier versions, you may need to hold off on trying modin, or experiment with getting it to work with the latest pandas (I have not done that yet).</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Installing modin:</p><img src="https://p3-sign.toutiaoimg.com/pgc-image/RJc3UzP34nsTDZ~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=3uxlAo9lmcpKcdlYg3FXCdT5tVY%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Using modin:</p><img src="https://p3-sign.toutiaoimg.com/pgc-image/RJc3UzcBMeluoy~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=eVclEWrTtuHrNYnFqEEJR4kN5ZQ%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">That's it. You just use import modin.pandas as pd instead of import pandas as pd, and you get the extra speed.</p>
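<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a rough sketch of the one-line swap, the snippet below tries modin.pandas first and falls back to plain pandas when modin is not installed. The helper first_importable is a hypothetical name introduced here for illustration; it is not part of modin or pandas, and the fallback itself is an addition to what the article describes.</p>

```python
import importlib


def first_importable(candidates):
    """Return the first module in `candidates` that imports cleanly, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer Modin's pandas-compatible module; fall back to plain pandas if Modin is missing.
pd = first_importable(["modin.pandas", "pandas"])
if pd is not None:
    print("DataFrame library in use:", pd.__name__)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The rest of a script can keep calling pd.read_csv and friends without caring which library was picked.</p>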
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">According to the documentation, modin takes advantage of the multiple cores on modern machines, which pandas does not. From their website:</p><img src="https://p3-sign.toutiaoimg.com/pgc-image/RJc3Uzo5TkiovV~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=3zCeHWFkI6QnxHBcIedhnB%2BmDy8%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A read_csv benchmark provided by Modin</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, you can only use one core at a time when doing computation of any kind. With Modin, you can use all of the CPU cores on your machine. Even in read_csv, we see large gains from efficiently distributing the work across the entire machine.</p>
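<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To make the idea of distributing work concrete, here is a minimal standard-library sketch that splits the rows of a CSV into one slice per worker and parses the slices concurrently. This is not how Modin is implemented; split_rows, parse_chunk and parallel_parse are hypothetical names. It uses threads to stay simple, so it illustrates the partitioning step rather than a real multi-core speedup, and it assumes no quoted field contains a newline.</p>

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_workers):
    """Split range(n_rows) into contiguous (start, stop) slices, at most one per worker."""
    n_workers = max(1, min(n_workers, n_rows)) if n_rows else 1
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for i in range(n_workers):
        # The first `extra` slices take one additional row so sizes differ by at most 1.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parse_chunk(lines):
    """Parse a list of CSV text lines into a list of rows."""
    return list(csv.reader(lines))


def parallel_parse(text, n_workers=None):
    """Return (header, rows) for CSV text, parsing row slices concurrently."""
    lines = text.splitlines()
    header, body = lines[0], lines[1:]
    n_workers = n_workers or os.cpu_count() or 1
    chunks = [body[a:b] for a, b in split_rows(len(body), n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parsed = list(pool.map(parse_chunk, chunks))  # map preserves slice order
    rows = [row for chunk in parsed for row in chunk]
    return next(csv.reader([header])), rows
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Swapping the thread pool for a process pool is what would let each slice run on its own core, at the cost of moving data between processes.</p>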
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Let's give it a try and see how it works.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For this test, I will try out their read_csv method, since that is the feature they highlight. I have a 105MB csv file for the test. Let's time both pandas and modin and see how they do.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We'll start with pandas.</p><img src="https://p26-sign.toutiaoimg.com/pgc-image/RJc3V015vXgRjl~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=BklY%2FTzD8Gu0f8p3%2FqwKsM0UaM4%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">With pandas, reading a 105MB csv file takes 1.26 seconds on average.</strong></p>
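<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The timing code appears only as a screenshot, so the following is a guess at the general shape of such a measurement rather than the author's exact code. The helper average_seconds is a hypothetical name, and "data.csv" stands in for the 105MB file, whose name the article does not give.</p>

```python
import timeit


def average_seconds(func, repeat=7):
    """Call func() `repeat` times and return the mean wall-clock seconds per call."""
    timings = timeit.repeat(func, number=1, repeat=repeat)
    return sum(timings) / len(timings)


# Hypothetical usage (requires pandas and a real file on disk):
#   import pandas as pd
#   print(average_seconds(lambda: pd.read_csv("data.csv")))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Running the same helper after switching the import to modin.pandas gives a like-for-like comparison.</p>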
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Now, let's take a look at modin.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Before continuing, I should share that I needed a few extra steps beyond pip install modin to get modin working. I also had to install typing and dask.</p><img src="https://p3-sign.toutiaoimg.com/pgc-image/RJc3V0EIfwGUa1~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=AQ%2Bl2wDGTOLvGTENfwkL1UBILFM%3D" style="width: 50%; margin-bottom: 20px;">
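<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">One way to see which of these packages are still missing before running the benchmark is to ask Python whether it can find them. The helper missing_modules below is a hypothetical name used only for this sketch, and it handles top-level module names only.</p>

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The article mentions modin, typing and dask; anything printed here still needs installing.
print(missing_modules(["modin", "typing", "dask"]))
```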
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">I used exactly the same code as above, except for the small change to the import: import modin.pandas as pd.</p><img src="https://p3-sign.toutiaoimg.com/pgc-image/RJc3VLeGXIWxSb~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728101217&x-signature=Mu%2BlRNME0lyqzyIVEwO4J6CZTqE%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">With modin, reading a 105MB csv file takes 0.96 seconds on average.</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In this example, modin shaved 0.3 seconds off the average time to read this 105MB csv file. That may not look like much, but it is a saving of about 24% relative to the pandas time (0.30 / 1.26). Imagine you have 5000 csv files of similar size to read: that is an average saving of 1500 seconds, or 25 minutes, just on reading files.</p>
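<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The savings can be recomputed from the two averages reported above. With those rounded figures, the relative saving comes out near 24% of the pandas time, and the 5000-file estimate works out to 25 minutes. The variable names are illustrative only.</p>

```python
pandas_seconds = 1.26   # average read time reported for pandas
modin_seconds = 0.96    # average read time reported for modin
n_files = 5000          # hypothetical batch of similar-size files

saved_per_file = pandas_seconds - modin_seconds
percent_saved = saved_per_file / pandas_seconds * 100
total_saved_seconds = saved_per_file * n_files
total_saved_minutes = total_saved_seconds / 60

print(f"saved per file: {saved_per_file:.2f} s")
print(f"relative saving: {percent_saved:.0f}%")
print(f"total for {n_files} files: {total_saved_seconds:.0f} s = {total_saved_minutes:.0f} min")
```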
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Modin uses Ray to speed up pandas, so you may be able to save even more time by tuning some of Ray's settings.</p>
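<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The article does not say which Ray settings to adjust. As one possible starting point, and assuming a recent Modin release rather than the version tested here, the sketch below selects the Ray engine through the MODIN_ENGINE environment variable and starts Ray with an explicit CPU count before modin.pandas is imported. The function init_ray is a hypothetical wrapper, and whether this improves the benchmark above is untested.</p>

```python
import os

# Assumption: recent Modin releases read MODIN_ENGINE when modin.pandas is first imported.
os.environ.setdefault("MODIN_ENGINE", "ray")


def init_ray(num_cpus=None):
    """Start Ray with an optional CPU limit; return False if Ray is not installed."""
    try:
        import ray
    except ImportError:
        return False
    # num_cpus=None lets Ray detect the core count itself.
    ray.init(num_cpus=num_cpus, ignore_reinit_error=True)
    return True


# Hypothetical usage, before importing modin.pandas:
#   init_ray(num_cpus=4)
#   import modin.pandas as pd
```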
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">I will be using modin more in some of my future projects to improve efficiency. Take a good look at it and let me know what you think.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Related links:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://github.com/modin-project/modin</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://github.com/ray-project/ray/</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Original English article: https://pythondata.com/quick-tip-speed-up-pandas-using-modin/</p>Translator: 一瞬