Python or SQL for Getting Started with Data Analysis in 2020? Seven Common Operations Compared!
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/pxMrQvgWLUJZElVuEuCdN9GAPeLvVzcSnUZiaI3fVxMjFWQYHS3L1icPWZG8g98H3ZFTm5jAtS0cajI5aB8pbCew/640?wx_fmt=jpeg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Author: 刘早起</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Source: 早起Python</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">SQL and Python are the two languages that nearly every data analyst today has to master. How do they differ when working with data? This article uses MySQL and pandas to walk through seven operations that come up constantly in data analysis, in the hope of <strong style="color: blue;">helping readers who already know one of the two pick up the other quickly</strong>!</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Before reading on, you can download the sample data used in this article from the URL below and load it into both MySQL and pandas, so you can type the code along as you read!</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/tips.csv</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">1. Selecting</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL, the SELECT statement retrieves data from a table, and the result is stored in a result table. The syntax is:</p>
<pre>SELECT column_name, column_name
FROM table_name;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If you <strong style="color: blue;">do not want to display every record</strong>, you can cap the number of rows with LIMIT (in MySQL) or TOP (in SQL Server). Selecting a few columns from the tips table therefore looks like this:</p>
<pre>SELECT total_bill, tip, smoker, time
FROM tips
LIMIT 5;</pre>
<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDf12o4Hyys74TPIJ6LiczicrfEZYlyIK2mBicYPia4PWtK3Af38IWSDpYxQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, columns are selected by <strong style="color: blue;">passing a list of column names to the DataFrame</strong>:<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDUuPQc2M1LwNtmIXH9jFGoYrGlNLFnfcnTsLbxF2OLZ5xLors0gLIew/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL you can also compute values while selecting, for example adding a derived column:</p>
<pre>SELECT *, tip/total_bill AS tip_rate
FROM tips
LIMIT 5;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDkGmfRC4jaj8HfeuJzicnFymVQoOfXRUoTsVx28NZgZr4icIibrWZgSJQQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">In pandas, DataFrame.assign() does the same thing:<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDRCUxff3mKdPoxRm6DuWKHUJ1znOicYv88Wice1PX5gXTXKeNJQ3fwWog/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
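<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The pandas code in this section appears only as images, so here is a minimal text sketch of both operations. It assumes a small hand-typed DataFrame with the same column names as tips.csv (the rows are illustrative stand-ins, not the downloaded file), and it uses head(5) in the role of LIMIT 5.</p>

```python
import pandas as pd

# Illustrative stand-in for tips.csv: same column names, hand-typed rows.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "smoker": ["No", "No", "No", "No", "No", "No"],
    "day": ["Sun"] * 6,
    "time": ["Dinner"] * 6,
    "size": [2, 3, 3, 2, 4, 4],
})

# SELECT total_bill, tip, smoker, time FROM tips LIMIT 5;
selected = tips[["total_bill", "tip", "smoker", "time"]].head(5)

# SELECT *, tip/total_bill AS tip_rate FROM tips LIMIT 5;
# assign() returns a new DataFrame; the original tips is left unchanged.
with_rate = tips.assign(tip_rate=tips["tip"] / tips["total_bill"]).head(5)

print(selected)
print(with_rate)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With the real file, the hand-typed frame could be replaced by pd.read_csv() on the URL given earlier.</p>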
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">2. Querying</span></strong></span></p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Single-condition queries</span></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL, the WHERE clause extracts the records that satisfy a given condition. The syntax is:</p>
<pre>SELECT column_name, column_name
FROM table_name
WHERE column_name operator value;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example, to fetch the records in the sample data where time is Dinner:</p>
<pre>SELECT *
FROM tips
WHERE time = 'Dinner'
LIMIT 5;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoD3ON7W4wNicIibeBFH4cZ5akxxI6vMKY5sSTSoa16VwkT09rJicCll8S0w/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">In pandas there are several ways to filter by a condition. One is to <strong style="color: blue;">pass a Series of True/False values to the DataFrame, which returns every row marked True</strong>:<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDyoPE60crhhItfXTBCx349KjOrXiajMuvENnwsHCNm7Gk8uCyuibVZgSQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
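<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A minimal sketch of that boolean-mask filter, assuming a few hand-typed rows in the tips layout rather than the downloaded file. The intermediate name is_dinner is introduced here only to make the True/False Series visible.</p>

```python
import pandas as pd

# Illustrative rows using tips.csv column names (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 27.20, 21.01, 18.78],
    "tip": [1.01, 4.00, 3.50, 3.00],
    "time": ["Dinner", "Lunch", "Dinner", "Lunch"],
})

# Comparing a column with a value yields a boolean Series, one entry per row.
is_dinner = tips["time"] == "Dinner"

# Indexing the DataFrame with that Series keeps only the rows marked True.
# SELECT * FROM tips WHERE time = 'Dinner' LIMIT 5;
dinner_rows = tips[is_dinner].head(5)

print(is_dinner)
print(dinner_rows)
```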
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Multi-condition queries</span></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL, multiple conditions are combined with AND/OR:</p>
<pre>SELECT *
FROM tips
WHERE time = 'Dinner' AND tip &gt; 5.00;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDicFtpxKz6vzI584DNNq2u3LX06V5R9bicG4GicMS3ahIsVs65N4Z0oaFg/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">pandas has a similar operation:<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoD8VOaiaBJpEGJlXqqpGFvAToJUZRdxpt6OJeza7V7zjt5yntHsRwtukQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
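<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a rough text equivalent of the pandas version, boolean Series can be combined with &amp; (AND) and | (OR). The sketch below assumes a handful of made-up rows in the tips layout, and the OR query is an added illustration that does not come from the original article.</p>

```python
import pandas as pd

# Illustrative rows using tips.csv column names (values are made up).
tips = pd.DataFrame({
    "tip": [1.01, 5.92, 6.70, 5.15, 3.00],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner", "Lunch"],
})

# SELECT * FROM tips WHERE time = 'Dinner' AND tip > 5.00;
# Each comparison needs its own parentheses because & binds more tightly
# than == and >.
dinner_big_tip = tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

# SELECT * FROM tips WHERE time = 'Lunch' OR tip > 5.00;
lunch_or_big_tip = tips[(tips["time"] == "Lunch") | (tips["tip"] > 5.00)]

print(dinner_big_tip)
print(lunch_or_big_tip)
```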
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Checking for NULL values</span></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, missing values are detected with the notna() and isna() methods.</p>
<pre>frame[frame['col1'].notna()]</pre>
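<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To make the one-line snippet above runnable, here is a small sketch that exercises both methods. The names frame, col1 and col2 follow the article; the values are hypothetical.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
frame = pd.DataFrame({
    "col1": ["A", "B", np.nan, "C", "D"],
    "col2": ["F", np.nan, "G", "H", "I"],
})

# Keep the rows where col1 has a value (SQL: col1 IS NOT NULL).
col1_present = frame[frame["col1"].notna()]

# Keep the rows where col2 is missing (SQL: col2 IS NULL).
col2_missing = frame[frame["col2"].isna()]

print(col1_present)
print(col2_missing)
```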
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL, the same checks are written with IS NULL and IS NOT NULL:</p>
<pre>SELECT *
FROM frame
WHERE col2 IS NULL;

SELECT *
FROM frame
WHERE col1 IS NOT NULL;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">3. Updating</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL, use UPDATE:</p>
<pre>UPDATE tips
SET tip = tip*2
WHERE tip &lt; 2;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pandas offers several ways to do this; one is the loc indexer:</p>
<pre>tips.loc[tips['tip'] &lt; 2, 'tip'] *= 2</pre>
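<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A small runnable sketch of the loc-based update, assuming a one-column stand-in for the tips table with made-up values. The first argument to loc picks the rows (a boolean mask) and the second picks the column to modify.</p>

```python
import pandas as pd

# Illustrative stand-in for the tips table (values are made up).
tips = pd.DataFrame({"tip": [1.01, 1.66, 3.50, 2.00]})

# UPDATE tips SET tip = tip*2 WHERE tip < 2;
# Rows where the mask is True are doubled in place; other rows are untouched.
tips.loc[tips["tip"] < 2, "tip"] *= 2

print(tips)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Unlike assign(), this modifies the existing DataFrame rather than returning a new one.</p>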
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4. Deleting</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In SQL, use DELETE:</p>
<pre>DELETE FROM tips
WHERE tip &gt; 9;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, we select the rows that should remain instead of deleting the others:</p>
<pre>tips = tips.loc[tips['tip'] &lt;= 9]</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">5. Grouping</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, grouping is done with the groupby() method. groupby() typically refers to a process in which we split a dataset into groups, apply some function to each group (usually an aggregation), and then combine the groups back together.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A common SQL operation is to get the number of records in each group across the whole dataset. For example, <strong style="color: blue;">grouping by sex</strong>:</p>
<pre>SELECT sex, count(*)
FROM tips
GROUP BY sex;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDeicfNg7HWV38icueNuA3mfoMu62IAxhia9YGDYuciaZBpgicQvnezk4rk9Q/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">The equivalent operation in pandas is:<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoDMhOjOY0jpiaUTUboSKgsibQNbpgEjKc7THAoPhz7amXT8ZfjhEzqUaZQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"><strong style="color: blue;"><span style="color: black;">Note</span></strong> that the code above <strong style="color: blue;">uses size() rather than count()</strong>. This is because count() applies the function to each column and returns the number of non-null records in each column!</p>
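<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The difference between size() and count() is easiest to see on data with gaps. The sketch below assumes a tiny made-up frame in which the tip and day columns each contain one missing value; it is an illustration, not the article's dataset.</p>

```python
import numpy as np
import pandas as pd

# Made-up rows; tip and day each contain one missing value.
tips = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "tip": [1.01, 1.66, np.nan, 3.61, 4.71],
    "day": ["Sun", "Sun", "Sat", None, "Sat"],
})

# SELECT sex, count(*) FROM tips GROUP BY sex;
# size() counts rows per group, regardless of missing values.
rows_per_group = tips.groupby("sex").size()

# count() is applied column by column and skips missing values,
# so the numbers can differ between columns of the same group.
non_null_per_column = tips.groupby("sex").count()

# Counting a single column is closer to SQL's count(tip).
tip_count = tips.groupby("sex")["tip"].count()

print(rows_per_group)
print(non_null_per_column)
print(tip_count)
```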
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">6. Joining</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, joins are done with join() or merge(). Each method takes parameters that let you specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the columns to join on.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Now let's create two fresh sets of sample data and use code to demonstrate the different joins:</p>
<pre>import numpy as np
import pandas as pd

# Two small frames that share some, but not all, key values
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                    'value': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
                    'value': np.random.randn(4)})</pre>
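<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a quick orientation before looking at individual joins, the sketch below runs merge() on these two frames with each value of the how parameter and records how many rows come back. The loop and the row_counts dictionary are illustrative helpers rather than part of the original article; note that pandas spells a FULL join as how="outer".</p>

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"],
                    "value": np.random.randn(4)})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"],
                    "value": np.random.randn(4)})

# Number of result rows for each join type.
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    joined = pd.merge(df1, df2, on="key", how=how)
    row_counts[how] = len(joined)

print(row_counts)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The counts differ because D appears twice in df2, while A and C exist only in df1 and E exists only in df2.</p>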
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><span style="color: black;">Inner join</span></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">An inner join uses a comparison operator to match rows from two tables based on the values in the columns they share. In SQL, an inner join is written with INNER JOIN:</p>
<pre>SELECT *
FROM df1
INNER JOIN df2
  ON df1.key = df2.key;</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In pandas, use merge():<img src="https://mmbiz.qpic.cn/mmbiz_png/2GcSFhuAFlCFrTuuAoLSReVN9jCDDpoD1J9CCSBIzHlJXOYDjmEVU79Fo14jgde4sicBuP3bPV4Uib04nx6KXjuw/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">merge() also provides parameters that let you join a column of one DataFrame with the index of another.</p>
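<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A text sketch of both forms, using the df1 and df2 frames defined earlier. The name indexed_df2 is a helper introduced here for illustration; left_on and right_index are the merge() parameters used for the column-to-index variant.</p>

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"],
                    "value": np.random.randn(4)})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"],
                    "value": np.random.randn(4)})

# SELECT * FROM df1 INNER JOIN df2 ON df1.key = df2.key;
# how="inner" is merge()'s default, so it does not need to be spelled out.
inner = pd.merge(df1, df2, on="key")

# Column-to-index join: match df1's "key" column against df2's index.
indexed_df2 = df2.set_index("key")
inner_on_index = pd.merge(df1, indexed_df2, left_on="key", right_index=True)

print(inner)
print(inner_on_index)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because both frames have a column called value, pandas distinguishes them in the result with the suffixes _x and _y.</p>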