Machine Learning Basics: Using matplotlib.pyplot and seaborn

import matplotlib.pyplot as plt
import numpy as np

Step 1: Generate the dataset

x = np.linspace(-3,3,50)  # evenly spaced sampling: 50 points on [-3, 3]
x.shape
(50,)
y1 = 2*x + 1
y1.shape
(50,)
y2 = x**2
y2
array([9.00000000e+00, 8.28029988e+00, 7.59058726e+00, 6.93086214e+00,
       6.30112453e+00, 5.70137443e+00, 5.13161183e+00, 4.59183673e+00,
       4.08204915e+00, 3.60224906e+00, 3.15243648e+00, 2.73261141e+00,
       2.34277384e+00, 1.98292378e+00, 1.65306122e+00, 1.35318617e+00,
       1.08329863e+00, 8.43398584e-01, 6.33486047e-01, 4.53561016e-01,
       3.03623490e-01, 1.83673469e-01, 9.37109538e-02, 3.37359434e-02,
       3.74843815e-03, 3.74843815e-03, 3.37359434e-02, 9.37109538e-02,
       1.83673469e-01, 3.03623490e-01, 4.53561016e-01, 6.33486047e-01,
       8.43398584e-01, 1.08329863e+00, 1.35318617e+00, 1.65306122e+00,
       1.98292378e+00, 2.34277384e+00, 2.73261141e+00, 3.15243648e+00,
       3.60224906e+00, 4.08204915e+00, 4.59183673e+00, 5.13161183e+00,
       5.70137443e+00, 6.30112453e+00, 6.93086214e+00, 7.59058726e+00,
       8.28029988e+00, 9.00000000e+00])
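As a small check of what the sampling above does, the sketch below (variable names beyond x, y1, y2 are my own) confirms that np.linspace includes both endpoints, so 50 points leave 49 equal gaps, and that the arithmetic on x is applied element by element.

```python
import numpy as np

# 50 evenly spaced samples on the closed interval [-3, 3]
x = np.linspace(-3, 3, 50)

# Both endpoints are included, so 50 points are separated by 49 equal gaps
step = x[1] - x[0]
print(x[0], x[-1])
print(step, 6 / 49)

# Arithmetic on a NumPy array is element-wise, so the shapes are preserved
y1 = 2 * x + 1   # straight line
y2 = x ** 2      # parabola, symmetric around x = 0
print(y1.shape, y2.shape)
```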
plt.figure()
plt.plot(x,y1)
[<matplotlib.lines.Line2D at 0x111d0f9e8>]
output_7_1.png
plt.figure()
plt.plot(x,y2)
[<matplotlib.lines.Line2D at 0x111da3860>]
output_8_1.png
plt.plot(x,y2)
plt.show()
output_9_0.png

# Plot y1 and y2 against x on the same axes
plt.plot(x,y1)
plt.plot(x,y2)
[<matplotlib.lines.Line2D at 0x111d60fd0>]
output_11_1.png
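When two curves share one set of axes it is usually worth labelling them. The following is a minimal sketch using matplotlib's object-oriented interface; the label texts are my own choice, and the Agg backend line is only there so the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 50)

fig, ax = plt.subplots()
ax.plot(x, 2 * x + 1, label="y1 = 2x + 1")
ax.plot(x, x ** 2, label="y2 = x^2")
ax.legend()  # builds the legend from the label= strings above

legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_texts)
```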

Chinese font support

from pylab import mpl  # alternatively: import matplotlib as mpl
mpl.rcParams['font.sans-serif']=['FangSong']
mpl.rcParams['axes.unicode_minus']=False
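The font setting above only helps if a font named FangSong is actually installed. The sketch below is one possible way to check this first; the list of candidate font names is an assumption and should be adapted to the machine at hand.

```python
import matplotlib
from matplotlib import font_manager

# Names of all fonts matplotlib has discovered on this machine
available = {f.name for f in font_manager.fontManager.ttflist}

# Candidate CJK fonts; this list is an assumption, adjust it for your system
candidates = ["FangSong", "SimHei", "Microsoft YaHei", "Noto Sans CJK SC"]
usable = [name for name in candidates if name in available]

if usable:
    # Put the usable CJK fonts first and keep the previous defaults as fallback
    previous = [f for f in matplotlib.rcParams["font.sans-serif"] if f not in usable]
    matplotlib.rcParams["font.sans-serif"] = usable + previous
    # Draw the minus sign as an ASCII hyphen, since some CJK fonts lack U+2212
    matplotlib.rcParams["axes.unicode_minus"] = False

print("usable CJK fonts:", usable)
```

If the printed list is empty, Chinese labels will most likely still render as empty boxes, and a suitable font has to be installed first.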
# Changing plot parameters
plt.plot(x,y1,'.b')
plt.plot(x,y2,color='r',linewidth=5.0,linestyle=':')  # linestyle values: "-", "-.", ":". The style can also be given as one compound format string such as '.r', without keyword names
[<matplotlib.lines.Line2D at 0x111f908d0>]
output_14_1.png
## Axis labels
plt.plot([1,2,3,4],[2,3,3,3])
plt.ylabel('Some Num')
plt.xlabel('自變量')  # "independent variable"; Chinese text is not rendered by default and needs the font settings above
Text(0.5,0,'自變量')
output_16_1.png

Scatter plot

plt.plot([1,2,3,4],[2,3,3,3],'g^')
[<matplotlib.lines.Line2D at 0x1121b4080>]
output_18_1.png

Common linestyle format strings

ro: red circles

bs: blue squares

g^: green triangles
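To make the format strings listed above concrete, the sketch below draws the same points twice, once with the compact string "g^" and once with explicit keyword arguments, and then compares the resulting line properties. The variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
xs, ys = [1, 2, 3, 4], [2, 3, 3, 3]

# Compact form: colour letter followed by marker symbol
(compact,) = ax.plot(xs, ys, "g^")

# Explicit form: the same style spelled out with keyword arguments
(explicit,) = ax.plot(xs, ys, color="g", marker="^", linestyle="None")

for line in (compact, explicit):
    print(line.get_color(), line.get_marker(), line.get_linestyle())
```

A format string that names a marker but no line style draws markers only, which is why the explicit version sets linestyle="None".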
t=np.linspace(-5,5,100)
plt.plot(t,t**2)
plt.plot(t,t**5)
[<matplotlib.lines.Line2D at 0x1121159e8>]
output_21_1.png
plt.plot(t,t**2,'r--',t,t**5,'y-.')  # several curves can be combined into one plot() call, each given as (x, y, format string)
[<matplotlib.lines.Line2D at 0x112327390>,
 <matplotlib.lines.Line2D at 0x1123274e0>]
output_22_1.png
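The two-element list printed above is the return value of the combined call: one Line2D object per (x, y, format string) group. A short sketch that inspects those objects:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(-5, 5, 100)

fig, ax = plt.subplots()
# Two (x, y, format string) groups in a single call
lines = ax.plot(t, t ** 2, "r--", t, t ** 5, "y-.")

styles = [(line.get_color(), line.get_linestyle()) for line in lines]
print(len(lines), styles)
```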

Scatter plots from structured data

np.arange(50)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
data = {
    'a':np.arange(50),
    'c':np.random.randint(0,50,50),
    'd':np.random.rand(50)
}
data
{'a': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),
 'c': array([21, 22, 31,  1, 30, 13, 47, 19, 16, 45, 45, 34, 24, 11, 30, 49,  3,
        38, 24, 26,  9, 24, 33, 44, 48, 49,  6, 49,  8, 30, 11, 43, 16, 25,
        29, 34, 14, 21,  4, 20, 13, 46, 11, 25, 20, 39, 41, 34, 47, 36]),
 'd': array([0.03337497, 0.58555231, 0.6983719 , 0.3098672 , 0.0355206 ,
        0.27251523, 0.968375  , 0.7585922 , 0.53316131, 0.2134523 ,
        0.76735142, 0.56798347, 0.98154299, 0.07708504, 0.93535569,
        0.84546409, 0.13395731, 0.24076688, 0.44660032, 0.88671819,
        0.00921326, 0.39650877, 0.44355761, 0.30306934, 0.98691421,
        0.39195663, 0.6424303 , 0.68474638, 0.02455291, 0.90485831,
        0.7171299 , 0.18596694, 0.12510926, 0.57805232, 0.93718472,
        0.21482173, 0.02909599, 0.26395894, 0.39508085, 0.74490499,
        0.17457859, 0.93607408, 0.58727838, 0.76517609, 0.53999965,
        0.5932926 , 0.05968155, 0.70313421, 0.72178338, 0.47063122])}

Drawing scatter plots with plt.scatter()

plt.scatter('a','d',data=data)
plt.xlabel('a 數據')  # "a data"
plt.ylabel('d 數據')  # "d data"
Text(0,0.5,'d 數據')
output_27_1.png
plt.scatter('a','c',data=data)
<matplotlib.collections.PathCollection at 0x1124a3d68>
output_28_1.png
data['b'] = np.abs(data['d'])
plt.scatter('a','b',data = data,marker='>',c = 'c')
<matplotlib.collections.PathCollection at 0x112557eb8>
output_30_1.png
plt.scatter('c','d',data = data,marker='>',c = 'c')
<matplotlib.collections.PathCollection at 0x1125b8198>
output_31_1.png
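In the calls above, the strings 'a', 'c' and 'd' are looked up as keys in data, and that includes c='c': because a key named 'c' exists, the points are coloured by the values of data['c'] rather than drawn in cyan. The sketch below rebuilds a similar dictionary with a fixed seed (my own choice, so the numbers differ from the output shown above) and adds a colorbar to make the mapping visible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
data = {
    "a": np.arange(50),
    "c": rng.integers(0, 50, size=50),
    "d": rng.random(50),
}

fig, ax = plt.subplots()
# x, y and c are given as keys; matplotlib replaces them with data[key]
sc = ax.scatter("a", "d", c="c", marker=">", data=data)
fig.colorbar(sc, ax=ax, label="c")  # shows how the values of c map to colours

print(sc.get_offsets().shape)
```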
data
{'a': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),
 'c': array([21, 22, 31,  1, 30, 13, 47, 19, 16, 45, 45, 34, 24, 11, 30, 49,  3,
        38, 24, 26,  9, 24, 33, 44, 48, 49,  6, 49,  8, 30, 11, 43, 16, 25,
        29, 34, 14, 21,  4, 20, 13, 46, 11, 25, 20, 39, 41, 34, 47, 36]),
 'd': array([0.03337497, 0.58555231, 0.6983719 , 0.3098672 , 0.0355206 ,
        0.27251523, 0.968375  , 0.7585922 , 0.53316131, 0.2134523 ,
        0.76735142, 0.56798347, 0.98154299, 0.07708504, 0.93535569,
        0.84546409, 0.13395731, 0.24076688, 0.44660032, 0.88671819,
        0.00921326, 0.39650877, 0.44355761, 0.30306934, 0.98691421,
        0.39195663, 0.6424303 , 0.68474638, 0.02455291, 0.90485831,
        0.7171299 , 0.18596694, 0.12510926, 0.57805232, 0.93718472,
        0.21482173, 0.02909599, 0.26395894, 0.39508085, 0.74490499,
        0.17457859, 0.93607408, 0.58727838, 0.76517609, 0.53999965,
        0.5932926 , 0.05968155, 0.70313421, 0.72178338, 0.47063122]),
 'b': array([0.03337497, 0.58555231, 0.6983719 , 0.3098672 , 0.0355206 ,
        0.27251523, 0.968375  , 0.7585922 , 0.53316131, 0.2134523 ,
        0.76735142, 0.56798347, 0.98154299, 0.07708504, 0.93535569,
        0.84546409, 0.13395731, 0.24076688, 0.44660032, 0.88671819,
        0.00921326, 0.39650877, 0.44355761, 0.30306934, 0.98691421,
        0.39195663, 0.6424303 , 0.68474638, 0.02455291, 0.90485831,
        0.7171299 , 0.18596694, 0.12510926, 0.57805232, 0.93718472,
        0.21482173, 0.02909599, 0.26395894, 0.39508085, 0.74490499,
        0.17457859, 0.93607408, 0.58727838, 0.76517609, 0.53999965,
        0.5932926 , 0.05968155, 0.70313421, 0.72178338, 0.47063122])}

Bar chart

names = ['A類型','B類型','C類型']  # "Type A", "Type B", "Type C"
value = [1,10,100]
plt.bar(range(len(names)),value)
plt.xticks(range(len(names)),names)  # x-axis tick labels
([<matplotlib.axis.XTick at 0x112650a20>,
  <matplotlib.axis.XTick at 0x112650438>,
  <matplotlib.axis.XTick at 0x1126437f0>],
 <a list of 3 Text xticklabel objects>)
output_35_1.png
plt.scatter(names,value)
<matplotlib.collections.PathCollection at 0x111f32fd0>
output_36_1.png
plt.scatter(range(len(names)),value)
plt.xticks(range(len(names)),names)
([<matplotlib.axis.XTick at 0x111f69be0>,
  <matplotlib.axis.XTick at 0x111f69518>,
  <matplotlib.axis.XTick at 0x111f692b0>],
 <a list of 3 Text xticklabel objects>)
output_37_1.png
plt.scatter(range(len(names)),value)
plt.xticks(range(len(names)),names)
plt.title('離散數據散點圖')  # "Scatter plot of discrete data"
Text(0.5,1,'離散數據散點圖')
output_38_1.png
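The examples above use two ways of putting category names on the x axis: plotting against integer positions and relabelling the ticks, or passing the strings directly. The sketch below places both side by side, with English names substituted so that it does not depend on a CJK font.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

names = ["Type A", "Type B", "Type C"]  # English stand-ins for the labels above
value = [1, 10, 100]
positions = list(range(len(names)))

fig, (ax_idx, ax_cat) = plt.subplots(1, 2)

# Variant 1: integer positions, then replace the tick labels
ax_idx.bar(positions, value)
ax_idx.set_xticks(positions)
ax_idx.set_xticklabels(names)

# Variant 2: pass the strings directly; matplotlib treats them as categories
bars = ax_cat.bar(names, value)

heights = [bar.get_height() for bar in bars]
centers = [bar.get_x() + bar.get_width() / 2 for bar in bars]
print(heights, centers)
```

In the second variant the categories end up at positions 0, 1, 2, so both axes show the same picture.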

Subplots (subplot)

1. Split one canvas (Figure) into a grid

2. Assign each plot to a fixed position in the grid

3. Each plot can be given a fixed size
plt.figure(1)
plt.subplot(131)  # 1 row, 3 columns, position 1
plt.bar(names,value,color='r')
plt.subplot(235)  # 2 rows, 3 columns, position 5
plt.scatter(names,value,color='y')
plt.subplot(233)  # 2 rows, 3 columns, position 3
plt.plot(names,value,color='g')
plt.title("離散數據的柱狀圖,散點圖,折線圖")  # "Bar chart, scatter plot and line chart of discrete data"
Text(0.5,1,'離散數據的柱狀圖,散點圖,折線圖')
output_40_1.png
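In the cell above, plt.title applies only to the subplot that was created last, and the figure size is left at its default. The sketch below covers the same three steps (split, place, size) with plt.subplots. The regular 1x3 grid, the figsize value and the English titles are my own assumptions, not the layout used above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

names = ["Type A", "Type B", "Type C"]  # English stand-ins for the labels above
value = [1, 10, 100]

# Steps 1 and 3: split the figure into a 1x3 grid and fix its size in inches
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Step 2: each Axes object is one fixed position in the grid
axes[0].bar(names, value, color="r")
axes[0].set_title("Bar")
axes[1].scatter(names, value, color="y")
axes[1].set_title("Scatter")
axes[2].plot(names, value, color="g")
axes[2].set_title("Line")

fig.suptitle("Discrete data: bar, scatter and line")  # title for the whole figure
fig.tight_layout()

titles = [ax.get_title() for ax in axes]
print(titles, fig.get_size_inches())
```

The three-digit form plt.subplot(235) is shorthand for plt.subplot(2, 3, 5): rows, columns, position.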

Part 2: Plotting exercises with Seaborn

Principle / method / technique / tool (道/法/術/器)
import seaborn as sns
tips = sns.load_dataset('tips')
tips
# total_bill and

2.1 Strip plot: the relationship between discrete and continuous data

sns.stripplot(data=tips,x='day',y='total_bill',jitter = True)  # jitter adds random displacement; it defaults to True

<matplotlib.axes._subplots.AxesSubplot at 0x1a16f45710>
output_47_1.png
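To illustrate what jitter means, the sketch below reproduces the idea with plain matplotlib: each observation sits at its category's integer position plus a small random horizontal offset. The data is synthetic and the jitter width of 0.1 is an arbitrary choice; seaborn's own rule for the offset may differ.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]

# Hypothetical data: a category index and a bill amount per observation
codes = rng.integers(0, len(days), size=60)
bills = rng.gamma(4.0, 5.0, size=60)

# Jitter: shift each point sideways by a small uniform random amount
jitter_width = 0.1
x_jittered = codes + rng.uniform(-jitter_width, jitter_width, size=codes.size)

fig, ax = plt.subplots()
ax.scatter(x_jittered, bills, s=12)
ax.set_xticks(list(range(len(days))))
ax.set_xticklabels(days)
ax.set_ylabel("total_bill")
```

Without the offset, all points of one category would lie on a single vertical line and overlap heavily.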

2.2 Swarm plot: the relationship between discrete and continuous data, with points arranged by density

sns.swarmplot(x='day',y='total_bill',data=tips)
<matplotlib.axes._subplots.AxesSubplot at 0x1a16aadc18>
output_49_1.png
tips.head()

Analyzing the distribution of lunch and dinner bills on each day

2.3 The hue grouping parameter

sns.swarmplot(x='day',y='total_bill',data=tips,hue='time')
<matplotlib.axes._subplots.AxesSubplot at 0x1a16ff30b8>
output_53_1.png

Gender distribution of the people paying the bill on each day

sns.swarmplot(x='day',y='total_bill',data=tips,hue='sex')
<matplotlib.axes._subplots.AxesSubplot at 0x1a16e2fc50>
output_55_1.png
sns.swarmplot(x='day',y='total_bill',data=tips,hue='size')
<matplotlib.axes._subplots.AxesSubplot at 0x1a1719cf60>
output_56_1.png
sns.swarmplot(x='size',y='total_bill',data=tips)
<matplotlib.axes._subplots.AxesSubplot at 0x1a1726a470>
output_57_1.png
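# The plot above shows the typical bill level for each size (in the tips dataset, size is the number of people in the party)
# Correlation (Pearson coefficient) between size and total_bill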
tips['size'].corr(tips['total_bill'])
0.5983151309049012
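The coefficient returned by corr above is the Pearson correlation, r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)). A minimal NumPy sketch of that formula follows; the sample values are invented for illustration and are not taken from the tips dataset.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equally long 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical party sizes and bills, for illustration only
size = np.array([2, 3, 3, 2, 4, 4, 2, 4, 2, 2])
bill = np.array([16.9, 10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0, 14.8])

r = pearson(size, bill)
print(r, np.corrcoef(size, bill)[0, 1])  # both computations should agree
```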

2.4 Box plot

sns.boxplot(x='day',y='total_bill',data=tips)
<matplotlib.axes._subplots.AxesSubplot at 0x1a17971ef0>
output_61_1.png

sns.swarmplot(x='day',y='total_bill',data=tips)
<matplotlib.axes._subplots.AxesSubplot at 0x1a178bc390>
output_63_1.png
sns.boxplot(x='day',y='total_bill',data=tips,hue='time')
<matplotlib.axes._subplots.AxesSubplot at 0x1a17b547b8>
output_64_1.png
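A box plot summarises each group by its quartiles. The sketch below computes the numbers behind one box with NumPy, assuming the common convention that whiskers reach at most 1.5 times the interquartile range beyond the box and that points further out are drawn as outliers. The sample values are invented.

```python
import numpy as np

# Hypothetical bills for a single day
values = np.array([3.0, 10.3, 12.5, 15.0, 16.9, 18.2, 20.1, 22.4, 24.8, 48.3])

# The box spans the first to the third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, iqr)
print(lower_fence, upper_fence, outliers)
```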

2.5 Violin plot

How can the probability distribution of total_bill be shown?
sns.violinplot(x='day',y='total_bill',data=tips,hue='time')
/anaconda3/lib/python3.7/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval





<matplotlib.axes._subplots.AxesSubplot at 0x1a17c781d0>
output_66_2.png
sns.violinplot(x='day',y='total_bill',data=tips,hue='time',split = True)
<matplotlib.axes._subplots.AxesSubplot at 0x1a17ddd860>
output_67_2.png
### Overlaying multiple plots
sns.violinplot(x='day',y='total_bill',data=tips)
sns.swarmplot(x='day',y='total_bill',data=tips,color='w')
<matplotlib.axes._subplots.AxesSubplot at 0x1a17eb31d0>
output_69_2.png
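The overlay above works because both seaborn calls draw on the current axes. The sketch below makes that explicit by passing the same Axes object to both calls and using keyword arguments throughout, since recent seaborn releases no longer accept x and y positionally. It uses a small synthetic DataFrame instead of the tips dataset so that it runs offline; the colours and marker size are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical stand-in for the tips data: 15 bills for each of four days
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 15),
    "total_bill": rng.gamma(4.0, 5.0, size=60),
})

fig, ax = plt.subplots()
# Both functions draw on the Axes passed via ax= and return that same Axes
ax_violin = sns.violinplot(data=df, x="day", y="total_bill",
                           inner=None, color="lightgray", ax=ax)
ax_swarm = sns.swarmplot(data=df, x="day", y="total_bill",
                         color="k", size=3, ax=ax)

print(ax_violin is ax, ax_swarm is ax)
```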

3.0 Univariate estimation: counting a discrete variable

Number of transactions per day

sns.countplot(x='day',data=tips)
<matplotlib.axes._subplots.AxesSubplot at 0x1a17f9a550>
output_73_1.png
sns.countplot(x='time',data=tips)
<matplotlib.axes._subplots.AxesSubplot at 0x1a17eb33c8>
output_74_1.png
sns.countplot(x='day',data=tips,hue='time')
<matplotlib.axes._subplots.AxesSubplot at 0x1a180f3a90>
output_75_1.png
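A count plot is a bar chart of how often each category occurs. The sketch below reproduces that idea with NumPy and matplotlib on a short invented list of days, which may help when checking the numbers behind such a chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical observations, one entry per transaction
days = np.array(["Sat", "Sun", "Sat", "Thur", "Fri", "Sat", "Sun", "Sat"])

# Unique categories (sorted alphabetically) and how often each one occurs
labels, counts = np.unique(days, return_counts=True)
count_by_day = dict(zip(labels.tolist(), counts.tolist()))
print(count_by_day)

fig, ax = plt.subplots()
ax.bar(labels, counts)
ax.set_ylabel("count")
```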

4.0 Kernel density estimation for continuous data

tips.head()
sns.distplot(tips['total_bill'])
<matplotlib.axes._subplots.AxesSubplot at 0x1a181b0780>
output_78_2.png
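The smooth curve that distplot draws over the histogram is a kernel density estimate: every sample contributes a small Gaussian bump, and the bumps are averaged. The sketch below implements this directly in NumPy on synthetic data. The bandwidth uses a normal-reference rule of thumb, which is an assumption and not necessarily what seaborn uses. In newer seaborn releases distplot is deprecated in favour of histplot, displot and kdeplot.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.gamma(4.0, 5.0, size=200)  # hypothetical right-skewed bills

# Rule-of-thumb bandwidth (normal reference); an assumption of this sketch
bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)

grid = np.linspace(samples.min() - 5 * bandwidth,
                   samples.max() + 5 * bandwidth, 400)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 (simple Riemann sum over the grid)
area = density.sum() * (grid[1] - grid[0])
print(bandwidth, area)
```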

Offsetting skewness (transforming the data to be closer to a normal distribution)

sns.distplot(np.log(tips['total_bill']))
<matplotlib.axes._subplots.AxesSubplot at 0x1a18289908>
output_80_2.png
sns.distplot(np.sqrt(tips['total_bill']))
<matplotlib.axes._subplots.AxesSubplot at 0x1a1803e7b8>
output_81_2.png
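Whether a transform actually reduces skewness can be checked numerically and not only by eye. The sketch below computes the sample skewness (the third standardised moment) before and after the log and square-root transforms, on synthetic log-normal data that stands in for a right-skewed variable like total_bill.

```python
import numpy as np

def skewness(a):
    """Sample skewness: mean cubed deviation divided by the cubed std."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean()
    return (centered ** 3).mean() / a.std() ** 3

rng = np.random.default_rng(0)
# Hypothetical right-skewed data; the parameters are arbitrary
bills = rng.lognormal(mean=3.0, sigma=0.5, size=5000)

skew_raw = skewness(bills)
skew_log = skewness(np.log(bills))
skew_sqrt = skewness(np.sqrt(bills))
print(skew_raw, skew_sqrt, skew_log)
```

For this kind of data the log transform brings the skewness close to zero, and the square root is a milder correction that lands in between.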

Keeping the lowest 99.5% of the data (a way to remove outliers)

np.percentile(tips['total_bill'],99.5)
48.317099999999996
tips[tips['total_bill']>48.31]
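The filter above selects the rows that lie above the 99.5th percentile, that is, the outliers themselves. A sketch of the complementary step, keeping everything at or below the threshold, is shown here with NumPy on synthetic data; the same boolean-mask idea applies to a DataFrame column.

```python
import numpy as np

rng = np.random.default_rng(0)
bills = rng.gamma(4.0, 5.0, size=1000)  # hypothetical bill amounts

# Value below which 99.5% of the observations fall
threshold = np.percentile(bills, 99.5)

# Boolean masks split the data into the part to keep and the part to drop
kept = bills[bills <= threshold]
removed = bills[bills > threshold]

print(threshold, kept.size, removed.size)
```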

Scatter plot with a regression line

sns.lmplot(x='size',y='total_bill',data = tips)
<seaborn.axisgrid.FacetGrid at 0x1a18465cf8>
output_86_2.png
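The straight line in such a plot is an ordinary least-squares fit. The sketch below fits the same kind of line with np.polyfit on synthetic data generated from a known slope and intercept (3 and 2, chosen arbitrarily), so the result can be compared with the truth.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: party size 1..6, bill = 3 * size + 2 plus noise
size = rng.integers(1, 7, size=200)
bill = 3.0 * size + 2.0 + rng.normal(0.0, 1.0, size=200)

# Degree-1 polynomial fit returns the coefficients [slope, intercept]
slope, intercept = np.polyfit(size, bill, 1)
print(slope, intercept)

fig, ax = plt.subplots()
ax.scatter(size, bill, s=10)
line_x = np.array([size.min(), size.max()])
ax.plot(line_x, slope * line_x + intercept, color="r")  # fitted line
```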

Joint distribution (combines a scatter plot, a linear regression fit, and the probability distribution of each variable)

sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg')
<seaborn.axisgrid.JointGrid at 0x1a18556940>
output_88_2.png