English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية
Exemples d'opérations d'agrégation Pandas
Après avoir créé des objets rolling, extended et ewm, vous pouvez effectuer des agrégations sur les données de plusieurs manières.
Nous créons un DataFrame et appliquons-y une agrégation
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print(r)
运行结果如下:
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 0.790670 -0.387854 -0.668132 0.267283 2000-01-03 -0.575523 -0.965025 0.060427 -2.179780 2000-01-04 1.669653 1.211759 -0.254695 1.429166 2000-01-05 0.100568 -0.236184 0.491646 -0.466081 2000-01-06 0.155172 0.992975 -1.205134 0.320958 2000-01-07 0.309468 -0.724053 -1.412446 0.627919 2000-01-08 0.099489 -1.028040 0.163206 -1.274331 2000-01-09 1.639500 -0.068443 0.714008 -0.565969 2000-01-10 0.326761 1.479841 0.664282 -1.361169 Rolling [window=3,min_periods=1,center=False,axis=0]
Nous pouvons effectuer une agrégation sur tout le DataFrame en passant une fonction entière, ou en sélectionnant une colonne via la méthode getItem standard.
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print(r.aggregate(np.sum))
运行结果如下:
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print(r['A'].aggregate(np.sum))
运行结果如下:
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 2000-01-01 1.088512 2000-01-02 1.879182 2000-01-03 1.303660 2000-01-04 1.884801 2000-01-05 1.194699 2000-01-06 1.925393 2000-01-07 0.565208 2000-01-08 0.564129 2000-01-09 2.048458 2000-01-10 2.065750 Freq: D, Nom: A, dtype: float64
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print(r[['A', 'B']].aggregate(np.sum))
运行结果如下:
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 A B 2000-01-01 1.088512 -0.650942 2000-01-02 1.879182 -1.038796 2000-01-03 1.303660 -2.003821 2000-01-04 1.884801 -0.141119 2000-01-05 1.194699 0.010551 2000-01-06 1.925393 1.968551 2000-01-07 0.565208 0.032738 2000-01-08 0.564129 -0.759118 2000-01-09 2.048458 -1.820537 2000-01-10 2.065750 0.383357
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print(r['A'].aggregate([np.sum, np.mean]))
运行结果如下:
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 somme 2000-01-01 1.088512 1.088512 2000-01-02 1.879182 0.939591 2000-01-03 1.303660 0.434553 2000-01-04 1.884801 0.628267 2000-01-05 1.194699 0.398233 2000-01-06 1.925393 0.641798 2000-01-07 0.565208 0.188403 2000-01-08 0.564129 0.188043 2000-01-09 2.048458 0.682819 2000-01-10 2.065750 0.688583
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(10, 4), index = pd.date_range('1/1/2000', periods=10), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r[['A','B']].aggregate([np.sum,np.mean])
运行结果如下:
A B C D 2000-01-01 1.088512 -0.650942 -2.547450 -0.566858 2000-01-02 1.879182 -1.038796 -3.215581 -0.299575 2000-01-03 1.303660 -2.003821 -3.155154 -2.479355 2000-01-04 1.884801 -0.141119 -0.862400 -0.483331 2000-01-05 1.194699 0.010551 0.297378 -1.216695 2000-01-06 1.925393 1.968551 -0.968183 1.284044 2000-01-07 0.565208 0.032738 -2.125934 0.482797 2000-01-08 0.564129 -0.759118 -2.454374 -0.325454 2000-01-09 2.048458 -1.820537 -0.535232 -1.212381 2000-01-10 2.065750 0.383357 1.541496 -3.201469 A B sum mean sum mean 2000-01-01 1.088512 1.088512 -0.650942 -0.650942 2000-01-02 1.879182 0.939591 -1.038796 -0.519398 2000-01-03 1.303660 0.434553 -2.003821 -0.667940 2000-01-04 1.884801 0.628267 -0.141119 -0.047040 2000-01-05 1.194699 0.398233 0.010551 0.003517 2000-01-06 1.925393 0.641798 1.968551 0.656184 2000-01-07 0.565208 0.188403 0.032738 0.010913 2000-01-08 0.564129 0.188043 -0.759118 -0.253039 2000-01-09 2.048458 0.682819 -1.820537 -0.606846 2000-01-10 2.065750 0.688583 0.383357 0.127786
import pandas as pd import numpy as np df = pd.DataFrame(np.random.randn(3, 4), index = pd.date_range('1/1/2000', periods=3), columns = ['A', 'B', 'C', 'D']) print df r = df.rolling(window=3,min_periods=1) print r.aggregate({'A' : np.sum,'B' : np.mean})
运行结果如下:
A B C D 2000-01-01 -1.575749 -1.018105 0.317797 0.545081 2000-01-02 -0.164917 -1.361068 0.258240 1.113091 2000-01-03 1.258111 1.037941 -0.047487 0.867371 A B 2000-01-01 -1.575749 -1.018105 2000-01-02 -1.740666 -1.189587 2000-01-03 -0.482555 -0.447078