Pandas Text Processing

Examples of Pandas text-processing operations

In this chapter, we discuss string operations using basic Series/Index objects. In later chapters, we will learn how to apply these string functions to a DataFrame.

Pandas provides a set of string functions that make it easy to work with string data. Most importantly, these functions ignore (or exclude) missing/NaN values.
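The handling of missing values described above can be checked directly. The following is a minimal sketch; the variable names and sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second entry is missing.
names = pd.Series(['Tom', np.nan, 'John'])

# The string method is applied to the real strings only;
# the missing entry does not raise an error.
upper_names = names.str.upper()
print(upper_names)

# The missing position is still missing in the result.
missing_mask = upper_names.isna()
print(missing_mask.tolist())
```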

Almost all of these methods mirror Python's built-in string methods (see: https://docs.python.org/3/library/stdtypes.html#string-methods). They are called through the str accessor of a Series or Index (for example, s.str.lower()), which applies the operation to each string element.
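As a rough illustration of how the pandas methods line up with Python's own string methods, the sketch below compares the two on a small list. The sample values and variable names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data for illustration.
words = ['Tom', 'William Rick', 'John']
s = pd.Series(words)

# Element-wise call through the .str accessor ...
via_pandas = s.str.upper().tolist()

# ... gives the same values as calling upper() on each Python string.
via_python = [w.upper() for w in words]

print(via_pandas == via_python)
```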

Let us now see how each operation works.

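Method	Description
lower()	Converts the strings in the Series/Index to lowercase.
upper()	Converts the strings in the Series/Index to uppercase.
len()	Computes the length of each string.
strip()	Strips whitespace (including newlines) from both sides of each string in the Series/Index.
split(' ')	Splits each string on the given pattern.
cat(sep=' ')	Concatenates the Series/Index elements using the given separator.
get_dummies()	Returns a DataFrame of one-hot encoded values.
contains(pattern)	Returns a Boolean for each element: True if the pattern is contained in the element, otherwise False.
replace(a,b)	Replaces the value a with the value b.
repeat(value)	Repeats each element the specified number of times.
count(pattern)	Returns the number of times the pattern appears in each element.
startswith(pattern)	Returns True if the element in the Series/Index starts with the pattern.
endswith(pattern)	Returns True if the element in the Series/Index ends with the pattern.
find(pattern)	Returns the position of the first occurrence of the pattern, or -1 if it is not found.
findall(pattern)	Returns a list of all occurrences of the pattern.
swapcase()	Swaps the case of each character (lowercase becomes uppercase and vice versa).
islower()	Checks whether all characters in each string in the Series/Index are lowercase. Returns a Boolean.
isupper()	Checks whether all characters in each string in the Series/Index are uppercase. Returns a Boolean.
isnumeric()	Checks whether all characters in each string in the Series/Index are numeric. Returns a Boolean.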
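The table refers to both Series and Index, while the examples in this chapter use a Series. As a minimal sketch of the same accessor on an Index, assuming some made-up labels with stray whitespace and mixed case:

```python
import pandas as pd

# Hypothetical index labels with stray whitespace and mixed case.
idx = pd.Index([' Name', 'AGE ', ' City '])

# The .str accessor is also available on an Index, and calls can be chained.
clean = idx.str.strip().str.lower()
print(clean.tolist())
```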

Let us now create a Series and see how all of the above functions work.

Example

    import pandas as pd
    import numpy as np
    # Sample data: names, one missing value (NaN) and a numeric string
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    print(s)

Output:

    0             Tom
    1    William Rick
    2            John
    3         Alber@t
    4             NaN
    5            1234
    6      SteveSmith
    dtype: object

lower()

Example

    import pandas as pd
    import numpy as np
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    # Convert every string to lowercase; NaN stays NaN
    print(s.str.lower())

Output:

    0             tom
    1    william rick
    2            john
    3         alber@t
    4             NaN
    5            1234
    6      stevesmith
    dtype: object

upper()

Example

    import pandas as pd
    import numpy as np
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    # Convert every string to uppercase; NaN stays NaN
    print(s.str.upper())

Output:

    0             TOM
    1    WILLIAM RICK
    2            JOHN
    3         ALBER@T
    4             NaN
    5            1234
    6      STEVESMITH
    dtype: object

len()

Example

    import pandas as pd
    import numpy as np
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    # Length of each string; the missing entry yields NaN
    print(s.str.len())

Output:

    0     3.0
    1    12.0
    2     4.0
    3     7.0
    4     NaN
    5     4.0
    6    10.0
    dtype: float64
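The lengths above are printed as floats because the result has to hold a NaN for the missing entry. As a minimal sketch, dropping the missing entry first gives whole-number lengths; the variable name lengths is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])

# Remove the missing entry before measuring, so every result is a whole number.
lengths = s.dropna().str.len()
print(lengths.tolist())
```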

strip()

Example

    import pandas as pd
    # Two of the strings carry a leading or trailing space
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s)
    print("After stripping leading and trailing whitespace:")
    print(s.str.strip())

Output:

    0             Tom 
    1     William Rick
    2             John
    3          Alber@t
    dtype: object
    After stripping leading and trailing whitespace:
    0             Tom
    1    William Rick
    2            John
    3         Alber@t
    dtype: object

split(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s)
    print("Split Pattern:")
    # Split each string on a single space; leading/trailing spaces give empty strings
    print(s.str.split(' '))

Output:

    0             Tom 
    1     William Rick
    2             John
    3          Alber@t
    dtype: object
    Split Pattern:
    0              [Tom, ]
    1    [, William, Rick]
    2               [John]
    3            [Alber@t]
    dtype: object
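Splitting on a single space keeps an empty string wherever a leading or trailing space occurs. As a minimal sketch of the difference, calling split() with no pattern splits on runs of whitespace instead; the variable names are chosen only for this example.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# Splitting on one space keeps empty strings for leading/trailing spaces.
on_space = s.str.split(' ').tolist()
print(on_space)

# With no pattern, split() works on runs of whitespace,
# so no empty strings appear in the result.
on_whitespace = s.str.split().tolist()
print(on_whitespace)
```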

cat(sep=pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    # Join all elements into one string, separated by '_'
    print(s.str.cat(sep='_'))

Output:

    Tom _ William Rick_John_Alber@t
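When the Series contains a missing value, cat() leaves it out of the joined string unless a placeholder is given through the na_rep parameter. A minimal sketch, using made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry.
s = pd.Series(['Tom', np.nan, 'John'])

# By default the missing entry is omitted from the joined string.
joined = s.str.cat(sep='_')
print(joined)

# na_rep supplies a placeholder for missing entries instead.
joined_with_placeholder = s.str.cat(sep='_', na_rep='?')
print(joined_with_placeholder)
```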

get_dummies()

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    # One column per distinct string, with 1 marking the row where it occurs
    print(s.str.get_dummies())

Output:

        William Rick  Alber@t  John  Tom 
    0              0        0     0     1
    1              1        0     0     0
    2              0        0     1     0
    3              0        1     0     0
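get_dummies() can also split each string on a separator before encoding, so that one element may set several columns. The sketch below assumes made-up tag strings separated by '|', which is the default separator.

```python
import pandas as pd

# Hypothetical data: each element holds one or more tags separated by '|'.
tags = pd.Series(['a|b', 'a', 'b|c'])

# Each distinct tag becomes a column; a row has 1 for every tag it contains.
dummies = tags.str.get_dummies(sep='|')
print(dummies)
```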

contains(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    # True where the string contains a space
    print(s.str.contains(' '))

Output:

    0     True
    1     True
    2    False
    3    False
    dtype: bool
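The Boolean result of contains() is commonly used as a mask to select the matching elements. A minimal sketch, reusing the Series from the example above; the variable names are chosen for illustration.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# contains() yields one Boolean per element ...
has_space = s.str.contains(' ')

# ... which can index the Series to keep only the matching elements.
with_space = s[has_space]
print(with_space.tolist())
```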

replace(a,b)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s)
    print("After replacing @ with $:")
    print(s.str.replace('@', '$'))

Output:

    0             Tom 
    1     William Rick
    2             John
    3          Alber@t
    dtype: object
    After replacing @ with $:
    0             Tom 
    1     William Rick
    2             John
    3          Alber$t
    dtype: object
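replace() accepts a regex parameter that controls whether the pattern is read as plain text or as a regular expression. The sketch below shows both modes on the same Series; the regular expression used is only an example.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# regex=False treats the pattern as plain text.
literal = s.str.replace('@', '$', regex=False)
print(literal.tolist())

# regex=True treats the pattern as a regular expression; here it removes
# one or more spaces at the start or at the end of each string.
trimmed = s.str.replace(r'^ +| +$', '', regex=True)
print(trimmed.tolist())
```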

repeat(value)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    # Repeat every string twice
    print(s.str.repeat(2))

Output:

    0                      Tom Tom 
    1     William Rick William Rick
    2                      JohnJohn
    3                Alber@tAlber@t
    dtype: object

count(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print("Number of 'm' in each string:")
    print(s.str.count('m'))

Output:

    Number of 'm' in each string:
    0    1
    1    1
    2    0
    3    0
    dtype: int64

startswith(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print("Strings that start with 'T':")
    print(s.str.startswith('T'))

Output:

    Strings that start with 'T':
    0     True
    1    False
    2    False
    3    False
    dtype: bool

endswith(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print("Strings that end with 't':")
    print(s.str.endswith('t'))

Output:

    Strings that end with 't':
    0    False
    1    False
    2    False
    3     True
    dtype: bool

find(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    # Position of the first 'e' in each string
    print(s.str.find('e'))

Output:

    0   -1
    1   -1
    2   -1
    3    3
    dtype: int64

A value of -1 means that no match was found in the element.
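Because -1 is an ordinary integer in the result, it can be compared against directly. A minimal sketch, reusing the Series from the example above, that keeps only the elements in which the pattern was found; the variable names are chosen for illustration.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# Position of the first 'e' in each string, or -1 when there is none.
positions = s.str.find('e')

# Keep only the elements where the pattern was found.
found = s[positions != -1]
print(found.tolist())
```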

findall(pattern)

Example

    import pandas as pd
    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    # List of every 'e' found in each string
    print(s.str.findall('e'))

Output:

    0     []
    1     []
    2     []
    3    [e]
    dtype: object

An empty list ([]) means that the element contains no match.

swapcase()

Example

    import pandas as pd
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    # Lowercase letters become uppercase and vice versa
    print(s.str.swapcase())

Output:

    0             tOM
    1    wILLIAM rICK
    2            jOHN
    3         aLBER@T
    dtype: object

islower()

Example

    import pandas as pd
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    # Every string contains an uppercase letter, so all results are False
    print(s.str.islower())

Output:

    0    False
    1    False
    2    False
    3    False
    dtype: bool

isupper()

Example

    import pandas as pd
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    # Every string contains a lowercase letter, so all results are False
    print(s.str.isupper())

Output:

    0    False
    1    False
    2    False
    3    False
    dtype: bool

isnumeric()

Example

    import pandas as pd
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    # None of the strings consists only of numeric characters
    print(s.str.isnumeric())

Output:

    0    False
    1    False
    2    False
    3    False
    dtype: bool
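The three checks above return False for every element of the sample Series. As a minimal sketch of the True cases, the values below are made up so that each check succeeds for exactly one element.

```python
import pandas as pd

# Hypothetical values chosen so that each check is True for one element.
s = pd.Series(['tom', 'JOHN', '1234', 'Alber@t'])

lower_flags = s.str.islower().tolist()
upper_flags = s.str.isupper().tolist()
numeric_flags = s.str.isnumeric().tolist()

print(lower_flags)
print(upper_flags)
print(numeric_flags)
```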
