Écrire un Logistic Regression avec python

用一条直线对数据进行拟合的过程称为回归。逻辑回归分类的思想是：根据现有数据对分类边界线建立回归公式。
公式表示为：

一、梯度上升法

每次迭代所有的数据都参与计算。

for 循环次数:
　　　　　　　训练

代码如下：

import numpy as np
import matplotlib.pyplot as plt
def loadData():
 labelVec = []
 dataMat = []
 with open('testSet.txt') as f:
  for line in f.readlines():
   dataMat.append([1.0,line.strip().split()[0],line.strip().split()[1])
   labelVec.append(line.strip().split()[2])
 return dataMat,labelVec
def Sigmoid(inX):
 return 1/(1+np.exp(-inX))
def trainLR(dataMat,labelVec):
 dataMatrix = np.mat(dataMat).astype(np.float64)
 lableMatrix = np.mat(labelVec).T.astype(np.float64)
 m,n = dataMatrix.shape
 w = np.ones((n,1))
 alpha = 0.001
 for i in range(500):
  predict = Sigmoid(dataMatrix*w)
  erreur = predict-lableMatrix
  w = w - alpha*dataMatrix.T*erreur
 return w
def plotBestFit(wei,data,label):
 if type(wei).__name__ == 'ndarray':
  weights = wei
 else:
  weights = wei.getA()
 fig = plt.figure(0)
 ax = fig.add_subplot(111)
 xxx = np.arange(-3,3,0.1)
 yyy = - weights[0]/weights[2] - weights[1]/weights[2]*xxx
 ax.plot(xxx,yyy)
 cord1 = []
 cord0 = []
 for i in range(len(label)):
  if label[i] == 1:
   cord1.append(data[i][1:3])
  else:
   cord0.append(data[i][1:3])
 cord1 = np.array(cord1)
 cord0 = np.array(cord0)
 ax.scatter(cord1[:,0],cord1[:,1],c='red')
 ax.scatter(cord0[:,0],cord0[:,1],c='green')
 plt.show()
if __name__ == "__main__":
 data,label = loadData()
 data = np.array(data).astype(np.float64)
 label = [int(item) for item in label]
 weight = trainLR(data,label)
 plotBestFit(weight,data,label)

二、随机梯度上升法

1.学习参数随迭代次数调整，可以缓解参数的高频波动。
2.随机选取样本来更新回归参数，可以减少周期性的波动。

for 循环次数:
　　　 for 样本数量：
　　　　　　　更新学习速率
　　　　　　　随机选取样本
　　　　　　　训练
　　　　　　　在样本集中删除该样本

代码如下：

import numpy as np
import matplotlib.pyplot as plt
def loadData():
 labelVec = []
 dataMat = []
 with open('testSet.txt') as f:
  for line in f.readlines():
   dataMat.append([1.0,line.strip().split()[0],line.strip().split()[1])
   labelVec.append(line.strip().split()[2])
 return dataMat,labelVec
def Sigmoid(inX):
 return 1/(1+np.exp(-inX))
def plotBestFit(wei,data,label):
 if type(wei).__name__ == 'ndarray':
  weights = wei
 else:
  weights = wei.getA()
 fig = plt.figure(0)
 ax = fig.add_subplot(111)
 xxx = np.arange(-3,3,0.1)
 yyy = - weights[0]/weights[2] - weights[1]/weights[2]*xxx
 ax.plot(xxx,yyy)
 cord1 = []
 cord0 = []
 for i in range(len(label)):
  if label[i] == 1:
   cord1.append(data[i][1:3])
  else:
   cord0.append(data[i][1:3])
 cord1 = np.array(cord1)
 cord0 = np.array(cord0)
 ax.scatter(cord1[:,0],cord1[:,1],c='red')
 ax.scatter(cord0[:,0],cord0[:,1],c='green')
 plt.show()
def stocGradAscent(dataMat,labelVec,trainLoop):
 m,n = np.shape(dataMat)
 w = np.ones((n,1))
 for j in range(trainLoop):
  dataIndex = range(m)
  for i in range(m):
   alpha = 4/(i+j+1) + 0.01
   randIndex = int(np.random.uniform(0,len(dataIndex)))
   predict = Sigmoid(np.dot(dataMat[dataIndex[randIndex]],w))
   erreur = predict - labelVec[dataIndex[randIndex]]
   w = w - alpha*erreur*dataMat[dataIndex[randIndex]].reshape(n,1)
   np.delete(dataIndex,randIndex,0)
 return w
if __name__ == "__main__":
 data,label = loadData()
 data = np.array(data).astype(np.float64)
 label = [int(item) for item in label]
 weight = stocGradAscent(data,label,300) 
 plotBestFit(weight,data,label)

Troisième partie : astuces de programmation

1.extraction de chaîne

Retirer les caractères '\n', ‘\r', ‘\t', ' ‘ de la chaîne, diviser selon le symbole d'espace.

string.strip().split()

2.type de jugement

if type(secondTree[value]).__name__ == 'dict':

3.multiplication

La multiplication de deux vecteurs de types matrices numpy donne toujours un tableau

c = a*b
c
Out[66]: matrix([[ 6.830482])

Le produit de deux vecteurs de types vectoriels est un tableau à deux dimensions

b
Out[80]: 
array([[ 1.],
  [ 1.],
  [ 1.]])
a
Out[81]: array([1, 2, 3])
a*b
Out[82]: 
array([[ 1., 2., 3.],
  [ 1., 2., 3.],
  [ 1., 2., 3.]])
b*a
Out[83]: 
array([[ 1., 2., 3.],
  [ 1., 2., 3.],
  [ 1., 2., 3.]])

C'est tout pour cet article, j'espère que cela aidera à votre apprentissage, et j'espère que vous soutiendrez également le tutoriel de cri.

Déclaration : ce contenu est issu du réseau, propriété des auteurs originaux, contribué et téléversé spontanément par les utilisateurs d'Internet, ce site n'en possède pas la propriété, n'a pas été édité par l'homme, et n'assume aucune responsabilité juridique. Si vous trouvez du contenu suspect de violation de droits d'auteur, vous êtes invité à envoyer un e-mail à : notice#w3Déclaration : le contenu de cet article est issu du réseau, propriété des auteurs originaux, contenu contribué et téléversé spontanément par les utilisateurs d'Internet, ce site n'en possède pas la propriété, n'a pas été édité par l'homme, et n'assume aucune responsabilité juridique. Si vous trouvez du contenu suspect de violation de droits d'auteur, vous êtes invité à envoyer un e-mail à : notice#w

Tutoriel de base