Logistic regression to diagnose heart disease

The project source code URL: Heart

Load data

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_csv('./data/heart.csv')
# the csv url: https://github.com/hivandu/colab/blob/master/AI_Data/data/heart.csv

# Print a brief summary of the data set

data.info()
data.shape

data.target.value_counts()
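
To make the class-balance check above concrete, here is a minimal sketch on a small made-up frame. The rows are invented for illustration and are not taken from heart.csv; the only thing assumed from the original is that the label column is called target. Passing normalize=True to value_counts returns proportions instead of counts, which can make imbalance easier to judge.

```python
import pandas as pd

# Hypothetical toy frame standing in for heart.csv; values are invented.
toy = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 45],
    "target": [1, 1, 1, 0, 0, 1],
})

# Absolute number of samples per class, as in data.target.value_counts()
counts = toy["target"].value_counts()

# Relative class frequencies (fractions that sum to 1)
ratios = toy["target"].value_counts(normalize=True)

print(counts)
print(ratios)
```

If one class dominated heavily, accuracy alone would be a weak measure, and the per-class figures from a classification report would matter more.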

Parameter meanings

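Param	Meaning
age	Age
sex	Sex (1 = male, 0 = female)
cp	Chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
trestbps	Resting blood pressure
chol	Cholesterol
fbs	Fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)
restecg	Resting ECG result (0 = normal, 1 = ST-T wave abnormality, 2 = probable or definite left ventricular hypertrophy by Estes' criteria)
thalach	Maximum heart rate
exang	Exercise-induced angina (1 = yes; 0 = no)
oldpeak	ST depression induced by exercise relative to rest
slope	Slope of the ECG ST segment (1: upsloping, 2: flat, 3: downsloping)
ca	Number of vessels seen by fluoroscopy
thal	Defect type (3 = normal; 6 = fixed defect; 7 = reversible defect)
target	Disease present (0 = no, 1 = yes)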

Perform analysis


# Change the "sex" column into two columns "sex_0" and "sex_1"
sex = pd.get_dummies(data['sex'], prefix = 'sex')

# Add "sex_0" and "sex_1" to the data set.
data = pd.concat([data, sex], axis = 1)


# And delete the sex column.
data = data.drop(columns = ['sex'])


# Print out the first five lines. Check whether sex_0, sex_1 are added successfully, and whether sex is deleted successfully.
data.head()

# Get sample label
data_y = data.target.values
data_y.shape

# Get sample feature set
data_x = data.drop(['target'], axis = 1)
data_x.shape

# Divide the data set
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size = 0.3, random_state=33)
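
The encoding and splitting steps above can be tried on a toy frame. The sketch below uses invented rows rather than heart.csv, and it makes one change that is not in the original: it passes stratify to train_test_split so that both splits keep the same class ratio, which may help on small data sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy rows; values are invented for illustration.
toy = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 45, 52, 60, 48, 66],
    "sex": [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "target": [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
})

# One-hot encode "sex" into "sex_0" and "sex_1", then drop the original column
sex = pd.get_dummies(toy["sex"], prefix="sex")
toy = pd.concat([toy, sex], axis=1).drop(columns=["sex"])

# Separate labels from features
toy_y = toy["target"].values
toy_x = toy.drop(columns=["target"])

# stratify=toy_y (an addition, not in the original) keeps the 5:5 class ratio
tr_x, te_x, tr_y, te_y = train_test_split(
    toy_x, toy_y, test_size=0.4, random_state=33, stratify=toy_y
)

print(list(toy_x.columns))
print(tr_x.shape, te_x.shape)
```

Because sex is binary, sex_0 and sex_1 carry the same information. The original keeps both columns, and this sketch does the same.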

Standardization and model training

# initialize
ss = StandardScaler()

# fit() computes each feature's mean and standard deviation from the training set only
ss.fit(train_x)

# Standardize the training set and test set
train_x = ss.transform(train_x)
test_x = ss.transform(test_x)

# Define a logistic regression model
lr = LogisticRegression()
lr.fit(train_x, train_y)

# Calculate the training set score
lr.score(train_x, train_y)

# Calculate test set score
lr.score(test_x, test_y)

# Use the classification_report function to display a text report of the main classification indicators
predict = lr.predict(test_x)
print(classification_report(test_y, predict))
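
As an alternative arrangement of the scaling and fitting steps above, scikit-learn's make_pipeline can chain StandardScaler and LogisticRegression so that the scaler statistics are always learned from the training data only. This is a sketch, not the author's method. It runs on synthetic data from make_classification (an assumption made to keep it self-contained), so its scores say nothing about the heart data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data: 300 samples, 13 features
X, y = make_classification(n_samples=300, n_features=13, random_state=33)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=33
)

# fit() standardizes with training-set statistics, then fits the classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# score() and predict() apply the same scaling before the classifier
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
print(classification_report(y_test, model.predict(X_test)))

# The fitted scaler inside the pipeline holds the per-feature training means
scaler = model.named_steps["standardscaler"]
```

Keeping the scaler inside the pipeline means there is no separate transform call to forget when predicting on new data.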

https://hivan.me/heart/

Author: Hivan Du

Published: 2021-08-31

Updated: 2024-01-16
