A Preliminary Study of Machine Learning

Gradient

import random

def loss(k):
    return 3 * (k ** 2) + 7 * k - 10

# analytic minimum: -b / 2a = -7 / 6

def partial(k):
    return 6 * k + 7

k = random.randint(-10, 10)
alpha = 1e-3  # learning rate, 0.001

for i in range(1000):
    k = k + (-1) * partial(k) * alpha
    print(k, loss(k))

Introduction to Artificial Intelligence

The code for this article is available at: Example 01

The source code is in ipynb format, so the output can be viewed inline.

rule based

import random
from icecream import ic


# earlier, non-recursive version of the rules:
# rules = """
# 复合句子 = 句子 , 连词 句子
# 连词 = 而且 | 但是 | 不过
# 句子 = 主语 谓语 宾语
# 主语 = 你| 我 | 他
# 谓语 = 吃| 玩
# 宾语 = 桃子| 皮球
# """

rules = """
复合句子 = 句子 , 连词 复合句子 | 句子
连词 = 而且 | 但是 | 不过
句子 = 主语 谓语 宾语
主语 = 你| 我 | 他
谓语 = 吃| 玩
宾语 = 桃子| 皮球
"""

def get_grammar_by_description(description):
    # each non-empty line has the form "target = expansion | expansion | ..."
    rules_pattern = [r.split('=') for r in description.split('\n') if r.strip()]
    target_with_expand = [(t, ex.split('|')) for t, ex in rules_pattern]
    grammar = {t.strip(): [e.strip() for e in ex] for t, ex in target_with_expand}

    return grammar


def generate_by_grammar(grammar, target='句子'):
    # terminal symbols are returned as-is; non-terminals are expanded recursively
    if target not in grammar: return target

    return ''.join([generate_by_grammar(grammar, t) for t in random.choice(grammar[target]).split()])


if __name__ == '__main__':

    grammar = get_grammar_by_description(rules)

    ic(generate_by_grammar(grammar, target='复合句子'))

water pouring

def water_pouring(b1, b2, goal, start=(0, 0)):
    # breadth-first search over (x, y) fill states of the two buckets
    if goal in start:
        return [start]

    explored = set()
    frontier = [[('init', start)]]

    while frontier:
        path = frontier.pop(0)
        (x, y) = path[-1][-1]

        for (state, action) in successors(x, y, b1, b2).items():
            if state not in explored:
                explored.add(state)

                path2 = path + [(action, state)]

                if goal in state:
                    return path2
                else:
                    frontier.append(path2)

    return []


def successors(x, y, X, Y):
    # all states reachable in one pouring action from (x, y)
    return {
        ((0, y + x) if x + y <= Y else (x + y - Y, Y)): 'X -> Y',
        ((x + y, 0) if x + y <= X else (X, x + y - X)): 'X <- Y',
        (X, y): 'fill X',
        (x, Y): 'fill Y',
        (0, y): 'empty X',
        (x, 0): 'empty Y',
    }


if __name__ == '__main__':
    print(water_pouring(4, 9, 5))
    print(water_pouring(4, 9, 5, start=(4, 0)))
    print(water_pouring(4, 9, 6))

Logistic regression to diagnose heart disease

The project source code: Heart

load data

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_csv('./data/heart.csv')
# the csv url: https://github.com/hivandu/colab/blob/master/AI_Data/data/heart.csv

# Print a brief summary of the data set

data.info()
data.shape

data.target.value_counts()

Parameter meanings

Params	Meaning
age	age
sex	sex (1 = male, 0 = female)
cp	chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
trestbps	resting blood pressure
chol	serum cholesterol
fbs	fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)
restecg	resting ECG results (0 = normal, 1 = ST-T wave abnormality, 2 = probable or definite left ventricular hypertrophy by Estes' criteria)
thalach	maximum heart rate achieved
exang	exercise-induced angina (1 = yes; 0 = no)
oldpeak	ST depression induced by exercise relative to rest
slop	slope of the exercise ST segment (1: upsloping, 2: flat, 3: downsloping)
ca	number of major vessels seen by fluoroscopy
thal	defect type (3 = normal; 6 = fixed defect; 7 = reversible defect)
target	whether the patient has heart disease (0 = no, 1 = yes)

Perform analysis


# Change the "sex" column into two columns "sex_0" and "sex_1"
sex = pd.get_dummies(data['sex'], prefix = 'sex')

# Add "sex_0" and "sex_1" to the data set.
data = pd.concat([data, sex], axis = 1)


# And delete the sex column.
data = data.drop(columns = ['sex'])


# Print out the first five lines. Check whether sex_0, sex_1 are added successfully, and whether sex is deleted successfully.
data.head()

# Get sample label
data_y = data.target.values
data_y.shape

# Get sample feature set
data_x = data.drop(['target'], axis = 1)
data_x.shape

# Divide the data set
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size = 0.3, random_state=33)

Normalization

# initialize
ss = StandardScaler()

# The fit function/module is used to train model parameters
ss.fit(train_x)

# Standardize the training set and test set
train_x = ss.transform(train_x)
test_x = ss.transform(test_x)

# Define a logistic regression model
lr = LogisticRegression()
lr.fit(train_x, train_y)

# Calculate the training set score
lr.score(train_x, train_y)

# Calculate test set score
lr.score(test_x, test_y)

# Use the classification_report function to display a text report of the main classification indicators
predict = lr.predict(test_x)
print(classification_report(test_y, predict))

Foundation of Artificial Intelligence - Lecture 1

Algorithm --> Data Structure

No obvious solution ==> algorithm engineers handle it. A clear implementation path ==> regular development engineers handle it.

What's the Algorithm?

{Ace of hearts, 10 of spades, 3 of spades, 9 of hearts, 9 of clubs, 4 of diamonds, J}

First: Hearts > Diamonds > Spades > Clubs. Second: within a suit, numbers are arranged from small to large.

  1. Some people group the cards by suit first
  2. Some people sort by number first, then pull out the suits one by one (see the sketch below)
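A minimal sketch of the suit-then-number rule (the suit order comes from the text above; the card encoding and variable names are my own):

SUIT_ORDER = {'hearts': 0, 'diamonds': 1, 'spades': 2, 'clubs': 3}

# the sample hand from the text, as (suit, number) pairs (Ace = 1)
hand = [('hearts', 1), ('spades', 10), ('spades', 3),
        ('hearts', 9), ('clubs', 9), ('diamonds', 4)]

# sort by suit priority first, then by number within each suit
sorted_hand = sorted(hand, key=lambda card: (SUIT_ORDER[card[0]], card[1]))
print(sorted_hand)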

\[ 1024 \approx 10^3 \rightarrow 1\text{k} \] \[ 1024 \times 1024 \approx 10^6 \rightarrow 1\text{M} \] \[ 1024 \times 1024 \times 1024 \approx 10^9 \rightarrow 1\text{G} \]

instruction-0  00011101
instruction-1  00011111
instruction-2  00011100
instruction-3  00011101
instruction-4  00011100
instruction-5  00011001

2.6 GHz, i.e. roughly 2.6 × 10^9 clock cycles per second

def fac(n):  # return n!
    if n == 1:
        return 1  # one return operation
    else:
        return n * fac(n - 1)  # one multiplication + one return + one function call
fac(1)
> 1

fac(100)
> 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

fac_100 = """93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000"""

len(fac_100)
> 158
How does the running time of fac(N) relate to fac(N-1)? Going from fac(N-1) to fac(N) adds exactly one multiplication, one return and one function call, so the difference is a constant, whether N is 100, 99 or anything else.
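A rough timing check of that claim (my own sketch, not part of the original notebook; exact numbers depend on the machine):

import timeit

# fac() is defined above; time N = 100 against N = 99
t_100 = timeit.timeit(lambda: fac(100), number=10000)
t_99 = timeit.timeit(lambda: fac(99), number=10000)
print(t_100 - t_99)  # roughly the cost of 10000 extra (multiplication + return + call) steps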

\[ Time(N) - Time(N-1) = constant \]
\[ Time(N-1) - Time(N-2) = constant \]
\[ Time(N-2) - Time(N-3) = constant \]
\[ \cdots \]
\[ Time(2) - Time(1) = constant \]

Summing these telescoping differences:

\[ Time(N) - Time(1) = (N-1) \cdot constant \]
\[ Time(N) = (N-1) \cdot constant + Time(1) \]
\[ Time(N) = N \cdot constant + (Time(1) - constant) \]

SVM-based Text Classification in Practice

The source code: SVM-based Text Classification in Practice

The 'cnews.train.txt' data is too large to upload directly, so it is provided compressed and must be decompressed before it can be imported.

Use SVM to implement simple text classification based on a bag-of-words representation and a support vector machine.

import data

# import
import codecs
import os
import jieba

Chinese news data is used as the sample data set. There are 50,000 training examples and 10,000 test examples, divided into 10 categories: sports, finance, real estate, home furnishing, education, technology, fashion, current affairs, games and entertainment. Load the training text to view the data format and a sample:


data_train = './data/cnews.train.txt' # training data file name
data_test = './data/cnews.test.txt' # test data file name
vocab = './data/cnews.vocab.txt' # dictionary

with codecs.open(data_train, 'r', 'utf-8') as f:
    lines = f.readlines()

# print sample content
label, content = lines[0].strip('\r\n').split('\t')
content

Take the first item of the training data as an example and segment the loaded news text. Here I use jieba for word segmentation (LTP's segmenter would also work), and the segmentation result is displayed with "/" separators.

# print word segment results
segment = jieba.cut(content)
print('/'.join(segment))

To tidy up the logic above, implement helper functions that load the training and test data and perform word segmentation.

# cut data
def process_line(idx, line):
    data = tuple(line.strip('\r\n').split('\t'))
    if not len(data) == 2:
        return None
    content_segged = list(jieba.cut(data[1]))
    if idx % 1000 == 0:
        print('line number: {}'.format(idx))
    return (data[0], content_segged)

# data loading method
def load_data(file):
    with codecs.open(file, 'r', 'utf-8') as f:
        lines = f.readlines()
    data_records = [process_line(idx, line) for idx, line in enumerate(lines)]
    data_records = [data for data in data_records if data is not None]
    return data_records

# load and process training data
train_data = load_data(data_train)
print('first training data: label {} segment {}'.format(train_data[0][0], '/'.join(train_data[0][1])))
# load and process testing data
test_data = load_data(data_test)
print('first testing data: label {} segment {}'.format(test_data[0][0], '/'.join(test_data[0][1])))

After spending some time on word segmentation, you can start building a dictionary. The dictionary is built from the training set and sorted by word frequency.

def build_vocab(train_data, thresh):
    vocab = {'<UNK>': 0}
    word_count = {}  # word frequency
    for idx, data in enumerate(train_data):
        content = data[1]
        for word in content:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
    word_list = [(k, v) for k, v in word_count.items()]
    print('word list length: {}'.format(len(word_list)))
    word_list.sort(key = lambda x: x[1], reverse = True)  # sorted by word frequency
    word_list_filtered = [word for word in word_list if word[1] > thresh]
    print('word list length after filtering: {}'.format(len(word_list_filtered)))
    # construct vocab
    for word in word_list_filtered:
        vocab[word[0]] = len(vocab)
    print('vocab size: {}'.format(len(vocab)))  # vocab size is the filtered word list size + 1 due to the unk token
    return vocab

vocab = build_vocab(train_data, 1)

In addition, the category labels themselves form a small "dictionary" that maps each category name to an id:

def build_label_vocab(cate_file):
    label_vocab = {}
    with codecs.open(cate_file, 'r', 'utf-8') as f:
        for lines in f:
            line = lines.strip().split('\t')
            label_vocab[line[0]] = int(line[1])
    return label_vocab

label_vocab = build_label_vocab('./data/cnews.category.txt')
print(f'label vocab: {label_vocab}')

Next, construct the id-based training and test sets. Since we only use a bag-of-words model, word order is discarded; each document is converted into the sparse "label index:count" format that libsvm expects.
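For illustration only (these ids and counts are made up), one line of the resulting file looks like:

3 1:2 15:1 87:4

where 3 is the label id and each index:count pair is a word id (offset by 1) with its frequency in that document.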

def construct_trainable_matrix(corpus, vocab, label_vocab, out_file):
    records = []
    for idx, data in enumerate(corpus):
        if idx % 1000 == 0:
            print('process {} data'.format(idx))
        label = str(label_vocab[data[0]])  # label id
        token_dict = {}
        for token in data[1]:
            token_id = vocab.get(token, 0)
            if token_id in token_dict:
                token_dict[token_id] += 1
            else:
                token_dict[token_id] = 1
        feature = [str(int(k) + 1) + ':' + str(v) for k, v in token_dict.items()]
        feature_text = ' '.join(feature)
        records.append(label + ' ' + feature_text)

    with open(out_file, 'w') as f:
        f.write('\n'.join(records))

construct_trainable_matrix(train_data, vocab, label_vocab, './data/train.svm.txt')
construct_trainable_matrix(test_data, vocab, label_vocab, './data/test.svm.txt')

Training process

The remaining core model is simple: use libsvm to train the support vector machine. Feed it the training and test files prepared above and train with libsvm's built-in routines; different parameter settings can be tried. The libsvm documentation can be viewed here; the "-s", "-t" and "-c" parameters matter most, as they choose the SVM type, the kernel function and the penalty coefficient respectively.

from libsvm import svm
from libsvm.svmutil import svm_read_problem,svm_train,svm_predict,svm_save_model,svm_load_model

# train svm
train_label, train_feature = svm_read_problem('./data/train.svm.txt')
print(train_label[0], train_feature[0])
model=svm_train(train_label,train_feature,'-s 0 -c 5 -t 0 -g 0.5 -e 0.1')

# predict
test_label, test_feature = svm_read_problem('./data/test.svm.txt')
print(test_label[0], test_feature[0])
p_labs, p_acc, p_vals = svm_predict(test_label, test_feature, model)

print('accuracy: {}'.format(p_acc))

After some training time, we can look at the experimental results. Different SVM types, penalty coefficients and kernel functions can be tried to improve them; a small parameter sweep might look like the sketch below.
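A hypothetical parameter sweep (my own sketch, not in the original notebook; strictly speaking the comparison should use a held-out validation split rather than the test set):

# try a few kernel types (-t) and penalty coefficients (-c), keep the best accuracy
best_acc, best_param = 0.0, None
for t in (0, 2):          # 0 = linear kernel, 2 = RBF kernel
    for c in (1, 5, 10):  # penalty coefficient
        param = '-s 0 -t {} -c {} -q'.format(t, c)
        m = svm_train(train_label, train_feature, param)
        _, acc, _ = svm_predict(test_label, test_feature, m)
        if acc[0] > best_acc:
            best_acc, best_param = acc[0], param
print('best accuracy: {:.2f}% with {}'.format(best_acc, best_param))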

Automating Weibo operations

The code for this article is available at: auto operation weibo. Chromedriver download: Taobao Mirror; the driver version must match your Chrome version.

log in to Weibo

from selenium import webdriver
import time

driver = webdriver.Chrome('/Applications/chromedriver')

# login weibo
def weibo_login(username, password):

    # open the weibo login page
    driver.get('https://passport.weibo.cn/signin/login')
    driver.implicitly_wait(5)
    time.sleep(1)

    # fill in the info: username, password
    driver.find_element_by_id('loginName').send_keys(username)
    driver.find_element_by_id('loginPassword').send_keys(password)
    time.sleep(1)

    # click login
    driver.find_element_by_id('loginAction').click()
    time.sleep(1)

# set username, password
username = 'ivandoo75@gmail.com'
password = 'ooxx'

# Mobile phone verification is required here, so login still cannot be fully automatic
weibo_login(username, password)

follow user

def add_follow(uid):
    driver.get('https://m.weibo.com/u/' + str(uid))
    time.sleep(1)

    # driver.find_element_by_id('follow').click()
    follow_button = driver.find_element_by_xpath('//div[@class="btn_bed W_fl"]')
    follow_button.click()
    time.sleep(1)

    # select group
    group_button = driver.find_element_by_xpath('//div[@class="list_content W_f14"]/ul[@class="list_ul"]/li[@class="item"][2]')
    group_button.click()
    time.sleep(1)

    # close the selection dialog
    cancel_button = driver.find_element_by_xpath('//div[@class="W_layer_btn S_bg1"]/a[@class="W_btn_b btn_34px"]')
    cancel_button.click()
    time.sleep(1)

# UID of the 每天学点心理学 account
uid = '1890826225'
add_follow(uid)

create text and publish

def add_comment(weibo_url, content):
    driver.get(weibo_url)
    driver.implicitly_wait(5)

    content_textarea = driver.find_element_by_css_selector('textarea.W.input').clear()
    content_textarea = driver.find_element_by_css_selector('textarea.W.input').send_keys(content)

    time.sleep(2)

    comment_button = driver.find_element_by_css_selector('.W_btn_a').click()

# post the text
def post_weibo(content):
    # go to the user index
    driver.get('https://weibo.com')
    driver.implicitly_wait(5)

    # click publish button
    # post_button = driver.find_element_by_css_selector('[node-type="publish"]').click()

    # input the content into the textarea
    content_textarea = driver.find_element_by_css_selector('textarea.W_input[node-type="textEl"]').send_keys(content)
    time.sleep(2)

    # click publish button
    post_button = driver.find_element_by_css_selector("[node-type='submit']").click()
    time.sleep(1)

# comment on the weibo post
weibo_url = 'https://weibo.com/1890826225/HjjqSahwl'
content = 'here is Hivan du, Best wish to u.'

# auto send weibo
content = 'Learning is a belief!'
post_weibo(content)

Boston house analysis

The source code: Boston House

# Import package
# Used to load the Boston housing price data set
from sklearn.datasets import load_boston
# pandas toolkit If you are unfamiliar with pandas, you can refer to the official 10-minute tutorial: https://pandas.pydata.org/pandas-docs/stable/10min.html
import pandas as pd
import numpy as np
# seaborn for drawing
import seaborn as sns
import matplotlib.pyplot as plt
# Show drawing
%matplotlib inline


data = load_boston() # load dataset

data.keys() # Fields inside data

df = pd.DataFrame(data['data'])

# Looking at the first 5 rows of the dataframe, we can see that the column names are numbers
df.head(5)

data['feature_names'] # Feature name

Table of parameters and their meanings

Params	Meaning
CRIM	per-capita crime rate of the town
ZN	proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS	proportion of non-retail business acres per town
CHAS	Charles River dummy variable (1 if the tract bounds the river, 0 otherwise)
NOX	nitric oxide concentration
RM	average number of rooms per dwelling
AGE	proportion of owner-occupied units built before 1940
DIS	weighted distance to five Boston employment centres
RAD	index of accessibility to radial highways
TAX	full-value property-tax rate per $10,000
PTRATIO	pupil-teacher ratio by town
B	1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
LSTAT	percentage of lower-status population
MEDV	median value of owner-occupied homes (in $1000s)
# Replace numeric column names with feature names
df.columns = data['feature_names']
df.head(5)

# The target is the house price, which is also our target value. We assign the target value to the dataframe
df['price'] = data['target']
df.head(5)

# View the correlation coefficient between the feature and price, positive correlation and negative correlation
sns.heatmap(df.corr(), annot=True, fmt='.1f')

plt.scatter(df['RM'], df['price'])


plt.figure(figsize=(20, 5))

# View the data distribution display of some features and price
features = ['LSTAT', 'RM']
target = df['price']

for i, col in enumerate(features):
    plt.subplot(1, len(features), i+1)
    x = df[col]
    y = target
    plt.scatter(x, y, marker = 'o')
    plt.title('{} price'.format(col))
    plt.xlabel(col)
    plt.ylabel('price')


# Simple example: univariate forecast price
x = df['RM']
y = df['price']

history_notes = {_x: _y for _x, _y in zip(x,y)}

history_notes[6.575]


# Find the prices of the three samples whose RM value is closest to 6.57
similar_ys = [y for _, y in sorted(history_notes.items(), key=lambda x_y: (x_y[0] - 6.57) ** 2)[:3]]
similar_ys


# Calculate the average of the three
np.mean(similar_ys)

Using historical data to predict data that has never been seen before is the most direct method.

K-Nearest-Neighbors

def knn(query_x, history, top_n = 3):
    # average the prices of the top_n samples whose RM value is closest to query_x
    sorted_notes = sorted(history.items(), key = lambda x_y: (x_y[0] - query_x)**2)
    similar_notes = sorted_notes[:top_n]
    similar_ys = [y for _, y in similar_notes]

    return np.mean(similar_ys)

knn(5.4, history_notes)

To obtain results faster, we would rather gain predictive power by fitting a function:

\[ f(rm) = k * rm + b \]

Random Approach

\[ Loss(k, b) = \frac{1}{n} \sum_{i \in N} (\hat{y_i} - y_i) ^ 2 \] \[ Loss(k, b) = \frac{1}{n} \sum_{i \in N} ((k * rm_i + b) - y_i) ^ 2 \]

def loss(y_hat, y):
    return np.mean((y_hat - y)**2)

import random

min_loss = float('inf')

best_k, best_b = None, None


for step in range(1000):
    # try a random (k, b) pair each step and keep the best one seen so far
    min_v, max_v = -100, 100
    k, b = random.randrange(min_v, max_v), random.randrange(min_v, max_v)
    y_hats = [k * rm_i + b for rm_i in x]
    current_loss = loss(y_hats, y)

    if current_loss < min_loss:
        min_loss = current_loss
        best_k, best_b = k, b
        print(f'{step}, we have func f(rm) = {k} * rm + {b}, loss is: {current_loss}')

plt.scatter(x, y)
plt.scatter(x, [best_k * rm + best_b for rm in x])

Monte Carlo simulation

Supervisor

\[ Loss(k, b) = \frac{1}{n} \sum_{i \in N} ((k * rm_i + b) - y_i) ^ 2 \]

\[ \frac{\partial{loss(k, b)}}{\partial{k}} = \frac{2}{n}\sum_{i \in N}(k * rm_i + b - y_i) * rm_i \]

\[ \frac{\partial{loss(k, b)}}{\partial{b}} = \frac{2}{n}\sum_{i \in N}(k * rm_i + b - y_i)\]

def partial_k(k, b, x, y):
    return 2 * np.mean((k*x+b-y) * x)

def partial_b(k, b, x, y):
    return 2 * np.mean(k*x+b-y)

k, b = random.random(), random.random()
min_loss = float('inf')

best_k, best_b = None, None
learning_rate = 1e-2

for step in range(2000):
    # move k and b against their partial derivatives
    k, b = k + (-1 * partial_k(k, b, x, y) * learning_rate), b + (-1 * partial_b(k, b, x, y) * learning_rate)
    y_hats = k * x + b
    current_loss = loss(y_hats, y)

    if current_loss < min_loss:
        min_loss = current_loss
        best_k, best_b = k, b
        print(f'step {step}, we have func f(rm) = {k} * rm + {b}, loss is: {current_loss}')

best_k, best_b


plt.scatter(x, y)
plt.scatter(x, [best_k * rm + best_b for rm in x])

Supervised Learning

Suppose we want to turn the housing-price forecast into a more complex and expressive model. What should we do?

\[ f(x) = k \cdot x + b \]

\[ f(x) = k_2 \cdot \sigma(k_1 \cdot x + b_1) + b_2 \]

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

sub_x = np.linspace(-10, 10)
plt.plot(sub_x, sigmoid(sub_x))


def random_linear(x):
    k, b = random.random(), random.random()
    return k * x + b

def complex_function(x):
    return (random_linear(x))

# piece together two random linear segments to get a more complex curve
for _ in range(10):
    index = random.randrange(0, len(sub_x))
    sub_x_1, sub_x_2 = sub_x[:index], sub_x[index:]
    new_y = np.concatenate((complex_function(sub_x_1), complex_function(sub_x_2)))
    plt.plot(sub_x, new_y)

We can implement more complex functions from simple, basic modules by repeatedly composing them (a minimal sketch follows the list below).

For ever more complex functions, how does the computer find the direction of improvement?

  1. What is machine learning?
  2. The shortcomings of the KNN approach, and the motivation for linear fitting
  3. How to update function weights faster through a supervised (gradient) signal
  4. Combining nonlinear and linear functions can fit very complex functions
  5. In deep learning we fit even more complex functions by stacking these basic modules
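A minimal sketch (my own, not from the notebook) of the composed model f(x) = k2 · σ(k1 · x + b1) + b2 from the formulas above, with randomly chosen parameters just to show the kind of curve it produces:

import random
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# random parameters for the two linear layers
k1, b1 = random.random(), random.random()
k2, b2 = random.random(), random.random()

xs = np.linspace(-10, 10)
ys = k2 * sigmoid(k1 * xs + b1) + b2  # linear -> nonlinear -> linear
plt.plot(xs, ys)
plt.show()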

Assignment:

\[ L2\text{-}Loss(y, \hat{y}) = \frac{1}{n}\sum{(\hat{y} - y)^2} \]

\[ L1\text{-}Loss(y, \hat{y}) = \frac{1}{n}\sum{|\hat{y} - y|} \]

Change the L2 loss to the L1 loss and implement gradient descent with it.

Implement L1-loss gradient descent from scratch.
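For reference, the gradient used in the code below follows from differentiating the L1 loss of a linear model (prediction = theta · x_i), taking the subgradient to be 0 at the non-differentiable point:

\[ \frac{\partial L1\text{-}Loss}{\partial \theta_j} = -\frac{1}{n}\sum_{i} \mathrm{sign}(y_i - \theta \cdot x_i) \, x_{ij} \]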

1. import package

import numpy as np
import pandas as pd

2. load data

from sklearn.datasets import load_boston
data = load_boston()
data.keys()

data_train = data.data
data_target = data.target

df = pd.DataFrame(data_train, columns = data.feature_names)
df.head()

df.describe() # Data description, you can view the statistics of each variable

3. Data preprocessing

Normalization or standardization prevents one dimension or a few dimensions from dominating when there are very many dimensions, and it also makes the program run faster. There are many methods, such as min-max scaling, z-score standardization, p-norm scaling, etc.; which one to use depends on the characteristics of the data set.

Further reading: 数据标准化的迷思之深度学习领域 (the myth of data standardization in deep learning)
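For instance, min-max scaling (not what the code below uses, just for illustration) rescales each feature to [0, 1]:

from sklearn.preprocessing import MinMaxScaler

# hypothetical alternative to the z-score standardization used below
mm = MinMaxScaler()
data_minmax = mm.fit_transform(data_train)  # each column mapped to [0, 1]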

from sklearn.preprocessing import StandardScaler
# z = (x-u) / s u is the mean, s is the standard deviation
ss = StandardScaler()
data_train = ss.fit_transform(data_train)
# For linear models, normalization or standardization is generally required, otherwise gradient explosion will occur, and tree models are generally not required
data_train = pd.DataFrame(data_train, columns = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT'])
data_train.describe()

# y = Σ wi·xi + b
# Because the derivative with respect to b is always 1, add a bias column fixed to 1 so that b is treated as just another feature whose weight is updated like the others
data_train['bias'] = 1
data_train

Divide the data set: 20% of the data is used as the test set (test_x, test_y) and the other 80% as the training set (train_x, train_y); random_state is the random seed.

from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(data_train, data_target, test_size = 0.2, random_state=42)

print('train_x.shape, train_y.shape', train_x.shape, train_y.shape)
print('test_x.shape, test_y.shape', test_x.shape, test_y.shape)

train_x = np.array(train_x)

Model training and gradient update

def l1_cost(x, y, theta):
    """
    x: features
    y: target values
    theta: model parameters
    """
    k = x.shape[0]
    total_cost = 0
    for i in range(k):
        total_cost += 1/k * np.abs(y[i] - theta.dot(x[i, :]))
    return total_cost

def l2_cost(x, y, theta):
    k = x.shape[0]
    total_cost = 0
    for i in range(k):
        total_cost += 1/k * (y[i] - theta.dot(x[i, :])) ** 2
    return total_cost

np.zeros(10).shape

def step_l1_gradient(x, y, learning_rate, theta):
    """
    Calculate the gradient of the MAE (L1) loss function;
    the gradient is taken as 0 at the non-differentiable point.
    x: feature matrix
    y: target values
    learning_rate: learning rate
    theta: parameters
    """
    n = x.shape[0]
    e = y - x @ theta
    gradients = - (x.T @ np.sign(e)) / n  # sign is the sign function
    theta = theta - learning_rate * gradients
    return theta

def step_l2_gradient(x, y, learning_rate, theta):
    k = x.shape[0]
    n = x.shape[1]
    gradients = np.zeros(n)
    for i in range(k):
        for j in range(n):
            gradients[j] += (-2/k) * (y[i] - (theta.dot(x[i, :]))) * x[i, j]
    theta = theta - learning_rate * gradients
    return theta

# def step_gradient(X, y, learning_rate, theta):
#     """
#     X: feature vector
#     y: target values
#     learning_rate: learning rate
#     theta: parameters
#     """
#     m_deriv = 0
#     N = len(X)
#     for i in range(N):
#         # compute the partial derivative
#         # -x(y - (mx + b)) / |mx + b|
#         m_deriv += - X[i] * (y[i] - (theta*X[i] + b)) / abs(y[i] - (theta*X[i] + b))
#     # We subtract because the derivatives point in the direction of steepest ascent
#     theta -= (m_deriv / float(N)) * learning_rate
#     return theta

def gradient_descent(train_x, train_y, learning_rate, iterations):
    k = train_x.shape[0]
    n = train_x.shape[1]
    theta = np.zeros(n)  # initialize parameters

    loss_values = []

    for i in range(iterations):
        theta = step_l1_gradient(train_x, train_y, learning_rate, theta)
        loss = l1_cost(train_x, train_y, theta)
        loss_values.append(loss)
        print(i, 'cost:', loss)
    return theta, loss_values

# Training parameters
learning_rate = 0.04  # learning rate
iterations = 300      # number of iterations
theta, loss_values = gradient_descent(train_x, train_y, learning_rate, iterations)

Boston house price CART regression tree

On the code

# CART regression tree prediction
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.tree import DecisionTreeRegressor,export_graphviz
import graphviz

# Prepare data set
boston = load_boston()

# Explore data
print(boston.feature_names)

# Get feature set and price
features = boston.data
prices = boston.target


# Randomly extract 33% of the data as the test set, and the rest as the training set
train_features, test_features, train_price, test_price = train_test_split(features,prices,test_size=0.33)

# Create CART regression tree
dtr = DecisionTreeRegressor()

# Fitting and constructing CART regression tree
dtr.fit(train_features, train_price)

# Predict housing prices in the test set
predict_price = dtr.predict(test_features)

grap_data = export_graphviz(dtr, out_file=None)
graph = graphviz.Source(grap_data)

# Result evaluation of test set
print('Regression tree mean squared error:', mean_squared_error(test_price, predict_price))
print('Regression tree mean absolute error:', mean_absolute_error(test_price, predict_price))

# Generate regression tree visualization
graph.render('Boston')

!> Before running this code, please ensure that the relevant dependencies have been installed;

Digits recognition

The code for this article is available at: digit recognition

Convolution operation demo

import pylab
import numpy as np
from scipy import signal

# set img
img = np.array([[10, 10, 10, 10, 10],[10, 5, 5, 5, 10], [10, 5, 5, 5, 10], [10, 5, 5, 5, 10], [10, 10, 10, 10, 10]])

# set convolution
fil = np.array([[-1, -1, 0], [-1, 0, 1], [0, 1, 1]])

# convolution the img
res = signal.convolve2d(img, fil, mode='valid')

# output the result
print(res)

output

[[ 15  10   0]
 [ 10   0 -10]
 [  0 -10 -15]]
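To check one entry by hand: signal.convolve2d flips the kernel before sliding it, so the top-left output value comes from the upper-left 3×3 image patch multiplied elementwise by the flipped filter and summed:

\[ 10 \cdot 1 + 10 \cdot 1 + 10 \cdot 0 + 10 \cdot 1 + 5 \cdot 0 + 5 \cdot (-1) + 10 \cdot 0 + 5 \cdot (-1) + 5 \cdot (-1) = 15 \]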

An image demo

import matplotlib.pyplot as plt
import pylab
import cv2
import numpy as np
from scipy import signal

# read the image
img = cv2.imread('./data/weixin.jpg', 0) # Any picture

# show the image
plt.imshow(img, cmap='gray')
pylab.show()

# set the convolution
fil = np.array([[-1,-1,0], [-1, 0, 1], [0, 1, 1]])

# convolution operation
res = signal.convolve2d(img, fil, mode='valid')
print(res)

# show convolution image
plt.imshow(res, cmap = 'gray')
pylab.show()

Use the LeNet model to recognize MNIST handwritten digits

import keras
from keras.datasets import mnist
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten
from keras.models import Sequential
import warnings
warnings.filterwarnings('ignore')

# load data
(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)
train_x = train_x / 255
test_x = test_x / 255

train_y = keras.utils.to_categorical(train_y, 10)
test_y = keras.utils.to_categorical(test_y, 10)

# create sequential models
model = Sequential()

# The first convolutional layer: 6 convolution kernels, the size is 5*5, relu activation function
model.add(Conv2D(6, kernel_size = (5,5), activation='relu', input_shape=(28, 28, 1)))

# the second pooling layer: maximum pooling
model.add(MaxPooling2D(pool_size = (2, 2)))

# the third convolutional layer: 16 convolution kernels, the size is 5*5, relu activation function
model.add(Conv2D(16, kernel_size = (5, 5), activation = 'relu'))

# the fourth pooling layer: maximum pooling
model.add(MaxPooling2D(pool_size = (2, 2)))

# Flatten the feature maps; in LeNet-5 this position is a convolutional layer (C5), but here it is simply a one-dimensional vector feeding the fully connected layers
model.add(Flatten())
model.add(Dense(120, activation = 'relu'))

# Fully connected layer, the number of output nodes is 84
model.add(Dense(84, activation = 'relu'))

# The output layer uses the softmax activation function to calculate the classification probability
model.add(Dense(10, activation='softmax'))

# set the loss function and optimizer configuration
model.compile(loss = keras.metrics.categorical_crossentropy, optimizer = keras.optimizers.Adam(), metrics = ['accuracy'])

# Incoming training data for training
model.fit(train_x, train_y, batch_size = 128, epochs = 2, verbose = 1, validation_data = (test_x, test_y))

# Evaluate the results
score = model.evaluate(test_x, test_y)
print('Error: %.4lf' % score[0])
print('Accuracy: ', score[1])
Train on 60000 samples, validate on 10000 samples
Epoch 1/2
60000/60000 [==============================] - 37s 616us/step - loss: 0.3091 - accuracy: 0.9102 - val_loss: 0.1010 - val_accuracy: 0.9696
Epoch 2/2
60000/60000 [==============================] - 36s 595us/step - loss: 0.0876 - accuracy: 0.9731 - val_loss: 0.0572 - val_accuracy: 0.9814
10000/10000 [==============================] - 3s 328us/step
Error: 0.0572
Accuracy: 0.9814000129699707