Logistic regression to diagnose heart disease
The project source code url : Heart
Load data
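The original code block was lost, so here is a minimal loading sketch using Python's standard `csv` module. The file name `heart.csv` and the `target` column name are assumptions (they match the common UCI heart-disease CSV layout); adjust them to the actual data set.

```python
import csv
import io

def load_data(f):
    """Parse heart-disease CSV rows into feature dicts (floats) plus labels."""
    reader = csv.DictReader(f)
    X, y = [], []
    for row in reader:
        y.append(int(row.pop("target")))          # 1 = disease, 0 = healthy
        X.append({k: float(v) for k, v in row.items()})
    return X, y

# Tiny in-memory sample in the same shape as the assumed heart.csv file.
sample = io.StringIO("age,trestbps,chol,target\n63,145,233,1\n67,160,286,0\n")
X, y = load_data(sample)
print(len(X), y)  # 2 [1, 0]
```

For the real file you would pass `open("heart.csv")` instead of the `StringIO` sample.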
Parameter meanings
Perform analysis
Normalization
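As a sketch of the normalization and diagnosis steps (pure Python on toy numbers; the original project presumably used a library such as scikit-learn): z-score each feature, then fit a one-feature logistic regression by gradient descent. The toy blood-pressure values and labels below are made up for illustration.

```python
import math

def zscore(column):
    """Standardize one feature column to zero mean, unit variance."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0
    return [(v - mean) / std for v in column]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Single-feature logistic regression trained by gradient descent
    on the cross-entropy loss: grad = mean((sigmoid(wx+b) - y) * x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: larger feature values mean disease (label 1).
xs = zscore([120, 130, 125, 170, 180, 175])
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
preds = [1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs]
print(preds)
```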
Foundation of Artificial Intelligence - Lecture 1
No obvious solution ==> algorithm engineers handle it. A clear implementation path ==> the project's developers handle it.
{Ace of hearts, 10 of spades, 3 of spades, 9 of hearts, 9 of clubs, 4 of diamonds, J}
First: Hearts > Diamonds > Spades > Clubs. Second: numbers are arranged from small to large.
\[ 1024 \approx 10^3 \rightarrow 1\text{K} \] \[ 1024^2 \approx 10^6 \rightarrow 1\text{M} \] \[ 1024^3 \approx 10^9 \rightarrow 1\text{G} \]
2.6 GHz
\[ Time(N) - Time(N-1) = constant \] \[ Time(N-1) - Time(N-2) = constant \] \[ Time(N-2) - Time(N-3) = constant \] \[ \cdots \] \[ Time(2) - Time(1) = constant \] Summing all \(N-1\) equations, the left side telescopes: \[ Time(N) - Time(1) = (N-1) \cdot constant \] \[ Time(N) = (N-1) \cdot constant + Time(1) = N \cdot constant + (Time(1) - constant) \]
SVM-based Text Classification in Practice
The source code: SVM-based Text Classification in Practice
The 'cnews.train.txt' data file is too large to upload directly, so it is stored compressed and must be decompressed before it is imported.
We use an SVM to implement simple text classification, based on a bag-of-words representation and a support vector machine.
Chinese news data serves as the sample data set: 50,000 training examples and 10,000 test examples, covering 10 categories: sports, finance, real estate, home furnishing, education, technology, fashion, current affairs, games, and entertainment. Loading the training text lets you check the data format and samples:
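A minimal loader sketch, assuming each line of `cnews.train.txt` has the form `label<TAB>content` (the usual cnews layout; verify against the actual file):

```python
def load_cnews(lines):
    """Split each line into (label, content); format assumed: label<TAB>text."""
    labels, contents = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, content = line.split("\t", 1)
        labels.append(label)
        contents.append(content)
    return labels, contents

# Two in-memory sample lines in the assumed format.
sample = ["体育\t马晓旭意外受伤让国奥警惕...", "娱乐\t某电影上映首日票房破亿..."]
labels, contents = load_cnews(sample)
print(labels)  # ['体育', '娱乐']
```

For the real file you would pass `open("cnews.train.txt", encoding="utf-8")`.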
Taking the first item of the training data as an example, we segment the loaded news text into words. Here I use LTP's word-segmentation function (jieba works too); the segmentation result is displayed with words separated by the "/" symbol.
To tidy up the logic above, we implement a class that loads the training and test data and performs word segmentation.
After spending some time on word segmentation, you can start building a dictionary. The dictionary is built from the training set and sorted by word frequency.
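A sketch of frequency-ordered vocabulary building with `collections.Counter`. The `max_size` cap and reserving id 0 for out-of-vocabulary words are my assumptions, not necessarily the original article's choices:

```python
from collections import Counter

def build_vocab(segmented_docs, max_size=5000):
    """Build a word -> id mapping from training documents, ordered by frequency."""
    counter = Counter(w for doc in segmented_docs for w in doc)
    # Keep the most frequent words; id 0 is reserved for unknown words.
    words = [w for w, _ in counter.most_common(max_size)]
    return {w: i + 1 for i, w in enumerate(words)}

docs = [["我", "爱", "体育"], ["我", "看", "体育", "新闻"]]
vocab = build_vocab(docs)
print(vocab["我"], vocab["体育"])  # the two most frequent words get the smallest ids
```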
In addition, the category labels form a small "dictionary" of their own:
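The label "dictionary" can be as simple as mapping the ten category names to integer ids (the ordering below is arbitrary; libsvm only needs each category to map to a distinct number):

```python
# The ten cnews categories, mapped to integer ids for libsvm.
categories = ["体育", "财经", "房产", "家居", "教育",
              "科技", "时尚", "时政", "游戏", "娱乐"]
label2id = {c: i for i, c in enumerate(categories)}
print(label2id["体育"], label2id["娱乐"])  # 0 9
```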
Next, construct the id-based training and test sets. Because we only use a bag of words, word order is discarded. The data is written in the format that libsvm expects. Note that because the bag-of-words vectors are sparse, only the non-zero features need to be written out.
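A sketch of writing one document in libsvm's sparse text format, where each word id becomes a feature index and the value is that word's count in the document. libsvm requires the feature indices in ascending order:

```python
from collections import Counter

def to_libsvm_line(label_id, token_ids):
    """Encode one document as a libsvm line: '<label> <idx>:<value> ...'.
    Bag of words: each word id is a feature index, value = its count."""
    counts = Counter(token_ids)
    feats = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
    return f"{label_id} {feats}"

line = to_libsvm_line(0, [3, 1, 3, 7])
print(line)  # 0 1:1 3:2 7:1
```

One such line per document, written to `train.txt` / `test.txt`, is exactly what the libsvm command-line tools consume.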
The remaining core model is simple: train a support vector machine with libsvm. Feed your processed training and test files to libsvm, train with its built-in routines, and experiment with different parameter settings. The libsvm documentation can be viewed here; the "-s", "-t", and "-c" parameters matter most: they select the SVM type, the kernel function, and the penalty coefficient.
After some training time, we can examine the experimental results. You can try different SVM types, penalty coefficients, and kernel functions to optimize them.
The code address of this article is: auto operation weibo. ChromeDriver download: Taobao Mirror (the driver version must match your Chrome version).
The source code: Boston House
To obtain results faster, we hope to gain predictive power by fitting a function:
\[ f(rm) = k * rm + b \]
\[ Loss(k, b) = \frac{1}{n} \sum_{i \in N} (\hat{y_i} - y_i) ^ 2 \] \[ Loss(k, b) = \frac{1}{n} \sum_{i \in N} ((k * rm_i + b) - y_i) ^ 2 \]
\[ Loss(k, b) = \frac{1}{n} \sum_{i \in N} ((k * rm_i + b) - y_i) ^ 2 \]
\[ \frac{\partial{Loss(k, b)}}{\partial{k}} = \frac{2}{n}\sum_{i \in N}(k * rm_i + b - y_i) * rm_i \]
\[ \frac{\partial{Loss(k, b)}}{\partial{b}} = \frac{2}{n}\sum_{i \in N}(k * rm_i + b - y_i)\]
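The two partial derivatives above translate directly into a gradient-descent loop. A minimal pure-Python sketch on made-up `rm` data (the learning rate and epoch count are arbitrary choices, not values from the original article):

```python
def gradient_descent(rm, y, lr=0.02, epochs=20000):
    """Fit price = k * rm + b by gradient descent on the MSE loss,
    using the closed-form partials dLoss/dk and dLoss/db."""
    k, b = 0.0, 0.0
    n = len(rm)
    for _ in range(epochs):
        grad_k = 2 / n * sum((k * r + b - t) * r for r, t in zip(rm, y))
        grad_b = 2 / n * sum((k * r + b - t) for r, t in zip(rm, y))
        k -= lr * grad_k
        b -= lr * grad_b
    return k, b

# Toy data generated from price = 9 * rm - 30 (no noise).
rm = [4.0, 5.0, 6.0, 7.0, 8.0]
y = [9 * r - 30 for r in rm]
k, b = gradient_descent(rm, y)
print(round(k, 2), round(b, 2))  # recovers roughly 9 and -30
```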
How do we turn the housing-price forecast into a more complex and sophisticated model? What should we do?
\[ f(x) = k * x + b \]
\[ f(x) = k_2 * \sigma(k_1 * x + b_1) + b_2 \]
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
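The composed function above can be written down directly; a small sketch of the sigmoid and the linear-sigmoid-linear stack (parameter values in the demo call are arbitrary):

```python
import math

def sigma(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def f(x, k1, b1, k2, b2):
    """A linear unit fed through a sigmoid, then another linear unit."""
    return k2 * sigma(k1 * x + b1) + b2

print(sigma(0.0))                   # 0.5
print(f(0.0, 1.0, 0.0, 2.0, 1.0))   # 2 * 0.5 + 1 = 2.0
```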
We can implement more complex functions by repeatedly stacking simple, basic modules.
For ever more complex functions, how does the computer take derivatives?
\[ \text{L2-Loss}(y, \hat{y}) = \frac{1}{n}\sum{(\hat{y} - y)}^2 \]
\[ \text{L1-Loss}(y, \hat{y}) = \frac{1}{n}\sum{|\hat{y} - y|} \]
Change the L2-Loss to L1-Loss and perform gradient descent.
Implement L1-Loss gradient descent from scratch.
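A from-scratch sketch of L1-Loss gradient descent. Since d|e|/de = sign(e), each data point contributes only the sign of its residual. A decaying step size is used here (my choice, not necessarily the article's) so the iterates settle down; constant-step subgradient descent only hovers near the optimum:

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def l1_gradient_descent(xs, ys, lr0=0.5, epochs=20000):
    """Fit y = k*x + b by minimizing the mean absolute error.
    Each point contributes sign(pred - y) (times x for the k-gradient);
    the step size decays as 1/sqrt(t) so the iterates converge."""
    k, b = 0.0, 0.0
    n = len(xs)
    for t in range(epochs):
        lr = lr0 / math.sqrt(t + 1)
        grad_k = sum(sign(k * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sign(k * x + b - y) for x, y in zip(xs, ys)) / n
        k -= lr * grad_k
        b -= lr * grad_b
    return k, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]        # exactly y = 2x + 1
k, b = l1_gradient_descent(xs, ys)
print(round(k, 2), round(b, 2))  # close to 2 and 1
```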
When there are many dimensions, normalization or standardization prevents one or a few dimensions from dominating the data, and it also lets the program run faster. There are many methods, such as min-max, z-score, p-norm, and so on; which to use depends on the characteristics of the data set.
Further reading: Myths of Data Standardization in Deep Learning (数据标准化的迷思之深度学习领域)
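Minimal sketches of two of the methods mentioned, min-max scaling and z-score standardization (pure Python, per-column):

```python
import math

def min_max(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]

def z_score(values):
    """Center to mean 0 and scale to standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]

data = [1.0, 2.0, 3.0, 4.0, 5.0]
mm = min_max(data)
zs = z_score(data)
print(mm)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```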
Split the data set: 20% of the data becomes the test set (X_test, y_test) and the remaining 80% the training set (X_train, y_train); random_state is the random seed.
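A pure-Python equivalent of the split just described (the original presumably used sklearn's `train_test_split`; this sketch shows what it does):

```python
import random

def train_test_split(X, y, test_size=0.2, random_state=42):
    """Shuffle indices with a fixed seed, then cut off the test fraction."""
    idx = list(range(len(X)))
    random.Random(random_state).shuffle(idx)
    n_test = int(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X = list(range(100))
y = [v * 2 for v in X]  # toy targets, paired with X
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 80 20
```

Fixing `random_state` makes the split reproducible across runs.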
Boston house price CART regression tree
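The heart of a CART regression tree is the split search: choose the threshold that minimizes the summed squared error of the two children around their means. A single-split sketch on made-up data (a full tree, as in sklearn's `DecisionTreeRegressor`, applies this recursively):

```python
def best_split(xs, ys):
    """Return the threshold on x minimizing total squared error of the
    two leaf means -- one level of a CART regression tree."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best_err, best_t = float("inf"), None
    for t in sorted(set(xs)):                      # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err, best_t = err, t
    return best_t

# Two obvious clusters: the best split separates them at x = 3.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.2, 5.1, 20.0, 20.3, 19.8]
print(best_split(xs, ys))  # 3
```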
!> Before running this code, please ensure that the relevant dependencies have been installed.
The code address of this article is: digit recognition
The code address of this article is: example_01_Assignment
Collection of data-analysis sources and research reports
The purpose is to make it easy to locate specific sources when doing data analysis yourself.
Foreign consulting firms are numerous and their data is detailed. Overseas reports usually include focused research on Asia and China, so whether or not you work on products going overseas, the related reports and trend analyses are worth a look.
I recently found another good source of industry information: crawling content from within a site or an app. Data obtained through this channel usually helps you understand how the industry and competing products are used on-site, what content users like, how users are distributed, and their behavior and preferences.
A crawler, simply put, is a program that fetches web-page information and organizes it into a database, from which data mining produces analytical conclusions. For example, you can crawl shopping pages to learn which products sell well, or crawl Xiaohongshu pages to learn which KOLs are popular; you can also crawl their categories to learn that beauty and shopping KOLs perform well, and how many such KOLs there are. If you do not have access to the other party's database (and of course you do not), crawling from outside is the best way to understand their business data.
The usual way to search is: the name of the site/app you want to study + "crawler", on a search platform such as Baidu.
Here are some programmer-oriented sites as examples; within each, search on-site for the keyword "site/app + crawler":
CSDN
Jianshu (简书)
V2EX
Juejin (掘金)
Lately I have found that many good internal reports, and reports that are otherwise hard to obtain, come from joining curated sharing groups and insider groups.
For example, people in live-stream e-commerce pay close attention to detailed data and trend reports on live streaming and e-commerce sales, so they form their own sharing groups; any report available on the market, or discovered internally, gets posted there.
One example: Zhao Yuanyuan, the former head of Taobao Live, started a business after leaving Taobao and gathered several groups at the same time; they contain a great deal of useful material on live streaming.
There are many similar groups on investment, trends, and entrepreneurship, and the reports obtained there are also first-hand; you can develop such group networks yourself.
Search engines can still surface many of the specific reports and trends you want. I used to think using a search engine took no particular skill; later I found it also takes learning and practiced use before it really works for you.
For listed companies, earnings reports contain the most comprehensive information: users, business, channels, growth, business strategy, and so on. So to understand a listed company, the best first-hand sources are its earnings reports and its SEC filings.
Many people ask me where to find earnings reports and how to read them. In fact every company has its own IR (investor relations) page, with complete PDF downloads of the reports. I also recommend listening to each quarter's conference call (where earnings questions are answered) to hear the CEO's interpretation of the results.
Here I list the earnings-report sites of several large companies.
If there is a company you want to learn about, search "company name + IR" on Baidu or Google to locate its earnings-report page; on that page, the conference call or webcast link leads to audio commentary on the results.
The world holds a great deal of useful information, and search engines cover only about 20% of it. The other 80% sits in scattered corners, including WeChat groups, QQ, and even live streams, none of which search engines index.
As for search engines themselves, many people today use less than 5% of what they offer. Search skills can be improved, but the other 80% of information channels are more hidden and cannot be obtained publicly.
I have joined many groups full of these reports and all kinds of information from every industry, which no search engine can provide.
It is this information below the iceberg's waterline that determines whether the information you obtain is distinctive and high-quality.
Beyond the channels above, the best practice is to find reliable channels, store suitable reports as you encounter them, and open them whenever needed. Here is a curated collection of high-quality reports I have been reading recently; I hope it is useful to you as well (updated continuously).
Finding industry reports is only one level of information gathering. Whether the information you obtain is more valuable and more directly usable depends on the fundamentals: screening and acquiring, accumulating and distilling industry reports. This is very important.
But the higher you go, the closer you get to information that is more valuable, fresher, more authentic, and more direct. There are many channels for obtaining more of it: not only industry reports, but also methods under your own control, such as crawling, data mining, and other information technology, as well as personal networks, circles, and insider channels. If you are interested, see that answer for my interpretation of the methods at each level.
nvm, as a Node version manager, does not switch versions automatically. Sometimes we need to switch the current Node version for a particular project, and that requires another tool, such as avn.
Example project: project
After Node was recently updated to version 10, I switched my system default to 10 (blame my must-stay-updated compulsion), but project has to be compiled with Node 8, or the build fails.
Then add a file named .node-version in the project root:
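The .node-version file simply holds the version avn should switch to when you enter the directory. A minimal example of its content (the exact version string here is illustrative; use whatever version your project compiles against):

```
8
```

With avn set up, cd-ing into the project directory then switches the active Node version automatically.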