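Student Yang 2021-07-19 20:18:23
The instructor said chi-square is used to remove high- and low-frequency words, but the handout says high-frequency words are selected as features.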
Answers (1)
Kevin 2021-07-20 09:17:28
Hello!
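Tokens with a high chi-square value are indeed selected as features. Chi-square measures how strongly a feature token is associated with a document class: the higher a token's statistic for a given class, the stronger its association with that class and the more class information it carries. Note that a high chi-square value is not the same thing as a high raw frequency. Trimming very frequent and very rare words is a frequency-based filter, whereas chi-square ranks tokens by how unevenly they are distributed across classes, so a word that appears in almost every document regardless of class scores low.
The original text from the book reads: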
Chi-square test can be useful for feature selection in text data. The chi-square test is applied to test the independence of two events: occurrence of the token and occurrence of the class. The test ranks the tokens by their usefulness to each class in text classification problems. Tokens with the highest chi-square test statistic values occur more frequently in texts associated with a particular class and therefore can be selected for use as features for ML model training due to higher discriminatory potential.
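To make the ranking concrete, here is a minimal sketch in plain Python. It is not taken from the book: the six-document corpus, the class labels, and the helper name `chi_square` are all made up for illustration, and it uses the standard 2x2 contingency-table form of the statistic computed on document counts (token present or absent versus document in or out of the class).

```python
def chi_square(token, target, docs):
    """Chi-square statistic for independence of 'token occurs' and 'doc is in target class'.

    Hypothetical helper: docs is a list of (token_list, label) pairs.
    """
    a = b = c = d = 0
    for tokens, label in docs:
        present = token in tokens
        in_class = label == target
        if present and in_class:
            a += 1      # token present, doc in class
        elif present:
            b += 1      # token present, doc in another class
        elif in_class:
            c += 1      # token absent, doc in class
        else:
            d += 1      # token absent, doc in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Token occurs in every document (or in none): it cannot separate classes.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy corpus, assumed purely for illustration.
docs = [
    ("the stock rallied on strong earnings".split(), "positive"),
    ("the earnings beat lifted the shares".split(), "positive"),
    ("the firm reported strong earnings".split(), "positive"),
    ("the stock fell on weak guidance".split(), "negative"),
    ("the weak outlook hit the shares".split(), "negative"),
    ("the firm issued weak guidance".split(), "negative"),
]

vocab = sorted({tok for tokens, _ in docs for tok in tokens})
scores = {tok: chi_square(tok, "positive", docs) for tok in vocab}

# Rank tokens from most to least class-informative.
for tok in sorted(vocab, key=lambda t: scores[t], reverse=True):
    print(f"{tok:10s} {scores[tok]:.2f}")
```

In this toy data "the" is the most frequent word but scores 0 because it appears in every document, while "earnings" and "weak" each occur in only one class and score highest. With two classes the statistic is symmetric, so a token tied to either class ranks high.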
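To you who are working hard: I hope this clears up your confusion.
If this answer helped you understand the topic, please give it a like. Your feedback helps us improve, and good luck on your exam!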

