Leo Deng 2020-11-17 12:46:00
Hello, could you explain why A and B are incorrect?
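Answers (1)
Kevin 2020-11-17 14:24:28
Hello!
For this question it is enough to remember the conclusion. See page 544 of the original curriculum text, or the two excerpts below.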
1. Tokens with the highest chi-square test statistic values occur more frequently in texts associated with a particular class and therefore can be selected for use as features for ML model training due to higher discriminatory potential.
2. The mutual information value will be equal to 0 if the token’s distribution in all text classes is the same. The MI value approaches 1 as the token in any one class tends to occur more often in only that particular class of text.
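In other words, a high chi-square value generally marks a token as a useful feature, not noise, and a high MI value means the token carries more information about the class, so it is not noise either. These high-value tokens are the ones to focus on, not the tokens with low chi-square or low MI values.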
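The two quoted statements can be checked with a small numerical sketch. This is only an illustration under stated assumptions: two text classes of equal size, a 2x2 table of document counts (token present or absent, by class), the standard Pearson chi-square statistic, and mutual information measured in bits. The curriculum excerpt does not give formulas, so the function names `chi_square_2x2` and `mutual_information_2x2` and the counts used are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 token/class table (assumed layout).

    a: class-1 documents containing the token
    b: class-2 documents containing the token
    c: class-1 documents without the token
    d: class-2 documents without the token
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the token cannot discriminate.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def mutual_information_2x2(a, b, c, d):
    """Mutual information (in bits) between token presence and class label."""
    n = a + b + c + d
    # Each entry: (cell count, row total, column total).
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    mi = 0.0
    for count, row_total, col_total in cells:
        if count > 0:  # empty cells contribute nothing
            mi += (count / n) * math.log2(count * n / (row_total * col_total))
    return mi


# Token spread evenly across both classes: no discriminatory power.
even = (50, 50, 50, 50)
# Token that appears only in class 1: maximal discriminatory power.
exclusive = (100, 0, 0, 100)

for name, table in [("even", even), ("exclusive", exclusive)]:
    print(name, chi_square_2x2(*table), mutual_information_2x2(*table))
```

Under these assumptions the evenly spread token scores 0 on both measures, while the class-exclusive token gets a large chi-square value and an MI of 1 bit, which is why high values on either measure indicate a useful feature rather than noise.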
I hope this clears up your question. Keep up the hard work!