心理科学 ›› 2012, Vol. 35 ›› Issue (6): 1502-1506.

• Statistics and Measurement •

The Disagreement between Fisher and Neyman–Pearson and the Hypothesis Testing Controversy in Psychological Statistics

吕小康   

  1. Department of Social Psychology, Nankai University
  • Received: 2012-02-12 Revised: 2012-07-01 Online: 2012-11-20 Published: 2012-11-20
  • Corresponding author: 吕小康

The Discrepancy between Fisher and Neyman–Pearson on Hypothesis Testing and the Controversy over the Null Hypothesis Significance Test in Psychology

Xiao-Kang Lu   

  • Received: 2012-02-12 Revised: 2012-07-01 Online: 2012-11-20 Published: 2012-11-20
  • Contact: Xiao-Kang Lu

Summary:

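Fisher and Neyman–Pearson, the originators of hypothesis testing, disagreed on many points, including the methodological foundations of statistical models, the nature of the two types of error, the interpretation of the significance level, and the function of hypothesis testing. As a result, null hypothesis significance testing, the procedure most commonly used in psychological statistics, carries a number of implicit contradictions, which have given rise to controversy over its application. Psychological statistics needs not only to examine the ambiguities of the current testing model and to propose other, complementary methods of statistical inference, but also to reflect on its own educational tradition, so as to build a more open and pluralistic view of statistical application and to make statistics serve psychological research better.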

Keywords: hypothesis testing, significance level, hypothetical infinite population, uniformly most powerful, p value

Abstract:

Understanding the differences between Fisher’s and Neyman–Pearson’s hypothesis testing models is vital to addressing the current controversy over the null hypothesis significance test (NHST) in psychological studies. As the most prominent pioneers of hypothesis testing, Fisher and Neyman–Pearson differed sharply in their conceptualization of the statistical model, the properties of the two types of error, the nature of the significance level, and the proper function of hypothesis testing. Fisher built his theory of significance testing on the vague concept of a hypothetical population, which he postulated as a vital pretence that made scientific inference possible. Fisher never considered the type II error necessary in statistical testing. He argued that a test of significance contains no criterion for accepting a hypothesis and does not lead to any probability statement about the real world, but only to a change in the investigator’s attitude towards the hypothesis under consideration. Hence, tests of significance served as a useful but conceptual tool of inductive inference, not as a practical tool of inductive behavior. Neyman–Pearson, by contrast, argued that the type II error was the cornerstone of their hypothesis testing theory, without which no purely probabilistic test was possible. They interpreted the two errors from a strict frequency perspective, which rests on an equally vague and unrealistic assumption of repeated sampling from an identical population. They recommended the uniformly most powerful test, or the uniformly most powerful unbiased test, as the ideal testing procedure, though its application was rather limited because of the difficulty of calculating the power of a test. Hypothesis testing implied a decision rule that either accepts or rejects the null hypothesis once certain events have been observed and calculated. Their elegant proofs won their theory considerable popularity among statisticians, and it was widely considered a refinement of Fisher’s significance testing, though Fisher himself remained vehemently critical of Neyman–Pearson’s theory throughout his life.
The discrepancy between the two models has triggered considerable debate among psychologists over the nature and underlying drawbacks of NHST, the most frequently applied testing procedure in psychological studies, which is a hybrid of Fisher’s and Neyman–Pearson’s statistical thinking. The dominant statistics textbooks in psychology education present NHST without any explicit mention of the dispute between Fisher and Neyman–Pearson. Many psychology students and researchers therefore know NHST as a self-consistent and unique method of hypothesis testing, unaware both of the rift between Fisher and Neyman–Pearson and of other existing and effective testing procedures, such as Bayesian hypothesis testing. It would be shortsighted to abandon NHST just because of its vagueness, for it at least provides a convenient tool for judging whether an experimental effect is significant. What should draw investigators’ attention is what being significant really means and which statistical tools other than NHST they could apply, given the actual context of the experimental design. Reflection on the tradition of statistical education in psychology, together with an open and diverse view of statistical application, is recommended to facilitate the use of various statistical tools in psychological studies.
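The contrast between the two models can be made concrete with a small numerical sketch. The Python below is an illustration only and is not taken from the article: it assumes a one-sided, one-sample z test with a known standard deviation, and every function name and number in it is hypothetical. The first function reports a p value as a graded measure of evidence against the null hypothesis, in the spirit of Fisher; the second fixes the significance level in advance, names a specific alternative, and returns a binary decision together with the power of the test, in the spirit of Neyman–Pearson.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def fisher_p_value(sample_mean, mu0, sigma, n):
    """One-sided p value for H0: mu = mu0 (Fisher-style measure of evidence).

    No alternative hypothesis and no type II error enter the calculation.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 1 - STD_NORMAL.cdf(z)


def neyman_pearson_test(sample_mean, mu0, mu1, sigma, n, alpha=0.05):
    """Fixed-alpha decision rule for H0: mu = mu0 against H1: mu = mu1 > mu0.

    Returns the decision and the power (1 minus the type II error rate).
    """
    se = sigma / sqrt(n)
    # Rejection threshold chosen before seeing the data, from alpha alone.
    critical = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se
    # Power: probability of exceeding the threshold if mu1 is the true mean.
    power = 1 - NormalDist(mu1, se).cdf(critical)
    decision = "reject H0" if sample_mean > critical else "accept H0"
    return decision, power


if __name__ == "__main__":
    # Hypothetical data: n = 25 observations, known sigma = 15, observed mean 103.
    p = fisher_p_value(103, mu0=100, sigma=15, n=25)
    decision, power = neyman_pearson_test(103, mu0=100, mu1=106, sigma=15, n=25)
    print(f"Fisher: p = {p:.4f}")
    print(f"Neyman-Pearson: {decision}, power = {power:.4f}")
```

The same data feed both functions, but the outputs answer different questions: the p value is reported as is and left to the investigator's judgment, whereas the decision rule needs the alternative `mu1` and the level `alpha` to be stated beforehand. Reporting a p value and then reading it as an accept-or-reject verdict, as the hybrid procedure does, mixes the two.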

Key words: hypothesis testing, significance level, hypothetical infinite population, uniformly most powerful test, p value