2022年2月5日のブログ記事一覧-裏 RjpWiki

Julia: HypothesisTests.BinomialTest() の p 値に難あり

2022年02月05日 | ブログラミング

HypothesisTests の二項検定の p 値が駄目だ。正確に言うと，母比率 ≒ 0.5 のときの両側検定の場合に限るのだが。

母比率 ≒ 0.5のときは，確率分布が歪んでいる。

例えば，x = 18, n = 24, p = 0.68 だと，以下のようになる。

julia> using Plots

julia> bar(pdf(obj), xticks=(1:25, 0:24), xlimit=(7, 26), grid=false, tick_direction=:out, label="")

julia> using DataFrames

julia> obj = Binomial(24, 0.68)
Binomial{Float64}(n=24, p=0.68)

julia> p = [pdf(obj, x) for x in 0:24];

julia> cum1 = cumsum(p);

julia> cum2 = reverse(cumsum(reverse(p)));

julia> df = DataFrame(x = 0:24, p = p, cum1 = cum1, cum2 = cum2)
25×4 DataFrame
Row │ x p cum1 cum2
│ Int64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────
1 │ 0 1.32923e-12 1.32923e-12 1.0
2 │ 1 6.77906e-11 6.91199e-11 1.0
3 │ 2 1.65663e-9 1.72575e-9 1.0
4 │ 3 2.58159e-8 2.75416e-8 1.0
5 │ 4 2.88008e-7 3.1555e-7 1.0
6 │ 5 2.44807e-6 2.76362e-6 1.0
7 │ 6 1.64735e-5 1.92371e-5 0.999997
8 │ 7 9.00158e-5 0.000109253 0.999981
9 │ 8 0.000406477 0.00051573 0.999891
10 │ 9 0.00153558 0.00205131 0.999484
11 │ 10 0.00489467 0.00694598 0.997949
12 │ 11 0.0132378 0.0201838 0.993054
13 │ 12 0.0304746 0.0506585 0.979816
14 │ 13 0.0597772 0.110436 0.949342
15 │ 14 0.0998065 0.210242 0.889564
16 │ 15 0.141393 0.351635 0.789758
17 │ 16 0.169008 0.520643 0.648365
18 │ 17 0.169008 0.689651 0.479357
19 │ 18 0.139667 0.829318 0.310349
20 │ 19 0.0937236 0.923041 0.170682
21 │ 20 0.0497907 0.972832 0.0769586
22 │ 21 0.0201534 0.992985 0.0271679
23 │ 22 0.0058399 0.998825 0.00701455
24 │ 23 0.00107911 0.999904 0.00117466
25 │ 24 9.55463e-5 1.0 9.55463e-5

x = 18 が平均値 n * p = 24 * 0.68 = 16.32 より大きいので，18〜24 の確率の合計を求めて「2倍してしまう」これは大きな間違い（大したことではないという人もいるんだが）。

正しくは，x = 18 と平均値を挟んで反対側にあるx = 18 のときの確率より大きいものの最小値から1引いた14 から 0 までの確率の和，青で色付した 0.210242 と，x = 18~24 の確率の和，赤で色付した 0.310349 の和，0.210242 + 0.310349 = 0.520591 が求める p 値である。

ということで，以下のプログラムになってしまった。

最初っから書いてももっと短くなるかもしれないが，どういうわけか母比率の信頼区間は正しく計算されているようなので，その部分は残した(confint をそのまま使って）。

使い方と結果の出力を，R に似せておいた。

binom_test(18, 24, p = 0.68, conf_level=0.98, alternative="two.sided")

Exact binomial test

number of successes = 18, number of trials = 24, p-value = 0.52059
alternative hypothesis: true probability of success is not equal to 0.68
98 percent confidence interval: [0.4951508, 0.9200126]
sample estimates: nprobability of success = 0.75

#####

using HypothesisTests
using Distributions

roundn(x, digits) = round(x, digits=digits)
formatp(p) = p < 0.00001 ? "< 0.00001" : string(roundn(p, 5))

function alt_type(alternative)
res = indexin([alternative], ["two.sided", "less", "greater"])[1]
if isnothing(res)
println("alternative is invalid, and changed to 'two.sided'")
res = 1
end
res
end

function binom_test(x, n; p=0.5, conf_level=0.95, alternative="two.sided")
function pvalue(x, n, p, tail)
obj = Binomial(n, p)
if tail == :left
cdf(obj, x)
elseif tail == :right
ccdf(obj, x - 1)
else
if p == 0
Float64(x == 0)
elseif p == 1
Float64(x == n)
else
limit = 1.0000001pdf(obj, x)
m = n * p
if x == m
1
elseif x < m
y = sum(pdf.(obj, ceil(Int, m):n) .<= limit)
cdf(obj, x) + ccdf(obj, n - y)
else
y = sum(pdf.(obj, 0:floor(Int, m)) .<= limit)
cdf(obj, y - 1) + ccdf(obj, x - 1)
end
end
end
end
###
type = alt_type(alternative)
word = ["not equal to", "less than", "greater than"][type]
tail = [:both, :left, :right][type]
result = BinomialTest(x, n, p);
p_value = pvalue(x, n, p, tail)
lo, hi = confint(result, level=conf_level, tail=tail)
println("\tExact binomial test\n\ndata: x and n")
println("number of successes = $x, number of trials = $n, p-value = $(formatp(p_value))")
println("alternative hypothesis: true probability of success is $word $p")
println("$(round(Int, 100conf_level)) percent confidence interval: [$(roundn(lo, 7)), $(roundn(hi, 7))]")
println("sample estimates: nprobability of success = $(roundn(x/n, 7))\n\n")
end

2022年2月
日	月	火	水	木	金	土
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28

裏 RjpWiki

Julia ときどき R, Python によるコンピュータプログラム，コンピュータ・サイエンス，統計学

Julia: HypothesisTests.BinomialTest() の p 値に難あり

PVアクセスランキング にほんブログ村

プロフィール

最新記事

バックナンバー

カレンダー

カテゴリー

最新コメント

雨雲の動き

ログイン

goo blog お知らせ

goo blog おすすめ

PVアクセスランキングにほんブログ村