Julia に翻訳--223 Shapiro-Wilk 検定,正規性の検定
#==========
Julia の修行をするときに,いろいろなプログラムを書き換えるのは有効な方法だ。
以下のプログラムを Julia に翻訳してみる。
ファイル名: shapirotest.jl 関数名: shapirotest
翻訳するときに書いたメモ
Julia に翻訳--220, 221, 222 は他ならず,これを翻訳するための道具であった。
alnorm, ppnd は名前をそのまま残し,pnorm, qnorm を使う。
alnorm(z, flag=true) = pnorm(z, flag)
ppnd(p) = qnorm(p)
途中,謎の数値がいくつかあるが,正確な記述に改めた。
結果として R の shapiro.test() と同じ精度の結果が得られる。(n = 3 のときのp値の末尾がちょっと違う)
その他,Julia の特徴を生かした簡潔なプログラムになったと思う。
==========#
using Statistics, Rmath
function shapirotest(x)
x = sort(x);
n = length(x)
(n < 3 || n > 5000) && error("sample size must be between 3 and 5000")
rng = x[n] - x[1]
rng == 0 && ("all 'x' values are identical")
rng < 1e-10 && (x ./= rng)
# 仮定
# 1. 打切りデータ,欠損値はない
# 2. 3 <= n <= 5000
# 3. データは昇順にソートされている
# 4. 全て同じ値ということはない
# 5. rng < 1e-10 のときは,事前に標準化されている
swilk(x, n)
end
function swilk(x, n)
#=====
algorithm as r94 appl. statist. (1995) vol.44, no.4
calculates the shapiro-wilk w test and its significance level
input:
x(n) sample values in ascending order.
n the total sample size (including any right-censored values).
output
w the shapiro-wilks w-statistic.
pw the p-value for w.
=====#
n2 = n ÷ 2
# calculates coefficients for the test
if n == 3
# a = [0.70711]
a = [1/sqrt(2)]
else
a = zeros(n2);
an25 = n + 0.25
a = [ppnd((i - 0.375) / an25) for i = 1:n2]
summ2 = 2sum(a .^ 2)
ssumm2 = sqrt(summ2)
rsn = 1 / sqrt(n)
c1 = [0.0, 0.221157, -0.147981, -2.07119, 4.434685, -2.706056]
a1 = poly(c1, rsn) - a[1] / ssumm2
# normalize coefficients
if n > 5
c2 = [0.0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633]
a2 = -a[2] / ssumm2 + poly(c2, rsn)
fac = sqrt((summ2 - 2 * a[1] ^ 2 - 2 * a[2] ^ 2) / (1 - 2 * a1 ^ 2 - 2 * a2 ^ 2))
a[1] = a1
a[2] = a2
a[3:n2] ./= -fac
else
fac = sqrt((summ2 - 2 * a[1] ^ 2) / (1 - 2 * a1 ^ 2))
a[1] = a1
a[2:n2] ./= -fac
end
end
# calculate w statistic as squared correlation between data and coefficients
x ./= x[n] - x[1]
x .-= mean(x)
ssa = ssx = sax = 0
for i = 1:n
j = n + 1 - i
asa = i == j ? 0 : sign(i - j) * a[min(i, j)]
ssa += asa^2
ssx += x[i]^2
sax += asa * x[i]
end
# w1 equals (1-w) claculated to avoid excessive rounding error
# for w very near 1 (a potential problem in very large samples)
ssassx = sqrt(ssa * ssx)
w1 = (ssassx - sax) * (ssassx + sax) / (ssa * ssx)
w = 1 - w1
# calculate significance level for w (exact for n=3)
if n == 3
# pw = 1.909859 * (asin(sqrt(w)) - 1.047198)
pw = 6 / π * (asin(sqrt(w)) - π / 3)
else
y = log(w1)
xx = log(n)
m = 0
s = 1
if n <= 11
gamma = poly([-2.273, 0.459], n)
y >= gamma && return w, 1e-10
y = -log(gamma - y)
m = poly([0.5440, -0.39978, 0.025054, -0.6714e-3], n)
s = exp(poly([1.3822, -0.77857, 0.062767, -0.0020322], an))
else
m = poly([-1.5861, -0.31082, -0.083751, 0.0038915], xx)
s = exp(poly([-0.4803, -0.082676, 0.0030302], xx))
end
pw = alnorm((y - m) / s, false)
end
w, pw
end
function poly(coefficients, x)
# algorithm as 181.2 appl. statist. (1982) vol. 31, no. 2
# calculates the algebraic polynomial of order nored-1 with
# array of coefficients coefficients. zero order coefficient is coefficients[1]
nord = length(coefficients)
result = 0
for i = nord:-1:2
result = (result + coefficients[i]) * x
end
result + coefficients[1]
end
ppnd(p) = qnorm(p)
alnorm(z, flag=true) = pnorm(z, flag)
使用例
偶数個
x = [ 0.03, 0.08, 0.14, 0.14, 0.15, 0.17, 0.31, 0.37, 0.37, 0.40,
0.42, 0.43, 0.49, 0.53, 0.60, 0.62, 0.72, 0.75, 0.87, 0.97 ];
奇数個
y = [ 0.03, 0.08, 0.14, 0.14, 0.15, 0.17, 0.31, 0.37, 0.37, 0.40,
0.42, 0.43, 0.49, 0.53, 0.60, 0.62, 0.72, 0.75, 0.87 ];
最小個数
z = [ 0.03, 0.08, 0.14 ];
最大個数
using Random
Random.seed!(321)
u = randn(5000);
shapirotest(x) # (0.9594825287257593, 0.5335636425303667)
shapirotest(y) # (0.9607205920932019, 0.5866579913499355)
shapirotest(z) # (0.9972527472527474, 0.8998502800372317)
shapirotest(u) # (0.9995093332208211, 0.22746134654936953)
R の shapiro.test() の結果と比較
using RCall
R"""
options(digits=16)
a = shapiro.test($x); cat(a$statistic, a$p.value,"\n")
b = shapiro.test($y); cat(b$statistic, b$p.value,"\n")
c = shapiro.test($z); cat(c$statistic, c$p.value,"\n")
d = shapiro.test($u); cat(d$statistic, d$p.value,"\n")
"""
# 0.9594825287257593 0.5335636425303667
# 0.9607205920932019 0.5866579913499355
# 0.9972527472527474 0.8998502800372251
# 0.9995093332208211 0.2274613465493695