Julia に翻訳--223　Shapiro-Wilk 検定，正規性の検定

2021年05月07日 | ブログラミング

#==========
Julia の修行をするときに，いろいろなプログラムを書き換えるのは有効な方法だ。
以下のプログラムを Julia に翻訳してみる。

algorithm as241 appl. statist. (1988) vol. 37, no. 3, 477- 484.
https://jblevins.org/mirror/amiller/as181.f90

ファイル名: shapirotest.jl　　関数名: shapirotest

翻訳するときに書いたメモ

Julia に翻訳--220, 221, 222 は他ならず，これを翻訳するための道具であった。
alnorm, ppnd は名前をそのまま残し，pnorm, qnorm を使う。
alnorm(z, flag=true) = pnorm(z, flag)
ppnd(p) = qnorm(p)
途中，謎の数値がいくつかあるが，正確な記述に改めた。
結果として R の shapiro.test() と同じ精度の結果が得られる。（n = 3 のときのp値の末尾がちょっと違う）
その他，Julia の特徴を生かした簡潔なプログラムになったと思う。

==========#

using Statistics, Rmath

function shapirotest(x)
x = sort(x);
n = length(x)
(n < 3 || n > 5000) && error("sample size must be between 3 and 5000")
rng = x[n] - x[1]
rng == 0 && ("all 'x' values are identical")
rng < 1e-10 && (x ./= rng)
# 仮定
# 1. 打切りデータ，欠損値はない
# 2. 3 <= n <= 5000
# 3. データは昇順にソートされている
# 4. 全て同じ値ということはない
# 5. rng < 1e-10 のときは，事前に標準化されている
swilk(x, n)
end

function swilk(x, n)
#=====
algorithm as r94 appl. statist. (1995) vol.44, no.4

calculates the shapiro-wilk w test and its significance level

input:
x(n) sample values in ascending order.
n the total sample size (including any right-censored values).
output
w the shapiro-wilks w-statistic.
pw the p-value for w.
=====#

n2 = n ÷ 2

# calculates coefficients for the test
if n == 3
# a = [0.70711]
a = [1/sqrt(2)]
else
a = zeros(n2);
an25 = n + 0.25
a = [ppnd((i - 0.375) / an25) for i = 1:n2]
summ2 = 2sum(a .^ 2)
ssumm2 = sqrt(summ2)
rsn = 1 / sqrt(n)
c1 = [0.0, 0.221157, -0.147981, -2.07119, 4.434685, -2.706056]
a1 = poly(c1, rsn) - a[1] / ssumm2

# normalize coefficients
if n > 5
c2 = [0.0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633]
a2 = -a[2] / ssumm2 + poly(c2, rsn)
fac = sqrt((summ2 - 2 * a[1] ^ 2 - 2 * a[2] ^ 2) / (1 - 2 * a1 ^ 2 - 2 * a2 ^ 2))
a[1] = a1
a[2] = a2
a[3:n2] ./= -fac
else
fac = sqrt((summ2 - 2 * a[1] ^ 2) / (1 - 2 * a1 ^ 2))
a[1] = a1
a[2:n2] ./= -fac
end
end

# calculate w statistic as squared correlation between data and coefficients
x ./= x[n] - x[1]
x .-= mean(x)
ssa = ssx = sax = 0
for i = 1:n
j = n + 1 - i
asa = i == j ? 0 : sign(i - j) * a[min(i, j)]
ssa += asa^2
ssx += x[i]^2
sax += asa * x[i]
end

# w1 equals (1-w) claculated to avoid excessive rounding error
# for w very near 1 (a potential problem in very large samples)
ssassx = sqrt(ssa * ssx)
w1 = (ssassx - sax) * (ssassx + sax) / (ssa * ssx)
w = 1 - w1

# calculate significance level for w (exact for n=3)
if n == 3
# pw = 1.909859 * (asin(sqrt(w)) - 1.047198)
pw = 6 / π * (asin(sqrt(w)) - π / 3)
else
y = log(w1)
xx = log(n)
m = 0
s = 1
if n <= 11
gamma = poly([-2.273, 0.459], n)
y >= gamma && return w, 1e-10
y = -log(gamma - y)
m = poly([0.5440, -0.39978, 0.025054, -0.6714e-3], n)
s = exp(poly([1.3822, -0.77857, 0.062767, -0.0020322], an))
else
m = poly([-1.5861, -0.31082, -0.083751, 0.0038915], xx)
s = exp(poly([-0.4803, -0.082676, 0.0030302], xx))
end
pw = alnorm((y - m) / s, false)
end
w, pw
end

function poly(coefficients, x)

# algorithm as 181.2 appl. statist. (1982) vol. 31, no. 2

# calculates the algebraic polynomial of order nored-1 with
# array of coefficients coefficients. zero order coefficient is coefficients[1]

nord = length(coefficients)
result = 0
for i = nord:-1:2
result = (result + coefficients[i]) * x
end
result + coefficients[1]
end

ppnd(p) = qnorm(p)
alnorm(z, flag=true) = pnorm(z, flag)

使用例

偶数個
x = [ 0.03, 0.08, 0.14, 0.14, 0.15, 0.17, 0.31, 0.37, 0.37, 0.40,
0.42, 0.43, 0.49, 0.53, 0.60, 0.62, 0.72, 0.75, 0.87, 0.97 ];

奇数個
y = [ 0.03, 0.08, 0.14, 0.14, 0.15, 0.17, 0.31, 0.37, 0.37, 0.40,
0.42, 0.43, 0.49, 0.53, 0.60, 0.62, 0.72, 0.75, 0.87 ];

最小個数
z = [ 0.03, 0.08, 0.14 ];

最大個数
using Random
Random.seed!(321)
u = randn(5000);

shapirotest(x) # (0.9594825287257593, 0.5335636425303667)
shapirotest(y) # (0.9607205920932019, 0.5866579913499355)
shapirotest(z) # (0.9972527472527474, 0.8998502800372317)
shapirotest(u) # (0.9995093332208211, 0.22746134654936953)

R の shapiro.test() の結果と比較
using RCall
R"""
options(digits=16)
a = shapiro.test($x); cat(a$statistic, a$p.value,"\n")
b = shapiro.test($y); cat(b$statistic, b$p.value,"\n")
c = shapiro.test($z); cat(c$statistic, c$p.value,"\n")
d = shapiro.test($u); cat(d$statistic, d$p.value,"\n")
"""
# 0.9594825287257593 0.5335636425303667
# 0.9607205920932019 0.5866579913499355
# 0.9972527472527474 0.8998502800372251
# 0.9995093332208211 0.2274613465493695

2024年11月
日	月	火	水	木	金	土
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

裏 RjpWiki

Julia ときどき R, Python によるコンピュータプログラム，コンピュータ・サイエンス，統計学

Julia に翻訳--223 Shapiro-Wilk 検定，正規性の検定

このブログの人気記事

コメントを投稿

「ブログラミング」カテゴリの最新記事

プロフィール

最新記事

バックナンバー

カレンダー

カテゴリー

最新コメント

雨雲の動き

ログイン

goo blog お知らせ

goo blog おすすめ

Julia に翻訳--223　Shapiro-Wilk 検定，正規性の検定