2021年1月4日のブログ記事一覧-裏 RjpWiki

Julia の REPL って

2021年01月04日 | ブログラミング

Julia の REPL って，そもそも何がよかったんだっけ？

Python では atom を使っていた。

atom で Julia を使おうとして，なにか変なことをしてしまって，atom で Julia が使えないと思ってしまった。

なんのことはない，atom の環境設定で language-julia つまり，Julia language support for Atom をインストールすればよいだけだった。Python での使用感とも同じで，R の RStuudio とほとんど同じに使える。
Juno なんかをインストールして変なことになっていた模様。Juno もちゃんと使えるようにできると思うけど，今の状態で十分満足。少なくとも，ターミナルで julia をタイプして起動される環境よりはるかに快適。
jupyter notebooku もインストールして使えるようになっているが，これはかえって面倒。余計な操作が必要になってしまうし，得られる結果も満足がいかない。

ということで，atom 上でプログラムを書いて，成果物を得たのでそれを貼り付けよう。

対象は，Brunner-Munzel 検定だ。並べ替え検定もできるようにした（ちょっと，不本意ではあるのだが）。

けっこう速いよ！

以下を，atom に書き込んでいる。

using StatsBase # for tiedrank
using Statistics # var, std
using Distributions
using Printf

# 欠損値定数 missing
# 欠損値を除く collect(skipmissing(x))
# 平均順位 tiedrank(x); using StatsBase
# var, std using Statisticd
# ベクトル連結 append!(x, y)

function bmt(x, y, alternative, conflevel)
x = collect(skipmissing(x))
y = collect(skipmissing(y))
n1 = length(x)
n2 = length(y)
r1 = tiedrank(x)
r2 = tiedrank(y)
r = tiedrank(vcat(x, y))
m1 = mean(r[1:n1])
m2 = mean(r[n1+1:n1+n2])
estimate = (m2 - (n2 + 1) / 2) / n1
v1 = sum((r[1:n1] .- r1 .- (m1 - (n1 + 1) / 2)).^2) / (n1 - 1)
v2 = sum((r[n1+1:n1+n2] .- r2 .- (m2 - (n2 + 1) / 2)).^2) / (n2 - 1)
n1v1 = n1 * v1
n2v2 = n2 * v2
if n1v1+n2v2 == 0
tstat = df = Inf
else
tstat = n1 * n2 * (m1 - m2) / (n1 + n2) / sqrt(n1v1 + n2v2)
df = (n1v1 + n2v2)^2 / (n1v1^2 / (n1 - 1) + n2v2^2 / (n2 - 1))
end
se = sqrt(v1 / (n1 * n2^2) + v2 / (n2 * n1^2))
obj = TDist(df)
if alternative == "less"
string = "less than"
pvalue = cdf(obj, tstat)
confint = (max(0.0, estimate - quantile(obj, conflevel) * se), 1.0)
elseif alternative == "greater"
string = "greater than"
pvalue = 1 - cdf(obj, tstat)
confint = (0.0, min(1.0, estimate + quantile(obj, conflevel) * se))
else # alternative == "two_sided"
string = "not equal to"
pvalue = 2 * cdf(obj, -abs(tstat))
temp = quantile(obj, (1 - conflevel) / 2) * se
confint = (max(0.0, estimate + temp), min(1.0, estimate - temp))
end
return Dict("tstat" => tstat, "df" => df, "pvalue" => pvalue,
"string" => string, "confint" => confint, "estimate" => estimate,
"conflevel" => conflevel)
end

function BrunnerMunzelTest(x, y; alternative="two_sided", conflevel=0.95)
res = bmt(x, y, alternative, conflevel)
output(res)
end

function output(res)
method = "Brunner-Munzel Test"
println(method)
@printf("t = %.7g, df = %.7g, p-value = %.7g\n", res["tstat"], res["df"], res["pvalue"])
@printf("alternative hypothesis: true p is %s 0.5\n", res["string"])
@printf("%.5g percent confidence interval: [%.5g, %.5g]\n",
res["conflevel"] * 100, res["confint"][1], res["confint"][2])
@printf("sample estimate P(X<Y)+0.5*P(X=Y): %.5g\n", res["estimate"])
end

function next_combn(n, r, a)
lNEO = r # lastNotEqualOffset
while a[lNEO] == n - r + lNEO
lNEO -= 1
if lNEO < 1
return false
end
end
a[lNEO] += 1
for i in lNEO + 1:r
a[i] = a[lNEO] + (i - lNEO)
end
return true
end

function permutationBrunnerMunzelTest(x, y; alternative="two_sided", conflevel=0.95)
res = bmt(x, y, alternative, conflevel)
output(res)
p0 = res["pvalue"]
t0 = abs(res["tstat"])
r = length(x)
n = r + length(y)
nCr = binomial(n, r)
#println("nCr = ", nCr)
a = collect(1:n);
b = collect(1:n);
z = vcat(x, y);
count = 0
while true
x_suf = a[1:r]
y_suf = copy(b)
filter!(e -> !(e in a[1:r]), y_suf)
res = bmt(z[x_suf], z[y_suf], alternative, conflevel)
if abs(res["tstat"]) >= t0
#if res["pvalue"] <= p0
count += 1
end
if next_combn(n, r, a) == false
break
end
end
@printf("exact p-value = %.7g\n", count / nCr)
end

どのように使うか。

まず，以上の関数定義の先端にカーソルを置き，

x = [1,2,1,1,1,1,1,1,1,1,2,4,1,1];
y = [3,3,4,3,1,2,3,1,1,5,4];

BrunnerMunzelTest(x, y)Brunner-Munzel Test
t = -3.137467, df = 17.68284, p-value = 0.005786209
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: [0.59522, 0.98271]
sample estimate P(X<Y)+0.5*P(X=Y): 0.78896

ついで，並べ替え検定

permutationBrunnerMunzelTest(x, y)

Brunner-Munzel Test
t = -3.137467, df = 17.68284, p-value = 0.005786209
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: [0.59522, 0.98271]
sample estimate P(X<Y)+0.5*P(X=Y): 0.78896
exact p-value = 0.008037645

以下は R の実行結果

using RCall

R"library(brunnermunzel)"

RObject{StrSxp}
[1] "brunnermunzel" "stats" "graphics" "grDevices"
[5] "utils" "datasets" "methods" "base"

R"system.time(print(brunnermunzel.permutation.test(c(1,2,1,1,1,1,1,1,1,1,2,4,1,1), c(3, 3, 4, 3, 1, 2, 3, 1, 1, 5, 4)))"

permuted Brunner-Munzel Test

data: c(1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1) and c(3, 3, 4, 3, 1, 2, 3, 1, 1, 5, 4)
p-value = 0.008038
sample estimates:
P(X<Y)+.5*P(X=Y)
0.788961

2021年1月
日	月	火	水	木	金	土
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

裏 RjpWiki

Julia ときどき R, Python によるコンピュータプログラム，コンピュータ・サイエンス，統計学

Julia の REPL って

PVアクセスランキング にほんブログ村

プロフィール

最新記事

バックナンバー

カレンダー

カテゴリー

最新コメント

雨雲の動き

ログイン

goo blog お知らせ

goo blog おすすめ

PVアクセスランキングにほんブログ村