まことに,申し訳ありませんでした。再現プログラムの文字列定義に誤りがありました。
かさねて,お詫びいたします。
========================= 元記事
"Any idea for creating a new variable from multiple-variable conditions?" にて,3 変数のとる値によって別の新しい変数を定義することについて,プログラムが2つ示されている。
どちらが速いか?と吟味しているが,そもそも,そのプログラム自体が間違っているので,速度には意味がない。
このような,若干複雑な問題については,稚拙でもよいから間違いが入り込む余地の少ないプログラミングが望まれる。そして,そのプログラムが正しいかどうかについて,適切なテストデータでもって検証する必要があるだろう。
そのような段階で,プログラムの仕様の曖昧さも分かったりするものだ。例えば,今回の例で言えば,HBsAg が pos なら,HBcAb, HBsAb が any で Active となっているが,NA も any に入るのか?とか。
テストデータの作成
> HBsAg = factor(c("pos", "neq", NA))
> HBcAb = factor(c("pos", "neq", NA))
> HBsAb = factor(c("pos", "neq", NA))
> hbv.data <- expand.grid(HBsAg, HBcAb, HBsAb)
> colnames(hbv.data) = c("HBsAg", "HBcAb", "HBsAb")
テストデータ
> hbv.data
HBsAg HBcAb HBsAb
1 pos pos pos
2 neq pos pos
3 <NA> pos pos
4 pos neq pos
5 neq neq pos
6 <NA> neq pos
7 pos <NA> pos
8 neq <NA> pos
9 <NA> <NA> pos
10 pos pos neq
11 neq pos neq
12 <NA> pos neq
13 pos neq neq
14 neq neq neq
15 <NA> neq neq
16 pos <NA> neq
17 neq <NA> neq
18 <NA> <NA> neq
19 pos pos <NA>
20 neq pos <NA>
21 <NA> pos <NA>
22 pos neq <NA>
23 neq neq <NA>
24 <NA> neq <NA>
25 pos <NA> <NA>
26 neq <NA> <NA>
27 <NA> <NA> <NA>
> ## No ifelse
> hbv.data1 <- within(hbv.data, {
+ ## Included is.na() condition to achieve "else if" in the SAS code. But I'm not sure if it does.
+
+ ## Create a HBV status vector and give a value
+ hbv.status <- NA
+ hbv.status[is.na(hbv.status) & HBsAg == "pos"] <- "positive"
+ hbv.status[is.na(hbv.status) & HBsAg == "neg" & HBcAb == "pos"] <- "past"
+ hbv.status[is.na(hbv.status) & HBsAg == "neg" & HBcAb == "neg" & HBsAb == "pos"] <- "vaccined"
+ hbv.status[is.na(hbv.status) & HBsAg == "neg" & HBcAb == "neg" & HBsAb == "neg"] <- "negative"
+ })
結果 1
> hbv.data1
HBsAg HBcAb HBsAb hbv.status
1 pos pos pos positive
2 neq pos pos <NA>
3 <NA> pos pos <NA>
4 pos neq pos positive
5 neq neq pos <NA>
6 <NA> neq pos <NA>
7 pos <NA> pos positive
8 neq <NA> pos <NA>
9 <NA> <NA> pos <NA>
10 pos pos neq positive
11 neq pos neq <NA>
12 <NA> pos neq <NA>
13 pos neq neq positive
14 neq neq neq <NA>
15 <NA> neq neq <NA>
16 pos <NA> neq positive
17 neq <NA> neq <NA>
18 <NA> <NA> neq <NA>
19 pos pos <NA> positive
20 neq pos <NA> <NA>
21 <NA> pos <NA> <NA>
22 pos neq <NA> positive
23 neq neq <NA> <NA>
24 <NA> neq <NA> <NA>
25 pos <NA> <NA> positive
26 neq <NA> <NA> <NA>
27 <NA> <NA> <NA> <NA>
> ## Nested ifelse
> hbv.data2 <- within(hbv.data, {
+ hbv.status <-
+ ifelse( HBsAg == "pos", "positive",
+ ifelse( HBsAg == "neg" & HBcAb == "pos", "past",
+ ifelse( HBsAg == "neg" & HBcAb == "neg" & HBsAb == "pos", "vaccined",
+ ifelse( HBsAg == "neg" & HBcAb == "neg" & HBsAb == "neg", "negative",
+ no = NA
+ )
+ )
+ )
+ )
+ })
結果 2
> hbv.data2
HBsAg HBcAb HBsAb hbv.status
1 pos pos pos positive
2 neq pos pos <NA>
3 <NA> pos pos <NA>
4 pos neq pos positive
5 neq neq pos <NA>
6 <NA> neq pos <NA>
7 pos <NA> pos positive
8 neq <NA> pos <NA>
9 <NA> <NA> pos <NA>
10 pos pos neq positive
11 neq pos neq <NA>
12 <NA> pos neq <NA>
13 pos neq neq positive
14 neq neq neq <NA>
15 <NA> neq neq <NA>
16 pos <NA> neq positive
17 neq <NA> neq <NA>
18 <NA> <NA> neq <NA>
19 pos pos <NA> positive
20 neq pos <NA> <NA>
21 <NA> pos <NA> <NA>
22 pos neq <NA> positive
23 neq neq <NA> <NA>
24 <NA> neq <NA> <NA>
25 pos <NA> <NA> positive
26 neq <NA> <NA> <NA>
27 <NA> <NA> <NA> <NA>
===================================== 以下のように訂正させて頂きます(元のプログラムには何の問題もありません)
こうあるべきでした。
> HBsAg = factor(c("pos", "neg", NA))
> HBcAb = factor(c("pos", "neg", NA))
> HBsAb = factor(c("pos", "neg", NA))
> hbv.data <- expand.grid(HBsAg, HBcAb, HBsAb)
> colnames(hbv.data) = c("HBsAg", "HBcAb", "HBsAb")
> hbv.data
HBsAg HBcAb HBsAb
1 pos pos pos
2 neg pos pos
3 <NA> pos pos
4 pos neg pos
5 neg neg pos
6 <NA> neg pos
7 pos <NA> pos
8 neg <NA> pos
9 <NA> <NA> pos
10 pos pos neg
11 neg pos neg
12 <NA> pos neg
13 pos neg neg
14 neg neg neg
15 <NA> neg neg
16 pos <NA> neg
17 neg <NA> neg
18 <NA> <NA> neg
19 pos pos <NA>
20 neg pos <NA>
21 <NA> pos <NA>
22 pos neg <NA>
23 neg neg <NA>
24 <NA> neg <NA>
25 pos <NA> <NA>
26 neg <NA> <NA>
27 <NA> <NA> <NA>
> ## No ifelse
> hbv.data1 <- within(hbv.data, {
+ ## Included is.na() condition to achieve "else if" in the SAS code. But I'm not sure if it does.
+ ## Create a HBV status vector and give a value
+ hbv.status <- NA
+ hbv.status[is.na(hbv.status) & HBsAg == "pos"] <- "positive"
+ hbv.status[is.na(hbv.status) & HBsAg == "neg" & HBcAb == "pos"] <- "past"
+ hbv.status[is.na(hbv.status) & HBsAg == "neg" & HBcAb == "neg" & HBsAb == "pos"] <- "vaccined"
+ hbv.status[is.na(hbv.status) & HBsAg == "neg" & HBcAb == "neg" & HBsAb == "neg"] <- "negative"
+ })
> hbv.data1
HBsAg HBcAb HBsAb hbv.status
1 pos pos pos positive
2 neg pos pos past
3 <NA> pos pos <NA>
4 pos neg pos positive
5 neg neg pos vaccined
6 <NA> neg pos <NA>
7 pos <NA> pos positive
8 neg <NA> pos <NA>
9 <NA> <NA> pos <NA>
10 pos pos neg positive
11 neg pos neg past
12 <NA> pos neg <NA>
13 pos neg neg positive
14 neg neg neg negative
15 <NA> neg neg <NA>
16 pos <NA> neg positive
17 neg <NA> neg <NA>
18 <NA> <NA> neg <NA>
19 pos pos <NA> positive
20 neg pos <NA> past
21 <NA> pos <NA> <NA>
22 pos neg <NA> positive
23 neg neg <NA> <NA>
24 <NA> neg <NA> <NA>
25 pos <NA> <NA> positive
26 neg <NA> <NA> <NA>
27 <NA> <NA> <NA> <NA>
> ## Nested ifelse
> hbv.data2 <- within(hbv.data, {
+ hbv.status <-
+ ifelse( HBsAg == "pos", "positive",
+ ifelse( HBsAg == "neg" & HBcAb == "pos", "past",
+ ifelse( HBsAg == "neg" & HBcAb == "neg" & HBsAb == "pos", "vaccined",
+ ifelse( HBsAg == "neg" & HBcAb == "neg" & HBsAb == "neg", "negative",
+ no = NA
+ )
+ )
+ )
+ )
+ })
> hbv.data2
HBsAg HBcAb HBsAb hbv.status
1 pos pos pos positive
2 neg pos pos past
3 <NA> pos pos <NA>
4 pos neg pos positive
5 neg neg pos vaccined
6 <NA> neg pos <NA>
7 pos <NA> pos positive
8 neg <NA> pos <NA>
9 <NA> <NA> pos <NA>
10 pos pos neg positive
11 neg pos neg past
12 <NA> pos neg <NA>
13 pos neg neg positive
14 neg neg neg negative
15 <NA> neg neg <NA>
16 pos <NA> neg positive
17 neg <NA> neg <NA>
18 <NA> <NA> neg <NA>
19 pos pos <NA> positive
20 neg pos <NA> past
21 <NA> pos <NA> <NA>
22 pos neg <NA> positive
23 neg neg <NA> <NA>
24 <NA> neg <NA> <NA>
25 pos <NA> <NA> positive
26 neg <NA> <NA> <NA>
27 <NA> <NA> <NA> <NA>
大人の対応,畏れ入ります。
expand.gridで全パターンを含むデモデータが作成できるのですね。参考になりました!
ちなみにサイトのデザインが渋くなりましたね。
あなたのテストデータではnegがneqになっていますよ。それから、条件のところにNAはanyに含まれると書かれています。