Background

Something seemed off about the performance of liblinear's logistic regression, so I looked at the source and found the following:

                double label=predict_values(model_, x, prob_estimates);
                for(i=0;i < nr_w;i++)
                        prob_estimates[i]=1/(1+exp(-prob_estimates[i]));

                if(nr_class==2) // for binary classification
                        prob_estimates[1]=1.-prob_estimates[0];
                else
                {
                        double sum=0;
                        for(i=0; i < nr_class; i++)
                                sum+=prob_estimates[i];

                        for(i=0; i < nr_class; i++)
                                prob_estimates[i]=prob_estimates[i]/sum;
                }

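In other words, each class's value is passed through the binary logistic sigmoid function, and the results are normalized afterwards. For the multiclass case, shouldn't the correct approach be the normalized exponential function, i.e. the softmax function?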
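To make the difference concrete, here is a minimal sketch that applies both approaches to the same decision values. The function names `sigmoid_then_normalize` and `softmax` are made up for this illustration and are not part of liblinear.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: mimics the original multiclass branch, i.e. a
// per-class logistic sigmoid followed by division by the sum.
std::vector<double> sigmoid_then_normalize(const std::vector<double>& scores) {
    std::vector<double> p(scores.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        p[i] = 1.0 / (1.0 + std::exp(-scores[i]));
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}

// Hypothetical helper: plain softmax, i.e. exponentiate each score and
// divide by the sum of the exponentials.
std::vector<double> softmax(const std::vector<double>& scores) {
    std::vector<double> p(scores.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        p[i] = std::exp(scores[i]);
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}
```

For the scores {2.0, 0.0, -2.0}, the first class gets roughly 0.59 from `sigmoid_then_normalize` and roughly 0.87 from `softmax`. Both outputs sum to 1, but the sigmoid-based version is noticeably flatter.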

Reference

Wikipedia Multinomial logistic regression

Fix

I changed the code as follows.

                double label=predict_values(model_, x, prob_estimates);
                for(i=0;i < nr_w;i++)
                        prob_estimates[i]=exp(prob_estimates[i]);

                if(nr_class==2) // for binary classification
                {
                        prob_estimates[0]=1/(1+1/prob_estimates[0]);
                        prob_estimates[1]=1.-prob_estimates[0];
                }
                else
                {
                        double sum=0;
                        for(i=0; i < nr_class; i++)
                                sum+=prob_estimates[i];

                        for(i=0; i <  nr_class; i++)
                                prob_estimates[i]=prob_estimates[i]/sum;
                }

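The changes (highlighted in yellow in the original post) are the exp(...) line in the first loop and the added prob_estimates[0] line in the binary branch.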
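One caveat with calling exp() directly on the decision values: for a double, exp(a) overflows to infinity once a exceeds about 709, and the division then produces NaN. A common workaround is to subtract the largest score before exponentiating, which leaves the result mathematically unchanged. The sketch below shows the idea; `stable_softmax` is a hypothetical name, and this is not what liblinear or the patch above does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: softmax with the maximum score subtracted first.
// Every exponent is then <= 0, so exp() cannot overflow.
std::vector<double> stable_softmax(const std::vector<double>& scores) {
    std::vector<double> p(scores.size());
    if (scores.empty()) return p;

    const double max_score = *std::max_element(scores.begin(), scores.end());

    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        p[i] = std::exp(scores[i] - max_score);  // largest term becomes exp(0) = 1
        sum += p[i];
    }
    for (double& v : p) v /= sum;  // sum >= 1, so the division is safe
    return p;
}
```

With scores such as {1000.0, 999.0, 0.0}, this still returns finite probabilities, whereas exponentiating the raw scores would overflow.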

Results

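It seems to have improved (lol). More importantly, the values returned by the softmax are now closer to the actual probabilities. I have not tested the binary classification path, so it may contain a bug. exp(-a) is 1/exp(a), right?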
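The identity exp(-a) = 1/exp(a) does hold, so the rewritten binary branch should compute the same sigmoid as the original. A quick way to check this numerically is sketched below; the two function names are invented for this comparison.

```cpp
#include <cmath>

// Hypothetical helper: the original binary formula, 1 / (1 + exp(-a)).
double sigmoid_original(double a) {
    return 1.0 / (1.0 + std::exp(-a));
}

// Hypothetical helper: the modified formula, which first computes exp(a)
// and then evaluates 1 / (1 + 1 / exp(a)).
double sigmoid_modified(double a) {
    const double e = std::exp(a);
    return 1.0 / (1.0 + 1.0 / e);
}
```

Evaluating both functions over a range of inputs, say a from -5 to 5, should give values that agree up to floating-point rounding.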


中野智文

Notes on computers and other topics by 中野智文 (VOYAGE GROUP)

Modifying liblinear's logistic regression to use softmax in the multiclass case
