Blog posts for June 12, 2022 - I Started a Statistics Blog!

Chapter 2-4: Multivariate Descriptive Statistics (continued)

2022-06-12 15:39:51 | Diary, Essays, Columns

Statistical Techniques Part III: Chapter 2, Multivariate Descriptive Statistics (7)
(3) Agglomerative Nesting (Hierarchical Clustering)

● Free Statistics Software (Calculator) - Web-enabled scientific services & applications
https://www.wessa.net

Let's work through a simple example on the site above.

Descriptive Statistics→Multivariate Descriptive Statistics
↓
● Select Agglomerative Nesting (Hierarchical Clustering)
↓
Figure 1: Enter the data (replacing the existing data)

Set Names of X column: to [ Pollen Temple Humidity Weather ].
↓
Click Compute.
↓
The results of agglomerative hierarchical clustering (Kaufman and Rousseeuw) are displayed.
↓
(Leave the Lance-Williams formula at its default setting.)
↓
Output (1)
Agglomerative Nesting ( Hierarchical Clustering )
Agglomerative Coefficient=0.7909976
(The larger this coefficient, the denser, that is, the more clearly defined, the cluster structure.)
↓
Figure 1: Output (2)

A dendrogram based on euclidean (Euclidean distance) and average (average linkage) is displayed.
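As a cross-check of the Agglomerative Coefficient reported in Output (1), here is a minimal sketch that recomputes it by hand in R. It follows the definition given in the documentation of the R `cluster` package: for each observation i, m(i) is the height at which i is first merged, divided by the height of the final merge, and the coefficient is the mean of 1 - m(i). It assumes the same data, Euclidean distance, and average linkage as above, and uses base R's `hclust()` to obtain the merge heights; `first_merge` and `ac_manual` are illustrative names, not part of any library.

```r
library(cluster)

# Same example data as in the text
Pollen   <- c(4.3, 2.2, 4.6, 11.1, 29.9, 36.5)
Temple   <- c(4.1, 5.1, 6.2, 6.8, 14.2, 14.9)
Humidity <- c(77, 74, 76, 58, 58, 56)
Weather  <- c(3, 3, 3, 2, 2, 2)
dat <- data.frame(Pollen, Temple, Humidity, Weather)

# Coefficient as computed by agnes()
res <- agnes(dat, metric = "euclidean", method = "average")

# Same tree with base R: rows of hc$merge are merge steps, hc$height their heights
hc <- hclust(dist(dat, method = "euclidean"), method = "average")

# Height at which each observation i is first merged:
# observation i appears as -i in exactly one row of hc$merge
first_merge <- sapply(seq_len(nrow(dat)), function(i) {
  step <- which(hc$merge == -i, arr.ind = TRUE)[1, "row"]
  hc$height[step]
})

# AC = mean of 1 - m(i), where m(i) is scaled by the final (largest) merge height
ac_manual <- mean(1 - first_merge / max(hc$height))

# Both values should agree with the site's 0.7909976
print(c(manual = ac_manual, agnes = res$ac))
```

Observation 4 joins the other cluster late in the tree, so its 1 - m(i) term is small; that single term pulls the coefficient below 1.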

Agglomerative hierarchical clustering (AHC: Agglomerative Hierarchical Clustering) works from the bottom up (it is also called bottom-up clustering), and in most cases it is computed from dissimilarities such as distances.
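For reference, the Lance-Williams formula mentioned in the steps above is the general rule that agglomerative methods use to update the dissimilarity between a newly merged cluster i ∪ j and any other cluster k. The coefficients below for the average method (group average, n_i and n_j being the cluster sizes) and for single and complete linkage are the standard textbook values; the exact parameterization used by the site is an assumption here.

```latex
d(k,\, i \cup j) = \alpha_i\, d(k,i) + \alpha_j\, d(k,j) + \beta\, d(i,j) + \gamma\, \lvert d(k,i) - d(k,j) \rvert

\text{average: } \alpha_i = \frac{n_i}{n_i + n_j},\quad \alpha_j = \frac{n_j}{n_i + n_j},\quad \beta = 0,\quad \gamma = 0

\text{single: } \alpha_i = \alpha_j = \tfrac{1}{2},\ \beta = 0,\ \gamma = -\tfrac{1}{2}
\qquad
\text{complete: } \alpha_i = \alpha_j = \tfrac{1}{2},\ \beta = 0,\ \gamma = +\tfrac{1}{2}
```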

Next, let's look at how to do the same thing in the data analysis environment R.
Run the following commands.
---------------------------------
Pollen<- c(4.3,2.2,4.6,11.1,29.9,36.5)
Temple<- c(4.1,5.1,6.2,6.8,14.2,14.9)
Humidity<- c(77,74,76,58,58,56)
Weather<- c(3,3,3,2,2,2)
dat <- data.frame(Pollen, Temple, Humidity, Weather)
dat
library(cluster)
# Compute agnes()
Res<- agnes(dat, diss=FALSE, metric="euclidean", method = "average")
# Agglomerative coefficient
Res$ac
# Plot the tree using pltree()
pltree(Res, cex = 0.6, hang = -1, main = "Dendrogram of Agnes")
---------------------------------
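Note that when the observed variables are measured in different units, the data are sometimes standardized before the analysis.
In that case, run agnes with stand=TRUE.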
---------------------------------
Res<- agnes(dat, diss=FALSE, stand=TRUE, metric="euclidean", method = "average")
---------------------------------
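To make stand=TRUE concrete, the sketch below standardizes each column by hand the way the `agnes` documentation describes (subtract the variable's mean, then divide by its mean absolute deviation) and checks that clustering the hand-standardized data gives the same tree. `mean_abs_dev` and `dat_std` are illustrative names. Note that the divisor is the mean absolute deviation, not the standard deviation that `scale()` uses by default, and not R's `mad()` (which is the median absolute deviation).

```r
library(cluster)

# Same example data as in the text
Pollen   <- c(4.3, 2.2, 4.6, 11.1, 29.9, 36.5)
Temple   <- c(4.1, 5.1, 6.2, 6.8, 14.2, 14.9)
Humidity <- c(77, 74, 76, 58, 58, 56)
Weather  <- c(3, 3, 3, 2, 2, 2)
dat <- data.frame(Pollen, Temple, Humidity, Weather)

# Mean absolute deviation around the mean (divisor n)
mean_abs_dev <- function(x) mean(abs(x - mean(x)))

# Center each column, then divide it by its mean absolute deviation
dat_std <- scale(dat, center = TRUE, scale = sapply(dat, mean_abs_dev))

res_stand  <- agnes(dat, metric = "euclidean", method = "average", stand = TRUE)
res_manual <- agnes(dat_std, metric = "euclidean", method = "average", stand = FALSE)

# Expect the same merge sequence and the same agglomerative coefficient
print(identical(res_stand$merge, res_manual$merge))
print(c(stand = res_stand$ac, manual = res_manual$ac))
```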

As a more general method, you can also select
● Hierarchical Clustering.
Hierarchical cluster analysis builds clusters by successively merging the most similar individuals, based on the similarity or dissimilarity (distance) between individuals.
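The procedure just described (repeatedly merge the two most similar clusters) can be sketched directly in R. This is a deliberately naive illustration, not how `hclust()` or `agnes()` are implemented internally. It assumes Euclidean distance and average linkage, where the distance between two clusters is the mean of all pairwise distances between their members. `naive_average_linkage` is a hypothetical helper name; for the example data its merge heights should agree with `hclust(..., method = "average")`.

```r
# Same example data as in the text
Pollen   <- c(4.3, 2.2, 4.6, 11.1, 29.9, 36.5)
Temple   <- c(4.1, 5.1, 6.2, 6.8, 14.2, 14.9)
Humidity <- c(77, 74, 76, 58, 58, 56)
Weather  <- c(3, 3, 3, 2, 2, 2)
dat <- data.frame(Pollen, Temple, Humidity, Weather)

naive_average_linkage <- function(d) {
  D <- as.matrix(d)                      # full pairwise distance matrix
  clusters <- as.list(seq_len(nrow(D)))  # start: each observation is its own cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {
    best <- c(NA_integer_, NA_integer_)
    best_d <- Inf
    # Find the pair of clusters with the smallest average pairwise distance
    for (a in seq_len(length(clusters) - 1)) {
      for (b in (a + 1):length(clusters)) {
        dab <- mean(D[clusters[[a]], clusters[[b]]])
        if (dab < best_d) {
          best_d <- dab
          best <- c(a, b)
        }
      }
    }
    # Merge cluster b into cluster a and record the merge height
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_d)
  }
  heights
}

d <- dist(dat, method = "euclidean")
print(rbind(naive  = naive_average_linkage(d),
            hclust = hclust(d, method = "average")$height))
```

Each pass rescans every pair of clusters, which is fine for six observations but far too slow for large data; library routines use the Lance-Williams update instead of recomputing the averages.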
↓
Figure 2: Dendrogram from hierarchical cluster analysis

In R, run the following commands.
---------------------------------
Pollen<- c(4.3,2.2,4.6,11.1,29.9,36.5)
Temple<- c(4.1,5.1,6.2,6.8,14.2,14.9)
Humidity<- c(77,74,76,58,58,56)
Weather<- c(3,3,3,2,2,2)
dat <- data.frame(Pollen, Temple, Humidity, Weather)
dat
---------------------------------
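The R listing for this method stops after preparing the data. One way to finish it with base R's `hclust()` is sketched below. It assumes Euclidean distance and average linkage to match the `agnes` example, which may differ from the defaults of the site's Hierarchical Clustering module. Cutting the tree into two groups with `cutree()` is an added illustration, not a step from the original.

```r
# Same example data as in the text
Pollen   <- c(4.3, 2.2, 4.6, 11.1, 29.9, 36.5)
Temple   <- c(4.1, 5.1, 6.2, 6.8, 14.2, 14.9)
Humidity <- c(77, 74, 76, 58, 58, 56)
Weather  <- c(3, 3, 3, 2, 2, 2)
dat <- data.frame(Pollen, Temple, Humidity, Weather)

# Pairwise Euclidean distances between the six observations
d <- dist(dat, method = "euclidean")

# Agglomerative clustering with average linkage (stats package, no extra library)
hc <- hclust(d, method = "average")

# Dendrogram; hang = -1 draws all leaves at the same baseline
plot(hc, hang = -1, main = "Dendrogram (hclust, average)")

# Cut the tree into two groups and show the group of each observation
groups <- cutree(hc, k = 2)
print(groups)
```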

