What data is suitable for R-type cluster analysis

Published: 2023-07-22 11:17:22

㈠ What are the common data types in cluster analysis

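Cluster analysis, also called group analysis, builds a classification: a batch of samples or indicators (variables) is grouped according to how close or similar they are in their properties.
Classified by clustering method, there are the following six kinds:
① Hierarchical (system) clustering: each object starts as its own cluster; the two most similar clusters are then merged, and the distance or similarity between the new cluster and the others is recomputed. This continues until all objects belong to a single cluster.
② Adjustment method (dynamic clustering): first make an initial partition of the n objects, then adjust it so that the loss function of the partition becomes as small as possible, until the partition is reasonable.
③ Optimal partitioning (clustering of ordered samples): start with all samples in one class, then split them into two classes, three classes, and so on according to some optimality criterion, until the required K classes are reached.
④ Fuzzy clustering: uses fuzzy set theory to handle classification problems; it is said to classify well the two-state or multi-state data with strongly fuzzy characteristics found in economics.
⑤ Graph-theoretic clustering: uses the concept of the minimum spanning tree from graph theory to handle classification problems.
⑥ Cluster forecasting: clustering-based forecasting compensates for the shortcomings of regression analysis and discriminant analysis.
By the objects being classified, clustering is divided into R-type and Q-type: R-type clustering groups the variables (indicators), while Q-type clustering groups the samples (observations).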
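
The difference between the two types can be illustrated with a small sketch. R-type clustering groups variables (columns) and Q-type clustering groups samples (rows). The choice of 1 - |correlation| as the distance between variables and of average linkage are assumptions made for illustration only, not something prescribed by the text above.

```r
# Q-type: cluster the observations (rows = US states)
q_tree <- hclust(dist(scale(USArrests)), method = "average")

# R-type: cluster the variables (columns).
# Assumption: 1 - |correlation| is used as the dissimilarity between variables.
r_dist <- as.dist(1 - abs(cor(USArrests)))
r_tree <- hclust(r_dist, method = "average")

# The Q-type tree has one leaf per state, the R-type tree one leaf per variable
length(q_tree$labels)
r_tree$labels

# Cut the variable tree into two groups of variables
cutree(r_tree, k = 2)
```

R-type clustering therefore suits data where the question is which numeric variables behave alike, so a similarity between columns (such as a correlation) has to be meaningful.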

㈡ R study notes: cluster analysis

Packages required for k-means clustering:

factoextra

cluster

# Load the packages
library(factoextra)
library(cluster)

#Data preparation
Use the built-in R data set USArrests

# Load the data set
data("USArrests")
# Remove any missing values (i.e., NA values, for "not available")
# that might be present in the data
USArrests <- na.omit(USArrests)
# View the first 6 rows of the data
head(USArrests, n = 6)

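In this data set, columns are variables and rows are observations.
Before clustering, it is worth running some basic checks on the data, i.e. descriptive statistics such as the mean and standard deviation.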

desc_stats <- data.frame(
  Min  = apply(USArrests, 2, min),    # minimum
  Med  = apply(USArrests, 2, median), # median
  Mean = apply(USArrests, 2, mean),   # mean
  SD   = apply(USArrests, 2, sd),     # standard deviation
  Max  = apply(USArrests, 2, max)     # maximum
)
desc_stats <- round(desc_stats, 1) # keep one decimal place
head(desc_stats)

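When the variables differ greatly in mean and variance, they should be standardized so that no single variable dominates the distance calculation: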

df <- scale(USArrests)
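
As a quick check of what scale() does with its default arguments, the sketch below redoes the standardization by hand (subtract each column mean, divide by each column standard deviation) and compares the result. The name manual is introduced here for illustration only.

```r
df <- scale(USArrests)

# Manual standardization: centre each column, then divide by its standard deviation
centred <- sweep(as.matrix(USArrests), 2, colMeans(USArrests), "-")
manual  <- sweep(centred, 2, apply(USArrests, 2, sd), "/")

# Should agree with scale() up to floating point error
same <- isTRUE(all.equal(manual, df, check.attributes = FALSE))
same

# After scaling, every column has mean 0 and standard deviation 1
round(colMeans(df), 10)
apply(df, 2, sd)
```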

#Assessing clustering tendency
Use get_clust_tendency() to compute the Hopkins statistic

res <- get_clust_tendency(df, 40, graph = TRUE)
res$hopkins_stat
## [1] 0.3440875
#Visualize the dissimilarity matrix
res$plot

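Under the convention of the factoextra version used here, a Hopkins statistic below 0.5 indicates that the data are highly clusterable. Newer factoextra releases appear to report the statistic under the opposite convention (values close to 1 indicate clusterable data), so check ?get_clust_tendency for the installed version before interpreting the number. The plot also suggests that the data are clusterable.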
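
To make the idea behind the Hopkins statistic concrete, here is a simplified base R sketch. It is not the factoextra implementation; the function name hopkins_sketch is hypothetical, and the uniform sampling inside the bounding box of the data is an assumption. In this sketch values near 1 suggest clustered data and values near 0.5 suggest no structure. Some implementations report 1 minus this quantity, which is the convention under which values below 0.5 point to clusterable data.

```r
# Simplified Hopkins statistic (illustrative sketch, not factoextra's code)
hopkins_sketch <- function(x, n, seed = 123) {
  set.seed(seed)
  x <- as.matrix(x)

  # w: distance from n sampled real points to their nearest other real point
  idx <- sample(nrow(x), n)
  d_real <- as.matrix(dist(x))
  diag(d_real) <- Inf
  w <- apply(d_real[idx, , drop = FALSE], 1, min)

  # u: distance from n uniformly generated points to their nearest real point
  rand <- apply(x, 2, function(col) runif(n, min(col), max(col)))
  u <- apply(rand, 1, function(p) min(sqrt(colSums((t(x) - p)^2))))

  # Clustered data leave empty regions, so u tends to be larger than w
  sum(u) / (sum(u) + sum(w))
}

df <- scale(USArrests)
h <- hopkins_sketch(df, n = 40)
h
```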

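#Estimating the number of clusters
Because k-means clustering requires the number of clusters to be specified in advance, we use the function clusGap() to compute the gap statistic for estimating the optimal number of clusters. The function fviz_gap_stat() is used to visualize it.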

set.seed(123)
## Compute the gap statistic
gap_stat <- clusGap(df, FUN = kmeans, nstart = 25, K.max = 10, B = 500)
# Plot the result
fviz_gap_stat(gap_stat)

The plot suggests that four clusters (k = 4) is the best choice.
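
The number of clusters can also be read off numerically instead of from the plot. The sketch below uses maxSE() from the cluster package on the table stored in the clusGap result. The smaller B = 50 is an assumption made to keep the run short, and the "firstSEmax" rule is one of several available rules, so the value of k obtained this way may differ from the one read off the plot.

```r
library(cluster)

df <- scale(USArrests)
set.seed(123)
# Fewer bootstrap samples than in the text above, to keep the example fast
gap_stat <- clusGap(df, FUN = kmeans, nstart = 25, K.max = 10, B = 50)

# Tab holds logW, E.logW, gap and SE.sim for k = 1, ..., K.max
head(gap_stat$Tab)

# Pick k with the "firstSEmax" rule
k_opt <- maxSE(gap_stat$Tab[, "gap"], gap_stat$Tab[, "SE.sim"],
               method = "firstSEmax")
k_opt
```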

#Running the clustering

set.seed(123)
km.res <- kmeans(df, 4, nstart = 25)
head(km.res$cluster, 20)

# Visualize clusters using factoextra
fviz_cluster(km.res, USArrests)
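
Beyond the plot, the kmeans() result can be inspected directly. The following base R sketch, which repeats the same call so that it runs on its own, shows the cluster sizes, the mean of each original variable per cluster, and the share of the total variance explained by the partition. The names cluster_means and explained are introduced here for illustration.

```r
df <- scale(USArrests)
set.seed(123)
km.res <- kmeans(df, 4, nstart = 25)

# Number of observations in each cluster
km.res$size

# Mean of the original (unscaled) variables within each cluster
cluster_means <- aggregate(USArrests,
                           by = list(cluster = km.res$cluster),
                           FUN = mean)
cluster_means

# Between-cluster sum of squares as a fraction of the total sum of squares
explained <- km.res$betweenss / km.res$totss
explained
```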

#Checking the cluster silhouette plot

Recall that the silhouette width (S_i) measures how similar an object i is to the other objects in its own cluster versus those in the neighboring cluster. S_i values range from 1 to -1:
A value of S_i close to 1 indicates that the object is well clustered. In other words, object i is similar to the other objects in its group.
A value of S_i close to -1 indicates that the object is poorly clustered, and that assignment to some other cluster would probably improve the overall results.

sil <- silhouette(km.res$cluster, dist(df))
rownames(sil) <- rownames(USArrests)
head(sil[, 1:3])

#Visualize
fviz_silhouette(sil)

The plot shows some negative values; the output of silhouette() can be used to identify which observations they belong to

neg_sil_index <- which(sil[, "sil_width"] < 0)
sil[neg_sil_index, , drop = FALSE]
## cluster neighbor sil_width
## Missouri 3 2 -0.07318144
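
The silhouette width described above can be reproduced by hand, which makes the definition concrete: for object i, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the width is (b - a) / max(a, b). The helper sil_one below is a hypothetical name, and the sketch assumes no cluster contains a single observation (silhouette() sets the width to 0 in that case).

```r
library(cluster)

df <- scale(USArrests)
set.seed(123)
cl <- kmeans(df, 4, nstart = 25)$cluster
d <- as.matrix(dist(df))

# Silhouette width of observation i, computed from the definition
sil_one <- function(i) {
  own  <- cl[i]
  same <- setdiff(which(cl == own), i)
  a <- mean(d[i, same])                        # cohesion within own cluster
  b <- min(sapply(setdiff(unique(cl), own),    # separation from nearest other cluster
                  function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
}

manual    <- sapply(seq_len(nrow(df)), sil_one)
reference <- silhouette(cl, dist(df))[, "sil_width"]

# The hand-computed widths should match those from cluster::silhouette()
agree <- isTRUE(all.equal(unname(manual), unname(reference)))
agree
```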

#eclust(): enhanced cluster analysis

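Compared with the functions in other cluster analysis packages, eclust() has the following advantages:

It simplifies the cluster analysis workflow

It can be used for both hierarchical clustering and partitioning clustering

It automatically computes the gap statistic to estimate the optimal number of clusters

It automatically provides a silhouette plot

It can be combined with ggplot2 to draw polished graphics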

#K-means clustering with eclust()

# Compute k-means
res.km <- eclust(df, "kmeans")

# Gap statistic plot
fviz_gap_stat(res.km$gap_stat)

# Silhouette plot
fviz_silhouette(res.km)
## cluster size ave.sil.width
## 1 1 13 0.31
## 2 2 29 0.38
## 3 3 8 0.39

#Hierarchical clustering with eclust()

# Enhanced hierarchical clustering
res.hc <- eclust(df, "hclust") # compute hclust
fviz_dend(res.hc, rect = TRUE) # dendrogram

#The R code below produces the silhouette plot and the scatter plot for the hierarchical clustering.
fviz_silhouette(res.hc) # silhouette plot
## cluster size ave.sil.width
## 1 1 19 0.26
## 2 2 19 0.28
## 3 3 12 0.43

fviz_cluster(res.hc) # scatter plot
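
For comparison, a similar hierarchical clustering can be sketched in base R without factoextra. This assumes Euclidean distance with Ward linkage ("ward.D2"), which to my understanding are the defaults of eclust(), and k = 3 to match the three clusters in the output above. The resulting cluster sizes are not guaranteed to equal those printed by fviz_silhouette().

```r
df <- scale(USArrests)

# Agglomerative clustering on Euclidean distances with Ward linkage (assumed settings)
hc <- hclust(dist(df, method = "euclidean"), method = "ward.D2")

# Cut the tree into 3 groups and count the members of each
grp <- cutree(hc, k = 3)
table(grp)

# Dendrogram with rectangles around the 3 groups
plot(hc, cex = 0.6)
rect.hclust(hc, k = 3)
```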

#Infos

This analysis has been performed using R software (R version 3.3.2)
