STATISTICAL ANALYSIS OF DIGITAL IMAGE COLOR PROFILES AND THEIR CLUSTERING USING THE K-MEANS ALGORITHM
Abstract
Introduction. The rapid growth of visual data volumes necessitates the development
of effective methods for automated digital image processing and classification. This paper addresses
the problem of grouping images by color similarity through statistical analysis of color profiles and
unsupervised clustering. A digital image is formalized as a statistical sample (a set of pixel vectors
in a color space) over which numerical characteristics are computed. Two color models, RGB and
HSV, are compared with respect to the informativeness of their statistical descriptors and the quality
of k-means clustering results.
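To make the formalization above concrete, the following minimal sketch treats an image as an (N, 3) sample of pixel vectors and computes a per-channel profile. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the standard-library colorsys module stands in for whatever conversion the study used, and HSV channels are rescaled to 0..255 purely so that the mode can be computed the same way in both spaces.

```python
import colorsys

import numpy as np


def pixel_sample(image_rgb: np.ndarray) -> np.ndarray:
    """Flatten an H x W x 3 uint8 RGB image into an (N, 3) sample of pixel vectors."""
    return image_rgb.reshape(-1, 3)


def rgb_to_hsv_sample(sample_rgb: np.ndarray) -> np.ndarray:
    """Convert an (N, 3) RGB sample to HSV, rescaled to uint8 in 0..255.

    Assumption: all three HSV channels are mapped to 0..255. Other tools use
    different ranges (for example, hue in degrees), so values are not
    directly comparable across libraries.
    """
    scaled = sample_rgb.astype(np.float64) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*pixel) for pixel in scaled])
    return np.rint(hsv * 255.0).astype(np.uint8)


def color_profile(sample: np.ndarray) -> np.ndarray:
    """Return [mean, variance, std, mode] for each channel as a 12-element vector."""
    features = []
    for channel in sample.T:
        # Mode of an 8-bit channel: the most frequent intensity level.
        mode = np.bincount(channel, minlength=256).argmax()
        features.extend([channel.mean(), channel.var(), channel.std(), mode])
    return np.array(features, dtype=np.float64)


# Tiny synthetic example: a 2 x 2 image filled with pure red.
demo_image = np.zeros((2, 2, 3), dtype=np.uint8)
demo_image[..., 0] = 255

rgb_sample = pixel_sample(demo_image)
rgb_profile = color_profile(rgb_sample)
hsv_profile = color_profile(rgb_to_hsv_sample(rgb_sample))
```

Each image then contributes one feature vector per color space, which is the form of input a clustering algorithm such as k-means expects.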
Purpose. The aim of the study is to develop and validate a methodology for statistical analysis
of digital image color profiles using the k-means clustering algorithm, and to conduct a comparative
analysis of the influence of the color space choice (RGB vs HSV) on the informativeness of statistical
features and the quality of image clustering.
Results. For each image, the mean, variance, standard deviation, and mode were computed for
each color channel in both RGB and HSV spaces. Color profiles were represented as numerical
feature vectors used as input to the k-means algorithm. The optimal number of clusters was
determined using the Elbow Method (analysis of the sum of squared errors, SSE) and the Silhouette
Score. The statistical significance of inter-cluster differences was verified with ANOVA. An
experimental study on a set of ten test images showed that the optimal number of clusters is k = 2 in
the RGB space and k = 3 in the HSV space; the HSV clustering also yielded higher Silhouette Score
values and lower ANOVA p-values, indicating clearer cluster separation. Because the HSV space
represents hue, saturation, and brightness as separate channels, it produces more semantically
meaningful clusters and more interpretable statistical characteristics.
The software system was implemented in Python using the OpenCV, NumPy, scikit-learn, Matplotlib,
pandas, and Tkinter libraries.
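The model selection and validation steps described above can be sketched as follows. This is a hedged illustration, not the study's code: the feature matrix is synthetic (three well-separated groups standing in for ten image profiles), the helper names are hypothetical, and one-way ANOVA applied to each feature separately via scipy.stats.f_oneway is an assumption about how the test might be run.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(features, k_values, seed=0):
    """Return {k: (sse, silhouette)} for each candidate number of clusters."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
        # inertia_ is the within-cluster sum of squared errors (SSE).
        results[k] = (model.inertia_, silhouette_score(features, model.labels_))
    return results


def anova_p_values(features, labels):
    """One-way ANOVA p-value for each feature column across the clusters."""
    groups = [features[labels == cluster] for cluster in np.unique(labels)]
    return np.array([
        f_oneway(*(group[:, j] for group in groups)).pvalue
        for j in range(features.shape[1])
    ])


# Synthetic stand-in for ten image feature vectors (NOT the study's data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [20.0, 0.0, 10.0]])
group_sizes = [4, 3, 3]
features = np.vstack([
    center + rng.normal(scale=0.5, size=(size, 3))
    for center, size in zip(centers, group_sizes)
])

# Compare candidate k values; the elbow is read off the SSE column,
# while the Silhouette Score gives a direct maximum to pick.
scores = evaluate_k(features, k_values=range(2, 6))
best_k = max(scores, key=lambda k: scores[k][1])

labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(features)
p_values = anova_p_values(features, labels)
```

With real profiles, features measured on different scales (for example, means and variances) would usually be standardized before clustering, since k-means relies on Euclidean distance.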
Conclusion. Statistical analysis of color profiles is an effective tool for quantitative image
description. The k-means algorithm combined with quality evaluation metrics (SSE, Silhouette Score,
ANOVA) provides reliable clustering of images by color features. The HSV color space is more
suitable for color profile clustering than RGB, owing to its better alignment with human visual
perception. The developed methodology can be applied in computer vision systems, automated photo
sorting, medical diagnostics, and satellite image analysis.

This work is licensed under a Creative Commons Attribution 4.0 International License.