STATISTICAL ANALYSIS OF DIGITAL IMAGE COLOR PROFILES AND THEIR CLUSTERING USING THE K-MEANS ALGORITHM

Yurii ZABOLOTNII
Alla LEVCHENKO
Kateryna TYSIACHNA

Abstract

Introduction. The rapid growth of visual data volumes necessitates the development
of effective methods for automated digital image processing and classification. This paper addresses
the problem of grouping images by color similarity through statistical analysis of color profiles and
unsupervised clustering. A digital image is formalized as a statistical sample, that is, a set of pixel vectors
in a color space, over which numerical characteristics are computed. Two color models, RGB and
HSV, are compared with respect to the informativeness of their statistical descriptors and the quality
of k-means clustering results.
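As an illustration of treating an image as a statistical sample, the sketch below computes the four per-channel descriptors named in this abstract (mean, variance, standard deviation, mode) and concatenates them into a 12-element feature vector. It is a minimal sketch, not the authors' implementation: the function names (channel_stats, rgb_to_hsv_u8, color_profile) are hypothetical, the HSV conversion uses Python's standard colorsys module with every channel scaled to 0..255 (a library routine such as OpenCV's cv2.cvtColor could be used instead, with its own scaling), and hue is averaged as an ordinary number even though it is a circular quantity.

```python
import colorsys

import numpy as np


def channel_stats(channel):
    """Return [mean, variance, std, mode] for one 8-bit channel."""
    values = channel.ravel().astype(np.int64)
    # Mode of an 8-bit channel: the most frequent intensity value.
    mode = int(np.bincount(values, minlength=256).argmax())
    return [float(values.mean()), float(values.var()),
            float(values.std()), float(mode)]


def rgb_to_hsv_u8(rgb):
    """Convert an (H, W, 3) uint8 RGB array to HSV, each channel scaled to 0..255.

    Assumption: this scaling is chosen for the sketch only.
    """
    flat = rgb.reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in flat])
    return np.rint(hsv * 255).astype(np.uint8).reshape(rgb.shape)


def color_profile(rgb, space="rgb"):
    """Build a 12-element feature vector: 4 statistics for each of 3 channels."""
    pixels = rgb if space == "rgb" else rgb_to_hsv_u8(rgb)
    features = []
    for c in range(3):
        features.extend(channel_stats(pixels[..., c]))
    return np.array(features)


# Tiny synthetic image (pure red), used only to exercise the functions.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[..., 0] = 255

rgb_profile = color_profile(image, "rgb")
hsv_profile = color_profile(image, "hsv")
print(rgb_profile)
print(hsv_profile)
```

Each image yields one such vector per color space, so a set of images becomes a matrix with one row per image that can be passed to a clustering algorithm.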
Purpose. The aim of the study is to develop and validate a methodology for statistical analysis
of digital image color profiles using the k-means clustering algorithm, and to conduct a comparative
analysis of the influence of the color space choice (RGB vs HSV) on the informativeness of statistical
features and the quality of image clustering.
Results. For each image, the mean, variance, standard deviation, and mode were computed for
each color channel in both RGB and HSV spaces. Color profiles were represented as numerical
feature vectors used as input to the k-means algorithm. The optimal number of clusters was
determined using the Elbow Method (SSE analysis) and the Silhouette Score. The statistical significance of
inter-cluster differences was verified with ANOVA. An experimental study on a set of ten test images
showed that the optimal number of clusters is k = 2 in the RGB space and k = 3 in the HSV space;
the HSV clustering also yields higher Silhouette Score values and lower ANOVA p-values, indicating clearer
cluster separation. The HSV space separates hue, saturation, and brightness information explicitly, which
leads to more semantically meaningful clusters and better interpretability of statistical characteristics.
The software system was implemented in Python using the OpenCV, NumPy, scikit-learn, Matplotlib,
Pandas, and Tkinter libraries.
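The evaluation workflow described in the Results (k-means over feature vectors, SSE for the Elbow Method, Silhouette Score, and ANOVA across clusters) could be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the authors' code: the feature matrix is synthetic random data standing in for ten 12-dimensional color profiles, the helper evaluate_k is hypothetical, the ANOVA is run on a single feature using scipy.stats.f_oneway (SciPy is not listed among the paper's libraries), and no feature scaling is applied.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for ten color-profile vectors (NOT the paper's data):
# three well-separated groups of 4, 3, and 3 samples.
centers = np.array([[50.0] * 12, [128.0] * 12, [200.0] * 12])
features = np.vstack([
    center + rng.normal(0.0, 5.0, size=(n, 12))
    for center, n in zip(centers, (4, 3, 3))
])


def evaluate_k(features, k_values):
    """Fit k-means for each k and record SSE, silhouette score, and labels."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
        results[k] = {
            "sse": model.inertia_,  # sum of squared distances to centroids
            "silhouette": silhouette_score(features, model.labels_),
            "labels": model.labels_,
        }
    return results


results = evaluate_k(features, range(2, 6))

# Elbow Method: inspect how SSE falls as k grows.
for k, r in results.items():
    print(k, round(r["sse"], 1), round(r["silhouette"], 3))

# Pick the k with the highest silhouette score.
best_k = max(results, key=lambda k: results[k]["silhouette"])

# One-way ANOVA on the first feature, grouped by cluster label.
labels = results[best_k]["labels"]
groups = [features[labels == c, 0] for c in range(best_k)]
f_stat, p_value = f_oneway(*groups)
print(best_k, f_stat, p_value)
```

In this sketch the SSE curve is read by eye for its bend, while the silhouette score gives a single number per k that can be maximized directly; the ANOVA step then checks whether a chosen feature differs significantly between the resulting clusters.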
Conclusion. Statistical analysis of color profiles is an effective tool for quantitative image
description. The k-means algorithm combined with quality evaluation metrics (SSE, Silhouette Score,
ANOVA) provides reliable clustering of images by color features. The HSV color space is more
suitable for color profile clustering than RGB, owing to its better alignment with human visual
perception. The developed methodology can be applied in computer vision systems, automated photo
sorting, medical diagnostics, and satellite image analysis.

Article Details

How to Cite
ZABOLOTNII, Y., LEVCHENKO, A., & TYSIACHNA, K. (2025). STATISTICAL ANALYSIS OF DIGITAL IMAGE COLOR PROFILES AND THEIR CLUSTERING USING THE K-MEANS ALGORITHM. Cherkasy University Bulletin: Applied Mathematics. Informatics, (1). https://doi.org/10.31651/2076-5886-2025-1-46-57
Section
Applied Mathematics
Author Biographies

Yurii ZABOLOTNII, Talne Construction and Economic Professional College of Uman National University

Lecturer of Specialized Disciplines, Talne Construction and Economic Professional College of Uman
National University, Ukraine

Alla LEVCHENKO, Talne Secondary School I–III Grades No. 2 of Talne City Council

Mathematics Teacher, Talne Secondary School I–III Grades No. 2 of Talne City Council, Cherkasy
Region, Ukraine

Kateryna TYSIACHNA, Talne Secondary School I–III Grades No. 2 of Talne City Council

Student, Talne Secondary School I–III Grades No. 2 of Talne City Council, Cherkasy Region, Ukraine
