DATA ORGANIZATION MODELS AND THEIR OPTIMIZATION FOR SEMANTIC ANALYSIS OF SOCIAL MEDIA CONTENT


Oleksandr KUTSENKO
Oleksandr PISKUN

Abstract

Introduction. This paper investigates fundamental approaches to data organization in
information systems and their application to optimizing semantic analysis of social media content.
Classical data models – hierarchical, network, and relational – are analyzed, and their advantages and
limitations in the context of processing large volumes of textual information are identified. The
methodology of content analysis as a tool for studying social communications and semantic text
analysis as a stage of automatic natural language understanding are examined. The relationship
between the choice of data organization model and the efficiency of semantic processing algorithms
for social media content is established.
The exponential growth of digital information, particularly in social networks, creates both new
opportunities and challenges for data analysis. Effective processing of such data arrays requires not
only powerful computing resources but also optimal data organization. The choice of data organization
model directly affects the speed of information access, the complexity of processing algorithms, and
the quality of final analysis results. Meanwhile, there is a growing need for automated content analysis
systems capable of identifying hidden patterns, trends, and semantic connections in large arrays of
textual data.
Content analysis originated as a research method in sociology and journalism, but with the
development of computing technology it has become a powerful tool for automatic text
processing. Semantic analysis, a component of content analysis, makes it possible to identify not only
the explicit but also the hidden content of messages, which is especially important for understanding
public sentiment, identifying trends, and predicting user behavior in social networks.
Purpose. The aim of this article is to analyze classical data organization models and determine
their optimality for the tasks of semantic analysis of social network content.
Results. The hierarchical model represents data in a tree-like structure with strict subordination
relationships but has limited flexibility for representing complex many-to-many connections. The
network model extends the hierarchical approach by allowing multiple parent relationships but
increases system complexity. The relational model, based on set theory and relational algebra,
provides data independence and a declarative query language but may have performance limitations for
certain operations.
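The contrast between strict parent-child subordination and many-to-many connections can be illustrated with a minimal sketch. It is not taken from the paper: the post and topic names, the table layout, and the helper functions are hypothetical, and SQLite is used only because it ships with Python. The sketch shows a post that belongs to two topics being stored twice in a tree, while the relational form stores it once and records the links in a junction table queried declaratively.

```python
import sqlite3

# Hierarchical view (illustrative): each post sits under exactly one parent
# topic, so a post that belongs to two topics has to be stored twice.
hierarchy = {
    "politics": ["post-1", "post-2"],
    "economy": ["post-2"],  # second copy of post-2
}


def copies_in_hierarchy(tree, post_id):
    """Count how many times a post is stored across all parent nodes."""
    return sum(children.count(post_id) for children in tree.values())


# Relational view (illustrative): posts and topics are stored once each;
# a junction table records the many-to-many links between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE topic (name TEXT PRIMARY KEY);
    CREATE TABLE post_topic (
        post_id TEXT REFERENCES post(id),
        topic_name TEXT REFERENCES topic(name),
        PRIMARY KEY (post_id, topic_name)
    );
""")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [("post-1", "election news"), ("post-2", "budget vote")],
)
conn.executemany("INSERT INTO topic VALUES (?)", [("politics",), ("economy",)])
conn.executemany(
    "INSERT INTO post_topic VALUES (?, ?)",
    [("post-1", "politics"), ("post-2", "politics"), ("post-2", "economy")],
)


def topics_of(post_id):
    """Declarative query: list the topics a post is linked to."""
    rows = conn.execute(
        "SELECT topic_name FROM post_topic WHERE post_id = ? ORDER BY topic_name",
        (post_id,),
    )
    return [name for (name,) in rows]
```

Here `copies_in_hierarchy(hierarchy, "post-2")` counts the duplicated copies in the tree, whereas `topics_of("post-2")` answers the same membership question from a single stored row per post.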
Modern data models, including document-oriented and graph databases, better meet the
requirements of social media content processing due to schema flexibility and optimization for
distributed systems. Hybrid architectures combining different types of storage depending on data
nature and operations are identified as the most promising direction. For effective semantic analysis,
the organization of indexes that provide fast search by both lexical and semantic features is critically
important.
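As a rough illustration of indexing by both lexical and semantic features, the sketch below pairs an inverted index with a brute-force cosine-similarity search over vectors. It is an assumption-laden toy, not the authors' design: the sample posts, the two-dimensional vectors, and the function names are hypothetical, and a real system would obtain embeddings from a language model and use an approximate nearest-neighbor index rather than a full scan.

```python
import math
from collections import defaultdict

# Hypothetical sample posts (id -> text).
posts = {
    1: "great phone battery",
    2: "terrible battery life",
    3: "great camera",
}

# Lexical index: token -> set of post ids that contain it (inverted index).
inverted = defaultdict(set)
for post_id, text in posts.items():
    for token in text.lower().split():
        inverted[token].add(post_id)


def lexical_search(token):
    """Return ids of posts containing the exact token, in ascending order."""
    return sorted(inverted.get(token.lower(), set()))


# Semantic index: post id -> vector. These hand-written 2-D vectors are
# placeholders for embeddings that a trained model would produce.
embeddings = {
    1: (0.9, 0.1),
    2: (-0.8, 0.2),
    3: (0.7, 0.6),
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, k=2):
    """Return the k post ids whose vectors are most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda pid: cosine(query_vec, embeddings[pid]),
        reverse=True,
    )
    return ranked[:k]
```

The lexical lookup is a single dictionary access, while the semantic lookup scans every vector; that difference is one reason index organization matters as the volume of content grows.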
Conclusion. The paper examines content analysis stages including source selection, sampling,
unit of analysis identification, and results interpretation. Semantic analysis methods are reviewed,
including sentiment analysis, entity recognition, and topic modeling. The relationship between data
organization model choice and semantic analysis algorithm efficiency is established.
Recommendations for optimizing data structures for processing large volumes of social media content
are provided.
The research demonstrates that no single data model is universally optimal for all aspects of
social media content analysis. Document-oriented databases are suitable for storing raw heterogeneous
data, relational databases for storing structured analysis results, and graph databases for representing
social connections. Distributed processing and stream processing technologies are essential for real-time
analysis of large-scale social media data.

Article Details

How to Cite
KUTSENKO, O., & PISKUN, O. (2022). DATA ORGANIZATION MODELS AND THEIR OPTIMIZATION FOR SEMANTIC ANALYSIS OF SOCIAL MEDIA CONTENT. Cherkasy University Bulletin: Applied Mathematics. Informatics, (1). https://doi.org/10.31651/2076-5886-2022-1-62-72
Section
Informatics
Author Biographies

Oleksandr KUTSENKO, Bohdan Khmelnytsky National University of Cherkasy

Postgraduate, Department of Informatics and Applied Mathematics, Bohdan Khmelnytsky
National University of Cherkasy, Ukraine

Oleksandr PISKUN, Bohdan Khmelnytsky National University of Cherkasy

Candidate of Technical Sciences, Associate Professor, Head of Department of Applied Mathematics
and Informatics, Bohdan Khmelnytsky National University of Cherkasy
