ІНСТРУКЦІЙНО-КЕРОВАНЕ ВИРІВНЮВАННЯ ЧАТ-БОТІВ НА ОСНОВІ ВЕЛИКИХ МОВНИХ МОДЕЛЕЙ ДЛЯ КОМПЛАЄНС-ОБМЕЖЕНИХ ФІНАНСОВИХ СЦЕНАРІЇВ

Олександр ПІСКУН

doi:10.31651/2076-5886-2025-1-58-72

пдф

Опубліковано: Dec 29, 2025

DOI: https://doi.org/10.31651/2076-5886-2025-1-58-72

Ключові слова:

великі мовні моделі, чат-бот, інструкційне вирівнювання, комплаєнс, управління валютними ризиками, діалогові системи, оцінювання моделей, штучний інтелект

Олександр ПІСКУН

Черкаський національний університет імені Богдана Хмельницького

https://orcid.org/0000-0001-5334-6337

Анотація

У роботі досліджується, чи можуть чат-боти на основі великих мовних моделей (LLM)
безпечно використовуватися у регульованих фінансових сценаріях на етапі попередньої
взаємодії з клієнтом без донавчання моделі, за умови застосування виключно інструкційних
обмежень. Як приклад розглядається управління валютними ризиками. У межах дослідження
спроєктовано та проаналізовано дві конфігурації чат-бота: базову (без обмежень) та
варіант з інструкційними обмеженнями, орієнтований на дотримання вимог комплаєнсу.
Для оцінювання запропоновано компактну рамку, що охоплює три ключові виміри:
порушення комплаєнсу, інформативність і прескриптивність. На основі підібраного набору
реалістичних користувацьких запитів показано, що інструкційне вирівнювання дозволяє
суттєво зменшити рекомендаційну поведінку моделі, водночас зберігаючи значну частину її
пояснювальної цінності

Як цитувати

ПІСКУН , О. (2025). ІНСТРУКЦІЙНО-КЕРОВАНЕ ВИРІВНЮВАННЯ ЧАТ-БОТІВ НА ОСНОВІ ВЕЛИКИХ МОВНИХ МОДЕЛЕЙ ДЛЯ КОМПЛАЄНС-ОБМЕЖЕНИХ ФІНАНСОВИХ СЦЕНАРІЇВ. Вісник Черкаського університету: Прикладна математика. Інформатика, (1). https://doi.org/10.31651/2076-5886-2025-1-58-72

Номер

№ 1 (2025): ВІСНИК ЧЕРКАСЬКОГО УНІВЕРСИТЕТУ: ПРИКЛАДНА МАТЕМАТИКА. ІНФОРМАТИКА

Розділ

Інформатика

Ця робота ліцензується відповідно до Creative Commons Attribution 4.0 International License.

Біографія автора

Олександр ПІСКУН , Черкаський національний університет імені Богдана Хмельницького

кандидат технічних наук, доцент,
завідувач кафедри прикладної математики
та інформатики, Черкаський національний
університет ім. Б. Хмельницького
e-mail: piskun@ukr.net
ORCID 0000-0001-5334-6337

Посилання

MSCI. (2016). Currency hedging: Adapting to volatility. MSCI Research.

Meketa Investment Group. (2022). Currency hedging. White paper.

Huang, W., Krohn, I., & Sushko, V. (2025). Global FX markets when hedging takes centre stage. BIS

Quarterly Review.

Isabella, G., de Almeida, M. I. S., Duran, F. M., & Gabler, C. (2025). From static to conversational: The role

of landing pages and chatbots in B2B lead generation. Journal of Business Research, 201, 115681.

Maga, S., & Bodlaj, M. (2025). Drivers and outcomes of chatbot use in the business-to-business context.

Journal of Business & Industrial Marketing, 40(1), 250–264.

Bank of England, & Financial Conduct Authority. (2024). Artificial intelligence in UK financial services.

Bank of England. https://www.bankofengland.co.uk

International Organization of Securities Commissions. (2021). The use of artificial intelligence and machine

learning by market intermediaries and asset managers: Final report. IOSCO. https://www.iosco.org

Aldasoro, I., Gambacorta, L., Korinek, A., Shreeti, V., & Stein, M. (2024). Intelligent financial system: How

AI is transforming finance (BIS Working Papers No. 1194). Bank for International Settlements.

European Central Bank. (2024). The rise of artificial intelligence: Benefits and risks for financial stability.

Financial Stability Review.

Kang, H., & Liu, X.-Y. (2023). Deficiency of large language models in finance: An empirical examination

of hallucination. arXiv. https://doi.org/10.48550/arXiv.2311.15548

European Securities and Markets Authority. (2023). Guidelines on certain aspects of the MiFID II suitability

requirements (ESMA35-43-3172). ESMA. https://www.esma.europa.eu

Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., … Mann, G. (2023). BloombergGPT:

A large language model for finance. arXiv. https://doi.org/10.48550/arXiv.2303.17564

Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement

learning from human preferences. arXiv. https://doi.org/10.48550/arXiv.1706.03741

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … Lowe, R. (2022). Training

language models to follow instructions with human feedback. In Advances in Neural Information Processing

Systems, 35, 27730–27744.

Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., … Le, Q. V. (2021). Finetuned language

models are zero-shot learners. arXiv. https://doi.org/10.48550/arXiv.2109.01652

Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., … Rush, A. M. (2021). Multitask

prompted training enables zero-shot task generalization. arXiv. https://doi.org/10.48550/arXiv.2110.08207

Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., … Wei, J. (2024). Scaling instructionfinetuned language models. Journal of Machine Learning Research, 25(70), 1–53.

Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., … Amodei, D. (2022). Constitutional

AI: Harmlessness from AI feedback. arXiv. https://doi.org/10.48550/arXiv.2212.08073

Miehling, E., Desmond, M., Natesan Ramamurthy, K., Daly, E. M., Varshney, K. R., Farchi, E., Dognin, P.,

Rios, J., Bouneffouf, D., Liu, M., & Sattigeri, P. (2025). Evaluating the prompt steerability of large language

models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association

for Computational Linguistics: Human Language Technologies (pp. 7874–7900). Association for

Computational Linguistics.

Yang, H., Liu, X.-Y., & Wang, C. D. (2023). FinGPT: Open-source financial large language models. arXiv.

https://doi.org/10.48550/arXiv.2306.06031.

Wallace, E., Xiao, K., Leike, R., Weng, L., Heidecke, J., & Beutel, A. (2024). The instruction hierarchy:

Training LLMs to prioritize privileged instructions. arXiv. https://doi.org/10.48550/arXiv.2404.13208/

Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y., Cong, X., Tang, X., Qian, B., Zhao, S., Hong,

L., Tian, R., Xie, R., Zhou, J., Gerstein, M., Li, D., Liu, Z., & Sun, M. (2024). ToolLLM: Facilitating large

language models to master 16000+ real-world APIs. In Proceedings of the International Conference on

Learning Representations.

Wen, J., Zhong, R., Khan, A., Perez, E., Steinhardt, J., Huang, M., Bowman, S. R., He, H., & Feng, S.

(2024). Language models learn to mislead humans via RLHF. arXiv.

https://doi.org/10.48550/arXiv.2409.12822

Cao, B., Wang, S., Lin, X., Wu, X., Zhang, H., Ni, L. M., & Guo, J. (2025). From deep learning to LLMs: A

survey of AI in quantitative investment. arXiv. https://doi.org/10.48550/arXiv.2503.21422

Yu, Z., et al. (2020). AVA: A conversational assistant for financial services. In Proceedings of the AAAI

Conference on Artificial Intelligence.

Deng, Y., Liao, L., Lei, W., Yang, G. H., Lam, W., & Chua, T.-S. (2025). Proactive conversational AI: A

comprehensive survey of advancements and opportunities. ACM Transactions on Information Systems,

(3), 1–45. https://doi.org/10.1145/3715097

Takayanagi, T., Izumi, K., Sanz-Cruzado, J., McCreadie, R., & Ounis, I. (2025). Are generative AI agents

effective personalized financial advisors? In Proceedings of the 48th International ACM SIGIR Conference

on Research and Development in Information Retrieval (pp. 286–295).

https://doi.org/10.1145/3726302.3729897

Biyani, P., Xu, J., & Carenini, G. (2024). RUBICON: Rubric-based evaluation of domain-specific human–AI

conversations. Microsoft Research. https://www.microsoft.com/en-us/research/publication/rubicon/

Lizée, T., et al. (2024). Evaluating conversational AI systems in healthcare: A multi-stage validation

framework. npj Digital Medicine.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … Riedel, S. (2020). Retrievalaugmented generation for knowledge-intensive NLP tasks. arXiv. https://doi.org/10.48550/arXiv.2005.11401

van der Lee, C., Gatt, A., van Miltenburg, E., Wubben, S., & Krahmer, E. (2019). Best practices for the

human evaluation of automatically generated text. In Proceedings of the 12th International Conference on

Natural Language Generation (pp. 355–368). Association for Computational Linguistics.

Liu, C.-W., Lowe, R., Serban, I. V., Noseworthy, M., Charlin, L., & Pineau, J. (2016). How NOT to

evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response

generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

(pp. 2122–2132). Association for Computational Linguistics.

Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other

measures of statistical accuracy. Statistical Science, 1(1), 54–75. https://doi.org/10.1214/ss/1177013815

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology,

(2), 77–101. https://doi.org/10.1191/1478088706qp063oa

van der Lee, C., Gatt, A., van Miltenburg, E., & Krahmer, E. (2021). Human evaluation of automatically

generated text: Current trends and best practice guidelines. Computer Speech & Language, 67, Article

https://doi.org/10.1016/j.csl.2020.101151

Khashabi, D., Stanovsky, G., Bragg, J., Lourie, N., Kasai, J., Choi, Y., Smith, N. A., & Weld, D. S. (2022).

GENIE: Toward reproducible and standardized human evaluation for text generation. In Proceedings of the

Conference on Empirical Methods in Natural Language Processing (pp. 11444–11458). Association

for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.787

##plugins.themes.bootstrap3.article.sidebar##

##plugins.themes.bootstrap3.article.main##

Анотація

##plugins.themes.bootstrap3.article.details##

Олександр ПІСКУН , Черкаський національний університет імені Богдана Хмельницького

Посилання

Статті цього автора (авторів), які найбільше читають