Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance: Paper Overview

Jul 10, 2023

This post covers the paper "Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance". Below is my understanding of its main method:

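Background: In quantitative finance, order execution is a fundamental task that aims to complete the acquisition or liquidation of trading orders for specific assets. Recently, model-free reinforcement learning (RL) has provided a data-driven solution to the order execution problem. However, existing work always optimizes a single order and overlooks the practice of executing multiple orders simultaneously, which leads to suboptimality and bias.
Multi-agent reinforcement learning (MARL) approach: The paper first presents a MARL method for multi-order execution that takes practical constraints into account. Specifically, each agent is treated as an individual operator that trades one specific order, while the agents keep communicating with each other and collaborate to maximize the overall profit.
Improved communication protocol: Existing MARL algorithms, however, usually have agents communicate by exchanging only their partial observations, which is inefficient in complicated financial markets. To improve collaboration, the paper proposes a learnable multi-round communication protocol in which agents communicate their intended actions to each other and refine them accordingly. The protocol is optimized through a novel action value attribution method that is consistent with the original learning objective yet more efficient.
Experimental results: Experiments on data from two real-world markets show that the method achieves superior performance with significantly better collaboration effectiveness.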

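Overall, the paper proposes a new multi-agent reinforcement learning method that addresses multi-order execution in finance by introducing a learnable multi-round communication protocol and a novel action value attribution method.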

Problem 1: What is intention modeling?

Each agent gets to know the other agents' intentions for the next action, that is, the actions they plan to take, and can adjust its own action accordingly.
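To make the idea of exchanging intended actions over several rounds concrete, here is a minimal toy sketch. It is not the paper's protocol: in the paper the communication is learned, while here the refinement rule is hand-written purely for illustration. The function name `refine_intentions`, the shared cash budget, the per-unit cost list, and the "move halfway toward a feasible scale" rule are all assumptions of this sketch.

```python
from typing import List


def refine_intentions(
    intended: List[float],
    cost_per_unit: List[float],
    cash: float,
    rounds: int = 3,
) -> List[float]:
    """Toy multi-round intention exchange (hand-written rule, NOT the learned protocol).

    intended      -- each agent's initially intended action (proportion to trade)
    cost_per_unit -- hypothetical cash needed if an agent traded its whole order
    cash          -- hypothetical shared cash budget
    """
    actions = list(intended)
    for _ in range(rounds):
        # Round of communication: every agent sees everyone's intended action,
        # so each one can compute the total cash the group intends to spend.
        total_cost = sum(a * c for a, c in zip(actions, cost_per_unit))
        if total_cost <= cash:
            break  # intentions are already compatible, nothing to adjust
        # Refinement: each agent moves halfway toward the proportional
        # scale-down that would make the joint plan affordable.
        scale = cash / total_cost
        actions = [a * (0.5 + 0.5 * scale) for a in actions]
    return actions


if __name__ == "__main__":
    print(refine_intentions([0.5, 0.5], [100.0, 100.0], cash=50.0))
```

Each round shrinks the conflict between the agents' plans without any agent having to guess what the others will do, which is the intuition behind sharing intentions instead of only observations.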

Problem 2: How to model multi-order execution as an MDP?

There are n agents, and each of them is in charge of the order for a single asset.

So how do we design the action space?

It is discrete, and the granularity of the discretization is determined by experiments.

The paper defines the action a_t of agent i as a proportion of its target order 𝑀𝑖.

Note that the order 𝑀𝑖 allocated to agent i must be executed in full by the end of the trading horizon, so the proportions a_t chosen by agent i over all time steps must sum to 1.
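The action space and the full-execution requirement can be sketched as follows. This is only an illustration under stated assumptions: the five proportion levels in `ACTION_LEVELS` are hypothetical (the post only says the granularity is chosen by experiments), and forcing the remaining volume to be traded at the last step is one simple way to guarantee full execution, not necessarily the paper's mechanism.

```python
from typing import List

# Hypothetical discretization of the proportion of the target order.
ACTION_LEVELS: List[float] = [0.0, 0.25, 0.5, 0.75, 1.0]


def executed_volume(
    action_index: int,
    remaining_fraction: float,
    target_volume: float,
    is_last_step: bool,
) -> float:
    """Map a discrete action to a traded volume for one agent."""
    proportion = ACTION_LEVELS[action_index]
    # An agent can never trade more than what is left of its order.
    proportion = min(proportion, remaining_fraction)
    if is_last_step:
        # Assumption of this sketch: whatever is left is executed at the end,
        # so the proportions over the episode always sum to 1.
        proportion = remaining_fraction
    return proportion * target_volume


def run_episode(action_indices: List[int], target_volume: float) -> List[float]:
    """Apply a sequence of discrete actions and return the per-step volumes."""
    remaining = 1.0
    volumes = []
    last = len(action_indices) - 1
    for t, idx in enumerate(action_indices):
        vol = executed_volume(idx, remaining, target_volume, t == last)
        volumes.append(vol)
        remaining -= vol / target_volume
    return volumes


if __name__ == "__main__":
    print(run_episode([1, 1, 0], target_volume=1000.0))
```

In the example call, the agent picks 25% twice and then 0% at the last step, yet the final step still trades the remaining half so the whole order is completed.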

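How is the reward designed?

First, there is the reward obtained from executing the order.

Here di denotes the direction of the trade (buy or sell).

Second, overly large orders incur market impact cost, so they are penalized. alpha is a hyperparameter that controls the strength of this penalty.

Third, whenever the cash is used up at a time step t, all agents receive a penalty.

The final reward of each agent is the sum of these three terms, and the overall reward is the average over the n agents.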
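As a rough sketch of how the three terms combine, the snippet below adds them per agent and averages over agents. Only the structure (three terms summed, then averaged over n agents, with alpha scaling the impact penalty) comes from the text above. The concrete forms are assumptions of this sketch: the execution term as direction times proportion times relative price advantage, a quadratic impact penalty, a fixed cash penalty `sigma`, and the convention `direction = +1` for sell and `-1` for buy.

```python
from typing import List


def agent_reward(
    direction: int,        # assumed convention: +1 sell, -1 buy
    proportion: float,     # action a_t, proportion of the target order
    exec_price: float,     # price obtained at this step
    ref_price: float,      # hypothetical reference (e.g. average) price
    alpha: float,          # strength of the market impact penalty
    cash_exhausted: bool,  # True if the shared cash ran out at this step
    sigma: float,          # hypothetical size of the cash penalty
) -> float:
    # Term 1: execution reward (assumed form, relative price advantage).
    execution = direction * proportion * (exec_price / ref_price - 1.0)
    # Term 2: market impact penalty, larger for larger trades (assumed quadratic).
    impact_penalty = -alpha * proportion ** 2
    # Term 3: penalty applied when the cash is used up.
    cash_penalty = -sigma if cash_exhausted else 0.0
    return execution + impact_penalty + cash_penalty


def overall_reward(agent_rewards: List[float]) -> float:
    """Overall reward: the average of the n agents' rewards."""
    return sum(agent_rewards) / len(agent_rewards)


if __name__ == "__main__":
    r_sell = agent_reward(+1, 0.5, 101.0, 100.0, alpha=0.1, cash_exhausted=False, sigma=1.0)
    r_buy = agent_reward(-1, 0.5, 99.0, 100.0, alpha=0.1, cash_exhausted=True, sigma=1.0)
    print(overall_reward([r_sell, r_buy]))
```

Because every agent is trained on the shared average, an agent that drains the cash and triggers the penalty lowers the reward for everyone, which pushes the agents toward cooperation.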

Written by Achang
