UAV Trajectory Planning Based on Improved TD3 Algorithm

doi:10.15888/j.cnki.csa.009687


Computer Systems & Applications, 2024, Vol. 33, No. 12, pp. 197-209. DOI:10.15888/j.cnki.csa.009687


CSTR: 32024.14.csa.009687
Authors:
MU Wen-Xin (牟文心)
College of Computer Science, Sichuan University, Chengdu 610065, China
SHI Hong-Wei (时宏伟)
College of Computer Science, Sichuan University, Chengdu 610065, China

                    



Abstract:

Deep reinforcement learning algorithms are increasingly used for UAV trajectory planning, but many studies do not consider complex scenarios that change randomly. To address this problem, this paper proposes PP-CMNTD3, an improved algorithm based on TD3. It introduces a simple and effective prior policy and, drawing on the idea of artificial potential fields, designs a dense reward that better guides the UAV to avoid obstacles and approach the target point quickly. Simulation results show that these improvements raise the training efficiency of the network and the trajectory planning performance in complex scenarios. The algorithm also adjusts its policy flexibly under different initial battery levels, striking an effective balance between energy consumption and rapid arrival at the destination.

Key words: deep reinforcement learning; unmanned aerial vehicle (UAV); trajectory planning; artificial potential field; twin delayed deep deterministic policy gradient (TD3) algorithm
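
The abstract describes a dense reward inspired by artificial potential fields but does not give its formula. The following is a minimal sketch of what such a reward might look like, not the authors' actual design. The function name `apf_dense_reward`, the spherical obstacle model, and the parameters `k_att`, `k_rep`, and `influence_radius` are all assumptions made for illustration.

```python
import math


def apf_dense_reward(pos, prev_pos, goal, obstacles,
                     k_att=1.0, k_rep=0.5, influence_radius=2.0):
    """APF-inspired dense reward (illustrative; not the paper's formula).

    pos, prev_pos, goal: (x, y, z) tuples.
    obstacles: list of (center, radius) spheres (assumed obstacle model).
    k_att, k_rep, influence_radius: hypothetical gains and range.
    """
    # Attractive term: reward the progress made toward the goal this step.
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    reward = k_att * progress

    # Repulsive term: penalize positions inside an obstacle's influence
    # zone, growing as clearance shrinks (classic APF repulsive potential).
    for center, radius in obstacles:
        clearance = max(math.dist(pos, center) - radius, 1e-3)
        if clearance < influence_radius:
            reward -= 0.5 * k_rep * (1.0 / clearance
                                     - 1.0 / influence_radius) ** 2
    return reward


# Hypothetical scene: one spherical obstacle between the UAV and the goal.
goal = (10.0, 0.0, 0.0)
obstacles = [((5.0, 0.0, 0.0), 1.0)]
r_free = apf_dense_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), goal, obstacles)
r_near = apf_dense_reward((3.5, 0.0, 0.0), (2.5, 0.0, 0.0), goal, obstacles)
```

Both steps move one unit closer to the goal, but the step that ends near the obstacle earns a smaller reward. A signal shaped this way is available at every step, which is the usual motivation for dense rewards over a sparse goal-only reward.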


Cite this article

MU Wen-Xin, SHI Hong-Wei. UAV trajectory planning based on improved TD3 algorithm. Computer Systems & Applications, 2024, 33(12): 197-209


History

Received: 2024-04-28
Last revised: 2024-06-17
Published online: 2024-10-31
