Abstract: In high-performance processor development, accurate and fast performance estimation is the basis for design decisions and parameter exploration. Prior work accelerates processor RTL emulation through workload sampling and architectural checkpoints for RTL, which makes it possible to estimate the performance of benchmarks such as SPEC CPU on complex high-performance processors within a few days. An iteration cycle of several days is still too long for architecture exploration, however, and the warm-up phase accounts for a large share of RTL emulation time. The HyWarm framework is proposed to accelerate the warm-up phase of performance estimation. HyWarm uses a microarchitectural simulator to analyze the warm-up demand of each workload and customizes a warm-up scheme per workload. For workloads with high cache warm-up demand, HyWarm performs functional warm-up of the RTL caches through the bus protocol. For full-detail RTL emulation, HyWarm uses CPU clustering and longest-job-first (LJF) scheduling to reduce the maximum completion time. Compared with the best existing sampling-based RTL emulation method, HyWarm shortens the emulation completion time by 53% at an accuracy similar to that of the baseline method.
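The LJF scheduling step named in the abstract can be illustrated with a minimal sketch. It assumes identical clusters and a per-job run-time estimate that is already known; the function name `ljf_schedule`, the `job_hours` mapping, and the example durations are hypothetical and do not come from HyWarm itself.

```python
import heapq


def ljf_schedule(job_hours, num_clusters):
    """Greedy longest-job-first assignment of jobs to identical clusters.

    job_hours: dict mapping a job name to its estimated run time in hours.
    Returns (makespan, assignment), where assignment[i] lists the jobs
    placed on cluster i.
    """
    # Min-heap of (accumulated load, cluster index); the top entry is
    # always the least-loaded cluster.
    heap = [(0.0, i) for i in range(num_clusters)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_clusters)]
    # Visit jobs from the longest to the shortest estimated run time.
    ordered = sorted(job_hours.items(), key=lambda kv: kv[1], reverse=True)
    for name, hours in ordered:
        load, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (load + hours, idx))
    # The maximum completion time is the load of the busiest cluster.
    makespan = max(load for load, _ in heap)
    return makespan, assignment


# Hypothetical run-time estimates in hours (not taken from the paper).
jobs = {"a": 5.0, "b": 3.0, "c": 3.0, "d": 2.0, "e": 1.0}
makespan, plan = ljf_schedule(jobs, 2)
print(makespan, plan)
```

Placing the longest jobs first leaves the short jobs to fill the remaining gaps, which is why a greedy rule of this kind tends to balance cluster loads better than an arbitrary order.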
Table 1 Emulation Speed of Verilator with Different Parallelism on AMD EPYC 7H12 Server with 64 Cores

| Emulation speed/IPS | 4 threads, single task | 4 threads, 16 tasks | Performance loss at full load |
|---|---|---|---|
| Per task | 2153.13 | 1189.31 | |
| Per core | 538.28 | 297.33 | 45% |
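The 45% full-load loss in Table 1 follows from the two per-core figures. The short sketch below reproduces that arithmetic; the helper name `per_core_loss` is hypothetical.

```python
def per_core_loss(ips_low_load, ips_full_load):
    """Fractional drop in per-core throughput between two load levels."""
    return (ips_low_load - ips_full_load) / ips_low_load


# Per-core IPS values from Table 1 (4 threads per task).
loss = per_core_loss(538.28, 297.33)
print(f"{loss:.1%}")
```

The result is roughly 44.8%, which rounds to the 45% listed in the table.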
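Table 2 Comparison of Commonly Used RTL Performance Evaluation Methods

| RTL performance evaluation method | Simulation frequency | Typical price/CNY | Rentable | Typical design capacity |
|---|---|---|---|---|
| RTL software simulator | ≤1 kHz | 50,000 to 100,000 | Yes | Commercial-grade SoC |
| Public-cloud FPGA | ≤100 MHz | 240 to 3600 per day | Yes | BOOM processor |
| Private FPGA | ≤100 MHz | ≤400,000 | No | XiangShan processor |
| Hardware emulation accelerator | ≤1 MHz | >10,000,000 | No | Commercial-grade SoC |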
Table 3 Comparison of Multi-threading Scaling Efficiency of Verilator Emulation When Server Load is Low

| Number of threads | 1 | 4 | 8 | 16 |
|---|---|---|---|---|
| Per-core IPS | 190.82 | 538.28 | 450.94 | 321.27 |
Table 4 Comparison of Multi-threading Scaling Efficiency of Verilator Emulation When Server is Fully Loaded

| Number of threads | 4 | 8 | 16 |
|---|---|---|---|
| Per-core IPS | 297.33 | 389.27 | 335.50 |
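Table 5 Microarchitectural Configuration

| Component | Configuration |
|---|---|
| Branch predictor | 16KB TAGE-SC + ITTAGE + RAS + 4KB BTB |
| L1 data cache | 128KB, 8-way |
| L1 instruction cache | 128KB, 8-way |
| L2 cache | 1MB, 8-way, non-inclusive |
| L3 cache | 6MB, 6-way, non-inclusive |
| L1 instruction TLB | 40 entries |
| L1 data TLB | 136 entries (128 × 4KB pages + 8 × 2MB pages) |
| L2 TLB | 2K entries |
| Fetch width | 8 × 4B instructions per cycle |
| Decode/rename width | 6 instructions per cycle |
| ROB/LQ/SQ | 256/80/64 |
| Physical register file | 192 integer; 192 floating-point |
| Execution units | Int: 4×ALU, 2×MDU, 1×Misc; Mem: 2×Ld AGU, 2×St AGU; Float: 4×FMA, 2×Misc |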
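Table 6 Warm-up Configurations

| Scheme | Functional warm-up (M instructions) | Full-detail warm-up (M instructions) | Performance measurement (M instructions) |
|---|---|---|---|
| 0+100 | 0 | 100 | 5 |
| 0+50 | 0 | 50 | 5 |
| 0+25 | 0 | 25 | 5 |
| 0+10 | 0 | 10 | 5 |
| 0+5 | 0 | 5 | 5 |
| Ada | 100−DW | Adaptive (DW) | 5 |
| FixedFW (95+5) | 95 | 5 | 5 |

Table 7 Comparison of Total Simulation Time for Different Functional Warm-up Schemes (h)

| Sub-item | 0+5 | 0+10 | 0+25 | FixedFW (95+5) | Ada |
|---|---|---|---|---|---|
| GemsFDTD | 0.37 | 0.55 | 1.04 | 0.42 | 0.29 |
| astar.bi | 0.57 | 0.91 | 1.65 | 0.58 | 0.64 |
| astar.ri | 0.69 | 0.95 | 1.97 | 0.66 | 0.79 |
| bwaves | 0.57 | 0.92 | 1.68 | 0.60 | 0.43 |
| bzip2.chi | 0.30 | 0.43 | 0.81 | 0.30 | 0.22 |
| bzip2.com | 1.00 | 1.52 | 2.71 | 1.01 | 0.72 |
| bzip2.htm | 0.30 | 0.43 | 0.92 | 0.34 | 0.31 |
| bzip2.lib | 0.30 | 0.42 | 0.89 | 0.30 | 0.21 |
| bzip2.pro | 1.01 | 1.60 | 3.19 | 0.98 | 0.68 |
| bzip2.sou | 0.95 | 1.49 | 2.92 | 1.08 | 0.96 |
| cactusADM | 0.41 | 0.60 | 1.35 | 0.47 | 0.32 |
| calculix | 0.35 | 0.60 | 1.12 | 0.36 | 0.26 |
| dealII | 0.33 | 0.51 | 1.10 | 0.40 | 1.20 |
| gamess.cy | 0.33 | 0.49 | 1.00 | 0.36 | 3.46 |
| gamess.gra | 0.35 | 0.51 | 1.06 | 0.38 | 1.09 |
| gamess.tri | 0.33 | 0.50 | 0.92 | 0.34 | 1.10 |
| gcc.166 | 0.42 | 0.61 | 1.33 | 0.48 | 1.34 |
| gcc.200 | 0.90 | 1.17 | 2.72 | 0.89 | 0.71 |
| gcc.cpde | 0.54 | 0.86 | 1.63 | 0.62 | 1.75 |
| gcc.expr2 | 0.58 | 0.86 | 1.76 | 0.63 | 1.03 |
| gcc.expr | 0.63 | 0.89 | 1.75 | 0.61 | 0.70 |
| gcc.g23 | 0.55 | 0.76 | 1.54 | 0.66 | 0.43 |
| gcc.s04 | 0.57 | 0.93 | 1.66 | 0.67 | 0.69 |
| gcc.scil | 0.90 | 1.10 | 2.34 | 0.94 | 2.48 |
| gcc.type | 0.92 | 1.44 | 2.62 | 0.91 | 1.57 |
| gobmk.13x | 0.94 | 1.51 | 3.08 | 0.99 | 1.66 |
| gobmk.nn | 0.85 | 1.28 | 2.61 | 0.92 | 0.61 |
| gobmk.sco | 0.97 | 1.34 | 2.70 | 0.98 | 0.66 |
| gobmk.tr | 0.95 | 1.30 | 2.63 | 0.87 | 0.98 |
| gobmk.tr | 0.71 | 1.07 | 2.26 | 0.73 | 1.17 |
| gromacs | 0.72 | 1.00 | 2.25 | 0.72 | 0.48 |
| h264ref.f | 0.44 | 0.58 | 1.21 | 0.47 | 0.45 |
| h264ref.s | 0.38 | 0.50 | 1.04 | 0.38 | 2.23 |
| hmmer.nph | 0.77 | 1.25 | 2.52 | 0.85 | 1.45 |
| hmmer.re | 0.80 | 1.21 | 2.43 | 0.92 | 0.79 |
| lbm | 0.67 | 1.02 | 2.08 | 0.74 | 0.57 |
| leslie3d | 0.51 | 0.78 | 1.43 | 0.51 | 0.35 |
| libquantum | 0.56 | 0.78 | 1.55 | 0.98 | 0.39 |
| mcf | 3.14 | 4.18 | **9.35** | 3.34 | 2.32 |
| milc | 0.42 | 0.59 | 1.26 | 0.46 | 0.34 |
| namd | 0.52 | 0.77 | 1.38 | 0.48 | 0.31 |
| omnetpp | 1.08 | 1.66 | 3.19 | 1.27 | 1.06 |
| perl.che | 0.46 | 0.68 | 1.29 | 0.47 | 0.83 |
| perl.di | 0.55 | 0.83 | 1.37 | 0.52 | 1.56 |
| perl.spli | 0.43 | 0.66 | 1.31 | 0.43 | 0.32 |
| povray | 0.55 | 0.88 | 1.65 | 0.54 | **5.39** |
| sjeng | 0.72 | 1.05 | 2.00 | 0.67 | 2.14 |
| soplex.p | 1.15 | 1.59 | 3.57 | 1.36 | 0.87 |
| soplex.r | 1.11 | 1.70 | 3.05 | 1.14 | 0.71 |
| sphinx3 | 0.46 | 0.72 | 1.33 | 0.59 | 1.49 |
| tonto | 0.37 | 0.55 | 1.19 | 0.41 | 0.48 |
| xalancbmk | 0.89 | 1.42 | 2.56 | 1.17 | 1.03 |
| zeusmp | 0.51 | 0.75 | 1.53 | 0.58 | 0.39 |
| Total | 35.8 | 52.7 | 105.5 | 38.5 | 54.4 |

Note: Bold numbers indicate that mcf is the longest-running sub-item under 25M full-detail warm-up, while povray is the longest-running sub-item under the Ada configuration.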
Table 8 Accuracy Comparison of Different Schemes (%)

| Scheme | CPI | Branch MPKI | L1MP |
|---|---|---|---|
| Ada | 99.6 | 91.6 | 95.1 |
| 0+50 | 99.8 | 98.9 | 97.5 |
| 0+25 | 99.7 | 94.1 | 91.3 |
| 0+10 | 99.1 | 85.2 | 82.8 |
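Table 9 Branch MPKI Prediction Error Caused by WarmProfiler (Increase)

| Sub-item | MPKI with perfect prediction | MPKI increase | MPKI increase percentage/% |
|---|---|---|---|
| gcc_expr2 | 0.443 | **0.177** | 39.9 |
| gcc_g23 | 0.973 | **0.172** | 17.7 |
| tonto | 0.506 | **0.117** | 23.1 |
| gamess_g | 0.430 | **0.112** | 26.1 |
| gcc_scilab | 7.687 | 0.090 | 1.2 |
| xalancbmk | 2.003 | 0.079 | 3.9 |
| gcc_s04 | 0.163 | 0.070 | 42.8 |
| perl_di | 0.669 | 0.066 | 9.8 |
| h264ref_f | 0.042 | 0.064 | 151.9 |
| astar_rivers | 3.422 | 0.053 | 1.6 |

Note: The MPKI error is the MPKI obtained with WarmProfiler-guided warm-up minus the MPKI obtained when warming up according to the true warm-up demand of the RTL. Bold numbers mark the sub-items whose MPKI error exceeds 0.1.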
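The error definition in the note to Table 9 can be written out as a short sketch. The helper names `mpki` and `mpki_increase` are hypothetical, and the profiler-guided value of 0.620 is reconstructed here by adding the tabulated increase (0.177) to the perfect-prediction MPKI (0.443) for gcc_expr2.

```python
def mpki(mispredictions, instructions):
    """Mispredictions per thousand instructions."""
    return mispredictions * 1000.0 / instructions


def mpki_increase(mpki_profiler_guided, mpki_perfect):
    """Absolute and percentage MPKI increase over the perfect-prediction run."""
    delta = mpki_profiler_guided - mpki_perfect
    return delta, 100.0 * delta / mpki_perfect


# gcc_expr2 row of Table 9: perfect-prediction MPKI 0.443, increase 0.177.
delta, pct = mpki_increase(0.620, 0.443)
print(f"increase = {delta:.3f}, relative = {pct:.1f}%")
```

Recomputing from the rounded table entries gives percentages that can differ from the published column in the last digit, presumably because the published values were derived from unrounded measurements.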
Table 10 Impact of Cluster Count on Scheduling Balance

| Scheduling balance | Random scheduling | LJF scheduling |
|---|---|---|
| 4 clusters × 16 cores | 0.93 | 0.99 |
| 8 clusters × 8 cores | 0.76 | 0.98 |
| 16 clusters × 4 cores | 0.54 | 0.63 |
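Table 11 Comparison of Simulation Time Between LJF Scheduling and Random Scheduling

| Simulation | Random scheduling/h | LJF scheduling/h | Improvement/% |
|---|---|---|---|
| Ada, 8 cores × 8 clusters | 8.71 | 6.91 | 20.61 |
| Ada, 8 cores × 16 clusters | 6.25 | 5.38 | 13.89 |
| 25+5, 8 cores × 8 clusters | 15.98 | 13.54 | 15.26 |
| 25+5, 8 cores × 16 clusters | 11.29 | 9.35 | 17.23 |

Note: Ada combined with LJF scheduling is the scheme proposed by HyWarm; 25+5 combined with random scheduling is the baseline scheme.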
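Table 12 Maximum Completion Time of LJF Scheduling Guided by Simulator IPC and Real IPC of RTL (h)

| Ada simulation | Simulator-predicted IPC | Real IPC |
|---|---|---|
| 8 cores × 4 clusters | 13.77 | 13.67 |
| 8 cores × 8 clusters | **6.91** | 6.92 |
| 8 cores × 16 clusters | 5.38 | 5.38 |

Note: The bold number marks that, with 8 clusters, the simulator-predicted IPC yields a shorter completion time. Because LJF is a greedy algorithm, errors in the predicted completion time can happen to produce a better schedule.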
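The completion times in Tables 11 and 12 can be compared against a simple lower bound on the maximum completion time. The sketch below assumes that the per-item hours in Table 7 are the job lengths on one 8-core cluster and that the 25+5 baseline of Table 11 corresponds to the 0+25 column of Table 7; both are readings of the tables rather than statements from the paper, and the function name `makespan_lower_bound` is hypothetical.

```python
def makespan_lower_bound(total_hours, longest_job_hours, num_clusters):
    """Lower bound on the completion time of any schedule on identical clusters.

    No schedule can finish before its longest single job, nor before the
    total work divided evenly over all clusters.
    """
    return max(longest_job_hours, total_hours / num_clusters)


# Totals and longest sub-items taken from Table 7 (hours).
schemes = {
    "Ada": {"total": 54.4, "longest": 5.39},    # longest sub-item: povray
    "0+25": {"total": 105.5, "longest": 9.35},  # longest sub-item: mcf
}

for name, s in schemes.items():
    for clusters in (4, 8, 16):
        bound = makespan_lower_bound(s["total"], s["longest"], clusters)
        print(f"{name}, {clusters} clusters: at least {bound:.2f} h")
```

Under these assumptions the bound for Ada is about 13.60 h, 6.80 h, and 5.39 h for 4, 8, and 16 clusters, close to the 13.77 h, 6.91 h, and 5.38 h reported for LJF in Table 12 (the 0.01 h gap at 16 clusters is presumably rounding or run-to-run variation). At 16 clusters the longest single job sets the bound, so adding more clusters would not be expected to shorten the completion time further.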