Applied Energy

Volume 363, 1 June 2024, 123080

Towards a fossil-free urban transport system: An intelligent cross-type transferable energy management framework based on deep transfer reinforcement learning

https://doi.org/10.1016/j.apenergy.2024.123080

Highlights

  • An enhanced SAC algorithm is formulated by combining standard SAC with PER.
  • A novel DTRL method is designed by integrating enhanced SAC with transfer learning.
  • A transferable energy management framework is proposed based on the DTRL method.
  • PER buffer and SumTree together with DNN parameters are transferred and reused.
  • Both fuel economy and real-time performance are evaluated through online testing.

Abstract

Deep reinforcement learning (DRL) is now a research focus for the energy management of fuel cell vehicles (FCVs) to improve hydrogen utilization efficiency. However, since DRL-based energy management strategies (EMSs) need to be retrained whenever the type of FCV changes, developing DRL-based EMSs for different FCVs is a laborious task. Given this, this article introduces transfer learning (TL) into DRL to design a novel deep transfer reinforcement learning (DTRL) method and then proposes an intelligent transferable energy management framework between two different urban FCVs based on the designed DTRL method, so as to reuse well-trained EMSs. To begin, an enhanced soft actor-critic (SAC) algorithm integrating prioritized experience replay (PER) is formulated as the DRL algorithm studied in this article. Then, an enhanced-SAC-based EMS of a light fuel cell hybrid electric vehicle (FCHEV) is pre-trained using massive real-world driving data. After that, the learned knowledge stored in the FCHEV's well-trained EMS is captured and transferred into the EMS of a heavy-duty fuel cell hybrid electric bus (FCHEB). Finally, the FCHEB's EMS is fine-tuned in a stochastic environment to ensure adaptability to real driving conditions. Simulation results indicate that, compared to the state-of-the-art baseline EMS, the proposed DTRL-based EMS accelerates convergence by 91.55% and improves fuel economy by 6.78%. This article contributes to shortening the development cycle of DRL-based EMSs and improving the utilization efficiency of hydrogen energy in the urban transport sector.

Introduction

Against the backdrop of fossil fuel shortages and global climate change, popularizing new energy vehicles (NEVs) is a promising way to promote the electrification and decarbonization of the road transport sector [1]. Since road transport accounts for at least 70% of the emissions of China's overall transport sector, the development of NEVs can not only ease the energy crisis but also reduce the carbon emissions caused by fossil fuel use, which is essential for China to achieve carbon neutrality by 2060 [2,3].
Among the three typical kinds of NEVs, hydrogen fuel cell vehicles (FCVs) are regarded as the most promising, since they not only refuel faster than battery electric vehicles (BEVs) but also, unlike conventional hybrid electric vehicles (HEVs), produce zero emissions. According to [4], China plans to have 1 million FCVs, including hydrogen-powered commercial vehicles, on the road between 2030 and 2035. Due to the dynamic response lag of fuel cell systems (FCSs), the powertrains of FCVs typically contain more than one power source: FCSs serve as the primary power source, while lithium batteries (LIBs) or ultracapacitors act as auxiliary power sources to assist the FCSs and recover regenerative braking energy [5]. FCVs are therefore typically called fuel cell hybrid electric vehicles (FCHEVs), and energy management strategies (EMSs) are needed to efficiently allocate power among their power sources, so as to improve the utilization efficiency of hydrogen energy and ensure the durability of the hybrid powertrain.
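For concreteness, the core decision such an EMS makes at each instant can be written as a simple DC-bus power balance (a minimal sketch; the symbols below are illustrative and not taken from this paper's model):

$$P_{\mathrm{dem}}(t) = \eta_{\mathrm{DC/DC}}\,P_{\mathrm{fcs}}(t) + P_{\mathrm{bat}}(t),$$

where $P_{\mathrm{dem}}$ is the power demanded by the driving motor, $P_{\mathrm{fcs}}$ is the FCS output chosen by the EMS, $\eta_{\mathrm{DC/DC}}$ is the converter efficiency, and the battery power $P_{\mathrm{bat}}$ covers the residual demand, turning negative during regenerative braking.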
According to their control methodologies, EMSs of FCHEVs can generally be divided into three categories: rule-based EMSs, optimization-based EMSs, and reinforcement learning-based EMSs [5]. Rule-based EMSs are widely deployed in commercial FCHEVs thanks to their simplicity and computational efficiency. However, the intuitive extraction and laborious calibration of control rules by engineers leave rule-based EMSs lacking adaptability and far from optimal [6]. Optimization-based EMSs use specific optimization algorithms to seek globally optimal or near-optimal power allocations for a modeled optimization problem. The most representative globally optimal EMS is dynamic programming (DP), which requires the entire future driving cycle in advance and can therefore only serve as an offline benchmark with huge computational cost [7]. Near-optimal EMSs convert the global optimization problem into an instantaneous or local one suitable for online optimization, such as the equivalent consumption minimization strategy (ECMS) [8] and model predictive control (MPC) [9]. Nevertheless, the tedious calibration of equivalence factors and prediction models makes it difficult to guarantee the optimality and adaptability of ECMS and MPC at all times. For FCHEVs equipped with multi-stack FCSs, decentralized optimization-based EMSs are effective solutions, but both their technical threshold and maintenance costs are rather high [10].
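As an illustration of the instantaneous optimization behind ECMS, consider the standard textbook formulation (not the specific calibration of any work cited here): at each instant the strategy minimizes the equivalent hydrogen consumption

$$\dot m_{\mathrm{eq}}(t) = \dot m_{\mathrm{H_2}}(t) + s\,\frac{P_{\mathrm{bat}}(t)}{Q_{\mathrm{LHV}}},$$

where $\dot m_{\mathrm{H_2}}$ is the instantaneous hydrogen mass flow, $Q_{\mathrm{LHV}}$ is hydrogen's lower heating value, and $s$ is the equivalence factor that converts battery power into a virtual fuel flow; it is exactly this factor whose tedious calibration is criticized above.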
Reinforcement learning (RL) is a key technology of artificial intelligence and has developed rapidly in recent years. It uses an agent to explore near-optimal control policies through continuous trial-and-error interactions with the environment [11]. Moreover, by leveraging deep neural networks (DNNs) to handle multi-dimensional state inputs, deep reinforcement learning (DRL) extends RL with greater efficiency and has been successfully deployed in industry, for example in AlphaGo [12] and autonomous racing [13]. Scholars have successfully explored RL and DRL for the development of HEVs' EMSs, showing that DRL-based EMSs are superior to rule-based and MPC-based EMSs in both optimality and adaptability [14,15].
The Q-learning (QL) algorithm is a milestone of RL and was the first to be utilized for the energy management of FCHEVs. In 2016, Hsu et al. [16] designed the first QL-based EMS for an FCHEV and demonstrated its superior optimization effect through a comparison with a rule-based EMS. After that, several variants of QL were designed to develop EMSs for FCHEVs and effectively enhanced the efficacy of QL, such as dual-reward QL [17], ECMS-QL [18], and decentralized QL [19]. However, since QL is a tabular algorithm that discretizes all input states and output actions into grids, these QL- and variant-QL-based EMSs can only be used for cases with very few states and actions to avoid the "curse of dimensionality". This limits further improvement of optimization performance and thereby motivated research on DRL-based EMSs for FCHEVs. Li et al. [20] first adopted DRL for the energy management of FCHEVs, proposing an EMS based on the deep Q-learning (DQL) algorithm for a fuel cell hybrid electric bus (FCHEB) that improved fuel economy by >10%. Afterward, EMSs based on prioritized experience replay (PER)-DQL [21] and double DQL [22] were proposed for FCHEVs and effectively improved on DQL-based EMSs. Nevertheless, although DQL and its variants can handle continuous input states, their output actions remain discrete, which usually introduces discretization errors. Since energy management problems for FCHEVs are typically continuous, the deep deterministic policy gradient (DDPG) algorithm has since replaced DQL as the state-of-the-art energy management method for FCHEVs [[23], [24], [25], [26]]. DDPG is a policy-based continuous DRL algorithm within the actor-critic framework and significantly improves on DQL-based EMSs in training efficiency, optimization effect, and self-learning ability. Furthermore, two of DDPG's successor algorithms, twin delayed deep deterministic policy gradient (TD3) and soft actor-critic (SAC), have been successfully explored to develop more efficient EMSs for FCHEVs by addressing several of DDPG's inherent defects, namely value overestimation and limited exploration [27,28]. However, despite this significant progress, all DRL-based EMSs suffer from time-consuming offline training through iterative agent-environment interactions. Moreover, the offline training must be conducted all over again even when a new but similar energy management task is encountered.
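For reference, SAC's exploration advantage comes from its maximum-entropy objective; in the standard formulation (notation follows common usage, not necessarily this paper's),

$$J(\pi) = \sum_{t}\mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\big[r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\big],$$

where the temperature $\alpha$ trades expected reward against the policy entropy $\mathcal{H}$, so the agent is rewarded for keeping its power-allocation policy stochastic while it learns.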
Inspired by knowledge transfer in psychology, transfer learning (TL) has emerged as a machine learning technique whereby knowledge gained from solving one problem (the source domain) can be reused for a different but related one (the target domain) [29]. TL relaxes the requirement that training data and test data be independent and identically distributed, thus avoiding retraining from scratch when a new, similar task is encountered. TL has been applied in many fields, such as computer vision [30], natural language processing [31], and even smart buildings [32]. Inspired by this, pioneering scholars have integrated TL with DRL and proposed the deep transfer reinforcement learning (DTRL) method to facilitate the development of cross-type DRL-based EMSs for different HEVs. Lian et al. [33] introduced TL into DDPG-based EMSs for different types of HEVs and achieved knowledge transfer between two HEVs with significantly different structures. He et al. [34] integrated a DDPG-based EMS with TL and multi-state traffic information within a cyber-physical system framework, efficiently accelerating convergence by transferring knowledge between two different power-split HEVs. Besides DTRL-based cross-type EMSs, both Guo et al. [35] and Xu et al. [36] adopted DTRL to transfer and reuse converged DDPG-based EMSs on new driving cycles of HEVs, which not only improved energy efficiency but also enhanced the adaptability of the transferred EMSs.
As this review shows, scholars have made significant progress on DRL-based EMSs for FCHEVs and even DTRL-based EMSs for HEVs, which favors fuel conservation and emission reduction in the transport sector. Nevertheless, relevant research still exhibits the following major deficiencies.
  • (1)
    Even cutting-edge DRL algorithms used for energy management, such as SAC, still have inherent drawbacks, including difficult hyperparameter tuning and unstable convergence, which hinders improvement of energy efficiency. Moreover, studies of the SAC algorithm remain scarce, and only a few papers on SAC-based EMSs for FCHEVs have been published. Therefore, the utility of SAC needs further exploration, and research on SAC-based EMSs remains to be enriched.
  • (2)
    Existing DTRL-based EMSs for NEVs have only been developed for conventional engine-battery HEVs; the transferability and reusability of EMSs for FCHEVs have not yet been studied. Besides, current DTRL methods for energy management are all designed based on DDPG, which is inferior to cutting-edge DRL algorithms such as SAC. Therefore, superior DRL algorithms are needed to design more intelligent DTRL methods for developing transferable EMSs for FCHEVs.
  • (3)
    Current DTRL-based cross-type EMSs transfer only the DNNs' mature parameters to initialize new DNNs, ignoring the transfer of experience replay buffers, which contain abundant learned knowledge. As a result, the replay buffers of new energy management tasks start out empty, which hampers the training efficiency of the transferred EMSs (see the PER/SumTree sketch after this list).
  • (4)
    Most DRL-based EMSs for FCHEVs are trained on standard driving cycles that differ from real driving data. This usually causes unsatisfactory performance, especially poor adaptability to real-world speed profiles. Since DRL-based EMSs are developed for online application, and adaptability is the premise of online application, their adaptability especially needs to be verified.
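Deficiency (3) above involves the PER buffer and its SumTree index. The sketch below illustrates that data structure under the standard proportional-prioritization scheme, where a transition is sampled with probability proportional to $p_i^{\alpha}$ [21]; the class and variable names are illustrative and not taken from the paper's implementation.

```python
import random

class SumTree:
    """Binary sum tree that indexes a PER buffer.

    Leaves hold per-transition priorities; every internal node stores the
    sum of its children, so the root holds the total priority and
    proportional sampling runs in O(log n). Illustrative sketch only.
    """

    def __init__(self, capacity):
        self.capacity = capacity                # max number of transitions
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes + leaves
        self.data = [None] * capacity           # the transitions themselves
        self.write = 0                          # next leaf slot to overwrite

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1   # leaf index within the tree
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                         # propagate the change upward
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, value):
        """Descend from the root to the leaf whose interval covers `value`."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):     # stop once a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

# Fill the tree with priorities p = (|TD error| + eps)^alpha, alpha = 0.6,
# then draw one transition proportionally to its priority.
tree = SumTree(capacity=8)
for step in range(8):
    td_error = random.uniform(-1.0, 1.0)        # placeholder TD error
    tree.add((abs(td_error) + 1e-6) ** 0.6, {"step": step})
leaf, priority, transition = tree.sample(random.uniform(0.0, tree.tree[0]))
print(leaf, priority, transition)
```

Because updating a priority only touches the O(log n) path from one leaf to the root, PER stays practical for the large replay buffers used in DRL-based EMSs, which is why transferring the buffer together with its SumTree preserves the learned sampling priorities.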
To bridge the aforementioned research gaps, this paper integrates DRL with TL and then proposes a DTRL-based transferable energy management framework between two different urban FCVs to reuse the well-trained DRL-based EMSs. Compared to previous research, the main contributions of this paper lie in five aspects as follows.
  • (1)
    An enhanced SAC algorithm, combining standard SAC with the PER mechanism, is innovatively formulated as a more intelligent DRL method to both accelerate the convergence and improve the learning ability of SAC.
  • (2)
    A novel DTRL method is designed by integrating the enhanced SAC algorithm with TL, and then a cross-type transferable energy management framework is proposed based on the designed DTRL method to shorten the development cycle of DRL-based EMSs for different types of urban FCVs.
  • (3)
    In contrast to previous research that transfers only the mature parameters of DNNs, this paper transfers not only the DNN parameters but also the PER buffer and the SumTree, so as to fully reuse the learned knowledge (a transfer sketch follows this list).
  • (4)
    Both the source domain and the target domain of the proposed transferable energy management framework are trained in stochastic environments using massive driving data collected in the real world, so as to obtain a robust representation model in the source domain and ensure the adaptability of the compensation model in the target domain.
  • (5)
    The adaptability of the proposed DTRL-based EMS is specifically verified on a synthetic driving cycle through online testing. The results indicate that the proposed EMS achieves 96.81% of the globally optimal fuel economy with impressive real-time performance for online application.
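To make contribution (3) concrete, the following PyTorch sketch shows what transferring both the DNN parameters and the PER buffer (with its SumTree) between two agents could look like. The Agent container, network shapes, and the premise that the source and target EMSs share state/action definitions (so that the state_dicts are compatible) are assumptions for illustration, not the paper's actual code.

```python
import copy
from dataclasses import dataclass

import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Generic two-hidden-layer network used for both actor and critics.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

@dataclass
class Agent:
    actor: nn.Module      # Gaussian policy head (mean and log-std per action)
    critic1: nn.Module    # first soft Q-network
    critic2: nn.Module    # second soft Q-network
    per_buffer: object    # PER buffer built on the SumTree sketched earlier

def transfer_knowledge(source: Agent, target: Agent) -> None:
    """Initialize the target (FCHEB) EMS from the source (FCHEV) EMS."""
    # 1) Reuse the mature DNN parameters of the actor and both critics.
    target.actor.load_state_dict(source.actor.state_dict())
    target.critic1.load_state_dict(source.critic1.state_dict())
    target.critic2.load_state_dict(source.critic2.state_dict())
    # 2) Reuse the learned experiences: copy the PER buffer together with
    #    its SumTree so the stored priorities remain consistent, giving the
    #    target task a populated replay memory instead of an empty one.
    target.per_buffer = copy.deepcopy(source.per_buffer)

# Hypothetical dimensions: 3 state variables, 1 continuous action.
state_dim, action_dim = 3, 1
src = Agent(mlp(state_dim, 2 * action_dim),
            mlp(state_dim + action_dim, 1),
            mlp(state_dim + action_dim, 1), per_buffer={})
tgt = Agent(mlp(state_dim, 2 * action_dim),
            mlp(state_dim + action_dim, 1),
            mlp(state_dim + action_dim, 1), per_buffer={})
transfer_knowledge(src, tgt)
```

After such an initialization, the target EMS is fine-tuned in the stochastic FCHEB environment rather than trained from scratch, which is the source of the convergence speed-up reported in the abstract.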
To the best of our knowledge, this paper is pioneering work in designing an enhanced SAC algorithm for the development of a DTRL method. More importantly, it is also the first attempt to develop a DTRL-based transferable EMS across different types of FCVs.
The remainder of this paper is organized as follows. Section 2 models two different FCVs including a light FCHEV and a heavy-duty FCHEB. Section 3 formulates the designed enhanced SAC algorithm and presents the proposed DTRL-based transferable energy management framework. Section 4 conducts detailed comparative experimental simulations, and major conclusions are summarized in Section 5.

Section snippets

Configuration and parameters

The FCHEV and FCHEB studied in this paper share the same configuration but different parameters. The powertrain topology is shown in Fig. 1, which is a hybrid powertrain system consisting of a proton exchange membrane FCS, a LIB pack, a DC/DC converter, a DC/AC inverter, a driving motor, as well as a final drive. The FCS is the main power source and is connected to the DC-bus through a unidirectional DC/DC converter which regulates the FCS's output power and thereby protects the longevity of

Preliminaries of transfer learning

TL is a machine learning technique that reuses common knowledge gained from solving the problem in the source domain to solve a different but related new problem in the target domain [29]. The source domain represents the training task that needs to be pre-trained for the transfer, and the well-pre-trained model is named the representation model containing all learned knowledge that awaits to be transferred. The target domain is the new training task that reuses the knowledge transferred from

Experimental setup for verification

The experimental simulations are designed to verify the superiority of the proposed DTRL-based transferable energy management framework from three aspects including pre-training, fine-tuning, and online testing. The driving cycle datasets in stochastic environments for pre-training and fine-tuning are described in Section 2.2 and Section 3.3. The test driving cycle for online testing of the FCHEB's fine-tuned EMS is shown in Fig. 11, which is a synthetic full-sample cycle that is reconstructed

Conclusion

This paper designs a novel DTRL method by integrating DRL with TL and then proposes a DTRL-based cross-type transferable energy management framework between a light FCHEV and a heavy-duty FCHEB to shorten the development cycle of DRL-based EMSs and improve the utilization efficiency of hydrogen energy. The main conclusions are summarized as follows.
  • (1)
    An enhanced SAC algorithm is formulated by combining the standard SAC algorithm with the PER mechanism, which accelerates the convergence speed by

CRediT authorship contribution statement

Ruchen Huang: Writing – original draft, Visualization, Validation, Software, Methodology, Formal analysis, Data curation, Funding acquisition. Hongwen He: Writing – review & editing, Supervision, Conceptualization, Funding acquisition. Qicong Su: Software, Investigation, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant No. 52172377), the BIT Research and Innovation Promoting Project (Grant No. 2022YCXZ018), and the China Scholarship Council Project (Grant No. 202306030016).

References (50)

  • X. Tang et al. Longevity-conscious energy management strategy of fuel cell hybrid electric vehicle based on deep reinforcement learning. Energy (2022).
  • C. Zheng et al. Reinforcement learning-based energy management strategies of fuel cell hybrid vehicles with multi-objective control. J Power Sources (2022).
  • W. Huo et al. Lifespan-consciousness and minimum-consumption coupled energy management strategy for fuel cell hybrid vehicles via deep reinforcement learning. Int J Hydrogen Energy (2022).
  • J. Zhou et al. Total travel costs minimization strategy of a dual-stack fuel cell logistics truck enhanced with artificial potential field and deep reinforcement learning. Energy (2022).
  • R. Huang et al. A novel data-driven energy management strategy for fuel cell hybrid electric bus based on improved twin delayed deep deterministic policy gradient algorithm. Int J Hydrogen Energy (2024).
  • L. Deng et al. Battery thermal- and cabin comfort-aware collaborative energy management for plug-in fuel cell electric vehicles based on the soft actor-critic algorithm. Energ Conver Manage (2023).
  • L.T. Duong et al. Detection of tuberculosis from chest X-ray images: boosting the performance with vision transformer and transfer learning. Exp Syst Appl (2021).
  • D. Coraci et al. Online transfer learning strategy for enhancing the scalability and deployment of deep reinforcement learning control in smart buildings. Appl Energy (2023).
  • J. Xu et al. A transferable energy management strategy for hybrid electric vehicles via dueling deep deterministic policy gradient. Green Energy Intellig Transp (2022).
  • M. Huang et al. Research on hybrid ratio of fuel cell hybrid vehicle based on ADVISOR. Int J Hydrogen Energy (2016).
  • Y. Zhou et al. Real-time cost-minimization power-allocating strategy via model predictive control for fuel cell hybrid electric vehicles. Energ Conver Manage (2021).
  • M. Yan et al. Hierarchical predictive energy management of fuel cell buses with launch control integrating traffic information. Energ Conver Manage (2022).
  • Q. Zhou et al. Transferable representation modelling for real-time energy management of the plug-in hybrid vehicle based on k-fold fuzzy learning and Gaussian process regression. Appl Energy (2022).
  • M. Li et al. Hierarchical predictive energy management of hybrid electric buses based on driver information. J Clean Prod (2020).
  • R. Huang et al. Battery health-aware and naturalistic data-driven energy management for hybrid electric bus based on TD3 deep reinforcement learning algorithm. Appl Energy (2022).