<h2 class="wp-block-heading">Technical Description</h2>
We use reinforcement learning (RL) algorithms, such as Q-learning, to dynamically price perishable goods over a finite planning horizon with limited supply. We also develop algorithms that keep the RL agent safe during its exploration phase. The safety mechanism prevents the agent from taking unsafe or fatal actions without compromising its convergence to the optimal policy.
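As a rough illustration of the idea, the sketch below runs tabular Q-learning over a finite horizon with limited stock and restricts exploration to a set of allowed prices. Everything specific in it is an assumption made for the example and not part of the project: the price grid, the sale probabilities, the horizon and stock sizes, the price floor used as the safety rule, and the function names. Masking unsafe actions is only one simple way to keep exploration safe, and under it the agent can at best learn the best policy among the allowed prices.

```python
import random

PRICES = [4.0, 6.0, 8.0, 10.0]      # hypothetical price grid
SALE_PROB = [0.9, 0.7, 0.4, 0.2]    # assumed chance of one sale per period at each price
HORIZON, STOCK = 10, 5              # assumed number of periods and initial inventory
PRICE_FLOOR = 6.0                   # assumed safety rule: never price below this


def safe_actions():
    """Return the indices of the prices that the safety rule allows."""
    return [a for a, p in enumerate(PRICES) if p >= PRICE_FLOOR]


def step(stock, action, rng):
    """Simulate one period: at most one unit sells, with the assumed probability."""
    sold = 1 if stock > 0 and rng.random() < SALE_PROB[action] else 0
    return stock - sold, PRICES[action] * sold


def train(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning that only ever tries allowed prices."""
    rng = random.Random(seed)
    allowed = safe_actions()
    # q[t][stock][action]; the extra row at t = HORIZON stays zero (end of horizon).
    q = [[[0.0] * len(PRICES) for _ in range(STOCK + 1)]
         for _ in range(HORIZON + 1)]
    for _ in range(episodes):
        stock = STOCK
        for t in range(HORIZON):
            if rng.random() < epsilon:
                action = rng.choice(allowed)      # explore, but only safe prices
            else:
                action = max(allowed, key=lambda a: q[t][stock][a])
            next_stock, reward = step(stock, action, rng)
            # Undiscounted target: revenue now plus best safe value next period.
            target = reward + max(q[t + 1][next_stock][a] for a in allowed)
            q[t][stock][action] += alpha * (target - q[t][stock][action])
            stock = next_stock
    return q


def greedy_price(q, t, stock):
    """Price the learned table recommends at period t with the given stock."""
    best = max(safe_actions(), key=lambda a: q[t][stock][a])
    return PRICES[best]
```

Calling `q = train()` and then `greedy_price(q, 0, STOCK)` gives the recommended opening price. Because the below-floor price is never selected, its Q-values are never updated, which makes it easy to check that exploration stayed inside the allowed set.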
<h2 class="wp-block-heading has-black-color has-text-color has-medium-font-size">Outputs</h2>
