Sepsis is a leading cause of mortality in hospitals, yet its optimal treatment strategy remains unclear. Recent years have seen several successful applications of Reinforcement Learning (RL) to sepsis treatment, yielding far more efficient strategies than those of clinicians. Such applications require an explicit reward function, encoding medical domain knowledge, to be specified beforehand to indicate the goal of learning. However, because sepsis itself is still poorly understood, the reward functions formulated for sepsis treatment remain considerably inconsistent. In this poster, we address the reward learning problem in RL for sepsis treatment, which has been largely neglected by previous studies. We propose a deep inverse RL with Mini-Tree (DIRL-MT) model to infer the best reward functions from a set of presumably optimal treatment trajectories drawn from retrospective real medical data. In the model, the MT component learns the factors that most strongly influence mortality during sepsis treatment, while the DIRL component infers the complete reward function as weights over those factors. Our work shows that PaO2 and PT can play a vital role and deserve more attention in the design of more efficient treatment strategies for sepsis.
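To make the idea of a reward function "in terms of weights of those factors" concrete, the following is a minimal sketch that assumes a simple linear form, which the abstract does not confirm. The function name `linear_reward`, the weight values, and the normalized factor values are all hypothetical and chosen only for illustration; they are not results from the study.

```python
# Minimal sketch: a reward expressed as a weighted sum of selected factors.
# ASSUMPTION: the linear form and all numbers below are illustrative only.

def linear_reward(features, weights):
    """Return the weighted sum of factor values for one patient state."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical weights over the two factors named in the abstract.
weights = {"PaO2": 0.6, "PT": -0.4}

# Hypothetical normalized factor values for a single state.
state_features = {"PaO2": 0.5, "PT": 0.25}

reward = linear_reward(state_features, weights)
print(reward)
```

In this reading, the MT component would determine which keys appear in `weights`, and the DIRL component would determine their values from the observed treatment trajectories.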