By integrating media elements of different modalities, multimedia can express complex information in a compact way. Early studies have linked different sensory presentations in multimedia to the perception of human-like concepts. Yet the richness of information in multimedia makes understanding and predicting user perceptions of its content a challenging task for both machines and the human mind. This paper presents a novel multi-task feature extraction method for accurately predicting user perceptions of multimedia content. Unlike conventional feature extraction algorithms, which focus on perfecting a single task, the proposed model recognizes the commonality between different perceptions (e.g., interestingness and emotional impact) and jointly optimizes the performance of all tasks through the uncovered commonality features. Trained on a media interestingness dataset and a media emotion dataset, the proposed model simultaneously characterizes the individuality of each task and captures the commonalities shared across tasks, and it achieves better prediction accuracy than competing algorithms on the real-world datasets of two related tasks: the MediaEval 2017 Predicting Media Interestingness Task and the MediaEval 2017 Emotional Impact of Movies Task.
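To make the idea of joint optimization through shared commonality features concrete, the following is a minimal illustrative sketch, not the paper's actual model: a toy linear multi-task learner in which two perception tasks (stand-ins for interestingness and emotional impact) share a common feature projection while keeping task-specific heads. All data, dimensions, and learning rates here are hypothetical.

```python
import numpy as np

# Hypothetical sketch of multi-task feature extraction: both tasks'
# gradients update one shared ("commonality") projection W, while each
# task keeps its own head (h1, h2) capturing its individuality.
rng = np.random.default_rng(0)

# Synthetic media features and two related perception labels
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
y_interest = X @ w_true + 0.1 * rng.normal(size=200)         # task 1
y_emotion = 0.8 * (X @ w_true) + 0.1 * rng.normal(size=200)  # task 2 (shared structure)

W = 0.1 * rng.normal(size=(16, 4))   # shared commonality projection
h1 = 0.1 * rng.normal(size=4)        # task-specific head: interestingness
h2 = 0.1 * rng.normal(size=4)        # task-specific head: emotional impact
lr = 0.01

for _ in range(500):
    Z = X @ W                         # shared features for both tasks
    e1 = Z @ h1 - y_interest
    e2 = Z @ h2 - y_emotion
    # Joint loss: errors from BOTH tasks shape the shared projection
    gW = X.T @ (np.outer(e1, h1) + np.outer(e2, h2)) / len(X)
    g1 = Z.T @ e1 / len(X)
    g2 = Z.T @ e2 / len(X)
    W -= lr * gW
    h1 -= lr * g1
    h2 -= lr * g2

mse1 = np.mean((X @ W @ h1 - y_interest) ** 2)
mse2 = np.mean((X @ W @ h2 - y_emotion) ** 2)
print(mse1, mse2)
```

Because the two labels share underlying structure, updating `W` with both tasks' gradients lets each task benefit from the other's supervision, which is the intuition behind jointly optimizing related perception tasks.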