Image-set-based face recognition has recently attracted much attention due to the widespread use of surveillance and video retrieval applications. Partial and misaligned face images are commonly extracted from video in unconstrained scenarios and in the presence of detection/localization errors, respectively. However, existing face recognition techniques that rely on a holistic image-set representation do not perform well under such conditions. In this paper, we introduce a local image-set-based face recognition approach to address this issue, in which each image set is represented by a set of clusters of keypoint descriptors, and the similarity between two image sets is measured by the distance between the corresponding sets of clusters. Our representation is robust to misalignment because descriptors are extracted independently of the absolute face position. Additionally, our approach is robust to partial face occlusion because (1) descriptors corresponding to non-occluded keypoints are not affected by the occluded keypoints, and (2) the matching decision depends only on the distances between matched cluster pairs, which correspond to the non-occluded facial parts. Extensive experimental evaluation shows that our approach achieves promising recognition rates.
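To make the matching idea concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes that keypoint descriptors have already been extracted and pooled over each image set. The greedy clustering step, the mutual-nearest-neighbour matching rule, the function names `leader_cluster` and `set_distance`, and the parameters `radius` and `max_pair_dist` are all illustrative assumptions.

```python
import math


def leader_cluster(descriptors, radius):
    """Greedy clustering (an assumed stand-in for the clustering step).

    Each descriptor joins the first cluster whose current centroid lies
    within `radius`; otherwise it starts a new cluster.
    Returns the list of cluster centroids.
    """
    sums, counts = [], []
    for d in descriptors:
        for i, (s, n) in enumerate(zip(sums, counts)):
            centroid = [v / n for v in s]
            if math.dist(centroid, d) <= radius:
                sums[i] = [a + b for a, b in zip(s, d)]
                counts[i] = n + 1
                break
        else:
            # No existing cluster is close enough: start a new one.
            sums.append(list(d))
            counts.append(1)
    return [[v / n for v in s] for s, n in zip(sums, counts)]


def set_distance(clusters_a, clusters_b, max_pair_dist):
    """Mean distance over matched cluster pairs only.

    A pair is matched when the two centroids are mutual nearest
    neighbours and closer than `max_pair_dist` (assumed rule).
    Unmatched clusters, e.g. from occluded regions, are ignored.
    Returns math.inf when no pair is matched.
    """
    if not clusters_a or not clusters_b:
        return math.inf
    nearest_ab = [
        min(range(len(clusters_b)), key=lambda j: math.dist(a, clusters_b[j]))
        for a in clusters_a
    ]
    nearest_ba = [
        min(range(len(clusters_a)), key=lambda i: math.dist(b, clusters_a[i]))
        for b in clusters_b
    ]
    matched = []
    for i, j in enumerate(nearest_ab):
        d = math.dist(clusters_a[i], clusters_b[j])
        if nearest_ba[j] == i and d <= max_pair_dist:
            matched.append(d)
    return sum(matched) / len(matched) if matched else math.inf


# Toy 2-D "descriptors"; real keypoint descriptors are high-dimensional.
gallery = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
probe = [[0.0, 0.1], [9.0, -9.0]]  # second descriptor mimics an occluder

gallery_clusters = leader_cluster(gallery, radius=1.0)
probe_clusters = leader_cluster(probe, radius=1.0)
print(set_distance(gallery_clusters, probe_clusters, max_pair_dist=1.0))
```

In this toy example the probe cluster produced by the occluder has no mutual nearest neighbour in the gallery, so it does not contribute to the distance; only the one matched pair does. The descriptors carry no image coordinates, which mirrors the claim that matching does not depend on the absolute face position.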