This paper addresses the scalability and robustness issues of estimating labels from imbalanced unlabeled data for unsupervised video-based person re-identification (re-ID). To achieve it, we propose a novel Robust AnChor Embedding (RACE) framework via deep feature representation learning for large-scale unsupervised video re-ID. Within this framework, anchor sequences representing different persons are firstly selected to formulate an anchor graph which also initializes the CNN model to get discriminative feature representations for later label estimation. To accurately estimate labels from unlabeled sequences with noisy frames, robust anchor embedding is introduced based on the regularized affine hull. Efficiency is ensured with kNN anchors embedding instead of the whole anchor set under manifold assumptions. After that, a robust and efficient top-k counts label prediction strategy is proposed to predict the labels of unlabeled image sequences. With the newly estimated labeled sequences, the unified anchor embedding framework enables the feature learning process to be further facilitated. Extensive experimental results on the large-scale dataset show that the proposed method outperforms existing unsupervised video re-ID methods.