The past few years have witnessed significant growth in the computational capabilities of GPUs. The race for computing performance makes the uses of many-core accelerators more necessary. However, GPUs consume a significant amount of energy as compared with CPUs. One way to reduce the energy consumption is to scale the speed and/or voltage of the processor. Typically, the faster the processor runs, the faster we finish jobs, but the more power is required by the processor. It is hence important to balance between performance and power consumption. In this paper, we consider the following scheduling problem. We have a set of jobs to be assigned to different processors. Each job may have different characteristics depending on the type of processor that it is assigned to. The goal is to minimize the total energy consumption. After proving the NP-hardness of this problem, we propose a constant approximation algorithm for the case when processors can scale to any continuous speed. When processors have a set of discrete speeds, we propose a heuristic algorithm and compare with some classical scheduling algorithms experimentally. Then, we extend this heuristic to the online case where jobs arrive over time. Our simulation results show that the proposed heuristic algorithms are effective and can achieve near-optimal performance.