Neural surface reconstruction aims to reconstruct accurate 3D surfaces based on multi-view images. Previous methods based on neural volume rendering mostly train a fully implicit model, and they require hours of training for a single scene. Recent efforts explore the explicit volumetric representation, which substantially accelerates the optimization process by memorizing significant information in learnable voxel grids. However, these voxel-based methods often struggle in reconstructing fine-grained geometry. Through empirical studies, we found that high-quality surface reconstruction hinges on two key factors: the capability of constructing a coherent shape and the precise modeling of color-geometry dependency. In particular, the latter is the key to the accurate reconstruction of fine details. Inspired by these findings, we develop Voxurf, a voxel-based approach for efficient and accurate neural surface reconstruction, which consists of two stages: 1) leverage a learnable feature grid to construct the color field and obtain a coherent coarse shape, and 2) refine detailed geometry with a dual color network that captures precise color-geometry dependency. We further introduce a hierarchical geometry feature to enable information sharing across voxels. Our experiments show that Voxurf achieves high efficiency and high quality at the same time. On the DTU benchmark, Voxurf achieves higher reconstruction quality compared to state-of-the-art methods, with 20x speedup in training.