Hmm, both of the suggested approaches work well for picking triangles, but may not be quite adequate for vertices.
Since vertices have no area at all, picking them boils down to checking whether the mouse is in the vicinity of their 2D projections.
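To make the "vicinity" test concrete, here is a minimal sketch in Python. It assumes a simple pinhole camera at the origin looking down -Z; the names `project_to_screen` and `is_near_mouse`, the focal length and the viewport size are all made up for illustration. In a real engine you would use the camera's own projection function instead.

```python
import math

def project_to_screen(point, focal_px, width, height):
    """Pinhole projection for a camera at the origin looking down -Z.

    Returns pixel coordinates (x, y) with y growing downward,
    or None if the point is not in front of the camera.
    """
    x, y, z = point
    if z >= 0.0:
        return None
    depth = -z
    sx = width / 2 + focal_px * x / depth
    sy = height / 2 - focal_px * y / depth
    return (sx, sy)

def is_near_mouse(point, mouse, radius_px, focal_px=500.0, width=800, height=600):
    """True if the vertex projects to within radius_px pixels of the mouse."""
    screen = project_to_screen(point, focal_px, width, height)
    if screen is None:
        return False
    return math.hypot(screen[0] - mouse[0], screen[1] - mouse[1]) <= radius_px
```

The pick radius is in pixels, so the clickable area stays the same size on screen no matter how far away the vertex is.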
The ray cast approach that Calinou suggested may produce erratic misses when picking vertices that lie on the mesh contour. If the user clicks near enough to a vertex but still outside the mesh, the ray hit test fails. In some cases (e.g. spikes) the clickable area becomes frustratingly small.
With the second approach (drawing vertices as differently colored circles and then reading the pixel under the mouse from the buffer), handling non-convex meshes becomes a problem. The depth buffer can't really help us here, so we need to manually determine which vertices are obscured and skip rendering them. This could be problematic performance-wise, since we need to iterate through all vertices and discriminate based on their visibility. That is a lot of ray casting per frame, and on top of that we have no other choice but to render in some sort of immediate mode.
So after some pondering I came up with this:
For each vertex, project it into screen space and check whether the mouse is near enough for it to be picked. If so, check whether it's visible by casting a ray from the camera to that vertex. If the ray doesn't hit the mesh before reaching the vertex, the vertex is visible and we add it to the list of hit vertices. If we end up with more than one vertex, pick the one nearest to the camera.
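The steps above could look roughly like this in Python. This is only a sketch under a few assumptions: `project` is a hypothetical callback standing in for the engine's world-to-screen function (returning pixel coordinates, or None for points behind the camera), and the visibility check uses a hand-written Möller-Trumbore ray/triangle test in place of the engine's ray cast. All function names here are mine.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: distance along the ray to the hit, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def is_visible(camera, index, vertices, triangles, margin=1e-4):
    """True if no triangle blocks the ray from the camera to the vertex."""
    to_vertex = sub(vertices[index], camera)
    dist = math.sqrt(dot(to_vertex, to_vertex))
    if dist == 0.0:
        return True
    direction = tuple(c / dist for c in to_vertex)
    for tri in triangles:
        if index in tri:
            continue             # a triangle cannot hide its own corner
        corners = tuple(vertices[i] for i in tri)
        t = ray_hits_triangle(camera, direction, corners)
        if t is not None and t < dist - margin:
            return False
    return True

def pick_vertex(mouse, camera, vertices, triangles, project, radius_px=8.0):
    """Index of the nearest visible vertex within radius_px of the mouse, or None."""
    hits = []
    for i, vertex in enumerate(vertices):
        screen = project(vertex)
        if screen is None:
            continue
        # Cheap 2D test first, so only a few vertices reach the ray cast.
        if math.hypot(screen[0] - mouse[0], screen[1] - mouse[1]) > radius_px:
            continue
        if is_visible(camera, i, vertices, triangles):
            offset = sub(vertex, camera)
            hits.append((dot(offset, offset), i))
    return min(hits)[1] if hits else None
```

`triangles` is a list of index triples into `vertices`. The cheap screen-space test runs for every vertex, while the expensive visibility test only runs for the handful that pass it. Inside an engine you would replace `is_visible` with the engine's own ray cast against the mesh.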
It's still a lot of iteration per frame (depending on vertex count), but the costly visibility checks are kept to a minimum. Doing it in C probably wouldn't cause much of a performance hit.
In the end it boils down to the general picking problem, but the engine can at least handle the visibility checks.