Here is an image explaining several aspects of a CG pinhole camera and how the physical lenses are simulated to achieve the depth of field effect.
If you look at the image above you can see a small camera symbol on the left side. It is basically a pyramid with its apex at the center of the green circle and its base slightly to the right of it. An additional triangle indicates the up-vector of the camera.
In the real world we would have a small hole in a box where the center of the green circle on the left is. The rest of the box would lie to the left of that green circle; we would fix some film material opposite the pinhole and expose it to the light falling through that small opening into the box.
Independent of what names people use to describe a pinhole camera in computer graphics (CG), basically all we need to know is the aspect ratio of the image (which in the simplest case we can derive from the number of pixels in x- and y-direction) and the field of view (FOV) angle (either horizontal FOV or vertical FOV). Even though the real film material would be located in the box left of the green circle, we can misuse the cyan rectangle as the area where we would calculate the resulting image for our camera.
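To make that concrete, here is a minimal sketch of how a ray direction for a pixel could be derived from nothing but the image resolution and a horizontal FOV. The function name and the conventions (camera at the origin, looking down -z, +y up, pixel (0, 0) at the top left) are my assumptions for illustration; your renderer may use different ones.

```python
import math


def pinhole_ray_direction(px, py, width, height, hfov_deg):
    """Return the normalized direction of the ray through pixel (px, py).

    Assumed conventions (not fixed by the text): camera at the origin,
    looking down -z, +x to the right, +y up, pixel (0, 0) at the top left.
    """
    aspect = width / height
    # Half-extents of the image rectangle at distance 1 from the pinhole.
    half_w = math.tan(math.radians(hfov_deg) / 2.0)
    half_h = half_w / aspect
    # Map the pixel center to the range [-1, 1] in both directions.
    ndc_x = (px + 0.5) / width * 2.0 - 1.0
    ndc_y = 1.0 - (py + 0.5) / height * 2.0
    x, y, z = ndc_x * half_w, ndc_y * half_h, -1.0
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)
```

With an odd resolution such as 3 x 3, the center pixel (1, 1) yields a ray straight along the viewing axis.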
If we calculate our image by shooting rays into a virtual scene from a pinhole camera, we miss one important aspect of real photography: depth of field, where certain parts of the image appear to be in focus and other parts appear blurry or out of focus.
So let me explain how this can be simulated in CG. Instead of shooting one or several rays per pixel from the origin of the camera, we first define a focal plane. Let's misuse the cyan rectangle again for that purpose, even though the focal plane is different from the plane on which our image is calculated. This is part of the confusion, because some people use the aperture and focal length to define the field of view angle. The focal plane lies at a certain distance from the camera origin along the axis the camera is looking along. That distance is different from the focal length or the distance to the image plane. Use whatever name you are familiar with, but I will refer to it as the focal plane.
Let's look at the image again. I want to shoot a ray into the scene from the origin of the camera (in the center of the green circle on the left side) to the center pixel on that cyan area (which is used for both the image and the focal plane in this screenshot). You see a green and a red line crossing at the camera location. The ray I want to shoot goes along the red line, hits the cyan plane just below the center of the magenta circle, and continues through that plane and the center of the lower green circle on the right. For the depth of field effect we now calculate several origins for other rays which all hit the focal plane at exactly the same point (the center of the cyan plane, just below the center of the magenta circle). The results of all those rays are averaged and stored at the center of our image plane.
Now consider an object close to our focal plane. If the point we hit on that object is exactly on the focal plane, all our rays hit exactly the same geometry and see the same normal, but with slightly different incident directions. The lighting calculations will therefore be more or less the same, and the average will not differ much from the individual result of each ray. Our impression is that points close to the focal plane are in focus. In contrast, the rays hitting objects in front of or behind the focal plane hit different points in space, whether those points belong to the same object or not. They potentially hit different objects, very likely find different surface normals at those points, and use different textures and/or uv-coordinates, so the lighting calculations for the individual points might be very different. Of course we can store only an average of all the rays we shoot, but the perception will be that those objects are out of focus.
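The procedure just described could be sketched roughly like this. All names here (focus_point, lens_rays, shade_pixel) and the trace callback are hypothetical, and I assume the camera sits at the origin looking down -z with the lens disk lying in the z = 0 plane. Treat it as an illustration of the idea, not as the code of any particular lens shader.

```python
import math
import random


def focus_point(direction, focal_distance):
    """Point where a pinhole ray from the origin crosses the focal plane.

    Assumed convention: the camera looks down -z, so the focal plane is
    the plane z = -focal_distance.
    """
    t = focal_distance / -direction[2]
    return tuple(t * c for c in direction)


def lens_rays(target, lens_radius, n_rays, rng=None):
    """Return n_rays (origin, direction) pairs that all pass through target."""
    if rng is None:
        rng = random.Random(0)
    rays = []
    for _ in range(n_rays):
        # Taking the square root makes the samples uniform over the disk area.
        r = lens_radius * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        origin = (r * math.cos(phi), r * math.sin(phi), 0.0)
        d = tuple(t - o for t, o in zip(target, origin))
        length = math.sqrt(sum(c * c for c in d))
        rays.append((origin, tuple(c / length for c in d)))
    return rays


def shade_pixel(direction, focal_distance, lens_radius, n_rays, trace):
    """Average the colors returned by trace(origin, direction) over all lens rays."""
    target = focus_point(direction, focal_distance)
    rays = lens_rays(target, lens_radius, n_rays)
    colors = [trace(o, d) for o, d in rays]
    return tuple(sum(channel) / len(colors) for channel in zip(*colors))
```

Here trace stands for whatever function your renderer uses to compute the color seen along a ray. A lens_radius of 0 collapses every origin onto the camera origin and gives back the plain pinhole image.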
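A small numerical experiment illustrates this. The sketch below (function name and conventions are my own assumptions: center pixel, camera looking down -z, lens disk in the z = 0 plane) measures how far the lens rays spread apart on a flat object placed at a given distance from the camera.

```python
import math
import random


def hit_spread(lens_radius, focal_distance, object_distance, n_rays=64, seed=0):
    """Largest distance between a lens ray's hit point and the central ray's
    hit point on a plane at object_distance, for the center pixel."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_rays):
        r = lens_radius * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        ox, oy = r * math.cos(phi), r * math.sin(phi)
        # The ray runs from (ox, oy, 0) through (0, 0, -focal_distance).
        # t = 1 reaches the focal plane, t = object_distance / focal_distance
        # reaches the object plane.
        t = object_distance / focal_distance
        hx, hy = ox * (1.0 - t), oy * (1.0 - t)
        worst = max(worst, math.hypot(hx, hy))
    return worst
```

For an object exactly at the focal distance the spread is zero, so all rays shade the same point. The further the object moves away from the focal plane in either direction, the larger the spread becomes, and the more different the averaged samples can be.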
So this basically describes how a lens is simulated with a pinhole camera and a lens shader averaging several rays shot into the scene for a single pixel in our resulting image.
So now let's predict which areas of our resulting image might be influenced by a single point in space.
As described before, the first rays shot for the center of our image (and focal) plane cover only a single pixel there, but the green circle around our camera origin (left side) is mirrored to the lower green circle on the right side of the focal plane. Strictly speaking we should talk about cones (left and right of the focal plane) which cover the area of potential object hits. Let's examine the center of the blue circle on the right side. It is part of the cones for the center pixel, but it is also part of the cones for other pixels, for example for the top green circle on the right side, which corresponds to the pixel just above the center of the magenta circle.
So the whole blue circle area can be described by rolling the lower green circle around the center of the blue circle. The blue circle therefore has twice the radius of the green circle, and the radii of the circles on the right side depend on the circle around the camera (defining our lens) and the distance from the focal plane. Once we have calculated the area of the blue circle, we can project it back onto the focal plane (and the image plane); the magenta circle then basically describes the area and pixels in the image plane which are influenced by the point in space at the center of the blue circle. Obviously its contribution depends on how many rays we shoot per pixel, and we have no guarantee that we actually hit that particular point in space, because our ray origins are selected randomly within the green circular area around the camera location.
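The radii of those circles can be estimated with similar triangles. The sketch below follows the construction described above; the function names are hypothetical, and projecting the blue circle back through the camera origin is my reading of "project it back", so take the last function as an assumption rather than a fixed rule.

```python
def cone_radius(lens_radius, focal_distance, object_distance):
    """Radius of the cone of lens rays at object_distance from the camera.

    The rays start on a disk of lens_radius and meet at focal_distance,
    so the radius scales linearly with the distance from the focal plane.
    """
    return lens_radius * abs(object_distance - focal_distance) / focal_distance


def blue_circle_radius(lens_radius, focal_distance, object_distance):
    """Union of all cone cross-sections that contain a given point:
    twice the cone radius, as in the rolling-circle picture."""
    return 2.0 * cone_radius(lens_radius, focal_distance, object_distance)


def magenta_circle_radius(lens_radius, focal_distance, object_distance):
    """Blue circle projected back onto the focal plane.

    Assumption: the projection goes through the camera origin, which
    scales lengths by focal_distance / object_distance.
    """
    blue = blue_circle_radius(lens_radius, focal_distance, object_distance)
    return blue * focal_distance / object_distance
```

For example, with a lens radius of 0.5 and a focal distance of 5, a point at distance 10 sits in a cone of radius 0.5. That is the mirrored green circle, because the point is as far behind the focal plane as the camera is in front of it. The blue circle then has radius 1.0.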