Convolutional Neural Networks (CNNs) are a class of deep learning models primarily used for processing structured grid data, such as images. They capture and represent local vs. global information through their architecture, which consists of layers of convolutional filters, pooling operations, and sometimes fully connected layers. Here’s a breakdown of how local and global information is handled:

  1. Local Information:

    • Convolutional Layers: Each convolutional filter in a CNN processes a small, fixed-size patch of the input data (like a 3x3 or 5x5 patch of an image). This filter is designed to learn and detect local patterns or features, such as edges, textures, or colors.
    • Shared Weights: The same filter is applied across the entire image, making the convolution translation equivariant: if the network learns to recognize a pattern in one part of the image, it can detect that pattern in another part as well (pooling later adds a degree of translation invariance).
    • Depth of the CNN: As you go deeper into a CNN, the receptive field of neurons increases. This means that while initial layers might capture simple and very local patterns (like edges), deeper layers capture more complex and progressively broader patterns (like object parts and shapes).
  2. Global Information:

    • Pooling Layers: Pooling operations (often max-pooling) reduce the spatial dimensions of the feature maps. As you move through the layers of a CNN, the spatial resolution decreases while the depth (number of channels or feature maps) often increases, letting the network gradually transition from local features to more global, abstract representations (the first sketch after this list shows this shrinking-resolution, growing-channels progression).
    • Fully Connected Layers: Near the end of many CNN architectures, there are one or more fully connected (dense) layers. At this stage, the spatial structure is generally lost, and the network makes decisions based on global information extracted from the entire input. These layers integrate information from all parts of the image.
    • Skip Connections/Residual Connections: In more advanced architectures like ResNets, skip or residual connections allow information to bypass certain layers. This can help in retaining both local and global information, as well as addressing the vanishing gradient problem in very deep networks.
  3. Combining Local and Global Information:

    • Dilated Convolutions: These enlarge the receptive field without increasing the number of parameters or reducing spatial resolution, which helps in capturing global context with fewer layers (the second sketch, after the summary below, combines a dilated convolution with a residual connection).
    • U-Net Architecture: Used often in image segmentation tasks, the U-Net structure captures both local and global information by contracting and then expanding the feature maps, with skip connections between corresponding layers.
    • Attention Mechanisms: Introduced in models like the Transformer and later adapted for CNNs, attention mechanisms allow a model to focus on different parts of the input based on context, which can help in capturing both local and global dependencies.
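
To make the flow from local features to a global decision concrete, here is a minimal sketch, assuming PyTorch and a 32x32 RGB input (the layer sizes and the TinyCNN name are illustrative choices, not taken from any particular architecture): small convolutions detect local patterns, pooling halves the spatial resolution while the channel count grows, and a final fully connected layer makes a decision from the whole image.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # local 3x3 patterns; 32x32 stays 32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32 -> 16x16, 16 channels
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # broader patterns on the pooled map
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 16x16 -> 8x8, now 32 channels
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # spatial structure is dropped here
            nn.Linear(32 * 8 * 8, num_classes),  # the decision integrates the whole image
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = TinyCNN()
x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image
print(model(x).shape)          # torch.Size([1, 10])
```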

To summarize, while the initial layers of a CNN are primarily concerned with local features, the combination of pooling and fully connected layers allows the network to also consider global context. Through the interplay of these layers and architectural innovations, CNNs effectively represent and integrate both local and global information from input data.
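
For the skip-connection and dilated-convolution points above, here is a second minimal sketch, again assuming PyTorch (DilatedResidualBlock is a toy module named for illustration, not the exact block from ResNet or from any dilated-convolution paper): the residual path carries the input forward unchanged, preserving local detail, while the dilated convolution in the body widens the context the block sees without adding parameters.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),  # wider receptive field, same parameter count
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: add the (local) input back onto the broader-context features.
        return torch.relu(x + self.body(x))

block = DilatedResidualBlock(16)
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)  # torch.Size([1, 16, 32, 32]) -- spatial size is preserved
```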

what’s a good citation that discusses the relationship between depth and receptive field in CNNs?

One of the classic papers that discusses the relationship between depth and receptive field in CNNs, and more generally the advantages of depth in convolutional architectures, is:

Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR).

This paper introduced the VGG networks, which are known for their simplicity and depth. The authors systematically increased depth using stacks of small 3x3 filters, noting, for instance, that three stacked 3x3 convolutions have an effective 7x7 receptive field, which makes it a fitting reference for understanding the impact of depth on receptive fields and network performance.

If you’re looking into how the receptive field changes with depth and how to calculate it, you might also consider works or online resources that provide intuitive explanations and formulas for receptive field calculations.
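
For a quick back-of-the-envelope version of that calculation, here is a small Python sketch of the standard recurrence (an illustrative helper, not code from any of the papers discussed here): each layer widens the receptive field by (effective kernel size - 1) times the product of all earlier strides, and dilation simply enlarges the effective kernel.

```python
def receptive_field(layers):
    """Compute the receptive field of a stack of layers.

    layers: list of (kernel_size, stride, dilation) tuples, in input-to-output order.
    """
    r, jump = 1, 1  # receptive field and cumulative stride ("jump") seen so far
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1   # dilation stretches the kernel's extent
        r += (k_eff - 1) * jump   # each layer widens r by (k_eff - 1) * current jump
        jump *= s                 # deeper layers step in multiples of earlier strides
    return r

# Three stacked 3x3 convolutions see 7x7 pixels (the VGG observation).
print(receptive_field([(3, 1, 1)] * 3))                    # 7
# A stride-2 pooling layer between convolutions widens the field faster.
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)]))  # 8
# Exponentially growing dilations widen it faster still, without any pooling.
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15
```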

The concept of receptive fields is fundamental in CNNs, and many papers touch on it. Here are a few that delve more deeply into receptive fields or utilize the concept in novel ways:

  1. “Network Dissection: Quantifying Interpretability of Deep Visual Representations” by David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba.

    • This paper aims to understand the interpretability of CNN features by examining the receptive fields and their relationship to specific semantic concepts.
  2. “A guide to convolution arithmetic for deep learning” by Vincent Dumoulin and Francesco Visin.

    • This work offers a visual guide to convolution arithmetic in the context of deep learning. It provides insight into how the receptive field grows with different convolution and pooling operations.
  3. “Receptive Field Block Net for Accurate and Fast Object Detection” by Songtao Liu, Di Huang, and Yunhong Wang.

    • This paper introduces a new module to adjust the receptive fields in CNNs for better object detection, emphasizing the significance of capturing appropriate context for detection tasks.
  4. “Multi-Scale Context Aggregation by Dilated Convolutions” by Fisher Yu and Vladlen Koltun.

    • Dilated convolutions introduce a way to expand the receptive field without increasing the number of parameters. This paper discusses the concept in detail.
  5. “Understanding Deep Image Representations by Inverting Them” by Aravindh Mahendran and Andrea Vedaldi.

    • While not exclusively about receptive fields, this paper dives into understanding deep image representations, which implicitly requires an understanding of how receptive fields operate.
  6. “Aggregated Residual Transformations for Deep Neural Networks” by Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He.

    • The authors delve into the architecture of ResNeXt, touching upon the receptive field as part of their exploration.

When diving into any of these papers, it’s always a good idea to also explore their cited references and any subsequent works that cite them. The concept of receptive fields is pervasive and continually explored, so there are numerous valuable resources out there.