[Building AI Fundamentals] ResNet Identity Mappings & Dilated Convolutions: Paper Analysis and Code Implementation

Haru_29 2024. 11. 10. 13:31

Identity Mappings in Deep Residual Networks

1. Research Background and Core Idea

1.1 Structure of the Original ResNet

import torch.nn as nn
import torch.nn.functional as F


class OriginalResidualUnit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()

        # Basic structure of the original ResNet unit.
        # The identity shortcut below assumes in_channels == out_channels.
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        identity = x

        # Original order: conv -> bn -> relu
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = F.relu(out)  # activation after the addition

        return out

1.2 Importance of the Identity Mapping

Mathematical analysis:

class ResNetAnalysis:
    def forward_propagation(self, x_l, L):
        """Forward propagation analysis"""
        # x_L = x_l + sum(F(x_i, W_i)) for i from l to L-1
        propagation = {
            'direct_path': 'x_l',  # identity mapping
            'residual_path': 'sum(F(x_i))',  # residual functions
            'total': 'x_l + sum(F(x_i))'
        }
        return propagation

    def backward_propagation(self, x_l, L):
        """Backpropagation analysis"""
        # ∂E/∂x_l = ∂E/∂x_L * (1 + ∂/∂x_l sum(F(x_i)))
        gradient = {
            'direct_term': '∂E/∂x_L',  # gradient passed directly through the shortcut
            'residual_term': '∂E/∂x_L * ∂/∂x_l sum(F(x_i))',  # gradient through the residual branches
        }
        return gradient
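
As a quick numerical sanity check of the two identities above, the following sketch uses a scalar toy network. The residual function F(x) = w * tanh(x), the weights, and the helper names are assumptions chosen for illustration only; they are not from the paper.

```python
import math

def residual_stack(x, weights):
    """Apply x_{i+1} = x_i + F(x_i) with the toy choice F(x) = w * tanh(x).

    Returns the final activation x_L and the accumulated sum of residuals.
    """
    residual_sum = 0.0
    for w in weights:
        f = w * math.tanh(x)
        residual_sum += f
        x = x + f
    return x, residual_sum

def central_difference(fn, x, eps=1e-6):
    """Numerical derivative of a scalar function."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

weights = [0.3, -0.2, 0.5]  # hypothetical per-unit weights
x_l = 0.7

# Forward: x_L should equal x_l plus the sum of all residual outputs
x_L, residual_sum = residual_stack(x_l, weights)
forward_gap = abs(x_L - (x_l + residual_sum))

# Backward: dx_L/dx_l should equal 1 + d/dx_l sum(F(x_i))
d_total = central_difference(lambda x: residual_stack(x, weights)[0], x_l)
d_residual = central_difference(lambda x: residual_stack(x, weights)[1], x_l)
backward_gap = abs(d_total - (1.0 + d_residual))

print(forward_gap, backward_gap)
```

Both gaps should be at the level of floating-point noise. The constant 1 in the derivative is the term that lets the gradient reach shallow units no matter how small the residual-branch gradients become.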

2. Designing a New Residual Unit

2.1 Pre-activation Structure

class PreActivationResUnit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()

        # New order: bn -> relu -> conv
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, x):
        identity = x

        # Pre-activation order
        out = self.bn1(x)
        out = F.relu(out)
        out = self.conv1(out)

        out = self.bn2(out)
        out = F.relu(out)
        out = self.conv2(out)

        out += identity  # pure identity mapping: nothing is applied after the addition
        return out

2.2 Experiments on Activation Placement

class ActivationExperiments:
    def __init__(self):
        # error_rate: CIFAR-10 test error (%) with ResNet-110
        self.variants = {
            'original': {
                'order': ['conv', 'bn', 'relu', 'conv', 'bn', 'addition', 'relu'],
                'error_rate': 6.61
            },
            'bn_after_addition': {
                'order': ['conv', 'bn', 'relu', 'conv', 'addition', 'bn', 'relu'],
                'error_rate': 8.17
            },
            'relu_before_addition': {
                'order': ['conv', 'bn', 'relu', 'conv', 'bn', 'relu', 'addition'],
                'error_rate': 7.84
            },
            'full_preactivation': {
                'order': ['bn', 'relu', 'conv', 'bn', 'relu', 'conv', 'addition'],
                'error_rate': 6.37
            }
        }

3. Experimental Results and Analysis

3.1 CIFAR-10/100 Results

class ExperimentalResults:
    def cifar_results(self):
        return {
            'CIFAR-10': {
                'ResNet-110_original': 6.61,
                'ResNet-110_preact': 6.37,
                'ResNet-164_original': 5.93,
                'ResNet-164_preact': 5.46,
                'ResNet-1001_original': 7.61,
                'ResNet-1001_preact': 4.92
            },
            'CIFAR-100': {
                'ResNet-164_original': 25.16,
                'ResNet-164_preact': 24.33,
                'ResNet-1001_original': 27.82,
                'ResNet-1001_preact': 22.71
            }
        }
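
The gap between the two designs is easier to see as a difference per depth. The short sketch below only re-uses the CIFAR-10 error rates listed above; the dictionary layout and function name are illustrative assumptions.

```python
# CIFAR-10 test error (%) as listed above: (original, pre-activation)
cifar10_error = {
    "ResNet-110": (6.61, 6.37),
    "ResNet-164": (5.93, 5.46),
    "ResNet-1001": (7.61, 4.92),
}

def error_reduction(original, preact):
    """Absolute reduction in test error, in percentage points."""
    return round(original - preact, 2)

reductions = {name: error_reduction(*pair) for name, pair in cifar10_error.items()}

for name, delta in reductions.items():
    print(f"{name}: {delta:.2f} points lower with pre-activation")
```

The reduction grows from 0.24 points at 110 layers to 2.69 points at 1001 layers, which is the depth trend discussed in the later sections.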

3.2 Training Dynamics

class TrainingDynamics:
    def analyze_convergence(self):
        metrics = {
            'optimization_speed': {
                'original': 'Slower initial convergence',
                'preactivation': 'Faster initial convergence'
            },
            'gradient_flow': {
                'original': 'Potential gradient degradation',
                'preactivation': 'Improved gradient propagation'
            },
            'stability': {
                'original': 'Less stable for very deep networks',
                'preactivation': 'More stable training process'
            }
        }
        return metrics

4. Key Findings and Insights

4.1 Performance as a Function of Depth

class DepthAnalysis:
    def depth_impact(self):
        findings = {
            'shallow_networks': {
                'difference': 'Minimal impact',
                'reason': 'Gradient propagation still manageable'
            },
            'deep_networks': {
                'difference': 'Significant improvement',
                'reason': 'Better gradient flow in pre-activation'
            },
            'very_deep_networks': {
                'improvement': 'Dramatic (e.g., ResNet-1001)',
                'key_factors': [
                    'Clean information path',
                    'Better gradient propagation',
                    'Improved regularization'
                ]
            }
        }
        return findings

4.2 Regularization Effect

class RegularizationEffect:
    def analyze_regularization(self):
        effects = {
            'batch_normalization': {
                'original': 'Applied after convolution',
                'preactivation': 'Applied to residual branches only'
            },
            'training_loss': {
                'original': 'Lower training loss',
                'preactivation': 'Higher training loss but better generalization'
            },
            'feature_statistics': {
                'original': 'Less normalized features',
                'preactivation': 'Better normalized features'
            }
        }
        return effects

5. Practical Implementation Guide

5.1 Network Configuration

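class ResNetImplementation(nn.Module):
    def __init__(self, depth=1001):
        super().__init__()

        # Network configuration
        self.in_channels = 64  # assumption: the (omitted) stem outputs 64 channels
        self.layers = self._make_layers(depth)

    def _make_layers(self, depth):
        n = (depth - 2) // 9  # number of bottleneck blocks per stage
        layers = []

        # Build the layers for each stage
        layers.extend(self._make_stage(64, n))
        layers.extend(self._make_stage(128, n, stride=2))
        layers.extend(self._make_stage(256, n, stride=2))

        return nn.Sequential(*layers)

    def _make_stage(self, channels, n, stride=1):
        # Assumed helper, not shown in the original post. It reuses the
        # two-conv PreActivationResUnit from 2.1 rather than a true bottleneck.
        layers = []
        if stride != 1 or self.in_channels != channels:
            # 1x1 projection to change width and resolution between stages
            layers.append(nn.Conv2d(self.in_channels, channels, 1, stride=stride))
            self.in_channels = channels
        layers.extend(PreActivationResUnit(channels, channels) for _ in range(n))
        return layers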
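
The expression (depth - 2) // 9 above follows from how CIFAR ResNets are counted: three stages of n units each, plus the first convolution and the final classifier. A minimal sketch of that arithmetic, where the function name and the `bottleneck` flag are assumptions introduced for illustration:

```python
def blocks_per_stage(depth, bottleneck=True):
    """Number of residual units in each of the three CIFAR stages.

    Bottleneck units hold 3 conv layers, so depth = 9n + 2.
    Basic units hold 2 conv layers, so depth = 6n + 2.
    """
    layers_per_n = 9 if bottleneck else 6
    if (depth - 2) % layers_per_n != 0:
        raise ValueError(f"depth {depth} does not have the form {layers_per_n}n + 2")
    return (depth - 2) // layers_per_n

print(blocks_per_stage(1001))                  # 111 bottleneck units per stage
print(blocks_per_stage(164))                   # 18 bottleneck units per stage
print(blocks_per_stage(110, bottleneck=False)) # 18 basic units per stage
```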

5.2 Training Setup

class TrainingConfig:
    def __init__(self):
        self.config = {
            'optimizer': {
                'type': 'SGD',
                'learning_rate': 0.1,
                'momentum': 0.9,
                'weight_decay': 0.0001
            },
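            'learning_rate_schedule': {
                'milestones': [32000, 48000],  # iteration counts, not epochs
                'gamma': 0.1
            },
            'batch_size': 128,
            # Note: 100 epochs over 50,000 CIFAR training images at batch size 128
            # is only about 39,100 iterations, so the 48,000 milestone is never reached
            'epochs': 100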
        }
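
The step schedule in this configuration can be written as a small function of the iteration count. This is a minimal sketch using the values from the config above; the function name is an assumption, and it mirrors the usual "multiply by gamma at each milestone" rule rather than any specific training script.

```python
def learning_rate(iteration, base_lr=0.1, milestones=(32000, 48000), gamma=0.1):
    """Learning rate after multiplying by gamma at every milestone already reached."""
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed

for it in (0, 31999, 32000, 47999, 48000):
    print(it, learning_rate(it))
```

The rate stays at 0.1 until iteration 32,000, drops to about 0.01, and drops again to about 0.001 at iteration 48,000.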

6. Conclusion and Future Directions

6.1 Main Contributions

Analysis of information flow in deep neural networks
Demonstration of the benefits of the pre-activation structure
Successful training of a 1001-layer network

6.2 Practical Recommendations

class PracticalRecommendations:
    def get_recommendations(self):
        return {
            'architecture': 'Use pre-activation residual units',
            'depth': 'Can go deeper with pre-activation',
            'optimization': 'Standard SGD works well',
            'regularization': 'Batch normalization is sufficient',
            'initialization': 'Standard He initialization'
        }

Multi-scale Context Aggregation with Dilated Convolutions

1. Introduction and Core Concepts

1.1 Problems with Existing Approaches

def analyze_dense_prediction_challenges():
    return {
        'resolution_loss': {
            'problem': 'Resolution loss caused by pooling in conventional CNNs',
            'impact': 'Precise segmentation becomes difficult'
        },
        'context_reasoning': {
            'problem': 'Multi-scale contextual information must be processed',
            'solution': 'Introduce dilated convolutions'
        },
        'current_approaches': {
            'upsampling': 'Attempts to recover the lost resolution',
            'multi_scale_input': 'Feeds input images at multiple scales'
        }
    }

1.2 Mathematical Definition of Dilated Convolution

class DilatedConvolution:
    def __init__(self, dilation_factor):
        self.dilation_factor = dilation_factor

    def compute(self, F, k, p):
        """
        F: input signal (discrete function)
        k: filter
        p: output position
        """
        # (F *_l k)(p) = sum of F[s] * k[t] over all pairs with s + l*t = p
        result = 0
        for t in range(len(k)):
            for s in range(len(F)):
                if s + self.dilation_factor * t == p:
                    result += F[s] * k[t]
        return result
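
A worked 1-D example makes the definition concrete. The sketch below evaluates the same sum, s + l*t = p, for every output position at once; the function name and the sample signal are assumptions for illustration, and indices start at 0 as in the class above.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Full 1-D dilated convolution: out[p] = sum of signal[s] * kernel[t] with s + dilation*t == p."""
    out_len = len(signal) + dilation * (len(kernel) - 1)
    out = [0] * out_len
    for s, f in enumerate(signal):
        for t, w in enumerate(kernel):
            out[s + dilation * t] += f * w
    return out

signal = [1, 2, 3, 4]
kernel = [1, 1, 1]

print(dilated_conv1d(signal, kernel, dilation=1))  # [1, 3, 6, 9, 7, 4]
print(dilated_conv1d(signal, kernel, dilation=2))  # [1, 2, 4, 6, 4, 6, 3, 4]
```

With dilation 2 the three filter taps land on every other sample, so each output mixes values that are two positions apart while the filter still has only three weights.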

2. System Architecture

2.1 Basic Module Structure

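class ContextModule(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Basic context module. DilatedConvBlock (section 3.1) must operate on
        # num_classes channels for the final 1x1 convolution to line up.
        self.layers = nn.ModuleList([
            DilatedConvBlock(dilation=1),
            DilatedConvBlock(dilation=1),
            DilatedConvBlock(dilation=2),
            DilatedConvBlock(dilation=4),
            DilatedConvBlock(dilation=8),
            DilatedConvBlock(dilation=16),
            DilatedConvBlock(dilation=1),
            nn.Conv2d(num_classes, num_classes, 1)
        ])

    def forward(self, x):
        # Apply the layers in order; the padding keeps the spatial size unchanged
        for layer in self.layers:
            x = layer(x)
        return x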

2.2 Receptive Field Expansion

class ReceptiveFieldAnalysis:
    def compute_receptive_field(self, layer_idx):
        """Compute the receptive field size of a given layer."""
        if layer_idx == 0:
            return 3  # 3x3 filter of the first layer

        prev_rf = self.compute_receptive_field(layer_idx - 1)
        # Dilations for layers 0..6 are 1, 1, 2, 4, 8, 16, 1
        dilation = 2 ** (layer_idx - 1) if layer_idx < 6 else 1
        # A 3x3 filter with dilation d adds d pixels on each side
        current_rf = prev_rf + 2 * dilation

        return current_rf
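
The growth of the receptive field can also be tabulated iteratively. This sketch assumes stride-1 layers, where a filter of size k with dilation d widens the receptive field by (k - 1) * d; the function name is an assumption, and the dilation list is the one used by the context module in 2.1.

```python
def receptive_fields(dilations, kernel_size=3):
    """Receptive field after each stride-1 layer with the given dilations."""
    rf = 1
    sizes = []
    for d in dilations:
        rf += (kernel_size - 1) * d
        sizes.append(rf)
    return sizes

context_dilations = [1, 1, 2, 4, 8, 16, 1]
print(receptive_fields(context_dilations))  # [3, 5, 9, 17, 33, 65, 67]
```

Seven 3x3 layers without dilation would reach only 15x15, so the exponentially increasing dilations account for most of the 67x67 coverage at no extra parameter cost.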

3. Implementation Details

3.1 Dilated Convolution Block

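class DilatedConvBlock(nn.Module):
    def __init__(self, dilation, channels=21):
        super().__init__()
        # channels replaces the undefined global C; the default of 21
        # (the Pascal VOC class count) is an assumption
        self.conv = nn.Conv2d(
            in_channels=channels,   # number of input channels
            out_channels=channels,  # number of output channels
            kernel_size=3,
            padding=dilation,
            dilation=dilation
        )
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))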
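
Setting padding equal to the dilation is what keeps the feature map size fixed for a 3x3 kernel. The sketch below checks this with the standard convolution output-size formula; the function name is an assumption, and the 64-pixel input is just an example.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

for d in (1, 2, 4, 8, 16):
    same = conv_output_size(64, padding=d, dilation=d)
    shrunk = conv_output_size(64, padding=0, dilation=d)
    print(f"dilation={d}: padded -> {same}, unpadded -> {shrunk}")
```

With padding=dilation the output stays at 64 for every dilation, while the unpadded version loses 2 * dilation pixels per layer.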

3.2 Front-end Module

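from torchvision import models

class FrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        # VGG16-based structure with the last pooling layers removed
        self.features = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        # Remove pooling layers and apply dilation
        self.modify_for_dilation()

    def modify_for_dilation(self):
        """Modify VGG16 so that it uses dilated convolutions."""
        layers = list(self.features.children())
        pools = [i for i, m in enumerate(layers) if isinstance(m, nn.MaxPool2d)]
        removed = set(pools[-2:])  # assumption: drop the last two pooling layers
        kept, dilation = [], 1
        for i, m in enumerate(layers):
            if i in removed:
                dilation *= 2  # each removed pool doubles the dilation downstream
                continue
            if isinstance(m, nn.Conv2d) and dilation > 1:
                m.dilation = m.padding = (dilation, dilation)
            kept.append(m)
        self.features = nn.Sequential(*kept)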
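
The reason for removing pooling layers is the output stride. VGG16 has five 2x2 max-pooling layers; the sketch below assumes the last two are removed, as in the paper, and compares the resulting resolution. The function name and the 512-pixel input are illustrative assumptions.

```python
def output_stride(num_pool_layers, removed=0, pool_stride=2):
    """Total downsampling factor after the remaining pooling layers."""
    return pool_stride ** (num_pool_layers - removed)

input_size = 512  # hypothetical input side length
original = output_stride(5)              # plain VGG16
dilated = output_stride(5, removed=2)    # last two pooling layers removed

print(original, input_size // original)  # stride 32 -> 16x16 feature map
print(dilated, input_size // dilated)    # stride 8  -> 64x64 feature map
```

Dilating the convolutions that follow each removed pooling layer lets them keep the receptive field they had in the pre-trained network, while the prediction map is four times denser along each side.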

4. Experimental Results

4.1 Pascal VOC 2012 Results

class VOCResults:
    def get_performance_metrics(self):
        return {
            'mean_iou': {
                'front_end_only': 71.3,
                'front_end_context': 73.5,
                'front_end_context_crf': 74.7,
                'front_end_context_crfrnn': 75.3
            },
            'improvements': {
                'context_module': '+2.2 points',        # 73.5 - 71.3 mean IoU
                'structured_prediction': '+1.8 points'  # 75.3 - 73.5 mean IoU
            }
        }
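
The mean IoU figures above are per-class intersection-over-union values averaged across classes. Below is a minimal sketch of that metric computed from a confusion matrix; the function name and the 2-class example matrix are made up for illustration and are not the benchmark's evaluation code.

```python
def mean_iou(confusion):
    """Mean IoU from a confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    Classes that never appear in either labels or predictions are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Hypothetical 2-class example: 3 correct and 1 wrong pixel for each class
example = [[3, 1],
           [1, 3]]
print(mean_iou(example))  # 0.6
```

Each class in the example has IoU 3 / (3 + 1 + 1) = 0.6, so pixel accuracy (0.75) overstates quality relative to mean IoU.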

4.2 Qualitative Analysis of Examples

class QualitativeAnalysis:
    def analyze_results(self):
        observations = {
            'context_benefits': {
                'boundary_accuracy': 'More accurate object boundaries',
                'global_consistency': 'Better use of global context',
                'small_objects': 'Improved recognition of small objects'
            },
            'failure_cases': {
                'complex_scenes': 'Difficulty with complex scenes',
                'rare_classes': 'Limited recognition of rare classes',
                'occlusions': 'Problems handling occluded objects'
            }
        }
        return observations

5. Urban Scene Understanding

5.1 CamVid Dataset

class CamVidExperiment:
    def setup_training(self):
        config = {
            'image_size': (640, 480),
            'num_classes': 11,
            'context_layers': 8,
            'training_steps': {
                'frontend': 20000,
                'joint_training': {
                    'crop_size': 852,
                    'batch_size': 1,
                    'learning_rate': 1e-5
                }
            }
        }
        return config

5.2 Cityscapes Dataset

class CityscapesExperiment:
    def get_network_config(self):
        return {
            'image_size': (2048, 1024),
            'context_layers': 10,  # additional layers for the larger images
            'training_stages': {
                'frontend': {
                    'iterations': 40000,
                    'batch_size': 8
                },
                'context': {
                    'iterations': 24000,
                    'learning_rate': 1e-4
                },
                'joint': {
                    'iterations': 60000,
                    'learning_rate': 1e-5
                }
            }
        }

6. Implementation Tips and Best Practices

6.1 Memory Optimization

class MemoryOptimization:
    def get_optimization_tips(self):
        return {
            'batch_size': 'Adjust according to image size',
            'feature_maps': 'Monitor memory usage',
            'gradient_checkpointing': 'Apply when needed'
        }

6.2 Training Strategy

class TrainingStrategy:
    def get_best_practices(self):
        return {
            'initialization': {
                'frontend': 'Use ImageNet pre-trained weights',
                'context': 'Careful initialization required (the paper uses identity initialization)'
            },
            'learning_rate': {
                'schedule': 'Multi-stage training',
                'warmup': 'Start with a low learning rate'
            },
            'data_augmentation': {
                'cropping': 'Crop at large sizes',
                'reflection_padding': 'Use for boundary handling'
            }
        }
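
For the context module, the initialization used in the paper is an identity initialization: each filter starts with a 1 at the centre tap of its own channel and 0 elsewhere, so every layer initially passes its input through unchanged. The pure-Python sketch below illustrates the idea on a tiny input; the function names, the nested-list layout, and the sample data are assumptions for illustration.

```python
def identity_kernel(channels, size=3):
    """w[b][a][i][j] = 1 if a == b and (i, j) is the centre tap, else 0."""
    c = size // 2
    return [[[[1.0 if (a == b and i == c and j == c) else 0.0
               for j in range(size)] for i in range(size)]
             for a in range(channels)] for b in range(channels)]

def dilated_conv2d(x, w, dilation):
    """Zero-padded, size-preserving dilated convolution.

    x[a][row][col] is the input, w[b][a][i][j] the filter bank.
    """
    height, width = len(x[0]), len(x[0][0])
    size = len(w[0][0])
    c = size // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(len(w))]
    for b in range(len(w)):
        for a in range(len(x)):
            for i in range(size):
                for j in range(size):
                    for r in range(height):
                        for q in range(width):
                            rr = r + dilation * (i - c)
                            qq = q + dilation * (j - c)
                            if 0 <= rr < height and 0 <= qq < width:
                                out[b][r][q] += w[b][a][i][j] * x[a][rr][qq]
    return out

# Hypothetical 2-channel, 3x3 input
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
w = identity_kernel(channels=2)

unchanged = all(dilated_conv2d(x, w, d) == x for d in (1, 2, 4))
print(unchanged)  # True
```

Because the only non-zero weight sits at the centre tap, the result is independent of the dilation, which is why the same initialization works for every layer of the module.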
