[LLM] Detailed Parameters for Fine-tuning

Haru_29 2024. 4. 8. 21:13

1) ModelArguments

model_name_or_path : Path to the model weight or identifier from huggingface.co/models or modelscope.cn/models.
adapter_name_or_path : Path to the adapter weight or identifier from huggingface.co/models.
cache_dir : Where to store the pre-trained models downloaded from huggingface.co or modelscope.cn.
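use_fast_tokenizer : Whether or not to use a fast tokenizer (backed by the tokenizers library).
1. Fast tokenization: the fast tokenizers are implemented in Rust and are considerably faster than the pure-Python tokenizers.
2. Memory efficiency: the design is memory-efficient, which helps when processing large datasets.
3. Built-in preprocessing: text normalization, truncation, padding, and similar steps are provided.
4. Batch processing: several samples can be tokenized in a single call.
5. Loading pre-trained tokenizers: a pre-trained tokenizer's vocabulary and rules can be loaded from its saved files (a tokenizer has no model weights of its own).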
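resize_vocab : Whether or not to resize the tokenizer vocab and the embedding layers.
1. Adding new tokens: tokens that are missing from the existing vocabulary can be added to the tokenizer.
2. Removing tokens: resizing to a smaller size drops trailing embedding rows, although the usual use case is growing the vocabulary.
3. Embedding initialization: embedding vectors for newly added tokens are initialized, since they have no pre-trained values.
4. Model weight adjustment: the input embedding matrix and the output (LM head) layer are resized to match the new vocabulary size.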
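split_special_tokens : Whether or not the special tokens should be split during the tokenization process.
1. When enabled, a special-token string that appears in the input text is not mapped to its single special-token ID; it is tokenized like ordinary text and split into sub-tokens.
2. This is useful when the input may contain special-token strings that should not act as control tokens, for example untrusted user text.
3. When disabled, special tokens are kept whole, which is what models designed to handle special tokens separately expect.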
model_revision : The specific model version to use (can be a branch name, tag name or commit id).
quantization_bit : The number of bits to quantize the model.
quantization_type : Quantization data type to use in int4 training.
double_quantization : Whether or not to use double quantization in int4 training.
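shift_attn : Enable shift short attention (S^2-Attn) proposed by LongLoRA.
1. The input sequence is split into several smaller groups (windows) of equal size.
2. Within each group, attention is computed over every pair of positions, as in a standard Transformer.
3. In half of the attention heads, the tokens are shifted by half the group size before grouping, so those heads' groups straddle the boundaries of the unshifted groups and information flows between neighboring groups.
4. This approximates attention over the whole sequence at a much lower cost.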
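The grouping-and-shifting idea behind S^2-Attn can be illustrated with a small sketch. This is not the LongLoRA implementation: the function names `group_ids` and `can_attend` are hypothetical, the sketch assumes the sequence length is a multiple of the group size, and it ignores causal masking.

```python
def group_ids(seq_len, group_size, shifted):
    """Return the attention-group index for every token position.

    With shifted=True the tokens are rolled by half a group before grouping,
    which is the pattern S^2-Attn applies in half of the attention heads.
    """
    offset = group_size // 2 if shifted else 0
    return [((pos - offset) % seq_len) // group_size for pos in range(seq_len)]


def can_attend(ids, i, j):
    """Two positions attend to each other only if they share a group."""
    return ids[i] == ids[j]


plain = group_ids(seq_len=8, group_size=4, shifted=False)
shift = group_ids(seq_len=8, group_size=4, shifted=True)
print(plain)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(shift)  # [1, 1, 0, 0, 0, 0, 1, 1]
print(can_attend(plain, 3, 4), can_attend(shift, 3, 4))  # False True
```

Positions 3 and 4 sit in different groups in the unshifted heads but in the same group in the shifted heads, which is how information crosses group boundaries.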
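Flash_attention2 : The attention calculation is performed by dividing it into small chunks that fit in fast on-chip memory, and the chunks are processed in parallel. This allows fast and memory-efficient attention calculations even for long sequences.
1. The queries, keys, and values are split into small chunks.
2. Attention scores are computed chunk by chunk, so the full sequence-by-sequence score matrix is never written to GPU memory.
3. Partial results from different chunks are combined with a running (online) softmax, so the output is exact rather than an approximation; the savings come from fewer memory reads and writes, not from skipped computation.
4. The chunks are processed in parallel, which raises throughput.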
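The chunked computation can be sketched in plain Python for a single query vector. This is only an illustration of the running-softmax trick, not the FlashAttention-2 GPU kernel; the function names are hypothetical. It shows that processing keys and values chunk by chunk gives the same result as ordinary attention.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference: softmax over all scores at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def chunked_attention(q, keys, values, chunk_size):
    """Same result, but keys/values are visited one chunk at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [dot(q, k) * scale for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.3, 0.7]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
reference = naive_attention(q, keys, values)
chunked = chunked_attention(q, keys, values, chunk_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(reference, chunked)))
```

Only one chunk of scores is held at a time, which is why memory use does not grow with the square of the sequence length.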
use_unsloth : Whether or not to use unsloth's optimization for the LoRA training.
hf_hub_token : Auth token to log in with Hugging Face Hub.
export_dir : Path to the directory to save the exported model.
export_size : The file shard size (in GB) of the exported model.
export_quantization_bit : The number of bits to quantize the exported model.
export_quantization_dataset : Path to the dataset or dataset name to use in quantizing the exported model.
export_quantization_nsamples : The number of samples used for quantization.
export_quantization_maxlen : The maximum length of the model inputs used for quantization.
export_legacy_format : Whether or not to save the .bin files instead of .safetensors.
export_hub_model_id : The name of the repository to use when pushing the model to the Hugging Face hub.

2) DataArguments

dataset : The name of provided dataset(s) to use. Use commas to separate multiple datasets.
split : Which dataset split to use for training and evaluation.
cutoff_len : The maximum length of the model inputs after tokenization.
- 512 : used by many pre-trained language models such as BERT and RoBERTa
- 1024, 2048 : for models that need a longer context
- 128, 256 : for small inputs such as short-text classification
reserved_label_len : The maximum length reserved for label after tokenization.
mix_strategy : Strategy to use in dataset mixing (concat/interleave) (undersampling/oversampling).
eval_num_beams : Number of beams to use for evaluation. This argument will be passed to model.generate.
ignore_pad_token_for_loss : Whether to ignore the tokens corresponding to padded labels in the loss computation or not.
num_workers : The number of processes to use for the preprocessing.
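The difference between the mix_strategy options (concat, interleave with undersampling, interleave with oversampling) can be shown with a simplified sketch. The function names are hypothetical and the interleaving here is a deterministic round-robin; a real data loader may instead sample from the datasets with given probabilities.

```python
def concat(datasets):
    """Put the datasets one after another."""
    return [sample for ds in datasets for sample in ds]


def interleave(datasets, oversample):
    """Alternate between datasets in round-robin order.

    Undersampling stops as soon as the shortest dataset runs out.
    Oversampling continues until the longest dataset has been fully seen,
    restarting shorter datasets from their beginning.
    """
    lengths = [len(ds) for ds in datasets]
    rounds = max(lengths) if oversample else min(lengths)
    return [ds[i % len(ds)] for i in range(rounds) for ds in datasets]


a = [1, 2, 3]
b = [10, 20]
print(concat([a, b]))                       # [1, 2, 3, 10, 20]
print(interleave([a, b], oversample=False))  # [1, 10, 2, 20]
print(interleave([a, b], oversample=True))   # [1, 10, 2, 20, 3, 10]
```

Undersampling drops samples from the larger dataset, while oversampling repeats samples from the smaller one.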

3) LoraArguments

additional_target : Name(s) of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint.
lora_alpha : The scale factor for LoRA fine-tuning (default: lora_rank * 2).
lora_dropout : Dropout rate for the LoRA fine-tuning.
lora_rank : The intrinsic dimension for LoRA fine-tuning.
create_new_adapter : Whether or not to create a new adapter with randomly initialized weight.
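How lora_rank and lora_alpha interact can be seen in a minimal sketch of the LoRA update, where the frozen weight W is adjusted by (lora_alpha / lora_rank) * B @ A. The helper names below are hypothetical, and the sketch uses plain Python lists rather than a tensor library.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(lora_b, lora_a, lora_alpha, lora_rank):
    """Weight update contributed by the adapter: (alpha / rank) * B @ A."""
    scale = lora_alpha / lora_rank
    return [[scale * v for v in row] for row in matmul(lora_b, lora_a)]


def lora_param_count(d_out, d_in, lora_rank):
    """Trainable parameters of one adapter: A is rank x d_in, B is d_out x rank."""
    return lora_rank * (d_in + d_out)


d_out, d_in, rank = 4, 6, 2
alpha = rank * 2  # the default noted above: lora_rank * 2
rng = random.Random(0)
lora_a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero

delta = lora_delta(lora_b, lora_a, alpha, rank)
print(all(v == 0.0 for row in delta for v in row))  # True: no change before training
print(lora_param_count(4096, 4096, 8), 4096 * 4096)  # 65536 16777216
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and a rank-8 adapter on a 4096 x 4096 layer trains well under 1% of that layer's parameters.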

4) RLHFArguments

dpo_beta: The beta parameter for the DPO loss.
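dpo_loss : The type of DPO loss to use. ex) "sigmoid", "hinge", "ipo", "kto"
1. Sigmoid: the loss from the original DPO paper. It is the negative log-sigmoid of the beta-scaled margin, where the margin is how much more the policy prefers the chosen response over the rejected one compared with the reference model.
2. Hinge: a hinge loss on the same margin, in the style of SLiC. The loss drops to zero once the beta-scaled margin exceeds 1.
3. IPO: Identity Preference Optimization. A squared loss that pulls the margin toward 1/(2*beta), intended to reduce the overfitting to preference data that the sigmoid loss can show.
4. KTO: Kahneman-Tversky Optimization. A loss motivated by prospect theory that scores chosen and rejected responses against a reference point rather than only against each other.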
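The first three loss types can be written as functions of a single preference margin. The sketch below follows the commonly published forms of these losses and is not taken from any particular trainer's source; the function names are hypothetical, and KTO is left out because it also needs a reference-point estimate that is not a function of one pair alone.

```python
import math


def preference_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
    """Log-prob margin of the policy minus the same margin under the reference model."""
    return (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)


def dpo_loss(margin, beta, loss_type="sigmoid"):
    if loss_type == "sigmoid":
        # -log(sigmoid(x)) == log(1 + exp(-x))
        return math.log1p(math.exp(-beta * margin))
    if loss_type == "hinge":
        return max(0.0, 1.0 - beta * margin)
    if loss_type == "ipo":
        return (margin - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unsupported loss_type: {loss_type}")


# Hypothetical sequence log-probabilities for one preference pair.
margin = preference_margin(-12.0, -15.0, -13.0, -14.0)
print(margin)  # 2.0
for name in ("sigmoid", "hinge", "ipo"):
    print(name, dpo_loss(margin, beta=0.1, loss_type=name))
```

A larger dpo_beta makes the sigmoid and hinge losses react more strongly to the same margin, and lowers the target margin of the IPO loss.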
dpo_ftx : The supervised fine-tuning loss coefficient in DPO training.
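ppo_buffer_size : The number of mini-batches to make experience buffer in a PPO optimization step.
- Buffer-size ranges quoted for general reinforcement-learning toolkits (PPO: 2048 - 409600; SAC: 50000 - 100000, with 10240 as a common choice) count individual experiences.
- This parameter counts mini-batches instead, so those figures do not carry over directly; the number of experiences per optimization step is roughly this value multiplied by the mini-batch size.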
ppo_epochs : The number of epochs to perform in a PPO optimization step.
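ppo_logger: Log with either "wandb" or "tensorboard" in PPO training.
- wandb
  - Cloud-based platform; experiments can be tracked with little or no code change when the trainer integration is used
  - Extra features such as hyperparameter sweeps and collaboration
  - Model artifact tracking and versioning
  - UI for comparing experiments and generating reports
  - Web dashboard for monitoring experiments from anywhere
- TensorBoard
  - Visualization tool that originated in the TensorFlow project
  - Visualizes scalars, images, audio, text, and more from log data
  - Runs on a single machine and is well suited to local use
  - Logging calls must be added to the code unless the training framework writes the logs itself
  - Limited experiment management and collaboration features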
ppo_score_norm : Use score normalization in PPO training.
- True: the reward scores are normalized, i.e. shifted and scaled toward mean 0 and variance 1, before they are used in the PPO update.
- False: the scores are used without normalization.
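A minimal sketch of what normalization does to one batch of reward scores, assuming plain per-batch statistics. The function name is hypothetical, and an actual PPO trainer may track running statistics across batches instead of using a single batch.

```python
import statistics


def normalize_scores(scores, eps=1e-8):
    """Shift and scale a batch of reward scores to mean 0 and unit variance."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / (std + eps) for s in scores]


raw = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_scores(raw)
print(normalized)
print(statistics.fmean(normalized), statistics.pstdev(normalized))
```

Normalizing keeps the scale of the reward signal stable even when the reward model's raw outputs drift or have an arbitrary range.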
ppo_target : Target KL value for adaptive KL control in PPO training.
ppo_whiten_rewards : Whiten the rewards before computing advantages in PPO training.
ref_model : Path to the reference model used for the PPO or DPO training.
ref_model_adapters : Path to the adapters of the reference model.
ref_model_quantization_bit : The number of bits to quantize the reference model.
reward_model : Path to the reward model used for the PPO training.
reward_model_adapters : Path to the adapters of the reward model.
reward_model_quantization_bit : The number of bits to quantize the reward model.
reward_model_type : The type of the reward model in PPO training. Lora model only supports lora training. ex) "lora", "full", "api"
