This article is based on the video tutorial: 赋范课堂 – 只需20G显存，QwQ-32B高效微调实战！4大微调工具精讲！知识灌注问答风格微调，DeepSeek R1类推理模型微调，CoT数据集创建实战，打造定制大模型！(Fine-tune QwQ-32B with only 20 GB of VRAM: four fine-tuning tools explained, knowledge-injection and Q&A-style fine-tuning, DeepSeek-R1-style reasoning-model fine-tuning, and hands-on CoT dataset creation to build a custom LLM.) https://www.bilibili.com/video/BV1YoQoYQEwF/

Courseware: https://kq4b3vgg5b.feishu.cn/wiki/LxI9wmuFmiaLCkkoiCIcKvOan7Q

The original material has been edited and abridged here.
赋范课堂 offers excellent courses and is well worth watching.

## I. Basic Preparation
### 1. Install unsloth

```bash
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
```

### 2. Install and register wandb
wandb serves a similar purpose to TensorBoard, but is more stable.
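To give a sense of the workflow, here is a minimal logging sketch (the project name is arbitrary): scalars logged with `wandb.log` show up as live charts in the web UI.

```python
import wandb

# Initialize a run under an arbitrary project name
run = wandb.init(project="demo-project")
for step in range(10):
    wandb.log({"loss": 1.0 / (step + 1)})  # each call appends a point to the chart
run.finish()
```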
Sign up at https://wandb.ai/site. API key: https://wandb.ai/ezcode/t0322?product=models

For registration and usage details, see: https://blog.csdn.net/lovechris00/article/details/146437418

Install the library:

```bash
pip install wandb
```

Log in and enter your API key:

```bash
wandb login
```

### 3. Download the model
https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit

Install huggingface_hub:

```bash
pip install huggingface_hub
```

#### Use screen to keep a persistent session

The model download can take 0.5–1 hour; a persistent session prevents the download from being killed if you close the terminal. Install and start screen:
```bash
sudo apt install screen
screen -S qwq
```
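Press `Ctrl+A` then `D` to detach while the download keeps running in the background, and reattach later with:

```bash
screen -r qwq
```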
#### Set a domestic (China) mirror for Hugging Face

On Linux, add this environment variable to `~/.bashrc`:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

#### Download the model
```bash
huggingface-cli download --resume-download unsloth/QwQ-32B-unsloth-bnb-4bit
```
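Alternatively, the same download can be done from Python with `huggingface_hub` (a minimal sketch; it also resumes interrupted downloads and honors the `HF_ENDPOINT` mirror set above):

```python
from huggingface_hub import snapshot_download

# Downloads (or resumes) the full repository into the HF cache
snapshot_download(repo_id="unsloth/QwQ-32B-unsloth-bnb-4bit")
```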
#### Change the default download location

By default, models are downloaded to `~/.cache/huggingface/hub/`. To use another location, set the `HF_HOME` variable:

```bash
export HF_HOME=/root/xx/HF_download
```

## II. Model Invocation Tests
### Invoking via modelscope

```python
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "你好，好久不见"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Strip the prompt echo, keeping only newly generated tokens
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### Invoking via Ollama
Ollama can only serve models that have been registered locally, so register the model first.
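The registration step itself is not shown in the original. A minimal sketch, assuming you have a GGUF export of the model on disk (the file path is hypothetical; the name must match the `model` used in the request below):

```bash
# Modelfile: point FROM at your GGUF file (path is an assumption)
cat > Modelfile <<'EOF'
FROM /root/autodl-tmp/qwq-32b-bnb.gguf
EOF

ollama create qwq-32b-bnb -f Modelfile
```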
Check that registration succeeded:

```bash
ollama list
```

Send a request with the openai library:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama",  # required but ignored
)

prompt = "你好，好久不见"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    messages=messages,
    model="qwq-32b-bnb",
)
print(response.choices[0].message.content)
```

### Invoking via vLLM
```bash
vllm serve /root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit \
    --quantization bitsandbytes \
    --load-format bitsandbytes \
    --max-model-len 2048
```

### Request test
```python
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "你好，好久不见"
messages = [
    {"role": "user", "content": prompt}
]
response = client.chat.completions.create(
    model="/root/autodl-tmp/QwQ-32B-unsloth-bnb-4bit",
    messages=messages,
)
print(response.choices[0].message.content)
```

## III. Download the Fine-tuning Datasets
### Response structure of reasoning models and dataset requirements

QwQ-32B is similar to DeepSeek R1: the reasoning process shows up directly in the output, which contains both a reasoning segment and a final answer. The reasoning segment is delimited by special tokens that were injected during model training.
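For illustration, a schematic of such a response (not actual model output), with the reasoning segment wrapped in the model's think tags:

```
<think>
The user greets me after a long time. A warm reply plus a follow-up
question keeps the conversation going...
</think>
你好！确实好久不见了，希望你一切都好！
```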
### Download the NuminaMath CoT dataset

https://huggingface.co/datasets/AI-MO/NuminaMath-CoT

```bash
huggingface-cli download AI-MO/NuminaMath-CoT --repo-type dataset
```

Besides NuminaMath CoT, the APPs coding dataset, the TACO coding dataset, and the long_form_thought_data_5k general Q&A dataset are also CoT datasets, all usable for reasoning-model fine-tuning. For an introduction to these datasets, see the open course 《借助DeepSeek R1进行模型蒸馏，模型蒸馏入门实战》: https://www.bilibili.com/video/BV1X1FoeBEgW/

### Download the medical-o1-reasoning-SFT dataset
https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT

```bash
huggingface-cli download FreedomIntelligence/medical-o1-reasoning-SFT --repo-type dataset
```

You can also download it with the Python `datasets` library:

```python
from datasets import load_dataset

# Downloading only the first 500 records is enough for this experiment
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en",
    split="train[0:500]",
    trust_remote_code=True,
)

# Inspect the dataset
dataset[0]
```

## IV. Load the Model
```python
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
```

GPU memory used at this point: 22016 MiB.

## V. Tests Before Fine-tuning
Inspect the model:

```python
model
```

```
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 5120, padding_idx=151654)
    (layers): ModuleList(
      (0): Qwen2DecoderLayer(...)
      ...
      (62): Qwen2DecoderLayer(...)
      (63): Qwen2DecoderLayer(...)
    )
    (norm): Qwen2RMSNorm((5120,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=5120, out_features=152064, bias=False)
)
```

Inspect the tokenizer:

```python
tokenizer
```

```
Qwen2TokenizerFast(name_or_path='unsloth/QwQ-32B-unsloth-bnb-4bit', vocab_size=151643,
  model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right',
  special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|vision_pad|>',
    'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>',
      '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>',
      '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']},
  clean_up_tokenization_spaces=False, added_tokens_decoder={
    151643: AddedToken('<|endoftext|>', rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151644: AddedToken('<|im_start|>', rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    ...
    151667: AddedToken('<think>', rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151668: AddedToken('</think>', rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
})
```

### Basic Q&A test
```python
# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Q&A prompt template
prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。
***
### Instruction:
你是一名助人为乐的助手。
***
### Question:
{}
***
### Response:
<think>{}"""

question = "你好，好久不见！"
prompt = [prompt_style_chat.format(question, "")]

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=2048,
    use_cache=True,
)
# GPU memory climbs to 22412 MiB
```
)# GPU 消耗到 22412 mb outputs
tensor([[ 14880, 112672, 46944, 112449, 111423, 36407, 60548, 67949, 105051,...35946, 106128, 99245, 101037, 11319, 144236, 151645]],devicecuda:0)
response tokenizer.batch_decode(outputs)
# response -- [请写出一个恰当的回答来完成当前对话任务。\n***\n### Instruction:\n你是一名助人为乐的助手。\n***\n### Question:\n你好好久不见\n***\n### Response:\nthink:\n好的用户发来问候“你好好久不见”我需要回应并延续对话。首先应该友好回应他们的问候比如“你好确实很久没联系了希望你一切都好”这样既回应了对方也表达了关心。接下来可能需要询问对方近况或者引导对话继续下去。比如可以问“最近有什么新鲜事吗或者你有什么需要帮助的吗”这样可以让对话更自然也符合助人为乐的角色设定。还要注意语气要亲切保持口语化避免过于正式。另外用户可能希望得到情感上的回应所以需要体现出关心和愿意帮助的态度。检查有没有语法错误确保句子流畅。最后确定回应简洁但足够友好符合对话的流程。\n/think\n\n你好确实好久不见了希望你一切都好最近有什么新鲜事分享或者需要我帮忙什么吗|im_end|]print(response[0].split(### Response:)[1]) 复杂问题测试
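When you only want the final answer, the reasoning segment can be split off as well (a small convenience helper, not part of the original tutorial):

```python
def extract_final_answer(decoded: str) -> str:
    """Return only the text after the prompt echo and the </think> tag."""
    answer = decoded.split("### Response:")[1]
    if "</think>" in answer:
        answer = answer.split("</think>")[-1]
    return answer.strip()

print(extract_final_answer(response[0]))
```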
### Complex question test

```python
question = "请证明根号2是无理数。"

inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU memory: 22552 MiB

response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
```

### Medical Q&A with the original model
```python
# A new Q&A template for medical questions
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>{}"""

question_1 = ("A 61-year-old woman with a long history of involuntary urine loss during activities like "
              "coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. "
              "Based on these findings, what would cystometry most likely reveal about her residual volume "
              "and detrusor contractions?")

question_2 = ("Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, "
              "with a past medical history of hypercholesterolemia and coronary artery disease, elevated "
              "troponin I levels, and tachycardia, what is the most likely coronary artery involved based "
              "on this presentation?")

inputs1 = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs1 = model.generate(
    input_ids=inputs1.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
response1 = tokenizer.batch_decode(outputs1)
print(response1[0].split("### Response:")[1])

inputs2 = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs2 = model.generate(
    input_ids=inputs2.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)
# GPU memory: 22842 MiB
response2 = tokenizer.batch_decode(outputs2)
print(response2[0].split("### Response:")[1])
```

## VI. Minimal Viability Experiment
Next we fine-tune the model.

With this dataset we can either fine-tune on part of the original data, or take all of it and iterate over it for multiple epochs. For most fine-tuning projects it pays to start with a minimal viability experiment: fine-tune on a small amount of data first and observe the effect. If the run completes smoothly and produces a measurable improvement, then scale up to more data and a larger run.

### Define the prompt template
```python
import os
from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # <|im_end|>
```

### Define the dataset-processing function
This function adapts the medical-o1-reasoning-SFT dataset: for each record it splices the `Complex_CoT` column and the `Response` column into the template and appends the end-of-text token:

```python
def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
```

### Prepare the data
```python
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en",
    split="train[0:500]",
    trust_remote_code=True,
)
dataset[0]
```

```
{'Question': 'A 61-year-old ... contractions?',
 'Complex_CoT': "Okay, let's ... incontinence.",
 'Response': 'Cystometry in ... the test.'}
```

```python
# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched=True)

# Inspect the result
dataset["text"][0]
```

```
Below is an instruction that ... response.
***
### Instruction:
You are a medical ... medical question.
***
### Question:
A 61-year-old woman ... contractions?
***
### Response:
<think>
Okay, ... Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.
</think>
Cystometry ... is primarily related to physical e
```

### Start fine-tuning
```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
```

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs=1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
```

### Notes on the fine-tuning setup
This code uses `SFTTrainer` for supervised fine-tuning (SFT), which works with models from the transformers and Unsloth ecosystems.

#### Libraries involved

- `SFTTrainer` (from the `trl` library): trl (Transformer Reinforcement Learning) is a Hugging Face library that provides supervised fine-tuning (SFT) and reinforcement learning (RLHF) functionality. `SFTTrainer` is designed for supervised fine-tuning and suits low-rank adaptation methods such as LoRA.
- `TrainingArguments` (from `transformers`): defines training hyperparameters such as batch size, learning rate, optimizer, and number of training steps.
- `is_bfloat16_supported()` (from `unsloth`): returns True if the current GPU supports bfloat16 (BF16), False otherwise. bfloat16 is a more efficient numeric format that performs especially well on newer NVIDIA GPUs such as the A100/H100.
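If you want to check this yourself, PyTorch exposes an equivalent capability query (one-line sketch):

```python
import torch

print(torch.cuda.is_bf16_supported())  # True on Ampere (A100/H100) and newer GPUs
```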
#### Parameter breakdown

**① SFTTrainer**

| Parameter | Purpose |
| --- | --- |
| `model=model` | the pretrained model to fine-tune |
| `tokenizer=tokenizer` | the tokenizer used to process the text data |
| `train_dataset=dataset` | the training dataset |
| `dataset_text_field="text"` | which dataset column holds the training text (built in `formatting_prompts_func`) |
| `max_seq_length=max_seq_length` | maximum sequence length, capping the number of input tokens |
| `dataset_num_proc=2` | number of parallel processes for data preprocessing |

**② TrainingArguments**
| Parameter | Purpose |
| --- | --- |
| `per_device_train_batch_size=2` | training batch size per GPU/device; small values suit large models |
| `gradient_accumulation_steps=4` | gradient accumulation steps; effective batch size = 2 × 4 = 8 |
| `warmup_steps=5` | warmup steps: the learning rate starts low and ramps up |
| `max_steps=60` | total training steps; consumes roughly 60 × 8 = 480 samples |
| `learning_rate=2e-4` | learning rate (0.0002), controls the size of weight updates |
| `fp16=not is_bfloat16_supported()` | use 16-bit fp16 when the GPU lacks bfloat16 |
| `bf16=is_bfloat16_supported()` | enable bfloat16 when supported; training is more stable |
| `logging_steps=10` | log training metrics every 10 steps |
| `optim="adamw_8bit"` | 8-bit AdamW optimizer, reduces VRAM usage |
| `weight_decay=0.01` | weight decay (L2 regularization) to prevent overfitting |
| `lr_scheduler_type="linear"` | linear learning-rate decay |
| `seed=3407` | random seed for reproducibility |
| `output_dir="outputs"` | output directory for training results |

### Set up wandb and start fine-tuning
```python
import wandb

wandb.login(key="8c7...242bd")  # your wandb API key (redacted here)
run = wandb.init(project="Fine-tune-QwQ-32B-4bit on Medical COT Dataset")

# Start fine-tuning
trainer_stats = trainer.train()
```

If you hit CUDA out of memory, adjust the parameters accordingly. You can try the following code (for testing only; results are not guaranteed):
```python
import torch
torch.cuda.empty_cache()

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from unsloth import FastLanguageModel

# Shorter sequence length to reduce VRAM pressure
max_seq_length = 1024
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/QwQ-32B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

from datasets import load_dataset

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # <|im_end|>

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# Only 200 records this time
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en",
    split="train[0:200]",
    trust_remote_code=True,
)
# Apply the structured formatting
dataset = dataset.map(formatting_prompts_func, batched=True)

# Configure LoRA with a smaller rank
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Create the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        # Use num_train_epochs=1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=20,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

import wandb
wandb.login(key="********")  # replace with your own API key
run = wandb.init(project="Fine-tune-QwQ-32B-4bit on Medical COT Dataset")

# Start fine-tuning
trainer_stats = trainer.train()
```

### Inspect the results
After fine-tuning, unsloth automatically updates the model weights in its cache, so the fine-tuned model can be used directly, without manually merging weights.

```python
trainer_stats
# TrainOutput(global_step=60, training_loss=1.3152311007181803,
#   metrics={'train_runtime': 709.9004, 'train_samples_per_second': 0.676,
#            'train_steps_per_second': 0.085, 'total_flos': 6.676294205826048e+16,
#            'train_loss': 1.3152311007181803})
```

```python
# Switch to inference mode
FastLanguageModel.for_inference(model)

# Check the Q&A behavior again
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

inputs = tokenizer([prompt_style.format(question_2, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
```

### Merge the model
```python
save_path = "QwQ-Medical-COT-Tiny"
model.save_pretrained_merged(save_path, tokenizer, save_method="merged_4bit")
```
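To sanity-check the merged checkpoint, it can be loaded back the same way as the base model (a sketch using the `save_path` from above):

```python
from unsloth import FastLanguageModel

# Load the merged 4-bit checkpoint saved above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "QwQ-Medical-COT-Tiny",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
```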
### Save as GGUF

A GGUF export makes it convenient to run inference with Ollama. Exporting and merging take a while, about 20 minutes.

```python
save_path = "QwQ-Medical-COT-Tiny-GGUF"
model.save_pretrained_gguf(save_path, tokenizer, quantization_method="q4_k_m")
```
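Once the export finishes, the GGUF can be registered with Ollama just as in section II (a sketch; the exact GGUF file name inside the output directory depends on what `save_pretrained_gguf` writes, so verify it on disk first):

```bash
# Modelfile: the GGUF file name below is an assumption — check the directory
cat > Modelfile <<'EOF'
FROM ./QwQ-Medical-COT-Tiny-GGUF/unsloth.Q4_K_M.gguf
EOF

ollama create qwq-medical-cot -f Modelfile
ollama run qwq-medical-cot
```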
## VII. Full Efficient Fine-tuning Experiment

Finally, we bring in the full dataset for an efficient fine-tuning run to improve the result.

```python
# Training prompt template
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
***
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
***
### Question:
{}
***
### Response:
<think>
{}
</think>
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
```
```python
# Load the full dataset
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en",
    split="train",
    trust_remote_code=True,
)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]

# Configure LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Set epochs to 3: iterate over the full dataset three times
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()
```

```
Map (num_proc=2): 0%|          | 0/25371 [00:00<?, ? examples/s]
[  389/9513 13:44 < 5:24:01, 0.47 it/s, Epoch 0.12/3]
```
| Step | Training Loss |
| --- | --- |
| 10 | 1.285900 |
| 20 | 1.262500 |
| … | … |
| 370 | 1.201200 |
| 380 | 1.215600 |
The full run takes about 5.6 hours in total (train_runtime ≈ 20193 s in the stats below).

```python
trainer_stats
# TrainOutput(global_step=9513, training_loss=1.0824475168592858,
#   metrics={'train_runtime': 20193.217, 'train_samples_per_second': 3.769,
#            'train_steps_per_second': 0.471, 'total_flos': 2.7936033274397737e+18,
#            'train_loss': 1.0824475168592858, 'epoch': 2.9992117294655527})
```

### Test
We test with the two earlier questions; both now receive good answers.

```python
question = "A 61-year-old ... contractions?"  # question_1 from above (elided in the original)

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
```

```python
question = ("Given a patient who experiences sudden-onset chest pain radiating to the neck and left arm, "
            "with a past medical history of hypercholesterolemia and coronary artery disease, elevated "
            "troponin I levels, and tachycardia, what is the most likely coronary artery involved based "
            "on this presentation?")

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
```

2025-03-22 (Sat)