Fine-Tuning Mixtral 8x7B with QLoRA: Enhancing Model Performance

🚀 Editor's note: The Mixture of Experts (MoE) design strategy has recently demonstrated excellent language-understanding capability, and how to push MoE model performance further has become a hot topic in the industry. In this article, the author fine-tunes the MoE model Mixtral-8x7B with a method called QLoRA, which combines quantization and LoRA, in the hope of substantially improving its performance. The author lays out the advantages of this approach, including noticeably stronger understanding and generation and better computational efficiency, and then walks step by step through the full process of fine-tuning Mixtral-8x7B with QLoRA. The article explores QLoRA as one route to improving MoE models; we look forward to more work in this direction.

1. Introduction

The industry-wide pursuit of optimized models with outstanding performance keeps driving natural language understanding forward. The Mixtral-8x7B Mixture of Experts (MoE) model is one such model: it outperforms comparable models on a range of benchmarks, most notably Llama 2 70B.

This tutorial fine-tunes Mixtral-8x7B with an approach called QLoRA, which combines quantization and LoRA (Low-Rank Adaptation). The expectation is that combining these two techniques will further strengthen the model's capabilities.

Source: Mixtral[1]

2. Definitions

● Mixtral 8x7B: a Mixture of Experts model, known for an architecture that performs strongly on natural language processing tasks.
● QLoRA: shorthand for combining Quantization and LoRA. Quantization reduces the precision of the model weights, which saves memory and speeds up computation. LoRA (Low-Rank Adaptation) adapts the model by training small low-rank update matrices on top of the frozen weights, improving its handling of task-specific context.
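To make the LoRA part of that definition concrete, here is a minimal, self-contained sketch (not from the original tutorial; the dimensions, scaling, and initialization values are illustrative) of how a low-rank update is added on top of a frozen weight matrix:

```python
import torch

# Illustrative sizes: a d x d weight adapted with rank-r factors, scaled by alpha / r.
d, r, alpha = 4096, 8, 16

W = torch.randn(d, d)         # frozen pretrained weight (not trained)
A = torch.randn(r, d) * 0.01  # low-rank factor A (trainable)
B = torch.zeros(d, r)         # low-rank factor B (trainable; starts at zero so the update is initially a no-op)

x = torch.randn(1, d)         # dummy input activation

# LoRA forward pass: y = x @ (W + (alpha / r) * B @ A).T
delta_W = (alpha / r) * (B @ A)
y = x @ (W + delta_W).T

# Only A and B are trained: 2 * d * r parameters instead of d * d.
print(W.numel(), A.numel() + B.numel())
```

With these illustrative sizes, the adapter trains roughly 0.4% as many parameters as the full matrix, which is in the same ballpark as the ~0.51% trainable-parameter ratio reported in step 8 below.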
3. Advantages

● Better performance: fine-tuning Mixtral 8x7B with QLoRA improves its ability to understand and generate text across a variety of domains.
● Efficiency: integrating quantization lowers memory requirements and computational complexity, making the model more resource-efficient.
● Domain-specific fine-tuning: fine-tuning tailors the model to particular tasks, improving its accuracy and relevance in a given domain.

4. Code walkthrough

This tutorial runs Python in a notebook environment (translator's note: Jupyter Notebook or Baihai IDP's own notebook). The overall flow is to load the large Mixtral model at 4-bit precision with the bitsandbytes library, and then apply LoRA during training with Hugging Face's PEFT library.

4.1 Step 1: Install the required libraries

```python
# You only need to run this once per machine, even if you stop/restart it
!pip install --upgrade pip
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U datasets scipy ipywidgets matplotlib
```

4.2 Step 2: Set up the Accelerator

```python
from accelerate import FullyShardedDataParallelPlugin, Accelerator
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig

fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)

accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```

4.3 Step 3: Track metrics with Weights & Biases

```python
!pip install -q wandb -U

import wandb, os
wandb.login()

wandb_project = "viggo-finetune"
if len(wandb_project) > 0:
    os.environ["WANDB_PROJECT"] = wandb_project
```

4.4 Step 4: Load the dataset

```python
from datasets import load_dataset

dataset_name = "databricks/databricks-dolly-15k"

train_dataset = load_dataset(dataset_name, split="train[0:800]")
eval_dataset = load_dataset(dataset_name, split="train[800:1000]")
```

4.5 Step 5: Load the base model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mixtral-8x7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map="auto")

# Tokenization
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(prompt):
    result = tokenizer(prompt)
    result["labels"] = result["input_ids"].copy()
    return result

def generate_and_tokenize_prompt(data_point):
    full_prompt = f"""Given a question and some additional context, provide an answer

### Target sentence:
Question: {data_point['instruction']}
Additional Context: {f"Here is some context: {data_point['context']}" if len(data_point['context']) > 0 else ""}
Response: [/INST] {data_point['response']}</s>"""
    tokenized_prompt = tokenizer(full_prompt)
    return tokenized_prompt

tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt)

untokenized_text = tokenizer.decode(tokenized_train_dataset[1]["input_ids"])
print(untokenized_text)
```

Output:

```
<s> Given a question and some additional context, provide an answer

### Target sentence:
Question: Alice's parents have three daughters: Amy, Jessy, and what's the name of the third daughter?
Additional Context: 
Response: [/INST] The name of the third daughter is Alice</s></s>
```

4.6 Step 6: Check the length distribution of the samples in the dataset

```python
import matplotlib.pyplot as plt

def plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset):
    lengths = [len(x["input_ids"]) for x in tokenized_train_dataset]
    lengths += [len(x["input_ids"]) for x in tokenized_val_dataset]
    print(len(lengths))

    # Plotting the histogram
    plt.figure(figsize=(10, 6))
    plt.hist(lengths, bins=20, alpha=0.7, color="blue")
    plt.xlabel("Length of input_ids")
    plt.ylabel("Frequency")
    plt.title("Distribution of Lengths of input_ids")
    plt.show()

plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset)
```

Source: Image created by Author

4.7 Step 7: Add padding on the left side of the data to reduce memory usage

```python
max_length = 320  # This was an appropriate max length for my dataset

# redefine the tokenize function and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(prompt):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )
    result["labels"] = result["input_ids"].copy()
    return result

tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt)

untokenized_text = tokenizer.decode(tokenized_train_dataset[4]["input_ids"])
print(untokenized_text)
```

Output:

```
<s> Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
This function should describe the target string accurately and the function must be one of the following [inform, request, give_opinion, confirm, verify_attribute, suggest, request_explanation, recommend, request_attribute].
The attributes must be one of the following: [name, exp_release_date, release_year, developer, esrb, rating, genres, player_perspective, has_multiplayer, platforms, available_on_steam, has_linux_release, has_mac_release, specifier]

### Target sentence:
When did Virgin Australia start operating?
Here is some context: Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.
[/INST] Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.</s></s>
```

```python
plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset)
```

Source: Image created by Author

4.8 Step 8: Set up LoRA

```python
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

def print_trainable_parameters(model):
    """Prints the number of trainable parameters in the model."""
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "w1",
        "w2",
        "w3",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

# Apply the accelerator. You can comment this out to remove the accelerator.
model = accelerator.prepare_model(model)
```

Output:

```
trainable params: 120350720 || all params: 23602952192 || trainable%: 0.5098968934945001
```

4.9 Step 9: Run training

```python
import transformers
from datetime import datetime

if torch.cuda.device_count() > 1:  # If more than 1 GPU
    model.is_parallelizable = True
    model.model_parallel = True

project = "databricks-dolly-finetune"
base_model_name = "mixtral"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name

tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        warmup_steps=5,
        per_device_train_batch_size=1,
        gradient_checkpointing=True,
        gradient_accumulation_steps=4,
        max_steps=500,
        learning_rate=2.5e-5,
        logging_steps=25,
        fp16=True,
        optim="paged_adamw_8bit",
        logging_dir="./logs",         # Directory for storing logs
        save_strategy="steps",        # Save the model checkpoint every logging step
        save_steps=50,                # Save checkpoints every 50 steps
        evaluation_strategy="steps",  # Evaluate the model every logging step
        eval_steps=50,                # Evaluate and save checkpoints every 50 steps
        do_eval=True,                 # Perform evaluation at the end of training
        report_to="wandb",            # Comment this out if you don't want to use Weights & Biases
        run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}"  # Name of the W&B run (optional)
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
```
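The Trainer above already writes checkpoints to output_dir every save_steps steps, and step 10 below loads one of those checkpoints. If you also want to keep an explicit copy of the final adapter weights once training finishes, a minimal sketch could look like the following; this is not part of the original tutorial, and the directory name is purely illustrative.

```python
# Hypothetical follow-up to trainer.train(): keep an explicit copy of the final LoRA adapter.
# "mixtral-dolly-lora-final" is an illustrative path, not one used elsewhere in this tutorial.
adapter_dir = "mixtral-dolly-lora-final"

# For a PEFT-wrapped model, save_pretrained() writes only the small adapter
# weights and their config, not the full Mixtral base model.
trainer.model.save_pretrained(adapter_dir)
tokenizer.save_pretrained(adapter_dir)

# Re-enable the KV cache before running inference.
model.config.use_cache = True
```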
4.10 Step 10: Use the trained model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mixtral-8x7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,                   # Mixtral, same as before
    quantization_config=bnb_config,  # Same quantization config as before
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)

eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    add_bos_token=True,
    trust_remote_code=True,
)

from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "mixtral-databricks-dolly-finetune/checkpoint-100")

eval_prompt = """Given a question and some additional context, provide an answer

### Target sentence:
Question: When was Tomoaki Komorida born?
Here is some context: Komorida was born in Kumamoto Prefecture on July 10, 1981. After graduating from high school, he joined the J1 League club Avispa Fukuoka in 2000. Although he debuted as a midfielder in 2001, he did not play much and the club was relegated to the J2 League at the end of the 2001 season. In 2002, he moved to the J2 club Oita Trinita. He became a regular player as a defensive midfielder and the club won the championship in 2002 and was promoted in 2003. He played many matches until 2005. In September 2005, he moved to the J2 club Montedio Yamagata. In 2006, he moved to the J2 club Vissel Kobe. Although he became a regular player as a defensive midfielder, his gradually was played less during the summer. In 2007, he moved to the Japan Football League club Rosso Kumamoto (later Roasso Kumamoto) based in his local region. He played as a regular player and the club was promoted to J2 in 2008. Although he did not play as much, he still played in many matches. In 2010, he moved to Indonesia and joined Persela Lamongan. In July 2010, he returned to Japan and joined the J2 club Giravanz Kitakyushu. He played often as a defensive midfielder and center back until 2012 when he retired.

### Response:
"""

model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    print(eval_tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
```

Output:

```
Given a question and some additional context, provide an answer

### Target sentence:
Question: When was Tomoaki Komorida born?
Here is some context: Komorida was born in Kumamoto Prefecture on July 10, 1981. After graduating from high school, he joined the J1 League club Avispa Fukuoka in 2000. Although he debuted as a midfielder in 2001, he did not play much and the club was relegated to the J2 League at the end of the 2001 season. In 2002, he moved to the J2 club Oita Trinita. He became a regular player as a defensive midfielder and the club won the championship in 2002 and was promoted in 2003. He played many matches until 2005. In September 2005, he moved to the J2 club Montedio Yamagata. In 2006, he moved to the J2 club Vissel Kobe. Although he became a regular player as a defensive midfielder, his gradually was played less during the summer. In 2007, he moved to the Japan Football League club Rosso Kumamoto (later Roasso Kumamoto) based in his local region. He played as a regular player and the club was promoted to J2 in 2008. Although he did not play as much, he still played in many matches. In 2010, he moved to Indonesia and joined Persela Lamongan. In July 2010, he returned to Japan and joined the J2 club Giravanz Kitakyushu. He played often as a defensive midfielder and center back until 2012 when he retired.

### Response: Tomoaki Komorida was born on July 10, 1981.
```
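As an optional follow-up not covered in the original tutorial, the trained adapter can be folded into the base weights so the model can be served without the PEFT wrapper. The sketch below is an assumption-laden illustration: it reloads the base model in bfloat16, since merging into a 4-bit quantized model is generally not supported, and the output path is made up.

```python
# Hypothetical sketch: merge the LoRA adapter into the base weights for deployment.
# Assumes enough memory to hold Mixtral-8x7B in bfloat16 (roughly 90+ GB across devices).
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

merge_base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    torch_dtype=torch.bfloat16,  # merge in half precision rather than 4-bit
    device_map="auto",
)

merged = PeftModel.from_pretrained(
    merge_base,
    "mixtral-databricks-dolly-finetune/checkpoint-100",  # same checkpoint as above
)
merged = merged.merge_and_unload()  # folds the low-rank updates into the base weights

# The merged model behaves like a plain transformers model from here on.
merged.save_pretrained("mixtral-dolly-merged")  # illustrative output path
```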
5. Conclusion

Fine-tuning the Mixtral-8x7B model with QLoRA is a notable step forward for natural language processing (NLP), lifting the model's performance to a new level. The process combines cutting-edge techniques such as quantization and LoRA and offers a robust route past existing benchmarks, even surpassing the strong Llama 2 70B model on a range of evaluation metrics.

The core of this tutorial is QLoRA fine-tuning: instantiating the model at 4-bit precision with bitsandbytes and applying LoRA through Hugging Face's PEFT library. The guide not only outlines the fine-tuning method but also flags problems you may hit in practice, such as OutOfMemory errors, and points to concrete ways of resolving them.

In essence, the tutorial is less a purely technical manual than a guide to good fine-tuning practice. It encourages collaborative fine-tuning and invites other researchers and practitioners to join in advancing language-understanding models.

Its up-to-date techniques, detailed guidance, and collaborative spirit make it a valuable resource for the NLP community, and hopefully one that helps the community push model performance and understanding further.

Resources:

● Mixtral-8x7b[2]
● Thanks to Harper Carroll[3]

Links in this article:

[1] https://mistral.ai/news/mixtral-of-experts/
[2] https://huggingface.co/blog/mixtral
[3] https://twitter.com/HarperSCarroll