(experimental) Dynamic Quantization on an LSTM Word Language Model

Introduction

Quantization involves converting the weights and activations of a model from float to int, which can result in smaller model size and faster inference, with only a small hit to accuracy.

In this tutorial, we apply the simplest form of quantization, dynamic quantization, to an LSTM-based next-word-prediction model, closely following the word language model from the PyTorch examples.
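Before diving in, it may help to see what "float to int" means concretely. The following sketch is my addition, not part of the original tutorial: it uses PyTorch's per-tensor affine scheme, where real_value ≈ scale * (int_value - zero_point), and the choice of scale here is an illustrative assumption.

# Illustrative sketch of affine quantization: real_value ~ scale * (int_value - zero_point)
import torch

w = torch.randn(4, 4)                       # a float32 "weight" tensor
scale = w.abs().max().item() / 127          # map the observed range onto int8 (one simple choice)
qw = torch.quantize_per_tensor(w, scale=scale, zero_point=0, dtype=torch.qint8)
print(qw.int_repr())                        # the underlying int8 values
print((qw.dequantize() - w).abs().max())    # small round-trip error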
# imports
import os
from io import open
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
1. Define the model
Here we define the LSTM model architecture, following the model from the word language model example.
class LSTMModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(LSTMModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

        self.init_weights()

        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output)
        return decoded, hidden

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                weight.new_zeros(self.nlayers, bsz, self.nhid))
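As a quick smoke test (my addition; the tiny sizes are arbitrary), one can push a random batch of token ids through an untrained instance to confirm the expected shapes:

# Smoke test with arbitrary small sizes (illustrative only)
tiny = LSTMModel(ntoken=10, ninp=8, nhid=16, nlayers=2)
hidden = tiny.init_hidden(bsz=3)                      # (h_0, c_0), each of shape (nlayers, bsz, nhid)
tokens = torch.randint(10, (5, 3), dtype=torch.long)  # (seq_len, batch) token ids
logits, hidden = tiny(tokens, hidden)
print(logits.shape)                                   # torch.Size([5, 3, 10]): per-step scores over the vocabulary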
2. Load the text data
Next, we load the Wikitext-2 dataset into a Corpus, again following the preprocessing from the word language model example.
class Dictionary(object):
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


class Corpus(object):
    def __init__(self, path):
        self.dictionary = Dictionary()
        self.train = self.tokenize(os.path.join(path, 'train.txt'))
        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))
        self.test = self.tokenize(os.path.join(path, 'test.txt'))

    def tokenize(self, path):
        """Tokenizes a text file."""
        assert os.path.exists(path)
        # Add words to the dictionary
        with open(path, 'r', encoding="utf8") as f:
            for line in f:
                words = line.split() + ['<eos>']
                for word in words:
                    self.dictionary.add_word(word)

        # Tokenize file content
        with open(path, 'r', encoding="utf8") as f:
            idss = []
            for line in f:
                words = line.split() + ['<eos>']
                ids = []
                for word in words:
                    ids.append(self.dictionary.word2idx[word])
                idss.append(torch.tensor(ids).type(torch.int64))
            ids = torch.cat(idss)

        return ids

model_data_filepath = 'data/'

corpus = Corpus(model_data_filepath + 'wikitext-2')
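A couple of quick checks (my addition) confirm what the Corpus holds; the vocabulary size matches the model printout later in this tutorial, while the split lengths depend on your copy of Wikitext-2:

# Quick checks on the loaded corpus (illustrative)
print(len(corpus.dictionary))   # 33278 distinct tokens for Wikitext-2
print(corpus.test.dim())        # 1: each split is a flat 1-D tensor of token ids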
3. Load the pretrained model
This is a tutorial on dynamic quantization, a quantization technique that is applied after a model has been trained. Therefore, we simply load some pretrained weights into this model architecture; these weights were obtained by training for five epochs using the default settings in the word language model example.
ntokens = len(corpus.dictionary)

model = LSTMModel(
    ntoken = ntokens,
    ninp = 512,
    nhid = 256,
    nlayers = 5,
)

model.load_state_dict(
    torch.load(
        model_data_filepath + 'word_language_model_quantize.pth',
        map_location=torch.device('cpu')
        )
    )

model.eval()
print(model)
Out:
LSTMModel(
  (drop): Dropout(p=0.5, inplace=False)
  (encoder): Embedding(33278, 512)
  (rnn): LSTM(512, 256, num_layers=5, dropout=0.5)
  (decoder): Linear(in_features=256, out_features=33278, bias=True)
)
Now we generate some text to ensure that the pretrained model is working properly. Similarly to before, we follow the text generation approach from the word language model example:
input_ = torch.randint(ntokens, (1, 1), dtype=torch.long)
hidden = model.init_hidden(1)
temperature = 1.0
num_words = 1000

with open(model_data_filepath + 'out.txt', 'w') as outf:
    with torch.no_grad():  # no tracking history
        for i in range(num_words):
            output, hidden = model(input_, hidden)
            word_weights = output.squeeze().div(temperature).exp().cpu()
            word_idx = torch.multinomial(word_weights, 1)[0]
            input_.fill_(word_idx)

            word = corpus.dictionary.idx2word[word_idx]

            outf.write(str(word.encode('utf-8')) + ('\n' if i % 20 == 19 else ' '))

            if i % 100 == 0:
                print('| Generated {}/{} words'.format(i, 1000))

with open(model_data_filepath + 'out.txt', 'r') as outf:
    all_output = outf.read()
    print(all_output)
Out:
| Generated 0/1000 words
| Generated 100/1000 words
| Generated 200/1000 words
| Generated 300/1000 words
| Generated 400/1000 words
| Generated 500/1000 words
| Generated 600/1000 words
| Generated 700/1000 words
| Generated 800/1000 words
| Generated 900/1000 words
b'and' b'O' b'\xe2\x80\x99' b'Gacy' b',' b'and' b'then' b'defined' b'that' b'next' b'novel' b'succeeded' b'large' b'property' b',' b'so' b'neither' b'number' b'is' b'currently'
b'a' b'identical' b'planet' b'by' b'stiff' b'culture' b'.' b'Mosley' b'may' b'settle' b'in' b'non' b'-' b'bands' b'for' b'the' b'beginning' b'of' b'its' b'home'
b'stations' b',' b'being' b'also' b'in' b'charge' b'for' b'two' b'other' b'-' b'month' b'ceremonies' b'.' b'The' b'first' b'Star' b'Overseas' b'took' b'to' b'have'
b'met' b'its' b'leadership' b'for' b'investigation' b'such' b'as' b'Discovered' b'lbw' b',' b'club' b',' b'<unk>' b',' b'<unk>' b',' b'or' b'Crac' b'Malley' b','
b'although' b'with' b'the' b'other' b'victory' b',' b'assumes' b'it' b'.' b'(' b'not' b'containment' b'to' b'a' b'recent' b'problem' b')' b'.' b'His' b'traditional'
b'scheme' b'process' b'is' b'proceeded' b'outdoor' b'in' b'overweight' b'clusters' b';' b'God' b'Davis' b'was' b'interested' b'on' b'her' b'right' b'touring' b',' b'although' b'they'
b'had' b'previously' b'previously' b'risen' b'near' b'eclipse' b'in' b'his' b'work' b'by' b'the' b'latter' b'-' b'perspective' b'.' b'During' b'the' b'release' b'of' b'Bell'
b',' b'the' b'first' b'promotional' b'mention' b'included' b'a' b'Magnetic' b'seam' b'was' b'put' b'into' b'Shakespeare' b"'s" b'Special' b'Company' b'is' b'katra' b'than' b'chops'
b'-' b'up' b'history' b'for' b'frets' b'of' b'actions' b'.' b'<eos>' b'Until' b'arrival' b',' b'Griffin' b'wrote' b'that' b'a' b'"' b'sense' b'"' b'included'
b'especially' b'declining' b'individual' b'forces' b',' b'though' b'are' b'stronger' b'<unk>' b'.' b'According' b'to' b'lessen' b'very' b'role' b',' b'Ceres' b'believed' b'he' b'each'
b'conflicted' b'pump' b'fight' b'follows' b'the' b'malignant' b'polynomial' b'to' b'make' b'Albani' b'.' b'The' b'nobility' b'found' b'a' b'spinners' b'from' b'a' b'special' b'to'
b'vertical' b'-' b'term' b'crimes' b',' b'and' b'the' b'Neapolitan' b'apparent' b'<unk>' b'show' b'forcing' b'no' b'of' b'the' b'worst' b'traditions' b'of' b'tallest' b'<unk>'
b'teacher' b'"' b'green' b'crushing' b',' b'with' b'4' b'%' b',' b'and' b'560' b'doctrines' b',' b'with' b'other' b'Asian' b'assistance' b'<unk>' b'.' b'The'
b'game' b'is' b'unadorned' b',' b'especially' b'or' b'steadily' b'favoured' b'according' b'to' b'its' b'inside' b',' b'leading' b'to' b'the' b'removal' b'of' b'gauges' b'.'
b'vanishing' b',' b'a' b'jagged' b'race' b'rested' b'with' b'be' b'rich' b'if' b'these' b'legislation' b'remained' b'together' b'.' b'The' b'anthology' b'and' b'initially' b'regularly'
b'Cases' b'Cererian' b'and' b'acknowledge' b'individual' b'being' b'poured' b'with' b'the' b'Chicago' b'melee' b'.' b'Europium' b',' b'<unk>' b',' b'and' b'Lars' b'life' b'for'
b'electron' b'plumage' b',' b'will' b'deprive' b'themselves' b'.' b'The' b'<unk>' b'gryllotalpa' b'behave' b'have' b'Emerald' b'doubt' b'.' b'When' b'limited' b'cubs' b'are' b'rather'
b'attempting' b'to' b'address' b'.' b'Two' b'birds' b'as' b'being' b'also' b'<unk>' b',' b'such' b'as' b'"' b'<unk>' b'"' b',' b'and' b'possessing' b'criminal'
b'spots' b',' b'lambskin' b'ponderosa' b'mosses' b',' b'which' b'might' b'seek' b'to' b'begin' b'less' b'different' b'delineated' b'techniques' b'.' b'Known' b',' b'on' b'the'
b'ground' b',' b'and' b'only' b'cooler' b',' b'first' b'on' b'other' b'females' b'factory' b'in' b'mathematics' b'.' b'Pilgrim' b'alone' b'has' b'a' b'critical' b'substance'
b',' b'probably' b'in' b'line' b'.' b'He' b'used' b'a' b'<unk>' b',' b'with' b'the' b'resin' b'being' b'transported' b'to' b'the' b'12th' b'island' b'during'
b'the' b'year' b'of' b'a' b'mixture' b'show' b'that' b'it' b'is' b'serving' b';' b'they' b'are' b'headed' b'by' b'prone' b'too' b'species' b',' b'rather'
b'than' b'the' b'risk' b'of' b'carbon' b'.' b'In' b'all' b'other' b'typical' b',' b'faith' b'consist' b'of' b'<unk>' b'whereas' b'<unk>' b'when' b'quotes' b'they'
b'Abrams' b'restructuring' b'vessels' b'.' b'It' b'also' b'emerged' b'even' b'when' b'any' b'lack' b'of' b'birds' b'has' b'wide' b'pinkish' b'structures' b',' b'directing' b'a'
b'chelicerae' b'of' b'amputated' b'elementary' b',' b'only' b'they' b'on' b'objects' b'.' b'A' b'female' b'and' b'a' b'female' b'Leisler' b'-' b'shaped' b'image' b'for'
b'51' b'.' b'5' b'm' b'(' b'5' b'lb' b')' b'Frenchman' b'2' b'at' b'sea' b'times' b'is' b'approximately' b'2' b'years' b'ago' b',' b'particularly'
b'behind' b'reducing' b'Trujillo' b"'s" b'and' b'food' b'specific' b'spores' b'.' b'Males' b'fibrous' b'females' b'can' b'be' b'severely' b'gregarious' b'.' b'The' b'same' b'brood'
b'behind' b'100' b'minutes' b'after' b'it' b'is' b'estimated' b'by' b'damaging' b'the' b'nest' b'base' b',' b'with' b'some' b'other' b'rare' b'birds' b'and' b'behavior'
b',' b'no' b'transport' b'and' b'Duty' b'demand' b'.' b'Two' b'rare' b'chicks' b'have' b'from' b'feed' b'engage' b'to' b'come' b'with' b'some' b'part' b'of'
b'nesting' b'.' b'The' b'1808' b'to' b'be' b'reduced' b'to' b'Scots' b'and' b'fine' b'stones' b'.' b'There' b'they' b'also' b'purple' b'limitations' b'of' b'certain'
b'skin' b'material' b'usually' b'move' b'during' b'somewhat' b'.' b'A' b'mothers' b'of' b'external' b'take' b'from' b'poaching' b',' b'typically' b'have' b'people' b'processes' b'and'
b'toll' b';' b'while' b'bird' b'plumage' b'differs' b'to' b'Fight' b',' b'they' b'may' b'be' b'open' b'after' b'<unk>' b',' b'thus' b'rarely' b'their' b'<unk>'
b'for' b'a' b'emotional' b'circle' b'.' b'Rough' b'Dahlan' b'probably' b'suggested' b'how' b'they' b'impose' b'their' b'cross' b'of' b'relapse' b'where' b'they' b'changed' b'.'
b'They' b'popularisation' b'them' b'of' b'their' b'<unk>' b',' b'charming' b'by' b'limited' b'or' b'Palestinians' b'the' b'<unk>' b'<unk>' b'.' b'Traffic' b'of' b'areas' b'headed'
b',' b'and' b'their' b'push' b'will' b'articulate' b'.' b'<eos>' b'<unk>' b'would' b'be' b'criticized' b'by' b'protein' b'rice' b',' b'particularly' b'often' b'rather' b'of'
b'the' b'cellular' b'extent' b'.' b'They' b'could' b'overlap' b'forward' b',' b'and' b'there' b'are' b'no' b'governing' b'land' b',' b'they' b'do' b'not' b'find'
b'it' b'.' b'In' b'one' b'place' b',' b'reddish' b'kakapo' b'(' b'kakapo' b'<unk>' b')' b'might' b'be' b'performed' b'that' b'conduct' b',' b'stadia' b','
b'gene' b'or' b'air' b',' b'noise' b',' b'and' b'offensive' b'or' b'skin' b',' b'which' b'may' b'be' b'commercially' b'organized' b'strong' b'method' b'.' b'In'
b'changing' b',' b'Chen' b'and' b'eukaryotes' b'were' b'Membrane' b'spiders' b'in' b'larger' b'growth' b',' b'by' b'some' b'regions' b'.' b'If' b'up' b'about' b'5'
b'%' b'of' b'the' b'males' b',' b'there' b'are' b'displays' b'that' b'shift' b'the' b'bird' b'inclination' b'after' b'supreme' b'<unk>' b'to' b'move' b'outside' b'tests'
b'.' b'The' b'aim' b'of' b'Mouquet' b'Sites' b'is' b'faster' b'as' b'an' b'easy' b'asteroid' b',' b'with' b'ocean' b'or' b'grey' b',' b'albeit' b','
b'as' b'they' b'they' b'CBs' b',' b'and' b'do' b'not' b'be' b'performed' b',' b'greatly' b'on' b'other' b'insects' b',' b'they' b'can' b'write' b'chromosomes'
b',' b'and' b'planners' b',' b'galericulata' b'should' b'be' b'a' b'bird' b'.' b'Also' b'on' b'a' b'holodeck' b'they' b'were' b'divine' b'out' b'of' b'bare'
b'handwriting' b'.' b'Unlike' b'this' b',' b'they' b'makes' b'only' b'anything' b'a' b'variation' b'of' b'skin' b'skeletons' b'further' b'.' b'They' b'have' b'to' b'be'
b'able' b'under' b'their' b'herding' b'tree' b',' b'or' b'dart' b'.' b'When' b'many' b'hypothesis' b'(' b'plant' b',' b'they' b'were' b'-' b'looped' b'aged'
b'play' b')' b'is' b'very' b'clear' b'as' b'very' b'on' b'comparison' b'.' b'<eos>' b'Furthermore' b',' b'Wikimania' b'decorations' b'-' b'sponsored' b'naming' b'hydrogen' b'when'
b'the' b'kakapo' b'commenced' b',' b'they' b'are' b'slowly' b'on' b'heavy' b'isolation' b'.' b'Sometimes' b'that' b'Larssen' b'leave' b'gently' b',' b'they' b'usually' b'made'
b'short' b'care' b'of' b'feral' b'or' b'any' b'dual' b'species' b'.' b'<eos>' b'Further' b'males' b'that' b'outfitting' b',' b'when' b'there' b'are' b'two' b'envelope'
b'shorter' b'flocks' b'to' b'be' b'males' b'ideally' b'they' b'are' b'highly' b'emission' b'.' b'<eos>' b'As' b'of' b'danger' b',' b'taking' b'in' b'one' b'of'
b'the' b'other' b'surviving' b'structure' b'of' b'Ceres' b'can' b'be' b'rebuffed' b'to' b'be' b'caused' b'by' b'any' b'combination' b'of' b'food' b'or' b'modified' b'its'
It's no GPT-2, but it looks like the model has started to learn the structure of language!
We're almost ready to demonstrate dynamic quantization. We just need to define a few more helper functions:
bptt = 25
criterion = nn.CrossEntropyLoss()
eval_batch_size = 1

# create test data set
def batchify(data, bsz):
    # Work out how cleanly we can divide the dataset into bsz parts.
    nbatch = data.size(0) // bsz
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * bsz)
    # Evenly divide the data across the bsz batches.
    return data.view(bsz, -1).t().contiguous()

test_data = batchify(corpus.test, eval_batch_size)

# Evaluation functions
def get_batch(source, i):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].view(-1)
    return data, target

def repackage_hidden(h):
    """Wraps hidden states in new Tensors, to detach them from their history."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)

def evaluate(model_, data_source):
    # Turn on evaluation mode which disables dropout.
    model_.eval()
    total_loss = 0.
    hidden = model_.init_hidden(eval_batch_size)
    with torch.no_grad():
        for i in range(0, data_source.size(0) - 1, bptt):
            data, targets = get_batch(data_source, i)
            output, hidden = model_(data, hidden)
            hidden = repackage_hidden(hidden)
            output_flat = output.view(-1, ntokens)
            total_loss += len(data) * criterion(output_flat, targets).item()
    return total_loss / (len(data_source) - 1)
4. Test dynamic quantization
Finally, we can call torch.quantization.quantize_dynamic on the model! Specifically, we specify that we want the nn.LSTM and nn.Linear modules in our model to be quantized, and that we want the weights converted to int8 values.
import torch.quantization

quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
print(quantized_model)
Out:
LSTMModel(
  (drop): Dropout(p=0.5, inplace=False)
  (encoder): Embedding(33278, 512)
  (rnn): DynamicQuantizedLSTM(
    512, 256, num_layers=5, dropout=0.5
    (_all_weight_values): ModuleList(
      (0): PackedParameter()
      (1): PackedParameter()
      (2): PackedParameter()
      (3): PackedParameter()
      (4): PackedParameter()
      (5): PackedParameter()
      (6): PackedParameter()
      (7): PackedParameter()
      (8): PackedParameter()
      (9): PackedParameter()
    )
  )
  (decoder): DynamicQuantizedLinear(
    in_features=256, out_features=33278
    (_packed_params): LinearPackedParams()
  )
)
The model looks the same; how has this benefited us? First, we see a significant reduction in model size:
def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)
print_size_of_model(quantized_model)
Out:
Size (MB): 113.941574
Size (MB): 76.807204
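A rough accounting (my own, with LSTM biases ignored) explains these numbers: only the nn.LSTM and nn.Linear weights drop from 4 bytes to about 1 byte per value, while the nn.Embedding stays in float32. The sketch below assumes the architecture defined above; the small gap versus the measured 76.8 MB comes from packing details this estimate ignores.

# Back-of-the-envelope size accounting (illustrative)
emb  = 33278 * 512                                          # encoder Embedding: stays float32
lstm = 4 * 256 * (512 + 256) + 4 * (4 * 256 * (256 + 256))  # 5 LSTM layers' weight matrices
dec  = 256 * 33278 + 33278                                  # decoder Linear weight + bias
print('fp32 (MB):', (emb + lstm + dec) * 4 / 1e6)           # ~113.9, matching the float model
print('int8 (MB):', (emb * 4 + lstm + dec) / 1e6)           # ~79.6: embedding unquantized, rest ~1 byte/value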
Second, we see faster inference time, with no difference in evaluation loss.
Note: we set the number of threads to one for a single-threaded comparison, since quantized models run single-threaded.
torch.set_num_threads(1)

def time_model_evaluation(model, test_data):
    s = time.time()
    loss = evaluate(model, test_data)
    elapsed = time.time() - s
    print('loss: {0:.3f}\nelapsed time (seconds): {1:.1f}'.format(loss, elapsed))

time_model_evaluation(model, test_data)
time_model_evaluation(quantized_model, test_data)
Out:
loss: 5.167
elapsed time (seconds): 233.9
loss: 5.168
elapsed time (seconds): 164.9
Running this locally on a MacBook Pro, inference took about 200 seconds without quantization and only about 100 seconds with it.
Conclusion
Dynamic quantization can be an easy way to reduce model size while having only a limited effect on accuracy.
Thanks for reading! As always, we welcome any feedback, so please create an issue here if you have any.