

news 2025/12/22 8:37:11

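In this article, we use components from the Hugging Face Diffusers library to implement our own Stable Diffusion model, so that generating an image can be as simple as calling diffuser.diffuse().

1. Overview

Before we start on the code, let's review how inference works in a diffuser.

- We feed a prompt to the diffuser.
- The text encoder turns the prompt into a mathematical representation (an embedding).
- A latent made of pure noise is created.
- The U-Net, conditioned on the prompt, predicts the noise in the latent.
- Together with the scheduler, the predicted noise is subtracted from the noisy latent.
- After many iterations, the denoised latent is decompressed to produce the final image.

The main components used are:

- the text encoder
- the U-Net model
- the VAE decoder

2. Setting up the environment

    ! pip install -Uqq fastcore transformers diffusers

    import logging; logging.disable(logging.WARNING)  # suppress log messages at WARNING level and below
    from fastcore.all import *
    from fastai.imports import *
    from fastai.vision.all import *

3. Getting the components

To process the prompt, we need to download the CLIP tokenizer and text encoder. The tokenizer splits the prompt into tokens, and the text encoder converts the tokens into a numerical representation (an embedding).

    from transformers import CLIPTokenizer, CLIPTextModel

    tokz = CLIPTokenizer.from_pretrained('openai/clip-vit-large-patch14', torch_dtype=torch.float16)
    txt_enc = CLIPTextModel.from_pretrained('openai/clip-vit-large-patch14', torch_dtype=torch.float16).to('cuda')

float16 is used for better performance.

The U-Net will predict the noise in the image, and the VAE will decompress the generated image.

    from diffusers import AutoencoderKL, UNet2DConditionModel

    vae = AutoencoderKL.from_pretrained('stabilityai/sd-vae-ft-ema', torch_dtype=torch.float16).to('cuda')
    unet = UNet2DConditionModel.from_pretrained('CompVis/stable-diffusion-v1-4', subfolder='unet', torch_dtype=torch.float16).to('cuda')

The scheduler controls how much noise is initially added to the image, and how much of the noise predicted by the U-Net is subtracted from the image.

    from diffusers import LMSDiscreteScheduler

    sched = LMSDiscreteScheduler(
        beta_start = 0.00085,
        beta_end = 0.012,
        beta_schedule = 'scaled_linear',
        num_train_timesteps = 1000
    ); sched

Output:

    LMSDiscreteScheduler {
      "_class_name": "LMSDiscreteScheduler",
      "_diffusers_version": "0.16.0",
      "beta_end": 0.012,
      "beta_schedule": "scaled_linear",
      "beta_start": 0.00085,
      "num_train_timesteps": 1000,
      "prediction_type": "epsilon",
      "trained_betas": null
    }

4. Defining the generation parameters

The six main parameters needed for generation are:

- prompt: the prompt
- w, h: the width and height of the image
- n_inf_steps: the number of inference (denoising) steps, which determines how much noise remains in the output image
- g_scale: the guidance scale, a number describing how closely the diffuser should follow the prompt
- bs: the batch size
- seed: the random seed

    prompt = ['a photograph of an astronaut riding a horse']
    w, h = 512, 512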
    n_inf_steps = 70
    g_scale = 7.5
    bs = 1
    seed = 77

5. Encoding the prompt

Now we need to process the prompt. To do so, we first tokenize it and then encode the resulting tokens to produce the embedding. First, the tokenization:

    txt_inp = tokz(
        prompt,
        padding = 'max_length',
        max_length = tokz.model_max_length,
        truncation = True,
        return_tensors = 'pt'
    ); txt_inp

The result:

    {'input_ids': tensor([[49406,   320,  8853,   539,   550, 18376,  6765,   320,  4558, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407]]),
     'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0]])}

Token 49407 is <|endoftext|>, which also serves as the padding token. The padding positions have an attention mask of 0.

    tokz.decode(49407)

Output:

    '<|endoftext|>'

Now, using the text encoder, we create the embeddings for these tokens:

    txt_emb = txt_enc(txt_inp['input_ids'].to('cuda'))[0].half(); txt_emb

Output:

    tensor([[[-0.3884,  0.0229, -0.0523,  ..., -0.4902, -0.3066,  0.0674],
             [ 0.0292, -1.3242,  0.3076,  ..., -0.5254,  0.9766,  0.6655],
             [ 0.4609,  0.5610,  1.6689,  ..., -1.9502, -1.2266,  0.0093],
             ...,
             [-3.0410, -0.0674, -0.1777,  ...,  0.3950, -0.0174,  0.7671],
             [-3.0566, -0.1058, -0.1936,  ...,  0.4258, -0.0184,  0.7588],
             [-2.9844, -0.0850, -0.1726,  ...,  0.4373,  0.0092,  0.7490]]],
           device='cuda:0', dtype=torch.float16, grad_fn=<NativeLayerNormBackward0>)

Check the shape of txt_emb:

    txt_emb.shape

Output:

    torch.Size([1, 77, 768])

6. Embeddings for CFG

We also need to create an embedding for an empty prompt, also known as the unconditional prompt. This embedding is used to control the guidance.

    txt_inp['input_ids'].shape

    torch.Size([1, 77])

    max_len = txt_inp['input_ids'].shape[-1] # 1
    uncond_inp = tokz(
        [''] * bs, # 2
        padding = 'max_length',
        max_length = max_len,
        return_tensors = 'pt',
    ); uncond_inp

(1) We use the maximum length of the prompt so that the unconditional prompt embedding matches the size of the text prompt embedding. (2) We also multiply the list containing the empty prompt by the batch size so that there is one empty prompt for each text prompt.

    {'input_ids':
     tensor([[49406, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
             49407, 49407, 49407, 49407, 49407, 49407, 49407]]),
     'attention_mask': tensor([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0]])}

    uncond_inp['input_ids'].shape

    torch.Size([1, 77])

    uncond_emb = txt_enc(uncond_inp['input_ids'].to('cuda'))[0].half()
    uncond_emb.shape

    torch.Size([1, 77, 768])

We can then concatenate the unconditional embedding and the text embedding. This lets the U-Net produce the unconditional and the text-conditioned predictions in a single batched call instead of two separate passes.

    embs = torch.cat([uncond_emb, txt_emb])

7. Creating the noisy image

Now it is time to create our noisy image, which is the starting point for generation. We create a single latent of 64 x 64 pixels with 4 channels. After the latent has been denoised, we decompress it into a 512 x 512 pixel image with 3 channels.

    bs, unet.config.in_channels, h//8, w//8

    (1, 4, 64, 64)

    print(torch.randn((2, 3, 4)))
    print(torch.randn((2, 3, 4)).shape)

    tensor([[[ 0.2818,  1.9993, -0.2554, -1.8170],
             [-0.5899,  0.6199,  0.4697,  0.8363],
             [ 0.4416, -1.1702,  0.0392, -1.3377]],
            [[ 1.6029,  0.2883, -0.4365,  0.5624],
             [-1.4361, -0.6055,  0.9542, -0.2457],
             [-1.4045, -0.2218,  0.3492, -0.1245]]])
    torch.Size([2, 3, 4])

    torch.manual_seed(seed)
    lats = torch.randn((bs, unet.config.in_channels, h//8, w//8)); lats.shape

    torch.Size([1, 4, 64, 64])

The latent is a rank-4 tensor. The 1 is the batch size, i.e. the number of images to generate. The 4 is the number of channels, and 64 is the height and the width in pixels.

    lats = lats.to('cuda').half(); lats

    tensor([[[[-0.5044, -0.4163, -0.1365,  ..., -1.6104,  0.1381,  1.7676],
              [ 0.7017,  1.5947, -1.4434,  ..., -1.5859, -0.4089, -2.8164],
              [ 1.0664, -0.0923,  0.3462,  ..., -0.2390, -1.0947,  0.7554],
              ...,
              [-1.0283,  0.2433,  0.3337,  ...,  0.6641,  0.4219,  0.7065],
              [ 0.4280, -1.5439,
                0.1409,  ...,  0.8989, -1.0049,  0.0482],
              [-1.8682,  0.4988,  0.4668,  ..., -0.5874, -0.4019, -0.2856]],
             [[ 0.5688, -1.2715, -1.4980,  ...,  0.2230,  1.4785, -0.6821],
              [ 1.8418, -0.5117,  1.1934,  ..., -0.7222, -0.7417,  1.0479],
              [-0.6558,  0.1201,  1.4971,  ...,  0.1454,  0.4714,  0.2441],
              ...,
              [ 0.9492,  0.1953, -2.4141,  ..., -0.5176,  1.1191,  0.5879],
              [ 0.2129,  1.8643, -1.8506,  ...,  0.8096, -1.5264,  0.3191],
              [-0.3640, -0.9189,  0.8931,  ..., -0.4944,  0.3916, -0.1406]],
             [[-0.5259,  1.5059, -0.3413,  ...,  1.2539,  0.3669, -0.1593],
              [-0.2957, -0.1169, -2.0078,  ...,  1.9268,  0.3833, -0.0992],
              [ 0.5020,  1.0068, -0.9907,  ..., -0.3008,  0.7324, -1.1963],
              ...,
              [-0.7437, -1.1250,  0.1349,  ..., -0.6714, -0.6753, -0.7920],
              [ 0.5415, -0.5269, -1.0166,  ...,  1.1270, -1.7637, -1.5156],
              [-0.2319,  0.9165,  1.6318,  ...,  0.6602, -1.2871,  1.7568]],
             [[ 0.7100,  0.4133,  0.5513,  ...,  0.0326,  0.9175,  1.4922],
              [ 0.8862,  1.3760,  0.8599,  ..., -2.1172, -1.6533,  0.8955],
              [-0.7783, -0.0246,  1.4717,  ...,  0.0328,  0.4316, -0.6416],
              ...,
              [ 0.0855, -0.1279, -0.0319,  ..., -0.2817,  1.2744, -0.5854],
              [ 0.2402,  1.3945, -2.4062,  ...,  0.3435, -0.5254,  1.2441],
              [ 1.6377,  1.2539,  0.6099,  ...,  1.5391, -0.6304,  0.9092]]]],
           device='cuda:0', dtype=torch.float16)

Our latent holds random values that represent noise. This noise needs to be scaled so that it works with the scheduler.

    sched.set_timesteps(n_inf_steps); sched

    LMSDiscreteScheduler {
      "_class_name": "LMSDiscreteScheduler",
      "_diffusers_version": "0.16.0",
      "beta_end": 0.012,
      "beta_schedule": "scaled_linear",
      "beta_start": 0.00085,
      "num_train_timesteps": 1000,
      "prediction_type": "epsilon",
      "trained_betas": null
    }

    lats *= sched.init_noise_sigma; sched.init_noise_sigma

    tensor(14.6146)

    sched.sigmas

    tensor([14.6146, 13.3974, 12.3033, 11.3184, 10.4301,  9.6279,  8.9020,  8.2443,
             7.6472,  7.1044,  6.6102,  6.1594,  5.7477,  5.3709,  5.0258,  4.7090,
             4.4178,  4.1497,  3.9026,  3.6744,  3.4634,  3.2680,  3.0867,  2.9183,
             2.7616,  2.6157,  2.4794,  2.3521,  2.2330,  2.1213,  2.0165,  1.9180,
             1.8252,  1.7378,  1.6552,
             1.5771,  1.5031,  1.4330,  1.3664,  1.3030,
             1.2427,  1.1852,  1.1302,  1.0776,  1.0272,  0.9788,  0.9324,  0.8876,
             0.8445,  0.8029,  0.7626,  0.7236,  0.6858,  0.6490,  0.6131,  0.5781,
             0.5438,  0.5102,  0.4770,  0.4443,  0.4118,  0.3795,  0.3470,  0.3141,
             0.2805,  0.2455,  0.2084,  0.1672,  0.1174,  0.0292,  0.0000])

    sched.timesteps

    tensor([999.0000, 984.5217, 970.0435, 955.5652, 941.0870, 926.6087, 912.1304,
            897.6522, 883.1739, 868.6957, 854.2174, 839.7391, 825.2609, 810.7826,
            796.3043, 781.8261, 767.3478, 752.8696, 738.3913, 723.9130, 709.4348,
            694.9565, 680.4783, 666.0000, 651.5217, 637.0435, 622.5652, 608.0870,
            593.6087, 579.1304, 564.6522, 550.1739, 535.6957, 521.2174, 506.7391,
            492.2609, 477.7826, 463.3043, 448.8261, 434.3478, 419.8696, 405.3913,
            390.9130, 376.4348, 361.9565, 347.4783, 333.0000, 318.5217, 304.0435,
            289.5652, 275.0870, 260.6087, 246.1304, 231.6522, 217.1739, 202.6957,
            188.2174, 173.7391, 159.2609, 144.7826, 130.3043, 115.8261, 101.3478,
             86.8696,  72.3913,  57.9130,  43.4348,  28.9565,  14.4783,   0.0000],
           dtype=torch.float64)

    plt.plot(sched.timesteps, sched.sigmas[:-1])

8. Denoising

The denoising process can now begin:

    from tqdm.auto import tqdm

    for i, ts in enumerate(tqdm(sched.timesteps)):
        inp = torch.cat([lats] * 2) # 1
        inp = sched.scale_model_input(inp, ts) # 2
        with torch.no_grad(): preds = unet(inp, ts, encoder_hidden_states=embs).sample # 3
        pred_uncond, pred_txt = preds.chunk(2) # 4
        pred = pred_uncond + g_scale * (pred_txt - pred_uncond) # 4
        lats = sched.step(pred, ts, lats).prev_sample # 5

(1) We first duplicate the latent so that there is one copy for the unconditional prompt and one for the text prompt, in the same order as the concatenated embeddings. (2) We then scale the noisy latent for the current timestep. (3) We predict the noise. (4) We apply the guidance: the prediction is split into its unconditional and text-conditioned halves, and the difference between them is amplified by g_scale. (5) Finally, the scheduler subtracts the guided noise prediction from the latent.

9. Decoding

We can now decode the latent and display the image.

    with torch.no_grad(): img = vae.decode(1/0.18215*lats).sample
    img = (img / 2 + 0.5).clamp(0, 1)
    img = img[0].detach().cpu().permute(1, 2, 0).numpy()
    img = (img * 255).round().astype('uint8')
    Image.fromarray(img)

You now have Stable Diffusion implemented from a text encoder, a VAE, and a U-Net.

Original article: 组装自己的稳定扩散 (Assembling Your Own Stable Diffusion) - BimAnt
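
As a supplement to section 7, the sigma values printed there can be approximated without a GPU or any library. The sketch below is a minimal pure-Python version, assuming the scheduler squares a linear ramp between the square roots of beta_start and beta_end ('scaled_linear'), converts the cumulative product of (1 - beta) into sigma = sqrt((1 - acp) / acp), and linearly interpolates those sigmas at evenly spaced timesteps. The function name sd_sigmas and its parameter names are hypothetical and are not part of Diffusers; this is an illustration of the arithmetic, not the library's implementation.

```python
import math

def sd_sigmas(n_inf_steps=70, beta_start=0.00085, beta_end=0.012, n_train=1000):
    """Approximate the (timesteps, sigmas) pair for a 'scaled_linear' beta schedule.

    Hypothetical helper for illustration only; requires n_inf_steps >= 2.
    """
    # 'scaled_linear': betas are linear in sqrt-space, then squared.
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    betas = [(lo + (hi - lo) * i / (n_train - 1)) ** 2 for i in range(n_train)]

    # Running product of alphas (alpha = 1 - beta) gives the noise level
    # at every training timestep: sigma = sqrt((1 - acp) / acp).
    train_sigmas, acp = [], 1.0
    for b in betas:
        acp *= 1.0 - b
        train_sigmas.append(math.sqrt((1.0 - acp) / acp))

    # Evenly spaced (possibly fractional) timesteps from n_train - 1 down to 0.
    last = n_train - 1
    timesteps = [last - last * i / (n_inf_steps - 1) for i in range(n_inf_steps)]

    # Linear interpolation of the training sigmas at those timesteps.
    sigmas = []
    for t in timesteps:
        i0 = min(int(math.floor(t)), last)
        i1 = min(i0 + 1, last)
        frac = t - i0
        sigmas.append(train_sigmas[i0] * (1.0 - frac) + train_sigmas[i1] * frac)

    # A trailing 0.0 marks the fully denoised end state.
    return timesteps, sigmas + [0.0]

timesteps, sigmas = sd_sigmas()
print(round(sigmas[0], 4), round(timesteps[1], 4), len(sigmas))
```

With the default arguments, the first sigma should come out close to the init_noise_sigma of 14.6146 shown in section 7, which is why the initial latent is multiplied by that value: the random latent has unit variance, while the scheduler expects the starting latent to carry noise at the largest sigma of the schedule.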


http://www.pierceye.com/news/53918/