Recently I have been studying Stable Diffusion and LangChain, and noticed that Stable Diffusion can also be called through an API, much like ChatGLM. That made me wonder whether Stable Diffusion could be integrated into LangChain as well. Much of the material online describes using ChatGPT to help write Stable Diffusion prompts, so this article builds on that idea and tries to combine LLM + LangChain + Stable Diffusion to generate an image automatically from a single sentence.
Steps
Expanding the prompt
Generating prompts with OpenAI
Following the approach in "[AI teamwork: ChatGPT writes the prompts, AI draws the images]", generate Stable Diffusion prompts:
```python
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

# Few-shot template: explains the SD prompt syntax, gives examples,
# then asks the LLM to write a prompt for {desc}
_template = """
The following prompts are used to guide an AI painting model to create images. They include various details such as character appearance, background, colors and lighting effects, as well as the theme and style of the image. These prompts are usually formatted with weighted numbers in parentheses to specify the importance or emphasis of certain details. For example, (masterpiece:1.4) means the quality of the work is very important. Here are some examples:
1. (8k, RAW photo, best quality, masterpiece:1.2),(realistic, photo-realistic:1.37), ultra-detailed, 1girl, cute, solo, beautiful detailed sky, detailed cafe, night, sitting, dating, (nose blush), (smile:1.1),(closed mouth), medium breasts, beautiful detailed eyes, (collared shirt:1.1), bowtie, pleated skirt, (short hair:1.2), floating hair, ((masterpiece)), ((best quality)),
2. (masterpiece, finely detailed beautiful eyes: 1.2), ultra-detailed, illustration, 1 girl, blue hair black hair, japanese clothes, cherry blossoms, tori, street full of cherry blossoms, detailed background, realistic, volumetric light, sunbeam, light rays, sky, cloud,
3. highres, highest quality, illustration, cinematic light, ultra detailed, detailed face, (detailed eyes, best quality, hyper detailed, masterpiece, (detailed face), blue hair, white hair, purple eyes, highest details, luminous eyes, medium breasts, black halo, white clothes, backlighting, (midriff:1.4), light rays, (high contrast), (colorful)
Following the previous prompts, write a prompt describing the following elements:
{desc}
"""

llm = OpenAI(temperature=0)
prompt = PromptTemplate(
    input_variables=["desc"],
    template=_template,
)
chain = LLMChain(prompt=prompt, llm=llm)
res = chain.run("湖人总冠军")  # "Lakers win the championship"
print(res)
```
The generated prompt is:

```
(masterpiece:1.4), ultra-detailed, 1man, strong, solo, detailed basketball court, detailed stadium, night, standing, celebrating, (fist pump), (smile:1.1), (closed mouth), muscular body, beautiful detailed eyes, (jersey:1.1), shorts, (short hair:1.2), floating hair, (trophy:1.3), (confetti:1.2), (fireworks:1.2), (crowd cheering:1.2), (high contrast), (colorful)
```
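The weighted-parenthesis syntax described in the template is easy to inspect programmatically. The sketch below is a simplified helper of my own (not part of LangChain or the WebUI): it only recognizes the explicit `(text:weight)` form and ignores plain `(x)` emphasis and nesting, which the WebUI also supports.

```python
import re

# Matches explicit weights such as "(smile:1.1)" or "(... eyes: 1.2)".
# Simplification: plain "(x)" / "((x))" emphasis and nested groups are ignored.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def explicit_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (text, weight) pairs for every explicitly weighted term."""
    return [(m.group(1).strip(), float(m.group(2))) for m in WEIGHTED.finditer(prompt)]


def weighted(token: str, weight: float) -> str:
    """Format a term with an explicit weight, e.g. ("trophy", 1.3) -> "(trophy:1.3)"."""
    return f"({token}:{weight})"


generated = ("(masterpiece:1.4), ultra-detailed, 1man, (fist pump), (smile:1.1), "
             "(jersey:1.1), (short hair:1.2), (trophy:1.3), (high contrast)")
print(explicit_weights(generated))
```

A check like this can flag LLM output that drifted away from the weighting style shown in the examples.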
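Feeding this prompt directly into the Stable Diffusion WebUI produces the following result:

[Image: Stable Diffusion WebUI output for the generated prompt]

Formatted output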
To make the output easy to parse, we can add some more guidance. The final prompt template is:
```python
# Same few-shot template, plus instructions to answer in JSON.
# Literal braces are doubled ({{ }}) so PromptTemplate does not treat them as variables.
_template = """
The following prompts are used to guide an AI painting model to create images. They include various details such as character appearance, background, colors and lighting effects, as well as the theme and style of the image. These prompts are usually formatted with weighted numbers in parentheses to specify the importance or emphasis of certain details. For example, (masterpiece:1.4) means the quality of the work is very important. Here are some examples:
1. (8k, RAW photo, best quality, masterpiece:1.2),(realistic, photo-realistic:1.37), ultra-detailed, 1girl, cute, solo, beautiful detailed sky, detailed cafe, night, sitting, dating, (nose blush), (smile:1.1),(closed mouth), medium breasts, beautiful detailed eyes, (collared shirt:1.1), bowtie, pleated skirt, (short hair:1.2), floating hair, ((masterpiece)), ((best quality)),
2. (masterpiece, finely detailed beautiful eyes: 1.2), ultra-detailed, illustration, 1 girl, blue hair black hair, japanese clothes, cherry blossoms, tori, street full of cherry blossoms, detailed background, realistic, volumetric light, sunbeam, light rays, sky, cloud,
3. highres, highest quality, illustration, cinematic light, ultra detailed, detailed face, (detailed eyes, best quality, hyper detailed, masterpiece, (detailed face), blue hair, white hair, purple eyes, highest details, luminous eyes, medium breasts, black halo, white clothes, backlighting, (midriff:1.4), light rays, (high contrast), (colorful)
Following the previous prompts, write a prompt describing the following elements:
{desc}
You should respond only in JSON format, as described below:
Return format:
{{
    "question": "$YOUR_QUESTION_HERE",
    "answer": "$YOUR_ANSWER_HERE"
}}
Make sure the response can be parsed by Python json.loads.
"""
```

The final result is:

```
{"question": "湖人总冠军", "answer": "(masterpiece:1.4), ultra-detailed, 1man, strong, solo, detailed basketball court, detailed stadium, night, standing, celebrating, (fist pump), (smile:1.1), (closed mouth), muscular body, beautiful detailed eyes, (jersey:1.1), shorts, (short hair:1.2), floating hair, (trophy:1.3), (confetti:1.2), (fireworks:1.2), (crowd cheering:1.2), (high contrast), (colorful)"
}
```
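Why the braces in the JSON template are doubled: LangChain's `PromptTemplate` uses f-string style formatting by default, which follows Python's `str.format` rules, where `{desc}` is substituted and `{{`/`}}` become literal braces. The sketch below uses plain `str.format` on a trimmed stand-in for `_template` (the examples are omitted for brevity) to show what the LLM actually receives.

```python
# Trimmed stand-in for _template (assumption: few-shot examples left out).
template = """Following the previous prompts, write a prompt describing the following elements:
{desc}
You should respond only in JSON format, as described below:
{{"question": "$YOUR_QUESTION_HERE", "answer": "$YOUR_ANSWER_HERE"}}
Make sure the response can be parsed by Python json.loads."""

# {desc} is filled in; {{ and }} collapse to single literal braces.
filled = template.format(desc="Lakers win the championship")
print(filled)
```

Forgetting to double a brace would make the formatter treat `"question": ...` as a placeholder and fail at run time.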
This makes the data easy to parse:

```python
# Parse the JSON reply
import json

result = json.loads(res)
print("result:", result)
result["answer"]
```
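In practice an LLM sometimes wraps the JSON in extra prose ("Sure, here it is: ..."), and a bare `json.loads(res)` then fails. A minimal, more forgiving sketch, assuming the reply contains a single JSON object: try the whole string first, then fall back to the span between the first `{` and the last `}`. The helper name `extract_json` is my own.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the JSON object in an LLM reply, tolerating text around it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in the reply.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        # json.JSONDecodeError is itself a ValueError if this also fails.
        return json.loads(text[start:end + 1])


reply = 'Sure! Here it is:\n{"question": "q", "answer": "(smile:1.1)"}\nDone.'
print(extract_json(reply)["answer"])
```

`extract_json(res)["answer"]` can stand in for `json.loads(res)["answer"]` wherever the reply is parsed.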
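Generating prompts with ChatGLM

```python
# ChatGLM is the custom LangChain wrapper from the article referenced below,
# not a built-in LangChain class; prompt_history is the history that wrapper expects.
llm = ChatGLM(temperature=0.1, history=prompt_history)
prompt = PromptTemplate(
    input_variables=["desc"],
    template=_template,
)
chain = LLMChain(prompt=prompt, llm=llm)
```

ChatGLM here uses the wrapper from [Integrating ChatGLM into LangChain]. The results were not very good, so they are not shown here. The main problems were: 1. it did not produce JSON as instructed; 2. many of the generated descriptions were written in Chinese.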
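One common mitigation for a model that ignores the JSON instruction is to validate each reply and retry a few times. The sketch below assumes only that `call_llm` is some callable mapping the input text to the raw reply (for example a chain's `run` method); the function name and retry policy are hypothetical. With a temperature close to 0, retries may keep returning the same text, so this helps most when some sampling randomness is left.

```python
import json
from typing import Callable


def generate_prompt_json(call_llm: Callable[[str], str], desc: str, max_tries: int = 3) -> str:
    """Call the LLM until it returns JSON with an "answer" field, then return that field."""
    last_error: Exception | None = None
    for _ in range(max_tries):
        reply = call_llm(desc)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = exc  # not JSON at all: try again
            continue
        if isinstance(data, dict) and "answer" in data:
            return data["answer"]
        last_error = ValueError('reply has no "answer" field')
    raise RuntimeError(f"no valid JSON after {max_tries} tries") from last_error


# Usage with a fake model: the first reply is plain Chinese text, the second is valid JSON.
replies = iter(["湖人夺冠的描述", '{"question": "q", "answer": "(trophy:1.3)"}'])
print(generate_prompt_json(lambda _: next(replies), "q"))
```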
[MagicPrompt] Auto-completing SD prompts

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# MagicPrompt: a GPT-2 model fine-tuned to extend Stable Diffusion prompts
text_refine_tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
text_refine_model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
text_refine_gpt2_pipe = pipeline(
    "text-generation",
    model=text_refine_model,
    tokenizer=text_refine_tokenizer,
    device="cpu",
)

text = "湖人总冠军"  # "Lakers win the championship"
refined_text = text_refine_gpt2_pipe(text)[0]["generated_text"]
print(refined_text)
```
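Output:

```
湖人总冠军 港子 Imoko Ikeda, Minaba hideo, Yoshitaka Amano, Ruan Jia, Kentaro Miura, Artgerm, post processed, concept
```

With a purely English input ("lakers championship"), the output is:

```
lakers championship winner trending on artstation, painted by greg rutkowski
```

So MagicPrompt does not handle Chinese input well; to use it, translate the input into English first.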
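Before sending text to MagicPrompt, it helps to detect whether a translation step is needed at all. A minimal sketch that checks for Chinese characters in the CJK Unified Ideographs block and Extension A (the helper name is my own; the translation step itself is not shown and could be an LLM call or a translation model):

```python
import re

# CJK Unified Ideographs Extension A (U+3400-U+4DBF) and the basic block (U+4E00-U+9FFF).
CJK = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def needs_translation(text: str) -> bool:
    """Return True if the text contains Chinese characters."""
    return bool(CJK.search(text))


for t in ["湖人总冠军", "lakers championship"]:
    print(t, needs_translation(t))
```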
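Calling the Stable Diffusion API to generate images

This is based on [Mikubill/sd-webui-controlnet]. The main code is:

```python
import base64

import cv2
import requests

ENDPOINT = "http://localhost:7860"


def readImage(path):
    # Reconstructed helper (assumption: used later but not shown in this article):
    # read an image file and return it as a base64-encoded string.
    img = cv2.imread(path)
    _, buffer = cv2.imencode(".png", img)
    return base64.b64encode(buffer).decode("utf-8")


def do_webui_request(url, **kwargs):
    # Default txt2img parameters; any keyword argument overrides the matching key
    reqbody = {
        "prompt": "best quality, extremely detailed",
        "negative_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
        "seed": -1,
        "subseed": -1,
        "subseed_strength": 0,
        "batch_size": 1,
        "n_iter": 1,
        "steps": 15,
        "cfg_scale": 7,
        "width": 512,
        "height": 768,
        "restore_faces": True,
        "eta": 0,
        "sampler_index": "Euler a",
        # ControlNet fields as used in the referenced example; they only take
        # effect if the installed ControlNet extension's API accepts them
        "controlnet_input_images": [],
        "controlnet_module": "canny",
        "controlnet_model": "control_canny-fp16 [e3fe7712]",
        "controlnet_guidance": 1.0,
    }
    reqbody.update(kwargs)
    print("reqbody:", reqbody)
    r = requests.post(url, json=reqbody)
    return r.json()
```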
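`do_webui_request` returns the parsed JSON response, in which the generated images arrive as base64 strings in the `images` list; the list can hold more than one entry, for example when `batch_size` or `n_iter` is above 1. A small stdlib-only sketch that decodes all of them to raw bytes (the helper name and the data-URI handling are defensive assumptions of mine, not documented WebUI behavior):

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_images(resp: dict) -> list[bytes]:
    """Decode every base64 image in a txt2img JSON response to raw bytes."""
    images = []
    for b64 in resp.get("images", []):
        if b64.startswith("data:"):
            # Defensive: strip a "data:image/png;base64," prefix if present.
            b64 = b64.split(",", 1)[1]
        images.append(base64.b64decode(b64))
    return images
```

Each returned item can then be opened with `Image.open(io.BytesIO(data))` or written straight to a `.png` file.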
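Calling the API

```python
import base64
import io

from PIL import Image

prompt = "a cute cat"
resp = do_webui_request(
    url=ENDPOINT + "/sdapi/v1/txt2img",
    prompt=prompt,
)
# The generated image comes back base64-encoded in resp["images"]
image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))
display(image)  # display() is provided by Jupyter/IPython
```

To use the API, the Stable Diffusion WebUI must be started with the API enabled by adding the `--api` flag at launch.

Combining Stable Diffusion + LangChain + LLM to generate images automatically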
Stable Diffusion + LangChain + OpenAI

Implementation

```python
import base64
import io
import json
import os
import uuid

from PIL import Image


class RefinePrompt:
    """Expands a short description into an SD prompt via OpenAI + LangChain."""

    def __init__(self):
        llm = OpenAI(temperature=0)
        # _template is the JSON-format template defined above
        prompt = PromptTemplate(input_variables=["desc"], template=_template)
        self.chain = LLMChain(prompt=prompt, llm=llm)

    def run(self, text):
        res = self.chain.run(text)
        # Parse the JSON reply and keep only the generated prompt
        result = json.loads(res)
        return result["answer"]


class T2I:
    def __init__(self):
        self.text_refine = RefinePrompt()

    def inference(self, text):
        os.makedirs("output/image", exist_ok=True)
        image_filename = os.path.join("output/image", str(uuid.uuid4())[0:8] + ".png")
        refined_text = self.text_refine.run(text)
        print(f"{text} refined to {refined_text}")
        resp = do_webui_request(
            url=ENDPOINT + "/sdapi/v1/txt2img",
            prompt=refined_text,
        )
        image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))
        image.save(image_filename)
        print(f"Processed T2I.run, text: {text}, image_filename: {image_filename}")
        return image_filename, image
```
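The two `T2I` classes in this article differ only in how the prompt is refined (OpenAI via LangChain here, MagicPrompt below). One possible design, sketched under my own naming and not taken from the referenced code, is to inject both the refiner and the renderer as plain callables, which also makes the flow testable without an LLM or a running WebUI.

```python
import os
import uuid
from typing import Callable


class TextToImagePipeline:
    """Hypothetical generalization of T2I.

    refine: maps a short description to a full SD prompt (e.g. RefinePrompt().run).
    render: maps a prompt to PNG bytes (e.g. a wrapper around do_webui_request).
    """

    def __init__(self, refine: Callable[[str], str], render: Callable[[str], bytes],
                 out_dir: str = "output/image"):
        self.refine = refine
        self.render = render
        self.out_dir = out_dir

    def inference(self, text: str) -> tuple[str, str]:
        os.makedirs(self.out_dir, exist_ok=True)
        prompt = self.refine(text)
        filename = os.path.join(self.out_dir, str(uuid.uuid4())[:8] + ".png")
        with open(filename, "wb") as f:
            f.write(self.render(prompt))
        return filename, prompt
```

Swapping OpenAI for MagicPrompt then means passing a different `refine` function rather than writing a second class.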
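Use the wrapper class and display the image (in a Python notebook):

```python
t2i = T2I()
image_filename, image = t2i.inference("湖人总冠军")  # "Lakers win the championship"
print("filename:", image_filename)
display(image)
```

Stable Diffusion + MagicPrompt

Implementation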
```python
import base64
import io
import os
import uuid

from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


class T2I:
    def __init__(self, device):
        print("Initializing T2I to %s" % device)
        self.device = device
        self.text_refine_tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
        self.text_refine_model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
        self.text_refine_gpt2_pipe = pipeline(
            "text-generation",
            model=self.text_refine_model,
            tokenizer=self.text_refine_tokenizer,
            device=self.device,
        )

    def inference(self, text, image_path=None):
        os.makedirs("output/image", exist_ok=True)
        image_filename = os.path.join("output/image", str(uuid.uuid4())[0:8] + ".png")
        refined_text = self.text_refine_gpt2_pipe(text)[0]["generated_text"]
        print(f"{text} refined to {refined_text}")
        resp = do_webui_request(
            url=ENDPOINT + "/sdapi/v1/txt2img",
            prompt=refined_text,
            # Send a ControlNet input image only when one is given (never [None])
            controlnet_input_images=[readImage(image_path)] if image_path else [],
        )
        image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))
        image.save(image_filename)
        print(f"Processed T2I.run, text: {text}, image_filename: {image_filename}")
        return image_filename, image
```
Use the wrapper class and display the image (in a Python notebook):

```python
t2i = T2I("cpu")
image_filename, image = t2i.inference("lakers championship")
print("filename:", image_filename)
display(image)
```

Summary
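This article used Stable Diffusion + LangChain + LLM to generate an image automatically from a single sentence. Although the final results are not yet satisfying, they show that the approach is feasible. To improve the results further, you could try: 1. for each specific model, supply more example prompts written for that model to guide and refine the generated prompts; 2. combine it with ControlNet for better control over the final image. P.S. While studying the code of [Mikubill/sd-webui-controlnet], I found an example that imitates "Visual ChatGPT", which is quite interesting. I will analyze its implementation in a follow-up post, so stay tuned.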
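For the first suggestion (model-specific examples), one way to organize it is to keep a small library of example prompts per checkpoint and assemble the few-shot template from it. The sketch below is illustrative only: the model keys and the trimmed example strings are placeholders, and the result is meant to be passed as `template=` to `PromptTemplate`.

```python
# Hypothetical per-model example prompts; replace with prompts known to work on each checkpoint.
EXAMPLES = {
    "realistic": ["(8k, RAW photo, best quality, masterpiece:1.2), (realistic, photo-realistic:1.37), ultra-detailed, 1girl"],
    "anime": ["(masterpiece, finely detailed beautiful eyes: 1.2), ultra-detailed, illustration, 1 girl"],
}

HEADER = ("The following prompts are used to guide an AI painting model to create images. "
          "Here are some examples:")


def build_template(model: str) -> str:
    """Assemble a few-shot template for one model; {desc} stays a placeholder."""
    examples = "\n".join(f"{i}. {p}" for i, p in enumerate(EXAMPLES[model], start=1))
    # Only the f-string part is interpolated; "{desc}" is kept literally.
    return (f"{HEADER}\n{examples}\n"
            "Following the previous prompts, write a prompt describing the following elements:\n"
            "{desc}")
```

Example prompts that contain literal `{` or `}` would need doubling before use, since the template is later formatted.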