

news 2025/12/21 18:37:00

Purpose

This post is not about parsing HTML elements or about Python crawling techniques in general. It looks at web scraping only as a preprocessing step for web data. LlamaIndex is not the last word in crawling either; it is used here because it lets the later storage and querying steps live in one pipeline.

Install dependencies

    pip install llama-index-readers-web
    # pip install llama-index-embeddings-huggingface
    # pip install llama-index-llms-ollama

The commented-out lines are the extra packages that had to be installed later for the local embedding model and the Ollama LLM.

A first test

    vim test-web-bs.py

The official example, as published:

    from llama_index.core import VectorStoreIndex, download_loader

    from llama_index.readers.web import BeautifulSoupWebReader

    loader = BeautifulSoupWebReader()
    documents = loader.load_data(urls=["https://google.com"])
    index = VectorStoreIndex.from_documents(documents)
    index.query("What language is on this website?")

This code calls OpenAI by default, and google.com is not reachable from here either, so it does not run:

    Could not load OpenAI embedding model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
    Original error:
    No API key found for OpenAI.

Calling index.query("What language is on this website?") on its own also fails:

    AttributeError: 'VectorStoreIndex' object has no attribute 'query'

A heavily revised version that runs

    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.readers.web import BeautifulSoupWebReader
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama

    # Use a local embedding model instead of the OpenAI default.
    Settings.embed_model = HuggingFaceEmbedding(
        model_name="/root/RAGAll/models/bge-large-zh-v1.5"  # replace with your local model path
    )
    # Use an LLM served by Ollama instead of the OpenAI default.
    Settings.llm = Ollama(
        base_url="http://10.11.12.13:11434",
        model="qwen2.5_7b",
        context_window=4096,
        request_timeout=120.0,
    )

    loader = BeautifulSoupWebReader()
    documents = loader.load_data(urls=["https://mp.weixin.qq.com/s/xxx-yyy"])
    # print(documents)
    index = VectorStoreIndex.from_documents(documents)
    # Queries go through a query engine, not through the index itself.
    query_engine = index.as_query_engine(similarity_top_k=5, streaming=True)
    your_query = "本文主要讲了什么"  # "What is this article mainly about?"
    # print(query_engine.query(your_query).response)
    response = query_engine.query(your_query)
    response.print_response_stream()

The unused download_loader import from the official example has been dropped here.

A small improvement

Printing documents from the script above shows that the extracted body text carries a lot of stray characters and leftover page fragments. The loader below returns much cleaner text.

    from llama_index.readers.web import UnstructuredURLLoader

    urls = [
        "https://mp.weixin.qq.com/s/xyz",
    ]

    loader = UnstructuredURLLoader(
        urls=urls, continue_on_failure=False, headers={"User-Agent": "value"}
    )

    documents = loader.load_data()
    print(documents)

The error

    AttributeError: 'VectorStoreIndex' object has no attribute 'query'

According to the official documentation, VectorStoreIndex really does not have a query method, so the official example appears to be wrong. The index has to be turned into a query engine first:

    documents = loader.load_data(urls=["https://www.baidu.com"])
    query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()
    # Only the query engine exposes query().
    res = query_engine.query("What language is on this website?")
    # Answer observed: The language on this website is Chinese
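
The post notes that text pulled by BeautifulSoupWebReader carries many stray characters. Where switching loaders is not an option, a light cleanup pass before indexing is one possible workaround. The sketch below is an assumption, not something the original setup used: clean_page_text is a hypothetical helper built only on the Python standard library, and the set of characters it strips is an illustrative guess at typical extraction debris.

```python
import re

# Zero-width characters and the BOM often survive HTML-to-text extraction.
# This character set is an assumption; extend it for your own pages.
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Runs of horizontal whitespace, including the non-breaking space and the
# full-width (ideographic) space that is common on Chinese pages.
_HSPACE = re.compile("[ \t\u00a0\u3000]+")


def clean_page_text(text: str) -> str:
    """Hypothetical helper: tidy scraped page text before it is indexed.

    Removes invisible characters, collapses horizontal whitespace runs to a
    single space, trims each line, and drops lines that end up empty.
    """
    text = _INVISIBLE.sub("", text)
    lines = (_HSPACE.sub(" ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "\ufeff  Title\u3000\u3000Body text \n\n\n\t\u200b\n  second\u00a0line  \r\n"
    print(clean_page_text(sample))
```

The helper deliberately leaves punctuation and Unicode normalization alone, so Chinese full-width punctuation in the page body is preserved. It would be applied to the text of each loaded document before the index is built.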


http://www.pierceye.com/news/916607/