当前位置：首页 > news >正文

创业做社交网站有哪些校园网站建设需要哪些

news 2025/12/20 18:48:18

创业做社交网站有哪些,校园网站建设需要哪些,网站友链外链,企业网站的建立之前必须首先确定在自然语言处理领域#xff0c;有一个常见且重要的任务就是文本相似度搜索。文本相似度搜索是指根据用户输入的一段文本#xff0c;从数据库中找出与之最相似或最相关的一段或多段文本。它可以应用在很多场景中#xff0c;例如问答系统、推荐系统、搜索引擎等。比如#…在自然语言处理领域有一个常见且重要的任务就是文本相似度搜索。文本相似度搜索是指根据用户输入的一段文本从数据库中找出与之最相似或最相关的一段或多段文本。它可以应用在很多场景中例如问答系统、推荐系统、搜索引擎等。比如当用户在知乎上提出一个问题时系统就可以从知乎上已有的回答中找出与该问题最匹配或最有价值的回答并展示给用户。要实现类似高效的搜索我们需要使用一些特殊的数据结构和算法。其中向量相似度搜索是一种在大规模数据搜索中表现优秀的算法。而Redis作为一种高性能的键值数据库也可以帮助我们实现向量相似度搜索。在开始学习如何使用Redis实现向量相似度搜索之前需要了解向量及向量相似度搜索的基本知识和原理以便更好地理解后面的内容。什么是向量向量是数学、物理学和工程科学等多个自然科学中的基本概念它是一个具有方向和长度的量用于描述问题如空间几何、力学、信号处理等。在计算机科学中向量被用于表示数据如文本、图像或音频。此外向量还代表AI模型对文本、图像、音频、视频等非结构化数据的印象。向量相似度搜索的基本原理向量相似度搜索的基本原理是通过将数据集中的每个元素映射为向量并使用特定相似度计算算法如基于余弦相似度的、基于欧氏相似度或基于Jaccard相似度等算法找到与查询向量最相似的向量。 Redis实现向量相似度搜索了解原理后我们开始来实现如何使用Redis实现向量相似度搜索。Redis允许我们在FT.SEARCH命令中使用向量相似度查询。使我们可以加载、索引和查询作为Redis哈希或JSON文档中字段存储的向量。 //相关文档地址 Vector similarity | Redis 1、Redis Search安装关于Redis Search的安装和使用此处不再赘述如果您对此不熟悉可以参考上一篇文章 C#Redis Search如何用Redis实现高性能全文搜索 2、创建向量索引库这里我们使用NRedisStack和StackExchange.Redis两个库来与Redis进行交互操作。 //创建一个Redis连接 static ConnectionMultiplexer mux ConnectionMultiplexer.Connect(localhost); //获取一个Redis数据库 static IDatabase db mux.GetDatabase(); //创建一个RediSearch客户端 static SearchCommands ft new SearchCommands(db, null); 在进行向量搜索之前首先需要定义并创建索引并指定相似性算法。 public static async Task CreateIndexAsync() {await ft.CreateAsync(indexName,new FTCreateParams().On(IndexDataType.HASH).Prefix(prefix),new Schema().AddTagField(tag).AddTextField(content).AddVectorField(vector,VectorField.VectorAlgo.HNSW,new Dictionarystring, object(){[TYPE] FLOAT32,[DIM] 2,[DISTANCE_METRIC] COSINE})); } 这段代码的意思是使用了一个异步方法 ft.CreateAsync 来创建索引。它接受三个参数索引名称 indexName一个 FTCreateParams 对象和一个 Schema 对象FTCreateParams 类提供了一些参数选项用于指定索引的参数。这里使用 .On(IndexDataType.HASH) 方法来指定索引数据类型为哈希并使用 .Prefix(prefix) 方法来指定索引数据的前缀Schema 类用于定义索引中的字段和字段类型。这里定义了一个标签字段tag field用于区分过虑数据。定义了一个文本字段text field用于存储原始数据以及一个向量字段vector field用于存储经原始数据转化后的向量数据使用了 VectorField.VectorAlgo.HNSW 来指定向量算法为 HNSWHierarchical Navigable Small World。还传递了一个字典对象用于设置向量字段的参数。其中键为字符串类型值为对象类型。目前Redis支持两种相似度算法 HNSW分层导航小世界算法使用小世界网络构建索引具有快速查询速度和小内存占用时间复杂度为O(logn)适用于大规模索引。 FLAT暴力算法它对所有的键值对进行扫描然后根据键值对的距离计算出最短路径时间复杂度为O(n)其中n是键值对的数量。这种算法时间复杂度非常高只适用于小规模的索引。 3、添加向量到索引库索引创建后我们将数据添加到索引中。 public async Task SetAsync(string docId, string prefix, string tag, string content, float[] vector) {await db.HashSetAsync(${prefix}{docId}, new HashEntry[] {new HashEntry (tag, tag),new HashEntry (content, content),new HashEntry (vector, vector.SelectMany(BitConverter.GetBytes).ToArray())}); } SetAsync方法用于将一个具有指定文档ID、前缀、标签、内容及内容的向量存储到索引库中。并使用SelectMany()方法和BitConverter.GetBytes()方法将向量转换为一个字节数组。 4、向量搜索 Redis 支持两种类型的向量查询KNN查询和Range查询也可以将两种查询混合使用。 KNN 查询 KNN 查询用于在给定查询向量的情况下查找前 N 个最相似的向量。 public async IAsyncEnumerable(string Content, double Score) SearchAsync(float[] vector, int limit) {var query new Query($*[KNN {limit} vector $vector AS score]).AddParam(vector, vector.SelectMany(BitConverter.GetBytes).ToArray()).SetSortBy(score).ReturnFields(content, score).Limit(0, limit).Dialect(2);var result await ft.SearchAsync(indexName, query).ConfigureAwait(false);foreach (var document in result.Documents){yield return (document[content],Convert.ToDouble(document[score]));} } 这段代码的意思是创建一个查询对象 query并设置查询条件。查询条件包括 *[KNN {limit} vector $vector AS score]使用KNN算法进行向量相似度搜索限制结果数量为limit使用给定的向量vector作为查询向量将查询结果按照相似度得分进行排序AddParam(vector, vector.SelectMany(BitConverter.GetBytes).ToArray())将浮点数数组转换为字节数组并将其作为查询参数传递给查询SetSortBy(score)按照相似度得分对结果进行排序ReturnFields(content, score)将content和score两个字段从结果集中返回Limit(0, limit)限制结果集的起始位置为0结果数量为limitDialect(2)设置查询方言为2即Redis默认的查询语言Redis Protocol调用异步搜索方法 ft.SearchAsync(indexName, query)并等待搜索结果遍历搜索结果集 result.Documents将每个文档转换为 (string Content, double Score) 元组并通过 yield 语句进行迭代返回。 Range 查询 Range查询提供了一种根据 Redis 中的向量字段与基于某些预定义阈值半径的查询向量之间的距离来过滤结果的方法。类似于 NUMERIC 和 GEO 子句可以在查询中多次出现特别是可以和 KNN 进行混合搜索。 public static async IAsyncEnumerable(string Tag, string Content, double Score) SearchAsync(string tag, float[] vector, int limit) {var query new Query($(tag:{tag})[KNN {limit} vector $vector AS score]).AddParam(vector, vector.SelectMany(BitConverter.GetBytes).ToArray()).SetSortBy(score).ReturnFields(tag, content, score).Limit(0, limit).Dialect(2);var result await ft.SearchAsync(indexName, query).ConfigureAwait(false);foreach (var document in result.Documents){yield return (document[tag], document[content], Convert.ToDouble(document[score]));} } 这段代码使用了KNN和Range混合查询与上一段代码相比新增了tag参数将限制结果仅包含给定标签的内容。这样做可以增加查询的准确性提高查询效率。 5、从索引库中删除向量 public async Task DeleteAsync(string docId, string prefix) {await db.KeyDeleteAsync(${prefix}{docId}); } 这个方法通过删除与指定向量相关联的哈希缓存键来实现从索引库中删除指定向量数据。 6、删除向量索引库 public async Task DropIndexAsync() {await ft.DropIndexAsync(indexName, true); } 这个方法 await ft.DropIndexAsync接受两个参数 indexName 和 true 。indexName 表示索引库的名称 true 表示在删除索引时是否删除索引文件。 7、查询索引库信息 public async TaskInfoResult InfoAsync() {return await ft.InfoAsync(indexName); } 通过 await ft.InfoAsync(indexName) 方法我们可以获取到指定索引库的大小文档数量等相关索引库信息。完整 Demo 如下 using NRedisStack; using NRedisStack.Search; using NRedisStack.Search.DataTypes; using NRedisStack.Search.Literals.Enums; using StackExchange.Redis; using static NRedisStack.Search.Schema;namespace RedisVectorExample {class Program{//创建一个Redis连接static ConnectionMultiplexer mux ConnectionMultiplexer.Connect(localhost);//获取一个Redis数据库static IDatabase db mux.GetDatabase();//创建一个RediSearch客户端static SearchCommands ft new SearchCommands(db, null);//索引名称static string indexName test:index;//索引前缀static string prefix test:data;static async Task Main(string[] args){ //创建一个向量的索引await CreateIndexAsync();//添加一些向量到索引中await SetAsync(1, A, 测试数据A1, new float[] { 0.1f, 0.2f });await SetAsync(2, A, 测试数据A2, new float[] { 0.3f, 0.4f });await SetAsync(3, B, 测试数据B1, new float[] { 0.5f, 0.6f });await SetAsync(4, C, 测试数据C1, new float[] { 0.7f, 0.8f });//删除一个向量await DeleteAsync(4);//KUN搜索 await foreach (var (Content, Score) in SearchAsync(new float[] { 0.1f, 0.2f }, 2)){Console.WriteLine($内容{Content}相似度得分{Score});}//混合await foreach (var (Tag, Content, Score) in SearchAsync(A, new float[] { 0.1f, 0.2f }, 2)){Console.WriteLine($标签{Tag}内容{Content}相似度得分{Score});}//检查索引是否存在var info await InfoAsync();if (info ! null)await DropIndexAsync(); //存在则删除索引}public static async Task CreateIndexAsync(){await ft.CreateAsync(indexName,new FTCreateParams().On(IndexDataType.HASH).Prefix(prefix),new Schema().AddTagField(tag).AddTextField(content).AddVectorField(vector,VectorField.VectorAlgo.HNSW,new Dictionarystring, object(){[TYPE] FLOAT32,[DIM] 2,[DISTANCE_METRIC] COSINE}));}public static async Task SetAsync(string docId, string tag, string content, float[] vector){await db.HashSetAsync(${prefix}{docId}, new HashEntry[] {new HashEntry (tag, tag),new HashEntry (content, content),new HashEntry (vector, vector.SelectMany(BitConverter.GetBytes).ToArray())});}public static async Task DeleteAsync(string docId){await db.KeyDeleteAsync(${prefix}{docId});}public static async Task DropIndexAsync(){await ft.DropIndexAsync(indexName, true);}public static async TaskInfoResult InfoAsync(){return await ft.InfoAsync(indexName);}public static async IAsyncEnumerable(string Content, double Score) SearchAsync(float[] vector, int limit){var query new Query($*[KNN {limit} vector $vector AS score]).AddParam(vector, vector.SelectMany(BitConverter.GetBytes).ToArray()).SetSortBy(score).ReturnFields(content, score).Limit(0, limit).Dialect(2);var result await ft.SearchAsync(indexName, query).ConfigureAwait(false);foreach (var document in result.Documents){yield return (document[content], Convert.ToDouble(document[score]));}}public static async IAsyncEnumerable(string Tag, string Content, double Score) SearchAsync(string tag, float[] vector, int limit){var query new Query($(tag:{tag})[KNN {limit} vector $vector AS score]).AddParam(vector, vector.SelectMany(BitConverter.GetBytes).ToArray()).SetSortBy(score).ReturnFields(tag, content, score).Limit(0, limit).Dialect(2);var result await ft.SearchAsync(indexName, query).ConfigureAwait(false);foreach (var document in result.Documents){yield return (document[tag], document[content], Convert.ToDouble(document[score]));}}} } 篇幅原因先到这里下一篇我们接着探讨如何利用ChatGPT Embeddings技术提取文本向量并基于Redis实现文本相似度匹配。相比传统方法这种方式能够更好地保留文本的语义和情感信息从而更准确地反映文本的实质性内容。

查看全文

http://www.pierceye.com/news/24061/