

news 2025/11/22 17:55:47

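0. Introduction

Solr is a search engine used in all kinds of query scenarios. An earlier article covered Solr's query syntax. Beyond ordinary queries, we sometimes need aggregation queries to compute statistics, so this article continues with Solr's aggregation query syntax.

1. Common aggregation query syntax

The demonstrations below run against the core created earlier (see the previous articles in this series). The core's fields are:

order_no: order number
address: address
product_name: product
status: status
labels: labels
remarks: remarks

The examples also use the id and create_time fields.

1.1 group: grouped queries

1.1.1 Overview

group implements simple grouped aggregation, numeric calculation and the like. Official documentation: https://solr.apache.org/guide/8_2/result-grouping.html

1.1.2 Parameters

group: when set to true, results are returned by group (group=true).

group.field: the field to group by, used together with group=true.

    q=*:*&group=true&group.field=product_name

group.limit: the number of docs returned per group, default 1. In the example above each group returns not only its count but also the matching documents. When the documents are not needed, set group.limit to 0.

    q=*:*&group=true&group.field=product_name&group.limit=0

group.func: group by the value computed by a function query. Supported functions include sum, min and max.

    q=*:*&group=true&group.func=sum(status)

group.query: group by query condition. For example, to split the data into the groups 0 <= id <= 2, 3 <= id <= 5 and 6 <= id <= 20:

    q=*:*&group=true&group.query=id:[0 TO 2]&group.query=id:[3 TO 5]&group.query=id:[6 TO 20]

group.format: accepts grouped or simple, default grouped. grouped shows results per group; simple shows the matching documents as one flat list.

    q=*:*&group=true&group.field=product_name&group.format=simple
    q=*:*&group=true&group.field=product_name&group.format=grouped

group.main: whether to use the result of the first grouping command as the main result set of the response, somewhat like group.format=simple. (The explanation that follows is my own understanding, not yet verified in depth. It may be wrong and is for reference only.)

    q=*:*&group=true&group.field=product_name&group.main=true

With several grouping conditions, what is shown is the detail list of the first grouping condition after they are ordered by priority. In my test, group.field=status matched 10 documents and group.query=id:[0 TO 4] matched 4. Because group.query=id:[0 TO 4] came first in that ordering, 4 documents were returned.

group.sort: the sort order of documents inside each group. The following groups by status and returns each group's docs by id descending.

    q=*:*&group=true&group.field=status&group.sort=id desc&group.limit=5

group.offset: the starting position of the docs returned for each group. With group.offset=6, the first 6 documents of each group are skipped and the list starts from the 7th.

    q=*:*&group=true&group.field=status&group.limit=4&group.offset=6

group.ngroups: whether to include the number of groups in the response, default false.

    q=*:*&group=true&group.field=status&group.limit=4&group.ngroups=true

group.truncate: default false. When true, facet counts are based on the most relevant document of each group that matches the query rather than on all documents. (Still under study. I have not found a suitable example yet.)

rows: the number of groups (buckets) returned, default 10. When there are more than 10 buckets, raise rows. It corresponds to LIMIT in SQL.

start: the position of the first group to return. Together with rows it gives paging.

sort: sorts between groups by the given field. Groups that have the field are also ordered first. The difference from group.sort is that sort controls the order between groups while group.sort controls the order inside a group.

    q=*:*&group=true&group.field=product_name&group.limit=10&rows=10&sort=status desc

group.cache.percent: a value from 0 to 100, default 0. Any value above 0 enables caching for result grouping, which caches the second of the two searches that grouping runs. According to the official documentation this speeds up Boolean, wildcard and fuzzy queries but slows down simple queries such as term or match-all queries.

1.1.3 Examples

1. Top 5 best-selling products

Approach: before starting, note that group alone cannot actually solve this, because group cannot sort by the size of each bucket. That needs facet, covered below. If only group is used, the sorting has to be done in Java code.

Query:

    q=*:*&group=true&group.field=product_name&group.limit=0&rows=100

SolrJ client code:

```java
import java.io.IOException;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import lombok.AllArgsConstructor;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.GroupParams;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("group")
@AllArgsConstructor
public class GroupSearchController {

    private final HttpSolrClient solrClient;

    @GetMapping("sellTopFive")
    public Map<String, Long> sellTopFive() {
        Map<String, Long> result = new HashMap<>();
        // Query condition: match everything, return up to 100 groups
        SolrQuery query = new SolrQuery().setQuery("*:*").setRows(100);
        // Grouping condition: group by product_name, no docs per group
        query.set(GroupParams.GROUP, true)
             .set(GroupParams.GROUP_FIELD, "product_name")
             .set(GroupParams.GROUP_LIMIT, 0);
        try {
            QueryResponse response = solrClient.query("orders", query);
            GroupResponse groupResponse = response.getGroupResponse();
            List<GroupCommand> values = groupResponse.getValues();
            // Only one grouping command (group.field) was sent
            GroupCommand group = values.get(0);
            List<Group> productGroups = group.getValues();
            for (Group productGroup : productGroups) {
                result.put(productGroup.getGroupValue(), productGroup.getResult().getNumFound());
            }
            // Sort by count descending and keep the first 5
            return result.entrySet().stream()
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                    .limit(5)
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                            (e1, e2) -> e1, LinkedHashMap::new));
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
            return result;
        }
    }
}
```

spring-data-solr client code (Orders is the entity class mapped to the core):

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import lombok.AllArgsConstructor;
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.core.query.Field;
import org.springframework.data.solr.core.query.GroupOptions;
import org.springframework.data.solr.core.query.SimpleField;
import org.springframework.data.solr.core.query.SimpleQuery;
import org.springframework.data.solr.core.query.SimpleStringCriteria;
import org.springframework.data.solr.core.query.result.GroupEntry;
import org.springframework.data.solr.core.query.result.GroupPage;
import org.springframework.data.solr.core.query.result.GroupResult;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("group")
@AllArgsConstructor
public class GroupSearchController {

    private final SolrTemplate solrTemplate;

    @GetMapping("sellTopFive2")
    public Map<String, Long> sellTopFive2() {
        Map<String, Long> result = new HashMap<>();
        // Grouping condition: group by product_name, no docs per group
        Field field = new SimpleField("product_name");
        SimpleQuery groupQuery = new SimpleQuery(new SimpleStringCriteria("*:*")).setRows(100);
        GroupOptions groupOptions = new GroupOptions().addGroupByField(field).setLimit(0);
        groupQuery.setGroupOptions(groupOptions);
        try {
            GroupPage<Orders> page = solrTemplate.queryForGroupPage("orders", groupQuery, Orders.class);
            GroupResult<Orders> fieldGroup = page.getGroupResult(field);
            List<GroupEntry<Orders>> content = fieldGroup.getGroupEntries().getContent();
            for (GroupEntry<Orders> ordersGroupEntry : content) {
                result.put(ordersGroupEntry.getGroupValue(), ordersGroupEntry.getResult().getTotalElements());
            }
            // Sort by count descending and keep the first 5
            return result.entrySet().stream()
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                    .limit(5)
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                            (e1, e2) -> e1, LinkedHashMap::new));
        } catch (Exception e) {
            e.printStackTrace();
            return result;
        }
    }
}
```

2. Number of orders per label

Approach: the difference from the previous example is that the grouping field labels is a Nested type (in Solr terms, presumably a multi-valued field). group.field requires a single-valued field, so this cannot be done with group. The facet section below shows how.

1.2 facet: grouped queries

1.2.1 Overview

facet is close to group in that both do grouped queries. facet, however, lets you process the result set a second time: it supports nested aggregation, and it can sort and filter by bucket size. group returns the detailed document list (docs) of each group, whereas facet returns no docs per group, only the statistic. facet and group can be combined. Official documentation: https://solr.apache.org/guide/8_2/faceting.html

facet supports 4 main types:

facet.query: grouping by custom queries
facet.field: grouping by field
facet.range: grouping by range
facet.date: grouping by date (the older date-specific form, deprecated since Solr 3.1 in favor of facet.range on date fields)

1.2.2 Parameters

facet: set to true to enable faceting.

facet.field: the field to count by. The visible difference from group is that no docs are returned per group.

    q=*:*&facet=true&facet.field=product_name

As with group.field, several fields can be given:

    q=*:*&facet=true&facet.field=product_name&facet.field=status

facet.pivot: nested grouping over several fields. The fields above are grouped separately. When one grouping should be broken down further by another, use facet.pivot. For example, to count each product within each status:

    q=*:*&facet=true&facet.pivot=status,product_name

facet.pivot.mincount: the minimum count for a nested bucket to be shown, default 1. Use it to hide buckets that are too small.

    q=*:*&facet=true&facet.pivot=status,product_name&facet.pivot.mincount=2

facet.mincount: the minimum bucket count. Unlike facet.pivot.mincount, it controls the ordinary buckets produced by facet.field.

facet.query: count by query condition. Several facet.query parameters can be given to build custom buckets.

    q=*:*&facet=true&facet.query=status:1&facet.query=status:2

facet.prefix: only values of the facet field that start with this prefix are counted.

    q=*:*&facet=true&facet.field=product_name&facet.prefix=小米

facet.contains: only values of the facet field that contain this string are counted.

    q=*:*&facet=true&facet.field=product_name&facet.contains=小米

facet.contains.ignoreCase: when true, the facet.contains match ignores case.

facet.matches: only values of the facet field that match this regular expression are counted. (I have not used this in practice. It needs verification. Note that as a regular expression, 米* means zero or more occurrences of 米.)

    q=*:*&facet=true&facet.field=product_name&facet.matches=米*

facet.sort: bucket order. Two values are allowed: count (by bucket size, descending) and index (by bucket name in index order). Default count.

    q=*:*&facet=true&rows=0&facet.field=product_name&facet.sort=index

facet.limit: the number of buckets returned, default 100.

    q=*:*&facet=true&rows=0&facet.field=product_name&facet.limit=5

facet.offset: the position of the first bucket to return.

    q=*:*&facet=true&rows=0&facet.field=product_name&facet.limit=5&facet.offset=2

facet.missing: whether to also count documents that have no value in the facet field, default false.

facet.method: the faceting algorithm. Three are supported: enum, fc and fcs. Default fc. See the official documentation for details.

facet.threads: the number of threads used to load the fields being faceted. The default 0 uses only the main request thread. A negative value allows up to Integer.MAX_VALUE threads.

Range faceting

facet.range: the field to facet by range.
facet.range.start: the lower bound of the ranges.
facet.range.end: the upper bound of the ranges.
facet.range.gap: the step, that is, the size of each range.
facet.range.hardend: whether facet.range.end is a hard upper limit for the last range, true or false, default false. When false, the last range keeps the full gap width and may extend past facet.range.end. When true, the last range is cut off at facet.range.end. For example, with start=1, end=5 and gap=3, false yields the ranges 1 to 4 and 4 to 7, while true yields 1 to 4 and 4 to 5.

    q=*:*&facet=true&rows=0&facet.range=status&facet.range.start=1&facet.range.end=5&facet.range.gap=3&facet.range.hardend=true

Grouping by month works the same way (%2B is the URL encoding of +):

    q=*:*&facet=true&rows=0&facet.range=create_time&facet.range.start=NOW/MONTH-12MONTH&facet.range.end=NOW/MONTH&facet.range.gap=%2B1MONTH

facet.range.include: whether each range includes its lower and upper bound. Default lower.
lower: every range includes its lower bound.
upper: every range includes its upper bound.
edge: the first and last gap ranges include their edge bounds (the lower bound for the first, the upper bound for the last) even when the corresponding lower/upper option is not given.
outer: the before and after ranges include their bounds, even if the first or last range already includes them.
all: all of the options above.

facet.range.other: counting rules for the other intervals. Values are before, after, between, none and all. Default none.
before: count the values before start.
after: count the values after end.
between: count all values from start to end. If hardend is true, this equals the sum of the counts of the individual ranges.
none: disabled.
all: before, between and after are all counted.

When several range fields are given, per-field parameters use the form f.<fieldname>.facet.range.*:

    facet.range=price&facet.range=age&facet.range=lastModified_dt
    &f.price.facet.range.end=1000.0
    &f.age.facet.range.start=99
    &f.lastModified_dt.facet.range.end=NOW/DAY+30DAYS

Interval faceting

facet.interval: the field to count by interval.
facet.interval.set: an interval to count.

    q=*:*&facet=true&rows=0&facet.interval=status&facet.interval.set=(0,1]&facet.interval.set=[1,3]

Date faceting

facet.date: the field to group by time. Like facet.field, it can be given several times.
facet.date.start: the start time.
facet.date.end: the end time.
facet.date.gap: the time step. With start 2023-01-01, end 2024-01-01 and gap +1MONTH, the period is divided into 12 intervals of one month.
facet.date.hardend: like facet.range.hardend.
facet.date.other: counting rules for the other intervals, like facet.range.other.

1.2.3 Examples

1. Top 5 best-selling products

facet sorts by bucket size in descending order by default, so nothing special is needed. To sort by bucket name instead, set facet.sort=index.

    q=*:*&facet=true&rows=0&facet.field=product_name&facet.limit=5

SolrJ code:

```java
import java.io.IOException;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

import lombok.AllArgsConstructor;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.FacetParams;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("facet")
@AllArgsConstructor
public class FacetSearchController {

    private final HttpSolrClient solrClient;

    /**
     * Top 5 best-selling products.
     */
    @GetMapping("sellTopFive")
    public Map<String, Long> sellTopFive() {
        Map<String, Long> result = new HashMap<>();
        // Query condition: match everything, return no documents
        SolrQuery query = new SolrQuery().setQuery("*:*").setRows(0);
        // Facet condition: count by product_name, keep 5 buckets
        query.set(FacetParams.FACET, true)
             .set(FacetParams.FACET_FIELD, "product_name")
             .set(FacetParams.FACET_LIMIT, 5);
        try {
            QueryResponse response = solrClient.query("orders", query);
            FacetField facetFields = response.getFacetField("product_name");
            for (FacetField.Count value : facetFields.getValues()) {
                result.put(value.getName(), value.getCount());
            }
            // Sort by count; delete this step if no ordering is needed
            return result.entrySet().stream()
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                            (e1, e2) -> e1, LinkedHashMap::new));
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
        }
        return result;
    }
}
```

spring-data-solr code (a method of a controller that holds a SolrTemplate, as in the group example):

```java
@GetMapping("sellTopFive2")
public Map<String, Long> sellTopFive2() {
    Map<String, Long> result = new HashMap<>();
    // Facet condition: count by product_name, keep 5 buckets
    Field field = new SimpleField("product_name");
    SimpleFacetQuery query = new SimpleFacetQuery(new SimpleStringCriteria("*:*")).setRows(0);
    FacetOptions facetOptions = new FacetOptions().addFacetOnField(field).setFacetLimit(5);
    query.setFacetOptions(facetOptions);
    try {
        FacetPage<Orders> page = solrTemplate.queryForFacetPage("orders", query, Orders.class);
        Page<FacetFieldEntry> pageResult = page.getFacetResultPage("product_name");
        List<FacetFieldEntry> content = pageResult.getContent();
        for (FacetFieldEntry facetFieldEntry : content) {
            result.put(facetFieldEntry.getValue(), facetFieldEntry.getValueCount());
        }
        // Sort by count descending
        return result.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(5)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (e1, e2) -> e1, LinkedHashMap::new));
    } catch (Exception e) {
        e.printStackTrace();
        return result;
    }
}
```

2. Number of orders per label

facet.field supports Nested (multi-valued) fields, so the field can be queried directly. The number of buckets is governed by facet.limit, which defaults to 100. There are fewer than 10 labels here, so no limit is set in the query.

    q=*:*&facet=true&rows=0&facet.field=labels

SolrJ code:

```java
@GetMapping("labelCount")
public Map<String, Long> labelCount() {
    Map<String, Long> result = new HashMap<>();
    // Query condition: match everything, return no documents
    SolrQuery query = new SolrQuery().setQuery("*:*").setRows(0);
    // Facet condition: count by labels
    query.set(FacetParams.FACET, true)
         .set(FacetParams.FACET_FIELD, "labels")
         .set(FacetParams.FACET_LIMIT, 100);
    try {
        QueryResponse response = solrClient.query("orders", query);
        FacetField facetFields = response.getFacetField("labels");
        for (FacetField.Count value : facetFields.getValues()) {
            result.put(value.getName(), value.getCount());
        }
    } catch (SolrServerException | IOException e) {
        e.printStackTrace();
    }
    return result;
}
```

3. Monthly top 5 best-selling products over the last year

Approach: to get the top 5 products of each month we clearly have to group by month and then by product. That is a nested grouping, so facet.pivot is needed. We have a creation date field, create_time. There are several ways to do it:

1. Add a redundant month field for the nested query. This means changing the managed-schema and reindexing. Solr has no counterpart here to computing such a field on the fly as Elasticsearch can, so it is not very convenient. If the index is large and reindexing would affect production use, consider building a new core, switching queries over to it once it is synchronized, and then deleting the old core.

2. First facet by year-month, then in client code loop over the year-months that have data and use each one as a query condition to facet by product, giving the product breakdown of each month. The drawback is the amount of network I/O. If the query has strict latency requirements this may not be acceptable.

3. If the data is already aggregated by day (that is, create_time has the form yyyy-MM-dd with no time part), or the volume per day is small, first collect the data per day with facet.pivot=create_time,product_name and then roll it up by month in Java code. This suits cases where the per-day volume is small. For one year, daily buckets number only about 365, which is still acceptable.

4. The friendliest way would be a function that converts a date to a month, something like month(create_time), used as facet.pivot=month(create_time),product_name. I have not found such a function so far, and Solr itself does not seem to support this. Its aggregation support still lags well behind Elasticsearch. If you know a better way, please leave a comment so we can learn from each other.

My data volume is small, so I simply use option 3. Note that the query as written matches all documents. It does not itself restrict create_time to the last year.

    q=*:*&facet=true&rows=0&facet.pivot=create_time,product_name&facet.sort=index

SolrJ client code:

```java
@GetMapping("productTopFiveMonthlyNearYear")
public Map<String, Map<String, Integer>> productTopFiveMonthlyNearYear() {
    // TreeMap keeps the yyyyMM keys in ascending order
    Map<String, Map<String, Integer>> result = new TreeMap<>();
    // Query condition: match everything, return no documents
    SolrQuery query = new SolrQuery().setQuery("*:*").setRows(0);
    // Facet condition: pivot by create_time, then product_name
    query.set(FacetParams.FACET, true)
         .set(FacetParams.FACET_PIVOT, "create_time,product_name")
         .set(FacetParams.FACET_SORT, FacetParams.FACET_SORT_INDEX);
    try {
        QueryResponse response = solrClient.query("orders", query);
        List<PivotField> pivotFields = response.getFacetPivot().get("create_time,product_name");
        SimpleDateFormat format = new SimpleDateFormat("yyyyMM");
        // Assemble the result by year-month
        for (PivotField field : pivotFields) {
            String month = format.format(field.getValue());
            Map<String, Integer> monthMap = result.computeIfAbsent(month, k -> new LinkedHashMap<>());
            for (PivotField pivotField : field.getPivot()) {
                // Add up the sales of a product seen on several days of the month
                monthMap.merge(pivotField.getValue().toString(), pivotField.getCount(), Integer::sum);
            }
        }
        // Keep the top 5 products of each month
        for (Map.Entry<String, Map<String, Integer>> entry : result.entrySet()) {
            Map<String, Integer> sortMap = entry.getValue().entrySet().stream()
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                    .limit(5)
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                            (e1, e2) -> e1, LinkedHashMap::new));
            entry.setValue(sortMap);
        }
    } catch (SolrServerException | IOException e) {
        e.printStackTrace();
    }
    return result;
}
```

1.3 Extension: stats queries

When you need the average, maximum, minimum or similar statistics of a field, combine the query with stats. For usage see the official documentation: https://solr.apache.org/guide/8_2/the-stats-component.html

2. Summary

That concludes this walkthrough of grouped aggregation queries in Solr. group suits simple grouped queries, while facet is the better fit for more complex ones. The actual choice depends on your business scenario.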
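
As a footnote to option 3 of the monthly top-5 example: the client-side roll-up from daily buckets to a monthly top N can be separated from the Solr call and tried on its own. The following is only a minimal sketch under stated assumptions. The class MonthlyTopN, the type DailyCount and the method topPerMonth are hypothetical names introduced for illustration and are not part of SolrJ. The input stands in for the (day, product, count) triples that a facet.pivot=create_time,product_name response would yield.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MonthlyTopN {

    /** Hypothetical holder for one (day, product, count) pivot bucket. */
    public static final class DailyCount {
        final LocalDate day;
        final String product;
        final int count;

        public DailyCount(LocalDate day, String product, int count) {
            this.day = day;
            this.product = product;
            this.count = count;
        }
    }

    private static final DateTimeFormatter MONTH_KEY = DateTimeFormatter.ofPattern("yyyyMM");

    /** Sums daily counts per month and product, then keeps the n largest per month. */
    public static Map<String, Map<String, Integer>> topPerMonth(List<DailyCount> buckets, int n) {
        // TreeMap keeps the yyyyMM keys in ascending order
        Map<String, Map<String, Integer>> totals = new TreeMap<>();
        for (DailyCount bucket : buckets) {
            totals.computeIfAbsent(bucket.day.format(MONTH_KEY), k -> new LinkedHashMap<>())
                  .merge(bucket.product, bucket.count, Integer::sum);
        }
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> month : totals.entrySet()) {
            Map<String, Integer> top = month.getValue().entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                    .limit(n)
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                            (first, second) -> first, LinkedHashMap::new));
            result.put(month.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<DailyCount> sample = List.of(
                new DailyCount(LocalDate.of(2023, 1, 3), "phone", 2),
                new DailyCount(LocalDate.of(2023, 1, 9), "phone", 1),
                new DailyCount(LocalDate.of(2023, 1, 9), "laptop", 5),
                new DailyCount(LocalDate.of(2023, 2, 1), "phone", 4));
        // prints {202301={laptop=5}, 202302={phone=4}}
        System.out.println(topPerMonth(sample, 1));
    }
}
```

Keeping this step free of Solr types makes it easy to unit test. In a real controller the DailyCount list would be filled from the PivotField entries before calling topPerMonth.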

