当前位置：首页 > news >正文

WordPress不关站备案插件深圳相册制作公司

news 2025/11/11 15:24:18

WordPress不关站备案插件,深圳相册制作公司,网站建设第二年费用,网站规划的特点缓存对于提高搜索引擎的吞吐量#xff0c;降低CPU占用率极为重要。Lucene/Solr在这块做了很多的工作。Lucene/Solr中默认提供了5种缓存#xff0c;同时solr还提供扩展缓存接口#xff0c;允许开发者自定义缓存。1 缓存的基本原理Solr实现了两种策略的缓存#xff1a;LRU(Le… 缓存对于提高搜索引擎的吞吐量降低CPU占用率极为重要。Lucene/Solr在这块做了很多的工作。Lucene/Solr中默认提供了5种缓存同时solr还提供扩展缓存接口允许开发者自定义缓存。1 缓存的基本原理Solr实现了两种策略的缓存LRU(Leatest Recently Used)和LFU(Least Frequently Used)。这两种策略也用于操作系统的内存管理(页面置换)。当然缓存还有其它的策略比如FIFO、Rand等。无论是基于什么样的策略在应用中命中率高且实现简单的策略才是好策略。1.1 LRU策略LRU又称最近最少使用。假如缓存的容量为10那么把缓存中的对象按访问(插入)的时间先后排序当容量不足时***时间最早的。(当然真正的实现是通过链表维护时间先后顺序)1.1.1 LRUCacheSolr中LRUCache是通过LinkedHashMap来实现的。通过LRUCache的init方法就可以发现其代码如下 map new LinkedHashMapK,V(initialSize, 0.75f, true) {Overrideprotected boolean removeEldestEntry(Map.Entry eldest) {if (size() limit) {// increment evictions regardless of state.// this doesnt need to be synchronized because it will// only be called in the context of a higher level synchronized block.evictions;stats.evictions.incrementAndGet();return true;}return false;}};需要注意的是其构造参数的最后一个accessOrder。这里accessOrdertrue表明map.get()方法会改变链表的结构如果accessOrder为false则map.get()方法不对改变LinkedHashMap中链表的结构就无法体现最近最小使用这个特点了。由于LRUCache其本质是LinkedHashMap,而HashMap不是线程安全的所以就需要在get和put时进行同步锁住整个map所以在高并发条件下其性能会有所影响。因此Solr用另外一种方式实现了LRUCache即FastLRUCache。1.1.2 FastLRUCacheFastLRUCache内部采用了ConcurrentLRUCache实现而ConcurrentLRUCache内部又采用ConcurrentHashMap实现所以是线程安全的。缓存通过CacheEntry中的访问标记lastAccessed来维护CacheEntry被访问的先后顺序。即每当Cache有get或者put操作则当前CacheEntry的lastAccessed都会变成最大的(state.accessCounter)。当FastLRUCache容量已满时通过markAndSweep方式来剔除缓存中lastAccessed最小的N个项以保证缓存的大小达到一个acceptable的值。markAndSweep分两个阶段执行第一阶段收回最近最少使用的项如果经过第一阶段缓存的大小依然大于acceptable那么第二阶段将会开始。第二阶段会更加严格地把缓存的大小降下来。在第一阶段一个数轴就可以把运行原理解释清楚。对应代码如下(见ConcurrentLRUCache.markAndSweep方法) // since the wantToKeep group is likely to be bigger than wantToRemove, check it firstif (thisEntry newestEntry - wantToKeep) {// this entry is guaranteed not to be in the bottom// group, so do nothing.numKept;newOldestEntry Math.min(thisEntry, newOldestEntry);} else if (thisEntry oldestEntry wantToRemove) { // entry in bottom group?// this entry is guaranteed to be in the bottom group// so immediately remove it from the map.evictEntry(ce.key);numRemoved;} else {// This entry *could* be in the bottom group.// Collect these entries to avoid another full pass... this is wasted// effort if enough entries are normally removed in this first pass.// An alternate impl could make a full second pass.if (eSize eset.length-1) {eset[eSize] ce;newNewestEntry Math.max(thisEntry, newNewestEntry);newOldestEntry Math.min(thisEntry, newOldestEntry);}}} 看代码可知第一阶段会按相同的逻辑运行两次。一般来说经过第一阶段缓存的大小应该控制下来了。如果依然控制不下来那么就把上图中的待定Entry直接扔到指定大小的优先队列中。最后把优先队列中的Entry全部***。这样就能够保证缓存的Size降下来。其实如果一开始就直接上优先队列代码会少很多。但是程序的性能会降低好多。通过分析可以看到如果缓存中put操作频繁很容易触发markAndSweep方法的执行。而markAndSweep操作比较耗时。所以这部分的操作可以通过设置newThreadForCleanuptrue来优化。即新开一个线程执行。这样就不会阻塞put方法。在solrconfig.xml中配置是这样的cleanupThreadtrue。Cache在构造的时候就会开启一个线程。通过线程的wait/nofity来控制markAndSweep。从而避免了newThreadForCleanuptrue这样的不停开线程的开销总而言之缓存是通过markAndSweep来控制容量。 1.2 LFU策略LFU策略即【最近最少使用】策略。当缓存已满时设定时间段内使用次数最少的缓存将被剔除出去。通过前面的描述容易看出LFU策略实现时必须有一个计数器来记录Cache的Entry被访问的次数。Solr也正是这么干的。(看CacheEntry结构) private static class CacheEntryK, V implements ComparableCacheEntryK, V {K key;V value;volatile AtomicLong hits new AtomicLong(0);long hitsCopy 0;volatile long lastAccessed 0;long lastAccessedCopy 0;public CacheEntry(K key, V value, long lastAccessed) {this.key key;this.value value;this.lastAccessed lastAccessed;}很清楚地看到CacheEntry用hits 来记录访问次数。lastAccessed 存在则是为了应付控制缓存容量时如果在待***队列中出现hits相同的CacheEntry那么***lastAccessed 较小的一个。hitsCopy 和lastAccessedCopy的存在则是基于性能的考虑。避免多线程时内存跨越内存栅栏。 LFUCache通过ConcurrentLFUCache来实现而ConcurrentLFUCache内部又是ConcurrentHashMap。我们关注的重点放在ConcurrentLFUCache。ConcurrentLFUCache对容量的控制依然是markAndSweep,我猜想这是为了在代码可读性上与ConcurrentLRUCache保持一致。相对ConcurrentLRUCache的markAndSweep实现而言ConcurrentLFUCache的markAndSweep就比较简单了。用一个TreeSet来维护待***队列。TreeSet排序则是基于hits 和lastAccessed 。(可参看CacheEntry的comparTo方法)markAndSweep方法的核心代码如下TreeSetCacheEntry tree new TreeSetCacheEntry();for (CacheEntryK, V ce : map.values()) {// set hitsCopy to avoid later Atomic readsce.hitsCopy ce.hits.get();ce.lastAccessedCopy ce.lastAccessed;if (timeDecay) {ce.hits.set(ce.hitsCopy 1);}if (tree.size() wantToRemove) {tree.add(ce);} else {// If the hits are not equal, we can remove before adding// which is slightly fasterif (ce.hitsCopy tree.first().hitsCopy) {tree.remove(tree.first());tree.add(ce);} else if (ce.hitsCopy tree.first().hitsCopy) {tree.add(ce);tree.remove(tree.first());}}}for (CacheEntryK, V e : tree) {evictEntry(e.key);}Solr实现了LFUCache却没有再来一个FastLFUCache。因为LFUCache的实现用的是ConcurrentHashMap。能够很好的支持并发。如果非要来一个FastLFUCache那么就得用上非阻塞数据结构了。 2 缓存在Solr的中应用前面已经提到过Solr实现了各种层次的缓存。缓存由SolrIndexSearcher集中控制。分别应用在query、fact等查询相关的操作上。2.1 filterCachefilterCache在SolrIndexSearcher的定义如下SolrCacheQuery,DocSet filterCache; filterCache的key是Query,value是DocSet对象。而DocSet的基本功能就是过滤。filter在英语中的解释是过滤器。那么哪些地方有可能用到过滤功能呢?filterCache在solr中的应用包含以下场景1、查询参数facet.methodenum2、如果solrconfig.xml中配置useFilterForSortedQuery/ 为true3、查询参数含Facet.query或者group.query4、查询参数含fq 2.2 fieldvalueCachefieldValueCache在SolrIdexSearcher的定义如下SolrCacheString,UnInvertedField fieldValueCache; 其中key代表FieldNamevalue是一种数据结构UnInvertedField。 fieldValueCache在solr中只用于multivalued Field。一般用到它的就是facet操作。关于这个缓存需要注意的是如果没有在solrconfig.xml中配置那么它是默认存在的(初始大小10最大10000不会autowarm) 会有内存溢出的隐患。由于该cache的key为FieldName而一般一个solrCore中的字段最多也不过几百。在这么多字段中multivalued 字段会更少会用到facet操作的则少之又少。所以该在solrconfig.xml中的配置不必过大大了也是浪费。该缓存存储排序好的docIds一般是topN。这个缓存占用内存会比filterCache 小。因为它存储的是topN。但是如果QueryCommand中带有filter(DocSet类型)那么该缓存不会起作用。原因是DocSet在执行hashcode和equals方法时比较耗时。2.4 documentCache该缓存映射docId-Document。没有什么值得多说的。 2.5 自定义缓存如果solr中实现的缓存不满足需求。那么可以在SolrConfig.xml中自定义缓存。 cache namecclasssolr.FastLRUCachesize4096initialSize1024autowarmCount1024regeneratorcom.mycompany.cache.CacheRegenerator/需要写代码的地方就是 regeneratorcom.mycompany.cache.CacheRegenerator这里了。Regenerator在SolrIndexSearcher执行warm方法时会被调用。假如solr的索引2分钟更新一次为了保证更新的索引能够被搜索到那么就需要重新打开一个SolrIndexSearcher这时候就有一个问题SolrIndexSearcher里面的缓存怎么办如果把旧的缓存全部抛弃那么搜索的性能势必下降。Solr的做法是通过warm方法来预热缓存。即把通过原有缓存里面的Key值重新获取一次value。warm完毕后再切换到新的Searcher。regenrator里面的regenerateItem方法就是用来更新缓存。关注一下regenerateItem的参数 public boolean regenerateItem(SolrIndexSearcher newSearcher, SolrCache newCache, SolrCache oldCache, Object oldKey, Object oldVal) throws IOException;有SolrIndexSearcher,有oldCache,有oldKey有oldVal想查询结果很容易就能得到了。这样做的话已经***到Solr内部了不推荐。如果以后想要升级的话可能得重新改代码。升级维护不太方便。2.6 fieldCache我们知道lucene保存了正向索引(docId--field)和反向索引(field--docId)。反向索引是搜索的核心检索速度很快。但是如果我们需要快速由docId得到Field信息(比如按照某个字段排序字段值的信息统计solr facet功能)由于需要磁盘读取速度会比较慢。因此Lucene实现了fieldCache。Lucene实现了各种类型Field的缓存Byte,Short,Int,Float,Long……fieldCache是Lucene内部的缓存主要用于缓存Lucene搜索结果排序比如按时间排序等。由于fieldCache内部利用数组来存储数据(可以参看FieldCacheImpl源码)而且数组的大小开的都是maxDoc所以当数据量较大时fieldCache是相当消耗内存的所以很容易出现内存溢出问题。 fieldCache使用的样例可可参看如下的源代码。package com.vancl.cache;import java.io.IOException;import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.core.WhitespaceAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field.Store; import org.apache.lucene.document.IntField; import org.apache.lucene.document.StringField; import org.apache.lucene.index.DirectoryReader; import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.MatchAllDocsQuery; import org.apache.lucene.search.ScoreDoc; import org.apache.lucene.search.Sort; import org.apache.lucene.search.SortField; import org.apache.lucene.search.TopDocs; import org.apache.lucene.search.TopFieldCollector; import org.apache.lucene.store.Directory; import org.apache.lucene.store.RAMDirectory; import org.apache.lucene.util.Version;public class TestFieldCache {Directory d new RAMDirectory();Analyzer analyzer new WhitespaceAnalyzer(Version.LUCENE_42);IndexWriterConfig conf null;IndexWriter iw null;public void index() throws IOException{conf new IndexWriterConfig(Version.LUCENE_42,analyzer);iw new IndexWriter(d, conf);Document doc null;int[] ids {1,5,3,2,4,8,6,7,9,10};String[] addTimes{2012-12-12 12:12:12,2012-12-12 12:12:13,2012-12-12 12:12:14,2012-12-12 12:12:15,2012-12-12 12:12:11,2012-12-12 12:12:10,2012-12-12 12:12:09,2012-12-12 12:12:08,2012-12-12 12:12:07,2012-12-12 12:12:06} ;for(int i1;i10;i){docnew Document();doc.add(new StringField(addTime,addTimes[i-1], Store.YES));doc.add(new IntField(id,ids[i-1], Store.YES));iw.addDocument(doc);}iw.commit();iw.close();}public void query() throws IOException{IndexReader ir DirectoryReader.open(d);IndexSearcher is new IndexSearcher(ir);//按addTime逆序排序//Sort sort new Sort(new SortField(addTime, SortField.Type.STRING,true));Sort sort new Sort(new SortField(addTime, SortField.Type.STRING,true));//按id逆序排序//Sort sort new Sort(new SortField(id, SortField.Type.INT,true));TopFieldCollector collector TopFieldCollector.create(sort, 5, false, false, false, false);is.search(new MatchAllDocsQuery(),collector);TopDocs top collector.topDocs();for (ScoreDoc doc : top.scoreDocs) {// System.out.println(ir.document(doc.doc).get(id));System.out.println(ir.document(doc.doc).get(addTime));}}public static void main(String[] args) throws IOException {TestFieldCache c new TestFieldCache();c.index();c.query();} } 转载于:https://blog.51cto.com/sbp810050504/1421546

查看全文

http://www.pierceye.com/news/810886/