A couple of weeks ago I wrote a blog post about running TF/IDF over the HIMYM transcripts with scikit-learn to find the most important phrases by episode. Afterwards I was curious how hard it would be to do the same thing in Neo4j.

I started by translating one of Wikipedia's TF/IDF examples into Cypher to see what the algorithm would look like:

    WITH 3 as termFrequency, 2 AS numberOfDocuments, 1 as numberOfDocumentsWithTerm
    WITH termFrequency, log10(numberOfDocuments / numberOfDocumentsWithTerm) AS inverseDocumentFrequency
    return termFrequency * inverseDocumentFrequency

This returns 0.9030899869919435, i.e. 3 × log10(2 / 1).

Next I needed to go through the HIMYM episode transcripts and extract the phrases in each episode along with their counts. I did this with scikit-learn's CountVectorizer and wrote the results to a CSV file. Here's a preview of that file:

    $ head -n 10 data/import/words_scikit.csv
    EpisodeId,Phrase,Count
    1,2005,1
    1,2005 seven,1
    1,2005 seven just,1
    1,2030,3
    1,2030 kids,1
    1,2030 kids intently,1
    1,2030 narrator,1
    1,2030 narrator kids,1
    1,2030 son,1

Now let's import that into Neo4j using the LOAD CSV tool:

    // phrases
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/words_scikit.csv" AS row
    MERGE (phrase:Phrase {value: row.Phrase});

    // episode -> phrase
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/words_scikit.csv" AS row
    MATCH (phrase:Phrase {value: row.Phrase})
    MATCH (episode:Episode {id: TOINT(row.EpisodeId)})
    MERGE (episode)-[:CONTAINED_PHRASE {times:TOINT(row.Count)}]->(phrase);

Now that all the data is in, we can translate the TF/IDF query so that it uses our graph. We'll start with episode 1:

    match (e:Episode)
    WITH COUNT(e) AS numberOfDocuments
    // termFrequency: how many times each phrase occurs in episode 1
    match (p:Phrase)-[r:CONTAINED_PHRASE]-(e:Episode {id: 1})
    WITH numberOfDocuments, p, r.times AS termFrequency
    // numberOfDocumentsWithTerm: how many episodes (episode 1 included) contain the phrase
    MATCH (p)-[:CONTAINED_PHRASE]-(otherEpisode)
    WITH p, COUNT(otherEpisode) AS numberOfDocumentsWithTerm, numberOfDocuments, termFrequency
    // both operands are integer counts, so this division truncates
    WITH p, numberOfDocumentsWithTerm, log10(numberOfDocuments / numberOfDocumentsWithTerm) AS inverseDocumentFrequency, termFrequency, numberOfDocuments
    RETURN p.value, termFrequency, numberOfDocumentsWithTerm, inverseDocumentFrequency, termFrequency * inverseDocumentFrequency AS score
    ORDER BY score DESC
    LIMIT 10

    +----------------------+---------------+---------------------------+--------------------------+--------------------+
    | p.value              | termFrequency | numberOfDocumentsWithTerm | inverseDocumentFrequency | score              |
    +----------------------+---------------+---------------------------+--------------------------+--------------------+
    | olives               | 18            | 2                         | 2.0170333392987803       | 36.306600107378046 |
    | yasmine              | 13            | 1                         | 2.3180633349627615       | 30.1348233545159   |
    | signal               | 11            | 5                         | 1.6127838567197355       | 17.740622423917088 |
    | goanna               | 10            | 4                         | 1.7160033436347992       | 17.16003343634799  |
    | flashback date       | 6             | 1                         | 2.3180633349627615       | 13.908380009776568 |
    | scene                | 17            | 37                        | 0.6989700043360189       | 11.88249007371232  |
    | flashback date robin | 5             | 1                         | 2.3180633349627615       | 11.590316674813808 |
    | ted yasmine          | 5             | 1                         | 2.3180633349627615       | 11.590316674813808 |
    | smurf pen1s          | 5             | 2                         | 2.0170333392987803       | 10.085166696493902 |
    | eye patch            | 5             | 2                         | 2.0170333392987803       | 10.085166696493902 |
    +----------------------+---------------+---------------------------+--------------------------+--------------------+
    10 rows

The scores we've calculated are different from the ones scikit-learn produced, but the relative ordering seems about right, so that's good.

The neat thing about doing this calculation in Neo4j is that we can now change the "inverse document" part of the equation. For example, we can find the most important phrases in a season rather than in an episode:

    match (:Season)
    // the "documents" are now seasons
    WITH COUNT(*) AS numberOfDocuments
    // termFrequency: total occurrences of the phrase across the episodes of season 1
    match (p:Phrase)-[r:CONTAINED_PHRASE]-(:Episode)-[:IN_SEASON]-(s:Season {number: 1})
    WITH p, SUM(r.times) AS termFrequency, numberOfDocuments
    // numberOfDocumentsWithTerm: number of distinct seasons the phrase appears in
    MATCH (p)-[:CONTAINED_PHRASE]-(otherEpisode)-[:IN_SEASON]-(s:Season)
    WITH p, COUNT(DISTINCT s) AS numberOfDocumentsWithTerm, termFrequency, numberOfDocuments
    WITH p, numberOfDocumentsWithTerm, log10(numberOfDocuments / numberOfDocumentsWithTerm) AS inverseDocumentFrequency, termFrequency, numberOfDocuments
    RETURN p.value, termFrequency, numberOfDocumentsWithTerm, inverseDocumentFrequency, termFrequency * inverseDocumentFrequency AS score
    ORDER BY score DESC
    LIMIT 10

    +---------------+---------------+---------------------------+--------------------------+--------------------+
    | p.value       | termFrequency | numberOfDocumentsWithTerm | inverseDocumentFrequency | score              |
    +---------------+---------------+---------------------------+--------------------------+--------------------+
    | moby          | 46            | 1                         | 0.9542425094393249       | 43.895155434208945 |
    | int           | 71            | 3                         | 0.47712125471966244      | 33.87560908509603  |
    | ellen         | 53            | 2                         | 0.6020599913279624       | 31.909179540382006 |
    | claudia       | 104           | 4                         | 0.3010299956639812       | 31.307119549054043 |
    | ericksen      | 59            | 3                         | 0.47712125471966244      | 28.150154028460083 |
    | party number  | 29            | 1                         | 0.9542425094393249       | 27.67303277374042  |
    | subtitle      | 27            | 1                         | 0.9542425094393249       | 25.76454775486177  |
    | vo            | 47            | 3                         | 0.47712125471966244      | 22.424698971824135 |
    | ted vo        | 47            | 3                         | 0.47712125471966244      | 22.424698971824135 |
    | future ted vo | 45            | 3                         | 0.47712125471966244      | 21.47045646238481  |
    +---------------+---------------+---------------------------+--------------------------+--------------------+
    10 rows

This query tells us that "Moby" is mentioned in only one season of the series. In fact, all of those mentions are in the same episode.

"int" looks more like a data problem. Some transcripts give the location of each scene ("INT." is the screenplay abbreviation for an interior scene), but many don't:

    $ ack -iw int data/import/sentences.csv
    2361,8,1,8,INT. LIVING ROOM, YEAR 2030
    2377,8,1,8,INT. CHINESE RESTAURANT
    2395,8,1,8,INT. APARTMENT
    2412,8,1,8,INT. APARTMENT
    2419,8,1,8,INT. BAR
    2472,8,1,8,INT. APARTMENT
    2489,8,1,8,INT. BAR
    2495,8,1,8,INT. APARTMENT
    2506,8,1,8,INT. BAR
    2584,8,1,8,INT. APARTMENT
    2629,8,1,8,INT. RESTAURANT
    2654,8,1,8,INT. APARTMENT
    2682,8,1,8,INT. RESTAURANT
    2689,8,1,8,(Robin gets up and leaves restaurant) INT. HOSPITAL WAITING AREA

"vo" stands for voice-over and should probably be removed as a stop word, since it doesn't add much. It only shows up here because the transcripts are inconsistent in how they indicate that Future Ted is speaking.

Let's have a look at the final season and see how that fares:

    match (:Season)
    WITH COUNT(*) AS numberOfDocuments
    match (p:Phrase)-[r:CONTAINED_PHRASE]-(:Episode)-[:IN_SEASON]-(s:Season {number: 9})
    WITH p, SUM(r.times) AS termFrequency, numberOfDocuments
    MATCH (p)-[:CONTAINED_PHRASE]-(otherEpisode:Episode)-[:IN_SEASON]-(s:Season)
    WITH p, COUNT(DISTINCT s) AS numberOfDocumentsWithTerm, termFrequency, numberOfDocuments
    WITH p, numberOfDocumentsWithTerm, log10(numberOfDocuments / numberOfDocumentsWithTerm) AS inverseDocumentFrequency, termFrequency, numberOfDocuments
    RETURN p.value, termFrequency, numberOfDocumentsWithTerm, inverseDocumentFrequency, termFrequency * inverseDocumentFrequency AS score
    ORDER BY score DESC
    LIMIT 10

    +--------------------+---------------+---------------------------+--------------------------+--------------------+
    | p.value            | termFrequency | numberOfDocumentsWithTerm | inverseDocumentFrequency | score              |
    +--------------------+---------------+---------------------------+--------------------------+--------------------+
    | ring bear          | 28            | 1                         | 0.9542425094393249       | 26.718790264301095 |
    | click options      | 26            | 1                         | 0.9542425094393249       | 24.810305245422448 |
    | thank linus        | 26            | 1                         | 0.9542425094393249       | 24.810305245422448 |
    | vow                | 39            | 2                         | 0.6020599913279624       | 23.480339661790534 |
    | just click         | 24            | 1                         | 0.9542425094393249       | 22.901820226543798 |
    | rehearsal dinner   | 23            | 1                         | 0.9542425094393249       | 21.947577717104473 |
    | linus              | 36            | 2                         | 0.6020599913279624       | 21.674159687806647 |
    | just click options | 22            | 1                         | 0.9542425094393249       | 20.993335207665147 |
    | locket             | 32            | 2                         | 0.6020599913279624       | 19.265919722494797 |
    | cassie             | 19            | 1                         | 0.9542425094393249       | 18.13060767934717  |
    +--------------------+---------------+---------------------------+--------------------------+--------------------+

Barney and Robin's wedding has several phrases specific to it ("vow", "ring bear", "rehearsal dinner"), so it makes sense for those to be near the top.

"linus" here mostly refers to the waiter at the bar who Lily interacts with. A quick search of the transcripts shows that she also has an Uncle Linus, though:

    $ ack -iw linus data/import/sentences.csv | head -n 5
    18649,61,3,17,Lily: Why dont we just call Duluth Mental Hospital and say my Uncle Linus can live with us?
    59822,185,9,1,Linus.
    59826,185,9,1,Are you my guy, Linus?
    59832,185,9,1,Thank you Linus.
    59985,185,9,1,Thank you, Linus.
    ...

Having done this exercise, I think TF/IDF is an interesting way to explore unstructured data. However, for a phrase to be really interesting to us, it should appear in multiple episodes/seasons. One way to achieve that would be to give those features more weight, so I'll try that next.

If you want to take a look and improve on it, all the code from this post is on github.

Translated from: https://www.javacodegeeks.com/2015/03/neo4j-tfidf-and-variants-with-cypher.html
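The inverse document frequency columns in the result tables are consistent with integer division. Phrases found in a single episode get 2.318..., which is log10(208), so there are 208 episodes in total. "signal" (in 5 episodes) gets log10(41) rather than log10(41.6), because 208 / 5 is truncated to 41. "scene" (in 37 episodes) likewise gets log10(5). This matches how Cypher evaluates / between two integers.

Below is a minimal Python sketch, not the post's own code, that reproduces the score column with // and also shows the untruncated value for comparison. The function names are illustrative. In Cypher, converting one operand with toFloat() should give the untruncated variant.

```python
import math


def cypher_style_tfidf(term_frequency: int, number_of_documents: int,
                       number_of_documents_with_term: int) -> float:
    """tf * log10(N / df), truncating N / df the way Cypher divides two integers."""
    inverse_document_frequency = math.log10(number_of_documents // number_of_documents_with_term)
    return term_frequency * inverse_document_frequency


def untruncated_tfidf(term_frequency: int, number_of_documents: int,
                      number_of_documents_with_term: int) -> float:
    """The same formula with true division, so N / df keeps its fractional part."""
    inverse_document_frequency = math.log10(number_of_documents / number_of_documents_with_term)
    return term_frequency * inverse_document_frequency


# Wikipedia example: tf = 3, N = 2 documents, the term occurs in 1 of them.
print(cypher_style_tfidf(3, 2, 1))

# "signal" in episode 1: tf = 11, found in 5 of the 208 episodes.
print(cypher_style_tfidf(11, 208, 5))   # ≈ 17.7406, the score shown in the table
print(untruncated_tfidf(11, 208, 5))    # slightly higher: log10(41.6) instead of log10(41)

# "ellen" in season 1: tf = 53, found in 2 of the 9 seasons (9 // 2 == 4).
print(cypher_style_tfidf(53, 9, 2))
```

Truncation only matters when N is not a multiple of df. It always rounds the IDF down, so phrases spread over a few documents are penalised slightly more than the textbook formula would.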
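The season-level query changes what counts as a "document" from an episode to a season. Term frequency becomes the sum of a phrase's counts over the season's episodes, and document frequency becomes the number of distinct seasons the phrase appears in.

A rough Python equivalent that works directly on rows shaped like words_scikit.csv might look like the sketch below. The function season_tfidf and the season_of_episode mapping (which stands in for the IN_SEASON relationships in the graph) are hypothetical. The sample rows are made up for illustration.

```python
import csv
import io
import math
from collections import defaultdict


def season_tfidf(rows, season_of_episode, season, number_of_seasons):
    """Rank phrases for one season, mirroring the season-level Cypher query.

    rows: dicts with EpisodeId, Phrase and Count keys (the words_scikit.csv columns)
    season_of_episode: hypothetical mapping of episode id -> season number
    """
    term_frequency = defaultdict(int)        # like SUM(r.times) for the chosen season
    seasons_with_phrase = defaultdict(set)   # like COUNT(DISTINCT s) per phrase
    for row in rows:
        episode_season = season_of_episode[int(row["EpisodeId"])]
        phrase = row["Phrase"]
        seasons_with_phrase[phrase].add(episode_season)
        if episode_season == season:
            term_frequency[phrase] += int(row["Count"])

    scores = {}
    for phrase, tf in term_frequency.items():
        df = len(seasons_with_phrase[phrase])
        # Integer division, to match how the Cypher query divides two counts.
        scores[phrase] = tf * math.log10(number_of_seasons // df)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample: episodes 1 and 2 are in season 1, episode 3 is in season 2.
sample = """EpisodeId,Phrase,Count
1,moby,30
2,moby,16
1,ted,40
3,ted,25
"""
season_of_episode = {1: 1, 2: 1, 3: 2}

ranking = season_tfidf(csv.DictReader(io.StringIO(sample)), season_of_episode,
                       season=1, number_of_seasons=2)
print(ranking)
```

A phrase that turns up in every season ends up with an IDF of log10(1) = 0. That is why "ted" scores 0 in this toy example even though it is frequent in season 1.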

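The post notes that these scores differ from scikit-learn's, but doesn't say which settings the earlier post used. Suppose scikit-learn's defaults were used (TfidfTransformer/TfidfVectorizer with smooth_idf=True). In that case the IDF is ln((1 + n) / (1 + df)) + 1 rather than log10(n / df), and each document's vector is then L2-normalised.

The pure-Python sketch below (no scikit-learn import; function names are illustrative) applies both IDF formulas to four episode-1 phrases from the first table, with n = 208 episodes. It shows how the absolute scores can differ while the ranking of these phrases stays the same.

```python
import math

N_EPISODES = 208  # implied by the inverse document frequency column of the episode-1 table


def cypher_idf(n_docs: int, df: int) -> float:
    # log10(N / df), with N / df truncated as in the Cypher query
    return math.log10(n_docs // df)


def smoothed_idf(n_docs: int, df: int) -> float:
    # scikit-learn's documented formula when smooth_idf=True: ln((1 + n) / (1 + df)) + 1
    return math.log((1 + n_docs) / (1 + df)) + 1


# phrase -> (term frequency in episode 1, number of episodes containing it), from the table
episode_1 = {
    "olives": (18, 2),
    "yasmine": (13, 1),
    "signal": (11, 5),
    "scene": (17, 37),
}

cypher_scores = {p: tf * cypher_idf(N_EPISODES, df) for p, (tf, df) in episode_1.items()}
smoothed_scores = {p: tf * smoothed_idf(N_EPISODES, df) for p, (tf, df) in episode_1.items()}

for phrase in episode_1:
    print(f"{phrase:8} cypher={cypher_scores[phrase]:7.3f} smoothed={smoothed_scores[phrase]:7.3f}")

# Both formulas give the same ordering for these four phrases.
print(sorted(cypher_scores, key=cypher_scores.get, reverse=True))
print(sorted(smoothed_scores, key=smoothed_scores.get, reverse=True))
```

L2 normalisation divides every score in a document by the same positive number. It therefore changes the values but not the order within an episode. The two IDF curves have different shapes, though, so phrases with very different term and document frequencies can swap places. This sketch only illustrates the effect for these four.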