special.synonym.html 13 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147
  1. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  3. <head>
  4. <meta http-equiv="Content-Type" content="text/html; charset=gbk" />
  5. <meta name="language" content="zh-cn" />
  6. <link rel="stylesheet" type="text/css" href="../api/css/style.css" />
  7. <link rel="stylesheet" type="text/css" href="../api/css/guide.css" />
  8. <link rel="stylesheet" type="text/css" href="../api/css/highlight.css" />
  9. <title>同义词搜索</title>
  10. </head>
  11. <body>
  12. <div id="apiPage">
  13. <div id="apiHeader">
  14. <a href="http://www.xunsearch.com" target="_blank">Xunsearch PHP-SDK</a> v1.3.2 权威指南
  15. </div><!-- end of header -->
  16. <div id="content" class="markdown">
  17. <div class="toc"><ol><li><a href="#ch0">什么是同义词搜索</a></li><li><a href="#ch1">英文同义词的特殊处理</a></li><li><a href="#ch2">管理同义词库</a></li><li><a href="#ch3">查看已有同义词库</a></li><li><a href="#ch4">使用同义词搜索功能</a></li></ol></div><h1 id="-">同义词搜索</h1>
  18. <p>人类语言文化丰富,同一个意思有无数种不同的表达方式。因此,<strong>同义词</strong>
  19. 是全文检索中非常重要和必要的一项基础功能。从 <code>1.3.0</code> 版本起,<code>Xunsearch</code>
  20. 开始提供同义搜索搜索功能。</p>
  21. <h2 id="ch0">1. 什么是同义词搜索<a name="ch0" class="anchor">?</a></h2>
  22. <p>为了更好的提升用户搜索体验,我们希望用户在搜索“浙江大学”的时候也同时匹配包含缩写“浙大”的结果,
  23. 这就是所谓的同义词搜索。通常我们说“浙大”是“浙江大学”的同义词,前者是常用缩写,后者是标准词(原词)。</p>
  24. <p><strong>同义词搜索</strong> 完全依赖于您事先设置好的同义词库,词库内每条记录由“标准词(原词)”和“同义词”组成,
  25. 它们都必须是独立的词汇,也就是最小的索引单位,不可以是多个词组成的短语。</p>
  26. <blockquote class="tip">
  27. <p><strong>Tip:</strong> 独立词汇的意思对于英文来说就是一个单词,对于中文来说必须是 <code>scws</code> 词库中的一个词。<br />
  28. <em>Xunsearch</em> 的同义词不同于 <em>Xapian</em>,会智能进行字段匹配和转换,您只需维护通用词库。</p>
  29. </blockquote>
  30. <h2 id="ch1">2. 英文同义词的特殊处理<a name="ch1" class="anchor">?</a></h2>
  31. <p>英文单词的同义词在 xunsearch 中做了一些个特殊处理:</p>
  32. <ul>
  33. <li><p><strong>统一小写</strong> 统一转换为小写字母进行存储,不支持必须保留大写字母的同义词。</p></li>
  34. <li><p><strong>同根词支持</strong> 英语同一个单词有多种不同的形式,如:单复数、过去式和现在分词等。
  35. 而它们要传送的其实是同一种涵义。因此,对于纯英文的同义词记录会自动进行词根处理。
  36. 词根记录保存为大写字母 Z 开头的记录。例如:</p>
  37. <p>设置 <code>find</code> 是 <em>search</em> 的同义词,那么检索 <em>searching</em> 时如果 <em>searching</em>
  38. 本身没有同义词,那么也会匹配包含 <code>finding</code> 或 <code>finds</code> 等同根词的结果。</p></li>
  39. <li><p><strong>复合词支持</strong> 支持对多个单词组成的短语设置同义词,词之间用一个半角空格分开,
  40. 但同义词仍然只允许一个单词或中文词哦。如:</p>
  41. <p>设置 <code>您好</code> 为 <em>hello world</em> 的同义词,那么检索 <em>hello world ok</em>
  42. 时也会匹配同时包含 <code>您好</code> 和 <code>ok</code> 的记录。</p></li>
  43. </ul>
  44. <h2 id="ch2">3. 管理同义词库<a name="ch2" class="anchor">?</a></h2>
  45. <p>您可以通过 <a href="../api/XSIndex.html#addSynonym">XSIndex::addSynonym</a> 和 <a href="../api/XSIndex.html#delSynonym">XSIndex::delSynonym</a> 来添加和删除同义词记录。
  46. 多次调用这两个接口函数的时候支持使用<a href="index.buffer.html">索引缓冲区</a>来提高效率。</p>
  47. <p>参数 <code>$raw</code> 是字符串,表示记录中的原词(标准词),<code>$synonym</code> 也是字符串,表示记录中的同义词,
  48. 对于删除操作可以省略该参数表示清空原词的所有同义词。</p>
  49. <p>以下为示例代码,其中的 <code>$index</code> 变量是索引操作对像,参见<a href="index.overview#ch1">如何开始使用索引?</a></p>
  50. <div class="hl-code"><div class="php-hl-main"><pre><span class="php-hl-comment">//</span><span class="php-hl-comment"> 给 &quot;搜索&quot; 增加 &quot;检索&quot; &quot;查找&quot; 两个同义词</span>
  51. <span class="php-hl-var">$index</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">addSynonym</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">搜索</span><span class="php-hl-quotes">'</span><span class="php-hl-code">, </span><span class="php-hl-quotes">'</span><span class="php-hl-string">检索</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  52. </span><span class="php-hl-var">$index</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">addSynonym</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">搜索</span><span class="php-hl-quotes">'</span><span class="php-hl-code">, </span><span class="php-hl-quotes">'</span><span class="php-hl-string">查找</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  53. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 给 &quot;Hello world&quot; 增加同义词 &quot;你好&quot;</span>
  54. <span class="php-hl-var">$index</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">addSynonym</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">hello world</span><span class="php-hl-quotes">'</span><span class="php-hl-code">, </span><span class="php-hl-quotes">'</span><span class="php-hl-string">你好</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  55. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 删除 search 的全部同义词</span>
  56. <span class="php-hl-var">$index</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">delSynonym</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">search</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  57. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 删除 &quot;搜索&quot; 的同义词 &quot;检索&quot;</span>
  58. <span class="php-hl-var">$index</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">delSynonym</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">搜索</span><span class="php-hl-quotes">'</span><span class="php-hl-code">, </span><span class="php-hl-quotes">'</span><span class="php-hl-string">检索</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;</span></pre></div></div>
  59. <p>除了调用 <code>API</code> 外,您还可以使用 <a href="util.indexer#ch8">Indexer 索引管理工具</a> 进行同义词管理。</p>
  60. <h2 id="ch3">4. 查看已有同义词库<a name="ch3" class="anchor">?</a></h2>
  61. <p>通过 <a href="../api/XSSearch.html#getAllSynonyms">XSSearch::getAllSynonyms</a> 来获取当前索引库内的全部同义词记录,返回一个数组,
  62. 以原词(标准词)为键,其同义词列表为值。</p>
  63. <p>以下为示例代码,其中的 <code>$search</code> 变量是搜索操作对像,参见<a href="search.overview#ch1">如何开始使用搜索?</a></p>
  64. <div class="hl-code"><div class="php-hl-main"><pre><span class="php-hl-comment">//</span><span class="php-hl-comment"> 获取当前库的前 100 个同义词记录</span>
  65. <span class="php-hl-var">$synonyms</span><span class="php-hl-code"> = </span><span class="php-hl-var">$search</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">getAllSynonyms</span><span class="php-hl-brackets">(</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  66. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 获取当前库第 6~15 条同义词记录</span>
  67. <span class="php-hl-var">$synonyms</span><span class="php-hl-code"> = </span><span class="php-hl-var">$search</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">getAllSynonyms</span><span class="php-hl-brackets">(</span><span class="php-hl-number">10</span><span class="php-hl-code">, </span><span class="php-hl-number">5</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  68. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 查看包含隐藏同义词根在内的前 20 条记录</span>
  69. <span class="php-hl-var">$synonyms</span><span class="php-hl-code"> = </span><span class="php-hl-var">$search</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">getAllSynonyms</span><span class="php-hl-brackets">(</span><span class="php-hl-number">20</span><span class="php-hl-code">, </span><span class="php-hl-number">0</span><span class="php-hl-code">, </span><span class="php-hl-reserved">true</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;</span></pre></div></div>
  70. <p>除了调用 <code>API</code> 外,您还可以使用 <a href="util.quest#ch2">Quest 搜索工具</a> 进行同义词管理。</p>
  71. <h2 id="ch4">5. 使用同义词搜索功能<a name="ch4" class="anchor">?</a></h2>
  72. <p>同义词搜索默认是不开启的,如果您打算使用同义词搜索,这部分功能隶属于<a href="search.query.html">构建搜索语句</a>。</p>
  73. <p>在设置查询语句 <a href="../api/XSSearch.html#setQuery">XSSearch::setQuery</a> 之前调用 <a href="../api/XSSearch.html#setAutoSynonyms">XSSearch::setAutoSynonyms</a> 来开启同义词功能。</p>
  74. <div class="hl-code"><div class="php-hl-main"><pre><span class="php-hl-comment">//</span><span class="php-hl-comment"> 假设”搜索“有且仅有一个同义词”检索“</span>
  75. <span class="php-hl-comment">//</span><span class="php-hl-comment"> 开启同义词搜索,输出:</span>
  76. <span class="php-hl-comment">//</span><span class="php-hl-comment"> Xapian::Query(((搜索:(pos=1) SYNONYM 检索:(pos=89)) AND 世界:(pos=2)))</span>
  77. <span class="php-hl-var">$search</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">setAutoSynonyms</span><span class="php-hl-brackets">(</span><span class="php-hl-brackets">)</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">setQuery</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">搜索世界</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">getQuery</span><span class="php-hl-brackets">(</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  78. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 开启同义词搜索,并带有字段效果,假设 subject 是项目的第二个字段,输出:</span>
  79. <span class="php-hl-comment">//</span><span class="php-hl-comment"> Xapian::Query(((B搜索:(pos=1) SYNONYM B检索:(pos=89)) AND B世界:(pos=2)))</span>
  80. <span class="php-hl-var">$search</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">setAutoSynonyms</span><span class="php-hl-brackets">(</span><span class="php-hl-brackets">)</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">setQuery</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">subject:搜索世界</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">getQuery</span><span class="php-hl-brackets">(</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;
  81. </span><span class="php-hl-comment">//</span><span class="php-hl-comment"> 关闭同义词搜索,输出:</span>
  82. <span class="php-hl-comment">//</span><span class="php-hl-comment"> Xapian::Query((搜索:(pos=1) AND 世界:(pos=2)))</span>
  83. <span class="php-hl-var">$search</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">setAutoSynonyms</span><span class="php-hl-brackets">(</span><span class="php-hl-reserved">false</span><span class="php-hl-brackets">)</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">setQuery</span><span class="php-hl-brackets">(</span><span class="php-hl-quotes">'</span><span class="php-hl-string">搜索世界</span><span class="php-hl-quotes">'</span><span class="php-hl-brackets">)</span><span class="php-hl-code">-&gt;</span><span class="php-hl-identifier">getQuery</span><span class="php-hl-brackets">(</span><span class="php-hl-brackets">)</span><span class="php-hl-code">;</span></pre></div></div>
  84. <blockquote class="tip">
  85. <p><strong>Tip:</strong> 您可以分别在开启/关闭同义词功能的条件下,对比 <a href="../api/XSSearch.html#getQuery">XSSearch::getQuery</a> 查询语句分析结果。</p>
  86. </blockquote>
  87. <div class="revision">$Id$</div>
  88. <div class="clear"></div>
  89. </div><!-- end of content -->
  90. <div id="guideNav">
  91. <div class="prev"><a href="util.logger.html">&laquo; Logger 搜索日志管理</a></div>
  92. <div class="next"><a href="special.scws.html">在SDK中使用SCWS分词 &raquo;</a></div>
  93. <div class="clear"></div>
  94. </div><!-- end of nav -->
  95. <div id="apiFooter">
  96. Copyright &copy; 2008-2011 by <a href="http://www.xunsearch.com" target="_blank">杭州云圣网络科技有限公司</a><br/>
  97. All Rights Reserved.<br/>
  98. </div><!-- end of footer -->
  99. </div><!-- end of page -->
  100. <div style="display:none;">
  101. <img src="../api/css/info.gif" />
  102. <img src="../api/css/tip.gif" />
  103. <img src="../api/css/note.gif" />
  104. </div>
  105. </body>
  106. </html>