
Semantic search for the high-tech field: a search engine that surpasses Google?

2008-11-04

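A search tool called illumin8 has joined the ranks of semantic search engines: it can understand the meaning of the queries that R&D staff at high-tech companies enter. Unlike Google, which only finds pages containing the query's keywords, Elsevier's illumin8 uses a thesaurus of half a million predefined technology terms, so it can also retrieve results that are semantically related to the query.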

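"We think keyword search can be improved, and that is one of the reasons we introduced illumin8: to help find the meaning in information," said Joe Buzzanga, illumin8's product manager at Elsevier. "It's really a research and discovery tool, powered by natural language processing technology. We have tuned it for our core users, who are R&D professionals in corporations."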

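The new search engine is not free, but it already runs on a Web page and lets users enter queries similar to the ones they have typed into Google, Yahoo or other keyword search engines. The difference is that when you click the "Search" button, you do not immediately get a ranked list of results; instead, illumin8 spends a few seconds comparing your query against its semantic database to determine what it means. Elsevier's crawler constantly processes six billion Web pages, three million science and technology journal articles, 33 million reported scientific results and 21 million patents, and compiles them into 1.1 billion semantic extractions of related concepts.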


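After extracting the semantics relevant to your query, it displays a summary of the results in separate panes of a full-screen window, sorted by organization, approach, benefit, author/inventor, company and product. Each pane lists items and shows how many records are associated with each item.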

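Hovering the mouse over an item pops up a window that describes it, ranks its relevance, categorizes its type, and spells out any acronyms or aliases. Clicking the item brings up a list of matching records, each of which can be opened in a separate window.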

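A search takes about 15 seconds to locate results in the semantic database and up to a minute, depending on how many results you get, to organize them on the summary page. Elsevier says it is working on an upgrade to speed this up. The summary page can be restricted to Web results only, journal results only, patent results only, or a custom combination.

For example, typing "semiconductor R&D" returns 5284 results on the summary page, organized into companies, approaches, people, products and other related results; 3869 of them come from the Web. The companies returned for this query, listed in ascending order, include IBM, Intel, Infineon, STMicroelectronics, Samsung, Motorola, AMD, Toshiba, Texas Instruments and others.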

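Using illumin8 requires a subscription, priced individually for each organization. Free semantic search engines are already available to try, although they do not provide summary pages and their databases are not as large as illumin8's: try Hakia (http://www.hakia.com/, searches the Web only) or Powerset (http://www.powerset.com/, searches Wikipedia only).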



Semantic search said to surpass Google

Serious research has a new tool called illumin8 that harnesses semantic search, which understands the meaning of queries. Unlike the free Google search engine, which merely matches the words in a query against Web pages containing those keywords, Elsevier's illumin8 uses a thesaurus of half a million predefined technology terms to associate semantics (the meaning of phrases) with queries.
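The difference between keyword matching and thesaurus-based matching can be illustrated with a minimal Python sketch. It assumes a tiny, made-up thesaurus (`THESAURUS`) that maps surface phrases to hypothetical concept IDs. Elsevier has not published how illumin8 works internally, so this shows only the general idea of synonym-aware matching, not its implementation; all names in it are hypothetical.

```python
import re

# Toy thesaurus (hypothetical): maps surface phrases to concept IDs.
# illumin8's actual thesaurus of about half a million technology terms is not public.
THESAURUS = {
    "semiconductor": "SEMICONDUCTOR",
    "chip": "SEMICONDUCTOR",
    "integrated circuit": "SEMICONDUCTOR",
    "r&d": "RESEARCH",
    "research and development": "RESEARCH",
}


def normalize(text: str) -> str:
    """Lower-case, turn punctuation (except '&', as in "R&D") into spaces,
    and pad with spaces so whole-phrase lookups can use ' phrase '."""
    return " " + re.sub(r"[^a-z0-9&]+", " ", text.lower()).strip() + " "


def keyword_match(query: str, doc: str) -> bool:
    """Keyword search: every word of the query must occur in the document."""
    doc_words = set(normalize(doc).split())
    return all(word in doc_words for word in normalize(query).split())


def concepts(text: str) -> set:
    """Return the concept IDs of every thesaurus phrase found in the text."""
    padded = normalize(text)
    return {concept for phrase, concept in THESAURUS.items() if f" {phrase} " in padded}


def concept_match(query: str, doc: str) -> bool:
    """Concept search: the document must cover every concept in the query,
    under whichever synonym it happens to use."""
    wanted = concepts(query)
    return bool(wanted) and wanted <= concepts(doc)


doc = "Intel expands chip research and development."
print(keyword_match("semiconductor R&D", doc))  # False: neither keyword appears
print(concept_match("semiconductor R&D", doc))  # True: both concepts appear via synonyms
```

The keyword matcher rejects the sample document because it never uses the words "semiconductor" or "R&D", while the concept matcher accepts it because "chip" and "research and development" map to the same concepts as the query.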

"We think that there can be improvements on keyword searches and that's one of the reasons we've introduced illumin8: to help find the meaning in information," said Joe Buzzanga, illumin8's product manager at Elsevier. "It's really a research and discovery tool, powered by natural language processing technology. We have tuned it for our core users, who are R&D professionals in corporations."

The new search engine is not free, but it does run on a Web page, letting users enter queries much like the ones they are used to typing into Google, Yahoo or any other keyword search engine. The difference comes when you press the search button. Rather than returning an immediate list of results ranked by popularity, illumin8 takes a few seconds to determine the meaning of your query by comparing it against its precompiled semantic database. Elsevier's crawler constantly crawls six billion Web pages, three million science and technical journal articles, 33 million reported scientific results and 21 million patents, which it compiles into 1.1 billion semantic extractions of related concepts.

After determining which semantic extractions are relevant to your query, illumin8 presents a summary of its results in separate panes of a full-screen window, sorted by organization, approach, benefit, author/inventor, company and product. Each pane lists items along with the number of individual records associated with each item.
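A summary organized into panes with per-item record counts is essentially a faceted count. The sketch below is a minimal illustration under stated assumptions: the pane names come from the article, but the record format (a list of hypothetical `(pane, item)` extraction pairs per record), the sample data and the `build_summary` function are invented for illustration and are not illumin8's internals.

```python
from collections import Counter

# Pane names as listed in the article; everything else here is an assumption.
PANES = ("organization", "approach", "benefit", "author/inventor", "company", "product")


def build_summary(records: list) -> dict:
    """Build one Counter per pane, counting how many records mention each item.

    Each record is assumed to carry hypothetical (pane, item) extraction pairs;
    an item mentioned twice in the same record is counted once for that record.
    """
    summary = {pane: Counter() for pane in PANES}
    for record in records:
        for pane, item in set(record["extractions"]):
            summary[pane][item] += 1
    return summary


# Invented sample records, for illustration only.
records = [
    {"id": "web-1", "extractions": [("company", "Intel"), ("product", "processor"), ("company", "Intel")]},
    {"id": "patent-7", "extractions": [("company", "Intel"), ("author/inventor", "Inventor A")]},
    {"id": "journal-3", "extractions": [("company", "IBM"), ("approach", "lithography")]},
]

summary = build_summary(records)
print(summary["company"].most_common())  # [('Intel', 2), ('IBM', 1)]
```

The Web record mentions Intel twice but contributes only one to Intel's count, because the count is per record, which matches the article's description of showing how many records are associated with each item.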

Moving the cursor over an item pops up a window that describes it, ranks its relevance, categorizes its type, and spells out any acronyms or aliases. Clicking the item brings up a list of records, each with a summary and a direct link that opens it in a separate window.

Searches take about 15 seconds to locate results in the semantic database and up to a minute to organize them on a summary page, depending on how many results you get; Elsevier says it is working on an upgrade to speed up the process. The summary panes can be restricted to Web results only, journal results only, patent results only, or some combination. For example, typing in "semiconductor R&D" returns 5284 results organized into panes for organizations, approaches, people, products and related results; 3869 of them come from the Web.

Organizations for this query, listed in ascending order, included IBM, Intel, Infineon, STMicroelectronics, Samsung, Motorola, AMD, Toshiba and Texas Instruments.
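Restricting the summary to Web, journal or patent results, or a combination of them, amounts to filtering records by source type before the panes are built. A minimal sketch, assuming each record has a hypothetical `source` field; `restrict_sources`, `source_breakdown` and the sample data are illustrative and are not illumin8's API.

```python
from collections import Counter

# The three source types the article says the summary can be limited to.
SOURCE_TYPES = frozenset({"web", "journal", "patent"})


def restrict_sources(records: list, allowed: set) -> list:
    """Keep only records whose (hypothetical) 'source' field is in `allowed`,
    which may be any combination of the three source types."""
    unknown = set(allowed) - SOURCE_TYPES
    if unknown:
        raise ValueError(f"unknown source types: {sorted(unknown)}")
    return [record for record in records if record["source"] in allowed]


def source_breakdown(records: list) -> Counter:
    """Count how many records come from each source type."""
    return Counter(record["source"] for record in records)


# Invented sample records, for illustration only.
records = [
    {"id": 1, "source": "web"},
    {"id": 2, "source": "patent"},
    {"id": 3, "source": "journal"},
    {"id": 4, "source": "web"},
]

web_only = restrict_sources(records, {"web"})
print([r["id"] for r in web_only])  # [1, 4]
print(source_breakdown(records))
```

Applied to the article's own figures, 3869 of the 5284 results for "semiconductor R&D" were Web results, a share of 3869 / 5284 ≈ 73%.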

To use illumin8 you need a subscription, which is priced individually for each organization. To test a free semantic search engine, albeit one that does not provide summary pages and does not access as large a database as illumin8, try Hakia (searches the Web only) or Powerset (searches Wikipedia only).
