Powerset - Humans versus Computers (Search Algorithms)
Recently, I answered a question posed on LinkedIn that asked:
Do you think computer search algorithms are more relevant than a ranking made by experts? Which one is the best?
Everybody is currently working on new search algorithms (semantic search, new ways to rank contents, ...), but compared to real life, when you need really good advice (i.e. a website, an answer, ...) on something, would it be better to ask your network of experts to find the best answer instead of asking a search engine?
For example, using your LinkedIn network instead of searching on Google/Yahoo (except MyWeb2)/... alone.
You can click to see all of the answers here.
The answers were a very interesting read and I suggest that you take a look at all of the answers to get a taste of what people think.
Steve Newcomb's Answer (my answer is posted below):
There will be several important advancements that semantic search brings to the table.
How keyword based engines work:
When keyword search engines index the web, they capture information about keywords, their proximity, anchor text and so on. When a user types in a query, the query engine matches the keywords in that query to the keywords in the index and then ranks the results based on PageRank, keyword proximity and so on. (I'm sure I'm not telling anyone much new information here, and of course it's more complex than that, but these are the basics.)
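As a rough illustration of the keyword approach described above, here is a minimal sketch of an inverted index with a count-based score. The two toy documents, the `build_index` and `keyword_search` names, and the scoring rule (number of matching query words) are all assumptions made for this example; a real engine would add PageRank, proximity, anchor text and much more.

```python
from collections import defaultdict

# Toy corpus; the document IDs and sentences are made up for illustration.
DOCS = {
    "d1": "Oracle acquired Peoplesoft",
    "d2": "Peoplesoft acquired JD Edwards",
}

def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no stop words)."""
    return text.lower().split()

def build_index(docs):
    """Map each keyword to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def keyword_search(index, query):
    """Rank documents by how many distinct query keywords they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

INDEX = build_index(DOCS)
```

In this sketch, `keyword_search(INDEX, "who acquired Peoplesoft")` gives both documents the same score, because each contains the words "acquired" and "peoplesoft"; the index records nothing about who did what to whom.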
How semantic based engines work:
However, a semantic engine works differently. When a semantic search engine indexes the web, instead of indexing keywords alone, it indexes semantic relationships and captures things like what was the subject, the object and the relationships between words. At runtime, when a user enters a query, what is matched is the semantic meaning behind the query with the semantic information in the index.
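To make the contrast concrete, here is a minimal sketch of the index side, assuming a linguistic parser has already turned each sentence into a (subject, relation, object) triple. The triples below are written by hand as a stand-in for that parser output, and `TRIPLES` and `match_pattern` are hypothetical names, not how any particular engine is built.

```python
# Hand-written triples standing in for real parser output (an assumption).
# Each entry is (subject, relation, object, document ID).
TRIPLES = [
    ("oracle", "acquire", "peoplesoft", "d1"),
    ("peoplesoft", "acquire", "jd edwards", "d2"),
]

def match_pattern(pattern, triples):
    """Return the triples that fit a (subject, relation, object) pattern.

    A None in the pattern is the unknown slot and matches anything.
    A real engine would use an index here instead of scanning a list.
    """
    results = []
    for subject, relation, obj, doc_id in triples:
        candidate = (subject, relation, obj)
        if all(p is None or p == v for p, v in zip(pattern, candidate)):
            results.append((subject, relation, obj, doc_id))
    return results
```

A query is then a triple with one slot left open: `match_pattern((None, "acquire", "peoplesoft"), TRIPLES)` returns only the triple from `d1`, because the roles of the words are part of what is matched.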
The results of this approach will be fundamental advancements including the following to name a few:
- Understanding the intent of the query
- Understanding the meaning of web documents
- Matching the intent of the query to the meaning of the documents
- Being able to highlight the answer rather than the keywords
- Snippet augmentation to include metadata associated with the answer instead of the query
- Dossier-type results pages that are created on the fly, containing Wikipedia-like information based on the intent of the query
- On-the-fly results pages that completely change form based on the intent of the query (i.e. determining the type of query first, then instantiating a different search engine and a different results page based on the intent)
To highlight what I am talking about, let's walk through what I mean by understanding the intent of the query and matching that against the intent of the words in web documents.
Take the following two queries: "who did Peoplesoft acquire" and "who acquired Peoplesoft".
To a human these are two completely different queries because the subject and object relationships are inverted. Keyword based search engines can't do this relationship mapping, whereas semantic search engines can.
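On the query side, the difference between the two questions can be sketched as follows. The two regular expressions are a deliberately naive stand-in for real linguistic analysis, and `PATTERNS` and `parse_query` are hypothetical names used only for this example.

```python
import re

# Naive, hand-written query patterns (an assumption for illustration only).
PATTERNS = [
    # "who did X acquire": X is the subject (the buyer), the object is unknown.
    (re.compile(r"^who did (?P<name>.+) acquire\??$", re.I), "subject"),
    # "who acquired X": X is the object (the company bought), the subject is unknown.
    (re.compile(r"^who acquired (?P<name>.+?)\??$", re.I), "object"),
]

def parse_query(query):
    """Return (subject, relation, object), with None marking the unknown slot.

    Returns None when the query fits neither pattern.
    """
    text = query.strip()
    for pattern, role in PATTERNS:
        found = pattern.match(text)
        if found:
            name = found.group("name").lower()
            if role == "subject":
                return (name, "acquire", None)
            return (None, "acquire", name)
    return None
```

The two queries share almost all of their keywords, yet they parse into different structures: `parse_query("who did Peoplesoft acquire")` puts Peoplesoft in the subject slot, while `parse_query("who acquired Peoplesoft")` puts it in the object slot.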
In addition, a semantic search engine can capture semantic inversions that create paraphrase matches at the sentence level instead of just synonym matches at the keyword level. For example, a semantic engine would show a match for a website containing the sentence "Peoplesoft sold itself to Oracle" for the query "Who bought Peoplesoft". A keyword search engine would see buy (in the query) and sell (in the web document) as unrelated, whereas a semantic search engine would see a paraphrase match at the sentence level and return the result.
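One way to picture the buy/sell inversion is to rewrite every relation into a canonical direction before matching. This is only a sketch under assumptions: the `INVERSES` table, the relation names `sold_to` and `bought`, and the hand-written triple for the example sentence are all hypothetical.

```python
# Hypothetical table of relations that are each other's inverse:
# "X sold_to Y" means the same as "Y bought X" when X sells itself.
INVERSES = {"sold_to": "bought"}

def canonical(subject, relation, obj):
    """Rewrite a triple so inverted relations share one canonical form."""
    if relation in INVERSES:
        return (obj, INVERSES[relation], subject)
    return (subject, relation, obj)

def answer_who(relation, obj, facts):
    """Answer 'who <relation> <obj>?' against canonical facts."""
    return [s for (s, r, o) in facts if r == relation and o == obj]

# "Peoplesoft sold itself to Oracle", as a parser might represent it (assumed).
FACTS = [canonical("peoplesoft", "sold_to", "oracle")]
```

With this normalization, `answer_who("bought", "peoplesoft", FACTS)` finds Oracle even though the document never uses the word "bought".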
The big shifts that semantic search engines will bring will be:
- much higher precision as a result of matching meaning to meaning instead of keywords to keywords
- much higher recall as a result of matching on equivalent meanings that are expressed in different ways
- better ranking where PageRank and existing methods are combined with semantic data
- better and more natural mechanisms to create more complex and accurate queries
- better basis for surrounding the results with metadata based on the semantic content of the answer rather than the query
- better results pages that are created on the fly based on the intent of the query.
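For readers less familiar with the terms precision and recall used in the list above, here is a minimal sketch of how they are usually computed for a single query. The `precision_recall` name and the document IDs in the usage note are made up for illustration and are not measurements of any engine.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query.

    precision = share of returned documents that are relevant
    recall    = share of relevant documents that were returned
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if an engine returns `["d1", "d2"]` and the relevant documents are `["d1", "d3"]`, only `d1` is a hit, so both precision and recall come out at one half. Dropping the irrelevant `d2` would raise precision, and finding the differently worded `d3` would raise recall.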
Hope this was helpful. I would enjoy seeing your comments as well.
Comments
Typo in the title of this blog post: replace Algorythms with Algorithms
Also, great info on how semantic search is better than keyword search, but you didn't actually answer the question of whether an algorithm (be it semantic or keyword based) can beat out human expert generated results. I would be very curious to hear your thoughts on this as there is a big push towards human powered search from the likes of Wikia (Jimmy Wales) and Mahalo (Jason Calacanis/Sequoia Capital).
Posted by: Jim | July 13, 2007 02:50 AM
Very interesting Post. Thanks!
Marcelo
Posted by: Marcelo | August 9, 2007 08:54 AM