Big surprise - SharePoint search is built on top of SQL Server full text indexing.
Okay, it's a bit more than full text indexing: there is XML-driven configuration for things like word stemming, and other features thrown on top such as authoritative pages, best bets, etc. But in no way does SharePoint search (even MOSS) compare to Google or Live Search.
But then it costs a lot less too. So it is quite useful, and compelling for what it is. But if you feel like spending $400K on an enterprise class search engine - knock yourself out.
Anyway, this post isn't about that. This post is about the case where you have a SharePoint site set up, and now you wish to customize search.
Boy, is this a common request or what!
Sure, you can customize search by customizing the out of the box webparts, or you could customize it using products such as Ontolica. Heck, you can even dive into the search results page, open it in SharePoint Designer, and start twiddling with the XSLT to control exactly what gets rendered. But still - that isn't quite taking things into your own hands.
Here is a piece of code that I know you will find hella useful.
// Needs: using System; using Microsoft.SharePoint; using Microsoft.Office.Server.Search.Query;
static void Main(string[] args)
{
    using (SPSite site = new SPSite("http://moss2007"))
    {
        FullTextSqlQuery query = new FullTextSqlQuery(site);
        query.QueryText =
            "Select Title, Rank, Path from portal..scope() where freetext('Test') AND Site='http://moss2007' ORDER BY Rank desc";
        query.RowLimit = 100;
        query.ResultTypes = ResultType.RelevantResults;
        ResultTableCollection results = query.Execute();
        ResultTable result = results[ResultType.RelevantResults];
        // ResultTable is a data reader: walk it row by row and print Title, Rank, Path.
        while (result.Read())
        {
            Console.WriteLine(result[0].ToString() + ", " + result[1].ToString() + ", " + result[2].ToString());
        }
    }
}
Now the above code is interesting on many fronts -
- It clearly shows you that the query is nothing but a SQL query behind the scenes. A full text SQL query, to be precise.
- You can clearly specify what columns to retrieve, and what criteria to put in the where clause.
- You can choose to specify the kind of results you want, and how many results you want.
- You can expose information such as "Rank". This is interesting, because as you run experiments on your search you will find various rules that appear to be embedded inside the search algorithm, such as:
  - Larger document size = lower ranking
  - More keyword matches in a document = higher ranking
  - Deeper URL depth = lower ranking
  - Authoritative pages in SP = higher ranking
  - Default views seem to get ranked higher than individual item views
  - File types seem to affect ranking (.doc > .txt for instance)
  - Language seems to affect ranking: en-US always ranks higher, even though your server installation may have French as its default language <--- Surprised???
- The 100% control over rendering, from C#, is definitely valuable.
- Finally, you get an insight into why the heck a certain search result sits below some other search result, and what you can do about it.
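Since the columns and the where clause are just text, you will probably end up building the query string from user input sooner or later. Here is a minimal sketch of how that could look. Note that `SearchQueryText`, `Build` and `Quote` are names I made up for illustration, they are not part of the SharePoint API, and I am assuming the search SQL syntax escapes a single quote inside a literal by doubling it, the way regular SQL does.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the SharePoint object model.
public static class SearchQueryText
{
    // Wraps a value in single quotes and doubles any embedded quote.
    // Assumption: the search SQL syntax uses SQL-style '' escaping.
    private static string Quote(string value)
    {
        return "'" + (value ?? string.Empty).Replace("'", "''") + "'";
    }

    // Builds query text of the same shape as the hard-coded query in this post.
    public static string Build(string[] columns, string term, string siteUrl)
    {
        if (columns == null || columns.Length == 0)
            throw new ArgumentException("At least one column is required.", "columns");

        // Accept only plain property names such as Title, Rank, Path.
        foreach (string column in columns)
        {
            if (string.IsNullOrEmpty(column) || !column.All(char.IsLetterOrDigit))
                throw new ArgumentException("Invalid column name: " + column, "columns");
        }

        return "SELECT " + string.Join(", ", columns)
            + " FROM portal..scope()"
            + " WHERE freetext(" + Quote(term) + ")"
            + " AND Site=" + Quote(siteUrl)
            + " ORDER BY Rank DESC";
    }
}
```

You would then assign the result of `SearchQueryText.Build(new[] { "Title", "Rank", "Path" }, "Test", "http://moss2007")` to `query.QueryText` instead of the hard-coded string.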
Now, I set up a site called "Test", and created an announcement with title "Test", and body "This is a test announcement", and here were my results -
Test, 998, http://moss2007
Test - Announcements, 923, http://moss2007/Lists/Announcements/AllItems.aspx
Test, 251, http://moss2007/Lists/Announcements/DispForm.aspx?ID=1
Wow, interesting!! As you can clearly see, the highest rank is for the site itself, due to the shorter depth of the URL - makes complete sense!! The second result is the announcements list itself, specifically its default view. The default view contains the word "Test", so it got ranked pretty high up, though URL length seems to trump the default view. Finally, the item itself got ranked last, but it did match the query.
This seems to completely agree with my search results page, as shown below -