SharePoint search ranking rules, and running search programmatically

Posted on 2/4/2008 @ 12:23 AM in #SharePoint by

Big surprise: SharePoint search is built on top of SQL Server full-text indexing.

Okay, it's a bit more than full-text indexing. There's XML-driven configuration for word stemming, plus other things thrown on top, such as authoritative pages and best bets. But in no way is SharePoint search (even MOSS search) comparable to Google or Live Search.

But then it costs a lot less, too. So it is quite useful, and compelling for what it is. But if you feel like spending $400K on an enterprise-class search engine, knock yourself out.

Anyway, this post isn't about that. This post is about what to do when you have a SharePoint site set up and now wish to customize search.

Boy, is this a common request or what!

Sure, you can customize search by configuring the out-of-the-box web parts, or by using products such as Ontolica. Heck, you can even open the search results page in SharePoint Designer and start twiddling with the XSLT to control exactly what gets rendered. But still, that isn't quite taking things into your own hands.

Here is a piece of code that I know you will find hella useful.

using System;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Query;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Requires MOSS 2007 and references to Microsoft.SharePoint.dll and
            // Microsoft.Office.Server.Search.dll; run it on a server in the farm.
            using (SPSite site = new SPSite("http://moss2007"))
            {
                FullTextSqlQuery query = new FullTextSqlQuery(site);

                // A full-text SQL query: choose the columns, the criteria and the sort order.
                query.QueryText =
                    "Select Title, Rank, Path from portal..scope() where freetext('Test') AND Site='http://moss2007' ORDER BY Rank desc";
                query.RowLimit = 100;                            // maximum number of rows returned
                query.ResultTypes = ResultType.RelevantResults;  // only the relevant results table

                ResultTableCollection results = query.Execute();
                ResultTable result = results[ResultType.RelevantResults];

                // ResultTable is a data reader; columns come back in SELECT order.
                while (result.Read())
                {
                    // Title, Rank, Path (format placeholders tolerate null values)
                    Console.WriteLine("{0}, {1}, {2}", result[0], result[1], result[2]);
                }
            }
        }
    }
}

The above code is interesting on many fronts:

  • It shows that the query is nothing but a SQL query behind the scenes; a full-text SQL query, to be precise.
  • You can specify which columns to retrieve and what criteria to put in the WHERE clause.
  • You can specify the kind of results you want (ResultTypes) and how many of them (RowLimit).
  • You can expose information such as "Rank". This is interesting because, as you experiment with your searches, you will find what appear to be various rules embedded in the ranking algorithm, such as:
    • Larger document size = lower ranking
    • More keyword matches in a document = higher ranking
    • Deeper URL depth = lower ranking
    • Authoritative pages in SharePoint = higher ranking
    • Default views seem to get ranked higher than individual item views
    • File types seem to affect ranking (.doc > .txt, for instance)
    • Language seems to affect ranking: en-US content always ranks higher, even if your server installation has French as its default language. Surprised???
  • You get 100% control over rendering from C#, which is definitely valuable.
  • Finally, you get insight into why the heck a certain search result ranks below another, and what you can do about it.
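Because the query is just a string, you can build it from parameters instead of hard-coding 'Test' and the site URL. Below is a minimal sketch of a hypothetical helper, SearchQueryBuilder.Build (not part of the SharePoint API), that composes the same full-text SQL query used above from a column list, a search term and a site URL. It assumes that a single quote inside a string literal is escaped by doubling it, as in standard SQL; verify that against the SharePoint SQL search syntax before relying on it. The helper only produces the QueryText string and never touches SharePoint.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a SharePoint API) that builds the QueryText
// string for a FullTextSqlQuery shaped like the one in the post.
public static class SearchQueryBuilder
{
    // Assumption: single quotes inside string literals are escaped by doubling them.
    private static string Escape(string value)
    {
        return value.Replace("'", "''");
    }

    // Column names are inserted as-is, so pass only trusted property names.
    public static string Build(IEnumerable<string> columns, string searchTerm, string siteUrl)
    {
        if (columns == null) throw new ArgumentNullException("columns");
        if (searchTerm == null) throw new ArgumentNullException("searchTerm");
        if (siteUrl == null) throw new ArgumentNullException("siteUrl");

        string columnList = string.Join(", ", new List<string>(columns).ToArray());

        return "Select " + columnList +
               " from portal..scope()" +
               " where freetext('" + Escape(searchTerm) + "')" +
               " AND Site='" + Escape(siteUrl) + "'" +
               " ORDER BY Rank desc";
    }
}

public static class Program
{
    public static void Main()
    {
        // Reproduces the exact query string from the code above.
        string text = SearchQueryBuilder.Build(
            new[] { "Title", "Rank", "Path" }, "Test", "http://moss2007");
        Console.WriteLine(text);
    }
}
```

The result can be assigned straight to query.QueryText. Under the doubling assumption, a term such as O'Brien ends up as freetext('O''Brien') rather than breaking the query.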

Now, I set up a site called "Test", created an announcement with the title "Test" and the body "This is a test announcement", and ran the code above. Here were my results (Title, Rank, Path):

Test, 998, http://moss2007
Test - Announcements, 923, http://moss2007/Lists/Announcements/AllItems.aspx
Test, 251, http://moss2007/Lists/Announcements/DispForm.aspx?ID=1

Wow, interesting!! As you can see, the highest rank goes to the site itself, apparently due to its shorter URL depth, which makes complete sense!! The second result is the Announcements list itself, specifically its default view (AllItems.aspx). The default view contains the word "Test", so it ranked pretty high, though URL depth seems to trump the default view. Finally, the item itself (DispForm.aspx?ID=1) ranked last, even though it did match the search.
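One way to check the URL-depth explanation is to print each result's path depth next to its rank. Below is a minimal sketch of a hypothetical helper, UrlDepth.Of (not a SharePoint API), that counts the non-empty path segments of a URL, ignoring the host and the query string, applied to the three results above. This segment count is just a convenient measure chosen for illustration; it is not necessarily how SharePoint itself measures URL depth.

```csharp
using System;

// Hypothetical helper (not a SharePoint API): counts the non-empty
// segments of a URL's path, ignoring the host and any query string.
public static class UrlDepth
{
    public static int Of(string url)
    {
        Uri uri = new Uri(url);
        // AbsolutePath excludes the query string (e.g. "?ID=1").
        string[] segments = uri.AbsolutePath.Split(
            new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return segments.Length;
    }
}

public static class Program
{
    public static void Main()
    {
        // Rank and Path values copied from the results above.
        string[] paths =
        {
            "http://moss2007",
            "http://moss2007/Lists/Announcements/AllItems.aspx",
            "http://moss2007/Lists/Announcements/DispForm.aspx?ID=1"
        };
        int[] ranks = { 998, 923, 251 };

        for (int i = 0; i < paths.Length; i++)
        {
            Console.WriteLine("{0}, depth {1}, {2}", ranks[i], UrlDepth.Of(paths[i]), paths[i]);
        }
    }
}
```

By this measure the site is at depth 0, while the list's default view and the item's display form both sit at depth 3. So depth alone does not explain the gap between 923 and 251; that gap fits the observation that default views seem to rank higher than individual item views.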

This seems to agree completely with my search results page, as shown below:


Older comments..


On 4/2/2008 5:13:23 AM katia said ..
Great, but is this code useful for a Windows SharePoint Services 3.0 site?


Thanks


On 4/24/2008 2:03:33 PM Baly said ..
Gr8 post!! I have a requirement where users will rate documents (which are stored in a document library). The rating can be anything from 1 to 5, and I am going to store it in a custom column in the same document library. So far so good!!! But the pain starts now... I need to display the search results based on the keyword typed in the search box (which is fine); however, the results should be displayed in order of their rating, i.e. the document rated 4.5 should come first and the doc rated 3.2 should come second.


Any idea how to achieve this???


Need your valuable help!!!!


On 7/11/2008 11:37:43 AM Ani said ..
Can you please let me know how to convert this into XSLT format? I am new to SharePoint programming, and I would be thankful if you could convert your code to an XSL flavor.


On 10/10/2008 12:52:06 PM chetali said ..
Hi... great post..


Please let me know if we can find out what the search algorithm is.


On 2/17/2009 2:01:01 AM Ali Raza said ..
How about access mapping? If there is a URL built like www.tttt.com, then the results will be shown for the main URL. How can we tackle this? Please guide me.