SharePoint 2010 Sandboxed Solutions: Monitoring, Management and Deployment

Posted on 12/7/2009 @ 4:32 PM in #SharePoint

In this blog post, I will dissect every aspect of sandbox solutions as they apply to SharePoint 2010. 
The below will turn into links as newer blog posts are published.

Some of the content below uses excerpts from my book on SharePoint 2010.

________________________________________________________________

Table of Contents:

    1. The definitive guide (back to table of contents).
    2. The basics of a sandbox solution
    3. Sandbox solution architecture and restrictions
    4. Sandbox solution monitoring, management and deployment <--- you are here
    5. Sandbox solution validations
    6. Sandbox solutions full trust proxies

________________________________________________________________

As you can see, sandbox solutions are inherently secure because they are restricted in what they can and cannot do. That means I can be confident that they won't format my C:\ drive, and that's a good thing! But what if someone put the following code block in their sandbox solution?

int i = 0; while (true) { i++; }

The above will chew up the CPU resources of your high-end server processors! But it's not a security violation, is it? Nor does it call anything that the restricted API blocks! Thus, it is important that sandbox solutions also be monitored! Monitored, and punished if they do something naughty, like the above! But monitoring goes a step beyond punishment. A good traffic system is not designed around parking an ambulance at each intersection. A good traffic system is designed around placing good traffic lights, so those ambulances are not necessary.

Thus, step #1 in monitoring is giving the site collection administrator, or the application owners, some visibility into how well or badly their sandbox solutions are behaving. If you visit the solution gallery in your site collection, at the top you will see how much resource quota has been allocated to you, how much of it you have been using lately, and how much of it you have used today! This can be seen in the figure below.

 

Okay, so this is great, but if I was a site collection owner, I'd have some obvious questions here!

  1. Who decided that my site collection should get 300 server resources, or resource points?
  2. Is 300 points a lot? What does it mean? What constitutes a resource point? Can I ask for more points?
  3. I see that my solutions will be temporarily disabled if I exceed my allocated quota. Will I get a warning if I exceed a certain amount of resource usage, so I can plan before I am completely shut down?
  4. What exactly does temporarily disabled mean? How long is temporary?

The answers to all the above questions are quite simple! The farm administrator decided that your site collection should have 300 resource points! If you go to Central Administration and visit Application Management\Configure quotas and locks, you will see there is a section called "User Solutions Resource Quota". This can be seen in the figure below:

And that also answers my third question, about sending me a warning email when I use a certain number of points in a day. Also, this indirectly answers my fourth question about "How long is temporary?". Temporary is approximately a day.

What happens is, your entire site collection has been allocated a certain number of points to be shared across all sandbox solutions. If one of those solutions ends up using all of those points, all of those sandbox solutions are shut down, for about a day!
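If you'd rather check these numbers from PowerShell than click through Central Administration, a sketch along these lines should do it. The site collection URL is made up, so swap in your own, and I'm assuming you run this on a SharePoint 2010 server where the SharePoint snap-in is available.

```powershell
# Load the SharePoint snap-in if it is not already loaded.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical URL; replace with your own site collection.
$site = Get-SPSite "http://intranet/sites/demo"
try {
    # UserCodeMaximumLevel is the daily points quota,
    # UserCodeWarningLevel is the point count that triggers the warning email.
    $site.Quota | Select-Object UserCodeMaximumLevel, UserCodeWarningLevel
}
finally {
    # SPSite objects hold unmanaged resources, so dispose of them.
    $site.Dispose()
}
```

This only reads the values; changing them is still the farm administrator's call.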

But what does using points mean? In fact, let me ask it this way - "What exactly does using 300 points mean?". How did someone come up with this number of 300?

Earlier I was talking about the various restrictions placed on your sandbox solutions! Namely CAS policies, and a restricted subset of the API. Also, I referred to the fact that excessive CPU usage is perhaps a metric that you should monitor. All these metrics that should be monitored are given some weights, and they collectively count towards the 300 resource points limit!

Specifically, there are 14 metrics that are monitored by SharePoint. To view the metrics, open PowerShell and execute the following command:

[Microsoft.SharePoint.Administration.SPUserCodeService]::Local.ResourceMeasures

A quick sidenote: If your SharePoint PowerShell commands are not running, you may need to load the SharePoint PowerShell snap-in first. You can do so by executing this command in PowerShell.

Add-PSSnapin Microsoft.SharePoint.PowerShell

You will see all the various resource monitoring metrics print out. Also, there is an interesting property on each one of them, namely "ResourcesPerPoint". Specifically, the metrics (also known as ResourceMeasure) and ResourcesPerPoint you will see are as below:

  1. AbnormalProcessTerminationCount: 1
  2. CPUExecutionTime: 3600
  3. CriticalExceptionCount: 3600
  4. InvocationCount: 100
  5. PercentProcessorTime: 85
  6. ProcessCPUCycles: 100000000000
  7. ProcessHandleCount: 10000
  8. ProcessIOBytes: 10000000
  9. ProcessThreadCount: 10000
  10. ProcessVirtualBytes: 100000000
  11. SharePointDatabaseQueryCount: 20
  12. SharePointDatabaseQueryTime: 120
  13. UnhandledExceptionCount: 50
  14. UnresponsiveprocessCount: 2

In other words, if you have a single AbnormalProcessTermination, you consume one resource point; 100 invocations also cost you one point, and so on. These ResourcesPerPoint values are customizable using the SharePoint object model. But my suggestion would be to try out what you have out of the box, and then see if you need further tweaking!
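To get a feel for the arithmetic, here is a rough sketch in plain PowerShell (no SharePoint needed). The ResourcesPerPoint values are copied from the list above. The usage numbers are made up, and the simple "usage divided by ResourcesPerPoint, summed over all measures" model is my reading of how the additive count works, not the exact internal calculation.

```powershell
# ResourcesPerPoint values copied from the list above (a subset).
$resourcesPerPoint = @{
    AbnormalProcessTerminationCount = 1
    InvocationCount                 = 100
    SharePointDatabaseQueryCount    = 20
    UnhandledExceptionCount         = 50
}

# Hypothetical usage for one day; these numbers are invented for illustration.
$usage = @{
    AbnormalProcessTerminationCount = 2
    InvocationCount                 = 500
    SharePointDatabaseQueryCount    = 400
    UnhandledExceptionCount         = 25
}

# Assumed model: points per measure = usage / ResourcesPerPoint, then add them up.
$totalPoints = 0.0
foreach ($name in $usage.Keys) {
    $totalPoints += $usage[$name] / $resourcesPerPoint[$name]
}

$quota = 300   # daily quota from the example in this post
"{0} of {1} points used" -f $totalPoints, $quota
```

With these made-up numbers the total is 2 + 5 + 20 + 0.5 = 27.5 points, so this site collection is nowhere near its 300-point quota.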

The ResourcesPerPoint contributes to the additive resource points calculation. However, what if there was a solution that did something really naughty, like trying to format your C:\? Should we keep letting it try until it hits an additive count? Obviously not!! You need to have the capability of shutting down a bad solution immediately. For that, there is another property on the ResourceMeasure, called "AbsoluteLimit". A solution that crosses the AbsoluteLimit is terminated immediately, even if the daily usage limit hasn't been reached yet. As an example, AbnormalProcessTerminationCount's AbsoluteLimit is set to "1". So, a solution that causes an AbnormalProcessTermination is immediately shut down! The additive count is bumped up, so the solution can try again .. but it will be immediately shut down again! And if the solution keeps causing problems, chances are the farm administrator will notice it.
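If you want to see both properties side by side for every measure, something like the following should work on a SharePoint 2010 server with the snap-in loaded. Treat it as a sketch: it only reads the values, and I'm assuming the Name property that every persisted SharePoint object exposes is what you want as the label.

```powershell
# List each resource measure with its additive weight and its hard limit.
[Microsoft.SharePoint.Administration.SPUserCodeService]::Local.ResourceMeasures |
    Sort-Object Name |
    Format-Table Name, ResourcesPerPoint, AbsoluteLimit -AutoSize
```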

And if there is such an awful solution that causes nothing but headaches, the farm administrator can go into Central Administration\System Settings\Manage User Solutions, and add a solution to the "Blocked Solutions" list. This can be seen in the figure below:

So now the infrastructure ogre farm administrator can force the developer to fix their code before it'll run again! How nice!

But all of this seems a bit "after the fact". The approach here is to make sure we have enough ambulances once an accident occurs. If a solution has been naughty, we can throttle it, block it, and so on. But can we do anything when the solution is uploaded into the site collection? Perhaps validate it!?


Older comments..


On 12/8/2009 8:06:54 PM Peter said ..
Thanks, these articles are enlightening. Something I'd like to ask for a future post: why is any tolerance granted for illegal operations? Why not permanently disable that version of the loaded UserSolution?


On 12/8/2009 9:15:38 PM Sahil Malik said ..
Hey Peter - Thanks glad you like 'em :), y'know feel free to tell your friends if you found this useful.

So about the permanently disable thing - well, there are ways to permanently disable, but they require human involvement, i.e. blocked solutions. Why not block permanently automatically? Well, I am guessing for MSFT here, but, it would be kinda hard to come up with an algorithm that does that reliably with no false positives I'm guessing!!

So in short, I don't know LOL, but I think the architecture is quite good, and gives everyone enough flexibility.

THE ONLY complaint I have about sandbox solutions IMO is -


a) Full Trust Proxies should also have had a limited scope. Right now they are not suitable for multi-tenant.


b) Dude this is 2010 - we shouldn't be doing postbacks and full trust proxies. Why not just write your WCF service? Why even BOTHER with Full Trust Proxy? Personally, I don't see me writing a lot of FT Proxies .. I'll be writing services and consuming them over Silverlight and AJAX. w00t!

S


On 12/27/2009 11:29:06 AM Jomit said ..
Thanks Sahil,


I was just scratching my head on this whole points calculation for the Sandbox solutions before watching your DNR-tv episode on SharePoint2010, which redirected me here.


Just wanted to ask one question: can we create our own resource monitoring metrics?


On 12/27/2009 2:39:06 PM Sahil Malik said ..
Jomit - nope. :-)


On 3/7/2012 1:55:21 PM Tej said ..
Hello Shahil,


I have a requirement to build a sandbox solution which includes


1) BCS


2) Email


3) Web Service


4) InfoPath with code behind


5) SQL ODBC Connection

I know it means hacking around with proxy code, and that BCS would need to be covered using Web Services/WCF.


I am wondering whether the above features can be built in a sandbox solution, and how?