Thursday, September 4, 2008

Should I use a view, a stored procedure?


A pretty common question is when to use views, stored procedures, and user-defined functions. More importantly, what are the differences between them, and what advantages do they each have that make them the best choice in certain scenarios? Well, as with many technologies, the answer really is: it depends. I know, you're groaning, because you hate that answer. But hopefully, by showing you the pros and cons of each element, you can decide for yourself which to use in *your* situation--and you can back that up with your own performance tests and whatever other criteria you want to measure.
Stored Procedures
If I can borrow from traditional programming languages like VB for a moment, a stored procedure is like a subroutine. It is used to perform tasks within the database, whether it be to INSERT, UPDATE, DELETE, SELECT, send return values, send output parameters, send e-mail, call command line arguments, encapsulate business logic, enforce data integrity, or any combination thereof. Here is a fictitious example:

CREATE PROCEDURE dbo.doStuff

@dt SMALLDATETIME

AS

BEGIN

SET NOCOUNT ON

DECLARE @un SYSNAME,

@now SMALLDATETIME

SET @un = SUSER_SNAME()

SET @now = GETDATE()

BEGIN TRANSACTION

INSERT dbo.myLog(ProcName, UserName, dt )
SELECT 'doStuff', @un, @now

DELETE dbo.myLog

WHERE dt < (@now-7)

UPDATE dbo.Users

SET LastActivity = @now

WHERE UserName = @un

COMMIT TRANSACTION

SELECT TOP 3 ProcName, dt

FROM MyLog

WHERE UserName = @un

ORDER BY dt DESC

DECLARE @subject VARCHAR(255)

SET @subject = @un + ' used the doStuff procedure.'

EXEC master..xp_smtp_sendmail @from = 'foo@bar.com',@to = 'bar@foo.com', @server = 'mail.myserver.com', @subject = @subject

DECLARE @cmd VARCHAR(255)

SET @cmd = 'del c:\users\'+@un+'\archive\*.log'

EXEC master..xp_cmdshell @cmd, NO_OUTPUT

RETURN 0

END


GO



Stored procedures accept parameters, and are the preferred method of both manipulating data and simply returning data. They are compiled when first run, and the query plans are stored and cached by SQL Server's optimizer, and those cached plans are swapped out depending on frequency of usage. Generally, a stored procedure will perform faster than an ad hoc query, but there are certain cases (e.g. when a bad plan is cached) that this is not the case.

It is, for the most part, difficult to share data between stored procedures within SQL Server. As an example, you cannot say:

SELECT * FROM ProcedureName


-- nor

SELECT * FROM a
INNER JOIN (EXEC StoredProcedureName)
ON ...

Erland Sommarskog has devoted a lengthy article to the treatment of this topic: How to share data between stored procedure....

Views


A view is like a table, but SQL Server does not store the data, only the definition. Views are typically used for two primary purposes: to simplify complex schemas and/or queries, and to implement security. With the exception of indexed and distributed partitioned views, they are *not* used to enhance performance! This is a very common myth, and I'm not sure where the sentiment comes from, but it is inaccurate.

By simplifying complex schemas or queries, I mean several things. One could simply be to return the *relevant* columns from a very wide table. You might have an inventory table with 200 columns, detailing a product's every last detail, from weight, size and color to origin country, manufacture date and shipping carrier. However, the database developer trying to write a report for the salespeople, trying to determine which product was their best seller last month, could use a view that only referenced the ProductID and ProductName columns (and join that against the Orders/OrderDetails tables).

Another is to simulate JOINs to reduce the complexity of other queries. For example, let's say you have the following query, that is used in several places:


SELECT
a.foo, b.bar, c.state
FROM a
INNER JOIN b
ON a.userID = b.userID
INNER JOIN c
ON b.userID = c.userID
AND c.state IN ('RI', 'MA')

Well, if you were to create a view like this:

SELECT
CREATE VIEW dbo.RIMA
AS
SELECT b.userID, b.bar, c.state
FROM b
INNER JOIN c
ON b.userID = c.userID
AND c.state IN ('RI', 'MA')
GO


Now your above query can look like this:


SELECT a.foo, b.bar, b.state
FROM a
INNER JOIN dbo.RIMA b
ON a.userID = b.userID

Essentially, the optimizer swaps out your reference to dbo.RIMA with the actual code from the view, and they can all be optimized together. And it sure is easier to look at.

Now, that's a pretty trivial example; I'm sure you can come up with much more convoluted and real examples where simply cutting down on the number of lines of code in a query can speed development time greatly. It can also be easier to explain to junior database developers that to get a result of x, you query the view m, instead of having to explain (or provide direct access to) the joins required against the main tables.

Which brings me to my point about security. Let's say you have a table called Employees, and it has columns like salary and SSN. Well, you probably don't want everyone with access to the database to be able to see those columns, so you can create a view called EmployeeInfo which neglects to select those columns, and deny access to the base table to the users (either by placing them in specific roles/groups and denying access that way, or issuing explicit DENY statements to each user). This way, users don't even have to know that there is a table with more information behind the scenes.


CREATE TABLE dbo.Employees
(
EmployeeID INT,
FirstName VARCHAR(32),
LastName VARCHAR(32),
Salary INT,
SSN CHAR(9)
)
GO

CREATE VIEW dbo.EmployeeInfo
AS
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employees
GO


A view does not accept parameters, which is definitely a strange concept for users migrating from Microsoft Access. To restrict the rows returned by a view, you use a WHERE clause in the query that is accessing the view, not in the view itself. Table-valued functions, described further down, are more like parameterized views; in fact, they were called parameterized views during development, according to Hal Berenson.

Like a subquery or inline query, a view is not ordered. There is a kludgy workaround where users implement something like this:


CREATE VIEW dbo.Kludge
AS
SELECT TOP 100 PERCENT ...
FROM table
ORDER BY column


But this can cause problems and, AFAIK, is still not guaranteed to be in the order you define (even in this case, the order will be dictated by the outer query that selects from this view). If you want the results of your view to be in a specific order, then like the WHERE clause above, you add an ORDER BY in the query that is calling the view, not in the view itself.

I said earlier that views do not store the data, only a definition of the query. This is true except for the case of indexed views, in which the data IS materialized to disk and a query plan that uses the clustered index can be cached. Books Online has plenty of reading material on indexed views: Designing an Indexed View and Creating an Indexed View.

Closing Points - Is it? ....Yes it is






Benefits of Using Stored Procedure :

1. Improvement of Performance -- Because you / application not need to compile the procedure again and again
2. Reduced Network Congestion -- Application not need to submit the multile T-SQL Statement to server
3. Enhance Accuracy -- Less prone to errors
4. Better Security -- User can be granted / revoke permission

Benefit of Using Views :

1. Focus data for User
2. Hiding data complexity
3. Reduce Object Size






Finally....

Now, having said all that, there is a difference between client->server interaction, and interaction within SQL Server itself. All client->server interactions with SQL Server, whether they be from an application or an end user in Query Analyzer, should be through stored procedures. Processes outside of SQL Server should not have direct access to tables, views or functions. So my basic ground rule is, for anything client-facing, build a stored procedure. Whether that stored procedure accesses tables, views or UDFs behind the scenes, rests entirely on the points above that are relevant to your environment, and the testing you perform in order to yield which implementation works best.

Note: This article [blog post] being inherited and edited from several blogs and articles. I would like to say thanks to all the contributors and authors