I was asked by one of my co-workers for some SQL help. He needed a SQL statement that would suppress repeated column values for the result set. Basically the value would be shown for the first row and blanked for each successful row that had the same value. Typically you would handle this in the application code, we had a case where we had to pass data to another application and we needed to do this within a single SQL select statement.
For example if we have the values:
username Category
-------------------- ----------
Brian cs
Tom cs
Joe cs
Allen cs
Bill ts
Steven ts
Fred ts
Ted ts
We would want to return this as the output
username Category
-------------------- ----------
Brian cs
Tom
Joe
Allen
Bill ts
Steven
Fred
Ted
Using the following table structure:
create table test(id int, cat varchar(10), username varchar(20))
We can make a query like
select t.username
,case
when t.id = (select top 1 id
from test t3
where t3.cat = t.cat
order by t3.cat, t3.username) then t.cat
else ''
end as Category
from test t
order by t.cat, t.username
What the case keyword is doing is a sub-select on the same table and uses top 1 to match on only the first row for each set of categories. If we match, we use the category value, otherwise we use an empty string value. This is not very inefficient, you are doing the sub-select for each row of the query. We needed to do this because the situation only allowed a single SQL statement to be executed. We were working with a small set of records and this executed without any delay.
If you can call a stored procedure or execute a batch of SQL, you can split this up and gain a performance increase for larger sets of data. Instead of doing the sub-select on each row, populate a table variable with the first row for each category. Then do a left join from the main table to the table variable. The combined SQL would look something like this:
declare @q table(cat varchar(10), username varchar(20))
insert into @q(cat, username)
select t.cat, MIN(t.username)
from test t
group by t.cat
select t.username, COALESCE(q.cat,'') as Category
from test t
left join @q q on t.cat = q.cat and t.username = q.username
order by t.cat, t.username
Another way to get this affect is to use a Common Table Expression (CTE) as part of the query. This would behave like the table variable, but you would have just a single select statement. This would be useful for reporting tools where you can only specify a single SQL statement to retrieve the data.
Using the above example data, the new select statement would look like this
with cte as
(
select cat, min(username) as username
from test
group by cat
)
select t.username, COALESCE(c.cat,'') as Category
from test t
left join cte c on t.username = c.username and c.cat = t.cat
order by t.cat, t.username
This query should be perform well (if not better) as the query with the table variable.
Cant we achive this functionality with out creating a table?
ReplyDeleteYes, you can use a CTE and I just updated the post to include an example.
Delete