Thursday, April 29, 2010

Suppressing a repeated column value in SQL

I was asked by one of my co-workers for some SQL help. He needed a SQL statement that would suppress repeated column values for the result set. Basically the value would be shown for the first row and blanked for each successful row that had the same value. Typically you would handle this in the application code, we had a case where we had to pass data to another application and we needed to do this within a single SQL select statement.

For example if we have the values:

username             Category
-------------------- ----------
Brian cs
Tom cs
Joe cs
Allen cs
Bill ts
Steven ts
Fred ts
Ted ts

We would want to return this as the output

username             Category
-------------------- ----------
Brian cs
Bill ts

Using the following table structure:

create table test(id int, cat varchar(10), username varchar(20))

We can make a query like

select t.username
  when = (select top 1 id
from test t3
where =
order by, t3.username) then
  else ''
end as Category
from test t
order by, t.username

What the case keyword is doing is a sub-select on the same table and uses top 1 to match on only the first row for each set of categories.  If we match, we use the category value, otherwise we use an empty string value.  This is not very inefficient, you are doing the sub-select for each row of the query.  We needed to do this because the situation only allowed a single SQL statement to be executed.  We were working with a small set of records and this executed without any delay.

If you can call a stored procedure or execute a batch of SQL, you can split this up and gain a performance increase for larger sets of data.  Instead of doing the sub-select on each row, populate a table variable with the first row for each category.  Then do a left join from the main table to the table variable.  The combined SQL would look something like this:

declare @q table(cat varchar(10), username varchar(20))
insert into @q(cat, username)
select, MIN(t.username)
from test t
group by

select t.username, COALESCE(,'') as Category
from test t
left join @q q on = and t.username = q.username
order by, t.username

Another way to get this affect is to use a Common Table Expression (CTE) as part of the query.  This would behave like the table variable, but you would have just a single select statement.  This would be useful for reporting tools where you can only specify a single SQL statement to retrieve the data.

Using the above example data, the new select statement would look like this

with cte as
select cat, min(username) as username
from test
group by cat
select t.username, COALESCE(,'') as Category
from test t
left join cte c on t.username = c.username and =
order by, t.username

This query should be perform well (if not better) as the query with the table variable.


  1. Cant we achive this functionality with out creating a table?

    1. Yes, you can use a CTE and I just updated the post to include an example.


Note: Only a member of this blog may post a comment.