Quickest and most efficient ways to handle data

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2012 Forums

Transact-SQL (2012)

Quickest and most efficient ways to handle data

Author

Topic

rockstar283
Yak Posting Veteran

96 Posts

Posted - 2013-09-17 : 15:55:15

Hey everyone..I would like to start a discussion to understand what are the different quickest and most efficient ways to handle the following scenarios:

1) How to move trillion records

a) From one SQL Server table to another SQL Server table

b) From SQL Server table to Excel or CSV file

c) From Excel or csv file into SQL server table

2) How to count trillion records in SQL server

Please provide any details if you can..Thank you!!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-09-17 : 16:15:22

1) DELETE FROM dbo.Table1 OUTPUT deleted.*
a) DELETE FROM dbo.Table1 OUTPUT deleted.* INTO dbo.Table2
b) Excel cannot support 1 trillion rows.
c) OPENROWSET
2) SELECT COUNT(*)

Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-09-17 : 17:16:10

My answers.

1) SSIS
2) SSIS
3) SSIS (Excel cannot support 1 trillion rows).
4) SSIS
5)

SELECT SUM (row_count) 
FROM sys.dm_db_partition_stats 
where  (index_id=0 or index_id=1)
and object_name(object_id) = 'YourTableName'

sigmas
Posting Yak Master

172 Posts

Posted - 2013-09-17 : 17:22:43

quote:
Originally posted by James K

My answers.

1) SSIS
2) SSIS
3) SSIS (Excel cannot support 1 trillion rows).
4) SSIS
5)
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
where (index_id=0 or index_id=1)
and object_name(object_id) = 'YourTableName'

I agree with you for how counting the rows in a table by statistics.

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-09-17 : 17:26:28

quote:
Originally posted by sigmas

quote:
Originally posted by James K

My answers.

1) SSIS
2) SSIS
3) SSIS (Excel cannot support 1 trillion rows).
4) SSIS
5)
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
where (index_id=0 or index_id=1)
and object_name(object_id) = 'YourTableName'

I agree with you for how counting the rows in a table by statistics.

Jeez, sigmas, you are agreeing with me only on 20% of my answers. Does this mean you are diasgreeing with me on the other answers?

Some interesting reading: http://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx

sigmas
Posting Yak Master

172 Posts

Posted - 2013-09-17 : 17:36:21

" Does this mean you are disagreeing with me on the other answers? "
No, I am not. I am not disagreeing with you on the others.
It means I have not any opinion about others. just this :)

rockstar283
Yak Posting Veteran

96 Posts

Posted - 2013-09-18 : 04:58:48

Answers I have found so far:

Quickest way to move the data:

1) SSIS
2) DELETE FROM dbo.Table1 OUTPUT deleted.* INTO dbo.Table2
3) SELECT * INTO dbo.Table2 from dbo.Table2 where 1=0

Quickest way to count the number of records:

1) SP_Spaceused
Displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.

USE AdventureWorks2012;
GO
EXEC sp_spaceused N'Purchasing.Vendor';
GO

2) Count(*)

3) select SCHEMA_NAME(ST.schema_id)+'.'+ST.name TableName,
SP.rows RowCnt
from
sys.tables ST
inner join sys.partitions SP
on (ST.object_id = SP.object_id
and SP.index_id in (0,1))
order by
RowCnt desc

4) SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
where (index_id=0 or index_id=1)
and object_name(object_id) = 'YourTableName'

5) select Count_Big(*) from HumanResources.Employee

Thanks to all who helped me to formulate above answers

if anyone else has any other thoughts..please pitch in:-)

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2013-09-18 : 08:45:59

quote:
Originally posted by sigmas

quote:
Originally posted by James K

My answers.

1) SSIS
2) SSIS
3) SSIS (Excel cannot support 1 trillion rows).
4) SSIS
5)
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
where (index_id=0 or index_id=1)
and object_name(object_id) = 'YourTableName'

I agree with you for how counting the rows in a table by statistics.

I disagree with this if you want a consistent and accurate answer.

- If it's not transactionally accurate/consistent then it's not the proper count and therefore irrelevant (or at least suspect).
- If it is accurate, then COUNT(*) would surely be implemented in terms of that anyway. If so, then count(*) would therefore be better because it's going to give you the correct answer in a standard way.

If you want a "it might have had something like this many rows at some point in time, maybe and I don't care if I freak out future developers" then go for it.

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-09-18 : 10:39:49

quote:
Originally posted by LoztInSpace

quote:
Originally posted by sigmas

quote:
Originally posted by James K

My answers.

1) SSIS
2) SSIS
3) SSIS (Excel cannot support 1 trillion rows).
4) SSIS
5)
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
where (index_id=0 or index_id=1)
and object_name(object_id) = 'YourTableName'

I agree with you for how counting the rows in a table by statistics.

I disagree with this if you want a consistent and accurate answer.

- If it's not transactionally accurate/consistent then it's not the proper count and therefore irrelevant (or at least suspect).
- If it is accurate, then COUNT(*) would surely be implemented in terms of that anyway. If so, then count(*) would therefore be better because it's going to give you the correct answer in a standard way.

If you want a "it might have had something like this many rows at some point in time, maybe and I don't care if I freak out future developers" then go for it.

Can you show an example of where sys.dm_db_partition_stats gives the "wrong" results?

I am aware of the scenario Kalen Delaney describes in this article. http://sqlblog.com/blogs/kalen_delaney/archive/2009/12/07/how-many-rows.aspx

However, one thing that she does not mention is that if you were to do a count(*) within the SAME transaction as where you are inserting the rows, you will see that the count includes the uncommitted rows - see the example below:

CREATE TABLE xyz (id INT);
INSERT INTO xyz VALUES (1),(2);

-- should return 2
SELECT COUNT(*) FROM xyz;

BEGIN TRAN
INSERT INTO xyz VALUES (3);

-- should return 3
SELECT COUNT(*) FROM xyz;

ROLLBACK;

-- should return 2
SELECT COUNT(*) FROM xyz;

DROP TABLE xyz;

So what is accurate sort of depends on your view of the world and where you are looking at it from, doesn't it? If developers get "freaked out" over that, some education is in order, I guess.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources