Fast CSV Import in PowerShell to SQL Server
By Bill Graziano
on 18 March 2014
| Tags:
INSERT
,
bcp/Bulk Insert
I recently had to write some simple scripts for a proof of concept. These scripts use PowerShell to import and export CSV files from SQL Server. The scripts were simple to write and maintain but were slow. It turns out there are a few ways to improve CSV import and export performance in PowerShell.
Before I landed on PowerShell I considered a few other approaches to get this done.
- SSIS. This gave me all the performance and flexibility around CSV formats that I wanted. But this would have been their first SSIS package and much of the functionality would have been embedded in those SSIS packages. I was looking for something that was a more explicit to read.
- SQL Server BULK INSERT or BCP. This was more script-able but getting the format file right proved to be a challenge. The import file included quotes around the values but only if there was a comma inside the string.
Simple CSV Import using PowerShell
The dirt simplest way to import a CSV file into SQL Server using PowerShell looks like this:
$ConnectionString = "Data Source=L80\SQL2012; Database=tempdb; Trusted_Connection=True;";
. ".\Out-DataTable.ps1"
$csvDataTable = Import-CSV -Path ".\SimpleCsv.txt" | Out-DataTable
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
$bulkCopy.DestinationTableName = "CsvDataReader"
$bulkCopy.WriteToServer($csvDataTable)
The Import-CSV
command is handy and flexible. It handled headers rows and optional quotes around columns without problem. This does require the use of Chad Miller's excellent Out-Table script to convert the CSV to a DataTable. That's because the WriteToServer
method only accepts DataTables or DataReaders as data sources.
Unfortunately that means it has to pull the entire CSV into memory to build the DataTable. On a 100MB file I was seeing memory sizes over 500MB. It was also taking over 15 minutes to process the file.
Fast CSV Import using PowerShell
Processing this faster means getting the CSV into an IDataReader
. That streams the file through memory rather than completely loading it in. I found a few people had written either partial or full classes to handle that. None of them quite did what I needed though. So I coded up a simple DLL to expose a CSV as an IDataReader. Using that in PowerShell looks like this:
[System.Reflection.Assembly]::LoadFrom(".\CsvDataReader.dll") | Out-Null
$reader = New-Object SqlUtilities.CsvDataReader(".\SimpleCsv.txt")
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
$bulkCopy.DestinationTableName = "CsvDataReader"
$bulkCopy.WriteToServer($reader)
You can find the fast DataReader for CSV files over on GitHub. You can download just the DLL if you'd like. After you download the DLL don't forget to Unblock it in the File Properties dialog box. It also handles optional quotes around values. It's written to support bulk insert to SQL Server so not all the methods are implemented yet. The column names are currently case-sensitive if you need to do custom field mapping.
This reduced my 15 minute load on the 100MB to 45 seconds.
Simple CSV Export using PowerShell
I also needed to export some CSV files from SQL Server. The simplest PowerShell script to complete that is:
Import-Module “sqlps” -DisableNameChecking
invoke-sqlcmd -query "SELECT * FROM CsvImport" -database Test -serverinstance "L80\SQL2012" |
Export-CSV -Path "C:\SimpleCsvOut.txt" -NoTypeInformation }
The Export-CSV command certainly makes this simple. A few things to note about this though:
- This wasn't terribly slow. It exported 26,000 rows in about 45 seconds. But I thought it could be faster. It will also get worse the more rows you export.
- It did put quotes around every single value. Including headers.
- After I imported the SQLPS module I needed to hard-code all the path names. I'm sure there's way around that but I'd moved on after seeing the first two issues.
Faster CSV Export using PowerShell
Rewriting the export to stream a SqlDataReader to StreamWriter looked like this:
$streamWriter = New-Object System.IO.StreamWriter ".\SimpleCsvOut3.txt"
$sqlConn = New-Object System.Data.SqlClient.SqlConnection $ConnectionString
$sqlCmd = New-Object System.Data.SqlClient.SqlCommand
$sqlCmd.Connection = $sqlConn
$sqlCmd.CommandText = "SELECT * FROM Test.dbo.CsvImport"
$sqlConn.Open();
$reader = $sqlCmd.ExecuteReader();
# Initialze the arry the hold the values
$array = @()
for ( $i = 0 ; $i -lt $reader.FieldCount; $i++ )
{ $array += @($i) }
# Write Header
$streamWriter.Write($reader.GetName(0))
for ( $i = 1; $i -lt $reader.FieldCount; $i ++)
{ $streamWriter.Write($("," + $reader.GetName($i))) }
$streamWriter.WriteLine("") # Close the header line
while ($reader.Read())
{
# get the values;
$fieldCount = $reader.GetValues($array);
# add quotes if the values have a comma
for ($i = 0; $i -lt $array.Length; $i++)
{
if ($array[$i].ToString().Contains(","))
{
$array[$i] = '"' + $array[$i].ToString() + '"';
}
}
$newRow = [string]::Join(",", $array);
$streamWriter.WriteLine($newRow)
}
$reader.Close();
$sqlConn.Close();
$streamWriter.Close();
It certainly was a lot more code to write. Most of it is boilerplate and easy to modify when you copy it. The performance only decreased from 45 seconds to 25 seconds on 26,000 rows. If you eliminate the code to wrap quotes around commas it cuts the time from 25 seconds to 12 seconds. I also saw bigger differences on larger data sets.
In summary, PowerShell is a great way to handle CSV files in SQL Server for smaller files. And with a little tweaking it can handle much larger files too.