Clustered indexes on large tables

To create a clustered index on a table that will fill more than 40 to 45% of the segment, and the input data file is not in order by clustered index key, these steps yield good data distribution, as long as the data that you copy in during step 6 contains a representative sample of the data.

Copy the data out.
Unpartition the table.

See “alter table...unpartition Syntax”.
Truncate the table.
Repartition the table.

See “alter table...partition syntax”.
Drop the clustered index and any nonclustered indexes. Depending on how your index was created, use either drop index.

See “Using drop index and create clustered index”) or alter table...drop constraint and “Using constraints and alter table”.
Use parallel bulk copy to copy in enough data to fill approximately 40% of the segment. This must be a representative sample of the values in the key column(s) of the clustered index.

Copying in 40% of the data is much more likely to yield good results than smaller amounts of data, you can perform this portion of the bulk copy can be performed in parallel; you must use nonparallel bcp for the second buld copy operation.

See “Using parallel bcp to copy data into partitions”.
Create the clustered index on the segment, do not use the sorted_data clause.
Use nonparallel bcp, in a single session, to copy in the rest of the data. The clustered index directs the rows to the correct partitions.
Use sp_helppartition to check the distribution of data pages on partitions and sp_helpsegment to check the distribution of pages on the segment.
Create any nonclustered indexes.
Dump the database.

One drawback of this method is that once the clustered index exists, the second bulk copy operation can cause page splitting on the data pages, taking slightly more room in the database. However, once the clustered index exists, and all the data is loaded, future maintenance activities can use simpler and faster methods.