I have a CSV file (around 10,000 rows, each with 300 columns) stored on a Linux server. I want to break it into 500 CSV files of 20 records each, each with the same header as the original CSV.
Is there a Linux command that can do this?
I have a one-liner answer using GNU parallel (this example gives you 999 lines of data and one header row per file; use -N20 for 20 records per file):
cat bigFile.csv | parallel --header : --pipe -N999 'cat >file_{#}.csv'
Use the Linux split command:
split -l 20 file.txt new
This splits the file "file.txt" into files whose names begin with "new", each containing 20 lines of text.
Type man split at the Unix prompt for more information. However, you will first have to remove the header from file.txt (using the tail command, for example) and then add it back to each of the split files.
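The header handling described above (strip it with tail, then add it back to every piece) can be done in a single pass if GNU split is available. The following is only a sketch under that assumption: it relies on the GNU-specific --filter and --additional-suffix options, and the function name split_csv_with_header and the variable SPLIT_HEADER are made up for illustration. It also assumes no CSV field contains an embedded newline.

```shell
#!/bin/bash
# Sketch: split a CSV into chunks of N data rows and repeat the header in each.
# Assumes GNU coreutils split (--filter, --additional-suffix).
# split_csv_with_header and SPLIT_HEADER are hypothetical names.
split_csv_with_header() {
    local input=$1 rows=$2 prefix=$3
    local header
    header=$(head -n 1 "$input")    # first line is the CSV header

    # tail drops the header; split runs the filter once per chunk,
    # with $FILE set to the name of the output file for that chunk.
    tail -n +2 "$input" |
        SPLIT_HEADER="$header" split -l "$rows" -d -a 3 --additional-suffix=.csv \
            --filter='{ printf "%s\n" "$SPLIT_HEADER"; cat; } > "$FILE"' - "$prefix"
}
```

For example, split_csv_with_header file.txt 20 new would write new000.csv, new001.csv, and so on, each starting with the header line followed by up to 20 data rows.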
This should work:
file_name.csv = name of the file you want to split
10000 = number of rows each split file will contain (use 20 for 20 records per file)
file_part_ = prefix of the split file names (file_part_00, file_part_01, file_part_02, and so on)
split -d -l 10000 file_name.csv file_part_
Note that this does not repeat the CSV header in each split file.
This should do it for you. All your files will end up called Part1 to Part500.
#!/bin/bash
FILENAME=10000.csv
HDR=$(head -n 1 "$FILENAME") # Pick up CSV header line to apply to each file
tail -n +2 "$FILENAME" | split -l 20 - xyz # Skip the header, then split the data rows into chunks of 20 lines each
n=1
for f in xyz* # Go through all newly created chunks
do
printf '%s\n' "$HDR" > "Part${n}" # Write out header to new file called "Part(n)"
cat "$f" >> "Part${n}" # Add in the 20 lines from the "split" command
rm "$f" # Remove temporary file
((n++)) # Increment name of output part
done
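As an alternative to splitting first and then looping over temporary files, the same result can be produced in one pass with awk. This is a minimal sketch, not a tested replacement for the script above: the function name split_csv_awk and the zero-padded output names are assumptions made for illustration, and it assumes no CSV field contains an embedded newline.

```shell
#!/bin/bash
# Sketch: one-pass CSV split with awk, repeating the header in every part.
# split_csv_awk is a hypothetical helper name.
split_csv_awk() {
    local input=$1 rows=$2 prefix=$3
    awk -v rows="$rows" -v prefix="$prefix" '
        NR == 1 { header = $0; next }        # remember the header, do not emit it yet
        (NR - 2) % rows == 0 {               # first data row of a new part
            if (out != "") close(out)        # close the previous part to limit open files
            out = sprintf("%s%03d.csv", prefix, ++n)
            print header > out               # every part starts with the header
        }
        { print > out }                      # append the data row to the current part
    ' "$input"
}
```

Calling split_csv_awk 10000.csv 20 Part would write Part001.csv, Part002.csv, and so on, each holding the header plus up to 20 data rows.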
Source: Stackoverflow.com