[linux] How to split CSV files as per number of rows specified?

I've CSV file (around 10,000 rows ; each row having 300 columns) stored on LINUX server. I want to break this CSV file into 500 CSV files of 20 records each. (Each having same CSV header as present in original CSV)

Is there any linux command to help this conversion?

This question is related to linux unix csv split

The answer is


I have a one-liner answer (this example gives you 999 lines of data and one header row per file)

cat bigFile.csv | parallel --header : --pipe -N999 'cat >file_{#}.csv'

https://stackoverflow.com/a/53062251/401226


Use the Linux split command:

split -l 20 file.txt new    

Split the file "file.txt" into files beginning with the name "new" each containing 20 lines of text each.

Type man split at the Unix prompt for more information. However you will have to first remove the header from file.txt (using the tail command, for example) and then add it back on to each of the split files.


This should work !!!

file_name = Name of the file you want to split.
10000 = Number of rows each split file would contain
file_part_ = Prefix of split file name (file_part_0,file_part_1,file_part_2..etc goes on)

split -d -l 10000 file_name.csv file_part_


This should do it for you - all your files will end up called Part1-Part500.

#!/bin/bash
FILENAME=10000.csv
HDR=$(head -1 $FILENAME)   # Pick up CSV header line to apply to each file
split -l 20 $FILENAME xyz  # Split the file into chunks of 20 lines each
n=1
for f in xyz*              # Go through all newly created chunks
do
   echo $HDR > Part${n}    # Write out header to new file called "Part(n)"
   cat $f >> Part${n}      # Add in the 20 lines from the "split" command
   rm $f                   # Remove temporary file
   ((n++))                 # Increment name of output part
done

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows