[bash] How to split a large text file into smaller files with equal number of lines?

I've got a large (by number of lines) plain text file that I'd like to split into smaller files, also by number of lines. So if my file has around 2M lines, I'd like to split it up into 10 files that contain 200k lines, or 100 files that contain 20k lines (plus one file with the remainder; being evenly divisible doesn't matter).

I could do this fairly easily in Python but I'm wondering if there's any kind of ninja way to do this using bash and unix utils (as opposed to manually looping and counting / partitioning lines).

Tags: bash, file, unix

Answers:


Yes, there is a split command. It will split a file by lines or bytes.

$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   use suffixes of length N (default 2)
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic
  -l, --lines=NUMBER      put NUMBER lines per output file
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

SIZE may have a multiplier suffix:
b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024,
GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.
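
For example, the 2M-line file from the question could be split into 200,000-line pieces like this (a minimal sketch; mybigfile.txt and the chunk_ prefix are placeholder names, and -d/-a are the flags shown in the help above):

split -l 200000 mybigfile.txt                    # creates xaa, xab, xac, ...
split -l 200000 -d -a 3 mybigfile.txt chunk_     # creates chunk_000, chunk_001, ...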

HDFS getmerge small files, then split the result into a suitable size.

Splitting by bytes will break lines in the middle:

split -b 125m compact.file -d -a 3 compact_prefix

I wanted to getmerge and then split every output file into about 128 MB without breaking lines:

# Split into ~128 MB chunks; the size unit may be M or G, so test before use.
beginsize=`hdfs dfs -du -s -h /externaldata/$table_name/$date/ | awk '{ print $1}' `
sizeunit=`hdfs dfs -du -s -h /externaldata/$table_name/$date/ | awk '{ print $2}' `
if [ $sizeunit = "G" ];then
    res=$(printf "%.f" `echo "scale=5;$beginsize*8 "|bc`)    # 1 GB / 128 MB = 8 chunks per GB
else
    res=$(printf "%.f" `echo "scale=5;$beginsize/128 "|bc`)  # ceiling; ref: http://blog.csdn.net/naiveloafer/article/details/8783518
fi
echo $res
# Split into $res files with numeric suffixes.  ref: http://blog.csdn.net/microzone/article/details/52839598
compact_file_name=$compact_file"_"
echo "compact_file_name :"$compact_file_name
split -n l/$res $basedir/$compact_file -d -a 3 $basedir/${compact_file_name}
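
As a worked example with the numbers assumed above (128 MB target chunks): a 1.5 G input gives res = round(1.5 * 8) = 12, and a 640 M input gives res = round(640 / 128) = 5, so split -n l/$res then produces files of roughly 128 MB each without breaking any lines.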

Use split.

Split a file into fixed-size pieces; it creates output files containing consecutive sections of INPUT (standard input if none is given or INPUT is `-').

Syntax: split [options] [INPUT [PREFIX]]

http://ss64.com/bash/split.html


Split the file "file.txt" into files of 10,000 lines each:

split -l 10000 file.txt

How about the split command?

split -l 200000 mybigfile.txt

split (from GNU coreutils, since version 8.8 from 2010-12-22) includes the following parameter:

-n, --number=CHUNKS     generate CHUNKS output files; see explanation below

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

Thus, split -n 4 input output. (the trailing dot is part of the prefix) will generate four files (output.a{a,b,c,d}) with the same number of bytes, but lines might be broken in the middle.

If we want to preserve full lines (i.e. split by lines), then this should work:

split -n l/4 input output.

Related answer: https://stackoverflow.com/a/19031247
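
The K/N and l/K/N forms write a single chunk to standard output instead of creating files, and r/N deals lines out round-robin. A couple of hedged examples (assuming GNU split >= 8.8, reusing the input/output. names from above):

split -n l/2/4 input             # print the 2nd of 4 line-aligned chunks to stdout
split -n r/4 input output.       # distribute lines round-robin into output.aa .. output.ad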


Use:

sed -n '1,100p' filename > output.txt

Here, 1 and 100 are the first and last line numbers of the range that gets captured into output.txt.
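
By itself this extracts a single chunk; a minimal sketch of a loop that writes consecutive 100-line ranges to numbered files (filename and the output_N.txt names are just placeholders):

chunk=100
total=$(wc -l < filename)
n=1
for start in $(seq 1 "$chunk" "$total"); do
    end=$((start + chunk - 1))
    sed -n "${start},${end}p" filename > "output_${n}.txt"
    n=$((n + 1))
done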


If you just want to split into files of x lines each, the answers above about split are fine. But I'm curious that no one paid attention to the requirements:

  • "without having to count them" -> using wc + cut
  • "having the remainder in extra file" -> split does by default

I can't do it without "wc + cut", but this is what I use:

split -l  $(expr `wc $filename | cut -d ' ' -f3` / $chunks) $filename

This can easily be added to your bashrc as a function, so you can invoke it by passing the filename and the number of chunks:

 split -l  $(expr `wc $1 | cut -d ' ' -f3` / $2) $1

If you want exactly x chunks with no remainder in an extra file, just adapt the formula by adding (chunks - 1) to the line count of each file. I use this approach because I usually want x files rather than x lines per file:

split -l  $(expr `wc $1 | cut -d ' ' -f3` / $2 + `expr $2 - 1`) $1

You can add that to a script and call it your "ninja way", because if nothing suits your needs, you can build it :-)
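
A hedged variant of the same idea: the wc | cut field index depends on how wc pads its columns, so reading the line count with wc -l and input redirection is less fragile. The splitinto name and the ceiling division are just one way to sketch it:

# Split FILE into at most CHUNKS files of whole lines (hypothetical helper).
splitinto() {
    local file=$1 chunks=$2
    local lines
    lines=$(wc -l < "$file")
    split -l $(( (lines + chunks - 1) / chunks )) "$file"
}

splitinto mybigfile.txt 10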


You can also use awk:

awk 'NR%200000==1{++c}{print > (c".txt")}' largefile
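
This writes 200,000-line chunks to 1.txt, 2.txt, and so on. A hedged variant with zero-padded names (part_000.txt and the n variable are just assumptions) that also closes each finished file, which helps when the number of chunks gets large:

awk -v n=200000 'NR % n == 1 { if (out) close(out); out = sprintf("part_%03d.txt", ++c) } { print > out }' largefile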
