[bash] How to split a file into equal parts, without breaking individual lines?

I was wondering if it was possible to split a file into equal parts (edit: = all equal except for the last), without breaking the line? Using the split command in Unix, lines may be broken in half. Is there a way to, say, split up a file in 5 equal parts, but have it still only consist of whole lines (it's no problem if one of the files is a little larger or smaller)? I know I could just calculate the number of lines, but I have to do this for a lot of files in a bash script. Many thanks!

This question is related to bash shell unix split

The answer is


I made a bash script, that given a number of parts as input, split a file

#!/bin/sh

parts_total="$2";
input="$1";

parts=$((parts_total))
for i in $(seq 0 $((parts_total-2))); do
  lines=$(wc -l "$input" | cut -f 1 -d" ")
  #n is rounded, 1.3 to 2, 1.6 to 2, 1 to 1
  n=$(awk  -v lines=$lines -v parts=$parts 'BEGIN { 
    n = lines/parts;
    rounded = sprintf("%.0f", n);
    if(n>rounded){
      print rounded + 1;
    }else{
      print rounded;
    }
  }');
  head -$n "$input" > split${i}
  tail -$((lines-n)) "$input" > .tmp${i}
  input=".tmp${i}"
  parts=$((parts-1));
done
mv .tmp$((parts_total-2)) split$((parts_total-1))
rm .tmp*

I used head and tail commands, and store in tmp files, for split the files

#10 means 10 parts
sh mysplitXparts.sh input_file 10

or with awk, where 0.1 is 10% => 10 parts, or 0.334 is 3 parts

awk -v size=$(wc -l < input) -v perc=0.1 '{
  nfile = int(NR/(size*perc)); 
  if(nfile >= 1/perc){
    nfile--;
  } 
  print > "split_"nfile
}' input

A simple solution for a simple question:

split -n l/5 your_file.txt

no need for scripting here.

From the man file, CHUNKS may be:

l/N     split into N files without splitting lines

Update

Not all unix dist include this flag. For example, it will not work in OSX. To use it, you can consider replacing the Mac OS X utilities with GNU core utilities.


var dict = File.ReadLines("test.txt")
               .Where(line => !string.IsNullOrWhitespace(line))
               .Select(line => line.Split(new char[] { '=' }, 2, 0))
               .ToDictionary(parts => parts[0], parts => parts[1]);


or 

    enter code here

line="[email protected][email protected]";
string[] tokens = line.Split(new char[] { '=' }, 2, 0);

ans:
tokens[0]=to
token[1][email protected][email protected]"

The script isn't even necessary, split(1) supports the wanted feature out of the box:
split -l 75 auth.log auth.log. The above command splits the file in chunks of 75 lines a piece, and outputs file on the form: auth.log.aa, auth.log.ab, ...

wc -l on the original file and output gives:

  321 auth.log
   75 auth.log.aa
   75 auth.log.ab
   75 auth.log.ac
   75 auth.log.ad
   21 auth.log.ae
  642 total

split was updated in coreutils release 8.8 (announced 22 Dec 2010) with the --number option to generate a specific number of files. The option --number=l/n generates n files without splitting lines.

http://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html#split-invocation http://savannah.gnu.org/forum/forum.php?forum_id=6662


Examples related to bash

Comparing a variable with a string python not working when redirecting from bash script Zipping a file in bash fails How do I prevent Conda from activating the base environment by default? Get first line of a shell command's output Fixing a systemd service 203/EXEC failure (no such file or directory) /bin/sh: apt-get: not found VSCode Change Default Terminal Run bash command on jenkins pipeline How to check if the docker engine and a docker container are running? How to switch Python versions in Terminal?

Examples related to shell

Comparing a variable with a string python not working when redirecting from bash script Get first line of a shell command's output How to run shell script file using nodejs? Run bash command on jenkins pipeline Way to create multiline comments in Bash? How to do multiline shell script in Ansible How to check if a file exists in a shell script How to check if an environment variable exists and get its value? Curl to return http status code along with the response docker entrypoint running bash script gets "permission denied"

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows