[linux] Linux command (like cat) to read a specified quantity of characters

Is there a command like cat in linux which can return a specified quantity of characters from a file?

e.g., I have a text file like:

Hello world
this is the second line
this is the third line

And I want something that would return the first 5 characters, which would be "hello".

thanks

This question is related to linux command-line

The answer is


I know the answer is in reply to a question asked 6 years ago ...

But I was looking for something similar for a few hours and then found out that: cut -c does exactly that, with an added bonus that you could also specify an offset.

cut -c 1-5 will return Hello and cut -c 7-11 will return world. No need for any other command


head or tail can do it as well:

head -c X

Prints the first X bytes (not necessarily characters if it's a UTF-16 file) of the file. tail will do the same, except for the last X bytes.

This (and cut) are portable.


you could also grep the line out and then cut it like for instance:

grep 'text' filename | cut -c 1-5


Even though this was answered/accepted years ago, the presently accepted answer is only correct for one-byte-per-character encodings like iso-8859-1, or for the single-byte subsets of variable-byte character sets (like Latin characters within UTF-8). Even using multiple-byte splices instead would still only work for fixed-multibyte encodings like UTF-16. Given that now UTF-8 is well on its way to being a universal standard, and when looking at this list of languages by number of native speakers and this list of top 30 languages by native/secondary usage, it is important to point out a simple variable-byte character-friendly (not byte-based) technique, using cut -c and tr/sed with character-classes.

Compare the following which doubly fails due to two common Latin-centric mistakes/presumptions regarding the bytes vs. characters issue (one is head vs. cut, the other is [a-z][A-Z] vs. [:upper:][:lower:]):

$ printf '??? µp??? ?a µ??? sa?s???t???;\n' | \
$     head -c 1 | \
$     sed -e 's/[A-Z]/[a-z]/g'
[[unreadable binary mess, or nothing if the terminal filtered it]]

to this (note: this worked fine on FreeBSD, but both cut & tr on GNU/Linux still mangled Greek in UTF-8 for me though):

$ printf '??? µp??? ?a µ??? sa?s???t???;\n' | \
$     cut -c 1 | \
$     tr '[:upper:]' '[:lower:]'
p

Another more recent answer had already proposed "cut", but only because of the side issue that it can be used to specify arbitrary offsets, not because of the directly relevant character vs. bytes issue.

If your cut doesn't handle -c with variable-byte encodings correctly, for "the first X characters" (replace X with your number) you could try:

  • sed -E -e '1 s/^(.{X}).*$/\1/' -e q - which is limited to the first line though
  • head -n 1 | grep -E -o '^.{X}' - which is limited to the first line and chains two commands though
  • dd - which has already been suggested in other answers, but is really cumbersome
  • A complicated sed script with sliding window buffer to handle characters spread over multiple lines, but that is probably more cumbersome/fragile than just using something like dd

If your tr doesn't handle character-classes with variable-byte encodings correctly you could try:

  • sed -E -e 's/[[:upper:]]/\L&/g (GNU-specific)

head or tail can do it as well:

head -c X

Prints the first X bytes (not necessarily characters if it's a UTF-16 file) of the file. tail will do the same, except for the last X bytes.

This (and cut) are portable.


head:

Name

head - output the first part of files

Synopsis

head [OPTION]... [FILE]...

Description

Print the first 10 lines of each FILE to standard output. With more than one FILE, precede each with a header giving the file name. With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
-c, --bytes=[-]N print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file


You can use dd to extract arbitrary chunks of bytes.

For example,

dd skip=1234 count=5 bs=1

would copy bytes 1235 to 1239 from its input to its output, and discard the rest.

To just get the first five bytes from standard input, do:

dd count=5 bs=1

Note that, if you want to specify the input file name, dd has old-fashioned argument parsing, so you would do:

dd count=5 bs=1 if=filename

Note also that dd verbosely announces what it did, so to toss that away, do:

dd count=5 bs=1 2>&-

or

dd count=5 bs=1 2>/dev/null

head -Line_number file_name | tail -1 |cut -c Num_of_chars

this script gives the exact number of characters from the specific line and location, e.g.:

head -5 tst.txt | tail -1 |cut -c 5-8

gives the chars in line 5 and chars 5 to 8 of line 5,

Note: tail -1 is used to select the last line displayed by the head.


head:

Name

head - output the first part of files

Synopsis

head [OPTION]... [FILE]...

Description

Print the first 10 lines of each FILE to standard output. With more than one FILE, precede each with a header giving the file name. With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
-c, --bytes=[-]N print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file


you could also grep the line out and then cut it like for instance:

grep 'text' filename | cut -c 1-5


I know the answer is in reply to a question asked 6 years ago ...

But I was looking for something similar for a few hours and then found out that: cut -c does exactly that, with an added bonus that you could also specify an offset.

cut -c 1-5 will return Hello and cut -c 7-11 will return world. No need for any other command


head or tail can do it as well:

head -c X

Prints the first X bytes (not necessarily characters if it's a UTF-16 file) of the file. tail will do the same, except for the last X bytes.

This (and cut) are portable.


You can use dd to extract arbitrary chunks of bytes.

For example,

dd skip=1234 count=5 bs=1

would copy bytes 1235 to 1239 from its input to its output, and discard the rest.

To just get the first five bytes from standard input, do:

dd count=5 bs=1

Note that, if you want to specify the input file name, dd has old-fashioned argument parsing, so you would do:

dd count=5 bs=1 if=filename

Note also that dd verbosely announces what it did, so to toss that away, do:

dd count=5 bs=1 2>&-

or

dd count=5 bs=1 2>/dev/null

head:

Name

head - output the first part of files

Synopsis

head [OPTION]... [FILE]...

Description

Print the first 10 lines of each FILE to standard output. With more than one FILE, precede each with a header giving the file name. With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
-c, --bytes=[-]N print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file


you could also grep the line out and then cut it like for instance:

grep 'text' filename | cut -c 1-5


Here's a simple script that wraps up using the dd approach mentioned here:

extract_chars.sh

#!/usr/bin/env bash

function show_help()
{
  IT="
extracts characters X to Y from stdin or FILE
usage: X Y {FILE}

e.g. 

2 10 /tmp/it     => extract chars 2-10 from /tmp/it
EOF
  "
  echo "$IT"
  exit
}

if [ "$1" == "help" ]
then
  show_help
fi
if [ -z "$1" ]
then
  show_help
fi

FROM=$1
TO=$2
COUNT=`expr $TO - $FROM + 1`

if [ -z "$3" ]
then
  dd skip=$FROM count=$COUNT bs=1 2>/dev/null
else
  dd skip=$FROM count=$COUNT bs=1 if=$3 2>/dev/null 
fi

you could also grep the line out and then cut it like for instance:

grep 'text' filename | cut -c 1-5


You can use dd to extract arbitrary chunks of bytes.

For example,

dd skip=1234 count=5 bs=1

would copy bytes 1235 to 1239 from its input to its output, and discard the rest.

To just get the first five bytes from standard input, do:

dd count=5 bs=1

Note that, if you want to specify the input file name, dd has old-fashioned argument parsing, so you would do:

dd count=5 bs=1 if=filename

Note also that dd verbosely announces what it did, so to toss that away, do:

dd count=5 bs=1 2>&-

or

dd count=5 bs=1 2>/dev/null

Even though this was answered/accepted years ago, the presently accepted answer is only correct for one-byte-per-character encodings like iso-8859-1, or for the single-byte subsets of variable-byte character sets (like Latin characters within UTF-8). Even using multiple-byte splices instead would still only work for fixed-multibyte encodings like UTF-16. Given that now UTF-8 is well on its way to being a universal standard, and when looking at this list of languages by number of native speakers and this list of top 30 languages by native/secondary usage, it is important to point out a simple variable-byte character-friendly (not byte-based) technique, using cut -c and tr/sed with character-classes.

Compare the following which doubly fails due to two common Latin-centric mistakes/presumptions regarding the bytes vs. characters issue (one is head vs. cut, the other is [a-z][A-Z] vs. [:upper:][:lower:]):

$ printf '??? µp??? ?a µ??? sa?s???t???;\n' | \
$     head -c 1 | \
$     sed -e 's/[A-Z]/[a-z]/g'
[[unreadable binary mess, or nothing if the terminal filtered it]]

to this (note: this worked fine on FreeBSD, but both cut & tr on GNU/Linux still mangled Greek in UTF-8 for me though):

$ printf '??? µp??? ?a µ??? sa?s???t???;\n' | \
$     cut -c 1 | \
$     tr '[:upper:]' '[:lower:]'
p

Another more recent answer had already proposed "cut", but only because of the side issue that it can be used to specify arbitrary offsets, not because of the directly relevant character vs. bytes issue.

If your cut doesn't handle -c with variable-byte encodings correctly, for "the first X characters" (replace X with your number) you could try:

  • sed -E -e '1 s/^(.{X}).*$/\1/' -e q - which is limited to the first line though
  • head -n 1 | grep -E -o '^.{X}' - which is limited to the first line and chains two commands though
  • dd - which has already been suggested in other answers, but is really cumbersome
  • A complicated sed script with sliding window buffer to handle characters spread over multiple lines, but that is probably more cumbersome/fragile than just using something like dd

If your tr doesn't handle character-classes with variable-byte encodings correctly you could try:

  • sed -E -e 's/[[:upper:]]/\L&/g (GNU-specific)

head or tail can do it as well:

head -c X

Prints the first X bytes (not necessarily characters if it's a UTF-16 file) of the file. tail will do the same, except for the last X bytes.

This (and cut) are portable.


head -Line_number file_name | tail -1 |cut -c Num_of_chars

this script gives the exact number of characters from the specific line and location, e.g.:

head -5 tst.txt | tail -1 |cut -c 5-8

gives the chars in line 5 and chars 5 to 8 of line 5,

Note: tail -1 is used to select the last line displayed by the head.


You can use dd to extract arbitrary chunks of bytes.

For example,

dd skip=1234 count=5 bs=1

would copy bytes 1235 to 1239 from its input to its output, and discard the rest.

To just get the first five bytes from standard input, do:

dd count=5 bs=1

Note that, if you want to specify the input file name, dd has old-fashioned argument parsing, so you would do:

dd count=5 bs=1 if=filename

Note also that dd verbosely announces what it did, so to toss that away, do:

dd count=5 bs=1 2>&-

or

dd count=5 bs=1 2>/dev/null

Here's a simple script that wraps up using the dd approach mentioned here:

extract_chars.sh

#!/usr/bin/env bash

function show_help()
{
  IT="
extracts characters X to Y from stdin or FILE
usage: X Y {FILE}

e.g. 

2 10 /tmp/it     => extract chars 2-10 from /tmp/it
EOF
  "
  echo "$IT"
  exit
}

if [ "$1" == "help" ]
then
  show_help
fi
if [ -z "$1" ]
then
  show_help
fi

FROM=$1
TO=$2
COUNT=`expr $TO - $FROM + 1`

if [ -z "$3" ]
then
  dd skip=$FROM count=$COUNT bs=1 2>/dev/null
else
  dd skip=$FROM count=$COUNT bs=1 if=$3 2>/dev/null 
fi