[bash] How to extract the first two characters of a string in shell scripting?

For example, given:

USCAGoleta9311734.5021-120.1287855805

I want to extract just:

US

This question is related to bash shell grep sh gnu-coreutils

The answer is


if mystring = USCAGoleta9311734.5021-120.1287855805

print substr(mystring,0,2)

would print US

where 0 is the start position and 2 is how many characters to read


Is this what you're after?

my $string = 'USCAGoleta9311734.5021-120.1287855805';

my $first_two_chars = substr $string, 0, 2;

ref: substr


Just for fun, I'll add a few that were not mentioned, although they are overcomplicated and useless:

head -c 2 <( echo 'USCAGoleta9311734.5021-120.1287855805')

echo 'USCAGoleta9311734.5021-120.1287855805' | dd bs=2 count=1 status=none

sed -e 's/^\(.\{2\}\).*/\1/;' <( echo 'USCAGoleta9311734.5021-120.1287855805')

cut -c 1-2 <( echo 'USCAGoleta9311734.5021-120.1287855805')

python -c "print(r'USCAGoleta9311734.5021-120.1287855805'[0:2])"

ruby -e 'puts "USCAGoleta9311734.5021-120.1287855805"[0..1]'

You've gotten several good answers and I'd go with the Bash builtin myself, but since you asked about sed and awk and (almost) no one else offered solutions based on them, I offer you these:

echo "USCAGoleta9311734.5021-120.1287855805" | awk '{print substr($0,1,2)}'

and

echo "USCAGoleta9311734.5021-120.1287855805" | sed 's/\(^..\).*/\1/'

The awk one ought to be fairly obvious (awk counts string positions from 1), but here's an explanation of the sed one:

  • substitute "s/"
  • the group "\(...\)" of any two characters ".." anchored at the beginning of the line "^" and followed by any character "." repeated zero or more times "*" (the backslashes are needed to make the parentheses act as a group)
  • by "/" the contents of the first (and only, in this case) group "\1" (here the backslash is a special escape referring to a matched sub-expression)
  • done "/"
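The same substitution can also be written with extended regular expressions, which drops the backslashes around the group. This is only a sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the variable names are just for illustration:

```shell
line='USCAGoleta9311734.5021-120.1287855805'

# Basic regular expressions: the group delimiters must be escaped.
bre=$(printf '%s\n' "$line" | sed 's/^\(..\).*/\1/')

# Extended regular expressions (-E): parentheses group without backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's/^(..).*/\1/')

printf '%s %s\n' "$bre" "$ere"
```

Both forms replace the whole line with the contents of the first group.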

perl -ple 's/^(..).*/$1/'

colrm — remove columns from a file

To keep the first two characters, just remove the columns starting from column 3:

cat file | colrm 3

You can use printf:

$ original='USCAGoleta9311734.5021-120.1287855805'
$ printf '%-.2s' "$original"
US

Quite late indeed, but here it is. With GNU sed:

sed 's/.//3g'

Or, with GNU awk:

awk NF=1 FPAT=..

Or, with Perl:

perl -pe '$_=unpack a2'

If your script runs under a different shell (not bash) but bash is installed on the system, you can still use bash's built-in string manipulation by invoking bash with a variable:

strEcho='echo ${str:0:2}' # '${str:2}' if you want to skip the first two characters and keep the rest
bash -c "str=\"$strFull\";$strEcho;"
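Interpolating $strFull into the command text can break if the string itself contains double quotes, dollar signs, or backticks. One possible alternative is to pass the string as a positional argument, so bash never parses it as code. A minimal sketch, assuming only that a bash binary is on the PATH (the variable name first_two is made up for the example):

```shell
strFull='USCAGoleta9311734.5021-120.1287855805'

# The single-quoted script is fixed text; the string arrives as $1.
# The underscore fills $0 so that "$strFull" lands in $1.
first_two=$(bash -c 'printf "%s\n" "${1:0:2}"' _ "$strFull")

echo "$first_two"
```

This form can be called from sh, dash, or ksh scripts alike, since the calling shell only has to pass arguments.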

Probably the most efficient method, if you're using the bash shell (and you appear to be, based on your comments), is to use the sub-string variant of parameter expansion:

pax> long="USCAGol.blah.blah.blah"
pax> short="${long:0:2}" ; echo "${short}"
US

This will set short to be the first two characters of long. If long is shorter than two characters, short will be identical to it.

This in-shell method is usually better if you're going to be doing it a lot (like 50,000 times per report as you mention) since there's no process creation overhead. All solutions which use external programs will suffer from that overhead.
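As a rough illustration of the in-shell approach inside a loop, the sketch below reads records line by line and takes the first two characters of each without starting any external process. It assumes bash; only the first record comes from the question, the other two are invented for the example.

```shell
#!/usr/bin/env bash
# Sample input: the second and third records are made up for illustration.
records='USCAGoleta9311734.5021-120.1287855805
CAONToronto
X'

prefixes=()
while IFS= read -r line; do
    # Substring expansion happens inside the shell: no fork, no exec.
    prefixes+=("${line:0:2}")
done <<< "$records"

printf '%s\n' "${prefixes[@]}"
```

The one-character record comes back unchanged, which matches the behaviour described above for strings shorter than two characters.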

If you also wanted to ensure a minimum length, you could pad it out beforehand with something like:

pax> long="A"
pax> tmpstr="${long}.."
pax> short="${tmpstr:0:2}" ; echo "${short}"
A.

This would ensure that anything less than two characters in length was padded on the right with periods (or something else, just by changing the character used when creating tmpstr). It's not clear that you need this but I thought I'd put it in for completeness.
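If the padding is needed in more than one place, it could be wrapped in a small function. This is only a sketch, assuming bash; first_two_padded is a hypothetical helper name, and the optional second argument for the pad character is my own addition:

```shell
#!/usr/bin/env bash
# first_two_padded is a hypothetical helper, not a bash builtin.
# $1 = input string, $2 = pad character (defaults to a period).
first_two_padded() {
    local pad=${2:-.}
    local tmpstr="${1}${pad}${pad}"   # guarantee at least two characters
    printf '%s\n' "${tmpstr:0:2}"
}

first_two_padded "USCAGoleta"   # US
first_two_padded "A"            # A.
first_two_padded "" "-"         # --
```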


Having said that, there are any number of ways to do this with external programs (such as if you don't have bash available to you), some of which are:

short=$(echo "${long}" | cut -c1-2)
short=$(echo "${long}" | head -c2)
short=$(echo "${long}" | awk '{print substr ($0, 1, 2)}')
short=$(echo "${long}" | sed 's/^\(..\).*/\1/')

The first two (cut and head) are identical for a single-line string: both simply give you back the first two characters. They differ in that cut gives you the first two characters of each line, while head gives you the first two characters of the entire input.

The third one uses the awk sub-string function to extract the first two characters and the fourth uses sed capture groups (using \(\) and \1) to capture the first two characters and replace the entire line with them. They're both similar to cut: they deliver the first two characters of each line in the input.

None of that matters if you are sure your input is a single line; in that case they all have an identical effect.
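To see the per-line versus whole-input difference, here is a small sketch with a two-line input (the second line is made up for the example). It assumes a head that accepts -c, as in the commands above:

```shell
# Two-line input; the second line is invented for illustration.
input='USCAGoleta
CAONToronto'

per_line=$(printf '%s\n' "$input" | cut -c1-2)    # first two characters of each line
whole_input=$(printf '%s\n' "$input" | head -c2)  # first two bytes of the whole input

printf 'cut:  %s\n' "$per_line"
printf 'head: %s\n' "$whole_input"
```

Here cut yields two lines (US and CA), while head yields only US.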


If you want to use shell scripting without relying on non-POSIX extensions (so-called bashisms), you can use techniques that do not fork external tools such as grep, sed, cut, or awk, which would make your script less efficient. Efficiency and POSIX portability may not matter in your use case, but if they do (or just as a good habit), you can use the following parameter expansion method to extract the first two characters of a shell variable:

$ sh -c 'var=abcde; echo "${var%${var#??}}"'
ab

This uses "smallest prefix" parameter expansion to remove the first two characters (this is the ${var#??} part), then "smallest suffix" parameter expansion (the ${var% part) to remove that all-but-the-first-two-characters string from the original value.
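Splitting the nested expansion into two steps may make it easier to follow. This is a sketch using only POSIX parameter expansion; the intermediate variable names are made up, and quoting the inner expansion is my own addition so that characters such as * or ? in the remainder are taken literally rather than as pattern characters:

```shell
var='USCAGoleta9311734.5021-120.1287855805'

# Step 1: strip the shortest prefix matching two characters (??),
# leaving everything after the first two characters.
rest=${var#??}

# Step 2: strip that remainder from the end of the original value,
# leaving only the first two characters.
first_two=${var%"$rest"}

echo "$rest"
echo "$first_two"
```

One caveat: if var holds fewer than two characters, the ?? pattern does not match, so rest equals var and the result is an empty string rather than the short value itself.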

This method was previously described in this answer to the "Shell = Check if variable begins with #" question. That answer also describes a couple of similar parameter expansion methods that can be used in a slightly different context than the one that applies to the original question here.


The easiest way is

${string:position:length}

This extracts a substring of $length characters from $string, starting at $position (counted from 0).

This is a bash builtin so awk or sed is not required.
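A short sketch of how position and length interact, assuming bash; the variable names are only for illustration:

```shell
#!/usr/bin/env bash
string='USCAGoleta9311734.5021-120.1287855805'

first_two=${string:0:2}   # position 0, length 2
next_two=${string:2:2}    # position 2, length 2
tail=${string:4}          # omit the length to take everything from position 4

printf '%s\n' "$first_two" "$next_two" "$tail"
```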


If you're in bash, you can say:

bash-3.2$ var=abcd
bash-3.2$ echo ${var:0:2}
ab

This may be just what you need…


Just grep:

echo 'abcdef' | grep -Po "^.."        # ab
