[string] extract part of a string using bash/cut/split

I have a string like this:

/var/cpanel/users/joebloggs:DNS9=domain.com

I need to extract the username (joebloggs) from this string and store it in a variable.

The format of the string will always be the same with exception of joebloggs and domain.com so I am thinking the string can be split twice using cut?

The first split would split by : and we would store the first part in a variable to pass to the second split function.

The second split would split by / and store the last word (joebloggs) into a variable

I know how to do this in php using arrays and splits but I am a bit lost in bash.


The answer is


To extract joebloggs from this string in bash using parameter expansion without any extra processes...

MYVAR="/var/cpanel/users/joebloggs:DNS9=domain.com" 

NAME=${MYVAR%:*}  # retain the part before the colon
NAME=${NAME##*/}  # retain the part after the last slash
echo "$NAME"

Doesn't depend on joebloggs being at a particular depth in the path.
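If the extraction is needed in more than one place, the two expansions above can be wrapped in a small function. This is only a sketch: the name extract_user is made up for illustration and is not part of the original answer.

```shell
#!/usr/bin/env bash
# extract_user is a hypothetical helper name used for illustration only.
extract_user() {
    local path="${1%:*}"          # retain the part before the last colon
    printf '%s\n' "${path##*/}"   # retain the part after the last slash
}

username=$(extract_user "/var/cpanel/users/joebloggs:DNS9=domain.com")
echo "$username"   # joebloggs
```

Command substitution does start a subshell, so calling the function this way gives up the "no extra processes" property; inline the two expansions when that matters.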


Summary

An overview of a few parameter expansion modes, for reference...

${MYVAR#pattern}     # delete shortest match of pattern from the beginning
${MYVAR##pattern}    # delete longest match of pattern from the beginning
${MYVAR%pattern}     # delete shortest match of pattern from the end
${MYVAR%%pattern}    # delete longest match of pattern from the end

So # means match from the beginning (think of a comment line) and % means from the end. One instance means shortest and two instances means longest.
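A quick way to see the four modes side by side is to run them against one value. The sample string below is an assumption chosen so that shortest and longest matches differ.

```shell
#!/usr/bin/env bash
# Sample value chosen for illustration; it is not from the question.
MYVAR="a/b/c.tar.gz"

echo "${MYVAR#*/}"    # shortest match of */ from the beginning: b/c.tar.gz
echo "${MYVAR##*/}"   # longest match of */ from the beginning:  c.tar.gz
echo "${MYVAR%.*}"    # shortest match of .* from the end:       a/b/c.tar
echo "${MYVAR%%.*}"   # longest match of .* from the end:        a/b/c
```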

You can get substrings based on position using numbers:

${MYVAR:3}   # Remove the first three chars (leaving 4..end)
${MYVAR::3}  # Return the first three characters
${MYVAR:3:5} # The next five characters after removing the first 3 (chars 4-8)
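As a concrete check of the offsets, here they are applied to a nine-character sample value (an assumption for illustration). Note that these positional forms are a bash feature and are not available in plain POSIX sh.

```shell
#!/usr/bin/env bash
# Sample value chosen for illustration.
MYVAR="joebloggs"

echo "${MYVAR:3}"     # skip the first three characters:    bloggs
echo "${MYVAR::3}"    # keep only the first three:          joe
echo "${MYVAR:3:5}"   # skip three, then take the next five: blogg
```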

You can also replace particular strings or patterns using:

${MYVAR/search/replace}

The pattern is in the same format as file-name matching, so * (any characters) is common, often followed by a particular symbol like / or .
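A short sketch of the replace form, using a sample value chosen for illustration. The last line uses the double-slash variant, which is not mentioned above: in bash it replaces every match rather than only the first.

```shell
#!/usr/bin/env bash
# Sample value chosen for illustration.
MYVAR="users/joebloggs/domain.com"

echo "${MYVAR/joebloggs/janedoe}"   # literal search:       users/janedoe/domain.com
echo "${MYVAR/.*/.org}"             # glob pattern (.*):    users/joebloggs/domain.org
echo "${MYVAR//\//:}"               # replace every slash:  users:joebloggs:domain.com
```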

Examples:

Given a variable like

MYVAR="users/joebloggs/domain.com" 

Remove the path, leaving the file name (delete all characters up to and including the last slash):

echo "${MYVAR##*/}"
domain.com

Remove the file name, leaving the path (delete shortest match after last /):

echo "${MYVAR%/*}"
users/joebloggs

Get just the file extension (remove all before last period):

echo "${MYVAR##*.}"
com

NOTE: To do two operations, you can't combine them, but have to assign to an intermediate variable. So to get the file name without path or extension:

NAME=${MYVAR##*/}      # remove part before last slash
echo "${NAME%.*}"      # from the new var remove the part after the last period
domain

What about sed? That will work in a single command:

sed 's#.*/\([^:]*\).*#\1#' <<<"$string"
  • The # are being used for regex dividers instead of / since the string has / in it.
  • .*/ grabs the string up to and including the last forward slash.
  • \( .. \) marks a capture group. This is \([^:]*\).
    • The [^:] says any character except a colon, and the * means zero or more.
  • .* means the rest of the line.
  • \1 means substitute what was found in the first (and only) capture group. This is the name.

Here's the breakdown matching the string with the regular expression:

        /var/cpanel/users/           joebloggs  :DNS9=domain.com joebloggs
sed 's#.*/                          \([^:]*\)   .*              #\1       #'
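To store the result, the same expression can go inside a command substitution. The sketch below wraps it in a function whose name, user_from_sed, is hypothetical. One caveat worth knowing: because .*/ is greedy, a slash appearing after the colon would shift the match, so this relies on the part after the colon containing no slash.

```shell
#!/usr/bin/env bash
# user_from_sed is a hypothetical helper name used for illustration only.
user_from_sed() {
    # Same expression as above; assumes no "/" appears after the colon.
    printf '%s\n' "$1" | sed 's#.*/\([^:]*\).*#\1#'
}

username=$(user_from_sed "/var/cpanel/users/joebloggs:DNS9=domain.com")
echo "$username"   # joebloggs
```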

Define a function like this:

getUserName() {
    echo "$1" | cut -d : -f 1 | xargs basename
}

And pass the string as a parameter:

userName=$(getUserName "/var/cpanel/users/joebloggs:DNS9=domain.com")
echo "$userName"

Using a single Awk:

... | awk -F '[/:]' '{print $5}'

That is, using as field separator either / or :, the username is always in field 5.

To store it in a variable:

username=$(... | awk -F '[/:]' '{print $5}')
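For a self-contained run, the elided command can be stood in for by a printf of the sample string; that stand-in and the variable name line are assumptions for illustration. The leading / produces an empty first field, which is why the username lands in field 5 rather than 4.

```shell
#!/usr/bin/env bash
# "line" is a hypothetical variable holding the sample string from the question.
line="/var/cpanel/users/joebloggs:DNS9=domain.com"

# Fields with separator [/:]: "", var, cpanel, users, joebloggs, DNS9=domain.com
username=$(printf '%s\n' "$line" | awk -F '[/:]' '{print $5}')
echo "$username"   # joebloggs
```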

A more flexible implementation with sed that doesn't require username to be field 5:

... | sed -e 's/:.*//' -e 's?.*/??'

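That is, delete everything from the : onward, then delete everything up to and including the last /. Unlike the awk version, this does not depend on how deep the username sits in the path. sed is probably also faster than awk, though for a single string the difference is unlikely to matter.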
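The difference in flexibility shows up with a shallower path. In this sketch the path /home/joebloggs and the variable names are assumptions for illustration: the awk field-5 approach comes back empty, while the two-expression sed still finds the name.

```shell
#!/usr/bin/env bash
# Hypothetical input with the username at a different depth.
line="/home/joebloggs:DNS9=domain.com"

with_awk=$(printf '%s\n' "$line" | awk -F '[/:]' '{print $5}')
with_sed=$(printf '%s\n' "$line" | sed -e 's/:.*//' -e 's?.*/??')

echo "awk: '$with_awk'"   # awk: ''
echo "sed: '$with_sed'"   # sed: 'joebloggs'
```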


Using a single sed

echo "/var/cpanel/users/joebloggs:DNS9=domain.com" | sed 's/.*\/\(.*\):.*/\1/'