[unix] Trim last 3 characters of a line WITHOUT using sed, or perl, etc

I've got a shell script outputting data like this:

1234567890  *
1234567891  *

I need to remove JUST the last three characters "  *" (two spaces and an asterisk). I know I can do it via

(whatever) | sed 's/\(.*\).../\1/'

But I DON'T want to use sed for speed purposes. It will always be the same last 3 characters.

Any quick way of cleaning up the output?


Both awk and sed are plenty fast, but if you think it matters feel free to use one of the following:

If the characters that you want to delete only ever appear at the end of the string (tr -d removes every occurrence, wherever it is):

echo '1234567890  *' | tr -d ' *'

If they can also appear elsewhere in the string and you only want to delete those at the end:

echo '1234567890  *' | rev | cut -c 4- | rev

The man pages of all the commands will explain what's going on.
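To make the difference between the two commands concrete, here is a small sketch. The sample line is made up for illustration (it is not from the question) and deliberately has an asterisk and a space in the middle of the data. It assumes rev is installed, as it is on most Linux systems.

```shell
# Hypothetical sample line: '*' and ' ' also occur inside the data.
line='12*34 567  *'

# tr -d deletes every space and asterisk, wherever they occur.
with_tr=$(printf '%s\n' "$line" | tr -d ' *')

# rev | cut | rev drops only the last three characters of the line.
with_rev=$(printf '%s\n' "$line" | rev | cut -c 4- | rev)

printf 'tr:  [%s]\n' "$with_tr"    # prints: tr:  [1234567]
printf 'rev: [%s]\n' "$with_rev"   # prints: rev: [12*34 567]
```

So tr is only safe when the trailing characters can never be part of the data you want to keep.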

I think you should use sed, though.


I can guarantee you that bash alone won't be any faster than sed for this task. Starting external processes from bash is generally a bad idea, but only if you do it a lot.

So, if you're starting a sed process for each line of your input, I'd be concerned. But you're not. You only need to start one sed which will do all the work for you.

You may however find that the following sed will be a bit faster than your version:

(whatever) | sed 's/...$//'

All this does is remove the last three characters on each line, rather than substituting the whole line with a shorter version of itself. Now maybe more modern RE engines can optimise your command, but why take the risk?
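As a quick sanity check that the two expressions agree, the sketch below runs both over the sample lines from the question and compares the results. It says nothing about speed; for that you would still need to time them on your own data.

```shell
# Sample input copied from the question.
input='1234567890  *
1234567891  *'

# Original form: capture everything except the last three characters.
capture=$(printf '%s\n' "$input" | sed 's/\(.*\).../\1/')

# Shorter form: delete the last three characters directly.
anchored=$(printf '%s\n' "$input" | sed 's/...$//')

if [ "$capture" = "$anchored" ]; then
    echo "both expressions give the same output"
fi
printf '%s\n' "$anchored"
```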

To be honest, about the only way I can think of that would be faster would be to hand-craft your own C-based filter program. And the only reason that may be faster than sed is because you can take advantage of the extra knowledge you have on your processing needs (sed has to allow for generalised processing so may be slower because of that).

Don't forget the optimisation mantra: "Measure, don't guess!"


If you really want to do this one line at a time in bash (and I still maintain that it's a bad idea), you can use:

pax> line=123456789abc
pax> line2=${line%%???}
pax> echo ${line2}
123456789
pax> _
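If you do go the pure-shell route, a loop along these lines would apply the same expansion to every line of input. It is only a sketch: trim3 is a name I made up, and it uses the single-% form, which behaves the same as %% here because ??? always matches exactly three characters.

```shell
# trim3 is a hypothetical helper name, not part of the original answer.
trim3() {
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        # ${line%???} removes a suffix of three arbitrary characters;
        # lines shorter than three characters are left untouched.
        printf '%s\n' "${line%???}"
    done
}

trimmed=$(printf '%s\n' '1234567890  *' '1234567891  *' | trim3)
printf '%s\n' "$trimmed"
```

Expect this to be noticeably slower than a single sed on large inputs, since the shell processes each line itself.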

You may also want to investigate whether you actually need a speed improvement. If you process the lines as one big chunk, you'll see that sed is plenty fast. Type in the following:

#!/usr/bin/bash

echo This is a pretty chunky line with three bad characters at the end.XXX >qq1
for i in 4 16 64 256 1024 4096 16384 65536 ; do
    cat qq1 qq1 >qq2
    cat qq2 qq2 >qq1
done

head -n 20000 qq1 >qq2
wc -l qq2

date
time sed 's/...$//' qq2 >qq1
date
head -n 3 qq1

and run it. Here's the output on my (not very fast at all) R40 laptop:

pax> ./chk.sh
20000 qq2
Sat Jul 24 13:09:15 WAST 2010

real    0m0.851s
user    0m0.781s
sys     0m0.050s
Sat Jul 24 13:09:16 WAST 2010
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.

That's 20,000 lines in under a second, pretty good for something that's only done every hour.


Here's an old-fashioned unix trick for removing the last 3 characters from a line that makes no use of sed OR awk...

> echo 987654321 | rev | cut -c 4- | rev

987654

Unlike the earlier example using 'cut', this does not require knowledge of the line length.


What do you mean, you don't want to use sed/awk for speed purposes? sed and awk are faster than the shell's while read loop for processing files.

$ sed 's/[ \t]*\*$//' file
1234567890
1234567891

$ sed 's/..\*$//' file
1234567890
1234567891

With the bash shell (read puts the first whitespace-separated field in a and the rest of the line in b):

while read -r a b
do
 echo "$a"
done <file

No need for cut or magic; in bash you can cut a string like so (the negative length needs bash 4.2 or later):

  ORGSTRING="123456"
  CUTSTRING=${ORGSTRING:0:-3}
  echo "The original string: $ORGSTRING"
  echo "The new, shorter and faster string: $CUTSTRING"

See http://tldp.org/LDP/abs/html/string-manipulation.html


Note: This answer is somewhat intended to be a joke, but it actually does work...

#!/bin/bash
outfile="/tmp/$RANDOM"
cfile="$outfile.c"
echo '#include <stdio.h>
int main(void){int e=1;int c;while((c=getc(stdin))!=EOF){if(c==10)e=1;if(c==32)e=0;if(e)putc(c,stdout);}return 0;}' > "$cfile"
gcc -o "$outfile" "$cfile"
rm "$cfile"
cat somedata.txt | "$outfile"
rm "$outfile"

You can replace cat somedata.txt with a different command.


You could try

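(whatever) | while IFS= read -r line; do printf '%s' "$line" | head -c -3; echo; done

printf is used rather than echo so that the trailing newline isn't counted as one of the three bytes, and the negative byte count is a GNU head extension. head itself should be faster than sed or cut because there's no regex or delimiter matching, but invoking a separate process for every line would probably outweigh that.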


You can use awk just to print the first 'field' if there won't be any spaces in it (or, if there will be, change the separator).

I put the fields you had above into a file and did this

awk '{ print $1 }' < test.txt 
1234567890
1234567891

I don't know if that's any better.
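For the "change the separator" case, something like the following might do. The comma-separated input is invented for illustration; the question's data is space-separated, so treat this as a sketch of the -F option only.

```shell
# Hypothetical input: the first field contains a space, fields are split on commas.
first_field=$(printf '%s\n' 'item 42,*' | awk -F',' '{ print $1 }')

printf '%s\n' "$first_field"   # prints: item 42
```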


If the script always outputs lines of 10 characters followed by 3 extra (in other words, you just want the first 10 characters), you can use

script | cut -c 1-10

If it outputs an uncertain number of non-space characters, followed by a space and then 2 other extra characters (in other words, you just want the first field), you can use

script | cut -d ' ' -f 1

... as in majhool's comment earlier. Depending on your platform, you may also have colrm, which, again, would work if the lines are a fixed length:

script | colrm 11
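The two cut variants only agree when the assumptions above hold. A rough sketch, using the line from the question plus a shorter made-up value to show where they part ways (colrm is left out because it isn't installed everywhere):

```shell
sample='1234567890  *'   # line from the question
short='12345  *'         # hypothetical shorter value

# Fixed-width: keep columns 1 to 10.
by_column=$(printf '%s\n' "$sample" | cut -c 1-10)
# Field-based: keep everything before the first space.
by_field=$(printf '%s\n' "$sample" | cut -d ' ' -f 1)

# On the shorter line the fixed-width version keeps the unwanted suffix.
short_column=$(printf '%s\n' "$short" | cut -c 1-10)
short_field=$(printf '%s\n' "$short" | cut -d ' ' -f 1)

printf '[%s] [%s]\n' "$by_column" "$by_field"         # prints: [1234567890] [1234567890]
printf '[%s] [%s]\n' "$short_column" "$short_field"   # prints: [12345  *] [12345]
```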

$ x="can_haz"
$ echo "${x%???}"
can_

Another answer relies on the third-to-last character being a space. This will work with (almost) any character in that position and does it "WITHOUT using sed, or perl, etc.":

while read -r line
do
    echo "${line:0:${#line}-3}"
done

If your lines are fixed length (10 characters to keep, as in the question), change the echo to:

echo "${line:0:10}"

or

printf "%.10s\n" "$line"

but each of these is definitely much slower than sed.

