[linux] How to use sed to extract substring

I have a file containing the following lines:

  <parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="PortMappingProtocol" access="readWrite"></parameter>
  <parameter name="InternalClient" access="readWrite"></parameter>
  <parameter name="PortMappingDescription" access="readWrite"></parameter>

I want to execute command on this file to extract only the parameter names as displayed in the following output:

$sedcommand file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

What could be this command?

This question is related to linux shell ubuntu xml-parsing sed

The answer is


You want awk.

This would be a quick and dirty hack:

awk -F "\"" '{print $2}' /tmp/file.txt

PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

Explaining how you can use cut:

cat yourxmlfile | cut -d'"' -f2

It will 'cut' all the lines in the file based on " delimiter, and will take the 2nd field , which is what you wanted.


sed 's/[^"]*"\([^"]*\).*/\1/'

does the job.

explanation of the part inside ' '

  • s - tells sed to substitute
  • / - start of regex string to search for
  • [^"]* - any character that is not ", any number of times. (matching parameter name=)
  • " - just a ".
  • ([^"]*) - anything inside () will be saved for reference to use later. The \ are there so the brackets are not considered as characters to search for. [^"]* means the same as above. (matching RemoteHost for example)
  • .* - any character, any number of times. (matching " access="readWrite"> /parameter)
  • / - end of the search regex, and start of the substitute string.
  • \1 - reference to that string we found in the brackets above.
  • / end of the substitute string.

basically s/search for this/replace with this/ but we're telling him to replace the whole line with just a piece of it we found earlier.


grep was born to extract things:

grep -Po 'name="\K[^"]*'

test with your data:

kent$  echo '<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
  <parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
  <parameter name="RemoteHost" access="readWrite"></parameter>
  <parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
  <parameter name="PortMappingProtocol" access="readWrite"></parameter>
  <parameter name="InternalClient" access="readWrite"></parameter>
  <parameter name="PortMappingDescription" access="readWrite"></parameter>
'|grep -Po 'name="\K[^"]*'
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

You should not parse XML using tools like sed, or awk. It's error-prone.

If input changes, and before name parameter you will get new-line character instead of space it will fail some day producing unexpected results.

If you are really sure, that your input will be always formated this way, you can use cut. It's faster than sed and awk:

cut -d'"' -f2 < input.txt

It will be better to first parse it, and extract only parameter name attribute:

xpath -q -e //@name input.txt | cut -d'"' -f2

To learn more about xpath, see this tutorial: http://www.w3schools.com/xpath/


Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to shell

Comparing a variable with a string python not working when redirecting from bash script Get first line of a shell command's output How to run shell script file using nodejs? Run bash command on jenkins pipeline Way to create multiline comments in Bash? How to do multiline shell script in Ansible How to check if a file exists in a shell script How to check if an environment variable exists and get its value? Curl to return http status code along with the response docker entrypoint running bash script gets "permission denied"

Examples related to ubuntu

grep's at sign caught as whitespace "E: Unable to locate package python-pip" on Ubuntu 18.04 How to Install pip for python 3.7 on Ubuntu 18? "Repository does not have a release file" error ping: google.com: Temporary failure in name resolution How to install JDK 11 under Ubuntu? How to upgrade Python version to 3.7? Issue in installing php7.2-mcrypt Install Qt on Ubuntu Failed to start mongod.service: Unit mongod.service not found

Examples related to xml-parsing

How to create XML file with specific structure in Java jQuery xml error ' No 'Access-Control-Allow-Origin' header is present on the requested resource.' SyntaxError of Non-ASCII character xml.LoadData - Data at the root level is invalid. Line 1, position 1 XML Error: Extra content at the end of the document How to use sed to extract substring How to fix Invalid byte 1 of 1-byte UTF-8 sequence The best node module for XML parsing Parsing XML with namespace in Python via 'ElementTree' oracle plsql: how to parse XML and insert into table

Examples related to sed

Retrieve last 100 lines logs How to replace multiple patterns at once with sed? Insert multiple lines into a file after specified pattern using shell script Linux bash script to extract IP address Ansible playbook shell output remove white space from the end of line in linux bash, extract string before a colon invalid command code ., despite escaping periods, using sed RE error: illegal byte sequence on Mac OS X How to use variables in a command in sed?