I have a file containing the following lines:
<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
<parameter name="RemoteHost" access="readWrite"></parameter>
<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="PortMappingProtocol" access="readWrite"></parameter>
<parameter name="InternalClient" access="readWrite"></parameter>
<parameter name="PortMappingDescription" access="readWrite"></parameter>
I want to execute command on this file to extract only the parameter names as displayed in the following output:
$sedcommand file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
What could be this command?
This question is related to
linux
shell
ubuntu
xml-parsing
sed
You want awk
.
This would be a quick and dirty hack:
awk -F "\"" '{print $2}' /tmp/file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
Explaining how you can use cut
:
cat yourxmlfile | cut -d'"' -f2
It will 'cut' all the lines in the file based on " delimiter, and will take the 2nd field , which is what you wanted.
sed 's/[^"]*"\([^"]*\).*/\1/'
does the job.
explanation of the part inside ' '
basically s/search for this/replace with this/ but we're telling him to replace the whole line with just a piece of it we found earlier.
grep was born to extract things:
grep -Po 'name="\K[^"]*'
test with your data:
kent$ echo '<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
<parameter name="RemoteHost" access="readWrite"></parameter>
<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="PortMappingProtocol" access="readWrite"></parameter>
<parameter name="InternalClient" access="readWrite"></parameter>
<parameter name="PortMappingDescription" access="readWrite"></parameter>
'|grep -Po 'name="\K[^"]*'
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
You should not parse XML using tools like sed, or awk. It's error-prone.
If input changes, and before name parameter you will get new-line character instead of space it will fail some day producing unexpected results.
If you are really sure, that your input will be always formated this way, you can use cut
.
It's faster than sed
and awk
:
cut -d'"' -f2 < input.txt
It will be better to first parse it, and extract only parameter name attribute:
xpath -q -e //@name input.txt | cut -d'"' -f2
To learn more about xpath, see this tutorial: http://www.w3schools.com/xpath/
Source: Stackoverflow.com