I thought this code would work, but the regular expression doesn't ever match the \r\n. I have viewed the data I am reading in a hex editor and verified there really is a hex D and hex A pattern in the file.
I have also tried the regular expressions /\xD\xA/m and /\x0D\x0A/m but they also didn't match.
This is my code right now:
lines2 = lines.gsub( /\r\n/m, "\n" )
if ( lines == lines2 )
  print "still the same\n"
else
  print "made the change\n"
end
In addition to alternatives, it would be nice to know what I'm doing wrong (to facilitate some learning on my part). :)
What do you get when you do puts lines? That will give you a clue.
By default File.open opens the file in text mode, so on Windows your \r\n characters will be automatically converted to \n. Maybe that's the reason lines is always equal to lines2. To prevent Ruby from translating the line ends, use the rb mode:
C:\> copy con lala.txt
a
file
with
many
lines
^Z
C:\> irb
irb(main):001:0> text = File.open('lala.txt').read
=> "a\nfile\nwith\nmany\nlines\n"
irb(main):002:0> bin = File.open('lala.txt', 'rb').read
=> "a\r\nfile\r\nwith\r\nmany\r\nlines\r\n"
irb(main):003:0>
But from your question and code I see you simply need to open the file with the default modifier. You don't need any conversion and may use the shorter File.read.
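To see the difference without depending on how a particular platform treats text mode, here is a minimal sketch that writes a file with CRLF line ends, reads it back byte-for-byte, and normalises the line ends itself. The file name crlf_demo.txt and the helper read_normalized are made up for illustration; they are not part of the question.

```ruby
require "tmpdir"

# Hypothetical helper: read the file in binary mode (no newline
# translation), then convert every CRLF pair to a single LF.
def read_normalized(path)
  raw = File.binread(path)
  raw.gsub("\r\n", "\n")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "crlf_demo.txt")
  File.binwrite(path, "a\r\nfile\r\nwith\r\nmany\r\nlines\r\n")

  p File.binread(path)     # the \r\n pairs are still present
  p read_normalized(path)  # only \n remains
end
```

Reading in binary mode and converting explicitly should behave the same on Windows and on Unix-like systems, at the cost of one extra gsub.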
Just another variant:
lines.delete(" \n")
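Note that String#delete treats its argument as a set of characters and removes every occurrence of each one. A small sketch with an invented sample string shows how the choice of argument changes the result:

```ruby
sample = "first line\r\nsecond line\r\n"

# Only carriage returns are removed; the newlines stay.
without_cr = sample.delete("\r")

# Every space and every newline is removed; the carriage returns stay.
without_space_nl = sample.delete(" \n")

p without_cr
p without_space_nl
```

So an argument of " \n" also strips the spaces between words, which may or may not be what you want.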
How about the following?
irb(main):003:0> my_string = "Some text with a carriage return \r"
=> "Some text with a carriage return \r"
irb(main):004:0> my_string.gsub(/\r/,"")
=> "Some text with a carriage return "
irb(main):005:0>
Or...
irb(main):007:0> my_string = "Some text with a carriage return \r\n"
=> "Some text with a carriage return \r\n"
irb(main):008:0> my_string.gsub(/\r\n/,"\n")
=> "Some text with a carriage return \n"
irb(main):009:0>
def dos2unix(input)
  # Byte 13 is the carriage return; every other byte is kept as-is.
  input.each_byte.map { |c| c.chr unless c == 13 }.join
end
remove_all_the_carriage_returns = dos2unix(some_blob)
You can use this:
my_string.strip.gsub(/\s+/, ' ')
Why not read the file in text mode, rather than binary mode?
Generally when I deal with stripping \r or \n, I'll look for both by doing something like
lines.gsub(/\r\n?/, "\n");
I've found that depending on how the data was saved (the OS used, editor used, Jupiter's relation to Io at the time) there may or may not be the newline after the carriage return. It does seem weird that you see both characters in hex mode. Hope this helps.
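As a quick illustration of that pattern, here is a sketch with a made-up input that mixes all three line-end styles. The helper name normalize_newlines is only for this example.

```ruby
# Hypothetical helper: turn CRLF and bare CR into LF.
# The optional \n in the pattern makes "\r\n" match as one unit,
# so a CRLF pair never becomes two newlines.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

mixed = "dos\r\nold mac\runix\n"
p normalize_newlines(mixed)
```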
lines2 = lines.split.join("\n")
modified_string = string.gsub(/\s+/, ' ').strip
lines.map(&:strip).join(" ")
If you are using Rails, there is a squish method (it comes from ActiveSupport):
"\tgoodbye\r\n".squish # => "goodbye"
"\tgood \t\r\nbye\r\n".squish # => "good bye"
Use String#strip
Returns a copy of str with leading and trailing whitespace removed.
e.g
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
Using gsub
string = string.gsub(/\r/," ")
string = string.gsub(/\n/," ")
I think your regex is almost complete - here's what I would do:
lines2 = lines.gsub(/[\r\n]+/m, "\n")
In the above, I've put \r and \n into a character class (that way it doesn't matter in which order they appear) and added the "+" quantifier (so that "\r\n\r\n\r\n" also matches once, and the whole run is replaced with a single "\n").
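One consequence worth seeing in action: because a whole run of line-end characters is replaced at once, blank lines disappear. A small sketch with an invented input compares this with the plain \r\n replacement from the question:

```ruby
text = "one\r\n\r\ntwo\r\n"

# The character class with + swallows the empty line between "one" and "two".
collapsed = text.gsub(/[\r\n]+/, "\n")

# Replacing each CRLF pair individually keeps the empty line.
preserved = text.gsub(/\r\n/, "\n")

p collapsed
p preserved
```

Which of the two is right depends on whether the blank lines carry meaning in your data.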
"still the same\n".chomp
or
"still the same\n".chomp!
http://www.ruby-doc.org/core-1.9.3/String.html#method-i-chomp
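For reference, a short sketch of how the two behave. With the default separator, chomp removes a single trailing \n, \r, or \r\n and leaves line ends elsewhere in the string alone. The variable names are made up for this example.

```ruby
line = "still the same\r\n"

# chomp returns a new string; the receiver is not modified.
trimmed = line.chomp

# chomp! modifies the receiver in place and returns nil when
# there was nothing to remove.
copy = line.dup
first_result = copy.chomp!
second_result = copy.chomp!

p trimmed
p line
p first_result
p second_result
```

Because chomp only touches the end of the string, it suits line-by-line processing better than cleaning a whole file held in one string.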
Source: Stackoverflow.com