This does almost everything you want:
f = open('data.txt', 'rb')
while True:
char = f.read(1)
if not char: break
print "{:02x}".format(ord(char)),
With data.txt created like this:
f = open('data.txt', 'wb')
f.write("ab\r\ncd")
f.close()
I get the following output:
61 62 0d 0a 63 64
tl;dr -- 1. You are using poor variable names. 2. You are slicing your hex strings incorrectly. 3. Your code is never going to replace any newlines. You may just want to forget about that feature. You do not quite yet understand the difference between a character, its integer code, and the hex string that represents the integer. They are all different: two are strings and one is an integer, and none of them are equal to each other. 4. For some files, you shouldn't remove newlines.
===
1. Your variable names are horrendous.
That's fine if you never want to ask anybody questions. But since every one needs to ask questions, you need to use descriptive variable names that anyone can understand. Your variable names are only slightly better than these:
fname = 'data.txt'
f = open(fname, 'rb')
xxxyxx = f.read()
xxyxxx = len(xxxyxx)
print "Length of file is", xxyxxx, "bytes. "
yxxxxx = 0
while yxxxxx < xxyxxx:
xyxxxx = hex(ord(xxxyxx[yxxxxx]))
xyxxxx = xyxxxx[-2:]
yxxxxx = yxxxxx + 1
xxxxxy = chr(13) + chr(10)
xxxxyx = str(xxxxxy)
xyxxxxx = str(xyxxxx)
xyxxxxx.replace(xxxxyx, ' ')
print xyxxxxx
That program runs fine, but it is impossible to understand.
2. The hex() function produces strings of different lengths.
For instance,
print hex(61)
print hex(15)
--output:--
0x3d
0xf
And taking the slice [-2:] for each of those strings gives you:
3d
xf
See how you got the 'x' in the second one? The slice:
[-2:]
says to go to the end of the string and back up two characters, then grab the rest of the string. Instead of doing that, take the slice starting 3 characters in from the beginning:
[2:]
3. Your code will never replace any newlines.
Suppose your file has these two consecutive characters:
"\r\n"
Now you read in the first character, "\r", and convert it to an integer, ord("\r"), giving you the integer 13. Now you convert that to a string, hex(13), which gives you the string "0xd"
, and you slice off the first two characters giving you:
"d"
Next, this line in your code:
bndtx.replace(entx, ' ')
tries to find every occurrence of the string "\r\n"
in the string "d"
and replace it. There is never going to be any replacement because the replacement string is two characters long and the string "d"
is one character long.
The replacement won't work for "\r\n"
and "0d"
either. But at least now there is a possibility it could work because both strings have two characters. Let's reduce both strings to a common denominator: ascii codes. The ascii code for "\r" is 13, and the ascii code for "\n" is 10. Now what about the string "0d"
? The ascii code for the character "0"
is 48, and the ascii code for the character "d" is 100. Those strings do not have a single character in common. Even this doesn't work:
x = '0d' + '0a'
x.replace("\r\n", " ")
print x
--output:--
'0d0a'
Nor will this:
x = 'd' + 'a'
x.replace("\r\n", " ")
print x
--output:--
da
The bottom line is: converting a character to an integer then to a hex string does not end up giving you the original character--they are just different strings. So if you do this:
char = "a"
code = ord(char)
hex_str = hex(code)
print char.replace(hex_str, " ")
...you can't expect "a" to be replaced by a space. If you examine the output here:
char = "a"
print repr(char)
code = ord(char)
print repr(code)
hex_str = hex(code)
print repr(hex_str)
print repr(
char.replace(hex_str, " ")
)
--output:--
'a'
97
'0x61'
'a'
You can see that 'a' is a string with one character in it, and '0x61'
is a string with 4 characters in it: '0'
, 'x'
, '6'
, and '1'
, and you can never find a four character string inside a one character string.
4) Removing newlines can corrupt the data.
For some files, you do not want to replace newlines. For instance, if you were reading in a .jpg file, which is a file that contains a bunch of integers representing colors in an image, and some colors in the image happened to be represented by the number 13 followed by the number 10, your code would eliminate those colors from the output.
However, if you are writing a program to read only text files, then replacing newlines is fine. But then, different operating systems use different newlines. You are trying to replace Windows newlines(\r\n), which means your program won't work on files created by a Mac or Linux computer, which use \n for newlines. There are easy ways to solve that, but maybe you don't want to worry about that just yet.
I hope all that's not too confusing.