Tuesday, 10 September 2013

read subprocess output multi-byte characters one by one

I'm running a process using subprocess:
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
What I want to do is to read the output characters one by one in a loop:
while something:
    char = p.stdout.read(1)
In Python 3, subprocess.Popen().stdout.read() returns bytes, not str. I
want to use it as a str, so I have to do:
char = char.decode("utf-8")
and it works fine with ASCII characters.
But with non-ASCII characters (e.g. Greek letters) I get a
UnicodeDecodeError. That's because Greek characters consist of more than
one byte in UTF-8. Here's the problem:
>>> b'\xce\xb5'.decode('utf-8')
'ε'
>>> b'\xce'.decode('utf-8')  # b'\xce' is what subprocess...read(1) returns - one byte
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: unexpected end of data
>>>
How can I deal with this? The output of subprocess.Popen().stdout.read()
(as a string) can be something like "lorem ipsum εφδφδσloremipsum".
I want to read one character at a time, but a character can consist of
multiple bytes.
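
One possible approach, sketched here under the assumption that the child process writes UTF-8, is to keep reading a single byte at a time but pass each byte through an incremental decoder from the standard codecs module. The decoder holds back incomplete byte sequences and only returns text once a whole character has arrived. The helper name iter_chars is made up for this illustration, and an io.BytesIO object stands in for p.stdout so the snippet runs on its own:

```python
import codecs
import io


def iter_chars(stream, encoding="utf-8"):
    """Yield decoded characters from a binary stream, reading one byte at a time.

    iter_chars is a hypothetical helper name, not part of subprocess.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        byte = stream.read(1)
        if not byte:
            # End of stream: flush the decoder. This raises UnicodeDecodeError
            # if the stream ended in the middle of a multi-byte character.
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail
            return
        # decode() returns "" until all bytes of the current character are in.
        text = decoder.decode(byte)
        if text:
            yield text


# Stand-in for p.stdout: any binary stream with a read() method should work.
fake_stdout = io.BytesIO("lorem εφδ".encode("utf-8"))
for char in iter_chars(fake_stdout):
    print(repr(char))
```

With the process from the question, the loop would presumably become `for char in iter_chars(p.stdout):`. If byte-level control is not needed, wrapping the pipe as io.TextIOWrapper(p.stdout, encoding="utf-8") and calling read(1) on the wrapper should also return one decoded character per call, although the wrapper buffers internally.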
