[Python] Converting Korean bytes stored as a str back into Korean text

깜태 2022. 4. 11. 15:11

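When Unicode text is converted and passed between systems, its type sometimes changes along the way.

Here is roughly what happened in my case.

If the input arrives as Korean text in the first place, as below, there is no problem: text encoded as UTF-8 can simply be decoded as UTF-8.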

a = "파이썬"
a = a.encode('utf-8')
print(a, type(a))
# b'\xed\x8c\x8c\xec\x9d\xb4\xec\x8d\xac' <class 'bytes'>
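
To make the "no problem" case concrete, here is a minimal sketch of the full round trip. The variable names are only illustrative and do not come from the original snippet.

```python
text = "파이썬"

# str -> bytes: each Korean syllable becomes three UTF-8 bytes
encoded = text.encode("utf-8")
print(len(encoded))  # 9

# bytes -> str: decoding with the same codec restores the original text
decoded = encoded.decode("utf-8")
print(decoded == text)  # True
```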

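The problem is the situation below: the text arrived as UTF-8 bytes, but they were misinterpreted and ended up as a str in which the escape sequences are literal characters.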

a = "\\xed\\x8c\\x8c\\xec\\x9d\\xb4\\xec\\x8d\\xac"  # "파이썬" in Korean, UTF-8 encoded, as escape text
print(a, type(a))
# \xed\x8c\x8c\xec\x9d\xb4\xec\x8d\xac <class 'str'>
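
The following sketch shows why a plain decode cannot fix this string. The last part is an assumption about how such a value might arise (calling str() on a bytes object); the post does not say where the string actually came from.

```python
escaped = "\\xed\\x8c\\x8c\\xec\\x9d\\xb4\\xec\\x8d\\xac"

# Each "\xNN" is four separate characters (backslash, x, two hex digits),
# so the str holds 36 characters instead of 3 Korean syllables.
print(len(escaped))  # 36

# Encoding this str only yields the ASCII bytes of those characters,
# so an encode/decode round trip returns the same escaped text.
print(escaped.encode("utf-8").decode("utf-8") == escaped)  # True

# Assumption: one way to end up here is str() on a bytes object,
# which returns its repr (b'...') rather than decoding it.
raw = "파이썬".encode("utf-8")
print(str(raw)[2:-1] == escaped)  # True
```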

The approach I found is to strip out the \x parts and then use the bytearray.fromhex() method.

byte_str = "\\xed\\x8c\\x8c\\xec\\x9d\\xb4\\xec\\x8d\\xac"

# 1. replace
byte_str = byte_str.replace('\\x', '')
print(byte_str, type(byte_str))  
# ed8c8cec9db4ec8dac <class 'str'>

# 2. bytearray.fromhex()
byte_str = bytearray.fromhex(byte_str)
print(byte_str, type(byte_str))  
# bytearray(b'\xed\x8c\x8c\xec\x9d\xb4\xec\x8d\xac') <class 'bytearray'>

# 3. decode by utf-8
byte_str = byte_str.decode('utf-8')
print(byte_str, type(byte_str))
# 파이썬 <class 'str'>
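
The replace-then-fromhex steps above work when the string consists only of \xNN escapes. If ordinary characters are mixed in (for example "Python: \\xed\\x8c\\x8c"), bytearray.fromhex() raises ValueError on the non-hex text. As an alternative sketch, not the method used in this post, the built-in unicode_escape codec can interpret the escapes. The helper name unescape_utf8 is hypothetical, and the sketch assumes the input contains only ASCII characters; note that unicode_escape also interprets other escapes such as \n and \uXXXX.

```python
def unescape_utf8(escaped: str) -> str:
    """Turn literal backslash-x escape text back into the string it encodes.

    Hypothetical helper; assumes `escaped` contains only ASCII characters.
    """
    # 1. str -> bytes holding the same ASCII characters (backslashes included)
    ascii_bytes = escaped.encode("ascii")
    # 2. interpret the escapes; each \xNN becomes the code point U+00NN
    code_points = ascii_bytes.decode("unicode_escape")
    # 3. latin-1 maps U+0000..U+00FF one-to-one onto byte values 0x00..0xFF
    raw = code_points.encode("latin-1")
    # 4. decode the recovered bytes as UTF-8
    return raw.decode("utf-8")


print(unescape_utf8("\\xed\\x8c\\x8c\\xec\\x9d\\xb4\\xec\\x8d\\xac"))  # 파이썬
print(unescape_utf8("Python: \\xed\\x8c\\x8c"))  # Python: 파
```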

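Because Python is so loose about types, it can seem to do whatever it pleases, which is annoying at times.

Unicode seems especially prone to this, so I hope this serves as a useful reference.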
