Efficient way to convert a string to a ctypes.c_ubyte array in Python
I have a string of 20 bytes, and I would like to convert it to a ctypes.c_ubyte array for bit field manipulation purposes.
import ctypes

str_bytes = '01234567890123456789'
byte_arr = bytearray(str_bytes)
raw_bytes = (ctypes.c_ubyte * 20)(*byte_arr)
Is there a way to avoid the deep copy from str to bytearray for the sake of the cast?
Alternatively, is it possible to convert a string to a bytearray without a deep copy (with techniques such as memoryview)?
I am using Python 2.7.
Performance results:
Using eryksun's and Brian Larsen's suggestions, here are benchmarks under a VirtualBox VM running Ubuntu 12.04 and Python 2.7.
- method1 uses the approach from the original post
- method2 uses ctypes from_buffer_copy
- method3 uses ctypes cast/POINTER
- method4 uses numpy
Results:
- method1 takes 3.87 sec
- method2 takes 0.42 sec
- method3 takes 1.44 sec
- method4 takes 8.79 sec
Code:
import ctypes
import time
import numpy

str_bytes = '01234567890123456789'

def method1():
    # starargs approach from the original post
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        byte_arr = bytearray(str_bytes)
        result = (ctypes.c_ubyte * 20)(*byte_arr)
    t1 = time.clock()
    print(t1 - t0)
    return result

def method2():
    # from_buffer_copy approach
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        result = (ctypes.c_ubyte * 20).from_buffer_copy(str_bytes)
    t1 = time.clock()
    print(t1 - t0)
    return result

def method3():
    # cast/POINTER approach
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        result = ctypes.cast(str_bytes, ctypes.POINTER(ctypes.c_ubyte * 20))[0]
    t1 = time.clock()
    print(t1 - t0)
    return result

def method4():
    # numpy approach
    result = ''
    t0 = time.clock()
    for x in xrange(0, 1000000):
        arr = numpy.asarray(str_bytes)
        result = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_ubyte * len(str_bytes)))
    t1 = time.clock()
    print(t1 - t0)
    return result

print(method1())
print(method2())
print(method3())
print(method4())
I don't think that's working the way you think it is. bytearray creates a copy of the string. Then the interpreter unpacks the bytearray sequence into a starargs tuple and merges it into a new tuple that has the other args (even though there are none in this case). Finally, the c_ubyte array initializer loops over the args tuple to set the elements of the c_ubyte array. That's a lot of work, and a lot of copying, just to initialize the array.
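To make that chain of copies concrete, here is a rough sketch of the steps described above; the intermediate names (tmp, args) are purely illustrative, not the interpreter's actual internals:

import ctypes

str_bytes = '01234567890123456789'
tmp = bytearray(str_bytes)                 # copy 1: string -> bytearray
args = tuple(tmp)                          # copy 2: bytearray unpacked into a starargs tuple
raw_bytes = (ctypes.c_ubyte * 20)(*args)   # copy 3: the initializer loops over the tuple element by element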
Instead, you can use the from_buffer_copy method, assuming the string is a bytestring with the buffer interface (not unicode):
import ctypes

str_bytes = '01234567890123456789'
raw_bytes = (ctypes.c_ubyte * 20).from_buffer_copy(str_bytes)
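As a small usage sketch for the bit manipulation mentioned in the question (the particular bit flipped here is just an example), the resulting array is mutable, and only the copy changes; the original string is untouched:

raw_bytes[0] ^= 0x01     # flip the low bit of the first byte: '0' (0x30) becomes '1' (0x31)
print(raw_bytes[0])      # 49
print(str_bytes[0])      # still '0'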
That still has to copy the string, but it's only done once, and much more efficiently. As stated in the comments, a Python string is immutable and could be interned or used as a dict key. Its immutability should be respected, even if ctypes lets you violate it in practice:
>>> from ctypes import *
>>> s = '01234567890123456789'
>>> b = cast(s, POINTER(c_ubyte * 20))[0]
>>> b[0] = 97
>>> s
'a1234567890123456789'
Edit:
I need to emphasize that I am not recommending using ctypes to modify an immutable CPython string. If you have to, at least check sys.getrefcount beforehand to ensure that the reference count is 2 or less (the call itself adds 1). Otherwise, you will eventually be surprised by string interning for names (e.g. "sys") and code object constants. Python is free to reuse immutable objects as it sees fit. If you step outside of the language to mutate an 'immutable' object, you've broken the contract.
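Here is a minimal sketch of that refcount check; if anything else holds a reference to the string, the guard simply skips the mutation:

import sys
import ctypes

s = '01234567890123456789'
# getrefcount's own argument adds one reference, so a result of 2 or less
# means nothing else is holding on to the string.
if sys.getrefcount(s) <= 2:
    buf = ctypes.cast(s, ctypes.POINTER(ctypes.c_ubyte * len(s)))[0]
    buf[0] = 97    # mutates the 'immutable' string in place -- still not recommended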
For example, if you modify an already-hashed string, the cached hash is no longer correct for its contents. That breaks its use as a dict key: neither a string with the new contents nor one with the original contents will match the key in the dict, since the former has a different hash and the latter has a different value. The only way to get at the dict item is to use the mutated string itself, which carries the incorrect hash. Continuing from the previous example:
>>> s
'a1234567890123456789'
>>> d = {s: 1}
>>> d[s]
1
>>> d['a1234567890123456789']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a1234567890123456789'
>>> d['01234567890123456789']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '01234567890123456789'
Now consider the mess you'd have if the key were an interned string that's reused in dozens of places.
For performance analysis it's typical to use the timeit module. Prior to 3.3, timeit.default_timer varies by platform: on POSIX systems it's time.time, and on Windows it's time.clock.
import timeit

setup = r'''
import ctypes, numpy
str_bytes = '01234567890123456789'
arr_t = ctypes.c_ubyte * 20
'''

methods = [
    'arr_t(*bytearray(str_bytes))',
    'arr_t.from_buffer_copy(str_bytes)',
    'ctypes.cast(str_bytes, ctypes.POINTER(arr_t))[0]',
    'numpy.asarray(str_bytes).ctypes.data_as('
        'ctypes.POINTER(arr_t))[0]',
]

test = lambda m: min(timeit.repeat(m, setup))
>>> tabs = [test(m) for m in methods]
>>> trel = [t / tabs[0] for t in tabs]
>>> trel
[1.0, 0.060573711879182784, 0.261847116395079, 1.5389279092185282]
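In other words, relative to the starargs approach, from_buffer_copy is roughly 16x faster and the cast is roughly 4x faster in this run, while the numpy route is about 1.5x slower.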