Tuesday, February 26, 2013

Python Slicing and Copies

I've seen concern a number of times, on blogs or in code reviews, that slicing lists in Python negatively impacts performance, especially when dealing with large data. The concern is about objects being copied as a result of slicing.

I hadn't heard of deep copy being the default behavior for, well, anything in any language. So, it surprised me to think Python would behave like that and I started looking around for information. I could have done my own test initially instead, but I'm relatively new to Python and didn't immediately know how.

It took awhile to find anything. It's one of those things where probably most people know it, but those who don't aren't corrected often.

Anyway, I found in the introductory documentation on lists that "... slice operations return a new list containing the requested elements. This means that the following slice returns a shallow copy of the list..." A shallow copy means that only object references are copied. Strings behave the same way when sliced, probably because lists and strings are both sequence types.

Now, to an example I could have used instead of searching for documentation:

class String(object):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        a = []
        for c in self.value:
            a.append('%c [%s]' % (c, hex(id(c))))
        return ''.join([self.value, ': ', repr(a)])

s0 = String('OHAY')
print s0
s1 = String(s0.value[:1])
print s1
s2 = String(s0.value[1:])
print s2

And the output:

OHAY: ['O [0x100453a80]', 'H [0x1004d0780]', 'A [0x1004d07b0]', 'Y [0x1004d0840]']
O: ['O [0x100453a80]']
HAY: ['H [0x1004d0780]', 'A [0x1004d07b0]', 'Y [0x1004d0840]']

You can see from the object identities that only the references are copied, not the objects. This is generally inexpensive, that is, it's unlikely to hurt overall performance. Always measure first before trying to optimize.