Understanding the Internal Workings of Python: Copy, Reference Counts
In this article, we'll delve deeper into the internal mechanics of Python, focusing on reference counts, memory management, and how data types are handled in memory. Building upon the concepts discussed in the previous article on mutable vs. immutable objects, we'll explore how Python manages references, copy operations, and slices.
Reference Counts in Python:
Python (specifically CPython, the reference implementation) uses reference counts to track how many references point to an object. When an object's reference count drops to zero, meaning nothing refers to it any longer, CPython frees the object immediately; a separate cyclic garbage collector handles groups of objects that only reference each other. However, there's an interesting quirk when using the sys.getrefcount() function:
import sys
print(sys.getrefcount(5)) # Output: 4294967295
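The output may look surprising at first: a reference count of 4294967295 for the integer 5. The temporary reference that getrefcount() holds on its argument accounts for only one extra count, not a number this large. In CPython 3.12 and later, small integers such as 5 are "immortal" objects (PEP 683): their reference count is pinned to a large sentinel value, so they are never deallocated. The exact number depends on the CPython version and build; older versions print a much smaller count that varies from run to run.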
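To watch reference counts move on an ordinary (non-immortal) object, a minimal sketch like the one below may help. The Payload class is a hypothetical placeholder introduced only for illustration, and the immediate deallocation shown at the end is CPython behaviour rather than a guarantee of the Python language.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder class, used only for this illustration."""


obj = Payload()
before = sys.getrefcount(obj)      # includes getrefcount's own temporary reference

alias = obj                        # bind a second name to the same object
with_alias = sys.getrefcount(obj)

del alias                          # unbind the second name again
after_del = sys.getrefcount(obj)

print(with_alias - before)         # 1
print(after_del - before)          # 0

probe = weakref.ref(obj)           # a weak reference does not keep obj alive
del obj                            # the last strong reference is gone
print(probe() is None)             # True on CPython: the object was freed at once
```

Comparing differences rather than absolute counts keeps the sketch independent of how many internal references a given interpreter version holds.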
Variables and Data Types (Interview Perspective):
In Python, variables are simply labels or references to values stored in memory. These values have data types, but variables do not. Consider the following example:
x = 10 # Here, x is just a label referring to the value 10,
# which has the integer data type
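A short sketch of the same idea: the type belongs to the value, so the same name can refer to values of different types over time. The variable names here are illustrative only.

```python
x = 10
first_type = type(x).__name__    # the type comes from the value 10
print(first_type)                # int

x = "ten"                        # rebind the same label to a string value
second_type = type(x).__name__
print(second_type)               # str
```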
Examples with Numbers:
a = 5
b = 2
print(a) # Output: 5
print(b) # Output: 2
a = a + 2
print(a) # Output: 7
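In the example above, a = a + 2 does not modify the object 5; it rebinds a to a different object, 7. Because small integers and short strings are used so often, CPython keeps some of them alive no matter how many names refer to them: integers from -5 to 256 are preallocated and shared, and some strings (such as identifier-like literals) are interned. This is a caching optimization, not delayed garbage collection: an ordinary object is still freed as soon as its reference count reaches zero.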
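The rebinding in the numbers example can be observed with id(), which returns an object's identity. This is a small sketch; the final check relies on CPython's cache of small integers, which is an implementation detail and not something the language promises.

```python
a = 5
id_before = id(a)       # identity of the object 5

a = a + 2               # builds the value 7 and rebinds the name a to it
id_after = id(a)

print(a)                        # 7
print(id_before == id_after)    # False: a now labels a different object

x = 100
y = 100
print(x is y)           # True on CPython: both names share one cached int object
```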
Examples with Lists:
p1 = [1, 2, 3]
p2 = p1
p2 = [1, 2, 3]
p1[0] = 33
print(p1) # Output: [33, 2, 3]
print(p2) # Output: [1, 2, 3]
In this example, p1 and p2 initially reference the same list [1, 2, 3]. However, reassigning p2 to a new list does not affect p1, because p2 now references a different object in memory.
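The difference between "same object" and "equal contents" can be checked with the is and == operators. A minimal sketch reusing the names from the example:

```python
p1 = [1, 2, 3]
p2 = p1
same_before = p2 is p1      # both names label one list object
print(same_before)          # True

p2 = [1, 2, 3]              # a list display always builds a new list
same_after = p2 is p1
equal_after = p2 == p1
print(same_after)           # False: two distinct objects
print(equal_after)          # True: the contents still compare equal

p1[0] = 33                  # mutates only the list that p1 labels
print(p2)                   # [1, 2, 3]
```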
l1 = [1, 2, 3]
l2 = l1
print(l1) # Output: [1, 2, 3]
print(l2) # Output: [1, 2, 3]
l1[0] = 44
print(l1) # Output: [44, 2, 3]
print(l2) # Output: [44, 2, 3]
In contrast, when modifying elements of the list l1, the changes are reflected in l2 as well. This is because both l1 and l2 reference the same list object, so any changes made to the list through one variable are visible through the other.
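When an independent list is wanted instead of a second label, a copy has to be made explicitly. The sketch below contrasts plain assignment with a slice copy and list.copy(); the names alias, sliced, and copied are illustrative. Both copies are shallow, so nested mutable elements would still be shared.

```python
l1 = [1, 2, 3]
alias = l1            # another name for the same list
sliced = l1[:]        # a full slice builds a new list with the same elements
copied = l1.copy()    # list.copy() does the same thing

l1[0] = 44            # mutate the original list

print(alias)          # [44, 2, 3]
print(sliced)         # [1, 2, 3]
print(copied)         # [1, 2, 3]
```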
Conclusion:
In Python, numbers and strings are immutable: their values cannot be changed in place, so an operation such as a = a + 2 binds the name to a different object. Because immutable objects can be shared safely, CPython caches small integers and interns some strings rather than recreating them each time.
Lists, however, are mutable. A change made through one name modifies the object itself and is visible through every other name that references it. For that reason lists are not shared in this way: each list display such as [1, 2, 3] creates a new object.
This distinction arises from the immutability of numbers and strings compared to the mutability of lists in Python.