Python offers four fundamental built-in data structures—list, tuple, set, and dict—each with its own characteristics around ordering, mutability, hashability, dynamic allocation, and performance (Big-O). Mastering these structures helps you:

  • Choose the right container for each problem and write more “Pythonic” code
  • Optimize for lookup, insertion, deletion speed and memory usage
  • Combine them effectively when designing algorithms or complex systems

This article transitions from basic to advanced, includes clear real-world examples, and highlights terminology such as ordered, unordered, mutable, immutable, hashable, over-allocation, and Big-O.


1. List

1.1 Definition & Properties

  • Definition: A list is a dynamic array that is ordered, mutable (items can be added, removed, or changed), and allows duplicates.
  • Dynamic Allocation: Python over-allocates space so that .append() is amortized O(1). When capacity runs out, the list resizes (typically doubling)—that resize is O(n) but rare, so average remains O(1).
  • Index Access: a[i] is O(1).
  • Insertion/Removal at Ends:
    .append() / .pop() → O(1)
    .pop(0) or .insert(i, x) → O(n), because elements must shift.
  • Complexity:
    Dưới đây là bảng bạn yêu cầu, được trình bày lại bằng cú pháp Markdown:

    Operation Complexity
    append(x) O(1) amortized
    pop() O(1)
    pop(0) O(n)
    Index access [i] O(1)
    insert(i, x) O(n)
    x in list O(n)

1.2 Example & Use Cases

# List comprehension and stack example
numbers = [1, 2, 3, 4, 5]
squares = [x*x for x in numbers]  # Pythonic, C-optimized
print(squares)                    # [1, 4, 9, 16, 25]

# Using list as a stack (LIFO)
stack = []
stack.append('first')
stack.append('second')
print(stack.pop())  # 'second'
print(stack)        # ['first']
  • When to use list:
    • Sequential data you iterate in order
    • Building up results dynamically (.append())
    • Implementing simple stacks

1.3 Optimization Tips

  • Prefer list comprehensions over explicit loops for speed.
  • If you need frequent inserts/pops at the front, use collections.deque (O(1) at both ends).
  • For large numeric arrays, consider array.array or NumPy for memory efficiency and vectorized operations.

2. Tuple

2.1 Definition & Properties

  • Definition: A tuple is an immutable, ordered sequence.
  • Hashable: If all its elements are hashable, the tuple itself is hashable—so it can serve as a dict key or a set element.
  • Access Speed: t[i] is O(1). Because it’s immutable, Python can store it in a compact block, often using less memory than a list.

    Operation Complexity
    Index access [i] O(1)
    Traversal O(n)

2.2 Example & Use Cases

# Coordinates as tuple keys in a dict
point = (10, 20)
weather = {(10, 20): 'Sunny', (30, 50): 'Rainy'}
print(weather[point])  # 'Sunny'

# Returning multiple values from a function
def min_max(vals):
    return min(vals), max(vals)

low, high = min_max([3, 1, 9, 4])
print(low, high)       # 1 9
  • When to use tuple:
    • Fixed groups of values (e.g., coordinates)
    • Function returns (tuple unpacking)
    • Dict keys or set elements when immutability is required

3. Set

3.1 Definition & Properties

  • Definition: A set is an unordered, mutable collection of unique elements.
  • Hash Table: Implemented via a hash table, so .add(), .remove(), and membership tests (x in s) are O(1) on average.
  • Ordering: No guaranteed order when iterating.

    Operation Complexity
    .add(x) O(1) avg.
    .remove(x) O(1) avg.
    x in s O(1) avg.
    Traversal O(n)

3.2 Example & Use Cases

# Remove duplicates
nums = [1, 2, 2, 3, 4, 4]
unique = set(nums)
print(unique)  # {1, 2, 3, 4}

# Set operations
A = {'python', 'java', 'c++'}
B = {'python', 'js', 'ruby'}
print(A & B)   # {'python'}
print(A | B)   # {'python','java','c++','js','ruby'}
print(A - B)   # {'java','c++'}

# Two-Sum in O(n)
def two_sum(arr, target):
    seen = set()
    for x in arr:
        if target - x in seen:
            return (target - x, x)
        seen.add(x)
    return None

print(two_sum([2, 7, 11, 15], 9))  # (2, 7)
  • When to use set:
    • Fast membership tests for large collections
    • Eliminate duplicates
    • Mathematical set operations (union, intersection, difference)

3.3 Caveats

  • Consumes more memory than a list (hash table overhead).
  • Elements must be hashable—you cannot store lists or dicts inside a set.

4. Dict

4.1 Definition & Properties

  • Definition: A dict maps keys to values, is ordered (by insertion, Python 3.7+), mutable, and built on a hash table.
  • Key Requirements: Keys must be hashable types (e.g., int, str, tuple).
  • Performance:

    Operation Complexity
    d[key] / d.get(key) O(1) avg.
    d[key] = value / del d[key] O(1) avg.
    Iteration .items() O(n)

4.2 Example & Use Cases

# JSON-like data
person = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "SQL"]
}
print(person["name"])       # Alice
person["city"] = "Hanoi"    # add new key

# Frequency count with Counter
from collections import Counter
fruits = ["apple", "banana", "apple", "orange", "banana"]
freq = Counter(fruits)
print(freq)  # Counter({'apple': 2, 'banana': 2, 'orange': 1})

# Lookup table in Factory pattern
def make_adder(x): return lambda y: x + y
funcs = {
    "add2": make_adder(2),
    "add5": make_adder(5)
}
print(funcs)      # 5
  • When to use dict:
    • Parsing and representing JSON objects
    • Fast key-based lookup (configuration, caches)
    • Adjacency lists for graphs
    • Memoization in algorithms

5. Key Terms & Big-O

| Term                     | Explanation                                                      |
|--------------------------|------------------------------------------------------------------|
| **ordered**              | Maintains insertion order (list, tuple, dict)                   |
| **unordered**            | No guaranteed order on iteration (set)                          |
| **mutable**              | Can be changed after creation (list, set, dict)                 |
| **immutable**            | Cannot be changed (tuple, frozenset)                            |
| **hashable**             | Can be hashed for dict keys or set elements (int, str, tuple)   |
| **dynamic allocation**   | Runtime memory resizing                                        |
| **over-allocation**      | Allocating extra capacity to optimize append amortized O(1)      |
| **Big-O notation**       | Describes time/space complexity (O(1), O(n), O(n log n), etc.)   |

6. Combining Structures & Design Patterns

6.1 Grouping Items

# Group employees by department
employees = [
    {"name":"Alice","dept":"IT"},
    {"name":"Bob","dept":"HR"},
    {"name":"Charlie","dept":"IT"}
]
groups = {}
for emp in employees:
    groups.setdefault(emp["dept"], []).append(emp["name"])
print(groups)
# {'IT': ['Alice', 'Charlie'], 'HR': ['Bob']}
  • Idea: dict of list to categorize data.

6.2 Graph Traversal (BFS)

from collections import deque

graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}
visited = {0}
queue = deque([0])
order = []

while queue:
    node = queue.popleft()
    order.append(node)
    for nbr in graph[node]:
        if nbr not in visited:
            visited.add(nbr)
            queue.append(nbr)

print("BFS order:", order)  # BFS order: [0, 1, 2, 3]
  • Dict for adjacency list, deque for O(1) queue ops, set for O(1) “visited” checks.

Conclusion

Understanding list, tuple, set, and dict, along with terms like ordered, mutable, hashable, and dynamic allocation, empowers you to:

  1. Select the optimal structure for each task
  2. Optimize for speed and memory (Big-O)
  3. Combine containers creatively in algorithms and system design

💬 Bình luận