Table of Contents
- What is NumPy?
- Python Lists vs. NumPy Arrays
- NumPy Array Creation
- Special Types of Arrays
- Creating Random Valued Arrays
- NumPy Data Types
- Shape and Reshape in NumPy
- NumPy Arithmetic Operations
- Broadcasting with NumPy Arrays
- Indexing and Slicing
- NumPy Array Iterating
- The Difference Between Copy and View
- Join and Split Functions
- Search, Sort, and Filter Functions
- Shuffle, Unique, Resize, Flatten, and Ravel Functions
- Insert and Delete Functions
- NumPy Matrix
- Matrix Functions
What is NumPy?
Have you ever tried to work with lots of numbers in Python? Maybe you wanted to analyze data, perform calculations on thousands of values, or build a machine learning model. If you’ve used regular Python lists for this, you probably noticed something frustrating: they’re slow. Really slow.
The Problem We Face
Imagine you need to add 1 to every number in a list of one million numbers. With regular Python lists, you have to loop through each number one by one. Python checks the type of each element, allocates memory separately, and processes everything step by step. This takes a lot of time and memory.
Why does this happen? Python is designed to be flexible and easy to use, but that flexibility comes at a cost. Lists can hold any type of data—numbers, strings, objects, even other lists—all mixed together. This means Python can’t make assumptions about what’s inside. It has to check everything carefully, which slows things down.
What did people do before NumPy?
Before NumPy, people working with numerical data had a tough choice: either write slow Python code with loops, or learn a completely different language like C or FORTRAN to get better performance. Neither option was great for data scientists and machine learning engineers who wanted to focus on solving problems, not fighting with slow code.
So how can we solve this problem in a better way?
This is where NumPy comes in. NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides a powerful data structure called an array that’s specifically designed for numerical operations. Unlike Python lists, NumPy arrays:
- Store elements of a single type side by side in one contiguous block of memory
- Use optimized, compiled C code under the hood for fast operations
- Allow you to perform calculations on entire arrays at once without loops
- Use less memory than regular Python lists
Think of it this way: if Python lists are like a parking lot where any vehicle can park anywhere, NumPy arrays are like a train—all the cars are connected, the same size, and they all move together efficiently.
NumPy is essential for anyone working in data science, machine learning, or scientific computing. Much of the Python data ecosystem builds on it or works closely with it: Pandas, SciPy, and scikit-learn are built on NumPy arrays, and frameworks such as TensorFlow convert to and from them. Learning NumPy is your gateway to the entire Python data science ecosystem.
Python Lists vs. NumPy Arrays
Before we dive into NumPy arrays, let’s understand why we need them in the first place.
The Problem with Python Lists
Python lists are incredibly flexible. You can store anything in them:
mixed_list = [1, "hello", 3.14, True, [1, 2, 3]]
This flexibility is great for general programming, but it creates serious problems when working with numerical data:
1. Memory Inefficiency: Each element in a list is a separate Python object with its own memory overhead. For a list of one million numbers, this means one million separate objects!
2. Slow Performance: When you want to add two lists element by element, you need to write a loop, and Python has to check the type of each element during every operation.
3. No Built-in Mathematical Operations: Want to multiply every number in a list by 2? You need to write a loop. Want to find the average? Loop again. This gets tedious fast.
Why do these problems happen?
Python lists prioritize flexibility over speed. They’re designed to be general-purpose containers, not specialized numerical computing tools. When Python processes a list, it doesn’t know what types of data are inside until it checks each element individually.
The NumPy Solution
NumPy arrays solve these problems by making a smart trade-off: they sacrifice flexibility for incredible speed and efficiency. Here’s how:
- Homogeneous Data: All elements must be the same type (all integers, or all floats, etc.)
- Contiguous Memory: All elements are stored side by side in one block of memory
- Vectorized Operations: You can perform operations on entire arrays at once
- Optimized C Code: The heavy lifting happens in fast, compiled C code
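The result? For numerical work on large arrays, NumPy operations are often 10 to 100 times faster than equivalent Python loops over lists, and the arrays use much less memory. The exact speedup depends on the operation and the size of the data.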
Both lists and NumPy arrays can store collections of data, but NumPy arrays are purpose-built for numerical computing. When you’re working with numbers—especially lots of them—NumPy arrays are the better choice.
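To make the comparison concrete, here is a small sketch that performs the same "add 1 to every number" task both ways and compares the memory each representation needs. The size `n` and all variable names are arbitrary choices for illustration, and the list's exact byte count depends on your Python version, so the code compares sizes instead of printing fixed numbers for the list.

```python
import sys
import numpy as np

n = 1_000_000  # arbitrary size for illustration

# Pure-Python approach: a list of int objects and an explicit loop
py_list = list(range(n))
py_result = [x + 1 for x in py_list]

# NumPy approach: one contiguous int64 buffer and a single vectorized operation
np_array = np.arange(n, dtype=np.int64)
np_result = np_array + 1

# Both approaches compute the same values
same_values = np_result.tolist() == py_result

# Memory: the list holds one pointer per element plus a separate Python int object,
# while the array holds raw 8-byte integers back to back
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes  # exactly n * 8 bytes of element data

print(same_values)               # True
print(array_bytes)               # 8000000
print(list_bytes > array_bytes)  # True on typical CPython builds
```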
NumPy Array Creation
Now that we understand why NumPy arrays are useful, let’s learn how to create them.
Creating Arrays from Python Lists
The most straightforward way to create a NumPy array is to convert a Python list:
import numpy as np
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(my_array) # Output: [1 2 3 4 5]
Creating Multi-dimensional Arrays
Real-world data often has more than one dimension. Think of a spreadsheet with rows and columns, or an image with height, width, and color channels. NumPy handles multi-dimensional arrays easily by using nested lists:
# 2D Array
two_d_list = [[1, 2, 3], [4, 5, 6]]
two_d_array = np.array(two_d_list)
print(two_d_array)
# 3D Array
three_d_list = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
three_d_array = np.array(three_d_list)
print(three_d_array)
A 2D array is like a table with rows and columns. A 3D array is like a stack of tables. You can think of it as having depth in addition to height and width.
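A quick way to confirm how many dimensions an array has is to check its `ndim` and `shape` attributes. The sketch below builds one array of each kind (the variable names are just for illustration) and prints both values.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5])
two_d = np.array([[1, 2, 3], [4, 5, 6]])
three_d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# ndim counts the dimensions; shape gives the size along each one
for name, a in [("one_d", one_d), ("two_d", two_d), ("three_d", three_d)]:
    print(name, a.ndim, a.shape)
# one_d 1 (5,)
# two_d 2 (2, 3)
# three_d 3 (2, 2, 2)
```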
Special Types of Arrays
The Problem of Initialization
When you’re starting a calculation, you often need arrays filled with specific values. Imagine you’re building a machine learning model that needs to start with all parameters set to zero, or you need a grid of ones for some mathematical operation.
Without NumPy’s built-in functions, you’d have to manually create lists with loops, filling in every value one by one. This is tedious and error-prone.
How do we solve this efficiently?
NumPy provides several convenient functions to create arrays pre-filled with specific values:
Arrays Filled with Zeros
zeros_array = np.zeros((3, 4)) # 3 rows, 4 columns of zeros
print(zeros_array)
This is perfect for initializing arrays when you plan to fill them with calculated values later.
Arrays Filled with Ones
ones_array = np.ones((2, 3)) # 2 rows, 3 columns of ones
print(ones_array)
Useful when you need a starting point of all ones, for example as the initial value of a running product, where multiplying by 1 leaves the result unchanged.
Arrays Filled with a Specific Value
full_array = np.full((2, 2), 7) # 2x2 array filled with 7s
print(full_array)
This lets you fill an array with any value you choose.
Empty Arrays (Uninitialized)
empty_array = np.empty((2, 2))
print(empty_array)
This creates an array without initializing the values, which can be slightly faster. The values will be whatever was already in that memory location (arbitrary leftover data, not proper random numbers). Use this only when you know you'll overwrite every value before reading it.
Arrays with Evenly Spaced Values
The Range Problem: Sometimes you need a sequence of numbers—maybe from 0 to 100 in steps of 5, or from 1 to 1000. Creating this manually would require writing loops or typing out long lists.
arange_array = np.arange(0, 10, 2) # Start, Stop, Step
print(arange_array) # Output: [0 2 4 6 8]
np.arange() works just like Python’s built-in range(), but returns a NumPy array.
Arrays with a Specific Number of Points
What if you need exactly 5 evenly spaced numbers between 0 and 1? You’d have to calculate the spacing manually and then create each value.
linspace_array = np.linspace(0, 1, 5) # Start, Stop, Number of points
print(linspace_array) # Output: [0. 0.25 0.5 0.75 1. ]
np.linspace() divides the range evenly and gives you exactly the number of points you need. This is especially useful in scientific computing and plotting.
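One difference between the two functions is easy to trip over: `np.arange()` excludes the stop value, like `range()`, while `np.linspace()` includes it unless you pass `endpoint=False`. The NumPy documentation also suggests preferring `linspace` when the step is not an integer, because floating-point rounding can make the length of an `arange` result hard to predict. A short sketch:

```python
import numpy as np

# arange excludes the stop value, like range()
a = np.arange(0, 1, 0.25)
print(a)  # [0.   0.25 0.5  0.75]

# linspace includes the stop value by default...
b = np.linspace(0, 1, 5)
print(b)  # [0.   0.25 0.5  0.75 1.  ]

# ...unless endpoint=False is passed
c = np.linspace(0, 1, 4, endpoint=False)
print(c)  # [0.   0.25 0.5  0.75]
```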
Identity Matrices
In linear algebra, an identity matrix is a special matrix with 1s on the diagonal and 0s everywhere else. It’s the matrix equivalent of the number 1 in multiplication.
identity_matrix = np.eye(3) # 3x3 identity matrix
print(identity_matrix)
This creates a square matrix that’s essential for many mathematical operations.
Creating Random Valued Arrays
The Need for Randomness
In data science and machine learning, randomness is everywhere. You might need random numbers to:
- Initialize neural network weights
- Create test data
- Simulate random processes
- Sample from datasets
Without NumPy, generating arrays of random numbers would require loops and multiple function calls.
NumPy’s Random Solutions
NumPy’s random module provides efficient ways to create arrays filled with random values:
Uniform Random Distribution
rand_array = np.random.rand(2, 3) # 2x3 array with random values
print(rand_array)
This creates random numbers uniformly distributed over the half-open interval [0, 1): 0 can occur but 1 never does, and every value in that range is equally likely.
Normal (Gaussian) Distribution
randn_array = np.random.randn(2, 3) # 2x3 array with random values from normal distribution
print(randn_array)
This draws numbers from a “bell curve” distribution (mean of 0, standard deviation of 1). Most numbers cluster around 0, with fewer numbers further away. This is extremely common in statistics and machine learning.
Random Integers
randint_array = np.random.randint(1, 10, size=(2, 3)) # 2x3 array with random integers between 1 and 9
print(randint_array)
This generates random whole numbers in a specified range, perfect for simulations or creating random indices.
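The functions above belong to NumPy's older `np.random` interface. NumPy also offers a `Generator` object, created with `np.random.default_rng()`, which the NumPy documentation recommends for new code. The sketch below draws the same three kinds of arrays from a seeded generator; the seed value 42 is arbitrary, and seeding is what makes the results reproducible from run to run.

```python
import numpy as np

# A Generator seeded with a fixed value produces the same numbers on every run
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary seed

uniform = rng.random((2, 3))                  # uniform floats in [0, 1)
normal = rng.standard_normal((2, 3))          # mean 0, standard deviation 1
integers = rng.integers(1, 10, size=(2, 3))   # integers from 1 to 9 (10 excluded)

# Re-creating the generator with the same seed reproduces the first draw
rng_again = np.random.default_rng(seed=42)
repeat = rng_again.random((2, 3))
print(np.array_equal(uniform, repeat))  # True
```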
NumPy Data Types
The Type Problem
Computers store different types of numbers in different ways. An integer takes less memory than a decimal number. A 32-bit float is less precise than a 64-bit float. Sometimes you need control over exactly how your numbers are stored.
Why does this matter? Memory usage and precision. If you’re working with millions of numbers and only need whole numbers, using floating-point storage wastes memory. If you’re doing scientific calculations, you might need maximum precision.
Understanding NumPy Data Types
NumPy allows you to specify and convert data types precisely. The dtype attribute shows what type your array elements are:
Checking Data Type
arr = np.array([1, 2, 3])
print(arr.dtype) # Output: int64 on most platforms (int32 on Windows with NumPy versions before 2.0)
Specifying Data Type During Creation
arr_float = np.array([1, 2, 3], dtype='float64')
print(arr_float.dtype) # Output: float64
print(arr_float) # Output: [1. 2. 3.]
Notice the decimal points? Those indicate these are floating-point numbers, not integers.
Converting Data Types
Sometimes you have an array of one type but need another. The astype() method converts arrays to a different type:
arr_floats = np.array([1.1, 2.2, 3.3])
arr_converted_int = arr_floats.astype('int32')
print(arr_converted_int.dtype) # Output: int32
print(arr_converted_int) # Output: [1 2 3]
# Converting to boolean
arr_bool = np.array([0, 1, 2])
arr_converted_bool = arr_bool.astype(bool)
print(arr_converted_bool) # Output: [False True True]
When converting floats to integers, NumPy truncates toward zero: it discards the fractional part instead of rounding, so 1.9 becomes 1 and -1.9 becomes -1. For booleans, 0 becomes False and any non-zero number becomes True.
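If you do want rounding, apply it explicitly before converting. The sketch below contrasts plain truncation with `np.floor()` (round down) and `np.round()` (round to nearest; note that NumPy rounds exact halves to the nearest even number). The sample values are made up to show negative numbers and halves.

```python
import numpy as np

values = np.array([-1.7, -0.5, 0.5, 1.7])

truncated = values.astype(np.int64)          # drops the fraction: toward zero
floored = np.floor(values).astype(np.int64)  # rounds down
rounded = np.round(values).astype(np.int64)  # rounds to nearest, halves to even

print(truncated)  # [-1  0  0  1]
print(floored)    # [-2 -1  0  1]
print(rounded)    # [-2  0  0  2]
```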
Common data types include different bit sizes (int8, int16, int32, int64) for integers and (float32, float64) for decimals. The number indicates how many bits of memory each element uses—larger numbers mean more range or precision but also more memory usage.
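You can inspect these trade-offs directly: `itemsize` reports the bytes per element, `nbytes` the bytes for the whole array, and `np.iinfo()` / `np.finfo()` describe the range and precision of a type. A small sketch using a 1,000-element array (the size is arbitrary):

```python
import numpy as np

sizes = {}
for dtype in ["int8", "int16", "int32", "int64", "float32", "float64"]:
    a = np.zeros(1000, dtype=dtype)
    # itemsize: bytes per element; nbytes: bytes for all 1,000 elements
    sizes[dtype] = (a.itemsize, a.nbytes)
    print(dtype, a.itemsize, a.nbytes)

# A smaller integer type covers a smaller range of values
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
# A smaller float type keeps fewer significant decimal digits
print(np.finfo(np.float32).precision, np.finfo(np.float64).precision)  # 6 15
```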
Shape and Reshape in NumPy
The Dimension Problem
Data comes in different shapes. A single list of numbers is one-dimensional. A table is two-dimensional. A color image is three-dimensional (height, width, and color channels). You often need to know these dimensions or change them.
Why would you change shape? Maybe you have 12 numbers in a line but need to arrange them as a 3×4 table for matrix multiplication. Or you’re feeding data into a machine learning model that expects a specific shape.
Understanding Shape
The shape attribute tells you the size of each dimension:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3) (2 rows, 3 columns)
For multi-dimensional arrays, shape identifies all dimensions:
arr_multi_dim = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
print(arr_multi_dim.shape) # Output: (2, 2, 2)
This is a 3D array with 2 layers, 2 rows per layer, and 2 columns per row.
Reshaping Arrays
The reshape() method returns an array with a new shape that holds the same data; the original array keeps its shape. Think of it like rearranging furniture in a room: same furniture, different layout:
arr_one_d = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr_one_d.reshape((2, 3)) # Reshape to 2 rows, 3 columns
print(reshaped_arr)
# Output:
# [[1 2 3]
#  [4 5 6]]
# Reshape to 3D
reshaped_3d = arr_one_d.reshape((2, 1, 3)) # 2 blocks, 1 row per block, 3 columns per row
print(reshaped_3d)
# Output:
# [[[1 2 3]]
#  [[4 5 6]]]
Important rule: The total number of elements must stay the same. You can’t reshape 6 elements into a 2×4 array (which needs 8 elements).
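Here is what happens when that rule is broken: NumPy raises a `ValueError` instead of guessing. The sketch checks `size` (the total element count) and catches the error; the `error_raised` flag is only there so the outcome is easy to inspect.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
print(arr.size)  # 6

error_raised = False
try:
    arr.reshape((2, 4))  # 2 * 4 = 8 elements, but arr only has 6
except ValueError as err:
    error_raised = True
    print("ValueError:", err)

# Any shape whose dimensions multiply to 6 works
print(arr.reshape((3, 2)).shape)  # (3, 2)
print(arr.reshape((6, 1)).shape)  # (6, 1)
```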
Automatic Dimension Calculation
Sometimes you know one dimension but want NumPy to calculate the other. Pass -1 for the dimension you want calculated automatically (only one dimension can be -1):
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
reshaped_auto = arr.reshape(2, -1) # 2 rows, columns calculated automatically
print(reshaped_auto)
# Output:
# [[1 2 3 4]
# [5 6 7 8]]
NumPy figures out that 8 elements arranged in 2 rows means 4 columns per row.
NumPy Arithmetic Operations
The Loop Problem
Imagine you have 10,000 numbers and you want to add 5 to each one. With regular Python lists, you’d write:
my_list = list(range(10_000)) # 10,000 example numbers
result = []
for num in my_list:
    result.append(num + 5)
This is slow, requires explicit loops, and doesn’t look like the mathematical operation you’re trying to perform.
What’s the better way?
NumPy lets you perform operations on entire arrays at once, a feature called “vectorization.” This is faster and more readable:
Scalar Operations
Adding, subtracting, multiplying, or dividing an array by a single number:
arr = np.array([1, 2, 3, 4])
result_add = arr + 3
print(result_add) # Output: [4 5 6 7]
result_multiply = arr * 2
print(result_multiply) # Output: [2 4 6 8]
The operation applies to every element automatically. No loops needed!
Array-to-Array Operations
Performing operations between two arrays of the same shape:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result_sum = arr1 + arr2
print(result_sum) # Output: [5 7 9]
result_product = arr1 * arr2
print(result_product) # Output: [4 10 18]
Each element in the first array combines with the corresponding element in the second array. Element 0 with element 0, element 1 with element 1, and so on.
Built-in Mathematical Functions
NumPy includes many functions for common calculations:
arr = np.array([1, 5, 2, 8])
print(np.sum(arr)) # Output: 16
print(np.min(arr)) # Output: 1
print(np.max(arr)) # Output: 8
These are much faster than writing your own loops, and they work on multi-dimensional arrays too.
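On multi-dimensional arrays, these functions accept an `axis` argument that chooses the direction of the reduction: `axis=0` works down each column and `axis=1` works across each row. The `grid` values below are arbitrary.

```python
import numpy as np

grid = np.array([[1, 5, 2],
                 [8, 3, 4]])

print(np.sum(grid))          # 23       (all elements)
print(np.sum(grid, axis=0))  # [9 8 6]  (down each column)
print(np.sum(grid, axis=1))  # [ 8 15]  (across each row)
print(np.max(grid, axis=1))  # [5 8]    (largest value in each row)
```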
Statistical Functions
NumPy makes statistical calculations simple:
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr)) # Output: 3.0
print(np.median(arr)) # Output: 3.0
print(np.std(arr)) # Output: 1.4142135623730951
print(np.var(arr)) # Output: 2.0
- Mean: The average value
- Median: The middle value when sorted
- Standard Deviation: How spread out the values are
- Variance: The square of standard deviation
These functions are essential for data analysis and machine learning.
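One detail worth knowing: by default, `np.var()` and `np.std()` divide by N (the population formula), which is why the example above gives a variance of 2.0. If your data is a sample and you want the common N - 1 estimate, pass `ddof=1`:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Default (ddof=0): divide the sum of squared deviations (10) by N = 5
pop_var = np.var(arr)             # 2.0
# ddof=1: divide by N - 1 = 4, the usual sample estimate
sample_var = np.var(arr, ddof=1)  # 2.5
sample_std = np.std(arr, ddof=1)  # square root of 2.5, about 1.5811

print(pop_var, sample_var, sample_std)
```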
Broadcasting with NumPy Arrays
The Shape Mismatch Problem
What if you want to add arrays of different shapes? For example, adding a 1D array to each row of a 2D array? With regular lists, you’d need nested loops and careful indexing.
How did people handle this before broadcasting? They wrote complex loop structures, being very careful about which dimension was which. It was error-prone and slow.
What is Broadcasting?
Broadcasting is NumPy’s clever way of performing operations on arrays of different shapes. It automatically “stretches” smaller arrays to match the shape of larger ones, but without actually copying data in memory.
Scalar Broadcasting
The simplest case—a single number is broadcast across an entire array:
arr = np.array([1, 2, 3])
result = arr + 5 # 5 is broadcast to [5, 5, 5]
print(result) # Output: [6 7 8]
NumPy conceptually expands the scalar 5 to match the array's shape, without actually building [5, 5, 5] in memory.
Array Broadcasting Rules
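Broadcasting follows specific rules. If the two arrays have a different number of dimensions, NumPy first pads the shorter shape with 1s on the left, so a shape of (3,) is treated as (1, 3). Dimensions are then compared starting from the trailing (rightmost) dimension. Two dimensions are compatible when: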
- They are equal, OR
- One of them is 1
Let’s see this in action:
# Example 1: (1, 3) + (3,) -> compatible
arr1 = np.array([[1, 2, 3]]) # shape (1, 3)
arr2 = np.array([10, 20, 30]) # shape (3,)
result = arr1 + arr2
print(result)
# Output: [[11 22 33]]
# Example 2: (2, 1) + (1, 3) -> compatible
arr1 = np.array([[10], [20]]) # shape (2, 1)
arr2 = np.array([[1, 2, 3]]) # shape (1, 3)
result = arr1 + arr2
print(result)
# Output:
# [[11 12 13]
# [21 22 23]]
In Example 2, NumPy broadcasts arr1’s single column across 3 columns, and broadcasts arr2’s single row across 2 rows. The result is a 2×3 array where every combination is computed.
Broadcasting is incredibly powerful because it lets you write clean, efficient code without explicit loops, and NumPy handles the complexity behind the scenes.
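A common practical use is centering data: subtracting each column's mean from every row. The sketch below uses made-up values in a hypothetical `data` array, then shows that shapes which break the rules raise a `ValueError`.

```python
import numpy as np

# Hypothetical data: each row is one sample, each column one measurement
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])   # shape (3, 2)

col_means = data.mean(axis=0)    # shape (2,): [2.0, 20.0]
# (3, 2) - (2,): the (2,) shape is treated as (1, 2) and reused for every row
centered = data - col_means

# Every column of the result now has a mean of 0
print(centered.mean(axis=0))

# Trailing dimensions 3 and 4 are neither equal nor 1, so this is an error
incompatible = False
try:
    np.ones(3) + np.ones(4)
except ValueError:
    incompatible = True
print(incompatible)  # True
```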
Indexing and Slicing
The Access Problem
When working with arrays, you often need to access specific elements or groups of elements. Maybe you want the first element, the last element, every other element, or a specific section of a multi-dimensional array.
With nested lists, accessing multi-dimensional data requires multiple square brackets and can get confusing quickly.
One-dimensional Indexing
Accessing individual elements using their position (index):
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # Output: 10 (first element)
print(arr[-1]) # Output: 50 (last element)
Indices start at 0, and negative indices count from the end.
Multi-dimensional Indexing
For 2D and higher dimensional arrays, use comma-separated indices:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[0, 1]) # Output: 2 (row 0, column 1)
print(arr_2d[1, 2]) # Output: 6 (row 1, column 2)
Think of it as [row, column] for 2D arrays. The first index selects the row, the second selects the column.
Slicing
Slicing extracts a range of elements using the syntax [start:end:step]:
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print(arr[1:5]) # Output: [20 30 40 50] (elements from index 1 to 4)
print(arr[::2]) # Output: [10 30 50 70] (every second element)
print(arr[:3]) # Output: [10 20 30] (elements from start to index 2)
print(arr[4:]) # Output: [50 60 70] (elements from index 4 to end)
Key points about slicing:
- The start index is included
- The end index is excluded
- Omitting start means “from the beginning”
- Omitting end means “to the end”
- The step sets the distance between selected elements: a step of 2 takes every second element
Multi-dimensional Slicing
You can slice each dimension separately:
arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(arr_2d[0:2, 1:3]) # Rows 0 and 1, columns 1 and 2
# Output:
# [[2 3]
#  [6 7]]
# Accessing entire rows/columns
print(arr_2d[0, :]) # First row
print(arr_2d[:, 1]) # Second column
The colon : by itself means “all elements in this dimension.”
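Two more slicing details come up often: a negative step walks through the array backward, and selecting a column with a single integer index drops that dimension, while a one-element slice keeps it. A short sketch reusing the arrays above:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])
print(arr[::-1])  # [70 60 50 40 30 20 10]  (negative step walks backward)
print(arr[-3:])   # [50 60 70]              (last three elements)

arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
second_col = arr_2d[:, 1]
print(second_col)              # [ 2  6 10]
print(second_col.shape)        # (3,)   an integer index drops the column dimension
print(arr_2d[:, 1:2].shape)    # (3, 1) a slice keeps it
```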
NumPy Array Iterating
The Iteration Challenge
Sometimes you need to process each element in an array individually. With multi-dimensional arrays, this typically requires nested loops—one loop for each dimension. This gets messy and hard to read.
Basic Iteration (1D Array)
For simple 1D arrays, iteration works like Python lists:
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
Iteration in Multi-dimensional Arrays
When you iterate over a multi-dimensional array, you iterate over the first dimension:
arr_2d = np.array([[1, 2], [3, 4]])
for row in arr_2d:
    print(row)
# Output:
# [1 2]
# [3 4]
To access individual elements, you need nested loops:
for row in arr_2d:
    for element in row:
        print(element)
The Better Way: np.nditer()
For complex multi-dimensional arrays, especially 3D and higher, nested loops become difficult to manage. NumPy provides np.nditer() as a more convenient way to iterate:
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
for x in np.nditer(arr_3d):
    print(x)
This automatically handles all dimensions and visits every element without requiring you to write nested loops.
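Note that `np.nditer()` yields zero-dimensional arrays rather than plain Python numbers; wrapping each item in `int()` converts it. The sketch also shows the array's `.flat` iterator, which visits the elements in the same row-by-row order here, and a reminder that arithmetic on every element is usually better written as one vectorized expression than as a loop.

```python
import numpy as np

arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# nditer yields zero-dimensional arrays; int() turns each into a plain number
visited = [int(x) for x in np.nditer(arr_3d)]
print(visited)  # [1, 2, 3, 4, 5, 6, 7, 8]

# The flat iterator visits the same elements in the same order here
flat_visited = [int(x) for x in arr_3d.flat]
print(flat_visited)  # [1, 2, 3, 4, 5, 6, 7, 8]

# When the per-element work is arithmetic, skip the loop entirely
doubled = arr_3d * 2
```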
Iterating with Index
Sometimes you need both the element and its position. Use np.ndenumerate():
arr = np.array([10, 20, 30])
for idx, x in np.ndenumerate(arr):
    print(f"Index: {idx}, Value: {x}")
# Output:
# Index: (0,), Value: 10
# Index: (1,), Value: 20
# Index: (2,), Value: 30
The index is returned as a tuple, which is especially useful for multi-dimensional arrays.
The Difference Between Copy and View
The Hidden Reference Problem
Here’s a dangerous scenario: You create a new variable from an existing array, modify the new variable, and suddenly your original array changes too! This can create hard-to-find bugs.
Why does this happen? In programming, when you create a new variable, you might create a new copy of the data, or you might just create a new reference (pointer) to the same data. Understanding the difference is crucial.
What is a Copy?
A copy creates a completely new, independent array with its own data in memory:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
x[0] = 42 # Change the copy
print(arr) # Original array is unchanged: [1 2 3 4 5]
print(x) # Copy is changed: [42 2 3 4 5]
Changes to the copy don’t affect the original. They’re separate entities.
What is a View?
A view is a new array object that looks at the same data as the original array. It’s like two different windows looking into the same room:
arr = np.array([1, 2, 3, 4, 5])
y = arr.view()
y[0] = 42 # Change the view
print(arr) # Original array is changed: [42 2 3 4 5]
print(y) # View is changed: [42 2 3 4 5]
Changes to the view affect the original, and vice versa. They share the same underlying data.
When to use each?
- Use copy when you need an independent array that won’t affect the original
- Use view when you want to save memory and it’s okay for changes to affect the original
- Views are faster and more memory-efficient, but copies are safer
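Views also appear without calling `.view()`: basic slicing such as `arr[1:4]` returns a view, while indexing with a list of positions returns a copy. `np.shares_memory()` and the `.base` attribute let you check which one you have. A sketch (the variable names are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

basic_slice = arr[1:4]           # basic slicing returns a view
fancy = arr[[1, 2, 3]]           # indexing with a list returns a copy
explicit_copy = arr[1:4].copy()  # an explicit copy of the slice

print(np.shares_memory(arr, basic_slice))    # True
print(np.shares_memory(arr, fancy))          # False
print(np.shares_memory(arr, explicit_copy))  # False

# .base points at the array a view was taken from (None if the array owns its data)
print(basic_slice.base is arr)     # True
print(explicit_copy.base is None)  # True

basic_slice[0] = 99
print(arr)  # [ 1 99  3  4  5]  (writing to the slice changed the original)
```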
Join and Split Functions
The Combination Problem
Often you have data split across multiple arrays that you need to combine. Or you have one large array that you need to split into smaller pieces for processing.
Without NumPy’s built-in functions, you’d have to manually iterate through arrays, copy elements, and manage indices—tedious and error-prone.
Joining Arrays
Using np.concatenate()
Joins arrays along an existing axis:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = np.concatenate((arr1, arr2))
print(result) # Output: [1 2 3 4 5 6]
# For 2D arrays, specify axis
arr_a = np.array([[1, 2], [3, 4]])
arr_b = np.array([[5, 6], [7, 8]])
result_axis0 = np.concatenate((arr_a, arr_b), axis=0) # Join row-wise
print(result_axis0)
# Output:
# [[1 2]
# [3 4]
# [5 6]
# [7 8]]
result_axis1 = np.concatenate((arr_a, arr_b), axis=1) # Join column-wise
print(result_axis1)
# Output:
# [[1 2 5 6]
# [3 4 7 8]]
- axis=0: stack vertically (add more rows)
- axis=1: stack horizontally (add more columns)
Using np.stack()
Joins arrays along a new axis:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result_stack = np.stack((arr1, arr2), axis=0) # Stack along a new 0-axis
print(result_stack)
# Output:
# [[1 2 3]
# [4 5 6]]
Stack creates a new dimension, turning two 1D arrays into one 2D array.
Using np.hstack()
Stacks arrays horizontally (column-wise):
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result_hstack = np.hstack((arr1, arr2))
print(result_hstack) # Output: [1 2 3 4 5 6]
Using np.vstack()
Stacks arrays vertically (row-wise):
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result_vstack = np.vstack((arr1, arr2))
print(result_vstack)
# Output:
# [[1 2 3]
# [4 5 6]]
Using np.dstack()
Stacks arrays depth-wise (along third axis):
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result_dstack = np.dstack((arr1, arr2))
print(result_dstack)
# Output:
# [[[1 4]
# [2 5]
# [3 6]]]
This is useful for combining different channels of data, like RGB color channels in images.
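Since these functions differ mainly in the shape they produce, comparing result shapes for the same two 1D inputs is a quick way to keep them apart. A sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

shapes = {
    "concatenate": np.concatenate((arr1, arr2)).shape,      # (6,)      existing axis
    "stack axis=0": np.stack((arr1, arr2), axis=0).shape,   # (2, 3)    new first axis
    "stack axis=1": np.stack((arr1, arr2), axis=1).shape,   # (3, 2)    new last axis
    "hstack": np.hstack((arr1, arr2)).shape,                # (6,)
    "vstack": np.vstack((arr1, arr2)).shape,                # (2, 3)
    "dstack": np.dstack((arr1, arr2)).shape,                # (1, 3, 2)
}
for name, shape in shapes.items():
    print(f"{name:>13}: {shape}")
```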
Splitting Arrays
Using np.split()
Splits an array into multiple sub-arrays:
arr = np.array([10, 20, 30, 40, 50, 60])
new_arrays = np.split(arr, 3) # Split into 3 equal parts
print(new_arrays) # Output: [array([10, 20]), array([30, 40]), array([50, 60])]
# Splitting at specific indices
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
new_arrays_indices = np.split(arr, [2, 5]) # Split before index 2 and before index 5
print(new_arrays_indices) # Output: [array([1, 2]), array([3, 4, 5]), array([6, 7, 8])]
You can split evenly (by specifying number of sections) or at specific positions (by providing a list of indices).
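When you pass a number of sections, `np.split()` requires the array to divide evenly and raises a `ValueError` otherwise. `np.array_split()` accepts the same arguments but allows unequal pieces, making the earlier pieces one element larger. A sketch with 7 elements:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])  # 7 elements do not divide into 3 equal parts

split_failed = False
try:
    np.split(arr, 3)
except ValueError:
    split_failed = True
print(split_failed)  # True

parts = np.array_split(arr, 3)  # allows unequal sizes
print(parts)  # [array([1, 2, 3]), array([4, 5]), array([6, 7])]
```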
Other Split Functions
- np.hsplit(): splits an array horizontally (column-wise)
- np.vsplit(): splits an array vertically (row-wise)
- np.dsplit(): splits an array depth-wise (along the third axis)
These specialized functions make code more readable when working with specific dimensions.
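For a 2D array, `np.hsplit()` divides the columns and `np.vsplit()` divides the rows. A small sketch on a 3×4 grid built with `np.arange()` and `reshape()`:

```python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

left, right = np.hsplit(grid, 2)          # split the 4 columns into two groups of 2
top, middle, bottom = np.vsplit(grid, 3)  # split the 3 rows into three groups of 1

print(left.shape, right.shape)  # (3, 2) (3, 2)
print(top)                      # [[1 2 3 4]]
```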
Search, Sort, and Filter Functions
The Data Finding Problem
When working with large arrays, you often need to:
- Find where specific values are located
- Organize data in order
- Extract elements that meet certain conditions
Doing these operations manually with loops is slow and cumbersome.
Searching with np.where()
Finds indices where a condition is met:
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x) # Output: (array([3, 5, 6]),) (indices where value is 4)
# Find even numbers
even_indices = np.where(arr % 2 == 0)
print(even_indices) # Output: (array([1, 3, 5, 6]),)
This returns a tuple containing an array of indices. You can use any condition—equality, greater than, less than, or more complex conditions.
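`np.where()` also has a three-argument form: `np.where(condition, x, y)` builds a new array that takes values from `x` where the condition is True and from `y` where it is False. Taking `[0]` of the one-argument result gives the index array directly. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

indices = np.where(arr == 4)[0]   # the index array itself, without the tuple
labels = np.where(arr % 2 == 0, "even", "odd")
capped = np.where(arr > 3, 3, arr)  # replace values above 3 with 3

print(indices)  # [3 5 6]
print(labels)   # ['odd' 'even' 'odd' 'even' 'odd' 'even' 'even']
print(capped)   # [1 2 3 3 3 3 3]
```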
Sorting with np.sort()
Returns a sorted copy of an array:
arr = np.array([3, 2, 0, 1])
sorted_arr = np.sort(arr)
print(sorted_arr) # Output: [0 1 2 3]
arr_2d = np.array([[3, 2, 1], [6, 5, 4]])
sorted_rows = np.sort(arr_2d) # Sorts each row
print(sorted_rows)
# Output:
# [[1 2 3]
#  [4 5 6]]
sorted_cols = np.sort(arr_2d, axis=0) # Sorts each column
print(sorted_cols)
# Output:
# [[3 2 1]
#  [6 5 4]] # (Every column was already in ascending order, so nothing moves)
By default, np.sort() sorts along the last axis, which for a 2D array means within each row. Use axis=0 to sort each column instead.
Search Sorted with np.searchsorted()
Finds where a value should be inserted to maintain sorted order:
arr = np.array([1, 3, 5, 7])
i = np.searchsorted(arr, 4)
print(i) # Output: 2 (4 would be inserted at index 2 to maintain order)
This is useful for efficiently finding insertion points in sorted arrays.
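`np.searchsorted()` also accepts several values at once, and its `side` parameter decides what happens when the value is already present: `"left"` (the default) returns the position before the existing entry and `"right"` the position after it. Combined with `np.insert()`, it keeps a sorted array sorted. The input must already be sorted in ascending order.

```python
import numpy as np

arr = np.array([1, 3, 5, 7])

# Several values at once
many = np.searchsorted(arr, [0, 4, 8])
print(many)  # [0 2 4]

# For a value already present, side chooses the left or right insertion point
left_pos = np.searchsorted(arr, 3)                 # 1
right_pos = np.searchsorted(arr, 3, side="right")  # 2

# Inserting at the found position keeps the array sorted
pos = np.searchsorted(arr, 4)
merged = np.insert(arr, pos, 4)
print(merged)  # [1 3 4 5 7]
```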
Filtering with Boolean Indexing
Selecting elements based on conditions:
arr = np.array([41, 42, 43, 44])
filter_arr = arr > 42 # Creates a boolean array
new_arr = arr[filter_arr]
print(filter_arr) # Output: [False False True True]
print(new_arr) # Output: [43 44]
The condition creates a boolean array (True where condition is met, False otherwise). Using this boolean array as an index extracts only the True elements.
This is incredibly powerful for data analysis—you can filter data based on any condition you can express.
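To combine conditions, use the element-wise operators `&` (and), `|` (or), and `~` (not), and wrap each comparison in parentheses because these operators bind more tightly than `>` or `<`. Python's `and`/`or` keywords don't work here; they raise a `ValueError` on arrays with more than one element. A sketch with made-up values:

```python
import numpy as np

arr = np.array([41, 42, 43, 44, 45])

in_range = arr[(arr > 41) & (arr < 45)]  # both conditions
outside = arr[(arr < 42) | (arr > 44)]   # either condition
not_even = arr[~(arr % 2 == 0)]          # negate a condition

print(in_range)  # [42 43 44]
print(outside)   # [41 45]
print(not_even)  # [41 43 45]
```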
Shuffle, Unique, Resize, Flatten, and Ravel Functions
These functions help you manipulate and reorganize array data in various ways.
Shuffle with np.random.shuffle()
Randomly rearranges array elements in-place:
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print(arr) # Output: (randomly shuffled array, e.g., [3 5 1 4 2])
This is useful for randomizing data order, such as shuffling a dataset before training a machine learning model. Note that this modifies the original array.
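If you need to keep the original order, a `Generator` from `np.random.default_rng()` offers `permutation()`, which returns a shuffled copy, alongside its own in-place `shuffle()`. The seed 0 below is arbitrary and only makes the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
arr = np.array([1, 2, 3, 4, 5])

shuffled = rng.permutation(arr)  # returns a shuffled copy
original_unchanged = arr.tolist() == [1, 2, 3, 4, 5]
print(original_unchanged)        # True: arr itself was not touched

rng.shuffle(arr)                 # shuffles in place, like np.random.shuffle
print(sorted(arr.tolist()))      # [1, 2, 3, 4, 5]  (same elements, new order)
```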
Unique with np.unique()
Returns the unique elements of an array (removes duplicates):
arr = np.array([1, 1, 2, 3, 2, 4, 5, 5])
unique_arr = np.unique(arr)
print(unique_arr) # Output: [1 2 3 4 5]
The result is sorted. This is extremely useful for finding distinct values in your data.
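With `return_counts=True`, `np.unique()` also reports how often each distinct value appears, which is a quick way to build a frequency table:

```python
import numpy as np

arr = np.array([1, 1, 2, 3, 2, 4, 5, 5])

values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 3 4 5]
print(counts)  # [2 2 1 1 2]

# Pair each distinct value with how often it appears
frequency = dict(zip(values.tolist(), counts.tolist()))
print(frequency)  # {1: 2, 2: 2, 3: 1, 4: 1, 5: 2}
```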
Resize with np.resize()
Returns a new array with the specified shape. If the new size is larger, it repeats the original array:
arr = np.array([1, 2, 3])
resized_arr = np.resize(arr, (2, 3)) # Resize to 2x3
print(resized_arr)
# Output:
# [[1 2 3]
# [1 2 3]]
Unlike reshape(), resize() can change the total number of elements by repeating values.
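The repetition does not have to come out even: `np.resize()` fills the new shape element by element and cuts off wherever it ends, and it can also shrink an array. (The array method `arr.resize()` behaves differently: it changes the array in place and fills new positions with zeros instead of repeating.) A sketch of the function version:

```python
import numpy as np

arr = np.array([1, 2, 3])

longer = np.resize(arr, 5)     # repeats the data, then cuts off: [1 2 3 1 2]
shorter = np.resize(arr, 2)    # keeps only the first elements: [1 2]
grid = np.resize(arr, (2, 2))  # fills row by row, repeating as needed
print(longer)
print(shorter)
print(grid)
# [[1 2]
#  [3 1]]
```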
Flatten with .flatten()
Returns a copy of the array collapsed into one dimension:
arr_2d = np.array([[1, 2], [3, 4]])
flattened_arr = arr_2d.flatten()
print(flattened_arr) # Output: [1 2 3 4]
This creates a 1D version of any multi-dimensional array, and it always copies the data.
Ravel with .ravel()
Returns a flattened view of the array (when possible):
arr_2d = np.array([[1, 2], [3, 4]])
raveled_arr = arr_2d.ravel()
print(raveled_arr) # Output: [1 2 3 4]
Key difference: flatten() always returns a copy, while ravel() returns a view when possible. This makes ravel() more memory-efficient, but changes to the raveled array might affect the original.
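The sketch below makes that difference visible: writing to the `flatten()` result leaves the original alone, while writing to the `ravel()` result changes it. It also shows a case where `ravel()` must copy after all, because a transposed array is not laid out row by row in memory.

```python
import numpy as np

arr_2d = np.array([[1, 2], [3, 4]])

flat_copy = arr_2d.flatten()
flat_view = arr_2d.ravel()  # a view here because arr_2d is stored row by row

flat_copy[0] = 100  # does not touch arr_2d
flat_view[1] = 200  # writes through to arr_2d

print(arr_2d)
# [[  1 200]
#  [  3   4]]
print(np.shares_memory(arr_2d, flat_view))  # True

# The transpose is not stored row by row, so ravel() has to return a copy
transposed_flat = arr_2d.T.ravel()
print(np.shares_memory(arr_2d, transposed_flat))  # False
```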
Insert and Delete Functions
The Modification Problem
Sometimes you need to add elements to an array at specific positions, or remove elements. With regular Python lists, you can use .insert() and .remove(), but NumPy arrays are fixed in size.
How does NumPy handle this? It creates new arrays with the modifications. The original array remains unchanged.
Insert with np.insert()
Inserts values before given indices:
arr = np.array([1, 2, 3, 4])
new_arr = np.insert(arr, 2, 99) # Insert 99 at index 2
print(new_arr) # Output: [ 1 2 99 3 4]
# Insert multiple values
new_arr_multiple = np.insert(arr, [1, 3], [10, 20])
print(new_arr_multiple) # Output: [ 1 10 2 3 20 4]
The first argument is the array, the second is the index (or indices), and the third is the value(s) to insert.
Delete with np.delete()
Returns a new array with specified elements removed:
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.delete(arr, 2) # Delete element at index 2
print(new_arr) # Output: [1 2 4 5]
# Delete multiple elements
new_arr_multiple = np.delete(arr, [0, 4]) # Delete elements at index 0 and 4
print(new_arr_multiple) # Output: [2 3 4]
Both functions return new arrays—the original array is never modified.
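On multi-dimensional arrays, both functions take an `axis` argument. Without it, the array is flattened first, which is rarely what you want for a table. A sketch on a small hypothetical `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

no_axis = np.delete(grid, 1)              # without axis, the array is flattened first
print(no_axis)                            # [1 3 4 5 6]

without_row = np.delete(grid, 0, axis=0)  # remove row 0
without_col = np.delete(grid, 1, axis=1)  # remove column 1
with_col = np.insert(grid, 1, 0, axis=1)  # insert a column of 0s before column 1

print(without_row)  # [[4 5 6]]
print(without_col)
# [[1 3]
#  [4 6]]
print(with_col)
# [[1 0 2 3]
#  [4 0 5 6]]
```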
NumPy Matrix
Understanding Matrices
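A matrix is a 2D array used extensively in mathematics, especially linear algebra. NumPy provides an np.matrix class with some convenient features, such as * meaning matrix multiplication. Note, however, that the NumPy documentation no longer recommends np.matrix, even for linear algebra, and says it may be removed in the future; regular 2D arrays with the @ operator are the preferred choice for new code.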
Matrix Creation
matrix = np.matrix([[1, 2], [3, 4]])
print(matrix)
# Output:
# [[1 2]
# [3 4]]
Matrix Multiplication
In linear algebra, matrix multiplication is different from element-wise multiplication. Each entry of the result is the dot product of a row of the first matrix with a column of the second. With np.matrix objects, the * operator performs this product:
mat1 = np.matrix([[1, 2], [3, 4]])
mat2 = np.matrix([[5, 6], [7, 8]])
result_mult = mat1 * mat2 # Standard matrix multiplication
print(result_mult)
# Output:
# [[19 22]
# [43 50]]
How is this calculated?
- First row, first column: (1×5) + (2×7) = 5 + 14 = 19
- First row, second column: (1×6) + (2×8) = 6 + 16 = 22
- Second row, first column: (3×5) + (4×7) = 15 + 28 = 43
- Second row, second column: (3×6) + (4×8) = 18 + 32 = 50
This is the fundamental operation of linear algebra, used everywhere from computer graphics to machine learning.
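The same product can be computed with regular 2D arrays using the `@` operator (or `np.matmul()`). Keep in mind that for regular arrays, `*` means element-wise multiplication, unlike with `np.matrix`. A sketch using the matrices from the example:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

product = a @ b  # matrix product
print(product)
# [[19 22]
#  [43 50]]
print(np.array_equal(product, np.matmul(a, b)))  # True

elementwise = a * b  # element-wise product for regular arrays
print(elementwise)
# [[ 5 12]
#  [21 32]]
```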
Matrix Functions
Transpose
Transposing swaps rows and columns—the first row becomes the first column, the second row becomes the second column, and so on:
matrix = np.matrix([[1, 2, 3], [4, 5, 6]])
transposed_matrix = matrix.T # Using the .T attribute
print(transposed_matrix)
# Output:
# [[1 4]
# [2 5]
# [3 6]]
transposed_func = np.transpose(matrix) # Using the transpose function
print(transposed_func)
# Output:
# [[1 4]
# [2 5]
# [3 6]]
Both methods produce the same result. Transposing is essential in many mathematical operations, particularly when you need to align dimensions for matrix multiplication.
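Here is that alignment in practice with a regular array: a (2, 3) array cannot be multiplied by itself because the inner dimensions (3 and 2) differ, but multiplying it by its transpose gives (2, 3) @ (3, 2), which yields a (2, 2) result. The name `gram` is just a label for the result.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# a @ a fails: the inner dimensions (3 and 2) do not match
shapes_mismatch = False
try:
    a @ a
except ValueError:
    shapes_mismatch = True

# Transposing one operand lines the dimensions up: (2, 3) @ (3, 2) -> (2, 2)
gram = a @ a.T
print(gram)
# [[14 32]
#  [32 77]]
```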
Conclusion
You’ve now learned the fundamentals of NumPy, Python’s essential library for numerical computing. You’ve discovered:
- Why NumPy arrays are faster and more efficient than Python lists
- How to create arrays in multiple ways
- How to work with different data types and shapes
- How to perform mathematical operations efficiently
- How broadcasting enables operations on arrays of different shapes
- How to access, modify, and reorganize array data
- How to work with matrices for linear algebra
NumPy is the foundation of the scientific Python ecosystem. Pandas, SciPy, and scikit-learn are built on NumPy arrays, and deep learning frameworks such as TensorFlow and PyTorch convert to and from them easily. By mastering NumPy, you've taken your first major step into data science, machine learning, and scientific computing.
The best way to truly learn NumPy is through practice. Try creating your own arrays, performing calculations, and exploring the patterns in your data. Experiment with different functions and see what they do. As you work on real projects, you’ll find NumPy becomes an indispensable tool in your programming toolkit.
Happy coding!