When using a new library such as numpy, you can improve you developer effectiveness by being familiar with the most common questions that come up when using the python numpy library. Using the Stack Overflow Data Explorer tool, we’ve determined the top 10 most popular numpy questions & answers by daily views on Stack Overflow to to be familiar with. Check out the top 10 numpy questions & answers below:

#### Looking to get a head start on your next software interview? Pickup a copy of the best book to prepare: Cracking The Coding Interview!

## 1. Unable to allocate array with shape and data type?

This is likely due to your system’s overcommit handling mode.

In the default mode, `0`

,

Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. The root is allowed to allocate slightly more memory in this mode. This is the default.

The exact heuristic used is not well explained here, but this is discussed more on Linux over commit heuristic and on this page.

You can check your current overcommit mode by running

```
$ cat /proc/sys/vm/overcommit_memory
0
```

In this case, you’re allocating

```
>>> 156816 * 36 * 53806 / 1024.0**3
282.8939827680588
```

~282 GB and the kernel is saying well obviously there’s no way I’m going to be able to commit that many physical pages to this, and it refuses the allocation.

If (as root) you run:

```
$ echo 1 > /proc/sys/vm/overcommit_memory
```

This will enable the "always overcommit" mode, and you’ll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing at least).

I tested this myself on a machine with 32 GB of RAM. With overcommit mode `0`

I also got a `MemoryError`

, but after changing it back to `1`

it works:

```
>>> import numpy as np
>>> a = np.zeros((156816, 36, 53806), dtype='uint8')
>>> a.nbytes
303755101056
```

You can then go ahead and write to any location within the array, and the system will only allocate physical pages when you explicitly write to that page. So you can use this, with care, for sparse arrays.

## 2. Valueerror: numpy.ndarray size changed, may indicate binary incompatibility. expected 88 from c header, got 80 from pyobject?

I’m in Python 3.8.5. It sounds too simple to be real, but I had this same issue and all I did was reinstall numpy. Gone.

```
pip install --upgrade numpy
```

or

```
pip uninstall numpy
pip install numpy
```

## 3. Convert pandas dataframe to numpy array?

`df.to_numpy()`

is better than `df.values`

, here’s why.^{*}

It’s time to deprecate your usage of `values`

and `as_matrix()`

.

pandas `v0.24.0`

introduced two new methods for obtaining NumPy arrays from pandas objects:

, which is defined on`to_numpy()`

`Index`

,`Series`

, and`DataFrame`

objects, and, which is defined on`array`

`Index`

and`Series`

objects only.

If you visit the v0.24 docs for `.values`

, you will see a big red warning that says:

## Warning: We recommend using

`DataFrame.to_numpy()`

instead.

See this section of the v0.24.0 release notes, and this answer for more information.

_{* – to_numpy() is my recommended method for any production code that needs to run reliably for many versions into the future. However if you’re just making a scratchpad in jupyter or the terminal, using .values to save a few milliseconds of typing is a permissable exception. You can always add the fit n finish later.}

**Towards Better Consistency: **`to_numpy()`

`to_numpy()`

In the spirit of better consistency throughout the API, a new method `to_numpy`

has been introduced to extract the underlying NumPy array from DataFrames.

```
# Setup
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
index=['a', 'b', 'c'])
# Convert the entire DataFrame
df.to_numpy()
# array([[1, 4, 7],
# [2, 5, 8],
# [3, 6, 9]])
# Convert specific columns
df[['A', 'C']].to_numpy()
# array([[1, 7],
# [2, 8],
# [3, 9]])
```

As mentioned above, this method is also defined on `Index`

and `Series`

objects (see here).

```
df.index.to_numpy()
# array(['a', 'b', 'c'], dtype=object)
df['A'].to_numpy()
# array([1, 2, 3])
```

By default, a view is returned, so any modifications made will affect the original.

```
v = df.to_numpy()
v[0, 0] = -1
df
A B C
a -1 4 7
b 2 5 8
c 3 6 9
```

If you need a copy instead, use `to_numpy(copy=True)`

.

### pandas >= 1.0 update for ExtensionTypes

If you’re using pandas 1.x, chances are you’ll be dealing with extension types a lot more. You’ll have to be a little more careful that these extension types are correctly converted.

```
a = pd.array([1, 2, None], dtype="Int64")
a
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
# Wrong
a.to_numpy()
# array([1, 2, <NA>], dtype=object) # yuck, objects
# Correct
a.to_numpy(dtype='float', na_value=np.nan)
# array([ 1., 2., nan])
# Also correct
a.to_numpy(dtype='int', na_value=-1)
# array([ 1, 2, -1])
```

This is called out in the docs.

### If you need the `dtypes`

in the result…

As shown in another answer, `DataFrame.to_records`

is a good way to do this.

```
df.to_records()
# rec.array([('a', 1, 4, 7), ('b', 2, 5, 8), ('c', 3, 6, 9)],
# dtype=[('index', 'O'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])
```

This cannot be done with `to_numpy`

, unfortunately. However, as an alternative, you can use `np.rec.fromrecords`

:

```
v = df.reset_index()
np.rec.fromrecords(v, names=v.columns.tolist())
# rec.array([('a', 1, 4, 7), ('b', 2, 5, 8), ('c', 3, 6, 9)],
# dtype=[('index', '<U1'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])
```

Performance wise, it’s nearly the same (actually, using `rec.fromrecords`

is a bit faster).

```
df2 = pd.concat([df] * 10000)
%timeit df2.to_records()
%%timeit
v = df2.reset_index()
np.rec.fromrecords(v, names=v.columns.tolist())
12.9 ms Â± 511 Âµs per loop (mean Â± std. dev. of 7 runs, 100 loops each)
9.56 ms Â± 291 Âµs per loop (mean Â± std. dev. of 7 runs, 100 loops each)
```

**Rationale for Adding a New Method**

`to_numpy()`

(in addition to `array`

) was added as a result of discussions under two GitHub issues GH19954 and GH23623.

Specifically, the docs mention the rationale:

[…] with

`.values`

it was unclear whether the returned value would be the

actual array, some transformation of it, or one of pandas custom

arrays (like`Categorical`

). For example, with`PeriodIndex`

,`.values`

generates a new`ndarray`

of period objects each time. […]

`to_numpy`

aims to improve the consistency of the API, which is a major step in the right direction. `.values`

will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate towards the newer API, as soon as you can.

**Critique of Other Solutions**

`DataFrame.values`

has inconsistent behaviour, as already noted.

`DataFrame.get_values()`

is simply a wrapper around `DataFrame.values`

, so everything said above applies.

`DataFrame.as_matrix()`

is deprecated now, do **NOT** use!

## 4. Creating a pandas dataframe from a numpy array: how do i specify the index column and column headers?

You need to specify `data`

, `index`

and `columns`

to `DataFrame`

constructor, as in:

```
>>> pd.DataFrame(data=data[1:,1:], # values
... index=data[1:,0], # 1st column as index
... columns=data[0,1:]) # 1st row as the column names
```

**edit**: as in the @joris comment, you may need to change above to `np.int_(data[1:,1:])`

to have correct data type.

## 5. Most efficient way to map function over numpy array?

I’ve tested all suggested methods plus `np.array(map(f, x))`

with `perfplot`

(a small project of mine).

Message #1: If you can use numpy’s native functions, do that.

If the function you’re trying to vectorize already *is* vectorized (like the `x**2`

example in the original post), using that is *much* faster than anything else (note the log scale):

If you actually need vectorization, it doesn’t really matter much which variant you use.

Code to reproduce the plots:

```
import numpy as np
import perfplot
import math
def f(x):
# return math.sqrt(x)
return np.sqrt(x)
vf = np.vectorize(f)
def array_for(x):
return np.array([f(xi) for xi in x])
def array_map(x):
return np.array(list(map(f, x)))
def fromiter(x):
return np.fromiter((f(xi) for xi in x), x.dtype)
def vectorize(x):
return np.vectorize(f)(x)
def vectorize_without_init(x):
return vf(x)
perfplot.show(
setup=np.random.rand,
n_range=[2 ** k for k in range(20)],
kernels=[f, array_for, array_map, fromiter,
vectorize, vectorize_without_init],
xlabel="len(x)",
)
```

## 6. Pandas create new column based on values from other columns / apply a function of multiple columns, row-wise?

OK, two steps to this – first is to write a function that does the translation you want – I’ve put an example together based on your pseudo-code:

```
def label_race (row):
if row['eri_hispanic'] == 1 :
return 'Hispanic'
if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1 :
return 'Two Or More'
if row['eri_nat_amer'] == 1 :
return 'A/I AK Native'
if row['eri_asian'] == 1:
return 'Asian'
if row['eri_afr_amer'] == 1:
return 'Black/AA'
if row['eri_hawaiian'] == 1:
return 'Haw/Pac Isl.'
if row['eri_white'] == 1:
return 'White'
return 'Other'
```

You may want to go over this, but it seems to do the trick – notice that the parameter going into the function is considered to be a Series object labelled “row”.

Next, use the apply function in pandas to apply the function – e.g.

```
df.apply (lambda row: label_race(row), axis=1)
```

Note the axis=1 specifier, that means that the application is done at a row, rather than a column level. The results are here:

```
0 White
1 Hispanic
2 White
3 White
4 Other
5 White
6 Two Or More
7 White
8 Haw/Pac Isl.
9 White
```

If you’re happy with those results, then run it again, saving the results into a new column in your original dataframe.

```
df['race_label'] = df.apply (lambda row: label_race(row), axis=1)
```

The resultant dataframe looks like this (scroll to the right to see the new column):

```
lname fname rno_cd eri_afr_amer eri_asian eri_hawaiian eri_hispanic eri_nat_amer eri_white rno_defined race_label
0 MOST JEFF E 0 0 0 0 0 1 White White
1 CRUISE TOM E 0 0 0 1 0 0 White Hispanic
2 DEPP JOHNNY NaN 0 0 0 0 0 1 Unknown White
3 DICAP LEO NaN 0 0 0 0 0 1 Unknown White
4 BRANDO MARLON E 0 0 0 0 0 0 White Other
5 HANKS TOM NaN 0 0 0 0 0 1 Unknown White
6 DENIRO ROBERT E 0 1 0 0 0 1 White Two Or More
7 PACINO AL E 0 0 0 0 0 1 White White
8 WILLIAMS ROBIN E 0 0 1 0 0 0 White Haw/Pac Isl.
9 EASTWOOD CLINT E 0 0 0 0 0 1 White White
```

## 7. How to count the occurrence of certain item in an ndarray?

```
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
unique, counts = numpy.unique(a, return_counts=True)
dict(zip(unique, counts))
# {0: 7, 1: 4, 2: 1, 3: 2, 4: 1}
```

**Non-numpy way**:

Use `collections.Counter`

;

```
import collections, numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
collections.Counter(a)
# Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})
```

## 8. Valueerror: setting an array element with a sequence?

From the code you showed us, the only thing we can tell is that you are trying to create an array from a list that isn’t shaped like a multi-dimensional array. For example

```
numpy.array([[1,2], [2, 3, 4]])
```

or

```
numpy.array([[1,2], [2, [3, 4]]])
```

will yield this error message, because the shape of the input list isn’t a (generalised) “box” that can be turned into a multidimensional array. So probably `UnFilteredDuringExSummaryOfMeansArray`

contains sequences of different lengths.

**Edit**: Another possible cause for this error message is trying to use a string as an element in an array of type `float`

:

```
numpy.array([1.2, "abc"], dtype=float)
```

That is what you are trying according to your edit. If you really want to have a NumPy array containing both strings and floats, you could use the dtype `object`

, which enables the array to hold arbitrary Python objects:

```
numpy.array([1.2, "abc"], dtype=object)
```

Without knowing what your code shall accomplish, I can’t judge if this is what you want.

## 9. Typeerror: only integer scalar arrays can be converted to a scalar index with 1d numpy indices array?

Perhaps the error message is somewhat misleading, but the gist is that `X_train`

is a list, not a numpy array. You cannot use array indexing on it. Make it an array first:

```
out_images = np.array(X_train)[indices.astype(int)]
```

## 10. Valueerror: the truth value of an array with more than one element is ambiguous. use a.any() or a.all()?

`r`

is a numpy (rec)array. So `r["dt"] >= startdate`

is also a (boolean)

array. For numpy arrays the `&`

operation returns the elementwise-and of the two

boolean arrays.

The NumPy developers felt there was no one commonly understood way to evaluate

an array in boolean context: it could mean `True`

if *any* element is

`True`

, or it could mean `True`

if *all* elements are `True`

, or `True`

if the array has non-zero length, just to name three possibilities.

Since different users might have different needs and different assumptions, the

NumPy developers refused to guess and instead decided to raise a ValueError

whenever one tries to evaluate an array in boolean context. Applying `and`

to

two numpy arrays causes the two arrays to be evaluated in boolean context (by

calling `__bool__`

in Python3 or `__nonzero__`

in Python2).

Your original code

```
mask = ((r["dt"] >= startdate) & (r["dt"] <= enddate))
selected = r[mask]
```

looks correct. However, if you do want `and`

, then instead of `a and b`

use `(a-b).any()`

or `(a-b).all()`

.