Welcome to Vigges Developer Community: Open, Learning, Share
Welcome To Ask or Share your Answers For Others


pandas - How do I change rows and columns in a dask dataframe?

I am having a few issues with Dask DataFrames.

Let's say I have a dataframe with two columns, ['a', 'b'].

If I want a new column c = a + b,

in pandas I would do:

df['c'] = df['a'] + df['b']

In dask I am doing the same operation as follows:

df = df.assign(c=(df.a + df.b).compute())

Is it possible to write this operation in a better way, similar to what we do in pandas?

My second question is the one troubling me more.

In pandas, if I want to change the value of 'a' for rows 2 and 6 to np.pi, I do the following:

df.loc[[2,6],'a']  = np.pi

I have not been able to figure out how to do a similar operation in Dask. My logic selects some rows and I only want to change values in those rows.


1 Answer


Edit: Add new columns

Setitem syntax now works in dask.dataframe:

df['z'] = df.x + df.y

Old answer: Add new columns

You're correct that the setitem syntax doesn't work in dask.dataframe.

df['c'] = ... # mutation not supported

As you suggest, you should instead use .assign(...):

df = df.assign(c=df.a + df.b)

In your example you have an unnecessary call to .compute(). Generally you want to call compute only at the very end, once you have your final result.

Change rows

As before, dask.dataframe does not support changing rows in place. In-place operations are difficult to reason about in parallel code. At the moment dask.dataframe has no nice alternative operation for this case. I've raised issue #653 for conversation on this topic.

