Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

scala - How to filter duplicate records having multiple key in Spark Dataframe?

I have two dataframes. I want to delete some records in Data Frame-A based on some common column values in Data Frame-B.

For Example: Data Frame-A:

A B C D
1 2 3 4
3 4 5 7
4 7 9 6
2 5 7 9


Data Frame-B:

A B C D
1 2 3 7
2 5 7 4
2 9 8 7


Keys: A,B,C columns

Desired Output:

A B C D
3 4 5 7
4 7 9 6

Any solution for this.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You are looking for left anti-join:

df_a.join(df_b, Seq("A","B","C"), "leftanti").show()
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  3|  4|  5|  7|
|  4|  7|  9|  6|
+---+---+---+---+

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...