Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

bash - unix: merge files based on column value

I have two files, that look like this:

File 1 (2 columns):

ID1 123
ID2 234
ID3 232
ID4 344
...

File 2 (>1 million columns):

ID2 A C ...
ID3 G T ...
ID1 C T ...
ID4 A C ... 
...

I want to insert the value from column 2 of file 1 into file 2 as a new second column, matching rows by ID. So the merged file should look like this:

ID2 234 A C ...
ID3 232 G T ...
ID1 123 C T ...
ID4 344 A C ... 
...

So exactly the same as file 2 (same order of rows), but with the added 2nd column. The IDs are the values of the first column (present in both files). File 1 has more rows/IDs than file 2. All IDs from file 2 are in file 1, but not all IDs from file 1 are in file 2.

Does anyone know how to do this under unix/bash? Many thanks!

1 Answer

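join merges two files on their shared first field; both inputs must be sorted on that field, so the output comes out in ID order:

$ join <(sort file1) <(sort file2)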
ID1 123 C T ...
ID2 234 A C ...
ID3 232 G T ...
ID4 344 A C ...
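
One caveat with the command above: join expects its inputs in the same collation order that sort produced, and it silently drops rows it cannot pair. The following is a minimal sketch, not part of the original answer, that pins the locale to C for both tools and adds a check for unmatched IDs. The function names join_by_id and missing_ids are made up for illustration, and it assumes whitespace-separated files with the ID in column 1 and no leading blanks.

```shell
#!/bin/sh
# join_by_id FILE1 FILE2 (hypothetical helper)
# Joins both files on column 1. Output is ordered by ID, not by FILE2's order.
join_by_id() {
    tmp=$(mktemp) || return 1
    # Sort on the join field only, in the C locale, so sort and join agree.
    LC_ALL=C sort -k 1b,1 "$1" > "$tmp"
    LC_ALL=C sort -k 1b,1 "$2" | LC_ALL=C join "$tmp" -
    rm -f "$tmp"
}

# missing_ids FILE1 FILE2 (hypothetical helper)
# Prints the rows of FILE2 whose ID has no match in FILE1 (join -v 2).
# Empty output means every FILE2 row will survive the join.
missing_ids() {
    tmp=$(mktemp) || return 1
    LC_ALL=C sort -k 1b,1 "$1" > "$tmp"
    LC_ALL=C sort -k 1b,1 "$2" | LC_ALL=C join -v 2 "$tmp" -
    rm -f "$tmp"
}
```

Running missing_ids file1 file2 first is a cheap way to confirm the assumption from the question that every ID in file2 also appears in file1.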

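If you want to keep the row order of file2, number its lines first, join on the ID, sort back by line number, and then cut the line-number column out: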

$ join -1 1 -2 2 <(sort file1) <(cat -n file2 | sort -k2,2) | sort -k3,3n | cut -d' ' -f1-2,4-
ID2 234 A C ...
ID3 232 G T ...
ID1 123 C T ...
ID4 344 A C ...
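
An alternative that avoids sorting altogether is a two-file awk lookup, which preserves the order of file2 by construction. This is a sketch of a different technique, not the answer's method: it loads file1 into an in-memory array and streams file2 once. The function name merge_keep_order is hypothetical, and the sketch assumes whitespace-separated fields, a non-empty file1, and that collapsing runs of blanks in file2 to single spaces is acceptable (assigning to $1 makes awk rebuild the line with single-space separators).

```shell
#!/bin/sh
# merge_keep_order FILE1 FILE2 (hypothetical helper)
# Prints FILE2 with FILE1's column 2 inserted as the new second column.
merge_keep_order() {
    awk '
        # First file (NR == FNR only while reading it): remember ID -> value.
        NR == FNR { val[$1] = $2; next }
        # Second file: splice the value in after the ID and print the row.
        ($1 in val) { $1 = $1 OFS val[$1]; print; next }
        # IDs without a match are reported on stderr rather than dropped silently.
        { print "no value for ID " $1 > "/dev/stderr" }
    ' "$1" "$2"
}
```

Called as merge_keep_order file1 file2 > merged, this makes a single pass over the wide file. Since file1 has only two columns, holding it in memory is usually cheap even when it has many rows.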
