awk - compare two files and print all columns from both files
I want to compare two files.
File 1:
evm.TU.PTPU-T1. PF00808
evm.TU.PTP-T1 PF00498
evm.TU.PTPX-T1 PF00250
evm.TU.PAN-T1 PF00817

File 2:
PF00808 CL0012 Histone CBFD_NFYB_HMF Histone-like transcription factor
PF00498 CL0357 SMAD-FHA FHA FHA domain
PF00817 CL0123 HTH Forkhead Forkhead domain

Output:
evm.TU.PTPU-T1 PF00808 CL0012 Histone CBFD_NFYB_HMF Histone-like
evm.TU.PTP-T1 PF00498 CL0357 SMAD-FHA FHA FHA domain
evm.TU.PAN-T1 PF00817 CL0123 HTH Forkhead Forkhead domain

I tried the command below:
awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $0,a[$1]}' file2 file1 >file3

but it prints only the second column of file 2, not the entire line:
PF00808 evm.TU.PTPU-T1 CL0012

Please let me know how to add the entire matched line of file 2 to the output, not just the second column.
1 Answer
You have a couple of options here:

- Save the whole lines ($0) of File2 into an array keyed on its $1; then look up each line of File1 by the key in its $2:

    $ awk 'NR==FNR{a[$1]=$0; next} ($2 in a){print $1,a[$2]}' File2 File1
    evm.TU.PTPU-T1. PF00808 CL0012 Histone CBFD_NFYB_HMF Histone-like transcription factor
    evm.TU.PTP-T1 PF00498 CL0357 SMAD-FHA FHA FHA domain
    evm.TU.PAN-T1 PF00817 CL0123 HTH Forkhead Forkhead domain

- Save the $1 values of File1 keyed on its $2; then look up the corresponding whole lines of File2 by the key in its $1:

    $ awk 'NR==FNR{a[$2]=$1; next} ($1 in a){print a[$1], $0}' File1 File2
    evm.TU.PTPU-T1. PF00808 CL0012 Histone CBFD_NFYB_HMF Histone-like transcription factor
    evm.TU.PTP-T1 PF00498 CL0357 SMAD-FHA FHA FHA domain
    evm.TU.PAN-T1 PF00817 CL0123 HTH Forkhead Forkhead domain
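
To try the lookup end to end, here is a minimal sketch of the first approach (whole lines of File2 stored by their $1) run against the sample data from the question. The scratch directory, the file names File1 and File2, and the `result` variable are assumptions made for illustration only.

```shell
#!/bin/sh
# Recreate the sample inputs from the question in a scratch directory.
# The directory and the names File1/File2 are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > File1 <<'EOF'
evm.TU.PTPU-T1. PF00808
evm.TU.PTP-T1 PF00498
evm.TU.PTPX-T1 PF00250
evm.TU.PAN-T1 PF00817
EOF

cat > File2 <<'EOF'
PF00808 CL0012 Histone CBFD_NFYB_HMF Histone-like transcription factor
PF00498 CL0357 SMAD-FHA FHA FHA domain
PF00817 CL0123 HTH Forkhead Forkhead domain
EOF

# First file (File2): NR==FNR holds only while awk reads the first file,
# so each whole line is stored under its first column and `next` skips
# the second rule.
# Second file (File1): when its second column is a stored key, print the
# first column followed by the stored File2 line.
result=$(awk 'NR==FNR { a[$1] = $0; next }
              ($2 in a) { print $1, a[$2] }' File2 File1)

printf '%s\n' "$result"
```

The PF00250 line of File1 has no counterpart in File2, so it is left out of the output. Both approaches hold one file in memory, so it can be worth reading the smaller file first.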