I need your help. I have a FASTA file that stores many genes of different lengths.
Here is an example:
'>ENSMUSG00000031109|X|49009707|49288259
ATGACGCTGCCTGTGTCTGATCCAGCTGCATGGGCCACAGCAATGAATAATCTTGGAATG
GCTCCACTGGGAATTGCTGGACAACCAATTTTACCTGACTTCGATCCTGCCCTTGGGATG
ATGACTGGAATACCACCAATAACTCCCATGATGCCGGGTTTGGGCATAGTCCCGCCACCG
ATTCCTCCAGATATGCCGGTAGCAAAGGAGATCATACACTGCAAAAGCTGCACGCTCTTC
CCTCCCAACCCAAATCTTCCACCACCTGCAACACGAGAAAGGCCACCAGGCTGTAAGACA
GTGTTTGTGGGTGGCCTGCCTGAAAATGGGACAGAGCAGATCATTGTGGAAGTGTTTGAA
CAGTGTGGAGAGATTATTGCTATCCGGAAGAGCAAAAAGAACTTCTGTCACATTCGCTTT
AACTTCACAAAAGCACAACGTAAAAACATCAGTGTTTGGTGCAAACAAGCTGAGGAAATT'
I want to store these genes in a dictionary, with the title line as the key and the sequence that follows it as the value. I was thinking of using a regex for the key, since all of the titles begin with >. Does anybody have an idea or tip on how best to do this?
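To make the goal concrete, here is a rough Python sketch of the kind of result I am after. It skips regex and just checks whether a line starts with >. The function name read_fasta and the second record hypothetical_gene_2 are made up for illustration, and I am assuming the file is small enough to hold in memory and that every title is unique.

```python
def read_fasta(lines):
    """Collect FASTA records into a dict mapping title -> sequence."""
    records = {}
    header = None   # title of the record currently being read
    chunks = []     # sequence lines collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new title closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the last record, which has no following title to close it.
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Shortened version of the example above plus one invented record.
example = """>ENSMUSG00000031109|X|49009707|49288259
ATGACGCTGCCTGTGTCTGATCCAGCTGCATGGGCCACAGCAATGAATAATCTTGGAATG
GCTCCACTGGGAATTGCTGGACAACCAATTTTACCTGACTTCGATCCTGCCCTTGGGATG
>hypothetical_gene_2
ACGT
"""
genes = read_fasta(example.splitlines())
```

For a real file, passing the open file object should work the same way, for example read_fasta(open(path)) with path pointing at the FASTA file, since the function only iterates over lines.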