Tuesday, 10 September 2013

Using regex to search until desired pattern

I am using the following regex:
orfre = '^(?:...)*?((ATG)(...){%d,}?(?=(TAG|TAA|TGA)))' % (aa)
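Here is the same regex, unchanged, rewritten with re.VERBOSE so each piece can carry a comment. The helper name original_pattern is hypothetical. The comment on group 3 shows the crux of the problem: (...) accepts any triplet, stop codons included.

```python
import re

def original_pattern(aa):
    # The regex from above, unchanged, written in re.VERBOSE mode so that
    # whitespace is ignored and each piece can carry a #-comment.
    return re.compile(r"""
        ^                      # anchor at the start of the string
        (?:...)*?              # lazily skip whole codons, keeping the match in frame 0
        (                      # group 1: the reported ORF, stop codon excluded
            (ATG)              # group 2: the start codon
            (...){%d,}?        # group 3: at least aa more codons, as few as possible;
                               #   any triplet fits here, so TAA/TAG/TGA can be swallowed too
            (?=(TAG|TAA|TGA))  # group 4: a stop codon must follow (lookahead, not consumed)
        )
    """ % aa, re.VERBOSE)
```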
I want to find all sequences that start with ATG followed by triplets
(e.g. TTA, TTC, GTC, etc.) up to the first in-frame stop codon (TAG, TAA
or TGA). However, as my regex is written, it won't actually stop at that
stop codon if aa is large. Instead, it keeps consuming codons, stop codons
included, until it reaches a later stop codon that lets the minimum length
aa be met. I would rather have it search the string only until the first
in-frame stop codon is found. If that match isn't long enough (for a
given aa argument), it should return None.
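One possible approach, offered as a sketch rather than a definitive fix, is to forbid stop codons inside the repeated group with a negative lookahead, (?:(?!TAA|TAG|TGA)...), and then require the stop codon explicitly. Because the body can no longer step over a stop codon, the match always ends at the first in-frame stop, and the {aa,} minimum either holds there or the attempt fails. The names find_orf and STOP are hypothetical, and aa is assumed to count codons between ATG and the stop, as in the original pattern. If the first ORF is too short, the lazy prefix still lets the engine try a later in-frame ATG, so this returns None only when no ORF in frame 0 is long enough.

```python
import re

STOP = "TAA|TAG|TGA"  # the three stop codons

def find_orf(seq, aa):
    """Return the first ORF in reading frame 0 (ATG ... stop, stop included)
    with at least `aa` codons between start and stop, or None."""
    pattern = (
        r"^(?:...)*?"          # lazily skip whole codons (reading frame 0 only)
        r"(ATG"                # group 1 begins at the start codon
        r"(?:(?!%s)...){%d,}"  # at least aa codons, none of which may be a stop codon
        r"(?:%s))"             # the first in-frame stop codon, kept inside group 1
    ) % (STOP, aa, STOP)
    m = re.search(pattern, seq)
    return m.group(1) if m else None
```

With the string from the question, find_orf("AAAATGATGCATTAACCCTAATAA", 2) returns ATGATGCATTAA, while any aa of 3 or more returns None, because only two codons sit between that ATG and the first in-frame TAA.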
String data: AAAATGATGCATTAACCCTAATAA
Desired output from regex: ATGATGCATTAA
Unless aa > 5, in which case nothing should be returned.
Actual output I'm getting: ATGATGCATTAACCCTAA
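Running the pattern from the question on this string for several values of aa shows where the actual output comes from. This sketch assumes the printed result was group 1 followed by the stop codon captured inside the lookahead (group 4); SEQ and original_orf are illustrative names. Under that assumption, ATGATGCATTAACCCTAA is what aa = 3 or aa = 4 produces: the repeated (...) group matches the first in-frame TAA as an ordinary codon in order to reach the minimum length.

```python
import re

SEQ = "AAAATGATGCATTAACCCTAATAA"

def original_orf(seq, aa):
    # The regex from the question, unchanged.
    orfre = '^(?:...)*?((ATG)(...){%d,}?(?=(TAG|TAA|TGA)))' % (aa)
    m = re.search(orfre, seq)
    # Assumption: the printed result is group 1 followed by the stop codon
    # captured inside the lookahead (group 4).
    return m.group(1) + m.group(4) if m else None

for aa in range(7):
    print(aa, original_orf(SEQ, aa))
# 0 ATGATGCATTAA
# 1 ATGATGCATTAA
# 2 ATGATGCATTAA
# 3 ATGATGCATTAACCCTAA
# 4 ATGATGCATTAACCCTAA
# 5 ATGATGCATTAACCCTAATAA
# 6 None
```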