How to remove lines that start with the same characters (but are random) in Python?
I am trying to remove lines in a file that start with the same 5 characters. However, the first 5 characters are random (I don't know in advance what they will be).
I have code that reads the last 5 characters of the first line of a file and matches them to the first 5 characters of another line in the file that starts with the same 5 characters. The problem is that when two or more lines start with those same 5 characters, the code messes up. I need something that reads all the lines in the file and removes one of the two lines that have the same first 5 characters.
Example (issue):
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT
What I need as the result after one is taken out of the file:
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
(no third line)
I will greatly appreciate it if you could explain how I could go about this with words as well.
python bioinformatics matching dna-sequence
Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. On topic, how to ask, and ... the perfect question apply here. StackOverflow is not a design, coding, research, or tutorial resource. However, if you follow whatever resources you find online, make an honest coding attempt, and run into a problem, you'd have a good example to post.
– Prune, Nov 15 '18 at 20:12
Hi and welcome to SO. Your posted question does not appear to include any attempt at all to solve the problem. StackOverflow expects you to try to solve your own problem first, as your attempts help us to better understand what you want. Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example. For more information, please see How to Ask and take the Tour.
– quant, Nov 15 '18 at 20:16
Show us the code you wrote so far so we can see how it can be improved.
– Milo Bem, Nov 15 '18 at 20:19
edited Nov 15 '18 at 21:46 by quant
asked Nov 15 '18 at 20:09 by Alpa Luca
1 Answer
You can do this, for example, like so:

    FILE_NAME = "data.txt"   # the name of the file to read in
    NR_MATCHING_CHARS = 5    # the number of leading characters that need to match

    lines = set()  # the beginnings of the lines that have already been printed

    with open(FILE_NAME, "r") as inF:  # open the file
        for line in inF:               # for every line
            line = line.strip()        # drop surrounding whitespace and the newline
            if line == "":
                continue               # skip empty lines
            beginOfSequence = line[:NR_MATCHING_CHARS]
            if beginOfSequence not in lines:  # this beginning was not printed yet
                print(line)                   # print the line
                lines.add(beginOfSequence)    # remember that this beginning was printed
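In words, the approach is: go through the file one line at a time, look at the first 5 characters of each line, and keep the line only if those 5 characters have not been seen before. A set is used to remember which beginnings have already been seen, so every later line with the same beginning is dropped.

The following is a minimal sketch of the same idea wrapped in functions, so that the result can be written to a new file instead of printed. The names `dedupe_by_prefix`, `dedupe_file`, `src_path`, and `dst_path` are hypothetical and chosen only for illustration, and the example sequences are the ones from the question with the `***` emphasis markers removed (an assumption about what the real data looks like).

```python
def dedupe_by_prefix(lines, n=5):
    """Keep only the first line seen for each distinct n-character beginning."""
    seen = set()   # beginnings that were already kept
    kept = []      # lines to keep, in their original order
    for line in lines:
        line = line.strip()
        if not line:
            continue               # skip empty lines
        prefix = line[:n]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


def dedupe_file(src_path, dst_path, n=5):
    """Read src_path, drop lines with a repeated beginning, write dst_path."""
    with open(src_path, "r") as src:
        kept = dedupe_by_prefix(src, n)
    with open(dst_path, "w") as dst:
        for line in kept:
            dst.write(line + "\n")


# Example data from the question (emphasis markers removed).
example = [
    "CCTGGATGGCTTATATAAGATGTTAT",
    "GTTATATAATATACCACCGGGCTGCTT",
    "GTTATATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT",
]
result = dedupe_by_prefix(example)
print("\n".join(result))
```

With this input, the third sequence is dropped because it begins with the same 5 characters as the second one. Note that this keeps whichever of the matching lines comes first in the file; if a different one should be kept, the selection rule would need to change.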
answered Nov 15 '18 at 20:30 by quant