Beautiful Soup Pulling Data From Table
I am trying to pull the data from the Four Factors table on this website: https://www.basketball-reference.com/boxscores/201101100CHA.html. I am having trouble getting to the table. I have tried
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/boxscores/201101100CHA.html'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', id='all_four_factors')
Then when I try to use tr = div.find_all('tr') to pull the rows, I get nothing back.
python beautifulsoup
asked Nov 14 '18 at 0:21 by GNMO11
3 Answers
I took a look at the HTML you're trying to scrape, and the problem is that the tags you're trying to get are all inside a comment section, <!-- like this -->. BeautifulSoup treats everything inside a comment as just a bunch of text, not actual HTML. So what you'll have to do is take the contents of the comment and stick that string back into BeautifulSoup:
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://www.basketball-reference.com/boxscores/201101100CHA.html'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', id='all_four_factors')

# Get everything in here that's a comment
comments = div.find_all(text=lambda text: isinstance(text, Comment))

# Loop through each comment until you find the one that
# has the stuff you want.
for c in comments:
    # A perhaps crude but effective way of stopping at a comment
    # with HTML inside: see if the first character inside is '<'.
    if c.strip()[0] == '<':
        newsoup = BeautifulSoup(c.strip(), 'html.parser')
        tr = newsoup.find_all('tr')
        print(tr)
One caveat with this is that BS is going to assume that the commented-out code is valid, well-formed HTML. This works for me though, so if the page stays relatively the same it should continue to work.
answered Nov 14 '18 at 0:47 by Bill M.

This was very helpful - thank you! – GNMO11, Nov 14 '18 at 1:03
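If you want the rows as plain values rather than a list of Tag objects, one option is to walk the rows and pull out the cell text. This is a minimal sketch building on the code above (it assumes newsoup holds the parsed comment from the loop) and is not part of the original answer; if you have pandas available, pandas.read_html on the comment string is another way to get the table into a structured form.

# Minimal sketch (assumes `newsoup` from the answer's loop above):
# collect each row as a list of cell strings.
rows = []
for row in newsoup.find_all('tr'):
    cells = [cell.get_text(strip=True) for cell in row.find_all(['th', 'td'])]
    if cells:
        rows.append(cells)

for r in rows:
    print(r)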
If you look at list(div.children)[5], which is the only child that contains tr as a substring, you'll see that it is a Comment object, so there is technically no tr element under that div node. div.find_all('tr') is therefore expected to come back empty.

answered Nov 14 '18 at 0:37 by Kevin He
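A quick way to confirm this for yourself (a small sketch, not part of the original answer) is to print the type of each child of that div:

from bs4 import Comment

# Assumes `div` is the result of soup.find('div', id='all_four_factors').
for i, child in enumerate(div.children):
    print(i, type(child).__name__, isinstance(child, Comment))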
Why are you doing:

div = soup.find('div', id='all_four_factors')

This gets the following line and tries to search for 'tr' tags in it:

<div id="all_four_factors" class="table_wrapper floated setup_commented commented">

You can just use your original soup variable from the first part and do

tr = soup.find_all('tr')

answered Nov 14 '18 at 0:28 by Ahmed El Gohary
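A minimal sketch of that suggestion (not part of the original answer); note that rows sitting inside commented-out markup, like the Four Factors table, still won't show up this way, as the other answers explain.

# Assumes `soup` was built from the full page as in the question.
tr = soup.find_all('tr')
print(len(tr))  # rows found in the tables that are not commented out
for row in tr[:5]:
    print(row.get_text(" ", strip=True))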