Beautiful Soup Pulling Data From Table
I am trying to pull the data from the Four Factors table on this website: https://www.basketball-reference.com/boxscores/201101100CHA.html. I am having trouble getting to the table. I have tried
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/boxscores/201101100CHA.html'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', id='all_four_factors')
Then when I try to use tr = div.find_all('tr') to pull the rows, I get nothing back.
python beautifulsoup
asked Nov 14 '18 at 0:21 by GNMO11
3 Answers
I took a look at the HTML you're trying to scrape, and the problem is that the tags you're trying to get are all inside a comment section, <!-- like this -->. BeautifulSoup treats everything inside a comment as just a bunch of text, not actual HTML. So what you'll have to do is take the contents of the comment and stick that string back into BeautifulSoup:
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://www.basketball-reference.com/boxscores/201101100CHA.html'
html = requests.get(url).content
soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', id='all_four_factors')

# Get everything in here that's a comment
comments = div.find_all(text=lambda text: isinstance(text, Comment))

# Loop through each comment until you find the one that
# has the stuff you want.
for c in comments:
    # A perhaps crude but effective way of stopping at a comment
    # with HTML inside: see if the first character inside is '<'.
    if c.strip()[0] == '<':
        newsoup = BeautifulSoup(c.strip(), 'html.parser')
        tr = newsoup.find_all('tr')
        print(tr)
One caveat with this is that BS is going to assume that the commented-out code is valid, well-formed HTML. This works for me though, so if the page stays relatively the same it should continue to work.
answered Nov 14 '18 at 0:47 by Bill M.

This was very helpful - thank you! – GNMO11, Nov 14 '18 at 1:03
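If you want the rows as plain values rather than a list of Tag objects, one option is to walk the rows and pull out the cell text. This is a minimal sketch building on the code above (it assumes newsoup holds the parsed comment from the loop) and is not part of the original answer; if you have pandas available, pandas.read_html on the comment string is another way to get the table into a structured form.

# Minimal sketch (assumes `newsoup` from the answer's loop above):
# collect each row as a list of cell strings.
rows = []
for row in newsoup.find_all('tr'):
    cells = [cell.get_text(strip=True) for cell in row.find_all(['th', 'td'])]
    if cells:
        rows.append(cells)

for r in rows:
    print(r)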
If you look at list(div.children)[5], which is the only child that contains tr as a substring, you'll see that it is a Comment object, so there is technically no tr element under that div node. div.find_all('tr') is therefore expected to come back empty.

answered Nov 14 '18 at 0:37 by Kevin He
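A quick way to confirm this for yourself (a small sketch, not part of the original answer) is to print the type of each child of that div:

from bs4 import Comment

# Assumes `div` is the result of soup.find('div', id='all_four_factors').
for i, child in enumerate(div.children):
    print(i, type(child).__name__, isinstance(child, Comment))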
Why are you doing:

div = soup.find('div', id='all_four_factors')

This gets the following line and tries to search for 'tr' tags in it:

<div id="all_four_factors" class="table_wrapper floated setup_commented commented">

You can just use your original soup variable from the first part and do

tr = soup.find_all('tr')

answered Nov 14 '18 at 0:28 by Ahmed El Gohary
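A minimal sketch of that suggestion (not part of the original answer); note that rows sitting inside commented-out markup, like the Four Factors table, still won't show up this way, as the other answers explain.

# Assumes `soup` was built from the full page as in the question.
tr = soup.find_all('tr')
print(len(tr))  # rows found in the tables that are not commented out
for row in tr[:5]:
    print(row.get_text(" ", strip=True))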