Beautiful Soup Pulling Data From Table

I am trying to pull the data from the Four Factors table on this page: https://www.basketball-reference.com/boxscores/201101100CHA.html. I am having trouble getting to the table. I have tried:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.basketball-reference.com/boxscores/201101100CHA.html'
    html = requests.get(url).content
    soup = BeautifulSoup(html, "html.parser")

    div = soup.find('div', id='all_four_factors')

But when I then try to pull the rows with tr = div.find_all('tr'), I get nothing back.
Tags: python beautifulsoup

asked Nov 14 '18 at 0:21 by GNMO11

3 Answers
I took a look at the HTML you're trying to scrape, and the problem is that the tags you want are all inside an HTML comment section, <!-- like this -->. BeautifulSoup doesn't parse anything inside a comment as HTML; it keeps the whole comment as a single Comment string. So what you have to do is take the contents of the comment and feed that string back into BeautifulSoup:

    import requests
    from bs4 import BeautifulSoup, Comment

    url = 'https://www.basketball-reference.com/boxscores/201101100CHA.html'
    html = requests.get(url).content
    soup = BeautifulSoup(html, "html.parser")

    div = soup.find('div', id='all_four_factors')

    # Get every Comment node inside the wrapper div
    comments = div.find_all(string=lambda text: isinstance(text, Comment))

    # Loop through each comment until you find the one that
    # has the stuff you want.
    for c in comments:
        # A perhaps crude but effective way of stopping at a comment
        # with HTML inside: see if the first non-blank character is '<'.
        if c.strip().startswith('<'):
            newsoup = BeautifulSoup(c.strip(), 'html.parser')
            tr = newsoup.find_all('tr')
            print(tr)

One caveat is that this assumes the commented-out code is valid, well-formed HTML. It works for me, though, so as long as the page stays relatively the same it should continue to work.
answered Nov 14 '18 at 0:47 by Bill M.
Comment (GNMO11, Nov 14 '18 at 1:03): This was very helpful - thank you!
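The comment-reparsing approach from Bill M.'s answer can be tried without downloading the page. Below is a minimal offline sketch, assuming a hypothetical HTML snippet shaped like the wrapper div (the table id, headers, and numbers are invented for illustration, and `commented_table_rows` is a hypothetical helper name). It also turns each row into a list of cell strings instead of printing raw tags.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup shaped like the page's wrapper div: the table sits
# inside an HTML comment. The ids, headers, and numbers are invented.
SAMPLE_HTML = """
<div id="all_four_factors" class="table_wrapper setup_commented commented">
  <div class="section_heading"><h2>Four Factors</h2></div>
  <!--
  <table id="four_factors">
    <thead><tr><th>Team</th><th>Pace</th><th>eFG%</th></tr></thead>
    <tbody>
      <tr><th>AAA</th><td>90.0</td><td>.500</td></tr>
      <tr><th>BBB</th><td>90.0</td><td>.450</td></tr>
    </tbody>
  </table>
  -->
</div>
"""


def commented_table_rows(html, wrapper_id):
    """Return each <tr> of the first commented-out table as a list of cell strings.

    commented_table_rows is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", id=wrapper_id)
    if wrapper is None:
        return []
    for comment in wrapper.find_all(string=lambda s: isinstance(s, Comment)):
        # Only re-parse comments whose first non-blank character is '<'.
        if comment.strip().startswith("<"):
            inner = BeautifulSoup(str(comment), "html.parser")
            return [
                [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
                for tr in inner.find_all("tr")
            ]
    return []


rows = commented_table_rows(SAMPLE_HTML, "all_four_factors")
for row in rows:
    print(row)
```

Against the live page you would pass the downloaded HTML (for example requests.get(url).text) instead of SAMPLE_HTML; whether the real rows use th or td for each cell is an assumption worth checking in the page source.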












If you look at list(div.children)[5], which is the only child that contains tr as a substring, you'll see that it is a Comment object. So there is technically no tr element under that div node, and div.find_all('tr') is expected to return an empty list.
answered Nov 14 '18 at 0:37 by Kevin He
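The diagnosis in Kevin He's answer can be reproduced offline. This is a minimal sketch on a hypothetical, trimmed-down wrapper div (the markup is invented, so the Comment's position among the children differs from the index 5 seen on the real page):

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, trimmed-down wrapper div; not the real page markup.
SAMPLE_HTML = """<div id="all_four_factors">
<h2>Four Factors</h2>
<!-- <table><tr><td>1</td></tr></table> -->
</div>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", id="all_four_factors")

# No <tr> Tag objects exist under the div, so nothing is found.
found = div.find_all("tr")
print(len(found))  # 0

# Show every direct child and its type.
for child in div.children:
    print(type(child).__name__, repr(str(child)))

# The only child whose text contains "tr" is the Comment node.
matches = [child for child in div.children if "tr" in str(child)]
print(len(matches), isinstance(matches[0], Comment))  # 1 True
```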





















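Why are you doing:

    div = soup.find('div', id='all_four_factors')

That call returns the wrapper element, which starts with the following line, and div.find_all('tr') then only searches inside it for 'tr' tags:

    <div id="all_four_factors" class="table_wrapper floated setup_commented commented">

You can just use your original soup variable from the first part and search the whole document:

    tr = soup.find_all('tr')

Note that this only finds rows that BeautifulSoup parsed as real tags; rows inside an HTML comment, such as the Four Factors table on this page, will not be among them.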
answered Nov 14 '18 at 0:28 by Ahmed El Gohary
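Whether searching the whole soup helps depends on where the rows live. A minimal offline check, assuming a hypothetical page with one ordinary table and one table wrapped in an HTML comment (the markup and ids are invented), shows that soup.find_all('tr') only returns rows that were parsed as real tags:

```python
from bs4 import BeautifulSoup

# Hypothetical page: one ordinary table plus one table inside an HTML comment.
SAMPLE_HTML = """
<table id="visible"><tr><td>visible row</td></tr></table>
<div id="all_four_factors">
<!-- <table id="four_factors"><tr><td>commented row</td></tr></table> -->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching the whole document still only sees rows parsed as real tags;
# the commented table is a single Comment string, not a tree of tags.
row_texts = [tr.get_text(strip=True) for tr in soup.find_all("tr")]
print(row_texts)  # ['visible row']
```

So on a page where the target table is commented out, the whole-document search returns rows from the other tables, and the comment still has to be re-parsed to reach the wanted rows.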


























