Reading PASCAL VOC annotations in python

I have annotations in XML files like this one, which follow the PASCAL VOC convention:



<annotation>
    <folder>training</folder>
    <filename>chanel1.jpg</filename>
    <source>
        <database>synthetic initialization</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>synthetic</image>
        <flickrid>none</flickrid>
    </source>
    <owner>
        <flickrid>none</flickrid>
        <name>none</name>
    </owner>
    <size>
        <width>640</width>
        <height>427</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>chanel</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>344</xmin>
            <ymin>10</ymin>
            <xmax>422</xmax>
            <ymax>83</ymax>
        </bndbox>
    </object>
    <object>
        <name>chanel</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>355</xmin>
            <ymin>165</ymin>
            <xmax>443</xmax>
            <ymax>206</ymax>
        </bndbox>
    </object>
</annotation>


What is the cleanest way to retrieve, for example, the filename and bndbox fields in Python?



I was trying to use ElementTree, which seems to be the standard-library solution in Python, but I can't make it work.



My code so far:



from xml.etree import ElementTree as ET
tree = ET.parse("data/all/annotations/" + file)
fn = tree.find('filename').text
boxes = tree.findall('bndbox')


This produces:



fn == 'chanel1.jpg'
boxes == []


So it successfully extracts the filename field, but not the bndbox elements.

edited Nov 15 '18 at 11:06 by Jsevillamol
asked Nov 15 '18 at 10:41 by Jsevillamol

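Why the attempt in the question returns an empty list: `findall` with a bare tag name only matches the direct children of the element it is called on (and `ElementTree.findall` starts at the root `<annotation>`), while each `<bndbox>` sits one level deeper, inside an `<object>`. A minimal sketch using only the standard library's `xml.etree.ElementTree`; the inline `SAMPLE` string is a trimmed copy of the annotation above, used so the snippet runs without the `data/all/annotations/` directory.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation from the question, inlined so the
# example runs without a file on disk.
SAMPLE = """<annotation>
  <filename>chanel1.jpg</filename>
  <object>
    <name>chanel</name>
    <bndbox><xmin>344</xmin><ymin>10</ymin><xmax>422</xmax><ymax>83</ymax></bndbox>
  </object>
  <object>
    <name>chanel</name>
    <bndbox><xmin>355</xmin><ymin>165</ymin><xmax>443</xmax><ymax>206</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of <annotation>,
# and <bndbox> is nested inside <object>, so nothing is found.
direct = root.findall("bndbox")           # -> []

# Either spell out the path from the root ...
by_path = root.findall("object/bndbox")   # two <bndbox> elements
# ... or search all descendants with ".//".
anywhere = root.findall(".//bndbox")      # the same two elements

# findtext returns the text of the first matching child.
filename = root.findtext("filename")
boxes = [
    [int(b.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
    for b in by_path
]
print(filename, boxes)
```

When reading from a file, `ET.parse(path).getroot()` gives the same kind of `root`, and `tree.findall("object/bndbox")` on the parsed tree behaves the same way.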





















2 Answers
Here is a fairly simple solution to your problem:



It returns the filename and the box coordinates as a nested list of [xmin, ymin, xmax, ymax].
I once struggled with bndbox tags whose children were mixed up (ymin, xmin, ...) or in other strange orders, so this code reads each coordinate by its tag name rather than by its position.



Hope it helps...



import xml.etree.ElementTree as ET


def read_content(xml_file: str):
    """Return the image filename and a list of [xmin, ymin, xmax, ymax] boxes."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    # Read the filename once, before the loop, so it is defined even
    # when the file contains no <object> elements.
    filename = root.find('filename').text

    list_with_all_boxes = []

    for boxes in root.iter('object'):
        ymin, xmin, ymax, xmax = None, None, None, None

        # Look each coordinate up by tag name, so the order of the
        # children inside <bndbox> does not matter.
        for box in boxes.findall("bndbox"):
            ymin = int(box.find("ymin").text)
            xmin = int(box.find("xmin").text)
            ymax = int(box.find("ymax").text)
            xmax = int(box.find("xmax").text)

        list_with_single_boxes = [xmin, ymin, xmax, ymax]
        list_with_all_boxes.append(list_with_single_boxes)

    return filename, list_with_all_boxes


name, boxes = read_content("file.xml")

answered Dec 18 '18 at 11:32 by pix_1

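A variation on the `read_content` function above, sketched under the assumption that you also want each object's class `<name>` and `difficult` flag next to its box. `read_objects` is a hypothetical helper name; it parses an XML string rather than a path so the example is self-contained, and the `int(float(...))` conversion is an assumption in case some of your files store decimal coordinates (the file in the question does not). Like the answer, it reads coordinates by tag name, which the sample below demonstrates by listing `ymin` before `xmin`.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def read_objects(xml_text: str) -> Tuple[Optional[str], List[dict]]:
    """Hypothetical helper: return (filename, list of per-object dicts)."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:  # skip objects that carry no box
            continue
        # Each coordinate is looked up by tag name, so child order is irrelevant.
        # int(float(...)) also accepts decimal strings such as "344.0".
        box = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            "bndbox": box,
        })
    return filename, objects


# Usage: coordinates deliberately out of order inside <bndbox>.
SAMPLE = """<annotation>
  <filename>chanel1.jpg</filename>
  <object>
    <name>chanel</name><difficult>0</difficult>
    <bndbox><ymin>10</ymin><xmin>344</xmin><ymax>83</ymax><xmax>422</xmax></bndbox>
  </object>
</annotation>"""

name, objs = read_objects(SAMPLE)
print(name, objs)
```

Returning one dict per object keeps the class label attached to its box, which is usually what a training pipeline needs next; for files on disk, replace `ET.fromstring(xml_text)` with `ET.parse(path).getroot()`.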
Here is detailed documentation and source code: https://github.com/trinath503/Python/tree/master/Generate_Pascal_VOC_Files



Note: you need to pass the data in the following format:



pascal_voc_data = '''
[
    {
        "folder": "folder",
        "filename": "1.jpg",
        "path": "path",
        "source": {"database": "database"},
        "size": {"width": 256, "height": 256, "depth": 3},
        "segmented": 0,
        "objects": [
            {"name": "name", "pose": "pose", "truncated": "truncated", "occluded": "occluded",
             "bndbox": {"xmin": 3, "xmax": 33, "ymin": 3, "ymax": 33}},
            {"name": "name", "pose": "pose", "truncated": "truncated", "occluded": "occluded",
             "bndbox": {"xmin": 3, "xmax": 33, "ymin": 3, "ymax": 33}}
        ]
    },
    {
        "folder": "folder",
        "filename": "2.jpg",
        "path": "path",
        "source": {"database": "database"},
        "size": {"width": 256, "height": 256, "depth": 3},
        "segmented": 0,
        "objects": [
            {"name": "name", "pose": "pose", "truncated": "truncated", "occluded": "occluded",
             "bndbox": {"xmin": 3, "xmax": 33, "ymin": 3, "ymax": 33}},
            {"name": "name", "pose": "pose", "truncated": "truncated", "occluded": "occluded",
             "bndbox": {"xmin": 3, "xmax": 33, "ymin": 3, "ymax": 33}}
        ]
    }
]
'''
answered Feb 26 at 11:23 by Trinath Reddy

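The linked repository's own code is not reproduced here. As an illustration only, this is a minimal standard-library sketch of how one record in the JSON format from the answer above could be turned into a PASCAL VOC `<annotation>` XML string. `record_to_voc_xml` and `_add` are hypothetical names, and the sketch assumes the record layout shown in the answer (nested `source` and `size` objects, and an `objects` list whose entries carry a nested `bndbox`).

```python
import json
import xml.etree.ElementTree as ET


def _add(parent: ET.Element, tag: str, value) -> ET.Element:
    """Append <tag>value</tag> to parent; dict values become nested elements."""
    child = ET.SubElement(parent, tag)
    if isinstance(value, dict):
        for key, sub in value.items():
            _add(child, key, sub)
    else:
        child.text = str(value)
    return child


def record_to_voc_xml(record: dict) -> str:
    """Hypothetical helper: build one <annotation> document from a record."""
    root = ET.Element("annotation")
    for key, value in record.items():
        if key == "objects":
            # Each entry of "objects" becomes its own <object> element.
            for obj in value:
                _add(root, "object", obj)
        else:
            _add(root, key, value)
    return ET.tostring(root, encoding="unicode")


record = json.loads("""{
  "folder": "folder", "filename": "1.jpg", "path": "path",
  "source": {"database": "database"},
  "size": {"width": 256, "height": 256, "depth": 3},
  "segmented": 0,
  "objects": [{"name": "name", "pose": "pose", "truncated": "truncated",
               "occluded": "occluded",
               "bndbox": {"xmin": 3, "xmax": 33, "ymin": 3, "ymax": 33}}]
}""")
xml_text = record_to_voc_xml(record)
print(xml_text)
```

The generated string can be read back with the same `findall("object/bndbox")` approach used for the question's files, which is a quick way to check the round trip.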