Parse an XML file with Python when an element contains specific text

I would like to parse an XML file and write some parts of it into a CSV file, using Python. I am pretty new to programming and XML. I have read a lot, but I couldn't find a useful example for my problem.

My XML file looks like this:

<Host name="1.1.1.1">
  <Properties>
    <tag name="id">1</tag>
    <tag name="os">windows</tag>
    <tag name="ip">1.11.111.1</tag>
  </Properties>
  <Report id="123">
    <output>
      Host is configured to get updates from another server.

      Update status:
      last detected: 2015-12-02 18:48:28
      last downloaded: 2015-11-17 12:34:22
      last installed: 2015-11-23 01:05:32

      Automatic settings:.....
    </output>
  </Report>
  <Report id="123">
    <output>
      Host is configured to get updates from another server.

      Environment Options:

      Automatic settings:.....
    </output>
  </Report>
</Host>

My XML file contains 500 of these entries. I only want to parse the XML blocks whose output contains "Update status", because I want to write the three dates (last detected, last downloaded, and last installed) to my CSV file. I would also like to add the id, os, and ip.

I tried the ElementTree library, but I am not able to filter on element.text where the output contains "Update status". At the moment I can extract all text and attributes from the whole file, but I cannot filter for the blocks whose output contains Update status, last detected, last downloaded, or last installed.

Can anyone give some advice on how to achieve this?

Desired output:

id:1
os:windows
ip:1.11.111.1
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32

All of this information should be written to a .csv file.

At the moment my code looks like this:

#!/usr/bin/env python
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("file.xml")
root = tree.getroot()

# open csv file for writing
data = open('test.csv', 'w')

# create csv writer object
csvwriter = csv.writer(data)

# filter xml file
for tag in root.findall("./Host/Properties/tag[@name='ip']"):
    print(tag.text)  # gives all IPs from the whole XML
for output in root.iter('output'):
    print(output.text)  # gives all outputs from the whole XML
data.close()

Best regards

edited Nov 14 '18 at 12:17

asked Nov 14 '18 at 11:41

erDi

After "Update status:" will there always be 3 lines, or is it variable?

– Rodolfo Donã Hosp
Nov 14 '18 at 11:43

There will always be 3 lines.

– erDi
Nov 14 '18 at 11:46

Could you please include the desired output?

– zipa
Nov 14 '18 at 11:53

"I tried it with ElementTree library but i am not able to filter element.text where the output contains Update status." - That's a good start. Please show that bit of code.

– Tomalak
Nov 14 '18 at 11:54


1 Answer

It's relatively straightforward when you start at the <Host> element and work your way down.

Iterate all the nodes, but only output something when the substring "Update status:" occurs in the value of <output>:

for host in tree.iter("Host"):
    host_id = host.find('./Properties/tag[@name="id"]')
    host_os = host.find('./Properties/tag[@name="os"]')
    host_ip = host.find('./Properties/tag[@name="ip"]')

    for output in host.iter("output"):
        if output.text is not None and "Update status:" in output.text:
            print("id:" + host_id.text)
            print("os:" + host_os.text)
            print("ip:" + host_ip.text)

            for line in output.text.splitlines():
                if ("last detected:" in line or
                        "last downloaded" in line or
                        "last installed" in line):
                    print(line.strip())

This outputs the following for your sample XML:

id:1
os:windows
ip:1.11.111.1
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32

Minor point: That's not really CSV, so writing that to a *.csv file as-is wouldn't be very clean.
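
To make the CSV point concrete, here is a minimal sketch of one way to turn the same filter into real CSV rows, with one row per matching <output> and one column per field. Several things here are assumptions rather than anything stated in the question: the <Hosts> wrapper element, the column names, the helper names extract_rows and write_csv, and the use of a regular expression to pull out the three date lines.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the <Hosts> root element is an assumption; the real
# file presumably wraps its 500 <Host> entries in some root element.
SAMPLE_XML = """<Hosts>
  <Host name="1.1.1.1">
    <Properties>
      <tag name="id">1</tag>
      <tag name="os">windows</tag>
      <tag name="ip">1.11.111.1</tag>
    </Properties>
    <Report id="123">
      <output>
        Host is configured to get updates from another server.

        Update status:
        last detected: 2015-12-02 18:48:28
        last downloaded: 2015-11-17 12:34:22
        last installed: 2015-11-23 01:05:32

        Automatic settings:.....
      </output>
    </Report>
    <Report id="123">
      <output>
        Host is configured to get updates from another server.

        Environment Options:
      </output>
    </Report>
  </Host>
</Hosts>"""

# Column names are an assumption; they mirror the labels in the desired output.
FIELDS = ["id", "os", "ip", "last detected", "last downloaded", "last installed"]

# Matches a line such as "last detected: 2015-12-02 18:48:28".
DATE_LINE = re.compile(
    r"^\s*(last (?:detected|downloaded|installed)):\s*(.+?)\s*$", re.MULTILINE
)


def extract_rows(root):
    """Yield one dict per <output> whose text contains "Update status:"."""
    for host in root.iter("Host"):
        # Map each <tag name="..."> to its text; empty tags become "".
        props = {t.get("name"): (t.text or "") for t in host.iterfind("./Properties/tag")}
        for output in host.iter("output"):
            text = output.text or ""
            if "Update status:" not in text:
                continue
            row = {key: props.get(key, "") for key in ("id", "os", "ip")}
            row.update(dict(DATE_LINE.findall(text)))
            yield row


def write_csv(rows, stream):
    """Write the rows with a header line to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)


root = ET.fromstring(SAMPLE_XML)
rows = list(extract_rows(root))
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write to a file instead of an in-memory buffer, pass a file opened with open("test.csv", "w", newline="") to write_csv; the csv module documentation recommends newline="" so that line endings are not doubled on Windows.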

edited Nov 14 '18 at 15:25

answered Nov 14 '18 at 12:46

Tomalak

Thanks for that, I will try it out and play around with it. I tried to filter against something, but didn't know what to filter against.

– erDi
Nov 14 '18 at 13:00

Sure. Tell me how it goes!

– Tomalak
Nov 14 '18 at 13:08

I get the following error: if "Update status:" in output.text: TypeError: argument of type 'NoneType' is not iterable. The reason is that I have some "os" entries which are empty, and they come back as NoneType.

– erDi
Nov 14 '18 at 14:25

Well... I'd say, change the if statement so it can handle this situation.

– Tomalak
Nov 14 '18 at 14:29

I found it! I tested yours and it worked, so I changed some things in the XML and the script. The problem was in this line: if "Update status:" in output.text. I changed it to: if output.text is not None and "Update status:" in output.text:

– erDi
Nov 14 '18 at 15:10
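
The same None problem can also hit the id, os, and ip lookups: find() returns None when no element matches, and .text is None for an empty element, so "os:" + host_os.text would fail for hosts with an empty os tag. A minimal sketch of one way to guard against both cases follows; the helper name text_of and its default parameter are hypothetical, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def text_of(element, default=""):
    """Return the stripped text of an element, or `default` if the
    element itself or its text is missing."""
    if element is None or element.text is None:
        return default
    return element.text.strip()


# Small hypothetical host: the os tag is empty and there is no ip tag at all.
host = ET.fromstring(
    '<Host><Properties>'
    '<tag name="id">1</tag>'
    '<tag name="os"/>'
    '</Properties></Host>'
)

host_id = text_of(host.find('./Properties/tag[@name="id"]'))
host_os = text_of(host.find('./Properties/tag[@name="os"]'))  # .text is None
host_ip = text_of(host.find('./Properties/tag[@name="ip"]'), default="unknown")  # find() returns None

print("id:" + host_id)
print("os:" + host_os)
print("ip:" + host_ip)
```

The explicit "is None" comparison is deliberate: relying on the truth value of an Element is discouraged in ElementTree, since an element with no children has historically tested as false.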



