Python3 - How to Read an XML File

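Python ships with an xml package in its standard library, and an XML file can be read through xml.dom (specifically xml.dom.minidom). Suppose we have the following XML file, saved as assets/test.xml: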
<?xml version="1.0"?>
<company>
    <name>mediumcn ltd</name>
    <staff id="1001">
        <nickname>Ben</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1002">
        <nickname>Jim</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1003">
        <nickname>Alen</nickname>
        <salary>40,000</salary>
    </staff>
</company>
Code:
from xml.dom import minidom

# Parse the XML file into a DOM document.
doc = minidom.parse("assets/test.xml")

# Print the text of the first <name> element.
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Print the id attribute, nickname, and salary of each <staff> element.
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, nickname:%s, salary:%s" %
          (sid, nickname.firstChild.data, salary.firstChild.data))
Output:
mediumcn ltd
id:1001, nickname:Ben, salary:30,000
id:1002, nickname:Jim, salary:30,000
id:1003, nickname:Alen, salary:40,000
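This approach is not very rigorous: it reads firstChild.data without checking that the first child is actually a text node. If the element is empty, firstChild is None and the access raises AttributeError. A more rigorous approach is as follows: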
from xml.dom import minidom

def getNodeText(element):
    """Return the concatenated data of the element's direct text-node children."""
    result = []
    for node in element.childNodes:
        # Keep only text nodes; skip child elements, comments, and so on.
        if node.nodeType == node.TEXT_NODE:
            result.append(node.data)
    return ''.join(result)

doc = minidom.parse("assets/test.xml")

name = doc.getElementsByTagName("name")[0]
print("Node Name : %s" % name.nodeName)
print("Node Value : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, nickname:%s, salary:%s" %
          (sid, getNodeText(nickname), getNodeText(salary)))
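The check if node.nodeType == node.TEXT_NODE: makes sure that data is read only from text nodes, the leaves of the XML tree that actually hold the text. Child elements and other node types are skipped, and an empty element simply yields an empty string, which avoids errors and wrong data.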
The code above was tested with Python 3.7.
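
To see why the text-node check matters, here is a minimal sketch that parses a small in-memory document with minidom.parseString instead of reading assets/test.xml. The sample XML (with an empty nickname for the second staff member) and the helper name get_node_text are assumptions made for this illustration only; they are not part of the original example.

```python
from xml.dom import minidom

# Hypothetical sample: <nickname> is empty for the second staff member.
SAMPLE = """<?xml version="1.0"?>
<company>
  <staff id="1001"><nickname>Ben</nickname></staff>
  <staff id="1002"><nickname></nickname></staff>
</company>"""


def get_node_text(element):
    """Join the data of all direct text-node children of element."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


doc = minidom.parseString(SAMPLE)
nicknames = doc.getElementsByTagName("nickname")
first, empty = nicknames[0], nicknames[1]

# The empty element has no children, so firstChild is None and
# empty.firstChild.data would raise AttributeError.
print(empty.firstChild)             # None
print(repr(get_node_text(empty)))   # ''
print(get_node_text(first))         # Ben
```

With the first approach, empty.firstChild.data would fail on the second staff member, while the helper returns an empty string that the caller can handle as it sees fit.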