Python3 - How to read an XML file

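Python ships with an xml package in its standard library, and an XML file can be read with xml.dom.minidom. Take the following XML file as an example: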

<?xml version="1.0"?>
<company>
	<name>mediumcn ltd</name>
	<staff id="1001">
		<nickname>Ben</nickname>
		<salary>30,000</salary>
	</staff>
	<staff id="1002">
		<nickname>Jim</nickname>
		<salary>30,000</salary>
	</staff>
	<staff id="1003">
		<nickname>Alen</nickname>
		<salary>40,000</salary>
	</staff>
</company>

Code:

from xml.dom import minidom

# Parse the XML file into a DOM document
doc = minidom.parse("assets/test.xml")

# Print the text of the first <name> element
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Print the id attribute, nickname and salary of every <staff> element
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, nickname:%s, salary:%s" %
          (sid, nickname.firstChild.data, salary.firstChild.data))

Output:

mediumcn ltd
id:1001, nickname:Ben, salary:30,000
id:1002, nickname:Jim, salary:30,000
id:1003, nickname:Alen, salary:40,000
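For readers who want to try this without creating a file on disk, here is a minimal sketch that parses the same XML from a string with minidom.parseString and collects the staff entries into a list of dicts. The names XML_TEXT and staff_records are made up for this illustration, and converting the salary to an integer assumes the comma is a thousands separator.

```python
from xml.dom import minidom

# Same structure as the sample file, inlined so no file on disk is needed.
XML_TEXT = """<?xml version="1.0"?>
<company>
    <name>mediumcn ltd</name>
    <staff id="1001">
        <nickname>Ben</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1002">
        <nickname>Jim</nickname>
        <salary>30,000</salary>
    </staff>
    <staff id="1003">
        <nickname>Alen</nickname>
        <salary>40,000</salary>
    </staff>
</company>
"""

doc = minidom.parseString(XML_TEXT)

staff_records = []
for staff in doc.getElementsByTagName("staff"):
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    staff_records.append({
        "id": staff.getAttribute("id"),
        "nickname": nickname.firstChild.data,
        # Assumption: "," is a thousands separator, so "30,000" means 30000.
        "salary": int(salary.firstChild.data.replace(",", "")),
    })

print(staff_records)
```

Keeping the records in a list makes it easy to sort or filter them later, instead of only printing each line.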

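This approach is not very robust: it prints firstChild.data without checking whether the element is a leaf, that is, whether its first child is actually a text node. A more careful version looks like this: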

from xml.dom import minidom

doc = minidom.parse("assets/test.xml")


def getNodeText(element):
    # Collect the data of every direct child that is a text node
    result = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            result.append(node.data)
    return ''.join(result)


name = doc.getElementsByTagName("name")[0]
print("Node Name : %s" % name.nodeName)
print("Node Value : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    sid = staff.getAttribute("id")
    nickname = staff.getElementsByTagName("nickname")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, nickname:%s, salary:%s" %
          (sid, getNodeText(nickname), getNodeText(salary)))

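The check if node.nodeType == node.TEXT_NODE: makes sure that only text nodes are read, so data is taken only from the text of a leaf element and wrong values are avoided.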
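To see why the check matters, here is a small sketch with a made-up XML snippet (not the sample file above). It contains an empty nickname element and a salary element whose first child is a comment. The helper mirrors getNodeText in compact form.

```python
from xml.dom import minidom


def getNodeText(element):
    # Join the data of all direct children that are text nodes.
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# Hypothetical sample: one empty element, one that starts with a comment.
doc = minidom.parseString(
    "<staff><nickname/><salary><!-- monthly -->30,000</salary></staff>"
)
nickname = doc.getElementsByTagName("nickname")[0]
salary = doc.getElementsByTagName("salary")[0]

# An empty element has no children, so firstChild is None and
# nickname.firstChild.data would raise AttributeError.
print(nickname.firstChild is None)   # True

# Here firstChild is the comment node, so firstChild.data is the
# comment text rather than the salary.
print(repr(salary.firstChild.data))  # ' monthly '

# The helper skips non-text nodes and returns "" when there is no text.
print(repr(getNodeText(nickname)))   # ''
print(repr(getNodeText(salary)))     # '30,000'
```

In both cases firstChild.data either fails or returns the wrong value, while filtering on nodeType gives the expected text.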

The code above was tested with Python 3.7.
