Use Python to merge multiple csv / xls / xlsx files

By | November 19, 2019

For the record, this article does not use the openpyx package. The xlrd and xlwt packages are used. Although the functions of these two libraries cannot be compared with openpyx, they can operate the old version of excel with the extension like xls which openpyx does not support.

The general idea is as follows.

  • Traverse to get all excel files in the root directory
  • Match based on excel name to get a specific type of excel
  • Create a workbook for writing copied data
  • Each excel has a Sheet1, loop through the cells to write the created workbook.
import xlrd
import xlwt
import os
import re

def write_excel(path, write_sheet):
    book = xlrd.open_workbook(path)
    read_sheet = book.sheet_by_name('Sheet1')
    for row in  range(read_sheet.nrows):
        for col in  range(read_sheet.ncols):
            write_sheet.write(row, col, read_sheet.cell_value(row,col))
            
def walk(path):
  for root,dirs,names in os.walk(path):
    list = []
    for filename in names:
        path = os.path.join(root, filename)
        list.append(path)
    return list

if __name__ == "__main__":
    write_book = xlwt.Workbook()
    root = r'D:\DropBox\file'
    path_list = walk(root)
    for path in path_list:
        val = path.find("keyword")
        if val!=-1:
            ser = re.search('.*20200403(.*?).xls',path)
            name = ser.group(1)
            write_sheet = write_book.add_sheet(name)
            write_excel(path, write_sheet)
    write_book.save(r'merged_file.xls')