First use pandas.DataFrame.set_axis to give the merged (unstacked) data the same index as df, then use pandas.DataFrame.join to join the two DataFrames:
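import pathlib
import pandas as pd

# df is the input DataFrame shown at the end of this answer
root_path = pathlib.Path('root')  # use pathlib instead of os.path
data = {}
# use the built-in enumerate rather than keeping an external counter
for count, (_, row) in enumerate(df.iterrows(), 1):
    folder_name = row['File ID'].strip()
    file_name = row['File Name'].strip()
    file_path = root_path / folder_name / file_name
    folder_file_id = f'{folder_name}_{file_name}'
    # each file is tab-separated with no header: a Case column and a value column
    file_data = pd.read_csv(file_path, header=None, sep='\t',
                            names=['Case', folder_file_id],
                            memory_map=True, low_memory=False)
    # keep the value column as a Series indexed by Case
    data[folder_file_id] = file_data.set_index('Case').squeeze()
    print(count)  # progress counter

# concat stacks the Series under a (folder_file_id, Case) MultiIndex and unstack
# pivots Case into columns. unstack may sort the rows, so reindex restores the
# loop order before set_axis relabels the rows positionally with df's index.
merged_data = df.join(pd.concat(data, names=['folder_file_id'])
                        .unstack('Case')
                        .reindex(list(data))
                        .set_axis(df.index))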
Output:
>>> merged_data
   File ID    File Name       0       1       2       3       4
0  folderA  file001.txt  1234.0  5678.0  9012.0  3456.0  7890.0
1  folderB  file002.txt  4567.0  8901.0  2345.0  6789.0     NaN
The input data is the same as in my previous answer:
>>> df
   File ID    File Name
0  folderA  file001.txt
1  folderB  file002.txt
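
To see why the set_axis step matters, the reshaping can be tried on its own without any files on disk. The sketch below is only an illustration under stated assumptions: the two Series in `data` are hypothetical stand-ins for what the loop would read from disk (shortened to a few values), and the names `stacked`, `wide`, `aligned` and `merged` are introduced here for readability.

```python
import pandas as pd

# Same input frame as above.
df = pd.DataFrame({'File ID': ['folderA', 'folderB'],
                   'File Name': ['file001.txt', 'file002.txt']})

# Hypothetical stand-ins for the per-file Series built in the loop.
data = {
    'folderA_file001.txt': pd.Series([1234, 5678, 9012],
                                     index=pd.Index([0, 1, 2], name='Case')),
    'folderB_file002.txt': pd.Series([4567, 8901],
                                     index=pd.Index([0, 1], name='Case')),
}

# Concatenating a dict of Series yields a (folder_file_id, Case) MultiIndex.
stacked = pd.concat(data, names=['folder_file_id'])

# Pivot Case into columns; rows are now labelled by folder_file_id.
wide = stacked.unstack('Case')

# Relabel the rows positionally with df's index so that join can align them.
aligned = wide.set_axis(df.index)

merged = df.join(aligned)
print(merged)
```

Before the set_axis call, `wide` is labelled by the folder_file_id strings while `df` uses integer labels, so a direct join would find no matching rows. Because set_axis relabels by position, it assumes the rows of `wide` are in the same order as the rows of `df`.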