Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

python - How to filter rows inside a csv file based on their date?

I have a file named aa_20200907.txt and it looks like this:

#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes

I have code that filters the rows based on two conditions.

  1. Condition_1: I only want the rows where index[2] is a digit.
  2. Condition_2: I only want the rows where index[0] (the date) matches the date in the name of the file being processed. The filename dates are stored in a list named missing_dates.

The code below handles condition_1 correctly, but condition_2 does not behave the way I want. Please note that I normally run this code on multiple files, which means missing_dates contains more values.

This is my code:

import csv
import datetime 
from pathlib import Path

root=Path(r'c:dataPPEDesktopest_folder')

def filter_row(r, date):  
    condition_1 = r[2].isdigit()  #<-- select only the rows if index 2 is numbers. 
    condition_2 = date != missing_date #<-- select only the rows of that specific day.
    
    return condition_1 and condition_2

missing_dates = ['20200907']

output_list = []
for missing_date in missing_dates:
    # print(f"processing {missing_date}")
    files=[fn for fn in (e for e in root.glob(f"**/*_{missing_date}.txt") if e.is_file())]
    for file in files:      
        with open(file, 'r') as log_file:
            reader = csv.reader(log_file, delimiter = ',')
            next(reader) # skip header
            for row in reader:
                if filter_row(row, missing_date):
                    output_list.append(row)
                    
print(output_list) 

This is my current output:

[]

This is the desired output:

['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes']
['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford']
['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']

*Please note that I do not want to write entirely new code. I just want to fix condition_2 and keep the rest of the current code, as I feel comfortable with it.



1 Answer


Here you go buddy:

Input:


#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes

Code:

import csv
import datetime 
from pathlib import Path
import os

os.chdir('/home/chandanmalla/Desktop/')

def filter_row(r, date):  
    condition_1 = r[2].isdigit()  #<-- keep only rows whose third column (index 2) is all digits. 
    condition_2 = r[0].split('T')[0] == date #<-- keep only rows whose timestamp falls on that day (text before the 'T').
    return condition_1 and condition_2

missing_dates = ['2020-09-07']
file_end_name = ['20200907']

output_list = []


files = []
for f in os.listdir():
    # Pair every matching file with the date that belongs to its name suffix.
    for m_d, end_name in zip(missing_dates, file_end_name):
        if f.endswith(end_name + '.txt'):
            files.append((f, m_d))
for file, m_d in files:
    with open(file, 'r') as log_file:
        reader = csv.reader(log_file, delimiter = ',')
        next(reader) # skip header
        for row in reader:
            if filter_row(row, m_d):
                output_list.append(row)
                
print(output_list) 


Output:


[['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes'],
 ['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford'],
 ['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']]
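
If you want to try the filter without creating any files, the sketch below runs the same filter_row on the sample data held in memory. The names SAMPLE and filter_text are made up for this illustration and are not part of the original code; treat it as a quick check, not a replacement for the file loop.

```python
import csv
import io

# Sample data copied from the question, kept in memory instead of a file.
SAMPLE = """#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes
"""


def filter_row(r, date):
    condition_1 = r[2].isdigit()              # third column must be all digits
    condition_2 = r[0].split('T')[0] == date  # date part of the timestamp must match
    return condition_1 and condition_2


def filter_text(text, date):
    """Hypothetical helper: apply filter_row to CSV text instead of a file."""
    reader = csv.reader(io.StringIO(text), delimiter=',')
    next(reader)  # skip header
    return [row for row in reader if filter_row(row, date)]


rows = filter_text(SAMPLE, '2020-09-07')
for row in rows:
    print(row)
```

This prints the three rows from 2020-09-07 whose third column is numeric; the Audi row is dropped because 'XX' is not a digit string, and the two 2020-09-08 rows are dropped by the date check.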

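Your code had two problems. First, condition_2 compared date with missing_date, but the loop passes missing_date in as date, so the != test was always False and every row was rejected. The fix compares the date part of the timestamp (the text before the 'T') with the wanted date instead. Second, the line below matched zero files when I ran it: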

    files=[fn for fn in (e for e in root.glob(f"**/*_{missing_date}.txt") if e.is_file())]
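
If you would rather keep missing_dates in the filename format ('20200907') and keep your glob loop, one option is to convert the date inside the filter. This is only a sketch: to_iso_date and collect_rows are names invented here, and it assumes root points at a folder that really contains the *_YYYYMMDD.txt files.

```python
import csv
from datetime import datetime
from pathlib import Path


def to_iso_date(compact):
    """Hypothetical helper: turn '20200907' into '2020-09-07'."""
    return datetime.strptime(compact, '%Y%m%d').strftime('%Y-%m-%d')


def filter_row(r, date):
    condition_1 = r[2].isdigit()                           # third column must be all digits
    condition_2 = r[0].split('T')[0] == to_iso_date(date)  # timestamp date must match the filename date
    return condition_1 and condition_2


def collect_rows(root, missing_dates):
    """Hypothetical wrapper around the original loop; root is a pathlib.Path."""
    output_list = []
    for missing_date in missing_dates:
        # Same glob pattern as in the question; sorted only to get a stable order.
        for file in sorted(root.glob(f"**/*_{missing_date}.txt")):
            if not file.is_file():
                continue
            with open(file, 'r', newline='') as log_file:
                reader = csv.reader(log_file, delimiter=',')
                next(reader)  # skip header
                for row in reader:
                    if filter_row(row, missing_date):
                        output_list.append(row)
    return output_list


# Example call (the folder path is an assumption, adjust it to your setup):
# print(collect_rows(Path('.'), ['20200907']))
```

With this variant a single list is enough, because each file's date comes from the same missing_date value that was used to find the file.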


