Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

python - Regular Expression from long string

I have the following multi-line string in a DataFrame column:

1.83
1
71%
4.25
X
18%
4.30
2
11%
+88

I'm trying to use a regular expression to get this: 1.83 4.25 4.30

I've so far tried this code snippet in pandas:

import re

clean_dict = {'[\nX%+]': '', '\n1': ''}

but this fails to remove the other unwanted characters.

I'm using regex101.com for testing.

What's the best way to resolve this?



1 Answer

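You can use the following (regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):

df['A'].str.replace(r'(?m)^(?!\d+\.\d+$).*\n*', '', regex=True).str.strip()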

Details:

  • (?m) - the inline form of re.M; makes ^ match at the start of each line and $ match at the end of each line
  • ^ - start of a line
  • (?!\d+\.\d+$) - a negative lookahead that fails the match if the line consists only of one or more digits, a literal dot, and one or more digits
  • .* - the rest of the line: zero or more chars other than line break chars, as many as possible
  • \n* - zero or more line feed chars.

The .str.strip() call is needed because every kept line retains its own newline: when the lines after the last kept value are removed, the result ends with a trailing newline char.
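To see what each part of the pattern does without pandas, here is a minimal sketch using only the standard re module. The variable names (text, pattern, removed, cleaned) are made up for illustration, and the sample string is the cell value from the question with real line breaks between items.

```python
import re

# Sample cell value from the question, one item per line.
text = "1.83\n1\n71%\n4.25\nX\n18%\n4.30\n2\n11%\n+88"

# Match (and later delete) every line that is NOT exactly digits, a dot, digits.
pattern = re.compile(r"(?m)^(?!\d+\.\d+$).*\n*")

removed = pattern.sub("", text)  # kept lines still carry their own "\n"
cleaned = removed.strip()        # trim the newline left after the last kept line

print(repr(removed))
print(repr(cleaned))
```

Printing with repr() makes the leftover trailing newline visible before strip() is applied.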

Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['1.83\n1\n71%\n4.25\nX\n18%\n4.30\n2\n11%\n+88']})
>>> df['A'].str.replace(r'(?m)^(?!\d+\.\d+$).*\n*', '', regex=True).str.strip()
0    1.83\n4.25\n4.30
Name: A, dtype: object
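If the goal is the space-separated form from the question (1.83 4.25 4.30), one alternative is to extract the wanted lines instead of deleting the unwanted ones. This is a different technique from the replacement above, shown only as a sketch; extract_decimals is a hypothetical helper name, and it assumes each cell is a plain string.

```python
import re

def extract_decimals(cell: str) -> str:
    """Return the decimal-only lines of `cell`, joined by single spaces."""
    # (?m) lets ^ and $ anchor at every line, so only whole-line matches count.
    matches = re.findall(r"(?m)^\d+\.\d+$", cell)
    return " ".join(matches)

# Sample cell value from the question, one item per line.
text = "1.83\n1\n71%\n4.25\nX\n18%\n4.30\n2\n11%\n+88"
print(extract_decimals(text))
```

On a DataFrame this could be applied per cell with df['A'].apply(extract_decimals), again assuming the column holds no missing values.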
