
Use a Python linter to clean up the code #46

Description

@vincent-octo

A linter will help find things like unused imports, variables that are assigned but never used, etc.

This can probably be made more efficient by coupling the linter to an AI agent that writes the fixes and iterates until no linting warnings remain.
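
A rough sketch of what such a lint-and-fix loop could look like. This is only an illustration, not an existing tool: `run_ruff`, `lint_until_clean` and the `apply_fixes` hook are hypothetical names, and it assumes `ruff` is on the PATH and supports `--output-format json`. The "no progress" check based on the number of diagnostics is a simple heuristic to keep the loop from running forever.

```python
import json
import subprocess


def run_ruff(path="src"):
    """Run `ruff check` on `path` and return its diagnostics as a list of dicts.

    Assumption: ruff is installed and accepts `--output-format json`.
    """
    proc = subprocess.run(
        ["ruff", "check", "--output-format", "json", path],
        capture_output=True,
        text=True,
        check=False,  # ruff exits non-zero when it finds violations
    )
    return json.loads(proc.stdout or "[]")


def lint_until_clean(run_linter, apply_fixes, max_rounds=5):
    """Alternate linting and fixing until the linter is silent or rounds run out.

    run_linter:  callable returning a list of diagnostics (empty means clean)
    apply_fixes: callable that receives that list and edits the code
                 (hypothetical hook where an agent would plug in)
    Returns the diagnostics that are still open when the loop stops.
    """
    diagnostics = run_linter()
    for _ in range(max_rounds):
        if not diagnostics:
            break
        apply_fixes(diagnostics)
        remaining = run_linter()
        if len(remaining) >= len(diagnostics):
            # Heuristic: no progress this round, so stop instead of looping.
            return remaining
        diagnostics = remaining
    return diagnostics
```

It would be called as `lint_until_clean(run_ruff, my_agent_fix)`, where `my_agent_fix` stands in for whatever writes the fixes. Anything left in the returned list would need a manual look.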

See the output of running the ruff linter on the current codebase:
[E401](https://docs.astral.sh/ruff/rules/multiple-imports-on-one-line) [*] Multiple imports on one line
 --> src/kanta/core/duplicates.py:1:1
  |
1 | import argparse, gzip, os
  | ^^^^^^^^^^^^^^^^^^^^^^^^^
2 | from operator import itemgetter
3 | from datetime import datetime
  |
help: Split imports

[F841](https://docs.astral.sh/ruff/rules/unused-variable) Local variable `date` is assigned to but never used
  --> src/kanta/core/duplicates.py:10:5
   |
 9 | def main(args):
10 |     date = datetime.now().strftime("%Y_%m_%d")
   |     ^^^^
11 |     unique = os.path.join(args.out, f"{args.prefix}.txt.gz")
12 |     dups = os.path.join(args.out, f"{args.prefix}_duplicates.txt.gz")
   |
help: Remove assignment to unused variable `date`

[F841](https://docs.astral.sh/ruff/rules/unused-variable) Local variable `err_count` is assigned to but never used
  --> src/kanta/core/duplicates.py:24:29
   |
22 |         print(cols)
23 |         print(f"bash {','.join(map(str,[elem +1 for elem in cols]))}")
24 |         dup_count = count = err_count = 0
   |                             ^^^^^^^^^
25 |         print(itemgetter(*cols)(header.strip().split()))
26 |         total_lines = 0
   |
help: Remove assignment to unused variable `err_count`

[F541](https://docs.astral.sh/ruff/rules/f-string-missing-placeholders) [*] f-string without any placeholders
  --> src/kanta/core/duplicates.py:48:11
   |
46 |     dup_rate = round(dup_count / total, 4) if total > 0 else 0
47 |
48 |     print(f"\nResults:")
   |           ^^^^^^^^^^^^^
49 |     print(f"Unique entries: {count}")
50 |     print(f"Duplicates: {dup_count}")
   |
help: Remove extraneous `f` prefix

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `numpy` imported but unused
 --> src/kanta/core/filters/extract.py:3:17
  |
1 | import pandas as pd
2 | import re
3 | import numpy as np
  |                 ^^
4 |
5 | def extract_all(df,args):
  |
help: Remove unused import: `numpy`

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
  --> src/kanta/core/filters/extract.py:46:19
   |
44 |     ft_df = col_copy.loc[status_mask].str.split(' ', expand=True, n=4).reindex(columns=[0, 1, 2, 3])
45 |     ft_df.columns = ['comp', 'value', 'unit','extra']
46 |     if ft_df.empty: return df
   |                   ^
47 |     ft_df['ft'] = col_copy.loc[status_mask]
48 |     ft_df['comp'] = ft_df['comp'].replace("alle", "<", regex=True).replace("yli", ">", regex=True)
   |

[F841](https://docs.astral.sh/ruff/rules/unused-variable) Local variable `ft_col` is assigned to but never used
   --> src/kanta/core/filters/extract.py:131:5
    |
129 | def extract_plus_ab(df,args):
130 |
131 |     ft_col = "MEASUREMENT_FREE_TEXT"
    |     ^^^^^^
132 |     pos_col = "extracted::IS_POS"
133 |     out_col = "extracted::TEST_OUTCOME_TEXT"
    |
help: Remove assignment to unused variable `ft_col`

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `numpy` imported but unused
 --> src/kanta/core/filters/outcome.py:2:17
  |
1 | import pandas as pd
2 | import numpy as np
  |                 ^^
  |
help: Remove unused import: `numpy`

[F841](https://docs.astral.sh/ruff/rules/unused-variable) Local variable `ops` is assigned to but never used
  --> src/kanta/core/filters/qc.py:42:5
   |
40 |     df['QC_PASS'] = df['QC_PASS'].astype(int)
41 |
42 |     ops = {
   |     ^^^
43 |         '<': operator.lt, '<=': operator.le, '>': operator.gt,
44 |         '>=': operator.ge, '==': operator.eq, '!=': operator.ne
   |
help: Remove assignment to unused variable `ops`

[F821](https://docs.astral.sh/ruff/rules/undefined-name) Undefined name `val_as_float`
   --> src/kanta/core/filters/qc.py:245:21
    |
243 |             mask &= df[cmp_col].astype(float, errors='ignore') > thr
244 |         elif op == "==":
245 |             mask &= val_as_float == thr
    |                     ^^^^^^^^^^^^
246 |         else:
247 |             raise ValueError(f"Unsupported operator: {op}")
    |

[E401](https://docs.astral.sh/ruff/rules/multiple-imports-on-one-line) [*] Multiple imports on one line
 --> src/kanta/core/main.py:2:1
  |
1 | import pandas as pd
2 | import argparse,logging,os,sys
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3 | from functools import partial
4 | import multiprocessing as mp
  |
help: Split imports

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `sys` imported but unused
 --> src/kanta/core/main.py:2:28
  |
1 | import pandas as pd
2 | import argparse,logging,os,sys
  |                            ^^^
3 | from functools import partial
4 | import multiprocessing as mp
  |
help: Remove unused import: `sys`

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `utils.read_map` imported but unused
 --> src/kanta/core/main.py:7:111
  |
5 | import numpy as np
6 | from datetime import datetime
7 | from utils import file_exists,log_levels,configure_logging,make_sure_path_exists,progressBar,batched,mapcount,read_map,estimate_lines,…
  |                                                                                                               ^^^^^^^^
8 | from magic_config import config
9 | from datetime import datetime
  |
help: Remove unused import: `utils.read_map`

[F811](https://docs.astral.sh/ruff/rules/redefined-while-unused) [*] Redefinition of unused `datetime` from line 6
  --> src/kanta/core/main.py:6:22
   |
 4 | import multiprocessing as mp
 5 | import numpy as np
 6 | from datetime import datetime
   |                      -------- previous definition of `datetime` here
 7 | from utils import file_exists,log_levels,configure_logging,make_sure_path_exists,progressBar,batched,mapcount,read_map,estimate_lines,…
 8 | from magic_config import config
 9 | from datetime import datetime
   |                      ^^^^^^^^ `datetime` redefined here
10 | from filters.extract import extract_all
11 | from filters.qc import qc
   |
help: Remove definition: `datetime`

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/core/main.py:196:15
    |
194 |     args.chunk_size = max(args.chunk_size,args.mp)
195 |     args.out_file = os.path.join(args.out,f"{args.prefix}.txt")
196 |     if args.gz: args.out_file += ".gz"
    |               ^
197 |
198 |     # Setup pandas
    |

[E401](https://docs.astral.sh/ruff/rules/multiple-imports-on-one-line) [*] Multiple imports on one line
 --> src/kanta/core/utils.py:1:1
  |
1 | import os,logging,sys,errno,gzip,mmap,math
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 | from itertools import islice,zip_longest
3 | from collections import defaultdict as dd
  |
help: Split imports

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `itertools.zip_longest` imported but unused
 --> src/kanta/core/utils.py:2:30
  |
1 | import os,logging,sys,errno,gzip,mmap,math
2 | from itertools import islice,zip_longest
  |                              ^^^^^^^^^^^
3 | from collections import defaultdict as dd
4 | from functools import partial
  |
help: Remove unused import: `itertools.zip_longest`

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `urllib.request` imported but unused
 --> src/kanta/core/utils.py:6:8
  |
4 | from functools import partial
5 | import pandas as pd
6 | import urllib.request
  |        ^^^^^^^^^^^^^^
7 | import http.client as httplib
8 | from pathlib import Path
  |
help: Remove unused import: `urllib.request`

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/core/utils.py:101:41
    |
 99 |     # setup error file
100 |     args.err_file = os.path.join(args.out,f"{args.prefix}_err.txt")
101 |     with open(args.err_file,'wt') as err:err.write('\t'.join(args.config['err_cols']) + '\n')
    |                                         ^
102 |     args.warn_file = os.path.join(args.out,f"{args.prefix}_warn.txt")
103 |     with open(args.warn_file,'wt') as warn:warn.write('\t'.join(args.config['err_cols']) + '\n')
    |

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/core/utils.py:103:43
    |
101 |     with open(args.err_file,'wt') as err:err.write('\t'.join(args.config['err_cols']) + '\n')
102 |     args.warn_file = os.path.join(args.out,f"{args.prefix}_warn.txt")
103 |     with open(args.warn_file,'wt') as warn:warn.write('\t'.join(args.config['err_cols']) + '\n')
    |                                           ^
    |

[E722](https://docs.astral.sh/ruff/rules/bare-except) Do not use bare `except`
   --> src/kanta/core/utils.py:123:5
    |
121 |     try:
122 |         return count_lines(filename)
123 |     except:
    |     ^^^^^^
124 |         return 0
    |

[E741](https://docs.astral.sh/ruff/rules/ambiguous-variable-name) Ambiguous variable name: `l`
   --> src/kanta/core/utils.py:145:24
    |
143 |         else:
144 |             with gzip.open(f, 'rb') as f:
145 |                 for i, l in enumerate(f):pass
    |                        ^
146 |             note = 'exact'
147 |             size = i+1
    |

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/core/utils.py:145:41
    |
143 |         else:
144 |             with gzip.open(f, 'rb') as f:
145 |                 for i, l in enumerate(f):pass
    |                                         ^
146 |             note = 'exact'
147 |             size = i+1
    |

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `re` imported but unused
 --> src/kanta/finngen_qc/filters/filter_minimal.py:3:8
  |
1 | import pandas as pd
2 | import numpy as np
3 | import re
  |        ^^
  |
help: Remove unused import: `re`

[F841](https://docs.astral.sh/ruff/rules/unused-variable) Local variable `values` is assigned to but never used
  --> src/kanta/finngen_qc/filters/filter_minimal.py:42:5
   |
41 |     # replace problematic characters in abbrevation (strange minus sign)
42 |     values = args.config['abbreviation_replacements']
   |     ^^^^^^
43 |     abb_df = df[['ROW_ID', 'APPROX_EVENT_DATETIME','TEST_NAME_ABBREVIATION','MEASUREMENT_UNIT']].copy()
44 |     for rep in args.config['abbreviation_replacements']:
   |
help: Remove assignment to unused variable `values`

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `pandas` imported but unused
 --> src/kanta/finngen_qc/filters/fix_unit.py:1:18
  |
1 | import pandas as pd
  |                  ^^
2 | import re
3 | import numpy as np
  |
help: Remove unused import: `pandas`

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `numpy` imported but unused
 --> src/kanta/finngen_qc/filters/fix_unit.py:3:17
  |
1 | import pandas as pd
2 | import re
3 | import numpy as np
  |                 ^^
  |
help: Remove unused import: `numpy`

[E401](https://docs.astral.sh/ruff/rules/multiple-imports-on-one-line) [*] Multiple imports on one line
 --> src/kanta/finngen_qc/main.py:2:1
  |
1 | import pandas as pd
2 | import argparse,logging,os
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^
3 | from functools import partial
4 | import multiprocessing as mp
  |
help: Split imports

[F811](https://docs.astral.sh/ruff/rules/redefined-while-unused) [*] Redefinition of unused `datetime` from line 6
  --> src/kanta/finngen_qc/main.py:6:22
   |
 4 | import multiprocessing as mp
 5 | import numpy as np
 6 | from datetime import datetime
   |                      -------- previous definition of `datetime` here
 7 | from utils import file_exists,log_levels,configure_logging,make_sure_path_exists,progressBar,batched,mapcount,read_map,estimate_lines,…
 8 | from magic_config import config
 9 | from datetime import datetime
   |                      ^^^^^^^^ `datetime` redefined here
10 | from filters.filter_minimal import filter_minimal
11 | from filters.fix_unit import unit_fixing
   |
help: Remove definition: `datetime`

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/finngen_qc/main.py:202:15
    |
200 |     args.chunk_size = max(args.chunk_size,args.mp)
201 |     args.out_file = os.path.join(args.out,f"{args.prefix}_munged.txt")
202 |     if args.gz: args.out_file += ".gz"
    |               ^
203 |
204 |     # Setup pandas
    |

[E401](https://docs.astral.sh/ruff/rules/multiple-imports-on-one-line) [*] Multiple imports on one line
 --> src/kanta/finngen_qc/test/edit_test.py:1:1
  |
1 | import sys,os,random,string
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 | import pandas as pd
3 | import numpy as np
  |
help: Split imports

[E401](https://docs.astral.sh/ruff/rules/multiple-imports-on-one-line) [*] Multiple imports on one line
 --> src/kanta/finngen_qc/utils.py:1:1
  |
1 | import os,logging,sys,errno,gzip,mmap,math
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2 | from itertools import islice,zip_longest
3 | from collections import defaultdict as dd
  |
help: Split imports

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `itertools.zip_longest` imported but unused
 --> src/kanta/finngen_qc/utils.py:2:30
  |
1 | import os,logging,sys,errno,gzip,mmap,math
2 | from itertools import islice,zip_longest
  |                              ^^^^^^^^^^^
3 | from collections import defaultdict as dd
4 | from functools import partial
  |
help: Remove unused import: `itertools.zip_longest`

[F401](https://docs.astral.sh/ruff/rules/unused-import) [*] `http.client` imported but unused
 --> src/kanta/finngen_qc/utils.py:7:23
  |
5 | import pandas as pd
6 | import urllib.request
7 | import http.client as httplib
  |                       ^^^^^^^
8 | dir_path = os.path.dirname(os.path.realpath(__file__))
  |
help: Remove unused import: `http.client`

[E712](https://docs.astral.sh/ruff/rules/true-false-comparison) Avoid equality comparisons to `True`; use `...:` for truth checks
  --> src/kanta/finngen_qc/utils.py:41:60
   |
39 |     #PROCESSING OF INPUT TABLES TO FIX UNITS, FILTER VALUES ETC
40 |     assert args.config['usagi_units']['ADD_INFO:UniqueForLab'].dtype=='bool'
41 |     args.config['usagi_units'] =args.config['usagi_units'][args.config['usagi_units']['ADD_INFO:UniqueForLab'] == True]
   |                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
42 |     args.config['unit_conversion']= args.config['unit_conversion'].rename(columns={'source_unit_valid':'MEASUREMENT_UNIT'})
43 |     args.config['unit_conversion']['only_to_omop_concepts']= args.config['unit_conversion']['only_to_omop_concepts'].astype("Int64")
   |
help: Replace comparison

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/finngen_qc/utils.py:127:41
    |
125 |     # setup error file
126 |     args.err_file = os.path.join(args.out,f"{args.prefix}_err.txt")
127 |     with open(args.err_file,'wt') as err:err.write('\t'.join(args.config['err_cols']) + '\n')
    |                                         ^
128 |     args.warn_file = os.path.join(args.out,f"{args.prefix}_warn.txt")
129 |     with open(args.warn_file,'wt') as warn:warn.write('\t'.join(args.config['err_cols']) + '\n')
    |

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/finngen_qc/utils.py:129:43
    |
127 |     with open(args.err_file,'wt') as err:err.write('\t'.join(args.config['err_cols']) + '\n')
128 |     args.warn_file = os.path.join(args.out,f"{args.prefix}_warn.txt")
129 |     with open(args.warn_file,'wt') as warn:warn.write('\t'.join(args.config['err_cols']) + '\n')
    |                                           ^
130 |
131 |     args.unit_file = os.path.join(args.out,f"{args.prefix}_unit.txt")
    |

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/finngen_qc/utils.py:132:43
    |
131 |     args.unit_file = os.path.join(args.out,f"{args.prefix}_unit.txt")
132 |     with open(args.unit_file,'wt') as unit:unit.write('\t'.join(['ROW_ID','TEST_DATE_TIME','TEST_NAME_ABBREVIATION','old_unit','MEASU…
    |                                           ^
133 |
134 |     args.abbr_file = os.path.join(args.out,f"{args.prefix}_abbr.txt")
    |

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/finngen_qc/utils.py:135:43
    |
134 |     args.abbr_file = os.path.join(args.out,f"{args.prefix}_abbr.txt")
135 |     with open(args.abbr_file,'wt') as abbr:abbr.write('\t'.join(['ROW_ID','TEST_DATE_TIME','old_abbr','MEASUREMENT_UNIT','TEST_NAME_A…
    |                                           ^
136 |
137 | def mapcount(filename):
    |

[E722](https://docs.astral.sh/ruff/rules/bare-except) Do not use bare `except`
   --> src/kanta/finngen_qc/utils.py:143:5
    |
141 |     try:
142 |         return count_lines(filename)
143 |     except:
    |     ^^^^^^
144 |         return 0
    |

[E741](https://docs.astral.sh/ruff/rules/ambiguous-variable-name) Ambiguous variable name: `l`
   --> src/kanta/finngen_qc/utils.py:165:24
    |
163 |         else:
164 |             with gzip.open(f, 'rb') as f:
165 |                 for i, l in enumerate(f):pass
    |                        ^
166 |             note = 'exact'
167 |             size = i+1
    |

[E701](https://docs.astral.sh/ruff/rules/multiple-statements-on-one-line-colon) Multiple statements on one line (colon)
   --> src/kanta/finngen_qc/utils.py:165:41
    |
163 |         else:
164 |             with gzip.open(f, 'rb') as f:
165 |                 for i, l in enumerate(f):pass
    |                                         ^
166 |             note = 'exact'
167 |             size = i+1
    |

Found 42 errors.
[*] 20 fixable with the `--fix` option (6 hidden fixes can be enabled with the `--unsafe-fixes` option).
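
The diagnostics above without a `[*]` marker are not covered by plain `--fix`. As an illustration, here is a minimal sketch of how the line-counting pattern flagged in `utils.py` (E701, E722, E741) might be rewritten by hand. The function names `count_gzip_lines` and `safe_count` are made up for this example, and the exception types caught are an assumption, since the full source is not shown here.

```python
import gzip


def count_gzip_lines(path):
    """Count the lines in a gzip-compressed file.

    Replaces the `for i, l in enumerate(f):pass` loop, so there is no
    ambiguous name `l` (E741) and no statement after the colon (E701).
    """
    with gzip.open(path, "rb") as handle:
        return sum(1 for _ in handle)


def safe_count(path):
    """Return the line count, or 0 if the file cannot be read.

    Catches specific exceptions instead of a bare `except` (E722).
    Assumption: I/O errors (OSError) and truncated archives (EOFError)
    are the failures worth swallowing; anything else should surface.
    """
    try:
        return count_gzip_lines(path)
    except (OSError, EOFError):
        return 0
```

Note that this version returns 0 for an empty file, whereas a `size = i+1` after an empty loop would fail because `i` is never assigned. Whether that change in behaviour is wanted would need to be checked against the real code.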
