New events, copilot user, and agent support for python 3 #56

Open
Leo-Send wants to merge 32 commits into se-sic:python3-version from Leo-Send:combined_p3

Leo-Send commented Jun 9, 2026

Contributor

This is a rebase of my previous PR onto the python3-version branch.

Leo-Send added 29 commits June 9, 2026 12:54
This allows for reconstruction of correct commit author if user is
github

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also added one comment for clarity

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also save merge commits
reconstruction of connected events is done by first saving all connected
events that occurred at the same time. Then, it is possible to match
connected events iff:
- half of the involved issues are equal, meaning that one issue is
  connected to multiple others
- half rounded up of the involved issues are equal, meaning that we have
  one external connected event and then the previous case with the
  remaining issues

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
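
One possible reading of the matching rule in the commit message above, as a minimal sketch. The function name `match_connected_events`, its input (the issue number of every `connected` event sharing one timestamp), and the returned tuple are assumptions made for illustration. This is not the code in `issue_processing.py`.

```python
from collections import Counter


def match_connected_events(issue_numbers):
    """Try to pair up 'connected' events that share one timestamp.

    issue_numbers holds one entry per event: the issue on which it occurred.
    Returns (hub, partners, external_count), or None if no match is possible.
    """
    if not issue_numbers:
        return None
    total = len(issue_numbers)
    # The candidate "hub" is the issue that occurs most often.
    hub, hub_count = Counter(issue_numbers).most_common(1)[0]
    # Half of all events, rounded up, using integer arithmetic only.
    needed = (total + 1) // 2
    if hub_count != needed:
        return None
    partners = [number for number in issue_numbers if number != hub]
    # Assumption: each partner issue contributes exactly one event.
    if len(set(partners)) != len(partners):
        return None
    # An odd total leaves one hub event without a partner in this repository.
    external_count = hub_count - len(partners)
    return hub, partners, external_count
```

With an even number of events the hub accounts for exactly half of them. With an odd number it accounts for half rounded up, and the surplus event is treated as the external connection.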
since data is modified in-place, return of input data is not needed

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Also add commit hash if closed by commit

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also rename 'new feature' to 'feature'

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also remove duplicates from type list

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
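
The two type-list changes above (renaming 'new feature' to 'feature' and removing duplicates) could look roughly like the sketch below. The helper name `normalize_issue_types` and the lower-casing of every type are assumptions for illustration, not taken from the PR.

```python
def normalize_issue_types(types):
    """Rename 'new feature' to 'feature' and drop duplicates, keeping order."""
    normalized = []
    for issue_type in types:
        # Assumption: types are compared case-insensitively.
        issue_type = issue_type.strip().lower()
        if issue_type == "new feature":
            issue_type = "feature"
        if issue_type not in normalized:
            normalized.append(issue_type)
    return normalized
```

Renaming before the duplicate check matters: a list holding both 'new feature' and 'feature' collapses to a single 'feature' entry.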
using empty line reserved for jira components

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also added copyright header

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also minor fixes and removal of math.ceil

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
comments now each have a boolean field that describes whether the
comment contains a suggestion or not

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
dicts for reconstructing connected events are now better explained and
the comments do not disrupt the workflow in the run function anymore

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
includes:
- updated comments
- spelling mistake
- fix for potential crash if script is used on old data

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
author postprocessing now also contains a list of known copilot user
names that can be extended to unify more different copilot users

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
the events 'copilot_work_started' and 'copilot_work_finished' now always
have the standard copilot user data

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Method doc updated to reflect new functionality

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
previously, the creator of the issues was falsely matched to the
connected event instead of the user triggering the event

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
unification now done on all files, which should prevent any issues
arising from unknown authors during anonymization
also move all global variables to a new utils file

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Known agents such as 'copilot' or 'claude' can now be read, similar to
known bots. They will be flagged as agents during bot processing.

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Add a helper function for creating bot name variants utilizing either
'[bot]' or 'bot' suffix. Also update bot processing to check user buffer
for all variants.

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Add a helper function that given a botname and a list of names, returns
which bot name variant is contained in the list (or None). This is used
whenever we check if a known bot is in the userdata or has been
predicted to be a bot, and means that botnames in the known_bots file do
not need to be duplicated for each variant.
Also, automatically add all known copilot users to the known_agents
list, and then unify those during author postprocessing.

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
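
A minimal sketch of the two bot-name helpers described in the commit messages above. The names `bot_name_variants` and `find_bot_variant` are hypothetical, and including the bare name as a variant is an assumption. The actual helpers in `github_user_utils` may differ.

```python
def bot_name_variants(bot_name):
    """Return the bot name without suffix, with '[bot]', and with 'bot'."""
    # Strip an existing '[bot]' suffix so variants are built from the bare name.
    if bot_name.endswith("[bot]"):
        base = bot_name[: -len("[bot]")]
    else:
        base = bot_name
    return [base, base + "[bot]", base + "bot"]


def find_bot_variant(bot_name, names):
    """Return the first variant of bot_name contained in names, or None."""
    known = set(names)
    for variant in bot_name_variants(bot_name):
        if variant in known:
            return variant
    return None
```

With a lookup of this kind, a known_bots file needs only one entry per bot, as the commit message notes.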
also add agents to bot handling, fix formatting for event_info_2 and
subissues
also fix a typo where strings would not have their quotes correctly
removed

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
lock reason is saved in event_info_1

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
docstrings should now more accurately reflect parameters and return
values

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
For consistency with github events

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
previously removed event_info_2 for state_updated event, leading to
crashes of the issue processing. Now, it instead contains an empty
string.
Also fix a minor spelling mistake

Signed-off-by: <s8lesend@stud.uni-saarland.de>
event_info_1 should remain empty in that case

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
These fields are now replaced with empty lists when null

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
changing .keys() call on maps after rebase onto python 3 branch

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
bockthom requested a review from Copilot June 9, 2026 13:29

Copilot AI left a comment

Pull request overview

This PR rebases prior work onto the python3-version branch and extends the issue/bot/author pipelines to better support new GitHub event types (incl. Copilot), agent classification, and additional cross-issue connection handling.

Changes:

  • Extend GitHub issue event processing with Copilot-specific handling, connected-event reconstruction, and additional event metadata (e.g., merge commit hash, commit author for commit_added).
  • Introduce shared GitHub/Copilot user utilities (github_user_utils) and integrate them into bot processing and author postprocessing (incl. optional Copilot user unification).
  • Update both GitHub and JIRA issue processing to normalize issue types (e.g., “New Feature” → “Feature”) and adjust in-place processing patterns (stop relying on returned mutated lists).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| issue_processing/jira_issue_processing.py | Normalizes JIRA issue types and adjusts user-data insertion/state event generation. |
| issue_processing/issue_processing.py | Adds Copilot/connection reconstruction logic, extends event metadata, and updates user syncing/output fields. |
| github_user_utils/github_user_utils.py | New shared constants/helpers for GitHub/Copilot user and bot/agent name handling. |
| github_user_utils/__init__.py | Initializes the new github_user_utils package. |
| bot_processing/bot_processing.py | Adds agent list support and Copilot-agent handling; improves bot-name variant matching. |
| author_postprocessing/author_postprocessing.py | Adds Copilot unification option and extends “GitHub noreply” replacement logic to cover Copilot and commit_added info. |

Comments suppressed due to low confidence (1)

author_postprocessing/author_postprocessing.py:116

  • Inside fix_github_browser_commits(), author_data_new is only defined when authors.list exists in the current directory. If issues-github.list is encountered without authors.list (e.g., partial outputs), this will raise UnboundLocalError when building author_name_to_data. Initializing author_data_new per directory avoids that failure mode.
    # Check for all files in the result directory of the project whether they need to be adjusted
    for filepath, _, filenames in walk(data_path):
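
A sketch of the per-directory initialization suggested in this comment. The function name `build_author_lookup`, the semicolon-separated line format, and the name sitting in the second column are assumptions for illustration. The real `fix_github_browser_commits()` does more than this.

```python
from os import walk
from os.path import join


def build_author_lookup(data_path):
    """Map each directory containing issues-github.list to a name lookup."""
    lookups = {}
    for filepath, _, filenames in walk(data_path):
        # Reset per directory so a missing authors.list cannot leave the
        # variable undefined or carry over data from a previous directory.
        author_data_new = []
        if "authors.list" in filenames:
            with open(join(filepath, "authors.list"), encoding="utf-8") as handle:
                # Assumed format: one author per line, fields separated by ';'.
                author_data_new = [
                    line.rstrip("\n").split(";") for line in handle if line.strip()
                ]
        if "issues-github.list" in filenames:
            # Assumed: the author name is the second field of each entry.
            author_name_to_data = {
                entry[1]: entry for entry in author_data_new if len(entry) > 1
            }
            lookups[filepath] = author_name_to_data
    return lookups
```

A directory that holds `issues-github.list` but no `authors.list` then yields an empty lookup instead of raising `UnboundLocalError`.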


iteritems() does not work in python3, instead use items()

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
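
For reference, a small illustration of the dictionary change behind this commit. The dictionary `event_counts` is made up for the example.

```python
event_counts = {"connected": 2, "locked": 1}

# Python 2 offered event_counts.iteritems(); Python 3 removed it.
# items() returns a view that can be iterated directly.
pairs = sorted(event_counts.items())

# keys() also returns a view in Python 3; wrap it in list() before indexing.
keys = list(event_counts.keys())
```

Views reflect later changes to the dictionary, so code that modifies the dictionary while looping should iterate over a copy such as `list(event_counts.items())`.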
Leo-Send commented Jun 9, 2026

Contributor Author

I fixed the relevant suggestions. This PR should be ready to be tested now.

bockthom added 2 commits June 13, 2026 17:54
…ssing

Signed-off-by: Thomas Bock <bockthom@cmu.edu>
As strings are already utf-8 encoded, don't convert them to utf-8 encoded
strings any more.

Signed-off-by: Thomas Bock <bockthom@cmu.edu>
@bockthom

Collaborator

@Leo-Send Thanks for creating this pull request. When I ran all python3 scripts of this repo, I had to fix a few missing imports in all files, and there was also one encoding problem that I had to fix. Therefore, I pushed two commits to your branch that contain these fixes. (So please pull before you continue working on this.)

There is one little issue remaining where I need your help:

username.list does not contain the right name anymore. Here is the diff, using a single username as an example:

Python2:
"zzy19970428";"zzy19970428";"could.not.resolve@a66tdd22mt"
Python3:
"zzy19970428";"None";"could.not.resolve@dfrfg3s1ti"

So the difference is: if there is no name, python2 used the username as the name and created a random e-mail address. In python3, however, a missing name becomes the string "None", and all such users are then merged in the database because they share the name "None". @Leo-Send Could you please figure out where this breaks? The usernames list should be created in issue_processing.py (and it might also be updated in bot_processing.py). If we don't have a name, we should use the username as the name, not None.

Thank you!

Everything else of the python3 version works well - I did not spot any differences in the outcomes compared to the python2 version except for this username problem.
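
The username fallback requested above could be sketched as follows. The helper name `resolve_name` is hypothetical, and where the `None` value actually gets turned into a string in `issue_processing.py` is not established in this thread.

```python
def resolve_name(username, name):
    """Return the real name, or the username if no name is available."""
    # Treat None and empty or whitespace-only names as missing.
    if name is None or not name.strip():
        return username
    return name


# Formatting a missing name directly yields the literal text "None",
# which is one way the merged "None" users could arise.
unguarded = "{}".format(None)
```

Applying the fallback before the name is written to the list keeps users without a name distinct from each other.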
