New events, copilot user, and agent support for python 3 #56
This allows for reconstruction of correct commit author if user is github Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also added one comment for clarity Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also save merge commits
reconstruction of connected events is done by first saving all connected events that occurred at the same time. Then, it is possible to match connected events iff:
- half of the involved issues are equal, meaning that one issue is connected to multiple others
- half (rounded up) of the involved issues are equal, meaning that we have one external connected event and then the previous case with the remaining issues
Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
since data is modified in-place, return of input data is not needed Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Also add commit hash if closed by commit Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also rename 'new feature' to 'feature' Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also remove duplicates from type list Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
using empty line reserved for jira components Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also added copyright header Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also minor fixes and removal of math.ceil Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
comments now each have a boolean field that describes whether the comment contains a suggestion or not Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
dicts for reconstructing connected events are now better explained and the comments do not disrupt the workflow in the run function anymore Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
includes: - updated comments - spelling mistake - fix for potential crash if script is used on old data Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
author postprocessing now also contains a list of known copilot user names that can be extended to unify more different copilot users Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
the events 'copilot_work_started' and 'copilot_work_finished' now always have the standard copilot user data Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Method doc updated to reflect new functionality Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
previously, the creator of the issues was falsely matched to the connected event instead of the user triggering the event Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
unification now done on all files, which should prevent any issues arising from unknown authors during anonymization also move all global variables to a new utils file Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Known agents such as 'copilot' or 'claude' can now be read, similar to known bots. They will be flagged as agents during bot processing. Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Add a helper function for creating bot name variants utilizing either '[bot]' or 'bot' suffix. Also update bot processing to check user buffer for all variants. Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Add a helper function that, given a bot name and a list of names, returns which bot name variant is contained in the list (or None). This is used whenever we check if a known bot is in the userdata or has been predicted to be a bot, and means that bot names in the known_bots file do not need to be duplicated for each variant. Also, automatically add all known copilot users to the known_agents list, and then unify those during author postprocessing. Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
also add agents to bot handling, fix formatting for event_info_2 and subissues also fix a typo where strings would not have their quotes correctly removed Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
lock reason is saved in event_info_1 Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
docstrings should now more accurately reflect parameters and return values Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
For consistency with github events Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
previously removed event_info_2 for state_updated event, leading to crashes of the issue processing. Now, it instead contains an empty string. Also fix a minor spelling mistake Signed-off-by: <s8lesend@stud.uni-saarland.de>
event_info_1 should remain empty in that case Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
These fields are now replaced with empty lists when null Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
changing .keys() call on maps after rebase onto python 3 branch Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
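The bot-name variant handling described in the commits above could look roughly like the following. This is only a sketch: the function names `bot_name_variants` and `find_bot_variant` are hypothetical, and the exact rule for building variants (here: bare name, name plus `[bot]`, name plus `bot`) is an assumption, not the code from this PR.

```python
def bot_name_variants(bot_name):
    """Return the spellings under which a known bot might appear (assumed rule)."""
    suffix = "[bot]"
    # Assumption: an entry may already carry the '[bot]' suffix; strip it to get a base name.
    base = bot_name[:-len(suffix)] if bot_name.endswith(suffix) else bot_name
    return [base, base + "[bot]", base + "bot"]


def find_bot_variant(bot_name, names):
    """Return the first variant of bot_name contained in names, or None."""
    known = set(names)
    for variant in bot_name_variants(bot_name):
        if variant in known:
            return variant
    return None


# Usage: one entry in the known-bots list matches either spelling in the user data.
users = ["alice", "dependabot[bot]"]
match = find_bot_variant("dependabot", users)
```

With a lookup of this kind, each bot needs only one entry in the known-bots file, whichever suffix the user data happens to use.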
Pull request overview
This PR rebases prior work onto the python3-version branch and extends the issue/bot/author pipelines to better support new GitHub event types (incl. Copilot), agent classification, and additional cross-issue connection handling.
Changes:
- Extend GitHub issue event processing with Copilot-specific handling, connected-event reconstruction, and additional event metadata (e.g., merge commit hash, commit author for `commit_added`).
- Introduce shared GitHub/Copilot user utilities (`github_user_utils`) and integrate them into bot processing and author postprocessing (incl. optional Copilot user unification).
- Update both GitHub and JIRA issue processing to normalize issue types (e.g., “New Feature” → “Feature”) and adjust in-place processing patterns (stop relying on returned mutated lists).
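As an illustration of the issue-type normalization mentioned in the last point, here is a minimal sketch. The helper name `normalize_issue_types` is hypothetical, and lower-casing the types before comparison is an assumption; only the "new feature" to "feature" rename and the duplicate removal come from the commit messages.

```python
def normalize_issue_types(types):
    """Map 'new feature' to 'feature' and drop duplicates, keeping first-seen order."""
    renames = {"new feature": "feature"}
    seen = set()
    result = []
    for issue_type in types:
        # Assumption: types are compared case-insensitively after trimming whitespace.
        key = issue_type.strip().lower()
        key = renames.get(key, key)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


normalized = normalize_issue_types(["New Feature", "bug", "feature", "Bug"])
```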
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| issue_processing/jira_issue_processing.py | Normalizes JIRA issue types and adjusts user-data insertion/state event generation. |
| issue_processing/issue_processing.py | Adds Copilot/connection reconstruction logic, extends event metadata, and updates user syncing/output fields. |
| github_user_utils/github_user_utils.py | New shared constants/helpers for GitHub/Copilot user and bot/agent name handling. |
| github_user_utils/__init__.py | Initializes the new github_user_utils package. |
| bot_processing/bot_processing.py | Adds agent list support and Copilot-agent handling; improves bot-name variant matching. |
| author_postprocessing/author_postprocessing.py | Adds Copilot unification option and extends “GitHub noreply” replacement logic to cover Copilot and commit_added info. |
Comments suppressed due to low confidence (1)
author_postprocessing/author_postprocessing.py:116
- Inside `fix_github_browser_commits()`, `author_data_new` is only defined when `authors.list` exists in the current directory. If `issues-github.list` is encountered without `authors.list` (e.g., partial outputs), this will raise `UnboundLocalError` when building `author_name_to_data`. Initializing `author_data_new` per directory avoids that failure mode.
# Check for all files in the result directory of the project whether they need to be adjusted
for filepath, _, filenames in walk(data_path):
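The suggested fix can be sketched as follows. This is not the actual body of `fix_github_browser_commits()`: the function name `collect_author_data`, the semicolon-separated file layout, and the assumption that the author name sits in the second column are all illustrative.

```python
import os
from os import walk


def collect_author_data(data_path):
    """Build a name-to-row mapping per directory that contains issues-github.list."""
    results = {}
    for filepath, _, filenames in walk(data_path):
        # Re-initialize per directory so the name is always bound,
        # even if authors.list is missing (e.g., partial outputs).
        author_data_new = []
        if "authors.list" in filenames:
            with open(os.path.join(filepath, "authors.list"), encoding="utf-8") as f:
                # Assumption: one author per line, fields separated by ';'.
                author_data_new = [line.rstrip("\n").split(";") for line in f if line.strip()]
        if "issues-github.list" in filenames:
            # Assumption: column 1 holds the author name.
            author_name_to_data = {row[1]: row for row in author_data_new if len(row) > 1}
            results[filepath] = author_name_to_data
    return results
```

A directory that holds only `issues-github.list` then yields an empty mapping instead of raising `UnboundLocalError`.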
iteritems() does not work in python3, instead use items() Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
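For reference, a small sketch of the Python 3 dictionary idioms this commit and the earlier `.keys()` change refer to. The dictionary contents are made up for illustration.

```python
event_counts = {"commit_added": 3, "merged": 1}

# Python 2's dict.iteritems() no longer exists in Python 3; items() returns a view instead.
total = 0
for event, count in event_counts.items():
    total += count

# keys() also returns a view in Python 3, so wrap it when a real list is needed
# (for example, for indexing or sorting).
event_names = sorted(event_counts.keys())
```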
I fixed the relevant suggestions. This PR should be ready to be tested now.
…ssing Signed-off-by: Thomas Bock <bockthom@cmu.edu>
As strings are already utf-8 encoded, don't convert them to utf-8 encoded strings any more. Signed-off-by: Thomas Bock <bockthom@cmu.edu>
@Leo-Send Thanks for creating this pull request. As I ran all python3 scripts of this repo, I had to fix a few missing imports in all files, and there was also one encoding problem that I had to fix. Therefore, I pushed two commits to your branch that contain these fixes. (So please pull before you continue working on this.) There is one little issue remaining where I need your help:
Python2: So the difference is: If there is no name, in python2, the username was used as the name, and a random e-mail address was created. However, in python3, if there is no name, the name becomes the string "None", and suddenly all users are merged in the database because they all share the name "None".
@Leo-Send Could you please figure out where this breaks? The usernames list should be created in
Thank you! Everything else in the python3 version works well. I did not spot any differences in the outcomes compared to the python2 version except for this username problem.
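One way this kind of bug can arise is when a missing name is passed through `str()`, which turns `None` into the literal string "None". Where exactly this happens in the pipeline is not stated here, so the following is only a sketch of the failure mode and a possible guard; the helper name `display_name` is hypothetical.

```python
def display_name(name, username):
    """Fall back to the username when no name is available (None or empty)."""
    # Guard against None/empty before any string conversion takes place.
    if not name:
        return username
    return str(name)


# The failure mode: converting a missing name to a string yields "None" for every such user.
broken = str(None)

# With the guard, users without a name stay distinct via their usernames.
names = [display_name(None, "octocat"), display_name(None, "hubot"), display_name("Mona", "mona")]
```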
This is a rebase of my previous PR onto the python3-version branch.