Skip to content

#746 Remove CHAR(n) support from data types supported when creting Hive tables.#747

Merged
yruslan merged 2 commits into
mainfrom
bugfix/746-remove-char-support-from-hive
May 18, 2026
Merged

#746 Remove CHAR(n) support from data types supported when creting Hive tables.#747
yruslan merged 2 commits into
mainfrom
bugfix/746-remove-char-support-from-hive

Conversation

@yruslan
Copy link
Copy Markdown
Collaborator

@yruslan yruslan commented May 18, 2026

Closes #746

Summary by CodeRabbit

  • Bug Fixes
    • Corrected CHAR data type handling when importing database schemas from JDBC connections—CHAR fields are now properly mapped as VARCHAR types to ensure consistent behavior throughout the system
    • Enhanced string field length metadata detection and application with improved validation thresholds to more accurately capture database schema constraints
    • Added dedicated support for UUID field types with proper length validation handling and metadata extraction

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Warning

Rate limit exceeded

@yruslan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 21 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b190d0d-ad08-4982-88e8-5e44c7e090fc

📥 Commits

Reviewing files that changed from the base of the PR and between fd4d604 and c5efad1.

📒 Files selected for processing (1)
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/SparkUtils.scala

Walkthrough

This PR removes CHAR(n) support from JDBC-sourced Spark metadata and standardizes on VARCHAR(n). String-length constants are updated to increase the maximum from 255 to 8192 bytes, type resolution logic is modified to map CHAR metadata to VARCHAR, imports are adjusted, and tests are updated to validate the new mapping behavior.

Changes

CHAR to VARCHAR Migration

Layer / File(s) Summary
String-length constant definitions
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcSparkUtils.scala
MAXIMUM_CHAR_LENGTH (255) is removed; MAXIMUM_VARCHAR_LENGTH (8192) and MAXIMUM_UUID_LENGTH (50) are introduced.
CHAR to VARCHAR type resolution
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/SparkUtils.scala, pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/SparkUtilsSuite.scala
SparkUtils import is updated to reference MAXIMUM_VARCHAR_LENGTH. getStringTypeFromMetadata no longer maps CHAR(len) metadata to CharType; instead, it maps to VarcharType(len). Test assertion for CHAR(12) metadata is updated to expect VarcharType(12).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 No more CHAR constraints to fear,
VARCHAR opens spaces clear!
From 255 to 8192 we soar,
Hive's limits plague us nevermore!
A humble hop toward compatibility fair.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main change: removing CHAR(n) support and replacing it with VARCHAR(n) for Hive table creation, which aligns with all file modifications.
Linked Issues check ✅ Passed The pull request successfully implements the objective from issue #746 by removing CHAR(n) support, updating constants to use VARCHAR(n) with 8192 limit, and modifying metadata handling to map CHAR to VARCHAR instead of CharType.
Out of Scope Changes check ✅ Passed All changes are directly related to the objective of removing CHAR(n) support from Hive table creation. No unrelated modifications were introduced.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/746-remove-char-support-from-hive

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/SparkUtilsSuite.scala (1)

706-713: ⚡ Quick win

Update test name to reflect VARCHAR behavior.

The test name says "return char type from CHAR_VARCHAR_METADATA_KEY with char" but the test now expects VarcharType(12) instead of CharType(12). Update the test name to accurately describe that CHAR metadata is now mapped to VarcharType.

📝 Proposed test name fix
-    "return char type from CHAR_VARCHAR_METADATA_KEY with char" in {
+    "return varchar type from CHAR_VARCHAR_METADATA_KEY with char" in {
       val metadata = new MetadataBuilder
       metadata.putString(CHAR_VARCHAR_METADATA_KEY, "CHAR(12)")
 
       val stringType = SparkUtils.getStringTypeFromMetadata(metadata.build())
 
       assert(stringType == VarcharType(12))
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/SparkUtilsSuite.scala`
around lines 706 - 713, Rename the failing test's description in
SparkUtilsSuite: change the test name string that currently reads "return char
type from CHAR_VARCHAR_METADATA_KEY with char" to something that reflects
VARCHAR mapping (e.g., "return varchar type from CHAR_VARCHAR_METADATA_KEY with
char") so it matches the assertion that
SparkUtils.getStringTypeFromMetadata(metadata.build()) returns VarcharType(12);
reference the test in SparkUtilsSuite and the CHAR_VARCHAR_METADATA_KEY constant
to locate and update the description only.
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/SparkUtils.scala (1)

420-431: ⚡ Quick win

Update documentation to reflect removal of CharType support.

The method documentation still mentions CharType as a possible return type, but the implementation now only returns VarcharType or StringType (never CharType). Update the doc comment to remove references to CharType.

📝 Proposed documentation fix
   /**
-    * Extracts a string-based data type (CharType, VarcharType, or StringType) from field metadata.
+    * Extracts a string-based data type (VarcharType or StringType) from field metadata.
     *
     * First checks for the presence of character/varchar type metadata key. If found, parses the metadata
-    * value using a pattern to determine if it represents a CHAR or VARCHAR type with a specific length.
+    * value using a pattern to extract the length. Both CHAR and VARCHAR metadata are mapped to VarcharType.
     * If the metadata key is not present, attempts to extract a length value from metadata and creates
     * a VarcharType if successful. Falls back to StringType if no specific type information is found
     * or if the metadata cannot be parsed.
     *
     * `@param` metadata the metadata object containing type information for a field
-    * `@return` the resolved string-based DataType, which can be CharType, VarcharType, or StringType
+    * `@return` the resolved string-based DataType, which can be VarcharType or StringType
     */
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pramen/core/src/main/scala/za/co/absa/pramen/core/utils/SparkUtils.scala`
around lines 420 - 431, Update the ScalaDoc for the string-type extractor in
SparkUtils.scala to remove references to CharType and reflect current behavior:
state that the method (e.g., extractStringBasedDataType) returns either
VarcharType or StringType only, describe how metadata keys are parsed to produce
a VarcharType with length or fall back to StringType, and remove any mention
that CharType may be returned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pramen/core/src/main/scala/za/co/absa/pramen/core/utils/SparkUtils.scala`:
- Around line 420-431: Update the ScalaDoc for the string-type extractor in
SparkUtils.scala to remove references to CharType and reflect current behavior:
state that the method (e.g., extractStringBasedDataType) returns either
VarcharType or StringType only, describe how metadata keys are parsed to produce
a VarcharType with length or fall back to StringType, and remove any mention
that CharType may be returned.

In
`@pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/SparkUtilsSuite.scala`:
- Around line 706-713: Rename the failing test's description in SparkUtilsSuite:
change the test name string that currently reads "return char type from
CHAR_VARCHAR_METADATA_KEY with char" to something that reflects VARCHAR mapping
(e.g., "return varchar type from CHAR_VARCHAR_METADATA_KEY with char") so it
matches the assertion that
SparkUtils.getStringTypeFromMetadata(metadata.build()) returns VarcharType(12);
reference the test in SparkUtilsSuite and the CHAR_VARCHAR_METADATA_KEY constant
to locate and update the description only.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 554d823d-8ad1-4017-8dec-05e5a93f3a5b

📥 Commits

Reviewing files that changed from the base of the PR and between 3e3f035 and fd4d604.

📒 Files selected for processing (3)
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcSparkUtils.scala
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/SparkUtils.scala
  • pramen/core/src/test/scala/za/co/absa/pramen/core/tests/utils/SparkUtilsSuite.scala
💤 Files with no reviewable changes (1)
  • pramen/core/src/main/scala/za/co/absa/pramen/core/utils/JdbcSparkUtils.scala

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 18, 2026

Unit Test Coverage

Overall Project 77.26% 🍏
Module Coverage
pramen:core Jacoco Report 78.3% 🍏
Files
Module File Coverage
pramen:core Jacoco Report JdbcSparkUtils.scala 91.75% 🍏
SparkUtils.scala 87.08% 🍏

@yruslan yruslan merged commit a4cc52d into main May 18, 2026
7 checks passed
@yruslan yruslan deleted the bugfix/746-remove-char-support-from-hive branch May 18, 2026 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove support of CHAR(n) for Hive in favor of VARCHAR(n)

1 participant