fix: validate fixed and decimal type sizes#1312
Conversation
bb7ee51 to
4874643
Compare
tanmayrauth
left a comment
There was a problem hiding this comment.
Nice tightening, anchoring the regexes is a real fix, and the validation bounds look right. One question about symmetry with the fixed-literal path.
| func (d DecimalLiteral) Type() Type { | ||
| precision := max(9, d.Scale) | ||
|
|
||
| return DecimalTypeOf(precision, d.Scale) |
There was a problem hiding this comment.
Good catch using max(9, d.Scale) here — without it the new precision/scale validation would panic for literals with scale > 9.
The same reasoning seems to apply to FixedLiteral.Type() at line 1122, which still does FixedTypeOf(len(f)). A zero-length fixed literal (e.g. the zero value, or UnmarshalBinary with empty data) now panics there instead of returning FixedType{0} as it used to. It's an unusual input so maybe not worth much, but since you guarded the decimal side, was leaving the fixed side intentional? A len(f) == 0 guard (or just documenting it) would keep the two paths consistent.
There was a problem hiding this comment.
Thank you for the comment. I have pushed the fix.
4874643 to
94ddf2f
Compare
zeroshade
left a comment
There was a problem hiding this comment.
Nice fix - the bounds match the Iceberg spec (decimal precision 1-38, scale 0..precision; fixed length >= 1), and validating both the exported constructors and the JSON parse path (plus anchoring the fixed[...]/decimal(...) regexes) is the right call. Requesting changes on one validation gap and one new panic edge; the third is just a doc nit. Details inline.
| return fmt.Errorf("%w: %s", ErrInvalidTypeString, typename) | ||
| } | ||
|
|
||
| n, _ := strconv.Atoi(matches[1]) |
There was a problem hiding this comment.
[Major] The strconv.Atoi error is dropped here, and validateFixedLength only rejects n <= 0. On overflow Atoi returns math.MaxInt together with an ErrRange, so fixed[99999999999999999999] is accepted as fixed[math.MaxInt] instead of being rejected - a validation bypass in the exact path this PR hardens. (The decimal branch below is incidentally safe because its > 38 upper bound catches the saturated value; fixed has no upper bound.) Suggest checking the error:
n, err := strconv.Atoi(matches[1])
if err != nil {
return fmt.Errorf("%w: %s: %w", ErrInvalidTypeString, typename, err)
}| // need the real column precision must consult the bound field's type rather | ||
| // than lit.Type(). See https://github.com/apache/iceberg-go/issues/1028. | ||
| func (d DecimalLiteral) Type() Type { | ||
| precision := max(9, d.Scale) |
There was a problem hiding this comment.
[Minor] max(9, d.Scale) guards the lower precision bound but not the upper one: if d.Scale > 38, precision becomes > 38 and the now-validating DecimalTypeOf panics - whereas the old code returned a placeholder without panicking. FixedLiteral.Type() is fully guarded by max(1, len(f)), so this is an asymmetry. If Scale can exceed 38, consider clamping (e.g. precision := min(maxDecimalPrecision, max(9, d.Scale))) or confirm scale is always <= 38.
|
|
||
| func (FixedLiteral) Comparator() Comparator[[]byte] { return bytes.Compare } | ||
| func (f FixedLiteral) Type() Type { return FixedTypeOf(len(f)) } | ||
| func (f FixedLiteral) Type() Type { return FixedTypeOf(max(1, len(f))) } |
There was a problem hiding this comment.
nit: unlike DecimalLiteral.Type() just below (which documents its placeholder precision), this max(1, len(f)) has no comment explaining why an empty FixedLiteral reports fixed[1] (to avoid the new FixedTypeOf panic on length 0). A one-line comment would help the next reader.
Summary
Why
FixedTypeOf and DecimalTypeOf accepted impossible values, and JSON parsing constructed those types directly. That let invalid schema types such as fixed[0], decimal(0, 0), or decimal(2, 5) enter metadata and fail later in Arrow, Parquet, or reader paths.
Testing