From 73d3cc8994fbaf1d1dec1a1539ca7793759a2c97 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 10 Mar 2021 18:02:01 -0500 Subject: [PATCH 01/19] Rationale --- README.md | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..9e5f856 --- /dev/null +++ b/README.md @@ -0,0 +1,139 @@ +# String Interpolation + +| Field | Value | +|-----------------|-------------------------------------------------------------------| +| DIP: | xxxx | +| Review Count: | 0 | +| Author: | Andrei Alexandrescu
John Colvin jcolvin@symmetryinvestments.com| +| Implementation: | | +| Status: | | + +## Abstract + +Instead of requiring a format string followed by an argument list, string interpolation enables +embedding the arguments in the string itself. + + +## Contents +* [Rationale](#rationale) +* [Prior Work](#prior-work) +* [Description](#description) +* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations) +* [Reference](#reference) +* [Copyright & License](#copyright--license) +* [Reviews](#reviews) + +## Rationale + +A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments with data contained in variables. The most straightforward approach to implement that pattern is to intersperse expressions with string fragments in calls to variadic functions: + +```d +void f1(string name) { + writeln("Hello, ", name, "!"); // formats and prints in one go + string s = text("Hello, ", name, "!"); // formats and returns formatted string + ... +} +``` + +A more flexible approach, embodied by the classic `printf` family of C functions, is to use *format specifiers* that provide the string fragments intermixed with formatting directives. Specialized functions take such format specifiers alongside data components, and replace each formatting directive with suitably formatted data: + +```d +void f2(string name) { + writefln("Hello, %40s!", name); // formats and prints in one go + string s = format("Hello, %40s!", name); // formats and returns formatted string + ... +} +``` + +Such an approach observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf) . This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. 
Localization and internationalization applications and libraries can store all display artifacts in separation from program logic and swap them as needed. + +However, there are important use cases where separation of logic from display is not only unneeded, but becomes a hindrance: + +- *code generation:* when generating code, the tight integration between the string fragments and the interspersed computation is essential to the process; +- *scripting:* shell scripts use string interpolation very frequently, to the extent that the mechanics of quoting and interpolation is an essential focus of all shell scripting languages; +- *casual printing, tracing, logging, and debugging:* sometimes, such tasks have more focus on the expressions to be printed, than on formatting paraphernalia. + +Such needs are served poorly by either the interspersion approach and the format specifier approach Consider this example adapted from the implementation of `std.bitmanip.bitfields`: + +```d +enum result = text( + "@property bool ", name, "() @safe pure nothrow @nogc const { return ", + "(", store, " & ", maskAllElse, ") != 0;}\n", + "@property void ", name, "(bool v) @safe pure nothrow @nogc { ", + "if (v) ", store, " |= ", maskAllElse, ";", + "else ", store, " &= cast(typeof(", store, "))(-1-cast(typeof(", store, "))", maskAllElse, ");}\n" +); +``` + +(The [original code](https://github.com/dlang/phobos/blob/v2.095.1/std/bitmanip.d#L115) uses string concatenation instead of a call to `text`. We use the latter to simplify the example.) Here, the interspersion mechanics distract from following the correctness of the generated code. 
An approach based on format specifiers would look as follows: + +```d +enum result = format( + "@property bool %1$s() @safe pure nothrow @nogc const { + return (%2$s & %3$s) != 0; + } + @property void %1$s(bool v) @safe pure nothrow @nogc { + if (v) %2$s |= %3$s; + else %2$s &= cast(typeof(%2$s))(-1-cast(typeof(%2$s))%3$s); + }\n", + name, store, maskAllElse +); +``` + +This form has less syntactic noise and appears as a format string separated from the expressions involved, for which reason we afforded to reformat it in a shape consistent with the generated code. Correctness of the generated code is still difficult to assess on the generated code, for example the reader must mentally map tedious sequences such as `%3$s` to meaningful names such as `maskAllElse` throughout the code snippet. + +By comparison, using a hypothetical interpolation syntax would make the code much easier to follow: + +```d +enum result = text( + i"@property bool {name}() @safe pure nothrow @nogc const { + return ({store} & {maskAllElse}) != 0; + } + @property void {name}(bool v) @safe pure nothrow @nogc { + if (v) {store} |= {maskAllElse}; + else {store} &= cast(typeof({store}))(-1-cast(typeof({store})){maskAllElse}); + }\n" +); +``` + +The latter form has dramatically less syntactic noise and appears as a single string with expressions inside escaped by `{` and `}`. Correctness of the generated code is much easier to assess in the second form as well. + +Let us also look at a shell command example. Assume `url` is an URL and `file` is an filename, both preprocessed as escaped shell strings. 
To download `url` into `file` without risking corrupt files in case of incomplete downloads, the code below first downloads into a temporary file with the extension `.frag` and then atomically renames it to the correct name: + +```d +executeShell("wget " ~ url ~ " -O" ~ file ~ ".frag && mv " ~ file ~ ".frag " ~ file); +``` + +The version using format specifiers is marginally more readable: + +```d +executeShell("wget %1$s -O%2$s.frag && mv %2$s.frag %2$s", url, file); +``` + +The interpolated form is, again, by far the easiest to follow: + +```d +executeShell(i"wget {url} -O{file}.frag && mv {file}.frag {file}"); +``` + +Last but not least, there are numerous cases in which casual console output can use interpolated strings to reduce on boilerplate and improve clarity: + +```d +writeln("Hello, ", name, ". You are ", age, " years old."); // interspersion +writefln("Hello, %s. You are %s years old.", name, age); // format string +writeln(i"Hello, {name}. You are {age} years old."); // interpolation +``` + +## Prior Work + +* Interpolated strings have been implemented and well-received in many languages. +For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki/String_interpolation). +* [DIP1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), from which this DIP was derived. +* [C#'s implementation](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated#compilation-of-interpolated-strings) which returns a formattable object that user functions can use +* [Javascript's implementation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) which passes `string[], args...` to a builder function very similarly to this proposal +* Jason Helson submitted a DIP [String Syntax for Compile-Time Sequences](https://github.com/dlang/DIPs/pull/140). 
+* [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988) + +## Description + +TODO From 40fa6ff59018e15b079c3c899e82c7591c021885 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Thu, 11 Mar 2021 20:59:52 -0500 Subject: [PATCH 02/19] Draft in reviewable form --- README.md | 257 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 245 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 9e5f856..be6c472 100644 --- a/README.md +++ b/README.md @@ -10,18 +10,14 @@ ## Abstract -Instead of requiring a format string followed by an argument list, string interpolation enables -embedding the arguments in the string itself. - +Instead of requiring a format string followed by an argument list or interspersing of format fragments with other arguments, string interpolation enables embedding the arguments in the string itself. We propose an extremely simple yet powerful approach of lowering interpolated strings into compile-time tuples (akin to `AliasSeq` in the standard library). We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. ## Contents * [Rationale](#rationale) * [Prior Work](#prior-work) * [Description](#description) * [Breaking Changes and Deprecations](#breaking-changes-and-deprecations) -* [Reference](#reference) * [Copyright & License](#copyright--license) -* [Reviews](#reviews) ## Rationale @@ -45,15 +41,15 @@ void f2(string name) { } ``` -Such an approach observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf) . This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in separation from program logic and swap them as needed. 
+Such an approach observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. -However, there are important use cases where separation of logic from display is not only unneeded, but becomes a hindrance: +However, there are important use cases where separation of data from format is not only unneeded, but becomes a hindrance: - *code generation:* when generating code, the tight integration between the string fragments and the interspersed computation is essential to the process; - *scripting:* shell scripts use string interpolation very frequently, to the extent that the mechanics of quoting and interpolation is an essential focus of all shell scripting languages; - *casual printing, tracing, logging, and debugging:* sometimes, such tasks have more focus on the expressions to be printed, than on formatting paraphernalia. -Such needs are served poorly by either the interspersion approach and the format specifier approach Consider this example adapted from the implementation of `std.bitmanip.bitfields`: +Such needs are served poorly by either the interspersion approach and the format specifier approach. 
Consider this example adapted from the implementation of `std.bitmanip.bitfields`: ```d enum result = text( @@ -82,7 +78,7 @@ enum result = format( This form has less syntactic noise and appears as a format string separated from the expressions involved, for which reason we afforded to reformat it in a shape consistent with the generated code. Correctness of the generated code is still difficult to assess on the generated code, for example the reader must mentally map tedious sequences such as `%3$s` to meaningful names such as `maskAllElse` throughout the code snippet. -By comparison, using a hypothetical interpolation syntax would make the code much easier to follow: +By comparison, using the interpolation syntax proposed in this DIP would make the code much easier to follow: ```d enum result = text( @@ -124,16 +120,253 @@ writefln("Hello, %s. You are %s years old.", name, age); // format string writeln(i"Hello, {name}. You are {age} years old."); // interpolation ``` -## Prior Work +### Why Yet Another String Interpolation Proposal? + +This DIP derives from, and owes much to, the previous work on string interpolation in the D community. The abundance of such work raises the question of why a new proposal is needed. + +This DIP is close to the prior work yet different in key aspects as follows: + +- Like [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988), this DIP expands the interpolated string into an argument list. However, unlike that proposal that automatically passes the expansion as an argument list for `std.typecons.tuple`, we place the expansion in a built-in tuple. We will show how using expansion to built-in tuple has significant flexibility and efficiency advantages. +- Like [DIP 1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), this DIP expands the interpolated string into a built-in tuple. 
Unlike DIP 1027, our proposal does not assemble the string fragments together, has a simpler syntax, and does not foster a format-string-style approach. We will argue that unifying format strings with string interpolation is not a goal worth pursuing. We will also show how custom formatting can be elegantly implemented on the library side with no additional language support, no complication of the syntax, and no loss of efficiency. +- We heed important lessons from [DIP 1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), mainly that while pursuing generality, complexity must be kept under control. In wake of it, we argue that it is not only appropriate, but in fact recommendable, to abandon certain directions of generalization in favor of a drastic reduction in complexity. In particular, we make integration with format-string style approaches a non-goal. + +We will demonstrate how this DIP achieves all major goals of extant proposals with a radically simpler definition and implementation. + +## Related Work * Interpolated strings have been implemented and well-received in many languages. For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki/String_interpolation). -* [DIP1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), from which this DIP was derived. +* [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988) and [DIP1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), from which this DIP was derived. +* [DIP1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), which is as of the time of this writing in review. 
* [C#'s implementation](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated#compilation-of-interpolated-strings) which returns a formattable object that user functions can use * [Javascript's implementation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) which passes `string[], args...` to a builder function very similarly to this proposal * Jason Helson submitted a DIP [String Syntax for Compile-Time Sequences](https://github.com/dlang/DIPs/pull/140). -* [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988) ## Description +An interpolated string is a regular D string prefixed with the letter `i`, as in `i"Hello"`. No whitespace is allowed between `i` and the opening quote. + +An interpolated string may occur only in one of the following contexts: + +- in the argument list of a function call; +- in the argument list of a constructor call; +- in the argument list of a `mixin`; +- in the argument list of a template instantiation. + +In any other context, interpolated strings are ill-formed. + +For example, the function call expression: + +```d +writeln(i"I ate {apples} and {bananas} totalling {apples + bananas} fruit.") +``` + +is lowered into: + +```d +writeln("I ate ", apples, " and ", bananas, " totalling ", apples + bananas, " fruit."); +``` + +The resulting lowered code is subjected to the usual typechecking and has the same semantics as if the lowered code were present in the source. + +After introducing an intuition of how interpolated string work, let us formalize the syntax and semantics. Lexically: + +``` +InterpolatedString: + i" DoubleQuotedCharacters " +``` + +The `InterpolatedString` appears in the parser grammar as an `InterpolatedExpression`, which is under `PrimaryExpression`. 
+ +``` +InterpolatedExpression: + InterpolatedString + InterpolatedString StringLiterals + +StringLiterals: + StringLiteral + StringLiteral StringLiterals +``` + +Inside an interpolated string, the characters `{` and `}` are of particular interest because the interpolated string will use them as escapes. To make `{` and `}` printable inside an interpolated string, the sequences `{{` and `}}`, respectively, shall be used. The contents of the `InterpolatedExpression` must conform to the following grammar: + +``` +Elements: + Element + Element Elements + +Element: + Character excluding '{' and '}' + '{{' + '}}' + '{' Identifier '}' + '{' Type '}' + '{' Expression '}' +``` + +In the grammar above `Type` is the nonterminal for types, and `Expression` is the nonterminal for general D expressions. + +The `InterpolatedExpression` is converted to a comma-separated list that consists of the string fragments interspersed with the expressions escaped by the characters `{` and `}`. + +Any lexical errors (such as unbalanced `{` and `}`) will be reported during parsing. Semantic checking will ensure that interpolated strings occur only in the contexts specified above. Then, other semantic errors will be reported during the typechecking of the lowered code. + +This concludes the syntax and semantics of the proposed feature. + +### Use Cases + +Although the proposed feature is deceptively simple, its flexibility affords a multitude of use cases. They can be easily assessed with a current D compiler by typing in the lowered code. 
+ +#### Passing arguments to functions + +The simplest use case of interpolated strings, and the most likely to be encountered in practice, is as an argument to functions such as `writeln`: + +```d +void main(string[] argv) { + import std.stdio; + writeln(i"The program {argv[0]} received {argv.length - 1} arguments."); + // Lowering: ---> + // writeln("The program ", argv[0], " received ", argv.length - 1, " arguments."); +} +``` + +A function such as `writeln` above has no way to know whether it was called via an interpolated string or with interspersed arguments; interpolated strings are purely a call-side device. We consider this a key characteristic of the feature. + +#### Saving the result of interpolation as a tuple + +Although it may seem limiting to impose that interpolated strings expand in a function call, `tuple` offers an immediate and efficient mechanism for storing the result of interpolation. The following program produces the same output as the previous one: + +```d +void main(string[] argv) { + import std.stdio, std.typecons; + auto t = tuple(i"The program {argv[0]} received {argv.length - 1} arguments."); + // Lowering: ---> + // auto t = tuple("The program ", argv[0], " received ", argv.length - 1, " arguments."); + writeln(t.expand); +} +``` + +#### Custom formatting + +It may seem that this proposal has a fundamental limitation: what if we want to do some custom formatting on the arguments, such as displaying an integral number in hexadecimal or a floating-point number in scientific format? + +Fortunately, an elegant solution can be implemented on the library side with minimal effort. Consider for example defining a `print` function that supports a *formatting directive* called `hex` that instructs the function to print the integral that follows in hexadecimal. 
First, the library defines a `hex` constant with a unique type: + +```d +struct Hex {} +immutable Hex hex; +``` + +The library recognizes arguments of type `Hex` as directives to print the next integral in hexadecimal format. On the call side, the user simply inserts `hex` in the interpolated strings appropriately when calling `print`: + +```d +void fun(int x) { + print(i"{x} in hexadecimal is 0x{hex}{x}."); + // Lowering: ---> + // print(x, " in hexadecimal is 0x", hex, x, "."); +} +``` + +There is no need for defining, implementing, and memorizing a mini-language of encoded format specifiers --- everything can be done with plain D values. The approach is reminiscent of [C++ stream manipulators](http://www.cplusplus.com/reference/library/manipulators/), thankfully with a much lower syntactical load. + +For another example, suppose the library sets out to support custom formatting for floating-point numbers, such as width, precision, and scientific notation. It would first define a small data structure that contains the appropriate state: + +```d +struct Fixed { uint width, precision; } +Fixed fixed(uint width = 10, uint precision = 6) { + return Fixed(width, precision); +} + +struct Scientific { dchar sep; } +Scientific scientific(dchar sep = 'E') { + return Scientific(sep); +} +``` + +The formatting library recognizes `Fixed` and `Scientific` as formatting directives controlling formatting, allowing the client to pass them just like any arguments: + +``` +double x = 0.1 + 0.2; +print(i"{x} can be written as {scientific}{x} and its exact value is {fixed(20, 10)}{x}"); +// Lowering: ---> +// print(x, " can be written as ", scientific, x, " and its exact value is ", fixed(20, 10), x); +``` + +#### Use in `mixin` declarations and expressions + +Interpolated strings are allowed in `mixin` declarations and expressions: + +```d +immutable x = "asd", y = 42; +mixin(i"int {x} = {y};"); +// Lowering ---> +// mixin("int ", x, " = ", y, ";"); +auto z = mixin(i"{x} + 5"); +// 
Lowering ---> +// auto z = mixin(x, " + 5"); +``` + +#### Use in the argument list of template instantiations + +An interpolated string may be present in a template instantiation's argument list. This allows, for example, a parser generator to compose with strings in the grammar definition: + +```d +struct Grammar(string... spec) { ... } + +immutable ident = "( [ char ]+ )"; + +alias Calculator = Grammar!( + i"Expression := Term + Term + Term := Factor * Factor + Factor := {ident} | '(' Expression ')' " +); +// Lowering: ---> +// alias Calculator = Grammar!( +// "Expression := Term + Term +// Term := Factor * Factor +// Factor := ", ident, " | '(' Expression ')' " +// ); +``` + +In certain cases, the interpolated string can be passed to `AliasSeq` as well, resulting in a sequence of strings interspersed with identifiers: + +```d +int x = 42; +alias Q = AliasSeq!(i"I'm interpolating {x} here."); // OK, use Q as a type +auto q = AliasSeq!(i"I'm interpolating {x} here."); // OK, store as AliasSeq object +``` + +#### Conversion to format-string-style arguments + TODO + +### Limitations and tradeoffs + +Users may be confused that they cannot define variables that are interpolated strings: + +```d +int x = 42; +auto s = i"Let's interpolate {x}!"; // Error, interpolated string not allowed here +``` + +The remedy is simple and may be suggested by the text of the error message: use a tuple to store the interpolation, or call a function such as `text` to convert everything to a string: + +```d +int x = 42; +auto s = text(i"Let's interpolate {x}!"); // OK, store as string +auto t = tuple(i"Let's interpolate {x}!"); // OK, store as tuple +alias Q = AliasSeq!(i"Let's interpolate {x}!"); // OK, use as type +auto q = AliasSeq!(i"Let's interpolate {x}!"); // OK, store as AliasSeq +``` + +Functions are not aware whether they got called with an interpolated string or a manually written list of arguments. This confers consistency, simplicity, and uniformity to the approach. 
+ +## Breaking Changes and Deprecations + +Because `InterpolatedString` is a new token, no existing code is broken. + +## Copyright & License + +Copyright (c) 2021 by the D Language Foundation + +Licensed under Creative Commons Zero 1.0 + From e2b97cacd432d75f00430026f0e92681b56fb19c Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Sun, 14 Mar 2021 13:43:18 -0400 Subject: [PATCH 03/19] Brand new take --- README.md | 836 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 464 insertions(+), 372 deletions(-) diff --git a/README.md b/README.md index be6c472..58f2d42 100644 --- a/README.md +++ b/README.md @@ -1,372 +1,464 @@ -# String Interpolation - -| Field | Value | -|-----------------|-------------------------------------------------------------------| -| DIP: | xxxx | -| Review Count: | 0 | -| Author: | Andrei Alexandrescu
John Colvin jcolvin@symmetryinvestments.com| -| Implementation: | | -| Status: | | - -## Abstract - -Instead of requiring a format string followed by an argument list or interspersing of format fragments with other arguments, string interpolation enables embedding the arguments in the string itself. We propose an extremely simple yet powerful approach of lowering interpolated strings into compile-time tuples (akin to `AliasSeq` in the standard library). We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. - -## Contents -* [Rationale](#rationale) -* [Prior Work](#prior-work) -* [Description](#description) -* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations) -* [Copyright & License](#copyright--license) - -## Rationale - -A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments with data contained in variables. The most straightforward approach to implement that pattern is to intersperse expressions with string fragments in calls to variadic functions: - -```d -void f1(string name) { - writeln("Hello, ", name, "!"); // formats and prints in one go - string s = text("Hello, ", name, "!"); // formats and returns formatted string - ... -} -``` - -A more flexible approach, embodied by the classic `printf` family of C functions, is to use *format specifiers* that provide the string fragments intermixed with formatting directives. Specialized functions take such format specifiers alongside data components, and replace each formatting directive with suitably formatted data: - -```d -void f2(string name) { - writefln("Hello, %40s!", name); // formats and prints in one go - string s = format("Hello, %40s!", name); // formats and returns formatted string - ... 
-} -``` - -Such an approach observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. - -However, there are important use cases where separation of data from format is not only unneeded, but becomes a hindrance: - -- *code generation:* when generating code, the tight integration between the string fragments and the interspersed computation is essential to the process; -- *scripting:* shell scripts use string interpolation very frequently, to the extent that the mechanics of quoting and interpolation is an essential focus of all shell scripting languages; -- *casual printing, tracing, logging, and debugging:* sometimes, such tasks have more focus on the expressions to be printed, than on formatting paraphernalia. - -Such needs are served poorly by either the interspersion approach and the format specifier approach. 
Consider this example adapted from the implementation of `std.bitmanip.bitfields`: - -```d -enum result = text( - "@property bool ", name, "() @safe pure nothrow @nogc const { return ", - "(", store, " & ", maskAllElse, ") != 0;}\n", - "@property void ", name, "(bool v) @safe pure nothrow @nogc { ", - "if (v) ", store, " |= ", maskAllElse, ";", - "else ", store, " &= cast(typeof(", store, "))(-1-cast(typeof(", store, "))", maskAllElse, ");}\n" -); -``` - -(The [original code](https://github.com/dlang/phobos/blob/v2.095.1/std/bitmanip.d#L115) uses string concatenation instead of a call to `text`. We use the latter to simplify the example.) Here, the interspersion mechanics distract from following the correctness of the generated code. An approach based on format specifiers would look as follows: - -```d -enum result = format( - "@property bool %1$s() @safe pure nothrow @nogc const { - return (%2$s & %3$s) != 0; - } - @property void %1$s(bool v) @safe pure nothrow @nogc { - if (v) %2$s |= %3$s; - else %2$s &= cast(typeof(%2$s))(-1-cast(typeof(%2$s))%3$s); - }\n", - name, store, maskAllElse -); -``` - -This form has less syntactic noise and appears as a format string separated from the expressions involved, for which reason we afforded to reformat it in a shape consistent with the generated code. Correctness of the generated code is still difficult to assess on the generated code, for example the reader must mentally map tedious sequences such as `%3$s` to meaningful names such as `maskAllElse` throughout the code snippet. 
- -By comparison, using the interpolation syntax proposed in this DIP would make the code much easier to follow: - -```d -enum result = text( - i"@property bool {name}() @safe pure nothrow @nogc const { - return ({store} & {maskAllElse}) != 0; - } - @property void {name}(bool v) @safe pure nothrow @nogc { - if (v) {store} |= {maskAllElse}; - else {store} &= cast(typeof({store}))(-1-cast(typeof({store})){maskAllElse}); - }\n" -); -``` - -The latter form has dramatically less syntactic noise and appears as a single string with expressions inside escaped by `{` and `}`. Correctness of the generated code is much easier to assess in the second form as well. - -Let us also look at a shell command example. Assume `url` is an URL and `file` is an filename, both preprocessed as escaped shell strings. To download `url` into `file` without risking corrupt files in case of incomplete downloads, the code below first downloads into a temporary file with the extension `.frag` and then atomically renames it to the correct name: - -```d -executeShell("wget " ~ url ~ " -O" ~ file ~ ".frag && mv " ~ file ~ ".frag " ~ file); -``` - -The version using format specifiers is marginally more readable: - -```d -executeShell("wget %1$s -O%2$s.frag && mv %2$s.frag %2$s", url, file); -``` - -The interpolated form is, again, by far the easiest to follow: - -```d -executeShell(i"wget {url} -O{file}.frag && mv {file}.frag {file}"); -``` - -Last but not least, there are numerous cases in which casual console output can use interpolated strings to reduce on boilerplate and improve clarity: - -```d -writeln("Hello, ", name, ". You are ", age, " years old."); // interspersion -writefln("Hello, %s. You are %s years old.", name, age); // format string -writeln(i"Hello, {name}. You are {age} years old."); // interpolation -``` - -### Why Yet Another String Interpolation Proposal? - -This DIP derives from, and owes much to, the previous work on string interpolation in the D community. 
The abundance of such work raises the question of why a new proposal is needed. - -This DIP is close to the prior work yet different in key aspects as follows: - -- Like [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988), this DIP expands the interpolated string into an argument list. However, unlike that proposal that automatically passes the expansion as an argument list for `std.typecons.tuple`, we place the expansion in a built-in tuple. We will show how using expansion to built-in tuple has significant flexibility and efficiency advantages. -- Like [DIP 1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), this DIP expands the interpolated string into a built-in tuple. Unlike DIP 1027, our proposal does not assemble the string fragments together, has a simpler syntax, and does not foster a format-string-style approach. We will argue that unifying format strings with string interpolation is not a goal worth pursuing. We will also show how custom formatting can be elegantly implemented on the library side with no additional language support, no complication of the syntax, and no loss of efficiency. -- We heed important lessons from [DIP 1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), mainly that while pursuing generality, complexity must be kept under control. In wake of it, we argue that it is not only appropriate, but in fact recommendable, to abandon certain directions of generalization in favor of a drastic reduction in complexity. In particular, we make integration with format-string style approaches a non-goal. - -We will demonstrate how this DIP achieves all major goals of extant proposals with a radically simpler definition and implementation. - -## Related Work - -* Interpolated strings have been implemented and well-received in many languages. -For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki/String_interpolation). 
-* [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988) and [DIP1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), from which this DIP was derived. -* [DIP1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), which is as of the time of this writing in review. -* [C#'s implementation](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated#compilation-of-interpolated-strings) which returns a formattable object that user functions can use -* [Javascript's implementation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) which passes `string[], args...` to a builder function very similarly to this proposal -* Jason Helson submitted a DIP [String Syntax for Compile-Time Sequences](https://github.com/dlang/DIPs/pull/140). - -## Description - -An interpolated string is a regular D string prefixed with the letter `i`, as in `i"Hello"`. No whitespace is allowed between `i` and the opening quote. - -An interpolated string may occur only in one of the following contexts: - -- in the argument list of a function call; -- in the argument list of a constructor call; -- in the argument list of a `mixin`; -- in the argument list of a template instantiation. - -In any other context, interpolated strings are ill-formed. - -For example, the function call expression: - -```d -writeln(i"I ate {apples} and {bananas} totalling {apples + bananas} fruit.") -``` - -is lowered into: - -```d -writeln("I ate ", apples, " and ", bananas, " totalling ", apples + bananas, " fruit."); -``` - -The resulting lowered code is subjected to the usual typechecking and has the same semantics as if the lowered code were present in the source. - -After introducing an intuition of how interpolated string work, let us formalize the syntax and semantics. 
Lexically: - -``` -InterpolatedString: - i" DoubleQuotedCharacters " -``` - -The `InterpolatedString` appears in the parser grammar as an `InterpolatedExpression`, which is under `PrimaryExpression`. - -``` -InterpolatedExpression: - InterpolatedString - InterpolatedString StringLiterals - -StringLiterals: - StringLiteral - StringLiteral StringLiterals -``` - -Inside an interpolated string, the characters `{` and `}` are of particular interest because the interpolated string will use them as escapes. To make `{` and `}` printable inside an interpolated string, the sequences `{{` and `}}`, respectively, shall be used. The contents of the `InterpolatedExpression` must conform to the following grammar: - -``` -Elements: - Element - Element Elements - -Element: - Character excluding '{' and '}' - '{{' - '}}' - '{' Identifier '}' - '{' Type '}' - '{' Expression '}' -``` - -In the grammar above `Type` is the nonterminal for types, and `Expression` is the nonterminal for general D expressions. - -The `InterpolatedExpression` is converted to a comma-separated list that consists of the string fragments interspersed with the expressions escaped by the characters `{` and `}`. - -Any lexical errors (such as unbalanced `{` and `}`) will be reported during parsing. Semantic checking will ensure that interpolated strings occur only in the contexts specified above. Then, other semantic errors will be reported during the typechecking of the lowered code. - -This concludes the syntax and semantics of the proposed feature. - -### Use Cases - -Although the proposed feature is deceptively simple, its flexibility affords a multitude of use cases. They can be easily assessed with a current D compiler by typing in the lowered code. 
- -#### Passing arguments to functions - -The simplest use case of interpolated strings, and the most likely to be encountered in practice, is as argument to functions such as `writeln`: - -```d -void main(int argc, string[] argv) { - import std.stdio; - writeln(i"The program {argv[0]} received {argc - 1} arguments."); - // Lowering: ---> - // writeln("The program ", argv[0], " received ", argv.length - 1, " arguments."); -} -``` - -A function such as `writeln` above has no way to know whether it was called via an interpolated string or with interspersed arguments; interpolated strings are purely a call-side device. We consider this a key characteristic of the feature. - -#### Saving the result of interpolation as a tuple - -Although it may seem limiting to impose that interpolated strings expand in a function call, `tuple` offers an immediate and efficient mechanism for storing the result of interpolation. The following program produces the same output as the previous one: - -```d -void main(int argc, string[] argv) { - import std.stdio; - auto t = tuple(i"The program {argv[0]} received {argc - 1} arguments."); - // Lowering: ---> - // auto t = tuple("The program ", argv[0], " received ", argv.length - 1, " arguments."); - writeln(t.expand); -} -``` - -#### Custom formatting - -It may seem that this proposal has a fundamental limitation: what if we want to do some custom formatting on the arguments, such as displaying an integral number in hexadecimal or a floating-point number in scientific format? - -Fortunately, an elegant solution can be implemented on the library side with minimal effort. Consider for example defining a `print` function that supports a *formatting directive* called `hex` that instructs the function to print the integral that follows in hexadecimal. 
First, the library defines a `hex` constant with a unique type: - -```d -struct Hex {} -immutable Hex hex; -``` - -The library recognizes arguments of type `Hex` as directives to print the next integral in hexadecimal format. On the call side, the user simply inserts `hex` in the interpolated strings appropriately when calling `print`: - -```d -void fun(int x) { - print(i"{x} in hexadecimal is 0x{hex}{x}."); - // Lowering: ---> - // print(x, " in hexadecimal is 0x", hex, x, "."); -} -``` - -There is no need for defining, implementing, and memorizing a mini-language of encoded format specifiers --- everything can be done with plain D values. The approach is reminiscent of [C++ stream manipulators](http://www.cplusplus.com/reference/library/manipulators/), thankfully with a much lower syntactical load. - -For another example, suppose the library sets out to support custom formatting for floating-point numbers, such as width, precision, and scientific notation. It would first define a small data structure that contains the appropriate state: - -```d -struct Fixed { uint width, precision; } -Fixed fixed(uint width = 10, uint precision = 6) { - return Fixed(width, precision); -} - -struct Scientific { dchar sep; } -Scientific scientific(dchar sep = 'E') { - return Scientific(sep); -} -``` - -The formatting library recognizes `Fixed` and `Scientific` as formatting directives controlling formatting, allowing the client to pass them just like any arguments: - -``` -double x = 0.1 + 0.2; -print(i"{x} can be written as {scientific}{x} and its exact value is {fixed(20, 10)}{x}"); -// Lowering: ---> -// print(x, " can be written as ", scientific, x, " and its exact value is ", fixed(20, 10), x); -``` - -#### Use in `mixin` declarations and expressions - -Interpolated strings are allowed in `mixin` declarations and expressions: - -```d -immutable x = "asd", y = 42; -mixin(i"int {x} = {y};"); -// Lowering ---> -// mixin("int ", x, " = ", y, ";"); -auto z = mixin(i"{x} + 5"); -// 
Lowering ---> -// auto z = mixin(x, " + 5"); -``` - -#### Use in the argument list of template instantiations - -An interpolated string may be present in a template instantiation's argument list. This allows, for example, a parser generator to compose with strings in the grammar definition: - -```d -struct Grammar(string... spec) { ... } - -immutable ident = "( [ char ]+ )" - -alias Calculator = Grammar!( - i"Expression := Term + Term - Term := Factor * Factor - Factor := {ident} | '(' Expression ')' " -); -// Lowering: ---> -// alias Calculator = Grammar!( -// "Expression := Term + Term -// Term := Factor * Factor -// Factor := ", ident, " | '(' Expression ')' " -// ); -``` - -In certain cases, the interpolated string can be passed to `AliasSeq` as well resulting in a sequence of strings interspersed with identifiers: - -```d -int x = 42; -alias Q = AliasSeq!(i"I'm interpolating {x} here."); // OK, use Q as a type -auto q = AliasSeq!(i"I'm interpolating {x} here."); // OK, store as AliasSeq object -``` - -#### Conversion to format-string-style arguments - -TODO - -### Limitations and tradeoffs - -Users may be confused that they cannot define variables that are interpolated strings: - -```d -int x = 42; -auto s = i"Let's interpolate {x}!" // Error, interpolated string not allowed here -``` - -The remedy is simple and may be suggested by the text of the error message: use a tuple to store the interpolation, or call a function such as `text` to convert everything to a string: - -```d -int x = 42; -auto s = text(i"Let's interpolate {x}!"); // OK, store as string -auto t = tuple(i"Let's interpolate {x}!"); // OK, store as tuple -alias Q = AliasSeq!(i"Let's interpolate {x}!"); // OK, use as type -auto q = AliasSeq!(i"Let's interpolate {x}!"); // OK, store as AliasSeq -``` - -Functions are not aware whether they got called with an interpolated string or a manually written list of arguments. This confers consistency, simplicity, and uniformity to the approach. 
- -## Breaking Changes and Deprecations - -Because `InterpolatedString` is a new token, no existing code is broken. - -## Copyright & License - -Copyright (c) 2021 by the D Language Foundation - -Licensed under Creative Commons Zero 1.0 - +# String Interpolation + +| Field | Value | +|-----------------|-------------------------------------------------------------------| +| DIP: | xxxx | +| Review Count: | 0 | +| Author: | Andrei Alexandrescu
John Colvin jcolvin@symmetryinvestments.com| +| Implementation: | | +| Status: | | + +## Abstract + +Textual formatting is often achieved either by APIs relying on formatting strings followed by arguments to be formatted (in the style of `printf`, `std.format.format`, and `std.stdio.writefln`), or by interspersing string fragments with arguments (in the style of `std.conv.text` and `std.stdio.writeln`). String interpolation enables embedding the arguments in the string itself. We propose an extremely simple yet powerful approach of lowering interpolated strings into comma-separated lists that works with both *format-string* style and *interspersion* style with no change to any library function. We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. + +## Contents +* [Rationale](#rationale) +* [Prior Work](#prior-work) +* [Description](#description) +* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations) +* [Copyright & License](#copyright--license) + +## Rationale + +A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments with data contained in variables. We identify two distinct approaches to formatting data: *format-string* style and *interspersion* style. + +The most straightforward approach to implement that pattern is to intersperse expressions with string fragments in calls to variadic functions: + +```d +void f1(string name) { + writeln("Hello, ", name, "!"); // formats and prints in one go + string s = text("Hello, ", name, "!"); // formats and returns formatted string + ... 
+} +``` + +A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that provide the string fragments intermixed with conventionally defined *formatting specifiers*. Specialized functions take such format strings followed by the data to be formatted, and replace each formatting directive with suitably formatted data: + +```d +void f2(string name) { + writefln("Hello, %40s!", name); // formats and prints in one go + string s = format("Hello, %40s!", name); // formats and returns formatted string + ... +} +``` + +Other examples of the format-string style are SQL prepared statements and string templates for formatting HTML documents. The convention used for format specifiers is defined by the respective APIs: + +```d +void f2(string name) { + htmlOutput("Looking for #{}...", name); + auto rows = sql("SELECT * FROM t WHERE name = ? AND active = true", name); + ... +} +``` + +Both approaches have pros and cons. The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. The disadvantage of the format-string style is that the expressions to be formatted appear lexically *after* the (possibly long) format string, which makes it difficult to follow how format specifiers sync with their respective arguments. 
+ +The interspersion style is simple, intuitive, and requires learning no convention. However, creating complex outputs becomes cumbersome due to the syntactic heaviness of alternating string literals and other arguments in comma-separated lists. Also, customized formatting (such as rendering an integral in hexadecimal instead of decimal) is not immediate. + +We will demonstrate how both the format-string style and the interspersion style fall short of expectations on three categories of everyday tasks: + +- *code generation:* when generating code, the tight integration between the string literal fragments and the expressions to be inserted is essential to the process; +- *scripting:* shell scripts use string interpolation very frequently, to the extent that the mechanics of quoting and interpolation are an essential focus of all shell scripting languages; +- *casual printing, tracing, logging, and debugging:* sometimes, such tasks focus more on the expressions to be printed than on formatting paraphernalia. + +Such needs are served poorly by both the interspersion approach and the format-string approach. Consider an example of code generation adapted from the implementation of `std.bitmanip.bitfields`: + +```d +enum result = text( + "@property bool ", name, "() @safe pure nothrow @nogc const { return ", + "(", store, " & ", maskAllElse, ") != 0;}\n", + "@property void ", name, "(bool v) @safe pure nothrow @nogc { ", + "if (v) ", store, " |= ", maskAllElse, ";", + "else ", store, " &= cast(typeof(", store, "))(-1-cast(typeof(", store, "))", maskAllElse, ");}\n" +); +``` + +(The [original code](https://github.com/dlang/phobos/blob/v2.095.1/std/bitmanip.d#L115) uses string concatenation instead of a call to `text`. We use the latter to simplify the example.) Here, the interspersion mechanics (closing quote, comma, expression, comma, opening quote) distract from following the correctness of the generated code.
An approach based on format specifiers would look as follows: + +```d +enum result = format( + "@property bool %s() @safe pure nothrow @nogc const { + return (%s & %s) != 0; + } + @property void %s(bool v) @safe pure nothrow @nogc { + if (v) %s |= %s; + else %s &= cast(typeof(%s))(-1-cast(typeof(%s))%s); + }\n", + name, store, maskAllElse, + name, store, maskAllElse, store, store, store, maskAllElse +); +``` + +This form has less syntactic noise and appears as a format string separated from the expressions involved, which allowed us to lay it out in a shape consistent with the generated code. However, the separation of format from data is clearly an impediment here, requiring the reader to mentally track and pair the format specifiers `%s` with the arguments trailing the format string. Using positional arguments brings a marginal improvement to the code: + +```d +enum result = format( + "@property bool %1$s() @safe pure nothrow @nogc const { + return (%2$s & %3$s) != 0; + } + @property void %1$s(bool v) @safe pure nothrow @nogc { + if (v) %2$s |= %3$s; + else %2$s &= cast(typeof(%2$s))(-1-cast(typeof(%2$s))%3$s); + }\n", + name, store, maskAllElse +); +``` + +Here, the reader only needs to track the correct use of numbers in the format specifiers and match it with the order in the trailing arguments. Correctness of the generated code is still difficult to assess: for example, the reader must mentally map tedious sequences such as `%3$s` to meaningful names such as `maskAllElse` throughout the code snippet.
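The difference between sequential and positional specifiers can be tried out with today's `std.format.format`. The following is a minimal sketch, not taken from `std.bitmanip`; the helper names `getterSequential` and `getterPositional` and the sample values are hypothetical and chosen only for illustration.

```d
import std.format : format;

// Hypothetical helper: sequential specifiers consume arguments in order.
string getterSequential(string name, string store, string mask)
{
    return format("bool %s() { return (%s & %s) != 0; }", name, store, mask);
}

// Hypothetical helper: positional specifiers refer to arguments by
// 1-based index, so the same argument can be mentioned more than once.
string getterPositional(string name, string store, string mask)
{
    return format("bool %1$s() { return (%2$s & %3$s) != 0; }", name, store, mask);
}

void main()
{
    // Both styles generate the same code for the same inputs.
    assert(getterSequential("flag", "_bits", "4")
        == "bool flag() { return (_bits & 4) != 0; }");
    assert(getterPositional("flag", "_bits", "4")
        == getterSequential("flag", "_bits", "4"));

    // Reusing an argument needs no repetition in the argument list.
    assert(format("%1$s = %1$s | %2$s;", "_bits", "4") == "_bits = _bits | 4;");
}
```

In both variants the reader still has to pair each specifier with an argument that appears after the format string, which is the readability cost discussed above.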
+ +By comparison, using the interpolation syntax proposed in this DIP would make the code much easier to follow: + +```d +enum result = text( + i"@property bool $name() @safe pure nothrow @nogc const { + return ($store & $maskAllElse) != 0; + } + @property void $name(bool v) @safe pure nothrow @nogc { + if (v) $store |= $maskAllElse; + else $store &= cast(typeof($store))(-1-cast(typeof($store))$maskAllElse); + }\n" +); +``` + +This form has dramatically less syntactic noise and appears as a single string with expressions inside escaped by `$`. Correctness of the generated code is much easier to assess in this form as well. + +Let us also look at a shell command example. Assume `url` is a URL and `file` is a filename, both preprocessed as escaped shell strings. To download `url` into `file` without risking corrupt files in case of incomplete downloads, the code below first downloads into a temporary file with the extension `.frag` and then atomically renames it to the correct name: + +```d +executeShell("wget " ~ url ~ " -O" ~ file ~ ".frag && mv " ~ file ~ ".frag " ~ file); +``` + +The version using format specifiers is marginally more readable: + +```d +// Classic +executeShell("wget %s -O%s.frag && mv %s.frag %s", url, file, file, file); +// Positional +executeShell("wget %1$s -O%2$s.frag && mv %2$s.frag %2$s", url, file); +``` + +The interpolated form is, again, by far the easiest to follow: + +```d +executeShell(i"wget $url -O$file.frag && mv $file.frag $file"); +``` + +Last but not least, there are numerous cases in which casual console output can use interpolated strings to reduce boilerplate and improve clarity: + +```d +writeln("Hello, ", name, ". You are ", age, " years old."); // interspersion +writefln("Hello, %s. You are %s years old.", name, age); // format string +writeln(i"Hello, $name. You are $age years old."); // interpolation +``` + +### Why Yet Another String Interpolation Proposal?
+ +This DIP derives from, and owes much to, the previous work on string interpolation in the D community. The abundance of such work raises the question of why a new proposal is needed. + +This DIP is close to the prior work yet different in key aspects as follows: + +- Like [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988), this DIP expands the interpolated string into an argument list. However, unlike that proposal, which automatically passes the expansion as an argument list for `std.typecons.tuple`, we expand into an argument list. We will show how doing so has significant flexibility and efficiency advantages. +- Like [DIP 1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), this DIP expands the interpolated string into a list. Unlike DIP 1027, which only supports the format-string style, this proposal supports both the interspersion and the format-string style. It also has a simpler syntax and semantics. +- We heed important lessons from [DIP 1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), mainly that while pursuing generality, complexity must be kept under control. In its wake, we concluded that unification of the format-string style and interspersion approach with a single interpolation syntax is not the appropriate goal. Instead, we recognize the two goals as distinct and propose distinct constructs for them. + +We will demonstrate how this DIP achieves all major goals of extant proposals with a radically simpler definition and implementation. + +## Related Work + +* Interpolated strings have been implemented and well-received in many languages. +For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki/String_interpolation). +* [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988) and [DIP1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), from which this DIP was derived.
+* [DIP1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), which is as of the time of this writing in review. +* [C#'s implementation](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated#compilation-of-interpolated-strings) which returns a formattable object that user functions can use +* [Javascript's implementation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) which passes `string[], args...` to a builder function very similarly to this proposal +* Jason Helson submitted a DIP [String Syntax for Compile-Time Sequences](https://github.com/dlang/DIPs/pull/140). + +## Description + +D strings have several syntactical forms, a few of which follow the pattern of a letter followed by the string proper. These include `r"WYSIWYG strings"`, `q"[delimited strings]"`, and `q{token strings}`. Our proposal follows the same pattern by introducing `i"interpolated strings"` and `f"interpolated formatting strings"`. + +An *interpolated string* is a regular D string prefixed with the letter `i`, as in `i"Hello"`. An *interpolated formatting string* is a regular D string prefixed with the letter `f`, as in `f"world"`. No whitespace is allowed between `i` or `f` and the opening quote. The first expands into the interspersed style, and the second expands into the format-string style. We refer to them as `i`-strings and `f`-strings, respectively. + +An `i`-string or an `f`-string may occur only in one of the following contexts: + +- in the argument list of a function or constructor call; +- in the argument list of a `mixin`; +- in the argument list of a template instantiation. + +In any other context, `i`-strings and `f`-strings are ill-formed.
+ +For an example of `i`-strings, the function call expression: + +```d +writeln(i"I ate $apples apples and $bananas bananas totalling $(apples + bananas) fruit.") +``` + +is lowered into: + +```d +writeln("I ate ", apples, " apples and ", bananas, " bananas totalling ", apples + bananas, " fruit.") +``` + +For an example of `f`-strings, the function call expression: + +```d +writefln(f"I ate %s$apples apples and %s$bananas bananas totalling %s$(apples + bananas) fruit.") +``` + +is lowered into: + +```d +writefln("I ate %s apples and %s bananas totalling %s fruit.", apples, bananas, apples + bananas) +``` + +The resulting lowered code is subjected to the usual typechecking and has the same semantics as if the lowered code were present in the source. + +After introducing an intuition of how interpolated strings work, let us formalize the syntax and semantics. Lexically: + +``` +InterpolatedString: + i" DoubleQuotedCharacters " +InterpolatedFormattingString: + f" DoubleQuotedCharacters " +``` + +The `InterpolatedString` and `InterpolatedFormattingString` appear in the parser grammar as an `InterpolatedExpression`, which is under `PrimaryExpression`. + +``` +InterpolatedExpression: + InterpolatedString + InterpolatedString StringLiterals + InterpolatedFormattingString + InterpolatedFormattingString StringLiterals +``` + +Inside an interpolated string, the character `$` is of particular interest because the interpolated string will use it as an escape. To render `$` verbatim inside an interpolated string, the sequence `$$` shall be used. The contents of the `InterpolatedExpression` must conform to the following grammar, which is identical for `i`-strings and `f`-strings: + +``` +Elements: + Element + Element Elements + +Element: + Character excluding '$' + '$$' + '$' Identifier + '$(' Type ')' + '$(' Expression ')' +``` + +In the grammar above `Type` is the nonterminal for types, and `Expression` is the nonterminal for general D expressions.
+ +An `InterpolatedString` is lowered to a comma-separated list that consists of the string fragments interspersed with the expressions escaped by `$`. An `InterpolatedFormattingString` is lowered to the string literal fragments stitched together, followed by all escaped fragments, in lexical order. + +An `f`-string expansion produces a literal string in the first position even if that would be empty: `fun(f"$x")` lowers to `fun("", x)`, not `fun(x)`. In contrast, `i`-string expansion never produces empty string literals: `fun(i"$x")` expands to `fun(x)` and `fun(i"$x$y")` expands to `fun(x, y)`. + +Any lexical errors (such as a `$` followed by a space, or unbalanced `(` and `)`) will be reported during parsing. Semantic checking will ensure that interpolated strings occur only in the contexts specified above. Then, other semantic errors will be reported during the typechecking of the lowered code. + +This concludes the syntax and semantics of the proposed feature. + +Why choose `$` when many popular languages and libraries (Python, C++20, C#) use `{` and `}` as escape characters? Also, why use `$(` and `)` as opposed to `${` and `}`, with which a bash user may be more familiar? + +One essential use of interpolation, specific to D, is for code generation. In generated D code, curly braces `{` and `}` are abundant. Requiring `{{` and `}}` everywhere in the generated code would have been aggravating.
Anecdotal evidence has been collected in the creation of this DIP, which initially attempted to use `{` and `}` for escaping: the examples extracted from `std.bitmanip.bitfields` turned out to have numerous bugs, and be difficult to read when corrected: + +```d +// Using `{` and `}` for escaping, similar to Python's f-strings +enum result = text( + i"@property bool {name}() @safe pure nothrow @nogc const {{ + return ({store} & {maskAllElse}) != 0; + }} + @property void {name}(bool v) @safe pure nothrow @nogc {{ + if (v) {store} |= {maskAllElse}; + else {store} &= cast(typeof({store}))(-1-cast(typeof({store})){maskAllElse}); + }}\n" +); +``` + +In contrast, occurrences of `$` in D code are rare --- indeed more likely to be present in generated code that uses interpolation itself. This makes `$` a disproportionately strong candidate compared to other choices. + +The second question --- why not use `${` and `}` instead of `$(` and `)` --- has a simple answer: the elements to group inside the escape sequences are expressions, not statements. There already exists a syntax for grouping expressions, and that's surrounding them with `(` and `)`, which closes the case by invoking the [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment). + +### Use Cases + +Although the proposed feature is deceptively simple, its flexibility affords a multitude of use cases. They can be easily assessed with a current D compiler by typing in the lowered code. 
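As a quick illustration of that claim, the two lowerings shown in the Description can be typed in by hand and compiled today. This is a minimal sketch: the function names `viaInterspersion` and `viaFormatString` are hypothetical, and `text` and `format` stand in for `writeln` and `writefln` so that the results can be compared as strings.

```d
import std.conv : text;
import std.format : format;

// Hand-written lowering of
//   text(i"I ate $apples apples and $bananas bananas totalling $(apples + bananas) fruit.")
string viaInterspersion(int apples, int bananas)
{
    return text("I ate ", apples, " apples and ", bananas,
                " bananas totalling ", apples + bananas, " fruit.");
}

// Hand-written lowering of
//   format(f"I ate %s$apples apples and %s$bananas bananas totalling %s$(apples + bananas) fruit.")
string viaFormatString(int apples, int bananas)
{
    return format("I ate %s apples and %s bananas totalling %s fruit.",
                  apples, bananas, apples + bananas);
}

void main()
{
    // Both lowerings produce the same text for the same inputs.
    assert(viaInterspersion(2, 3) == "I ate 2 apples and 3 bananas totalling 5 fruit.");
    assert(viaFormatString(2, 3) == viaInterspersion(2, 3));
}
```

Neither function can tell that its arguments were meant to come from an interpolated string, which is the property the use cases below rely on.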
+ +#### Passing arguments to functions + +The simplest use case of interpolated strings, and the most likely to be encountered in practice, is as an argument to functions such as `writeln`, `text`, `writefln`, or `format`: + +```d +void main(string[] args) { + import std.stdio; + writeln(i"The program $(args[0]) received $(args.length - 1) arguments."); + // Lowering: ---> + // writeln("The program ", args[0], " received ", args.length - 1, " arguments."); + + writefln(f"The program %s$(args[0]) received %s$(args.length - 1) arguments."); + // Lowering: ---> + // writefln("The program %s received %s arguments.", args[0], args.length - 1); + + auto s = sqlExec(f"INSERT INTO runs VALUES(?$(args[0]), ?$(args.length - 1))"); + // Lowering: ---> + // auto s = sqlExec("INSERT INTO runs VALUES(?, ?)", args[0], args.length - 1); +} +``` + +A function such as `std.stdio.writeln` or `std.stdio.writefln` above has no way to know whether it was called via an interpolated string or with arguments specified as in the corresponding lowering; interpolated strings are purely a call-side mechanism. We consider this a key characteristic of the proposed feature that drastically simplifies its definition and interoperation with existing code. + +For `f`-strings, the convention for format specifiers varies with the API --- for example, `printf` and `std.stdio.writefln` use the well-known `%`-prefixed specifiers, whereas SQL traditionally uses `?`. For that reason and to keep complexity to a minimum, `f`-strings do not try to be clever and provide their own convention and translation mechanism; instead, they just concatenate the string literal fragments together just like the user wrote them, and follow them with the interpolated expressions. An `f`-string does not create any text.
+ +#### Saving the result of interpolation as a tuple + +Although it may seem limiting to impose that interpolated strings expand in a function call, `tuple` offers an immediate and efficient mechanism for storing the result of interpolation. The following program produces the same output as the previous one: + +```d +void main(string[] args) { + import std.stdio; + import std.typecons : tuple; + auto t1 = tuple(i"The program $(args[0]) received $(args.length - 1) arguments."); + // Lowering: ---> + // auto t1 = tuple("The program ", args[0], " received ", args.length - 1, " arguments."); + writeln(t1.expand); + + auto t2 = tuple(f"The program %s$(args[0]) received %s$(args.length - 1) arguments."); + // Lowering: ---> + // auto t2 = tuple("The program %s received %s arguments.", args[0], args.length - 1); + writefln(t2.expand); +} +``` + +#### Manipulator-style formatting + +[C++'s iostreams](http://www.cplusplus.com/reference/iolibrary/) introduced the notion of [*stream manipulators*](https://en.cppreference.com/w/cpp/io/manip) --- special functions interspersed with the data to print, which direct the way data is to be formatted. For example, the C++ `std::dec` and `std::hex` manipulators instruct the formatting engine to format the following integral in decimal and hexadecimal, respectively: + +```C++ +// C++ code +#include <iostream> +void fun(int x) { + std::cout << std::dec << x << " in hexadecimal is 0x" << std::hex << x << ".\n"; +} +``` + +The approach is obviously extensible by simply adding new manipulators. Unfortunately, C++ stream manipulators have developed a poor reputation because they are very heavy syntactically --- interspersion is done with the visually prominent `<<` and the user must choose between polluting their namespace and using the `std::` prefix for scope resolution with each manipulator. (These issues and other unrelated ones motivated the introduction of the `<format>` facility in C++20.)
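The manipulator idea carries over to D variadic templates, and a library-side implementation can be prototyped with a current compiler. The following is a minimal sketch under stated assumptions: the `Hex` marker type, the `hex` constant, and the `render` function are hypothetical names introduced here for illustration, are not part of Phobos, and are not mandated by this proposal.

```d
import std.conv : text, to;
import std.traits : isIntegral, Unqual;

// Hypothetical directive type: "format the next integral in hexadecimal".
struct Hex {}
immutable Hex hex;

// Hypothetical helper (not a Phobos function): concatenates its arguments
// into a string, honoring a Hex directive for the argument that follows it.
string render(Args...)(Args args)
{
    string result;
    bool nextInHex = false;
    foreach (arg; args) // unrolled at compile time, one step per argument
    {
        alias T = Unqual!(typeof(arg));
        static if (is(T == Hex))
        {
            nextInHex = true; // a directive produces no output by itself
        }
        else static if (isIntegral!T)
        {
            result ~= nextInHex ? to!string(arg, 16) : to!string(arg);
            nextInHex = false;
        }
        else
        {
            result ~= text(arg);
            nextInHex = false;
        }
    }
    return result;
}

void main()
{
    int x = 255;
    // Hand-written interspersed argument list with a directive before the second x.
    assert(render(x, " in hexadecimal is 0x", hex, x, ".")
        == "255 in hexadecimal is 0xFF.");
}
```

The directive is an ordinary D value whose type the library inspects at compile time, so adding a new manipulator amounts to adding a type and a branch, with no format mini-language involved.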
+ +Using interpolation, a manipulators-based approach can be used elegantly and implemented with minimal effort. Consider for example using stream manipulators such as `dec` and `hex` for `writeln` by using an `i`-string: + +```d +void fun(int x) { + writeln(i"$dec$x in hexadecimal is 0x$hex$x."); + // Lowering: ---> + // writeln(dec, x, " in hexadecimal is 0x", hex, x, "."); +} +``` + +There is no need for defining, implementing, and memorizing a mini-language of encoded format specifiers --- all formatting can be done with D language expressions. Continuing the example, the library can just as easily define parameterized formatting for floating-point numbers, such as width, precision, and scientific notation: + +```d +void fun(double x) { + writeln(i"$x can be written as $scientific$x and its exact value is $(fixed(20, 10))$x"); + // Lowering: ---> + // writeln(x, " can be written as ", scientific, x, " and its exact value is ", fixed(20, 10), x); +} +``` + +#### Use in `mixin` declarations and expressions + +`i`-strings (but not `f`-strings) are allowed in `mixin` declarations and expressions: + +```d +immutable x = "asd", y = 42; +mixin(i"int $x = $y;"); +// Lowering ---> +// mixin("int ", x, " = ", y, ";"); +auto z = mixin(i"$x + 5"); +// Lowering ---> +// auto z = mixin(x, " + 5"); +``` + +#### Use in the argument list of template instantiations + +An interpolated string may be present in the argument list of a template instantiation. This allows, for example, a parser generator to compose with strings in the grammar definition: + +```d +struct Grammar(spec...) { ... 
} + +immutable identifier = "( [ character ]+ )"; + +alias Calculator = Grammar!( + i"Expression := Term + Term + Term := Factor * Factor + Factor := $identifier | '(' Expression ')' " +); +// Lowering: ---> +// alias Calculator = Grammar!( +// "Expression := Term + Term +// Term := Factor * Factor +// Factor := ", identifier, " | '(' Expression ')' " +// ); +``` + +In certain cases, the interpolated string can be passed to `AliasSeq` as well, resulting in a sequence of strings interspersed with identifiers: + +```d +int x = 42; +alias Q = AliasSeq!(i"I'm interpolating $x here."); // OK, use Q as a type +auto q = AliasSeq!(i"I'm interpolating $x here."); // OK, store as AliasSeq object +``` + +### Limitations and tradeoffs + +Users may be confused that they cannot define variables that are interpolated strings: + +```d +int x = 42; +auto s = i"Let's interpolate $x!"; // Error, interpolated string not allowed here +``` + +The remedy is simple and may be suggested by the text of the error message: use a tuple to store the interpolation, or call a function such as `text` or `format` to convert everything to a string: + +```d +int x = 42; +auto s = text(i"Let's interpolate $x!"); // OK, store as string +auto t = tuple(i"Let's interpolate $x!"); // OK, store as tuple +alias Q = AliasSeq!(i"Let's interpolate $x!"); // OK, use as type +auto q = AliasSeq!(i"Let's interpolate $x!"); // OK, store as AliasSeq +``` + +As mentioned, functions or templates are not aware whether they received an interpolated string or a manually written list of arguments. This confers consistency, simplicity, and uniformity to the approach. 
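Because the callee only ever sees the lowered argument list, the `text` and `tuple` remedies above can be tried today by writing that list by hand. The following is a minimal sketch under that assumption; `interpolateAsText` is a hypothetical helper introduced only for this example, and the proposed `i`-string syntax is deliberately not used.

```d
import std.conv : text;
import std.typecons : tuple;

/// Produces what `text(i"Let's interpolate $x!")` would yield under the
/// proposed lowering, using the hand-written argument list instead.
string interpolateAsText(int x)
{
    return text("Let's interpolate ", x, "!");
}

void main()
{
    int x = 42;

    // Store the result as a string.
    auto s = interpolateAsText(x);
    assert(s == "Let's interpolate 42!");

    // Store the fragments and the value as a tuple, then expand it later.
    auto t = tuple("Let's interpolate ", x, "!");
    assert(text(t.expand) == s);
}
```

The tuple keeps `x` as an `int` rather than converting it eagerly, which is why expanding it into `text` later gives the same string.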
+ +It is not possible for an `f`-string to pass the string literal as a template argument and the interpolated expressions as run-time arguments: + +```d +void main(string[] args) { + import std.stdio; + // No equivalent using interpolation + writefln!"The program %s received %s arguments."(args[0], args.length - 1); +} +``` + +`f`-strings do not provide special rules to match format specifiers (such as `%s` in `std.stdio.writefln` or `?` in SQL prepared statements) against arguments: + +```d +void fun(int x) { + writefln(f"Adding %s$x..."); // prints "Adding 42 ..." + writefln(f"Adding $x..."); // prints "Adding ..." + sqlExec(f"INSERT INTO t VALUES(?$x)"); // OK + sqlExec(f"INSERT INTO t VALUES($x?)"); // OK, equivalent + sqlExec(f"INSERT INTO t$x VALUES(?)"); // OK, equivalent but perverse + sqlExec(f"INSERT INTO t VALUES($x)"); // Runtime error, missing '?' binding +} +``` + +We consider that adding special mechanisms to adapt `f`-strings to a variety of formatting conventions is disproportionately complex and adds its own liabilities. + +## Breaking Changes and Deprecations + +Because `InterpolatedString` and `InterpolatedFormatString` are new tokens, no existing code is broken. 
+ +## Copyright & License + +Copyright (c) 2021 by the D Language Foundation + +Licensed under Creative Commons Zero 1.0 + From 08d3fa859d9f340bddffa7a27c47878295b4bfe7 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Sun, 14 Mar 2021 18:00:21 -0400 Subject: [PATCH 04/19] Add expansion inside a pragma(msg) --- README.md | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 58f2d42..d838aa8 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,7 @@ void f1(string name) { } ``` -A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that provide the string fragments intermixed with conventionally defined *formatting specifiers*. Specialized functions take such format strings followed by the data to be formatted, and replace each formatting directive with suitably formatted data: +A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that provide the string fragments intermixed with conventionally defined *formatting specifiers*. Specialized functions take such format strings followed by the arguments to be formatted and replace each formatting directive with suitably formatted data: ```d void f2(string name) { @@ -43,7 +43,7 @@ void f2(string name) { } ``` -Other examples of the format-string style are SQL prepared statements and string templates for formatting HTML documents. The convention used for format specifiers is defined by the respective APIs: +Other examples of the format-string style are string templates for formatting HTML documents and SQL prepared statements. 
The convention used for format specifiers is defined by the respective APIs: ```d void f2(string name) { @@ -53,7 +53,7 @@ void f2(string name) { } ``` -Both approaches have pros and cons. The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. The disadvantage of the format-string style is that the expressions to be formatted appear lexically *after* the (possibly long) format string, which makes it difficult to follow how format specifiers sync with their respective arguments. +Each approach --- interspersion style and format-string style --- has its pros and cons. The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. 
The disadvantage of the format-string style is that the expressions to be formatted appear lexically *after* the (possibly long) format string, which makes it difficult to follow how format specifiers sync with their respective arguments. The interspersion style is simple, intuitive, and requires learning no convention. However, creating complex outputs becomes cumbersome due to the syntactic heaviness of alternating string literals and other arguments in comma-separated lists. Also, customized formatting (such as rendering an integral in hexadecimal instead of decimal) is not immediate. @@ -185,6 +185,7 @@ An `i`-string or an `f`-string may occur only in one of the following contexts: - in the argument list of a function or constructor call; - in the argument list of a `mixin`; +- in the argument list of a `pragma(msg)` directive; - in the argument list of a template instantiation. In any other context, `i`-string or an `f`-string are ill-formed. @@ -376,6 +377,26 @@ auto z = mixin(i"$x + 5"); // auto z = mixin(x, " + 5"); ``` +#### Use in `pragma(msg)` directives + +`i`-strings (but not `f`-strings) are allowed in `pragma(msg)` directives: + +```d +enum x = 42; +pragma(msg, i"x = $x."); +// Lowering ---> +// pragma(msg, "x = ", x, "."); +``` + +Note that `pragma(msg)` is already variadic. Currently `assert` and `static assert` are not variadic, so they need to be helped with `text` or `format`: + +```d +void fun(int x)(int y) { + static assert(x < 42, text(i"x is $x, should be less than 42")); + assert(y > 42, format(f"y is %s$y, should be greater than 42")); +} +``` + #### Use in the argument list of template instantiations An interpolated string may be present in the argument list of a template instantiation. 
This allows, for example, a parser generator to compose with strings in the grammar definition: From 3be293109d049e0563ca8b8e54a0e6e7a5afec1b Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Tue, 23 Mar 2021 18:56:55 -0400 Subject: [PATCH 05/19] Integrate Walter's review --- README.md | 58 +++++++++++++++++++++++++++++-------------------------- 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/README.md b/README.md index d838aa8..c410dd4 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ Textual formatting is often achieved either by APIs relying on formatting string ## Rationale -A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments with data contained in variables. We identify two distinct approaches to formatting data: *format-string* style and *interspersion* style. +A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments with data contained in variables. We identify two distinct approaches to formatting data: *interspersion* style and *format-string* style. The most straightforward approach to implement that pattern is to intersperse expressions with string fragments in calls to variadic functions: @@ -33,7 +33,7 @@ void f1(string name) { } ``` -A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that provide the string fragments intermixed with conventionally defined *formatting specifiers*. 
Specialized functions take such format strings followed by the arguments to be formatted and replace each formatting directive with suitably formatted data: +A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that contain conventionally defined *formatting specifiers*. Specialized functions take such format strings followed by the arguments to be formatted and replace each formatting directive with suitably formatted data: ```d void f2(string name) { @@ -61,9 +61,9 @@ We will demonstrate how both the format-string style and the interspersion style - *code generation:* when generating code, the tight integration between the string literal fragments and the expressions to be inserted is essential to the process; - *scripting:* shell scripts use string interpolation very frequently, to the extent that the mechanics of quoting and interpolation is an essential focus of all shell scripting languages; -- *casual printing, tracing, logging, and debugging:* sometimes, such tasks have more focus on the expressions to be printed, than on formatting paraphernalia. +- *casual printing, tracing, logging, and debugging:* often, such tasks focus more on the expressions to be printed than on separating formatting paraphernalia from the data to be formatted. -Such needs are served poorly by either the interspersion approach and the format-string approach. Consider an example of code generation adapted from the implementation of `std.bitmanip.bitfields`: +The following examples show that such needs are served poorly by both the interspersion approach and the format-string approach. 
Consider an example of code generation adapted from the implementation of `std.bitmanip.bitfields`: ```d enum result = text( @@ -91,7 +91,7 @@ enum result = format( ); ``` -This form has less syntactic noise and appears as a format string separated from the expressions involved, for which reason we afforded to reformat it in a shape consistent with the generated code. However, the separation of format from data is clearly an impediment here requiring the reader to mentally track and pair the format specifiers `%s` with the arguments trailing the formatting string. Using positional arguments brings a marginal improvement to the code: +This form has less syntactic noise and appears as a format string separated from the expressions involved, for which reason we afforded to reformat it in a shape consistent with the generated code. However, the separation of format from data is an impediment here requiring the reader to mentally track and pair the format specifiers `%s` with the arguments trailing the formatting string. The repetition of arguments after the formatting string is also problematic. Using positional arguments brings a marginal improvement to the code because the arguments must be passed only once and referred by their position: ```d enum result = format( @@ -106,7 +106,7 @@ enum result = format( ); ``` -Here, the reader only needs to track the correct use of numbers in the format specifiers and match it with the order in the trailing arguments. Correctness of the generated code is still difficult to assess on the generated code, for example the reader must mentally map tedious sequences such as `%3$s` to meaningful names such as `maskAllElse` throughout the code snippet. +Here, the reader only needs to track the correct use of numbers in the format specifiers and match it with the order in the trailing arguments. 
Correctness of the generated code is still difficult to assess, for example the reader must mentally map tedious sequences such as `%3$s` to meaningful names such as `maskAllElse` throughout the code snippet. By comparison, using the interpolation syntax proposed in this DIP would make the code much easier to follow: @@ -133,13 +133,11 @@ executeShell("wget " ~ url ~ " -O" ~ file ~ ".frag && mv " ~ file ~ ".frag " ~ f The version using format specifiers is marginally more readable: ```d -// Classic -executeShell("wget %s -O%s.frag && mv %s.frag %s", url, file, file, file); -// Positional -executeShell("wget %1$s -O%2$s.frag && mv %2$s.frag %2$s", url, file); +executeShell("wget %s -O%s.frag && mv %s.frag %s", url, file, file, file); // classic +executeShell("wget %1$s -O%2$s.frag && mv %2$s.frag %2$s", url, file); // positional ``` -The interpolated form is, again, by far the easiest to follow: +The interpolated form is, again, the easiest to follow: ```d executeShell(i"wget $url -O$file.frag && mv $file.frag $file"); @@ -150,7 +148,7 @@ Last but not least, there are numerous cases in which casual console output can ```d writeln("Hello, ", name, ". You are ", age, " years old."); // interspersion writefln("Hello, %s. You are %s years old.", name, age); // format string -writeln(i"Hello, $name. You are $age years old."); // interpolation +writeln(i"Hello, $name. You are $age years old."); // interpolation ``` ### Why Yet Another String Interpolation Proposal? 
@@ -184,11 +182,12 @@ An *interpolated string* is a regular D string prefixed with the letter `i`, as An `i`-string or an `f`-string may occur only in one of the following contexts: - in the argument list of a function or constructor call; -- in the argument list of a `mixin`; -- in the argument list of a `pragma(msg)` directive; +- in the argument list of a `mixin` (`i`-strings only); +- in the argument list of a `pragma(msg)` directive (`i`-strings only); +- in the argument list (starting with the second argument) of an `assert` or `static assert` invocation, contingent on fixing [Issue 17378](https://issues.dlang.org/show_bug.cgi?id=17378) (`i`-strings only); and - in the argument list of a template instantiation. -In any other context, `i`-string or an `f`-string are ill-formed. +In any other context, an `i`-string or an `f`-string is illegal. For an example of `i`-strings, the function call expression: @@ -225,14 +224,19 @@ InterpolatedFormattingString: f" DoubleQuotedCharacters " ``` -The `InterpolatedString` and `InterpolatedFormattingString` appear in the parser grammar as an `InterpolatedExpression`, which is under `PrimaryExpression`. +The `InterpolatedString` and `InterpolatedFormattingString` appear in the parser grammar as an `InterpolatedList`, which is under `ArgumentList`. ``` -InterpolatedExpression: - InterpolatedString - InterpolatedString StringLiterals - InterpolatedFormattingString - InterpolatedFormattingString StringLiterals +ArgumentList: + AssignExpression + AssignExpression , + AssignExpression , ArgumentList + InterpolatedString + InterpolatedString , + InterpolatedString , ArgumentList + InterpolatedFormattingString + InterpolatedFormattingString , + InterpolatedFormattingString , ArgumentList ``` Inside an interpolated string, the character `$` is of particular interest because the interpolated string will use it as an escape. To render `$` verbatim inside an interpolated string, the sequence `$$` shall be used. 
The contents of the `InterpolatedExpression` must conform to the following grammar, which is identical for `i`-strings and `f`-strings: @@ -247,16 +251,16 @@ Element: '$$' '$' Identifier '$(' Type ')' - '$(' Expression ')' + '$(' AssignExpression ')' ``` -In the grammar above `Type` is the nonterminal for types, and `Expression` is the nonterminal for general D expressions. +In the grammar above `Type` is the nonterminal for types, and `AssignExpression` is the nonterminal for general D assignment expressions. For details refer to the current grammar at https://dlang.org/spec/grammar.html. -The `InterpolatedExpression` is lowered to a comma-separated list that consists of the string fragments interspersed with the expressions escaped by `$`. The `InterpolatedFormattingExpression` is lowered to the string literal fragments stitched together, followed by all escaped fragments, in lexical order. +The `InterpolatedString` is lowered to a comma-separated list that consists of the string fragments interspersed with the expressions escaped by `$`. The `InterpolatedFormattingString` is lowered to the string literal fragments stitched together, followed by all escaped fragments, in lexical order. An `f`-string expansion produces a literal string in the first position even if that would be empty: `fun(f"$x")` lowers to `fun("", x)`, not `fun(x)`. In contrast, `i`-string expansion never produces empty string literals: `fun(i"$x")`expands to `fun(x)` and `fun(i"$x$y")`expands to `fun(x, y)`. -Any lexical errors (such as a `$` followed by a space, or unbalanced `(` and `)`) will be reported during parsing. Semantic checking will ensure that interpolated strings occur only in the contexts specified above. Then, other semantic errors will be reported during the typechecking of the lowered code. +Any lexical errors (such as a `$` followed by a space, or unbalanced `(` and `)`) will be reported during parsing. 
Then, other semantic errors will be reported during the typechecking of the lowered code. This concludes the syntax and semantics of the proposed feature. @@ -446,7 +450,7 @@ alias Q = AliasSeq!(i"Let's interpolate $x!"); // OK, use as type auto q = AliasSeq!(i"Let's interpolate $x!"); // OK, store as AliasSeq ``` -As mentioned, functions or templates are not aware whether they received an interpolated string or a manually written list of arguments. This confers consistency, simplicity, and uniformity to the approach. +As mentioned, functions or templates are not aware whether they received an interpolated string or a manually written list of arguments. This confers consistency, simplicity, predictability, and uniformity to the approach. It is not possible for an `f`-string to pass the string literal as a template argument and the interpolated expressions as run-time arguments: @@ -471,7 +475,7 @@ void fun(int x) { } ``` -We consider that adding special mechanisms to adapt `f`-strings to a variety of formatting conventions is disproportionately complex and adds its own liabilities. +We consider that adding special mechanisms to adapt `f`-strings to a variety of formatting conventions is disproportionately complex and adds its own liabilities. To mitigate this issue, at least for functions such as `std.stdio.format` and `std.stdio.writeln` the compiler can extent the curreny verification of format specifiers to `f`-strings as well. 
## Breaking Changes and Deprecations From df042c857bf0a7542443114a6de50dd593529030 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 24 Mar 2021 10:58:26 -0400 Subject: [PATCH 06/19] email Co-authored-by: John Colvin --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index c410dd4..bd5dec1 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ |-----------------|-------------------------------------------------------------------| | DIP: | xxxx | | Review Count: | 0 | -| Author: | Andrei Alexandrescu
John Colvin jcolvin@symmetryinvestments.com| +| Author: | Andrei Alexandrescu
John Colvin john.loughran.colvin@gmail.com | | Implementation: | | | Status: | | @@ -486,4 +486,3 @@ Because `InterpolatedString` and `InterpolatedFormatString` are new tokens, no e Copyright (c) 2021 by the D Language Foundation Licensed under Creative Commons Zero 1.0 - From 8c5b18384f89d826aa0e30dd92cd792a0a09c421 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 24 Mar 2021 10:59:31 -0400 Subject: [PATCH 07/19] Update README.md Co-authored-by: John Colvin --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bd5dec1..fb0c7c7 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ ## Abstract -Textual formatting is often achieved either by APIs relying on formatting strings followed by arguments to be formatted (in the style of `printf`, `std.format.format`, and `std.stdio.writefln`), or by interspersing string fragments with arguments (in the style of `std.conv.text` and `std.stdio.writeln`). String interpolation enables embedding the arguments in the string itself. We propose an extremely simple yet powerful approach of lowering interpolated strings into comma-separated lists that works with both *format-string* style and *interspersion* style with no change to any library function. We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. +Textual formatting is often achieved either by APIs relying on formatting strings followed by arguments to be formatted (in the style of `printf`, `std.format.format`, and `std.stdio.writefln`), or by interspersing string fragments with arguments (in the style of `std.conv.text` and `std.stdio.writeln`). String interpolation enables embedding the arguments in the string itself. 
We propose an extremely simple yet powerful approach of lowering interpolated strings into comma-separated lists that works with both *format-string* style and *interspersion* style with no change to any existing library function. We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. ## Contents * [Rationale](#rationale) From 42e212def5c58276d5161d228a91a7e3baf6ddd3 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 24 Mar 2021 11:02:13 -0400 Subject: [PATCH 08/19] Update README.md Co-authored-by: John Colvin --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index fb0c7c7..32a2a88 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ void f2(string name) { } ``` -Each approach --- interspersion style and format-string style --- has its pros and cons. The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. The disadvantage of the format-string style is that the expressions to be formatted appear lexically *after* the (possibly long) format string, which makes it difficult to follow how format specifiers sync with their respective arguments. +Each approach --- interspersion style and format-string style --- has its pros and cons. 
The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. The disadvantage of the format-string style is that the expressions to be formatted appear lexically *separate from* the (possibly long) format string, which makes it difficult to follow which format specifiers correspond to their respective arguments. The interspersion style is simple, intuitive, and requires learning no convention. However, creating complex outputs becomes cumbersome due to the syntactic heaviness of alternating string literals and other arguments in comma-separated lists. Also, customized formatting (such as rendering an integral in hexadecimal instead of decimal) is not immediate. From 1f70605b2bd776700dabe9d697a8c2d43bd4a6bf Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 24 Mar 2021 11:05:41 -0400 Subject: [PATCH 09/19] Update README.md Co-authored-by: John Colvin --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 32a2a88..c74b50f 100644 --- a/README.md +++ b/README.md @@ -335,7 +335,7 @@ void main(string[] args) { #### Manipulator-style formatting -[C++'s iostreams](http://www.cplusplus.com/reference/iolibrary/) introduced the notion of [*stream manipulators*](https://en.cppreference.com/w/cpp/io/manip) --- special functions interspersed with the data to print, which direct the way data is to be formatted. 
For example, the C++ `std::dec` and `std::hex` manipulators instruct the formatting engine to format the following integral in decimal and hexadecimal, respectively: +[C++'s iostreams](http://www.cplusplus.com/reference/iolibrary/) introduced the notion of [*stream manipulators*](https://en.cppreference.com/w/cpp/io/manip) --- special functions interspersed with the data to print, which direct the way data is to be formatted. For example, the C++ `std::dec` and `std::hex` manipulators instruct the formatting engine to format the following integer in decimal and hexadecimal, respectively: ```C++ // C++ code From 03f7305aa5abab4433bbd101031f9cf65e93aa27 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 24 Mar 2021 11:09:13 -0400 Subject: [PATCH 10/19] Update README.md Co-authored-by: John Colvin --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c74b50f..6996361 100644 --- a/README.md +++ b/README.md @@ -475,7 +475,7 @@ void fun(int x) { } ``` -We consider that adding special mechanisms to adapt `f`-strings to a variety of formatting conventions is disproportionately complex and adds its own liabilities. To mitigate this issue, at least for functions such as `std.stdio.format` and `std.stdio.writeln` the compiler can extent the curreny verification of format specifiers to `f`-strings as well. +We consider that adding special mechanisms to adapt `f`-strings to a variety of formatting conventions is disproportionately complex and adds its own liabilities. To mitigate this issue, at least for functions such as `std.format.format` and `std.stdio.writefln`, the compiler can extend the current verification of format specifiers to `f`-strings as well. 
## Breaking Changes and Deprecations From 99500c4f8e67a343b5d9972a08c0ef40f5bafa29 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Wed, 24 Mar 2021 17:28:40 -0400 Subject: [PATCH 11/19] Integrate feedback sans full fledged formatting strings --- README.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 6996361..bba05a6 100644 --- a/README.md +++ b/README.md @@ -53,9 +53,9 @@ void f2(string name) { } ``` -Each approach --- interspersion style and format-string style --- has its pros and cons. The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms, such as Model-View-Controller, web development, and UX design. Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. The disadvantage of the format-string style is that the expressions to be formatted appear lexically *separate from* the (possibly long) format string, which makes it difficult to follow which format specifiers correspond to their respective arguments. +Each approach --- interspersion style and format-string style --- has its pros and cons. The format-string style observes the important principle of [separating logic from display](https://www.cs.usfca.edu/~parrt/papers/mvc.templates.pdf). This principle is well respected in a variety of programming paradigms and domains, such as Model-View-Controller, UX design, and web development. 
Localization and internationalization applications and libraries can store all display artifacts in complete separation from program logic and swap them as needed. In a perfect separation model, there is no access to computation in the formatting strings at all, even as much as a simple addition or (in the case of `printf` format strings) even the names of the variables being printed. The disadvantage of the format-string style is that the expressions to be formatted appear lexically *separate from* the (possibly long) format string, which makes it difficult to follow which format specifiers correspond to their respective arguments. -The interspersion style is simple, intuitive, and requires learning no convention. However, creating complex outputs becomes cumbersome due to the syntactic heaviness of alternating string literals and other arguments in comma-separated lists. Also, customized formatting (such as rendering an integral in hexadecimal instead of decimal) is not immediate. +The interspersion style is simple, intuitive, and requires learning no convention. However, creating complex outputs becomes cumbersome due to the syntactic heaviness of alternating string literals and other arguments in comma-separated lists. Also, customized formatting (such as rendering an integral in hexadecimal instead of decimal) is not supported. We will demonstrate how both the format-string style and the interspersion style fall short of expectations on three categories of everyday tasks: @@ -239,7 +239,7 @@ ArgumentList: InterpolatedFormattingString , ArgumentList ``` -Inside an interpolated string, the character `$` is of particular interest because the interpolated string will use it as an escape. To render `$` verbatim inside an interpolated string, the sequence `$$` shall be used. 
The contents of the `InterpolatedExpression` must conform to the following grammar, which is identical for `i`-strings and `f`-strings: +Inside an interpolated string, the character `$` is of particular interest because the interpolated string will use it as an escape. If `$` is not followed by an open parenthesis or an identifier, its meaning is unchanged. If `$` is followed by an identifier (which starts with a `_` or alphabetic character) or open parenthesis, the `$` acts as an escape character introducing an interpolated identifier or expression. To render `$` verbatim when followed by an identifier or an open parenthesis, the sequence `$$` shall be used. The contents of the `InterpolatedExpression` must conform to the following grammar, which is identical for `i`-strings and `f`-strings: ``` Elements: @@ -249,6 +249,7 @@ Elements: Element: Character excluding '$' '$$' + '$' Character other than '$', '_', 'a'-'z', or 'A'-'Z' '$' Identifier '$(' Type ')' '$(' AssignExpression ')' @@ -281,7 +282,7 @@ enum result = text( ); ``` -In contrast, occurrences of `$` in D code are rare --- indeed more likely to be present in generated code that uses interpolation itself. This makes `$` a disproportionately strong candidate compared to other choices. +In contrast, occurrences of `$` followed by an open parenthesis or an identifier in D code are rare --- indeed more likely to be present in generated code that uses interpolation itself. This makes `$` a disproportionately strong candidate compared to other choices. The second question --- why not use `${` and `}` instead of `$(` and `)` --- has a simple answer: the elements to group inside the escape sequences are expressions, not statements. There already exists a syntax for grouping expressions, and that's surrounding them with `(` and `)`, which closes the case by invoking the [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment). 
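The `$` escape rules added in this patch can be illustrated with a small splitter. This is only a sketch under stated assumptions, not the compiler's lexer: `Piece` and `splitInterpolated` are hypothetical names, identifiers are treated as ASCII-only, a trailing `$` is kept as literal text, and parentheses inside nested string literals are not handled.

```d
import std.ascii : isAlpha, isAlphaNum;

/// Hypothetical helper type: one fragment of a split interpolated string.
struct Piece {
    bool isExpr;   // true for `$ident` or `$(...)`, false for literal text
    string text;   // literal text, or the source text of the expression
}

/// Splits `s` following the Element grammar quoted in the hunk above.
Piece[] splitInterpolated(string s) {
    Piece[] pieces;
    string lit;

    // Emit the pending literal fragment, if any.
    void flush() {
        if (lit.length) { pieces ~= Piece(false, lit); lit = null; }
    }

    size_t i = 0;
    while (i < s.length) {
        immutable c = s[i];
        immutable next = i + 1 < s.length ? s[i + 1] : '\0';
        if (c != '$') {
            lit ~= c; ++i;                 // ordinary character
        } else if (next == '$') {
            lit ~= '$'; i += 2;            // `$$` renders a verbatim `$`
        } else if (next == '_' || isAlpha(next)) {
            size_t j = i + 1;              // `$identifier`
            while (j < s.length && (s[j] == '_' || isAlphaNum(s[j]))) ++j;
            flush();
            pieces ~= Piece(true, s[i + 1 .. j]);
            i = j;
        } else if (next == '(') {
            size_t j = i + 1;              // `$( ... )` with balanced parentheses
            int depth = 0;
            for (; j < s.length; ++j) {
                if (s[j] == '(') ++depth;
                else if (s[j] == ')' && --depth == 0) break;
            }
            assert(j < s.length, "unbalanced parentheses in interpolated string");
            flush();
            pieces ~= Piece(true, s[i + 2 .. j]);
            i = j + 1;
        } else {
            lit ~= '$'; ++i;               // `$` before anything else is unchanged
        }
    }
    flush();
    return pieces;
}

void main() {
    auto p = splitInterpolated("cost: $$5 or $x and $(a + (b))$ ok");
    assert(p == [
        Piece(false, "cost: $5 or "),
        Piece(true, "x"),
        Piece(false, " and "),
        Piece(true, "a + (b)"),
        Piece(false, "$ ok"),
    ]);
}
```

The sketch makes the asymmetry of the rule visible: `$` only needs doubling when it would otherwise start an identifier or a parenthesized expression.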
From a174c97d2c65c9a88b802f5586dee2545b3e3091 Mon Sep 17 00:00:00 2001 From: Andrei Alexandrescu Date: Tue, 11 May 2021 22:08:07 -0400 Subject: [PATCH 12/19] Remove f-strings, keep i-strings with interpolation header --- README.md | 268 ++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 168 insertions(+), 100 deletions(-) diff --git a/README.md b/README.md index bba05a6..0edba63 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ ## Abstract -Textual formatting is often achieved either by APIs relying on formatting strings followed by arguments to be formatted (in the style of `printf`, `std.format.format`, and `std.stdio.writefln`), or by interspersing string fragments with arguments (in the style of `std.conv.text` and `std.stdio.writeln`). String interpolation enables embedding the arguments in the string itself. We propose an extremely simple yet powerful approach of lowering interpolated strings into comma-separated lists that works with both *format-string* style and *interspersion* style with no change to any existing library function. We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. +Textual formatting is often achieved by APIs relying either on format specification strings followed by arguments to be formatted (in the style of `printf`, `std.format.format`, and `std.stdio.writefln`), or on interspersing arguments of string and non-string types (in the style of `std.conv.text` and `std.stdio.writeln`). String interpolation enables a style of formatting that embeds arguments within a literal string. We propose an extremely simple yet powerful approach of lowering interpolated strings into comma-separated lists that works with both *format-string* style and *interspersion* style with no or minimal changes to functions to take advantage of such interpolated strings. 
We demonstrate how this approach achieves all major objectives of an interpolated strings feature with a minimal footprint on the language definition and support library. ## Contents * [Rationale](#rationale) @@ -21,9 +21,9 @@ Textual formatting is often achieved either by APIs relying on formatting string ## Rationale -A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments with data contained in variables. We identify two distinct approaches to formatting data: *interspersion* style and *format-string* style. +A frequent pattern in programming is to create strings (for the purpose of printing to the console, writing to files, etc) by mixing predefined string fragments (literal strings) with data contained in variables. We identify two distinct approaches to formatting data: *interspersion* style and *format-string* style. -The most straightforward approach to implement that pattern is to intersperse expressions with string fragments in calls to variadic functions: +The most straightforward approach is to intersperse expressions with string literals in calls to variadic functions: ```d void f1(string name) { @@ -33,7 +33,7 @@ void f1(string name) { } ``` -A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that contain conventionally defined *formatting specifiers*. Specialized functions take such format strings followed by the arguments to be formatted and replace each formatting directive with suitably formatted data: +A more flexible approach, embodied by the classic `printf` family of C functions and carried over to D standard library functions such as `std.format.format` and `std.stdio.writefln`, is to use *format strings* that contain conventionally defined *format specifiers*. 
Specialized functions take such format strings followed by the arguments to be formatted and replace each format specifier with suitably formatted data: ```d void f2(string name) { @@ -47,8 +47,8 @@ Other examples of the format-string style are string templates for formatting HT ```d void f2(string name) { - htmlOutput("Looking for #{}...", name); - auto rows = sql("SELECT * FROM t WHERE name = ? AND active = true", name); + htmlOutput("Looking for #{}...", name); // specifier is #{} + auto rows = sql("SELECT * FROM t WHERE name = ?", name); // specifier is a question mark ... } ``` @@ -57,7 +57,7 @@ Each approach --- interspersion style and format-string style --- has its pros a The interspersion style is simple, intuitive, and requires learning no convention. However, creating complex outputs becomes cumbersome due to the syntactic heaviness of alternating string literals and other arguments in comma-separated lists. Also, customized formatting (such as rendering an integral in hexadecimal instead of decimal) is not supported. -We will demonstrate how both the format-string style and the interspersion style fall short of expectations on three categories of everyday tasks: +Below we provide evidence of the difficulties that both the format-string style and the interspersion style present for three categories of typical tasks: - *code generation:* when generating code, the tight integration between the string literal fragments and the expressions to be inserted is essential to the process; - *scripting:* shell scripts use string interpolation very frequently, to the extent that the mechanics of quoting and interpolation is an essential focus of all shell scripting languages; @@ -75,7 +75,7 @@ enum result = text( ); ``` -(The [original code](https://github.com/dlang/phobos/blob/v2.095.1/std/bitmanip.d#L115) uses string concatenation instead of a call to `text`. We use the latter to simplify the example.) 
Here, the interspersion mechanics (closing quote, comma, expression, comma, opening quote) distract from following the correctness of the generated code. An approach based on format specifiers would look as follows: +(The [original code](https://github.com/dlang/phobos/blob/v2.095.1/std/bitmanip.d#L115) uses string concatenation instead of a call to `text`. We use the latter to simplify the example.) Here, the interspersion mechanics (closing quote, comma, expression, comma, opening quote) distract the reader from following the correctness of the generated code. An approach based on format specifiers would look as follows: ```d enum result = format( @@ -122,7 +122,7 @@ enum result = text( ); ``` -The latter form has dramatically less syntactic noise and appears as a single string with expressions inside escaped by `$`. Correctness of the generated code is much easier to assess in the second form as well. +The latter form has dramatically less syntactic noise and appears as a single string with expressions inside escaped by `$`. Correctness of the generated code is much easier to assess as well. Let us also look at a shell command example. Assume `url` is an URL and `file` is an filename, both preprocessed as escaped shell strings. To download `url` into `file` without risking corrupt files in case of incomplete downloads, the code below first downloads into a temporary file with the extension `.frag` and then atomically renames it to the correct name: @@ -157,9 +157,9 @@ This DIP derives from, and owes much to, the previous work on string interpolati This DIP is close to the prior work yet different in key aspects as follows: -- Like [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988), this DIP expands the interpolated string into an argument list. However, unlike that proposal that automatically passes the expansion as an argument list for `std.typecons.tuple`, we expand into an argument list. 
We will show how doing so has significant flexibility and efficiency advantages. -- Like [DIP 1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), this DIP expands the interpolated string into a list. Unlike DIP 1027, which only supports the format-string style, this proposal supports both the interspersion and the format-string style. It also has a simpler syntax and semantics. -- We heed important lessons from [DIP 1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), mainly that while pursuing generality, complexity must be kept under control. In wake of it, we concluded that unification of the format-string style and interspersion approach with a single interpolation syntax is not the appropriate goal. Instead, we recognize the two goals as distinct and propose distinct constructs for them. +- Like [Jonathan Marler's Interpolated Strings](http://github.com/dlang/dmd/pull/7988), this DIP expands the interpolated string into an argument list. However, unlike that proposal that automatically passes the expansion as an argument list for `std.typecons.tuple`, we expand into an argument list and leave the rest to user code. We will show how doing so has significant flexibility and efficiency advantages. +- Like [DIP 1027](https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md), this DIP expands the interpolated string into an argument list. Unlike DIP 1027, which only supports the format-string style, this proposal supports both the interspersion and the format-string style. It also has a simpler syntax and semantics. +- We heed important lessons from [DIP 1036](https://github.com/dlang/DIPs/blob/master/DIPs/DIP1036.md), mainly that while pursuing generality, complexity must be kept under control. We will demonstrate how this DIP achieves all major goals of extant proposals with a radically simpler definition and implementation. 
@@ -175,21 +175,21 @@ For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki ## Description -D strings have several syntactical form, among which a few that follow the pattern of a letter followed by the string proper. Such include `r"WYSIWYG strings"`, `q"[delimited strings]"`, and `q{token strings}`. Our proposal follows the same pattern by introducing `i"interpolated strings"` and `f"interpolated formatting strings"`. +D strings have several syntactical forms, a few of which follow the pattern of a letter followed by the string proper. These include `r"WYSIWYG strings"`, `q"[delimited strings]"`, and `q{token strings}`. Our proposal follows the same pattern by introducing `i"interpolated strings"`. -An *interpolated string* is a regular D string prefixed with the letter `i`, as in `i"Hello"`. An *interpolated formatting string* is a regular D string prefixed with the letter `f`, as in `f"world"`. No whitespace is allowed between `i` or `f` and the opening quote. The first expands into the interspersed style, and the second expands into the format-string style. We refer to them as `i`-strings and `f`-strings, respectively. +An *interpolated string* is a regular D string prefixed with the letter `i`, as in `i"Hello"`. No whitespace is allowed between `i` and the opening quote. We refer to these constructs as `i`-strings. 
-An `i`-string or an `f`-string may occur only in one of the following contexts: +An `i`-string is allowed in source code only in one of the following contexts: - in the argument list of a function or constructor call; -- in the argument list of a `mixin` (`i`-strings only); -- in the argument list of a `pragma(msg)` directive (`i`-strings only); -- in the argument list (starting with the second argument) of an `assert` or `static assert` invocation, contingent to fixing [Issue 17378](https://issues.dlang.org/show_bug.cgi?id=17378) (`i`-strings only); and +- in the argument list of a `mixin`; +- in the argument list of a `pragma(msg)` directive; +- in the argument list (starting with the second argument) of an `assert` or `static assert` invocation, contingent on fixing [Issue 17378](https://issues.dlang.org/show_bug.cgi?id=17378); and - in the argument list of a template instantiation. -In any other context, `i`-string or an `f`-string are illegal. +In any other context, `i`-strings are illegal. 
-For an example of `i`-strings, the function call expression: +As an example of `i`-string usage, the function call expression: ```d writeln(i"I ate $apples apples and $bananas bananas totalling $(apples + bananas) fruit.") @@ -198,20 +198,10 @@ writeln(i"I ate $apples apples and $bananas bananas totalling $(apples + bananas is lowered into: ```d -writeln("I ate ", apples, " apples and ", bananas, " bananas totalling ", apples + bananas, " fruit.") +writeln(__header, "I ate ", apples, " apples and ", bananas, " bananas totalling ", apples + bananas, " fruit.") ``` -For an example of `f`-strings, the function call expression: - -```d -writefln(f"I ate %s$apples apples and %s$bananas bananas totalling %s$(apples + bananas) fruit.") -``` - -is lowered into: - -```d -writefln(f"I ate %s apples and %s bananas totalling %s fruit.", apples, bananas, apples + bananas) -``` +The `__header` value, to be discussed later, is generated by the compiler and contains compile-time information about the interpolated string. The resulting lowered code is subjected to the usual typechecking and has the same semantics as if the lowered code were present in the source. @@ -220,11 +210,9 @@ After introducing an intuition of how interpolated string work, let us formalize ``` InterpolatedString: i" DoubleQuotedCharacters " -InterpolatedFormattingString: - f" DoubleQuotedCharacters " ``` -The `InterpolatedString` and `InterpolatedFormattingString` appear in the parser grammar as an `InterpolatedList`, which is under `ArgumentList`. +The `InterpolatedString` appears in the parser grammar as an `InterpolatedList`, which is under `ArgumentList`. 
``` ArgumentList: @@ -234,12 +222,9 @@ ArgumentList: InterpolatedString InterpolatedString , InterpolatedString , ArgumentList - InterpolatedFormattingString - InterpolatedFormattingString , - InterpolatedFormattingString , ArgumentList ``` -Inside an interpolated string, the character `$` is of particular interest because the interpolated string will use it as an escape. If `$` is not followed by an open parenthesis or an identifier, its meaning is unchanged. If `$` is followed by an identifier (which starts with a `_` or alphabetic character) or open parenthesis, the `$` acts as an escape character introducing an interpolated identifier or expression. To render `$` verbatim when followed by an identifier or an open parenthesis, the sequence `$$` shall be used. The contents of the `InterpolatedExpression` must conform to the following grammar, which is identical for `i`-strings and `f`-strings: +Inside an interpolated string, the character `$` is of particular interest because the interpolated string will use it as an escape. If `$` is not followed by an open parenthesis or an identifier, its meaning is unchanged. If `$` is followed by an identifier (which starts with a `_` or alphabetic character) or open parenthesis, the `$` acts as an escape character introducing an interpolated identifier or expression. To render `$` verbatim when followed by an identifier or an open parenthesis, the sequence `$$` shall be used. The contents of the `InterpolatedExpression` must conform to the following grammar: ``` Elements: @@ -257,14 +242,105 @@ Element: In the grammar above `Type` is the nonterminal for types, and `AssignExpression` is the nonterminal for general D assignment expressions. For details refer to the current grammar at https://dlang.org/spec/grammar.html. -The `InterpolatedString` is lowered to a comma-separated list that consists of the string fragments interspersed with the expressions escaped by `$`. 
The `InterpolatedFormattingString` is lowered to the string literal fragments stitched together, followed by all escaped fragments, in lexical order. +The `InterpolatedString` is lowered to a comma-separated list that consists of a header followed by the string fragments interspersed with the expressions escaped by `$`. + +Any lexical errors (such as unbalanced `(` and `)`) will be reported during parsing. Then, semantic errors will be reported during the typechecking of the lowered code. + +### Normalization of interspersion + +The lowering is normalized such that: + +(a) A `__header` object is always the first element in the resulting argument list; +(b) The rest of the resulting argument list always starts with a string literal; +(c) The argument list always ends with a string literal; and +(d) The argument list always alternates between literal strings and expressions, i.e. there are never two consecutive literal strings or two consecutive expressions. + +To effect these rules, the lowering may introduce empty strings as follows. + +If an interpolated string starts with an escape sequence, an empty string is always introduced before it in the expansion. Example: + +```D +writeln(i"$name, hi!"); +``` + +is lowered to: + +```D +writeln(__header, "", name, ", hi!"); +``` + +If an interpolated string ends with an escape sequence, an empty string is always introduced after it in the expansion. Example: + +```D +writeln(i"Hello, world$exclamation"); +``` + +is lowered to: + +```D +writeln(__header, "Hello, world", exclamation, ""); +``` + +Finally, if an `i`-string contains two consecutive expansions, the lowering will introduce an empty string literal in between. Example: + +```D +writeln(i"Hello, $name$exclamation How are you?"); +``` + +is lowered to: + +```D +writeln(__header, "Hello, ", name, "", exclamation, " How are you?"); +``` + +These rules can apply simultaneously to the same `i`-string. 
Example: + +```D +writeln(i"$greeting, $name$exclamation"); +``` + +is lowered to: + +```D +writeln(__header, "", greeting, ", ", name, "", exclamation, ""); +``` + +The purpose of normalization, as can be seen in the given examples, is to always have the argument list in a fixed pattern: *header*, *string-literal*, *expression*, *string-literal*, *expression*, ..., *string-literal*. + +## The Interpolation Header + +Every `i`-string lowering introduces a header rvalue object that so far we conventionally denoted as `__header` (the exact name is uniquely compiler-generated and hence inaccessible). The header is a stateless `struct` that contains exclusively compile-time information about the interpolation, generated from the following template: -An `f`-string expansion produces a literal string in the first position even if that would be empty: `fun(f"$x")` lowers to `fun("", x)`, not `fun(x)`. In contrast, `i`-string expansion never produces empty string literals: `fun(i"$x")`expands to `fun(x)` and `fun(i"$x$y")`expands to `fun(x, y)`. +```D +struct InterpolationHeader(_parts...) { + alias parts = _parts; + string toString() { return null; } +} +``` + +The arguments to the instantiation of `InterpolationHeader` are the interpolated string deconstructed into parts and normalized. Example: + +```D +writeln(i"$greeting, $name$exclamation"); +``` + +is lowered to: + +```D +writeln(InterpolationHeader!("", "greeting", ", ", "name", "", "exclamation", "")(), + "", greeting, ", ", name, "", exclamation, ""); +``` + +Note how the header gives the callee complete access to the strings corresponding to the expressions passed in. + +Due to normalization, `parts` always has an odd number of elements. Strings at even indices in `parts` always originate in the literal fragments of the interpolated string. Strings at odd indices are always string representations of D expressions. 
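The even/odd layout of `parts` can be exercised with a current D compiler by writing the lowering by hand. The following is a minimal sketch under assumptions: `isHeader` and `dump` are hypothetical names that are not part of the proposal, and the `InterpolationHeader` template is copied from the hunk above.

```d
import std.conv : text;

// Header template as given in the proposal text.
struct InterpolationHeader(_parts...) {
    alias parts = _parts;
    string toString() { return null; }
}

// Hypothetical trait: true when T is some instance of InterpolationHeader.
enum isHeader(T) = is(T == InterpolationHeader!P, P...);

/// Hypothetical callee that uses the header to label each expression
/// with its source text, ignoring the literal fragments.
string dump(Args...)(Args args)
if (Args.length > 0 && isHeader!(Args[0]))
{
    alias H = Args[0];
    string result;
    static foreach (i; 0 .. H.parts.length) {
        // Odd indices of `parts` hold expression source text; the matching
        // run-time value sits one slot later because args[0] is the header.
        static if (i % 2 == 1)
            result ~= text(H.parts[i], " = ", args[i + 1], "; ");
    }
    return result;
}

void main() {
    int x = 1, y = 2;
    // Hand-written lowering of dump(i"$x + $y")
    auto s = dump(InterpolationHeader!("", "x", " + ", "y", "")(),
                  "", x, " + ", y, "");
    assert(s == "x = 1; y = 2; ");
    static assert(!isHeader!string);
}
```

Because the layout is fixed by normalization, the callee needs no run-time scanning: index parity alone separates literals from expressions.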
-Any lexical errors (such as a `$` followed by a space, or unbalanced `(` and `)`) will be reported during parsing. Then, other semantic errors will be reported during the typechecking of the lowered code. +The header object has a trivial `toString` method that returns the `null` string. This method makes it possible to pass interpolated strings directly to functions such as `writeln` and `text` because these functions detect and use `toString` to convert unknown data types to strings. More sophisticated functions that do want to detect interpolated strings can detect the presence of `InterpolationHeader` objects by simple introspection. This concludes the syntax and semantics of the proposed feature. +### Discussion: The Choice of Escape Grammar + Why choose the `$` when many popular languages and libraries (Python, C++20, C#) use `{` and `}` as escape characters? Also, why use `$(` and `)` as opposed to `${` and `}`, as perhaps a bash user may be more familiar with? One essential use of interpolation, specific to D, is for code generation. In generated D code, curly braces `{` and `}` are abundant. Requiring `{{` and `}}` everywhere in the generated code would have been aggravating. Anecdotal evidence has been collected in the creation of this DIP, which initially attempted to use `{` and `}` for escaping: the examples extracted from `std.bitmanip.bitfields` turned out to have numerous bugs, and be difficult to read when corrected: @@ -282,15 +358,15 @@ enum result = text( ); ``` -In contrast, occurrences of `$` followed by an open parenthesis or an identifier in D code are rare --- indeed more likely to be present in generated code that uses interpolation itself. This makes `$` a disproportionately strong candidate compared to other choices. +In contrast, occurrences of `$` followed by an open parenthesis or an identifier in D code are rare --- indeed most likely to be present in generated code that uses interpolation itself. 
This makes `$` a disproportionately strong candidate compared to other choices. The second question --- why not use `${` and `}` instead of `$(` and `)` --- has a simple answer: the elements to group inside the escape sequences are expressions, not statements. There already exists a syntax for grouping expressions, and that's surrounding them with `(` and `)`, which closes the case by invoking the [principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment). -### Use Cases +## Use Cases Although the proposed feature is deceptively simple, its flexibility affords a multitude of use cases. They can be easily assessed with a current D compiler by typing in the lowered code. -#### Passing arguments to functions +### Passing arguments to functions The simplest use case of interpolated strings, and the most likely to be encountered in practice, is as argument to functions such as `writeln`, `text`, `writefln`, or `format`: @@ -299,41 +375,53 @@ void main(string[] args) { import std.stdio; writeln(i"The program $(args[0]) received $(args.length - 1) arguments."); // Lowering: ---> - // writeln("The program ", args[0], " received ", args.length - 1, " arguments."); - - writefln(f"The program %s$(args[0]) received %s$(args.length - 1) arguments."); - // Lowering: ---> - // writefln("The program %s received %s arguments.", args[0], args.length - 1); + // writeln(InterpolationHeader!("The program ", "args[0]", " received ", "args.length - 1", " arguments.")(), + // "The program ", args[0], " received ", args.length - 1, " arguments."); - auto s = sqlExec(f"INSERT INTO runs VALUES(?$(args[0]), ?$(args.length - 1))"); + auto s = sqlExec(i"INSERT INTO runs VALUES ($(args[0]), $(args.length - 1))"); // Lowering: ---> - // auto s = sqlExec("INSERT INTO runs VALUES(?, ?)", args[0], args.length - 1); + // auto s = sqlExec(InterpolationHeader!("INSERT INTO runs VALUES (", "args[0]", ", ", "args.length - 1", ")")(), + // "INSERT INTO runs VALUES (", args[0], ", ", args.length - 1, ")"); 
} ``` -A function such as `std.stdio.writeln` or `std.stdio.writefln` above has no way to know whether it was called via an interpolated string or with arguments specified as in the corresponding lowering; interpolated strings are purely a call-side mechanism. We consider this a key characteristic of the proposed feature that drastically simplifies its definition and interoperation with existing code. +A function such as `std.stdio.writeln` may choose to uniformly convert the header to string by means of calling its `toString` method, thus essentially working with interpolated strings without modification. The second possibility is that the function detects the header but "skips" it and ignores the information it provides, which is easy to accommodate on the function implementer's side. Finally, a function may choose to fully support `i`-strings with specialized semantics. We consider this flexibility a key characteristic of the proposed feature that drastically simplifies its definition, understanding, and interoperation with new and existing code. + +A format-string-style function such as `writefln` does not work unchanged with interpolated strings because the lowering is unhelpful: + +```D +writefln(i"Hello, %s$name %s$surname!"); +// Lowering: ---> +// writefln(InterpolationHeader!("Hello, %s", "name", " %s", "surname", "!")(), +// "Hello, %s", name, " %s", surname, "!"); +``` + +However, there is enough information in the interpolated string to allow `writefln` to work transparently with interpolated strings by detecting and processing during compilation the header information: -For `f`-strings, the convention for format specifiers varies with the API --- for example, `printf` and `std.stdio.writefln` use the well-known `%`-prefixed specifiers, whereas SQL traditionally uses `?`. 
For that reason and to keep complexity to a minimum, `f`-strings do not try to be clever and provide their own convention and translation mechanism; instead, they just concatenate the string literal fragments together just like the user wrote them, and follows them with the interpolated expressions. An `f`-string does not create any text. +```D +// Possible semantics: format spec precedes argument +writefln(i"Hello, %s$name %s$surname!"); +// Result: +// "Hello, John Smith!" +``` #### Saving the result of interpolation as a tuple -Although it may seem limiting to impose that interpolated strings expand in a function call, `tuple` offers an immediate and efficient mechanism for storing the result of interpolation. The following program produces the same output as the previous one: +Although it may seem limiting to impose that interpolated strings expand to an argument list (most often in a call to a user-provided function), `tuple` offers an immediate and efficient mechanism for storing the result of interpolation. The following program produces the same output as the previous one: ```d void main(string[] args) { import std.stdio; auto t1 = tuple(i"The program $(args[0]) received $(args.length - 1) arguments."); // Lowering: ---> - // auto t1 = tuple("The program ", args[0], " received ", args.length - 1, " arguments."); + // auto t1 = tuple(InterpolationHeader!("The program ", "args[0]", " received ", "args.length - 1", " arguments.")(), + // "The program ", args[0], " received ", args.length - 1, " arguments."); writeln(t1.expand); - - auto t2 = tuple(f"The program %s$(args[0]) received %s$(args.length - 1) arguments."); - // Lowering: ---> - // auto t2 = tuple("The program %s received %s arguments.", args[0], args.length - 1); - writefln(t2.expand); } ``` +With the existing implementation, `tuple` will save the interpolation header as its first data member. Of course, it is easy to adjust its implementation to drop it if deemed necessary. 
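The claim about `tuple` storing the header can be checked today by typing in the lowering manually. This is a sketch under assumptions: `render` is a hypothetical helper, the header template is copied from the proposal text, and simple variables stand in for `args[0]` and `args.length - 1`.

```d
import std.conv : text;
import std.typecons : tuple;

// Header template as given in the proposal text.
struct InterpolationHeader(_parts...) {
    alias parts = _parts;
    string toString() { return null; }
}

/// Hypothetical helper: stores a hand-lowered i-string in a tuple,
/// then expands the tuple into `text`.
string render(string prog, size_t n) {
    // Hand-written lowering of
    //   tuple(i"The program $prog received $n arguments.")
    auto t = tuple(
        InterpolationHeader!("The program ", "prog", " received ", "n", " arguments.")(),
        "The program ", prog, " received ", n, " arguments.");

    // The header is the first tuple member and keeps its compile-time parts.
    alias H = typeof(t[0]);
    static assert(H.parts.length == 5);
    static assert(H.parts[1] == "prog");

    // Its toString returns null, so it adds nothing to the rendered text.
    return text(t.expand);
}

void main() {
    assert(render("app", 2) == "The program app received 2 arguments.");
}
```

Since the header is an empty struct, keeping it in the tuple costs only its type; dropping it would lose the `parts` information for later consumers.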
+ #### Manipulator-style formatting [C++'s iostreams](http://www.cplusplus.com/reference/iolibrary/) introduced the notion of [*stream manipulators*](https://en.cppreference.com/w/cpp/io/manip) --- special functions interspersed with the data to print, which direct the way data is to be formatted. For example, the C++ `std::dec` and `std::hex` manipulators instruct the formatting engine to format the following integer in decimal and hexadecimal, respectively: @@ -346,53 +434,60 @@ void fun(int x) { } ``` -The approach is obviously extensible by simply adding new manipulators. Unfortunately, C++ stream manipulators have developed a poor reputation because they are very heavy syntactically --- interspersion is done with the visually prominent `<<` and the user must choose between polluting their namespace and using the `std::` prefix for scope resolution with each manipulator. (These issues and other unrelated ones motivated the introduction of the `<format>` facility in C++20.) +The approach is obviously extensible by simply adding new manipulators. Unfortunately, C++ stream manipulators have developed a poor reputation because they are syntactically heavy --- interspersion is done with the visually prominent `<<` and the user must choose between polluting their namespace and using the `std::` prefix for scope resolution with each manipulator. (These issues and other unrelated ones motivated the introduction of the `<format>` facility in C++20.) Using interpolation, a manipulators-based approach can be used elegantly and implemented with minimal effort. 
Consider for example using stream manipulators such as `dec` and `hex` for `writeln` by using an `i`-string: -```d +```D void fun(int x) { writeln(i"$dec$x in hexadecimal is 0x$hex$x."); // Lowering: ---> - // writeln(dec, x, " in hexadecimal is 0x", hex, x, "."); + // writeln(InterpolationHeader!("", "dec", "", "x", " in hexadecimal is 0x", "hex", "", "x", ".")(), + // "", dec, "", x, " in hexadecimal is 0x", hex, "", x, "."); } ``` -There is no need for defining, implementing, and memorizing a mini-language of encoded format specifiers --- all formatting can be done with D language expressions. Continuing the example, the library can just as easily define parameterized formatting for floating-point numbers, such as width, precision, and scientific notation: +There is no need for defining, implementing, and memorizing a *sui generis* mini-language of encoded format specifiers --- all formatting can be done with D language expressions. Continuing the example, the library can just as easily define parameterized formatting for floating-point numbers, such as width, precision, and scientific notation: ```d void fun(double x) { - writeln(i"$x can be written as $scientific$x and its exact value is $(fixed(20, 10))$x"); + writeln(i"$x can be written as $scientific$x or $(fixed(20, 10))$x."); // Lowering: ---> - // writeln(x, " can be written as ", scientific, x, " and its exact value is ", fixed(20, 10), x); + // writeln(InterpolationHeader!("", "x", " can be written as ", "scientific", "", "x", " or ", "fixed(20, 10)", "", "x", ".")(), + // "", x, " can be written as ", scientific, "", x, " or ", fixed(20, 10), "", x, "."); } ``` #### Use in `mixin` declarations and expressions -`i`-strings (but not `f`-strings) are allowed in `mixin` declarations and expressions: +`i`-strings are allowed in `mixin` declarations and expressions. 
```d immutable x = "asd", y = 42; mixin(i"int $x = $y;"); // Lowering ---> -// mixin("int ", x, " = ", y, ";"); +// mixin(InterpolationHeader!("int ", "x", " = ", "y", ";")(), +// "int ", x, " = ", y, ";"); auto z = mixin(i"$x + 5"); // Lowering ---> // auto z = mixin(x, " + 5"); ``` +The interpolation header is ignored by `mixin`s. + #### Use in `pragma(msg)` directives -`i`-strings (but not `f`-strings) are allowed in `pragma(msg)` directives: +`i`-strings are allowed in `pragma(msg)` directives: ```d enum x = 42; pragma(msg, i"x = $x."); // Lowering ---> -// pragma(msg, "x = ", x, "."); +// pragma(msg, InterpolationHeader!("x = ", "x", ".")(), "x = ", x, "."); ``` +`i`-strings are allowed in `pragma(msg)` directives. + Note that `pragma(msg)` is already variadic. Currently `assert` and `static assert` are not variadic, so they need to be helped with `text` or `format`: ```d @@ -417,7 +512,7 @@ alias Calculator = Grammar!( Factor := $identifier | '(' Expression ')' " ); // Lowering: ---> -// alias Calculator = Grammar!( +// alias Calculator = Grammar!(InterpolationHeader!(...)(), // "Expression := Term + Term // Term := Factor * Factor // Factor := ", identifier, " | '(' Expression ')' " @@ -451,36 +546,9 @@ alias Q = AliasSeq!(i"Let's interpolate $x!"); // OK, use as type auto q = AliasSeq!(i"Let's interpolate $x!"); // OK, store as AliasSeq ``` -As mentioned, functions or templates are not aware whether they received an interpolated string or a manually written list of arguments. This confers consistency, simplicity, predictability, and uniformity to the approach. 
- -It is not possible for an `f`-string to pass the string literal as a template argument and the interpolated expressions as run-time arguments: - -```d -void main(string[] args) { - import std.stdio; - // No equivalent using interpolation - writefln!"The program %s received %s arguments."(args[0], args.length - 1); -} -``` - -`f`-strings do not provide special rules to match format specifiers (such as `%s` in `std.stdio.writefln` or `?` in SQL prepared statements) against arguments: - -```d -void fun(int x) { - writefln(f"Adding %s$x..."); // prints "Adding 42 ..." - writefln(f"Adding $x..."); // prints "Adding ..." - sqlExec(f"INSERT INTO t VALUES(?$x)"); // OK - sqlExec(f"INSERT INTO t VALUES($x?)"); // OK, equivalent - sqlExec(f"INSERT INTO t$x VALUES(?)"); // OK, equivalent but perverse - sqlExec(f"INSERT INTO t VALUES($x)"); // Runtime error, missing '?' binding -} -``` - -We consider that adding special mechanisms to adapt `f`-strings to a variety of formatting conventions is disproportionately complex and adds its own liabilities. To mitigate this issue, at least for functions such as `std.stdio.format` and `std.stdio.writeln` the compiler can extend the current verification of format specifiers to `f`-strings as well. - ## Breaking Changes and Deprecations -Because `InterpolatedString` and `InterpolatedFormatString` are new tokens, no existing code is broken. +Because `InterpolatedString` is a new token, no existing code is broken. ## Copyright & License From 9c4ad9757203ff5fe171abe0df869c32e5ef2041 Mon Sep 17 00:00:00 2001 From: Nick Treleaven Date: Tue, 1 Jun 2021 17:58:08 +0100 Subject: [PATCH 13/19] Simple fixes Tweak wording. Fix lowering. Replace a remaining f-string. 
--- README.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 0edba63..5e83629 100644 --- a/README.md +++ b/README.md @@ -175,14 +175,14 @@ For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki ## Description -D strings have several syntactical form, among which a few that follow the pattern of a letter followed by the string proper. Such include `r"WYSIWYG strings"`, `q"[delimited strings]"`, and `q{token strings}`. Our proposal follows the same pattern by introducing `i"interpolated strings"`. +D strings have several syntactical forms, among which are those that follow the pattern of a letter followed by the string proper. Such include `r"WYSIWYG strings"`, `q"[delimited strings]"`, and `q{token strings}`. Our proposal follows the same pattern by introducing `i"interpolated strings"`. -An *interpolated string* is a regular D string prefixed with the letter `i`, as in `i"Hello"`. No whitespace is allowed between `i` and the opening quote. We refer to these construct as `i`-strings. +An *interpolated string* is a regular D string prefixed with the letter `i`, as in `i"Hello"`. No whitespace is allowed between `i` and the opening quote. We refer to these constructs as `i`-strings. An `i`-string is allowed in source code only in one of the following contexts: - in the argument list of a function or constructor call; -- in the argument list of a `mixin`; +- in the argument list of a string `mixin`; - in the argument list of a `pragma(msg)` directive; - in the argument list (starting with the second argument) of an `assert` or `static assert` invocation, contingent to fixing [Issue 17378](https://issues.dlang.org/show_bug.cgi?id=17378); and - in the argument list of a template instantiation. 
@@ -381,11 +381,11 @@ void main(string[] args) { auto s = sqlExec(i"INSERT INTO runs VALUES ($(args[0]), $(args.length - 1))"); // Lowering: ---> // auto s = sqlExec(InterpolationHeader!("INSERT INTO runs VALUES(", "args[0]", ", ", "args.length - 1", ")")(), - // args[0], $(args.length - 1)); + // "INSERT INTO runs VALUES(", args[0], ", ", args.length - 1, ")"); } ``` -A function such as `std.stdio.writeln` may choose to uniformly convert the header to string by means of calling its `toString` method, thus essentially working with interpolated strings without modification. The second possibility is that the function detects the header but "skips" it and ignores the information it provides, which is easy to accommodate on the function implementer's side. Finally, a function may choose to fully support `i`-strings with specialized semantics. We consider this flexibility a key characteristic of the proposed feature that drastically simplifies both its definition, unrestandability, and interoperation with new and existing code. +A function such as `std.stdio.writeln` may choose to uniformly convert the header to string by means of calling its `toString` method, thus essentially working with interpolated strings without modification. The second possibility is that the function detects the header but "skips" it and ignores the information it provides, which is easy to accommodate on the function implementer's side. Finally, a function may choose to fully support `i`-strings with specialized semantics. We consider this flexibility a key characteristic of the proposed feature that drastically simplifies both its definition, understandability, and interoperation with new and existing code. 
A format-string-style function such as `writefln` does not work unchanged with interpolated strings because the lowering is unhelpful: @@ -393,7 +393,7 @@ A format-string-style function such as `writefln` does not work unchanged with i writefln(i"Hello, %s$name %s$surname!"); // Lowering: ---> // writefln(InterpolationHeader!("Hello, %s", "name", " %s", "surname", "!")(), -// name, surname); +// "Hello, %s", name, " %s", surname, "!"); ``` However, there is enough information in the interpolated string to allow `writefln` to work transparently with interpolated strings by detecting and processing during compilation the header information: @@ -470,7 +470,8 @@ mixin(i"int $x = $y;"); // "int ", x, " = ", y, ";"); auto z = mixin(i"$x + 5"); // Lowering ---> -// auto z = mixin(x, " + 5"); +// auto z = mixin(InterpolationHeader!("", "x", " + 5")(), +// "", x, " + 5"); ``` The interpolation header is ignored by `mixin`s. @@ -483,7 +484,8 @@ The interpolation header is ignored by `mixin`s. enum x = 42; pragma(msg, i"x = $x."); // Lowering ---> -// pragma(msg, InterpolationHeader!("x = ", "x", ".")(), "x = ", x, "."); +// pragma(msg, InterpolationHeader!("x = ", "x", ".")(), +// "x = ", x, "."); ``` `i`-strings are allowed in `pragma(msg)` directives. @@ -493,7 +495,7 @@ Note that `pragma(msg)` is already variadic. Currently `assert` and `static asse ```d void fun(int x)(int y) { static assert(x < 42, text(i"x is $x, should be less than 42")); - assert(y > 42, format(f"y is $y, should be greater than 42")); + assert(y > 42, format(i"y is %s$y, should be greater than 42")); } ``` From d51623552485f4063eaa390f65cea705bed2edfc Mon Sep 17 00:00:00 2001 From: Nick Treleaven Date: Tue, 1 Jun 2021 18:24:35 +0100 Subject: [PATCH 14/19] Improve AliasSeq comments Q is a value sequence, not a type. Also rename Q -> p, because it holds values, not types. 
--- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 0edba63..7118572 100644 --- a/README.md +++ b/README.md @@ -523,8 +523,8 @@ In certain cases, the interpolated string can be passed to `AliasSeq` as well re ```d int x = 42; -alias Q = AliasSeq!(i"I'm interpolating $x here."); // OK, use Q as a type -auto q = AliasSeq!(i"I'm interpolating $x here."); // OK, store as AliasSeq object +alias p = AliasSeq!(i"I'm interpolating $x here."); // p is a value sequence +auto q = AliasSeq!(i"I'm interpolating $x here."); // q is an implicit tuple ``` ### Limitations and tradeoffs @@ -542,8 +542,8 @@ The remedy is simple and may be suggested by the text of the error message: use int x = 42; auto s = text(i"Let's interpolate $x!"); // OK, store as string auto t = tuple(i"Let's interpolate $x!"); // OK, store as tuple -alias Q = AliasSeq!(i"Let's interpolate $x!"); // OK, use as type -auto q = AliasSeq!(i"Let's interpolate $x!"); // OK, store as AliasSeq +alias p = AliasSeq!(i"Let's interpolate $x!"); // OK, value sequence +auto q = AliasSeq!(i"Let's interpolate $x!"); // OK, store as an implicit tuple ``` ## Breaking Changes and Deprecations From f6b6e84b44f2503163edd66341593cc8acffdd1f Mon Sep 17 00:00:00 2001 From: "Quirin F. Schroll" Date: Wed, 8 Dec 2021 18:23:26 +0100 Subject: [PATCH 15/19] Corrected error --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0edba63..49ffd5e 100644 --- a/README.md +++ b/README.md @@ -278,7 +278,7 @@ writeln("Hello, world$exclamation"); is lowered to: ```D -writeln(__header, "Hello, world", exclamation); +writeln(__header, "Hello, world", exclamation, ""); ``` Finally, if an `i`-string contains two consecutive expansions, the lowering will introduce an empty string literal in between. Example: From ca0489c342e9517ca71ff4e3e1e0d07071377aad Mon Sep 17 00:00:00 2001 From: "Adam D. 
Ruppe" Date: Tue, 17 Oct 2023 21:06:41 -0400 Subject: [PATCH 16/19] fqn of interpolation header --- README.md | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index a485fe2..c58aa31 100644 --- a/README.md +++ b/README.md @@ -312,6 +312,9 @@ The purpose of normalization, as can be seen in the given examples, is to always Every `i`-string lowering introduces a header rvalue object that so far we conventionally denoted as `__header` (the exact name is uniquely compiler-generated and hence inaccessible). The header is a stateless `struct` that contains exclusively compile-time information about the interpolation, generated from the following template: ```D +// given a core module you can import to match with an is expression if desired +module core.interpolation; + struct InterpolationHeader(_parts...) { alias parts = _parts; string toString() { return null; } @@ -327,7 +330,10 @@ writeln("$greeting, $name$exclamation"); is lowered to: ```D -writeln(InterpolationHeader!("", "greeting", ", ", "name", "", "exclamation", "")(), +// note the fullly qualified name here including import +// is to ensure the header name cannot be overridden by +// any user type and is thus uniquely identified +writeln(.object.imported!"core.interpolation".InterpolationHeader!("", "greeting", ", ", "name", "", "exclamation", "")(), "", $greeting, ", ", name, "", exclamation, ""); ``` @@ -375,12 +381,12 @@ void main(string[] args) { import std.stdio; writeln(i"The program $(args[0]) received $(args.length - 1) arguments."); // Lowering: ---> - // writeln(InterpolationHeader!("The program ", "args[0]", " received ", "args.length - 1", " arguments.")(), + // writeln(.object.imported!"core.interpolation".InterpolationHeader!("The program ", "args[0]", " received ", "args.length - 1", " arguments.")(), // "The program ", args[0], " received ", args.length - 1, " arguments."); auto s = sqlExec(i"INSERT INTO runs VALUES 
($(args[0]), $(args.length - 1))"); // Lowering: ---> - // auto s = sqlExec(InterpolationHeader!("INSERT INTO runs VALUES(", "args[0]", ", ", "args.length - 1", ")")(), + // auto s = sqlExec(.object.imported!"core.interpolation".InterpolationHeader!("INSERT INTO runs VALUES(", "args[0]", ", ", "args.length - 1", ")")(), // "INSERT INTO runs VALUES(", args[0], ", ", args.length - 1, ")"); } ``` @@ -392,7 +398,7 @@ A format-string-style function such as `writefln` does not work unchanged with i ```D writefln(i"Hello, %s$name %s$surname!"); // Lowering: ---> -// writefln(InterpolationHeader!("Hello, %s", "name", " %s", "surname", "!")(), +// writefln(.object.imported!"core.interpolation".InterpolationHeader!("Hello, %s", "name", " %s", "surname", "!")(), // "Hello, %s", name, " %s", surname, "!"); ``` @@ -414,7 +420,7 @@ void main(string[] args) { import std.stdio; auto t1 = tuple(i"The program $(args[0]) received $(args.length - 1) arguments."); // Lowering: ---> - // auto t1 = tuple(InterpolationHeader!("The program ", "args[0]", " received ", "args.length - 1", " arguments.")(), + // auto t1 = tuple(.object.imported!"core.interpolation".InterpolationHeader!("The program ", "args[0]", " received ", "args.length - 1", " arguments.")(), // "The program ", args[0], " received ", args.length - 1, " arguments."); writeln(t1.expand); } @@ -442,7 +448,7 @@ Using interpolation, a manipulators-based approach can be used elegantly and imp void fun(int x) { writeln(i"$dec$x in hexadecimal is 0x$hex$x."); // Lowering: ---> - // writeln(InterpolationHeader!("", "dec", "", "x", " in hexadecimal is 0x", "hex", "", "x", ".")(), + // writeln(.object.imported!"core.interpolation".InterpolationHeader!("", "dec", "", "x", " in hexadecimal is 0x", "hex", "", "x", ".")(), // "", dec, "", x, " in hexadecimal is 0x", hex, "", x, "."); } ``` @@ -453,7 +459,7 @@ There is no need for defining, implementing, and memorizing a *sui generis* mini void fun(double x) { writeln(i"$x can be written 
as $scientific$x or $(fixed(20, 10))$x."); // Lowering: ---> - // writeln(InterpolationHeader!("", "x", " can be written as ", scientific, "", x, " or ", fixed(20, 10), "", x, ".")(), + // writeln(.object.imported!"core.interpolation".InterpolationHeader!("", "x", " can be written as ", scientific, "", x, " or ", fixed(20, 10), "", x, ".")(), // x, " can be written as ", scientific, "", x, " or ", fixed(20, 10), "", x, "."); } ``` @@ -466,11 +472,11 @@ void fun(double x) { immutable x = "asd", y = 42; mixin(i"int $x = $y;"); // Lowering ---> -// mixin(InterpolationHeader!("int ", "x", " = ", "y", ";")(), +// mixin(.object.imported!"core.interpolation".InterpolationHeader!("int ", "x", " = ", "y", ";")(), // "int ", x, " = ", y, ";"); auto z = mixin(i"$x + 5"); // Lowering ---> -// auto z = mixin(InterpolationHeader!("", "x", " + 5")(), +// auto z = mixin(.object.imported!"core.interpolation".InterpolationHeader!("", "x", " + 5")(), // "", x, " + 5"); ``` @@ -484,7 +490,7 @@ The interpolation header is ignored by `mixin`s. enum x = 42; pragma(msg, i"x = $x."); // Lowering ---> -// pragma(msg, InterpolationHeader!("x = ", "x", ".")(), +// pragma(msg, .object.imported!"core.interpolation".InterpolationHeader!("x = ", "x", ".")(), // "x = ", x, "."); ``` @@ -514,7 +520,7 @@ alias Calculator = Grammar!( Factor := $identifier | '(' Expression ')' " ); // Lowering: ---> -// alias Calculator = Grammar!(InterpolationHeader!(...)(), +// alias Calculator = Grammar!(.object.imported!"core.interpolation".InterpolationHeader!(...)(), // "Expression := Term + Term // Term := Factor * Factor // Factor := ", identifier, " | '(' Expression ')' " From 15e8b5a7aea1030924d3edeb231a87f58ad3698f Mon Sep 17 00:00:00 2001 From: "Adam D. 
Ruppe" Date: Wed, 18 Oct 2023 22:16:05 -0400 Subject: [PATCH 17/19] nitpicks --- README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index c58aa31..06f7889 100644 --- a/README.md +++ b/README.md @@ -260,7 +260,7 @@ To effect these rules, the lowering may introduce empty strings as follows. If an interpolated string starts with an escape sequence, an empty string is always introduced before it in the expansion. Example: ```D -writeln("$name, hi!"); +writeln(i"$name, hi!"); ``` is lowered to: @@ -272,7 +272,7 @@ writeln(__header, "", name, " hi!"); If an interpolated string ends with an escape sequence, an empty string is always introduced after it in the expansion. Example: ```D -writeln("Hello, world$exclamation"); +writeln(i"Hello, world$exclamation"); ``` is lowered to: @@ -284,7 +284,7 @@ writeln(__header, "Hello, world", exclamation, ""); Finally, if an `i`-string contains two consecutive expansions, the lowering will introduce an empty string literal in between. Example: ```D -writeln("Hello, $name$exclamation How are you?"); +writeln(i"Hello, $name$exclamation How are you?"); ``` is lowered to: @@ -296,13 +296,13 @@ writeln(__header, "Hello", name, "", exclamation, " How are you?"); These rules can apply simultaneously on the same `i`-string. Example: ```D -writeln("$greeting, $name$exclamation"); +writeln(i"$greeting, $name$exclamation"); ``` is lowered to: ```D -writeln(__header, "", $greeting, ", ", name, "", exclamation, ""); +writeln(__header, "", greeting, ", ", name, "", exclamation, ""); ``` The purpose of normalization, as can be seen in the given examples, is to always have the argument list in a fixed pattern: *header*, *string-literal*, *expression*, *string-literal*, *expression*, ..., *string-literal*. @@ -324,7 +324,7 @@ struct InterpolationHeader(_parts...) { The argument to the instantiation of `Header` is the interpolation string deconstructed into parts and normalized. 
Example: ```D -writeln("$greeting, $name$exclamation"); +writeln(i"$greeting, $name$exclamation"); ``` is lowered to: @@ -334,7 +334,7 @@ is lowered to: // is to ensure the header name cannot be overridden by // any user type and is thus uniquely identified writeln(.object.imported!"core.interpolation".InterpolationHeader!("", "greeting", ", ", "name", "", "exclamation", "")(), - "", $greeting, ", ", name, "", exclamation, ""); + "", greeting, ", ", name, "", exclamation, ""); ``` Note how the header gives the callee complete access to the strings corresponding to the expressions passed in. From 2ae3195f875805de213f96adbb0fcbafe599b1c8 Mon Sep 17 00:00:00 2001 From: "Adam D. Ruppe" Date: Sun, 22 Oct 2023 13:30:48 -0400 Subject: [PATCH 18/19] final update --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 06f7889..a06e3e2 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,16 @@ # String Interpolation +This proposal is obsolete. See this dmd PR instead: + +https://github.com/dlang/dmd/pull/15715 + | Field | Value | |-----------------|-------------------------------------------------------------------| | DIP: | xxxx | | Review Count: | 0 | | Author: | Andrei Alexandrescu
John Colvin john.loughran.colvin@gmail.com | | Implementation: | | -| Status: | | +| Status: Obsolete | | ## Abstract From b8190c30dc4fe147ce8f87216537f9873f0a6e0c Mon Sep 17 00:00:00 2001 From: "Adam D. Ruppe" Date: Sun, 22 Oct 2023 13:31:24 -0400 Subject: [PATCH 19/19] final update --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a06e3e2..a0d9495 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ https://github.com/dlang/dmd/pull/15715 | Review Count: | 0 | | Author: | Andrei Alexandrescu
John Colvin john.loughran.colvin@gmail.com | | Implementation: | | -| Status: Obsolete | | +| Status: | Obsolete | ## Abstract
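Note on the series as a whole: the README text patched above says a callee may detect the interpolation header and simply skip it. A minimal sketch of that idea follows. It assumes a local stand-in for `InterpolationHeader`, copied from the template shown in the proposal, and a hypothetical helper named `render` that is not part of Phobos or of this proposal. Since `i`-strings are not available in a stock compiler, the call site spells out the lowering by hand.

```d
import std.conv : text;

// Local stand-in for the header template shown in the proposal (assumption:
// the real one would be provided by the compiler/runtime, not declared here).
struct InterpolationHeader(_parts...)
{
    alias parts = _parts;
    string toString() { return null; }
}

// Hypothetical helper, not a Phobos function: concatenates its arguments and
// drops a leading interpolation header if one is present.
string render(Args...)(Args args)
{
    static if (Args.length > 0 && is(Args[0] == InterpolationHeader!P, P...))
        return text(args[1 .. $]);   // skip the header, keep fragments and values
    else
        return text(args);           // plain argument list, no header
}

void main()
{
    string name = "World";

    // Hand-written equivalent of the lowering of i"Hello, $name!"
    auto s = render(InterpolationHeader!("Hello, ", "name", "!")(),
                    "Hello, ", name, "!");
    assert(s == "Hello, World!");

    // The header exposes the source text of each interpolated expression.
    static assert(InterpolationHeader!("Hello, ", "name", "!").parts[1] == "name");

    // Without a header the helper behaves like std.conv.text.
    assert(render("x = ", 42, ".") == "x = 42.");
}
```

The same `is` expression could presumably guard an overload that inspects `parts` at compile time instead of discarding the header, which is the "fully support" option the text describes.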