Discussion:
[protobuf] Proto v2 ambiguities in parsed identifiers
Michael Powell
2018-11-29 14:14:37 UTC
Permalink
Hello,

In the Protocol Buffer v2 Field specification there is a notion of
messageType and enumType. It is easy to miss on a casual read, but
nothing in the grammar tells a parser whether it has encountered a
messageType or an enumType, except perhaps a second verification
pass.

Which brings me to the second question: how does one interpret the
identifiers involved in the type specification? Also, what does the
leading dot mean? Please explain. Thanks!

Also, a minor question: is the trailing semicolon a typo? i.e.
"fieldNumber = intLit;". I gather that it probably is, but I
wanted to confirm that as well.

// Identifiers
ident = letter { letter | decimalDigit | "_" }
// ...
messageName = ident
enumName = ident
// ...
messageType = [ "." ] { ident "." } messageName
enumType = [ "." ] { ident "." } enumName
// ...
// Field
label = "required" | "optional" | "repeated"
type = "double" | "float" | "int32" | "int64" | "uint32" | "uint64" |
"sint32" | "sint64" | "fixed32" | "fixed64" | "sfixed32" | "sfixed64"
| "bool" | "string" | "bytes" | messageType | enumType
fieldNumber = intLit; // <- is the trailing semicolon here a typo?
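
A minimal sketch of why the grammar alone cannot separate the two productions, assuming `letter` means ASCII A-Z/a-z and `decimalDigit` means 0-9 as in the linked spec. The names `IDENT`, `MESSAGE_TYPE`, `ENUM_TYPE`, and `matches` are illustrative and do not come from any protobuf tool:

```python
import re

# ident = letter { letter | decimalDigit | "_" }
IDENT = r"[A-Za-z][A-Za-z0-9_]*"

# messageType = [ "." ] { ident "." } messageName, where messageName = ident
MESSAGE_TYPE = re.compile(rf"\.?(?:{IDENT}\.)*{IDENT}")

# enumType = [ "." ] { ident "." } enumName, where enumName = ident
ENUM_TYPE = re.compile(rf"\.?(?:{IDENT}\.)*{IDENT}")


def matches(text):
    """Return (is_message_type, is_enum_type) judged by syntax alone."""
    return (
        MESSAGE_TYPE.fullmatch(text) is not None,
        ENUM_TYPE.fullmatch(text) is not None,
    )


for candidate in (".foo.bar.SomeEnum", "SomeMsg", "foo..Bar"):
    print(candidate, matches(candidate))
```

Both patterns are character-for-character identical, so any string accepted by one is accepted by the other; the distinction has to come from a later pass.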

http://developers.google.com/protocol-buffers/docs/reference/proto2-spec

Best,

Michael Powell
--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.
Josh Humphries
2018-11-29 18:07:40 UTC
Permalink
You know it's a message vs. enum (or invalid identifier) only after linking.

To link, you need the file's entire transitive closure (i.e. its imports
and their imports and so on) already parsed (and, preferably, already
linked). You can then *resolve* the type name, basically using C++ scoping
rules. The leading dot tells the compiler that the identifier is
fully-qualified -- so it doesn't need to apply the scoping rules (which
allow for partially-qualified or unqualified names) and can instead search
for the symbol by full name. After you have resolved the name to an element,
you then know the element's type -- whether it was a message or an enum (or
if it was invalid, such as no matching element or referring to a service or
method instead of a type).

An element's full name is the file package (if any), followed by all
enclosing element names (note that in an "extend" block, the extendee is
*not* considered an element name: extend blocks have no name), and then the
name of the item itself.
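
The naming rule above can be sketched as a small tree walk. This is a simplified illustration, not protoc's implementation: the `(kind, name, children)` tuple layout, the `FILE` tree, and `collect_full_names` are all hypothetical, and enum values are left out of the tree for brevity.

```python
def collect_full_names(elements, prefix):
    """Yield (full_name, kind) for every named element under `prefix`."""
    for kind, name, children in elements:
        if kind == "extend":
            # The extendee is not an element name, so an extend block adds
            # nothing to the prefix of the fields declared inside it.
            yield from collect_full_names(children, prefix)
            continue
        full = f"{prefix}.{name}" if prefix else name
        yield full, kind
        yield from collect_full_names(children, full)


# Hypothetical tree mirroring the layout of the .proto example in this reply.
FILE = [
    ("extend", "google.protobuf.MessageOptions", [
        ("field", "some_custom_option", []),
    ]),
    ("enum", "SomeEnum", []),
    ("message", "SomeMsg", [
        ("field", "name", []),
        ("enum", "SomeEnum", []),
        ("message", "NestedMsg", [
            ("extend", "google.protobuf.FieldOptions", [
                ("field", "some_other_option", []),
            ]),
        ]),
    ]),
]

for full_name, kind in collect_full_names(FILE, "foo.bar"):
    print(kind, full_name)
```

Passing an empty string as `prefix` models a file with no package declaration.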

Example:

syntax = "proto3";
package foo.bar;
import "google/protobuf/descriptor.proto";

extend google.protobuf.MessageOptions {
  string some_custom_option = 9999; // fqn = foo.bar.some_custom_option
}

enum SomeEnum { // fqn = foo.bar.SomeEnum
  VAL0 = 0; // fqn = foo.bar.VAL0 (enum values are scoped as siblings of their enum)
}

message SomeMsg { // fqn = foo.bar.SomeMsg
  string name = 1; // fqn = foo.bar.SomeMsg.name

  // refers to top-level SomeEnum above
  .foo.bar.SomeEnum enum2 = 3; // fqn = foo.bar.SomeMsg.enum2

  // refers to nested SomeEnum below
  SomeEnum enum = 2; // fqn = foo.bar.SomeMsg.enum

  enum SomeEnum { // fqn = foo.bar.SomeMsg.SomeEnum
    VAL0 = 0; // fqn = foo.bar.SomeMsg.VAL0
  }

  // In this scope, you can refer to the nested enum without a qualifier, as SomeEnum.
  // You could also use partial qualifiers, like SomeMsg.SomeEnum or bar.SomeMsg.SomeEnum,
  // or foo.bar.SomeMsg.SomeEnum. That last one is actually fully qualified, but protoc
  // doesn't know that, so it must use the same scope rules to resolve the symbol name.
  // However, using .foo.bar.SomeMsg.SomeEnum (leading dot) means the current lexical
  // scope doesn't matter, so symbol resolution can take a shortcut.
  //
  // As you can see, compound package names work like nested C++ namespaces, not like
  // Java or .NET package names.

  message NestedMsg { // fqn = foo.bar.SomeMsg.NestedMsg
    extend google.protobuf.FieldOptions {
      uint64 some_other_option = 9999; // fqn = foo.bar.SomeMsg.NestedMsg.some_other_option
    }
  }
}
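
The scope walk described in the comments above can be sketched roughly as follows. This is a simplified model under stated assumptions, not protoc's actual lookup, which may differ in details (for example in how it treats the first component of a compound name). `SYMBOLS`, `resolve`, and `resolve_field_type` are hypothetical names, and `foo.bar.SomeService` is an invented entry used only to show the invalid case.

```python
# Hypothetical symbol table: full name -> kind, as known after parsing the
# file and its transitive imports.
SYMBOLS = {
    "foo.bar.SomeEnum": "enum",
    "foo.bar.SomeMsg": "message",
    "foo.bar.SomeMsg.SomeEnum": "enum",
    "foo.bar.SomeMsg.NestedMsg": "message",
    "foo.bar.SomeService": "service",  # invented for the invalid-type case
}


def resolve(symbols, scope, name):
    """Return (full_name, kind) for a type reference, or None if not found."""
    if name.startswith("."):
        # Leading dot: fully qualified, so the lexical scope is irrelevant.
        full = name[1:]
        return (full, symbols[full]) if full in symbols else None
    parts = scope.split(".") if scope else []
    while True:
        # Try the innermost scope first, then each enclosing scope in turn.
        candidate = ".".join(parts + [name])
        if candidate in symbols:
            return candidate, symbols[candidate]
        if not parts:
            return None
        parts.pop()


def resolve_field_type(symbols, scope, name):
    """Resolve a field's type name and reject anything that is not a type."""
    found = resolve(symbols, scope, name)
    if found is None:
        raise LookupError(f"unresolved type name: {name}")
    full, kind = found
    if kind not in ("message", "enum"):
        raise TypeError(f"{full} is a {kind}, not a message or enum")
    return full, kind


scope = "foo.bar.SomeMsg"
for ref in ("SomeEnum", ".foo.bar.SomeEnum", "bar.SomeMsg.SomeEnum"):
    print(ref, "->", resolve_field_type(SYMBOLS, scope, ref))
```

Only after this lookup succeeds does the caller learn whether the reference named a message or an enum, which is why the grammar alone cannot decide it.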

----
Josh Humphries
Michael Powell
2018-11-29 18:35:10 UTC
Permalink
Post by Josh Humphries
You know it's a message vs. enum (or invalid identifier) only after linking.
Thanks very much for the clarification. I sort of figured that was the case.