[protobuf] [Proto2] Language spec help

Discussion:

Michael Powell

2018-10-31 16:07:15 UTC

Hello,

I am writing a parser for the Proto language specification starting
with v2. I need a little help interpreting one of the lines, if you
please:

In the "String literals" section, what does this mean:

charValue = hexEscape | octEscape | charEscape | /[^\0\n\\]/

Specifically, what about the trailing character soup? Should I read
escaped characters in that sequence, or take the string literally,
notwithstanding the enclosing forward slashes?

Thanks much in advance!

Best regards,

Michael Powell

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

Michael Powell

2018-10-31 16:22:14 UTC

Concerning Constant, literally from the v2 spec:

syntax = "syntax" "=" quote "proto2" quote ";"

Do I read that correctly: you can expect either 'proto2' or "proto2",
but never 'proto2" or "proto2'?

If accurate, that just seems to me to be lazy spec authorship...

Thanks!
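One way to make the matching-quote reading explicit in a hand-written parser is to require the closing quote to be the same character as the opening one. The Python sketch below does that with a regex backreference. It assumes this mirrors protoc, whose tokenizer, as far as I know, ends a string literal only at the quote character that opened it; the function name is_proto2_syntax is hypothetical, and the sketch ignores comments and the other lexical rules a real parser has to handle.

```python
import re

# EBNF from the proto2 spec:  syntax = "syntax" "=" quote "proto2" quote ";"
# Read literally, the two `quote` tokens are independent. This sketch assumes
# they must match: the backreference \1 requires the closing quote to be the
# same character as the opening one. Comments are not handled.
SYNTAX_STMT = re.compile(r"""syntax\s*=\s*(["'])proto2\1\s*;""")


def is_proto2_syntax(statement: str) -> bool:
    """True if `statement` is a proto2 syntax statement with matching quotes."""
    return SYNTAX_STMT.fullmatch(statement.strip()) is not None


print(is_proto2_syntax('syntax = "proto2";'))   # True
print(is_proto2_syntax("syntax = 'proto2';"))   # True
print(is_proto2_syntax("syntax = 'proto2\";"))  # False
```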

Michael Powell

2018-10-31 16:23:04 UTC

Rather, Syntax section, excuse me...

'Adam Cozzette' via Protocol Buffers

2018-10-31 17:17:30 UTC

I think that specification has suffered a little bit of neglect (sorry
about that), because in practice our C++ parser is really the de facto
standard and we have not recently made an effort to go through and make
sure the official spec matches it perfectly. My reading of that string
(/[^\0\n\\]/) is that it's a regular expression saying "any character other
than null, newline, or backslash." But in general I would say the best bet
is to resolve ambiguities by looking at what the C++ parser does.
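For anyone else writing a lexer, here is a minimal Python sketch of that reading. It treats /[^\0\n\\]/ as a regular-expression character class (any single character except NUL, newline, or backslash) and tries the escape alternatives first. The hexEscape, octEscape, and charEscape patterns are a transcription of the productions in the published proto2 spec and are assumptions here, as is the helper name split_char_values; where they disagree with the C++ parser, the parser wins.

```python
import re

# The bare-character alternative of charValue, /[^\0\n\\]/:
# any single character except NUL, newline, or backslash.
PLAIN_CHAR = r"[^\x00\n\\]"

# Escape productions as transcribed from the proto2 spec (assumption:
# check them against the current spec text before relying on them).
HEX_ESCAPE = r"\\[xX][0-9A-Fa-f]{2}"
OCT_ESCAPE = r"\\[0-7]{3}"
CHAR_ESCAPE = r"""\\[abfnrtv\\'"]"""

# charValue = hexEscape | octEscape | charEscape | /[^\0\n\\]/
CHAR_VALUE = re.compile("|".join([HEX_ESCAPE, OCT_ESCAPE, CHAR_ESCAPE, PLAIN_CHAR]))


def split_char_values(body: str) -> list[str]:
    """Split the inside of a string literal into charValue units.

    Raises ValueError at the first position no alternative matches,
    e.g. a raw newline or a backslash that starts no valid escape.
    """
    units = []
    pos = 0
    while pos < len(body):
        m = CHAR_VALUE.match(body, pos)
        if m is None:
            raise ValueError(f"invalid character at offset {pos}: {body[pos]!r}")
        units.append(m.group())
        pos = m.end()
    return units


print(split_char_values(r"a\x41\101\n"))
# ['a', '\\x41', '\\101', '\\n']
```

Note that, as written, the character class does not exclude the quote characters, so a real lexer also has to stop at the literal's closing delimiter instead of consuming it as a charValue.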

By the way, have you considered just reusing the C++ parser that's included
in protoc? You can call protoc with the --descriptor_set_out flag to have
it parse your .proto file and produce a serialized FileDescriptorSet proto
as output. Then at that point it's easy to parse the descriptors using just
about any language we support, and that should give you all the information
you need, without the need for a new .proto file parser.
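To show that the output is ordinary protobuf wire format, here is a dependency-free Python sketch that lists each file's name, package, and top-level message names from a serialized FileDescriptorSet, such as one written by protoc --include_imports --descriptor_set_out=out.pb your.proto (file names hypothetical). In practice you would normally load it with the generated FileDescriptorSet class for your language; the hand-rolled decoder, its function names, and the small subset of fields it reads are assumptions of this sketch. The field numbers come from descriptor.proto.

```python
import sys

# Field numbers from google/protobuf/descriptor.proto:
#   FileDescriptorSet:   repeated FileDescriptorProto file = 1;
#   FileDescriptorProto: name = 1; package = 2; repeated DescriptorProto message_type = 4;
#   DescriptorProto:     name = 1;
LEN = 2  # wire type for length-delimited fields (strings, bytes, submessages)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for every field in one message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == LEN:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:  # groups (3/4) are not used by descriptor.proto itself
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def summarize_descriptor_set(data: bytes) -> list[dict]:
    """Return name, package, and top-level message names for each file."""
    files = []
    for num, wt, file_bytes in iter_fields(data):
        if num != 1 or wt != LEN:
            continue
        info = {"name": "", "package": "", "messages": []}
        for fnum, fwt, value in iter_fields(file_bytes):
            if fwt != LEN:
                continue
            if fnum == 1:
                info["name"] = value.decode("utf-8")
            elif fnum == 2:
                info["package"] = value.decode("utf-8")
            elif fnum == 4:
                for mnum, mwt, mval in iter_fields(value):
                    if mnum == 1 and mwt == LEN:
                        info["messages"].append(mval.decode("utf-8"))
        files.append(info)
    return files


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py out.pb
    with open(sys.argv[1], "rb") as f:
        for info in summarize_descriptor_set(f.read()):
            print(info["name"], info["package"], info["messages"])
```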

Michael Powell

2018-10-31 17:26:34 UTC

Thanks for that bit of clarification.

Good point, that's a possibility. Does protoc support C#? I think it
would be less difficult for me to reflect over that and generate the
boilerplate I want than to spin up a full-on parser, replete with its
AST.

'Adam Cozzette' via Protocol Buffers

2018-10-31 17:57:21 UTC

+Jie Luo <***@google.com> who knows the most about C#

The one thing that might be a problem is that C# does not yet have full
support for proto2, though that work is in progress (see this most recent pull
request <https://github.com/protocolbuffers/protobuf/pull/5183>). That
could make it hard to parse the descriptor protos from C#, since
descriptor.proto is a proto2 file. On the other hand, if you're using
proto3 you could skip parsing the descriptors yourself and just use
protobuf reflection to examine your messages within a C# program.

When you say you want to generate boilerplate, do you mean that you want to
generate C# code at build time? If so then another option is to just use
another language like C++ or Java to output your C# code.
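The same idea in Python: a generator written in some other language that emits C# source text. In a real tool the input would come from the parsed descriptors; to keep this sketch self-contained it takes a hand-written list of (proto type, field name) pairs. All names here (emit_csharp_class, pascal_case, the type table, the Demo namespace) are hypothetical, and this is not how protoc's own C# generator works.

```python
# Scalar proto types mapped to C# built-ins; an assumption of this sketch
# that covers only a handful of types.
CSHARP_TYPES = {
    "int32": "int", "int64": "long", "uint32": "uint", "uint64": "ulong",
    "bool": "bool", "string": "string", "double": "double", "float": "float",
}


def pascal_case(field_name: str) -> str:
    """Convert snake_case (e.g. user_id) to PascalCase (UserId)."""
    return "".join(part[:1].upper() + part[1:] for part in field_name.split("_"))


def emit_csharp_class(namespace: str, message: str, fields: list[tuple[str, str]]) -> str:
    """Render a C# partial class with one auto-property per (proto_type, name) pair."""
    out = [f"namespace {namespace}", "{", f"    public partial class {message}", "    {"]
    for proto_type, name in fields:
        cs_type = CSHARP_TYPES[proto_type]  # KeyError for types this sketch does not map
        out.append(f"        public {cs_type} {pascal_case(name)} {{ get; set; }}")
    out += ["    }", "}"]
    return "\n".join(out)


print(emit_csharp_class("Demo", "User", [("int32", "user_id"), ("string", "display_name")]))
```

Running it prints a Demo.User class with int UserId and string DisplayName auto-properties, which a build step could write to a .cs file.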