Discussion:
[protobuf] [Proto2] Language spec help
Michael Powell
2018-10-31 16:07:15 UTC
Hello,

I am writing a parser for the Proto language specification starting
with v2. I need a little help interpreting one of the lines if you
please:

In the "String literals" section, what does this mean:

charValue = hexEscape | octEscape | charEscape | /[^\0\n\\]/

Specifically, what does the trailing character soup mean? My instinct is
that the sequence contains escaped characters. Or am I to take that string
literally, notwithstanding the enclosing forward slashes?

Thanks much in advance!

Best regards,

Michael Powell
--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protobuf+***@googlegroups.com.
To post to this group, send email to ***@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.
Michael Powell
2018-10-31 16:22:14 UTC
Concerning Constant, literally from the v2 spec:

syntax = "syntax" "=" quote "proto2" quote ";"

Do I read that correctly you can expect either 'proto2' or "proto2",
but never 'proto2" nor "proto2' ?

If accurate, that just seems to me to be lazy spec authorship...

Thanks!
Michael Powell
2018-10-31 16:23:04 UTC
Rather, Syntax section, excuse me...
'Adam Cozzette' via Protocol Buffers
2018-10-31 17:17:30 UTC
I think that specification has suffered a little bit of neglect (sorry
about that), because in practice our C++ parser is really the de facto
standard and we have not recently made an effort to go through and make
sure the official spec matches it perfectly. My reading of that string
(/[^\0\n\\]/) is that it's a regular expression saying "any character other
than null, newline, or backslash." But in general I would say the best bet
is to resolve ambiguities by looking at what the C++ parser does.
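To make that reading concrete, here is a minimal C++ sketch of the spec's string-literal rules taken literally: a plain charValue is any byte other than NUL, newline, or backslash, and a backslash must begin a hexEscape (`\x` plus two hex digits), an octEscape (`\` plus three octal digits), or a charEscape. Two assumptions are labeled in the code: the quote character that opened the literal also closes it (the regex itself does not exclude it), and the escape forms follow the spec text only. The helper names (`scanStrLit` and friends) are hypothetical, and protoc's own tokenizer may accept forms the spec text does not, so compare against it when the two disagree.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Plain charValue per the spec regex /[^\0\n\\]/: any byte except NUL,
// newline and backslash. Assumption: the quote that opened the literal
// also closes it, so it is excluded here as well.
bool isPlainChar(char c, char quote) {
  return c != '\0' && c != '\n' && c != '\\' && c != quote;
}

bool isOctalDigit(char c) { return c >= '0' && c <= '7'; }

bool isHexDigit(char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; }

// Hypothetical helper: returns the length of the strLit at the start of
// `src` (both quotes included), or 0 if `src` does not start with one.
std::size_t scanStrLit(std::string_view src) {
  if (src.empty() || (src[0] != '"' && src[0] != '\'')) return 0;
  const char quote = src[0];
  std::size_t i = 1;
  while (i < src.size()) {
    const char c = src[i];
    if (c == quote) return i + 1;                 // matching closing quote
    if (isPlainChar(c, quote)) { ++i; continue; }
    // Only a backslash may start anything else; NUL and newline are rejected.
    if (c != '\\' || i + 1 >= src.size()) return 0;
    const char e = src[i + 1];
    if (e == 'x' || e == 'X') {                   // hexEscape: \x + 2 hex digits
      if (i + 3 >= src.size() || !isHexDigit(src[i + 2]) || !isHexDigit(src[i + 3])) return 0;
      i += 4;
    } else if (isOctalDigit(e)) {                 // octEscape: \ + 3 octal digits
      if (i + 3 >= src.size() || !isOctalDigit(src[i + 2]) || !isOctalDigit(src[i + 3])) return 0;
      i += 4;
    } else if (std::string_view("abfnrtv\\'\"").find(e) != std::string_view::npos) {
      i += 2;                                     // charEscape
    } else {
      return 0;                                   // escape not listed in the spec
    }
  }
  return 0;                                       // unterminated literal
}
```

Returning a length rather than a bool lets a hand-written lexer advance past the token directly; a grammar library such as Spirit would express the same alternatives declaratively.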

By the way, have you considered just reusing the C++ parser that's included
in protoc? You can call protoc with the --descriptor_set_out flag to have
it parse your .proto file and produce a serialized FileDescriptorSet proto
as output. Then at that point it's easy to parse the descriptors using just
about any language we support, and that should give you all the information
you need, without the need for a new .proto file parser.
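For the descriptor-set route, the usual approach is to link libprotobuf and call `ParseFromIstream` or `ParseFromString` on a `google::protobuf::FileDescriptorSet`. Purely to show what protoc's output contains, the sketch below skips the library and walks the protobuf wire format by hand: a FileDescriptorSet is a sequence of length-delimited field 1 (`file`) entries, each a FileDescriptorProto whose field 1 is `name` and field 12 is `syntax`. The helper names and the `FileInfo` struct are hypothetical, groups (wire types 3/4) are not handled, and error handling is minimal.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reads a base-128 varint at pos; returns false if the buffer is truncated.
bool readVarint(std::string_view buf, std::size_t& pos, std::uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
    const auto byte = static_cast<std::uint8_t>(buf[pos++]);
    out |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Reads a length-delimited payload (wire type 2) at pos into `out`.
bool readBytes(std::string_view buf, std::size_t& pos, std::string_view& out) {
  std::uint64_t len = 0;
  if (!readVarint(buf, pos, len) || len > buf.size() - pos) return false;
  out = buf.substr(pos, static_cast<std::size_t>(len));
  pos += static_cast<std::size_t>(len);
  return true;
}

// Skips one field value of the given wire type (groups are not supported).
bool skipField(std::string_view buf, std::size_t& pos, int wireType) {
  std::uint64_t ignoredVarint = 0;
  std::string_view ignoredBytes;
  switch (wireType) {
    case 0: return readVarint(buf, pos, ignoredVarint);
    case 1: if (buf.size() - pos < 8) return false; pos += 8; return true;
    case 2: return readBytes(buf, pos, ignoredBytes);
    case 5: if (buf.size() - pos < 4) return false; pos += 4; return true;
    default: return false;
  }
}

struct FileInfo { std::string name; std::string syntax; };

// Collects name (field 1) and syntax (field 12) of every `file` entry.
bool listFiles(std::string_view set, std::vector<FileInfo>& files) {
  std::size_t pos = 0;
  while (pos < set.size()) {
    std::uint64_t key = 0;
    if (!readVarint(set, pos, key)) return false;
    const int wire = static_cast<int>(key & 7);
    if ((key >> 3) != 1 || wire != 2) {           // not a `file` entry
      if (!skipField(set, pos, wire)) return false;
      continue;
    }
    std::string_view file;
    if (!readBytes(set, pos, file)) return false;
    FileInfo info;
    std::size_t p = 0;
    while (p < file.size()) {
      std::uint64_t fkey = 0;
      if (!readVarint(file, p, fkey)) return false;
      const int fwire = static_cast<int>(fkey & 7);
      const std::uint64_t field = fkey >> 3;
      if (fwire == 2 && (field == 1 || field == 12)) {
        std::string_view value;
        if (!readBytes(file, p, value)) return false;
        (field == 1 ? info.name : info.syntax) = std::string(value);
      } else if (!skipField(file, p, fwire)) {
        return false;
      }
    }
    files.push_back(std::move(info));
  }
  return true;
}
```

In real code the generated `descriptor.pb.h` classes give you the same data (plus messages, fields, and options) without any of this manual decoding.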
Michael Powell
2018-10-31 17:26:34 UTC
Post by 'Adam Cozzette' via Protocol Buffers
I think that specification has suffered a little bit of neglect (sorry
about that), because in practice our C++ parser is really the de facto
standard and we have not recently made an effort to go through and make
sure the official spec matches it perfectly. My reading of that string
(/[^\0\n\\]/) is that it's a regular expression saying "any character other
than null, newline, or backslash." But in general I would say the best bet
is to resolve ambiguities by looking at what the C++ parser does.
Thanks for that bit of clarification.
Post by 'Adam Cozzette' via Protocol Buffers
By the way, have you considered just reusing the C++ parser that's included
in protoc? You can call protoc with the --descriptor_set_out flag to have
it parse your .proto file and produce a serialized FileDescriptorSet proto
as output. Then at that point it's easy to parse the descriptors using just
about any language we support, and that should give you all the information
you need, without the need for a new .proto file parser.
Good point; it's a possibility. Does protoc support C# for this? I think it
would be less difficult for me to reflect over that and generate the
boilerplate I want than to spin up a full-on parser replete with its AST.
'Adam Cozzette' via Protocol Buffers
2018-10-31 17:57:21 UTC
+Jie Luo <***@google.com> who knows the most about C#

The one thing that might be a problem is that C# does not yet have full
support for proto2, though that work is in progress (see this most recent pull
request <https://github.com/protocolbuffers/protobuf/pull/5183>). That
could make it hard to parse the descriptor protos from C#, since
descriptor.proto is a proto2 file. On the other hand, if you're using
proto3, you could skip parsing the descriptors yourself and just use
protobuf reflection to examine your messages within a C# program.

When you say you want to generate boilerplate, do you mean that you want to
generate C# code at build time? If so then another option is to just use
another language like C++ or Java to output your C# code.
'Adam Cozzette' via Protocol Buffers
2018-10-31 21:19:18 UTC
I'm not sure how soon we can expect the proto2 support in C#. Jie, do you
happen to know how close it is to being complete?

But if you're just trying to generate C# code at build time, I was thinking
you could also use for example a C++ binary to generate the C# code,
without needing SWIG or anything.

It would go something like this:
- Run "protoc --descriptor_set_out=... path/to/my.proto > descriptors.out"
- Run your C++ program which reads in the descriptors and generates C# code
- Then build the C# project as usual
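The C++ program in the second step splits naturally into reading the descriptors (for example with libprotobuf's generated `descriptor.pb.h` classes) and emitting text. Below is a minimal sketch of the emitting half only. It works on a small hypothetical model (`MessageInfo`, `FieldInfo`) that you would fill from the FileDescriptorProto data, and the C# it writes (one static class of field-number constants per message) is just an illustrative kind of non-message adapter boilerplate, not anything protoc produces. Note that protoc writes the FileDescriptorSet to the file named by `--descriptor_set_out=FILE`, so no shell redirect is needed for that step.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model filled from FileDescriptorProto data.
struct FieldInfo { std::string name; int number; };
struct MessageInfo { std::string name; std::vector<FieldInfo> fields; };

// Converts snake_case field names to PascalCase for C# identifiers.
std::string toPascalCase(const std::string& s) {
  std::string out;
  bool upper = true;
  for (char c : s) {
    if (c == '_') { upper = true; continue; }
    out += upper ? static_cast<char>(std::toupper(static_cast<unsigned char>(c))) : c;
    upper = false;
  }
  return out;
}

// Emits one static class per message holding its field numbers as constants.
std::string emitCSharp(const std::string& ns, const std::vector<MessageInfo>& messages) {
  std::ostringstream cs;
  cs << "namespace " << ns << "\n{\n";
  for (const auto& m : messages) {
    cs << "    public static class " << m.name << "FieldNumbers\n    {\n";
    for (const auto& f : m.fields)
      cs << "        public const int " << toPascalCase(f.name) << " = " << f.number << ";\n";
    cs << "    }\n";
  }
  cs << "}\n";
  return cs.str();
}
```

Keeping generation as plain string building means the C# side needs no protobuf runtime at all; a build step would write the returned text to a `.cs` file before compiling the project.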
Post by 'Adam Cozzette' via Protocol Buffers
+Jie Luo who knows the most about C#
The one thing that might be a problem is that C# does not yet have full
support for proto2, though that work is in progress (see this most recent
pull request). That could make it hard to parse the descriptor protos from
C#, since descriptor.proto is a proto2 file. On the other hand, if you're
using proto3 you could also forget about parsing the descriptors yourself
but just use protobuf reflection to examine your messages within a C#
program.
Herein lies the rub... The proto I am dealing with is v2 at the
moment, yes. How soon do you think the PR would be verified, accepted,
etc?
When you say you want to generate boilerplate, do you mean that you want
to generate C# code at build time? If so then another option is to just use
another language like C++ or Java to output your C# code.
Exactly. I am working from a .proto file and want to output some
non-message adapter type stuff for a Google.OrTools C# adapter I'm
working on, which uses this internally. I could churn through the
.proto manually by hand, but I figure this is a prime candidate for a
build-time code generation approach, particularly with adequate proto
buffer spec comprehension.
In terms of introducing another language, do you mean with something like
SWIG? If it's fairly seamless, I may consider that, i.e.:
Proto2 -> C++ -> SWIG -> C# -> Code Gen
That's a bit of a pipeline, but it may be doable.
'Jie Luo' via Protocol Buffers
2018-10-31 22:56:55 UTC
Post by 'Adam Cozzette' via Protocol Buffers
I'm not sure how soon we can expect the proto2 support in C#. Jie, do you
happen to know how close it is to being complete?
I don't think it will be very soon. We do not have anyone at Google working
on C# (I only review PRs), and only one third-party contributor is working on
proto2 support. So far, field presence has been submitted and group support is
in progress; extensions and default values are still missing.
Michael Powell
2018-11-02 22:44:27 UTC
Post by 'Adam Cozzette' via Protocol Buffers
I think that specification has suffered a little bit of neglect (sorry
about that), because in practice our C++ parser is really the de facto
standard and we have not recently made an effort to go through and make
sure the official spec matches it perfectly. My reading of that string
(/[^\0\n\\]/) is that it's a regular expression saying "any character other
than null, newline, or backslash." But in general I would say the best bet
is to resolve ambiguities by looking at what the C++ parser does.
I have a bit of an AST problem; I wonder if you have some C++ insight to
help sort it out. If I translate the grammar on its face to an AST via the
parser rules, I end up with a MessageBody that must be forward declared, in
C++ parlance, for use in both Group and Message. Does that sound about
right? There are probably ways around that, such as introducing AST
pointers; that is a bit beyond the scope of this discussion, but I assume
Boost Spirit would be able to handle it? (TBD via the Boost/Spirit forums.)

Provided my internal AST works, that's really all I need, to navigate it
and generate code given that.
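The recursion described above (a message body contains groups and nested messages, each of which has its own message body) can be expressed in plain standard C++ without raw pointers. A minimal sketch, assuming C++17, which permits `std::vector` of a still-incomplete element type as long as the type is complete before the vector is used; all type and member names are hypothetical and cover only a slice of the grammar. With Boost Spirit, the usual equivalents are `boost::recursive_wrapper` (Qi) or `x3::forward_ast` (X3), though that is best confirmed on the Spirit side.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Forward declarations break the MessageBody <-> Group/Message cycle.
struct Group;
struct Message;

struct Field {
  std::string label;  // "required", "optional" or "repeated"
  std::string type;
  std::string name;
  int number = 0;
};

// C++17 allows std::vector<T> members while T is still incomplete,
// provided T is complete before any vector member is used.
struct MessageBody {
  std::vector<Field> fields;
  std::vector<Group> groups;
  std::vector<Message> messages;  // nested message definitions
};

struct Group {
  std::string label;
  std::string name;
  int number = 0;
  MessageBody body;  // a group carries its own message body
};

struct Message {
  std::string name;
  MessageBody body;
};

// Walks the tree recursively, counting fields at every nesting level.
std::size_t countFields(const MessageBody& body) {
  std::size_t n = body.fields.size();
  for (const Group& g : body.groups) n += countFields(g.body);
  for (const Message& m : body.messages) n += countFields(m.body);
  return n;
}
```

Holding children by value keeps ownership simple and lets code generation be a straightforward recursive walk like `countFields`.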

Post by 'Adam Cozzette' via Protocol Buffers
By the way, have you considered just reusing the C++ parser that's included
in protoc? You can call protoc with the --descriptor_set_out flag to have
it parse your .proto file and produce a serialized FileDescriptorSet proto
as output. Then at that point it's easy to parse the descriptors using just
about any language we support, and that should give you all the information
you need, without the need for a new .proto file parser.
I'm considering this as an approach, but I am a bit confused about the
starting point. It seems like it all starts with "compiling" the .proto
files into the plugin implementation language, i.e. C++? Or perhaps I am
missing something? Also, do I need the source matching the protoc version
readily available, i.e. with the requisite libs, includes, etc.? Seems like
possibly yes.

Getting past the "getting started" baby steps, it seems straightforward
enough to receive the compiler request, navigate the protoc metadata API,
and return a response. It's getting past that first step where I'm not
certain what the starting point really is.
Michael Powell
2018-11-02 22:45:32 UTC
Post by Michael Powell
I have a bit of an AST problem, I wonder if you have some C++ insight to
help sort it out. If I translate the grammar on its face to an AST via the
parser rules, I end up with a MessageBody that must be forward declared, in
C++ parlance, for use in both Group and Message. Does that sound about
right? There are probably ways around that, such as introducing AST
pointers, things of this nature; a bit beyond the scope of this discussion,
but, assuming Boost Spirit would be able to handle that? (TBD via
Boost/Spirit forums.)
FYI, https://wandbox.org/permlink/WeRqkmDR93Wqu8BI
Josh Humphries
2018-11-01 15:15:04 UTC
Re: syntax: it really means a string literal, whose value must be "proto2" or
"proto3".

Looks like it's been cleaned up, but not long ago it also had an incorrect
definition for the service syntax, suggesting an alternate way to define
streaming methods that protoc did not actually support.
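On the syntax statement itself: read as a string literal, `quote "proto2" quote` must open and close with the same quote character, which is what the spec's separate strLit rule requires. A minimal C++ sketch of the statement under that reading, accepting "proto2" or "proto3" per the note above; the function name is hypothetical, and whitespace handling is simplified (no comments, no escapes, no adjacent-string concatenation).

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Skips ASCII whitespace starting at pos.
void skipSpace(std::string_view s, std::size_t& pos) {
  while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
}

// Hypothetical helper: parses `syntax = "proto2";` (or 'proto3', etc.) and
// returns the version string, or std::nullopt if the statement is malformed.
std::optional<std::string> parseSyntax(std::string_view s) {
  std::size_t pos = 0;
  auto expect = [&](std::string_view tok) {
    skipSpace(s, pos);
    if (s.substr(pos, tok.size()) != tok) return false;
    pos += tok.size();
    return true;
  };
  if (!expect("syntax") || !expect("=")) return std::nullopt;
  skipSpace(s, pos);
  if (pos >= s.size() || (s[pos] != '"' && s[pos] != '\'')) return std::nullopt;
  const char quote = s[pos++];
  // The literal ends only at the same quote character it started with.
  const std::size_t close = s.find(quote, pos);
  if (close == std::string_view::npos) return std::nullopt;
  std::string value(s.substr(pos, close - pos));
  pos = close + 1;
  if (!expect(";")) return std::nullopt;
  if (value != "proto2" && value != "proto3") return std::nullopt;
  return value;
}
```

So `'proto2'` and `"proto2"` both parse, while mixed quotes fail because the opening quote never finds its partner.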

Anyhow, I've gone through the exercise of writing a parser in Go
<https://godoc.org/github.com/jhump/protoreflect/desc/protoparse> and
learned a lot. The parser I wrote leverages yacc, which may provide a
useful starting point for creating a parser in C#:
https://github.com/jhump/protoreflect/blob/master/desc/protoparse/proto.y

I know the code in the productions is all Go, but the layout of the grammar
is more or less right. (I have tests that load a variety of proto files,
with a variety of language features therein, and make sure that my package
produces descriptors equivalent to protoc's.)

----
*Josh Humphries*