This repository was archived by the owner on Oct 20, 2024. It is now read-only.

forgot about strings #6

Open
heavyk wants to merge 2 commits into blvz:master from heavyk:another-backtick-update

heavyk commented Jul 22, 2015

this fixes the case: console.log \'lala, a-variable

heavyk commented Jul 22, 2015

oh yeah, and ampersands too

heavyk commented Jul 29, 2015

just pushed another update for ')'. livescript does not include it in backticks

eg. this is correct
[screenshot: screen shot 2015-07-30 at 00 18 50]

heavyk commented Jul 29, 2015

well, shit, the weird thing is this:

# fine
console.log \)
# also fine
console.log \)lala
# syntax error: unexpected ')'
console.log \lala)lala

gabeio commented Jul 29, 2015

is the syntax error from livescript(compile) or atom(parser)?

heavyk commented Jul 29, 2015

it's from livescript.
it appears that if the first char is a ')', it's accepted.

gabeio commented Jul 29, 2015

yeah that's so you can do something like:

console.log(\asdf\))
console.log(\asdf)

heavyk commented Jul 29, 2015

ok, I got it (kinda) but I want to simplify the regex. this is what I have now:

match: "\\\\[\\w\\W][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*"

do you know how to write a regex which basically says: any char right after the \ (that's the \\\\[\\w\\W] part), but any subsequent chars can be anything except for a ')'?

seems we could simplify the above mess to two rules. otherwise I'd have to add an exception for every unicode char, because for example both of these compile fine:

[screenshot: screen shot 2015-07-30 at 01 00 58]

// Generated by LiveScript 1.4.0
(function(){
  console.log('this', 'is', 'livescript');
  console.log(yay);
  console.log(')');
  console.log(')lal%&*(!@§a');
  console.log(')hello:£¢€°·‚‚Ƨl%&*(!@§a');
}).call(this);

heavyk commented Jul 29, 2015

my wording sucks. sorry bout that. do you know a regex which will match any letter except for ')' ??

gabeio commented Jul 29, 2015

usually a . means any character (not sure about this version of regex). as for "anything but", you can use a negative lookahead, (?!\)), meaning the next char can't match this group (which is only a ')'). so try something like:

match: "\\\\[\\.][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*(?!\\))"

but I am not sure if that works... because I changed the \\. part...
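
A quick console check of the "any character" part of that suggestion may help here. This is only a sketch for JavaScript's regex engine (the grammar may be run by a different one), and the variable names are illustrative. Inside a character class the dot is literal, so [\.] matches only a '.', while a bare . matches anything except a line terminator and [\w\W] matches anything at all:

```javascript
// '\\)' in a JS string literal is the two characters: \ )
const bareDot = /\\./;        // backslash, then any char except a line terminator
const dotInClass = /\\[\.]/;  // backslash, then a literal '.' only
const anyChar = /\\[\w\W]/;   // backslash, then any char at all

const dotChecks = {
  bareDotMatchesParen: bareDot.test('\\)'),
  dotInClassMatchesParen: dotInClass.test('\\)'),
  dotInClassMatchesDot: dotInClass.test('\\.'),
  bareDotMatchesNewline: bareDot.test('\\\n'),
  anyCharMatchesNewline: anyChar.test('\\\n'),
};

console.log(dotChecks);
```

So swapping [\\w\\W] for [\\.] narrows the first character to a literal dot, which is probably not what was intended.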

heavyk commented Jul 29, 2015

they look to be compiled RegExp ... so I'm testing them in the console like this:

var r = new RegExp("\\\\[\\w\\W][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\:\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*")
'\\)hello:£¢€°·‚‚Ƨl%&*(!@§a'.match(r)

ok, gonna try your suggestion
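
When testing this way, it helps to keep the two layers of escaping apart: the pattern in a quoted string has every backslash doubled, and the test input has its own string escaping on top. A minimal sketch, using only the first part of the pattern above (the variable names are made up for illustration):

```javascript
// The pattern as it would sit in a quoted string: every backslash is doubled.
const fromString = new RegExp("\\\\[\\w\\W]");
// The same pattern as a regex literal: only the regex-level escaping remains.
const fromLiteral = /\\[\w\W]/;

// Both compile to the same source text: \\[\w\W]
const sameSource = fromString.source === fromLiteral.source;

// Likewise, the string literal '\\)hello' is the 7 characters \)hello
const input = '\\)hello';
const matched = input.match(fromString)[0]; // the backslash plus one char

console.log(sameSource, input.length, matched);
```

Writing the candidate regex as a literal while experimenting, then doubling the backslashes when pasting it back into the quoted match: string, keeps the long patterns easier to read.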

heavyk commented Jul 29, 2015

I can't get it to work. according to this comment ... http://stackoverflow.com/questions/6851921/negative-lookahead-regular-expression#comment8148005_6851958

I would need to anchor the whole line (^ ... $) for that technique to work in js ... I dunno if that's even right. this is way over my head right now. I honestly just learned about negative look-ahead ...

@gabeio do you see an easy way to do this:

'\\)hello:§()'.match(new RegExp("\\\\[\\w\\W][\\$\\.\\/\\%\\^\\@\\#\\&\\*\\:\\'\\\"\\!\\=\\+\\[\\]\\(\\{\\}\\<\\>\\w-]*"))
["\)hello:"]
// to become this: ???
["\)hello:§("]

for now, I'm giving up :/
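
One possible reason the lookahead attempt misbehaves, sketched for JavaScript's engine only: a (?!\)) placed after the repeated class is checked just once, at the position where the match ends, so it makes the engine back off a character instead of letting it run up to the ')'. Checking the lookahead before every character is one way around that. The two patterns below are simplified stand-ins, not the grammar's actual rule:

```javascript
const sample = '\\)hello:§()'; // the characters \)hello:§()

// Lookahead after a whitelist class: the class stops before ')', the
// lookahead fails at that position, and the engine gives back one character.
const trailing = /\\[\w\W][\w:§(]*(?!\))/;

// Lookahead tested before each character: consume any non-space char
// as long as it is not a ')'.
const perChar = /\\[\w\W](?:(?!\))\S)*/;

const trailingResult = sample.match(trailing)[0];
const perCharResult = sample.match(perChar)[0];

console.log(trailingResult); // \)hello:§
console.log(perCharResult);  // \)hello:§(
```

The per-character form gives the result asked for above, although the \S part is an assumption about where a backslash string should end.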

98devin commented Feb 13, 2016

I know this is a really old topic, but I think there's a simple solution. Rather than use a character class whitelisting acceptable characters, blacklist the bad ones.
That's done in general by using [^ insert chars here] where the ^ character means everything NOT in the class when put at the beginning.

That said, this works in the engine javascript uses at least:

'\\)hello:§()'.match /\\[\w\W][^\)\]\s]*/  #=> '\\)hello:§('

I'm not sure if this regex is foolproof though, or if it will work here, but it's likely.
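
As a rough check, the same blacklist regex can be run in plain JavaScript against the cases mentioned earlier in this thread. The helper name firstBackslashString is made up for this sketch, and the inputs are copied or shortened from the examples above:

```javascript
// Backslash, then any one char, then anything up to a ')', a ']' or whitespace.
const backslashString = /\\[\w\W][^\)\]\s]*/;

// Hypothetical helper: the first backslash string found in a line, or null.
function firstBackslashString(line) {
  const m = line.match(backslashString);
  return m ? m[0] : null;
}

const results = [
  firstBackslashString('console.log \\)'),             // \)
  firstBackslashString('console.log \\)lala'),         // \)lala
  firstBackslashString('console.log(\\asdf)'),         // \asdf
  firstBackslashString('console.log \\)lal%&*(!@§a'),  // \)lal%&*(!@§a
  firstBackslashString('console.log \\lala)lala'),     // \lala
  firstBackslashString('console.log yay'),             // null
];

console.log(results);
```

Note that the last-but-one input is the one LiveScript itself rejects, so the regex only decides how much gets highlighted there, not whether the code compiles.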

heavyk commented Feb 13, 2016

well, either way, this version is a huge improvement on what's published in apm. I'll probably revisit this though, because the other day I had strange formatting.

either way, I want to figure out how to use LS's tokenizer directly instead of using regexp.

98devin commented Feb 13, 2016

Interesting idea; do any other syntax plugins on apm use their own engine? I just wonder how complicated that would be to set up.

As for other backslash string problems, they currently don't have the right priority since any # character inside will begin a comment...

Is the priority just based on the ordering in the file? If so that's an easy fix probably.
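
To make the priority question concrete, here is a deliberately simplified model of a first-match tokenizer. It assumes (this is an assumption about TextMate-style grammars, not something confirmed in this thread) that the rule matching earliest in the line wins and that file order only breaks ties. The tokenize function, the rule names, and both rule sets are hypothetical:

```javascript
// Simplified model: at each position, pick the rule whose match starts
// earliest; on a tie, the rule listed first wins.
function tokenize(line, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    let best = null;
    for (const rule of rules) {
      const re = new RegExp(rule.match, 'g');
      re.lastIndex = pos; // search from the current position onward
      const m = re.exec(line);
      if (m && m[0].length > 0 && (best === null || m.index < best.index)) {
        best = { index: m.index, text: m[0], name: rule.name };
      }
    }
    if (best === null) break; // nothing else matches on this line
    tokens.push({ name: best.name, text: best.text });
    pos = best.index + best.text.length;
  }
  return tokens;
}

const comment = { name: 'comment', match: '#.*' };
// A narrow string rule that stops at '#', and the blacklist rule from above.
const narrowString = { name: 'string', match: '\\\\\\w+' };
const wideString = { name: 'string', match: '\\\\[\\w\\W][^)\\]\\s]*' };

const line = 'console.log \\a#b # note'; // the characters: console.log \a#b # note
const narrowTokens = tokenize(line, [narrowString, comment]);
const wideTokens = tokenize(line, [wideString, comment]);

console.log(narrowTokens);
console.log(wideTokens);
```

In this model the narrow rule ends the string at the '#', and the comment rule then picks up from there, while the wide rule keeps the '#' inside the string without any reordering. If the real engine behaves the same way, the fix may be in the string regex rather than in the rule order.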

heavyk commented Feb 13, 2016

> Interesting idea; do any other syntax plugins on apm use their own engine?

I looked a while back and didn't see any, but that doesn't mean it doesn't exist. if not, raise an issue on atom's tracker asking how it could be done.

> Is the priority just based on the ordering in the file?

I don't remember right now. I just remember how complicated it was, and since I have little real knowledge of regexp that's what forced me to see if I could implement the existing tokenizer

98devin commented Feb 14, 2016

I think it might be a good idea to look through all the regexes used in the grammar for redundancies and things to improve, because of problems like this. Even more so because the currently available package conflicts with the language definition (for example, it allows ] and ) anywhere in a backslash string).

I couldn't find a good source on what engine Atom uses for regex, but it seems to be either javascript's or something called oniguruma. In any case they should be similar for the most part, so I'll try to understand the project as it is now.
