Updated URL regex for hashtags to disregard hashes used mid-sentence. #2361
Summary
Closes #2361
Simply, users would have one of their 5 hashtags used if posting a url with a hash in the query string, for example the invalid URL https://www.minds.com/#minds
The changed regex matches when:
- The first character is a space, or the start of a line.
- Followed by a hash.
- Followed atleast one word character
#Lorem#ipsum #dolor.sit amet, #consectetur adipiscing e#lit, https://loremipsum.io/#123#sed #do #eiusmod #tempo1=incididunt #132323 #@ #w #[ #] ' #word \ #labore@et dolore #magna aliqua. #Ut #enim #ad .minim #veniam, ^ #quis
#nostrud exercitation ullamco #laboris nisi
ut #aliquip ex ea commodo#consequat. Duis ()'#aute sad'#irure dolor
#in reprehen;'
#derit in@#voluptate velit essehttp: ci#https://loremipsum.io/#123llum dolore https://loremipsum.io/#123 https://loremipsum.io/#eu https://loremipsum.io/#fugiat https://loremipsum.io/#123 nulla pariatur. #Excepteur #sint #occaecat cupidatat
#non
#proident,
a #sunt in culpa qui officia deserunt mollit anim id est laborum.
Steps to test
This one is a bit more "get in there and try your best to break it" in terms of testing, but that said, to get a foot in the door:
- Go on the site, make a post with 5 hashtags; try to trip the Regex up and make some invalid tags to see how it responds e.g. hello#world, @#hello.
- Try a URL with a hashtag in.
- Check the hashtags links go through to the correct place.
- Try to use the hashtag selector to bypass the limit
- Ensure when you add a tag it shows up in the selector
Estimated Regression Scope
The only thing making me pause to question is the old Regex goes like this/: /(^|\s||)#(\w+)/gim,
. It could be the source of a bug, or it could be a Regex trick I'm missing.
Take the first group (^|\s||)
- to me that looks like a bug. I'm not aware of any double OR operand in regex nor can I find a reference for it.
This appears to stop the \s
from being matched, and I presume also the ^
. Thus on production #hello#world counts as 2 hashtags.
This will change occurrences of that on production. #hello#world will match the hashtag #hello, and ignore the #world.
I think this is correct behavior, but changing it needs discussion, because it may invalidate people's tags in posts that for whatever reason, like to #combine#their#tags#like#this
added 1 commit
- 2ae89295 - Changing spec test periods to 24h so tests pass - may need to revert
added 7 commits
-
779ade54...43727865 - 5 commits from branch
master
- 68f95577 - Merge branch 'master' of gitlab.com:minds/front into fix/hashtag-pipes-2361
- 37cf12f6 - Merge branch 'fix/hashtag-pipes-2361' of gitlab.com:minds/front into fix/hashtag-pipes-2361
-
779ade54...43727865 - 5 commits from branch
39 39 ); 40 40 }); 41 41 42 it('should transform when # preceded by [] ', () => { 42 it('should not transform when # preceded by [] ', () => { 43 43 const string = 'textstring [#name'; 44 44 const transformedString = pipe.transform(<any>string); 45 expect(transformedString).toContain( 45 expect(transformedString).not.toContain( 46 46 '<a href="/newsfeed/global/top;hashtag=name;period=7d' 47 47 ); 48 48 }); 49 49 50 it('should transform when # preceded by () ', () => { 50 it('should not transform when # preceded by () ', () => { - Owner
Why?
39 39 ); 40 40 }); 41 41 42 it('should transform when # preceded by [] ', () => { 42 it('should not transform when # preceded by [] ', () => { - Owner
Why?
23 23 pipe = new TagsPipe(featuresServiceMock, siteServiceMock); 24 24 }); 25 25 26 it('should transform when # in the middle ', () => { 26 it('should not transform when # in the middle ', () => { - Owner
Why?
39 39 } else if (this.featureService.has('top-feeds')) { 40 40 return `${ 41 41 m.match[1] 42 }<a href="/newsfeed/global/top;hashtag=${m.match[2].toLowerCase()};period=7d">#${ 42 }<a href="/newsfeed/global/top;hashtag=${m.match[2].toLowerCase()};period=24h">#${ - Owner
Intended?
188 188 expect(transformedString).toContain('<a class="tag" href="/name"'); 189 189 expect(transformedString).toContain('<a class="tag" href="/name1"'); 190 190 expect(transformedString).toContain( 191 '<a href="/newsfeed/global/top;hashtag=hash1;period=7d' 191 '<a href="/newsfeed/global/top;hashtag=hash1;period=24h' 192 192 ); 193 expect(transformedString).toContain( 193 expect(transformedString).not.toContain( - Owner
?
- Owner
I don't understand why we are making this change. The existing test cases are intended.
added scoped label
- Developer
/(^|\s||)#(\w+)/gim;
looks like it was intended to be/(^|\s)#(\w+)/gim;
, as I can't find any trace of a double alternation in regex. Maybe there is a reason?This had led me to think the test cases I've edited here are the correct way around.
Anyway, the issue is that URLs with hashes count as a hashtag, what we can do is either rewrite the query that allows
#tags#like#this
, what we don't want is to accepttags#like#this
.Or I can just check if its a URL before counting it as a tag.
Edited by Ben Hayward closed