Updated URL regex for hashtags to disregard hashes used mid-sentence. #2361
Summary
Closes #2361
Simply, users would have one of their 5 hashtags used if posting a url with a hash in the query string, for example the invalid URL https://www.minds.com/#minds
The changed regex matches when:
- The first character is a space, or the start of a line.
- Followed by a hash.
- Followed atleast one word character
#Lorem#ipsum #dolor.sit amet, #consectetur adipiscing e#lit, https://loremipsum.io/#123#sed #do #eiusmod #tempo1=incididunt #132323 #@ #w #[ #] ' #word \ #labore@et dolore #magna aliqua. #Ut #enim #ad .minim #veniam, ^ #quis
#nostrud exercitation ullamco #laboris nisi
ut #aliquip ex ea commodo#consequat. Duis ()'#aute sad'#irure dolor
#in reprehen;'
#derit in@#voluptate velit essehttp: ci#https://loremipsum.io/#123llum dolore https://loremipsum.io/#123 https://loremipsum.io/#eu https://loremipsum.io/#fugiat https://loremipsum.io/#123 nulla pariatur. #Excepteur #sint #occaecat cupidatat
#non
#proident,
a #sunt in culpa qui officia deserunt mollit anim id est laborum.
Steps to test
This one is a bit more "get in there and try your best to break it" in terms of testing, but that said, to get a foot in the door:
- Go on the site, make a post with 5 hashtags; try to trip the Regex up and make some invalid tags to see how it responds e.g. hello#world, @#hello.
- Try a URL with a hashtag in.
- Check the hashtags links go through to the correct place.
- Try to use the hashtag selector to bypass the limit
- Ensure when you add a tag it shows up in the selector
Estimated Regression Scope
The only thing making me pause to question is the old Regex goes like this/: /(^|\s||)#(\w+)/gim,
. It could be the source of a bug, or it could be a Regex trick I'm missing.
Take the first group (^|\s||)
- to me that looks like a bug. I'm not aware of any double OR operand in regex nor can I find a reference for it.
This appears to stop the \s
from being matched, and I presume also the ^
. Thus on production #hello#world counts as 2 hashtags.
This will change occurrences of that on production. #hello#world will match the hashtag #hello, and ignore the #world.
I think this is correct behavior, but changing it needs discussion, because it may invalidate people's tags in posts that for whatever reason, like to #combine#their#tags#like#this
added 1 commit
- 2ae89295 - Changing spec test periods to 24h so tests pass - may need to revert