this post was submitted on 29 Jan 2025
29 points (93.9% liked)

Linux

49393 readers
1654 users here now

From Wikipedia, the free encyclopedia

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.

Rules

Related Communities

Community icon by Alpár-Etele Méder, licensed under CC BY 3.0

founded 5 years ago
MODERATORS
 

Hi everyone !

I'm in need for some assistance for string manipulation with sed and regex. I tried a whole day to trial & error and look around the web to find a solution however it's way over my capabilities and maybe here are some sed/regex gurus who are willing to give me a helping hand !

With everything I gathered around the web, It seems it's rather a complicated regex and sed substitution, here we go !

What Am I trying to achieve?

I have a lot of markdown guides I want to host on a self-hosted forgejo based git markdown. However the classic markdown links are not the same as one github/forgejo...

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

Into

[Some text](#header-linking-markdown.md)

As you can see those are the following requirement:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what's between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a line that begins with https (external links)

While everything is probably a bit complex as a whole the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

These sed/regx substitution are what I put together while roaming the web, but it has a lot a flaws and doesn't work with nested parentheses. Also this would change every %20 occurrence in the file.

The closest solution I found on stackoverflow looks similar but wasn't able to fit to my needs. Actually my lack of regex/sed understanding makes it impossible to adapt to my requirements.


I would appreciate any help even if a change of tool is needed, however I'm more into a learning processes, so a script or CLI alternative is very appreciated :) actually any help is appreciated :D !

Thanks in advance.

you are viewing a single comment's thread
view the rest of the comments
[–] bizdelnick@lemmy.ml 2 points 1 day ago

NB: global substitution s///g is not applicable here because you need to perform new substitutions in a substituted text. Both sed regexp syntaxes (basic and extended) don't support lookarounds that could solve this issue.