Coding a Rich Text Parser Because Why Not
Have you ever had things be too easy? Do you get that urge to unnecessarily turn up the difficulty on something? Have you considered writing a parser?
What's a Parser?
A parser is a fancy way of saying a program that takes data in one format and then interprets that data according to the rules of another format. You have almost certainly acted as a human parser of information at multiple points in your life. I have a doctor for a mother, so I've been parsing scribbles into English for nearly my entire life. Here's an example more people might relate to. If you took high school math you're probably familiar with the acronym BEDMAS (the one true lord). This stands for brackets, exponents, division, multiplication, addition, subtraction. This is the order of operations, and it is very important information when humans are parsing text as a mathematical problem.
For example
4 + 3 x 7 = ?
Without BEDMAS you might read the operations from left to right: 4 + 3 is 7, and 7 x 7 is 49. However, because we know BEDMAS is part of the rules when parsing a mathematical expression, we know 3 x 7 is the first operation, producing 21, and then we add 4 to get 25.
Without common parsing rules for mathematical notation we'd constantly arrive at different answers to problems.
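Here's a minimal sketch of what those rules look like once you hand them to a computer. It's a tiny recursive descent parser where multiplication and division bind tighter than addition and subtraction, and brackets override both. The function names (tokenize, evaluate) are made up for this example, exponents are left out to keep it short, and the tokenizer silently skips characters it doesn't recognize, so treat it as an illustration rather than a real calculator.

```typescript
// Hypothetical helper names; a sketch of precedence-aware parsing only.
type Token = { kind: "num"; value: number } | { kind: "op"; value: string };

// Split the input into numbers and single-character operators/brackets.
// Anything else (including whitespace) is skipped.
function tokenize(input: string): Token[] {
  const matches = input.match(/\d+(?:\.\d+)?|[-+x*\/()]/g) ?? [];
  return matches.map((text): Token =>
    /^\d/.test(text)
      ? { kind: "num", value: Number(text) }
      : { kind: "op", value: text }
  );
}

function evaluate(input: string): number {
  const tokens = tokenize(input);
  let pos = 0;

  // Look at the next token without consuming it, if it is an operator.
  const peekOp = (): string | undefined => {
    const t = tokens[pos];
    return t !== undefined && t.kind === "op" ? t.value : undefined;
  };

  // factor := number | "(" expression ")"   (the B in BEDMAS)
  function factor(): number {
    const t = tokens[pos++];
    if (t === undefined) throw new Error("Unexpected end of input");
    if (t.kind === "num") return t.value;
    if (t.value === "(") {
      const inner = expression();
      if (peekOp() !== ")") throw new Error("Expected closing bracket");
      pos++;
      return inner;
    }
    throw new Error(`Unexpected token: ${t.value}`);
  }

  // term := factor (("x" | "*" | "/") factor)*   (the D and M)
  function term(): number {
    let result = factor();
    for (let op = peekOp(); op === "x" || op === "*" || op === "/"; op = peekOp()) {
      pos++;
      const right = factor();
      result = op === "/" ? result / right : result * right;
    }
    return result;
  }

  // expression := term (("+" | "-") term)*   (the A and S)
  function expression(): number {
    let result = term();
    for (let op = peekOp(); op === "+" || op === "-"; op = peekOp()) {
      pos++;
      const right = term();
      result = op === "+" ? result + right : result - right;
    }
    return result;
  }

  const value = expression();
  if (pos !== tokens.length) throw new Error("Unexpected trailing input");
  return value;
}

console.log(evaluate("4 + 3 x 7"));   // 25
console.log(evaluate("(4 + 3) x 7")); // 49
```

The precedence lives entirely in which function calls which: expression only ever sees whole terms, so the multiplication is always finished before the addition gets a look at it.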
Converter vs Parser
A converter and a parser are usually closely related, but not the exact same thing. A converter acts as a translation layer between two compatible formats that are hopefully equivalent. In this scenario the rules of both formats are well known, and the data is already in a native format. An example of a converter that many people may be familiar with is an MP3 converter.
Back in ye olden times music came on CDs and the entire musical library of the universe wasn't on Spotify. If you had a CD player that was all well and good, but what if you had an MP3 player? No matter how hard you try, you will not be able to slam your CD into an MP3 player to make the music happen. You think, "Surely I can just put this CD into my computer and copy the MP3s right off it." Well, you can't. The CD doesn't have MP3s on it at all; it's a completely different format specific to music stored on CDs.
So what are you to do? Well you turn that CD into MP3 files using an MP3 converter of course.
You go on the internet, type "free MP3 converter" into the search bar, then click the first link. What's the worst that could happen? After that's done downloading you slap your CD into the disc drive and run the converter. It sees your CD and you hit the start button. It starts working away reading the CD and spits out nice little MP3 files that you could swear sound just as good as the original. Those guys ranting about FLAC at the CD shop didn't know what they were talking about.
To do the conversion you didn't have to interpret the data on the CD as audio, because it's already encoded as CD audio. You don't have to dynamically interpret the data either, because the standard is fixed: 44,100 samples every second at 16 bits per sample.
Now if you were going in the opposite direction you would need a parser for MP3 files before converting. The MP3 file spec contains many different possibilities for encodings, MPEG versions, protection bits, channels, sampling rates, etc.
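To make that concrete, here's a rough sketch of reading a few of those fields out of the 4-byte header at the start of an MP3 frame. The bit positions and lookup tables follow the commonly documented MPEG audio frame header layout; the type and function names (FrameHeader, parseFrameHeader) are invented for this example, and it deliberately skips the bitrate table and everything after the header.

```typescript
// Hypothetical names; a sketch that decodes only part of an MPEG audio frame header.
type MpegVersion = "MPEG 1" | "MPEG 2" | "MPEG 2.5";

interface FrameHeader {
  version: MpegVersion;
  layer: 1 | 2 | 3;
  crcProtected: boolean;
  sampleRateHz: number;
  channelMode: "stereo" | "joint stereo" | "dual channel" | "mono";
}

// Lookup tables indexed by the raw bit values. undefined marks reserved values.
const VERSIONS: (MpegVersion | undefined)[] = ["MPEG 2.5", undefined, "MPEG 2", "MPEG 1"];
const LAYERS: (1 | 2 | 3 | undefined)[] = [undefined, 3, 2, 1];
const SAMPLE_RATES: Record<MpegVersion, number[]> = {
  "MPEG 1": [44100, 48000, 32000],
  "MPEG 2": [22050, 24000, 16000],
  "MPEG 2.5": [11025, 12000, 8000],
};
const CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"] as const;

function parseFrameHeader(bytes: Uint8Array): FrameHeader {
  if (bytes.length < 4) throw new Error("Need at least 4 bytes");

  // Read the header as one big-endian 32-bit unsigned integer.
  const header = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(0);

  // Bits 31-21: frame sync, all ones.
  if (header >>> 21 !== 0x7ff) throw new Error("Missing frame sync");

  const version = VERSIONS[(header >>> 19) & 0b11]; // bits 20-19
  const layer = LAYERS[(header >>> 17) & 0b11];     // bits 18-17
  if (version === undefined || layer === undefined) {
    throw new Error("Reserved version or layer value");
  }

  // Bit 16: protection bit, where 0 means a CRC follows the header.
  const crcProtected = ((header >>> 16) & 0b1) === 0;

  // Bits 11-10: sampling rate index; index 3 is reserved.
  const sampleRateHz = SAMPLE_RATES[version][(header >>> 10) & 0b11];
  if (sampleRateHz === undefined) throw new Error("Reserved sampling rate");

  // Bits 7-6: channel mode.
  const channelMode = CHANNEL_MODES[(header >>> 6) & 0b11];

  return { version, layer, crcProtected, sampleRateHz, channelMode };
}

// A typical MPEG 1 Layer III header with no CRC.
console.log(parseFrameHeader(Uint8Array.of(0xff, 0xfb, 0x90, 0x00)));
```

Every one of those fields changes how the rest of the frame has to be read, which is exactly the kind of "it depends" that CD audio never makes you deal with.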
For Why?
Well, you see, Wix stores rich text in a format that is not HTML. It's actually a JSON object that describes the rich text formatting, and actually displaying it is the job of a converter. Wix actually has a parser available. I tried installing it, but it spit out some kind of error about needing React 16 instead of 18, and the documentation had a concerning lean towards constructing the content client side.
I'm unnecessarily also using NextJS so I want server side rendering dang it 😠
It doesn't seem like there's an easily accessible document on the Wix rich text formatting specification. It seems like the whole thing is pretty new, and their GitHub repo for this is returning a 404 at the time of writing. So instead of waiting for that I'm just going to do it live.
How?
Reverse engineering. You can learn a lot just by reading the output for content where you already know how it should be formatted, because you formatted it yourself. From there you just have to start chipping away at the biggest, most commonly used functionality, then work your way towards edge cases.
Implementation
I'll put details about the full implementation here when it's done
Step 1
Figure out how to call the APIs
Step 2
Get data from the APIs
Step 3
Switch statements to interpret the data from the APIs
Step 4
Convert that data to readable content
Step 5
...
Step 6
Profit 😎
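Until the real details land, here's a sketch of what steps 3 and 4 could look like. Big loud assumption: the node shape below (RichNode, with TEXT, PARAGRAPH, HEADING and LINK types and a decorations array) is invented for illustration and is not taken from Wix's actual rich text schema. The real field names will come out of the reverse engineering. The point is only the technique: one switch statement per node type, recursing into child nodes, and producing a plain HTML string that can be rendered on the server.

```typescript
// ASSUMPTION: this node shape is made up for the example and is NOT
// Wix's real rich text format.
type Decoration = "BOLD" | "ITALIC" | "UNDERLINE";

type RichNode =
  | { type: "TEXT"; text: string; decorations?: Decoration[] }
  | { type: "PARAGRAPH"; nodes: RichNode[] }
  | { type: "HEADING"; level: 1 | 2 | 3 | 4 | 5 | 6; nodes: RichNode[] }
  | { type: "LINK"; url: string; nodes: RichNode[] };

const DECORATION_TAGS: Record<Decoration, string> = {
  BOLD: "strong",
  ITALIC: "em",
  UNDERLINE: "u",
};

// Escape characters that would otherwise be read as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Step 3: a switch statement decides how each node type is interpreted.
// Step 4: each case returns readable HTML.
function renderNode(node: RichNode): string {
  switch (node.type) {
    case "TEXT":
      // Wrap the escaped text in one tag per decoration, innermost first.
      return (node.decorations ?? []).reduce((html, decoration) => {
        const tag = DECORATION_TAGS[decoration];
        return `<${tag}>${html}</${tag}>`;
      }, escapeHtml(node.text));
    case "PARAGRAPH":
      return `<p>${renderNodes(node.nodes)}</p>`;
    case "HEADING":
      return `<h${node.level}>${renderNodes(node.nodes)}</h${node.level}>`;
    case "LINK":
      return `<a href="${escapeHtml(node.url)}">${renderNodes(node.nodes)}</a>`;
    default: {
      // The compiler complains here if a node type is added but not handled.
      const unhandled: never = node;
      throw new Error(`Unhandled node: ${JSON.stringify(unhandled)}`);
    }
  }
}

function renderNodes(nodes: RichNode[]): string {
  return nodes.map(renderNode).join("");
}

const html = renderNodes([
  { type: "HEADING", level: 1, nodes: [{ type: "TEXT", text: "Hi" }] },
  {
    type: "PARAGRAPH",
    nodes: [{ type: "TEXT", text: "a < b", decorations: ["BOLD", "ITALIC"] }],
  },
]);
console.log(html);
```

Returning a string instead of React elements keeps the whole thing usable from server side code, and the never check in the default case is a cheap way to find out which node types are still missing as more of the format gets figured out.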
Unrelated Section of Every Wix Rich Text Editor Feature in Use
H1
H2
H3
H4
H5
H6
Paragraph
This text is bold, italicized, underlined, highlighted everything
A link to Google.com
Fourscore and seven years ago was 87 years ago
This text is right aligned
Indented content
First things first
second things next
things
in
no
particular
order
Manually edited HTML
const text = 'Hello World!'
console.log(text)
Team | Talent | Memes | Result |
Chiefs | 10/10 | Kermit Me Homes here | Annoyingly good |
Broncos | 3/10 | So many 😂 | The most interesting dumpster fire to follow as a nonfan |
Lions | 7/10 | | Perfect mixture of hard-nosed football and laughable stupidity |