Jet Set Willy & Manic Miner Community

Truncated Posts


IRF


8 hours ago, IRF said:

I think you hit the nail on the head when you referred to non-standard characters, Andy. 🙂

Because "Roddenwald" is actually spelt with an umlaut over the 'e', as in: "Roddënwald".  There are still a few posts that remain truncated after "Rodd", namely these ones:

https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/
(That's the first post in the topic, so the title of the topic is also affected.)
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11740
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11742
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11802

So it's very likely that the 'special e' (which comes just after the 'dd') was what caused the problem there.

There are also a couple of posts where Danny seems to refer to Fabian (the author of the game) as 'Fabi'.  Now, Danny isn't usually that informal, so I wonder whether the second 'a' in Fabian has an accent over it - i.e. Fabián? - and so those posts are also getting truncated because they involve another non-standard character?  One of the posts potentially affected is this one, which ends with: "One can see that a lot of work has already gone into this project, and the results are impressive. I hope that you will continue it, Fabi"
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11773
However, I think that's one of the ones that you've already checked/fixed.  Furthermore, Danny's final remark (in quotes above) seems to be made 'in conclusion', so I suspect the only thing still missing now from that post is the final two letters of Fabian's name.

On the other hand, this post of Danny's is very short and ends with "Fabi", so there might be some missing stuff from here:
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11822

In conclusion, I think we've got to the bottom of the problem - which is a great relief! - and it probably isn't as widespread as we initially feared - which is even more of a relief!!

 

Yes, it seems that once the converter found something "it did not like too much" it simply truncated that post, which it should not have done. It really should have offered an option to carry on or stop, or at least logged its actions. I can tell you it did not complain of -any- errors!

To be fair, the code is well written, so I'm not completely sure quite how it managed to fall through.

I agree, I do not think this is as widespread as initially thought. It's only likely to affect special characters, and then not all of them either.

8 hours ago, jetsetdanny said:

Yes, exactly, I did put an accent there as there should be one in the Spanish spelling of the name. Apparently, I shouldn't have bothered... 😯

No-one was to know. Not your fault! 🙂

8 hours ago, IRF said:

I think it's safe to use such special characters now though - it's only a historic problem from before the site was moved over, as evidenced by the fact that I managed to post the special characters in my post earlier tonight. 👍

I'd agree with this, because the 'damage' as such was caused by the conversion tool, which was run a long time ago.

It should not have been knocked unconscious and truncated data when it seemingly "did not know what to do". It's not something I can file as a bug now anyway, given the previous version would not be supported. I found further evidence of this on Wikipedia too:

Quote

It is possible in UTF-8 (or any other variable-length encoding) to split or truncate a string in the middle of a character. If the two pieces are not re-appended later before interpretation as characters, this can introduce an invalid sequence at both the end of the previous section and the start of the next, and some decoders will not preserve these bytes and result in data loss. Because UTF-8 is self-synchronizing this will however never introduce a different valid character, and it is also fairly easy to move the truncation point backward to the start of a character.
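To illustrate the last point of that quote, here is a minimal Python sketch of moving a truncation point backward to the start of a character. The function name `truncate_utf8` and the 20-byte limit are made up for the example; this is not what the conversion tool does, just a demonstration of the idea.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 data to at most max_bytes without splitting a character."""
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    # If the byte just after the cut is one, the cut falls inside a
    # multi-byte character, so step back to where that character starts.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


text = "Have you tried Roddënwald, IRF?"
raw = text.encode("utf-8")

naive = raw[:20]                # ends with only the first byte of "ë"
safe = truncate_utf8(raw, 20)   # backs up to just before "ë"

try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False      # a strict decoder rejects the naive cut

print(naive_is_valid)
print(safe.decode("utf-8"))
```

The naive slice leaves a dangling lead byte that a strict decoder refuses, whereas the adjusted cut always decodes cleanly.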

Quote
3 hours ago, jetsetdanny said:

That you were able to post them is one thing, but whether your post could be successfully brought back from a backup in its entirety is another, I think.

Before the meltdown we were able to post those characters, too - only we didn't know that they would mark the impending doom... 🙃

 

As best I can tell, I cannot foresee any issues going forward, as the table is fully UTF8MB4 (see here), so there should be sufficient "space" for most sane character encoding schemes. I do appreciate the concern, however.
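As a rough illustration of what "space" means here: UTF-8 uses between one and four bytes per character, and MySQL's utf8mb4 charset allows the full four, whereas its older `utf8` (utf8mb3) charset stops at three. The Python sketch below only measures byte lengths of a few sample characters; the helper names are hypothetical, and it says nothing about what the old converter actually did.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_four_bytes(text: str) -> bool:
    """True if any character in text needs the full four UTF-8 bytes."""
    return any(utf8_len(ch) == 4 for ch in text)


for ch in ["e", "ë", "á", "€", "🙂"]:
    print(repr(ch), utf8_len(ch))

print(needs_four_bytes("Roddënwald"))
print(needs_four_bytes("Job done! 🙂"))
```

Accented letters such as "ë" and "á" take two bytes each, while emoji take four, which is the case utf8mb4 exists to cover.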

To this end, and to eliminate worries, I've created a temporary topic in the 'testing forum'; please feel free to quote and copy/paste/type in non-standard characters, although I'd rather the topic does not span more than one page.

In a few days I'll attempt to restore this on my own local server and provide a screenshot of its outcome.


Further digging (thanks @IRF for pointing something out) revealed further confirmation of my thoughts:

 

Old unconverted data example:

Quote

Thanks, I'll check it.
 

Have you tried Roddënwald, IRF? What do you think?

New converted data example (what was here after the converter did its, erm, 'magic'):

Quote

Thanks, I'll check it.

 

Have you tried Rodd

 

It can plainly be seen that it "gave up" as soon as it hit the "ë" character.
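The comparison above can also be done mechanically. This is a minimal Python sketch, assuming you have both the old and the converted text of a post to hand; `truncation_point` is a hypothetical helper, not part of any forum software. It reports the first character that went missing when the new text is simply a shortened copy of the old.

```python
from typing import Optional


def truncation_point(old: str, new: str) -> Optional[str]:
    """If new is a proper prefix of old, return the first missing character."""
    if len(new) < len(old) and old.startswith(new):
        return old[len(new)]
    return None


old_post = "Have you tried Roddënwald, IRF? What do you think?"
new_post = "Have you tried Rodd"

lost_at = truncation_point(old_post, new_post)
if lost_at is not None:
    # A non-ASCII character right at the cut matches the pattern seen here.
    print(repr(lost_at), "ASCII" if lost_at.isascii() else "non-ASCII")
```

A post that was cut immediately before a non-ASCII character is a strong hint that it was hit by the same problem.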

On a better note again, new data here now 🙂

Quote

Thanks, I'll check it.
 

Have you tried Roddënwald, IRF? What do you think?

For reference, that post in question is this one


The following two topics were found to contain truncated posts caused by the charset. I decided that a sane way to look at this further without SQL was simply to view all posts by the member; this is done via their profile and is available to all members anyway. From there I examined each topic that was posted in for any signs of issues. I had to look further than the posts concerned in case any had been quoted, etc.

I've restored them. To be fair it was two large and two small posts:

The post here by myself in "JSW Dark Souls" was restored as the existing copy had no visible content

The large post here by jetsetdanny in "Madam Blavskja's Carnival Macabre 48K" was restored as it had been truncated down to a few words!

The two other posts are in the latter topic too, although the restoration of those was small as only a few words were missing, i.e. not really more than one sentence. Here and Here for reference.

 

I'd be surprised to find much, if any, more 'damage' in posts now though, as not many members tend to use those characters, and if it's done in the future it won't affect anything. 🙂

 


12 hours ago, jetsetdanny said:

Thanks, Andy, especially for restoring my post with the announcement of the release of "Madam Blavskja's Carnival Macabre 48K", which is very important from my personal perspective! 👍

Most welcome. 🙂 You may wish to edit it to tidy the text formatting a little bit. But the data is there, that's the main thing.

On 4/20/2021 at 8:39 AM, Spider said:

...  I've created a temporary topic in the 'testing forum'; please feel free to quote and copy/paste/type in non-standard characters, although I'd rather the topic does not span more than one page.

In a few days I'll attempt to restore this on my own local server and provide a screenshot of its outcome.

Regarding my quote, I've taken a backup and restored it. The results match the test topic. 🙂 Job done!
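For anyone wanting to try the same kind of check at home, here is a minimal Python sketch of a backup-and-restore round trip. It uses an in-memory SQLite database purely as a stand-in (the forum itself does not run on SQLite), and the table name and sample posts are made up for the example.

```python
import sqlite3

# Sample post bodies containing non-standard characters (illustrative only).
posts = [
    "Have you tried Roddënwald, IRF?",
    "I hope that you will continue it, Fabián",
    "Job done! 🙂",
]

# "Live" database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO posts (body) VALUES (?)", [(p,) for p in posts])
src.commit()

# Take a backup as a plain SQL script.
dump = "\n".join(src.iterdump())

# Restore the script into a fresh database.
dst = sqlite3.connect(":memory:")
dst.executescript(dump)

restored = [row[0] for row in dst.execute("SELECT body FROM posts ORDER BY id")]

# Every post should come back complete, character for character.
print(restored == posts)
```

If any post came back shorter than it went in, the comparison would fail, which is the same thing the screenshot check looks for by eye.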

 


The final elusive "missing posts" in the "Jet Set Jason: In Roddënwald" topic have finally been restored properly and into the correct places within the topic. Some small loss of formatting has naturally occurred; however, the content is there! 🙂

For reference the posts in question:

One post here by Korzy_iz_Adb from 9th March

One post here by IRF from 24th March

Two posts here and here by IRF from 26th March

 

 
