jetsetdanny Posted April 20, 2021

That you were able to post them is one thing, but whether your post could be successfully brought back from a backup in its entirety is another, I think. Before the meltdown we were able to post those characters, too - only we didn't know that they would mark the impending doom... 🙃
Spider Posted April 20, 2021

8 hours ago, IRF said:

I think you hit the nail on the head when you referred to non-standard characters, Andy. 🙂 Because "Roddenwald" is actually spelt with an umlaut over the 'e', as in: "Roddënwald".

There are still a few posts that remain truncated after "Rodd", namely these ones:

https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/ (That's the first post in the topic, so the title of the topic is also affected.)
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11740
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11742
https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11802

So it's very likely that the 'special e' (which comes just after the 'dd') was what caused the problem there.

There are also a couple of posts where Danny seems to refer to Fabian (the author of the game) as 'Fabi'. Now, Danny isn't usually that informal, so I wonder whether the second 'a' in Fabian has an accent over it - i.e. Fabián? - and so those posts are also getting truncated because they involve another non-standard character?

One of the posts potentially affected is this one, which ends with: "One can see that a lot of work has already gone into this project, and the results are impressive. I hope that you will continue it, Fabi"

https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11773

However, I think that's one of the ones that you've already checked/fixed. Furthermore, Danny's final remark (in quotes above) seems to be made 'in conclusion', so I suspect the only thing still missing now from that post is the final two letters of Fabian's name.
On the other hand, this post of Danny's is very short and ends with "Fabi", so there might be some missing stuff from here:

https://jswmm.co.uk/topic/532-jet-set-jason-in-rodd/?do=findComment&comment=11822

In conclusion, I think we've got to the bottom of the problem - which is a great relief! - and it probably isn't as widespread as we initially feared - which is even more of a relief!!

Yes, it seems that once the converter found something "it did not like too much" it simply truncated that post, which it should not have done. It should really have just offered an option to carry on or stop, or at least logged its actions. I can tell you it did not complain of -any- errors! To be fair, the code is well written, so quite how it managed to fall through I'm not completely sure. I agree, I do not think this is as widespread as initially thought. It's only likely to affect special characters, and then not all of them either.

8 hours ago, jetsetdanny said:

Yes, exactly, I did put an accent there as there should be one in the Spanish spelling of the name. Apparently, I shouldn't have bothered... 😯

No-one was to know. Not your fault! 🙂

8 hours ago, IRF said:

I think it's safe to use such special characters now though - it's only a historic problem from before the site was moved over, as evidenced by the fact that I managed to post the special characters in my post earlier tonight. 👍

I'd agree with this, because the 'damage' as such was caused by the conversion tool, which was run a long time ago. It should not have been knocked unconscious and truncated data when it seemingly "did not know what to do". It's not something I can file as a bug now anyway, given the previous version would not be supported. I found further evidence of this on Wikipedia too:

Quote

It is possible in UTF-8 (or any other variable-length encoding) to split or truncate a string in the middle of a character.
If the two pieces are not re-appended later before interpretation as characters, this can introduce an invalid sequence at both the end of the previous section and the start of the next, and some decoders will not preserve these bytes and result in data loss. Because UTF-8 is self-synchronizing this will however never introduce a different valid character, and it is also fairly easy to move the truncation point backward to the start of a character.

Quote

3 hours ago, jetsetdanny said:

That you were able to post them is one thing, but whether your post could be successfully brought back from a backup in its entirety is another, I think. Before the meltdown we were able to post those characters, too - only we didn't know that they would mark the impending doom... 🙃

As best as I can tell, I cannot foresee any issues moving forwards, as the table is fully UTF8MB4 (see here), so there should be sufficient "space" for most sane character encoding schemes. I do appreciate the concern, however.

To this end, to eliminate worries, I've created a temporary topic in the 'testing forum'; please feel free to quote and copy/paste/type in non-standard characters, although I'd rather the topic does not span more than one page. In a few days I'll attempt to restore this on my own local server and provide a screenshot of its outcome.
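The Wikipedia passage quoted above can be illustrated with a short Python sketch. It cuts the UTF-8 bytes of "Roddënwald" in the middle of the 'ë', shows that a lenient decoder silently drops the stray byte, and then moves the cut back to the start of the character. The helper name `safe_truncate` is made up for this illustration; nothing here is taken from the forum software or its converter.

```python
def safe_truncate(data: bytes, limit: int) -> bytes:
    """Cut UTF-8 bytes to at most `limit` bytes without splitting a character."""
    if limit >= len(data):
        return data
    cut = max(limit, 0)
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
    # byte we are about to drop is one of those, the cut is mid-character,
    # so step back until the cut sits on the start of a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


raw = "Roddënwald".encode("utf-8")  # 'ë' is stored as two bytes: 0xC3 0xAB

naive = raw[:5]  # keeps 0xC3 but drops 0xAB, splitting the 'ë'

try:
    naive.decode("utf-8")  # a strict decoder refuses the dangling byte
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

lossy = naive.decode("utf-8", errors="ignore")  # lenient decoder drops it
safe = safe_truncate(raw, 5).decode("utf-8")    # cut moved back by one byte

print(strict_ok, repr(lossy), repr(safe))
```

Because continuation bytes are recognisable on their own, the loop never has to look at more than a few bytes, which is the "fairly easy" part the quoted passage refers to.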
Spider Posted April 20, 2021

Further digging (thanks @IRF for pointing something out) revealed further confirmation of my thoughts:

Old unconverted data example:

Quote

Thanks, I'll check it. Have you tried Roddënwald, IRF? What do you think?

New converted data example (what was here after the converter did its erm 'magic'):

Quote

Thanks, I'll check it. Have you tried Rodd

It can plainly be seen that it "gave up" as soon as it hit the "ë" character.

On a better note again, new data here now 🙂

Quote

Thanks, I'll check it. Have you tried Roddënwald, IRF? What do you think?

For reference, that post in question is this one
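One plausible way to end up with exactly this symptom is a charset mismatch: text stored in a single-byte encoding gets read as UTF-8, and the reader keeps only what came before the first byte it cannot interpret. The thread does not confirm that the converter worked this way, so the Python sketch below is only an assumption-laden illustration; `decode_valid_prefix` is a hypothetical helper and Latin-1 is a guess at the old storage encoding.

```python
def decode_valid_prefix(data: bytes) -> str:
    """Decode as UTF-8, silently keeping only the text before the first bad byte."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return data[:err.start].decode("utf-8")


original = "Thanks, I'll check it. Have you tried Roddënwald, IRF? What do you think?"

# Assumption: the old data held 'ë' as the single Latin-1 byte 0xEB.
stored = original.encode("latin-1")

# 0xEB followed by plain ASCII is not valid UTF-8, so decoding stops there.
converted = decode_valid_prefix(stored)
print(converted)

# The same mechanism would also shorten "Fabián" (á is 0xE1 in Latin-1).
print(decode_valid_prefix("Fabián".encode("latin-1")))
```

A more careful converter would raise or log at the `except` branch instead of quietly returning the prefix.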
Spider Posted April 20, 2021

For all those that have participated in this and the 'test' topic too, either pointing things out or pushing me in the right direction:
Spider Posted April 21, 2021

The following two topics were found to contain truncated posts caused by the charset. I decided that a sane way to look at this further without SQL was simply to view all posts of the member; this is done via their profile and is available to all members anyway. From there I examined each topic that was posted in for any signs of issues. I had to look further than the posts concerned in case any had been quoted etc.

I've restored them. To be fair it was two large and two small posts:

The post here by myself in "JSW Dark Souls" was restored as the existing copy had no visible content.
The large post here by jetsetdanny in "Madam Blavskja's Carnival Macabre 48K" was restored as it had been truncated down to a few words!
The two other posts are in the latter topic too, although the restoration of those was small as only a few words were missing, i.e. not really more than one sentence. Here and Here for reference.

I'd be surprised to find much if any more 'damage' in posts now though, really, as not many members tend to use those characters, and if it's done in the future it won't affect anything. 🙂
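Where both an old backup and the converted text are to hand, the manual check described above could in principle be narrowed down with a few lines of Python. This is only a sketch under that assumption: the post ids and sample texts are invented, and `looks_truncated` is a hypothetical helper rather than anything the forum software provides. It flags a post when the converted text is a strict prefix of the old text and the very next character in the old text is non-ASCII.

```python
def looks_truncated(old: str, new: str) -> bool:
    """True if `new` is `old` cut off immediately before a non-ASCII character."""
    if len(new) >= len(old) or not old.startswith(new):
        return False
    # The first character that went missing is the likely culprit.
    return ord(old[len(new)]) > 127


# Invented sample data: post id -> (text from old backup, text after conversion)
posts = {
    "post-a": ("Have you tried Roddënwald, IRF?", "Have you tried Rodd"),
    "post-b": ("Well done, Fabián", "Well done, Fabi"),
    "post-c": ("Job done!", "Job done!"),
    "post-d": ("A longer post that was edited down.", "A longer post"),
}

flagged = [pid for pid, (old, new) in posts.items() if looks_truncated(old, new)]
print(flagged)
```

Posts that were shortened for other reasons (like the invented "post-d") are deliberately not flagged, so anything the check reports would still need a human look before restoring.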
jetsetdanny Posted April 21, 2021

Thanks, Andy, especially for restoring my post with the announcement of the release of "Madam Blavskja's Carnival Macabre 48K", which is very important from my personal perspective! 👍
Spider Posted April 22, 2021

12 hours ago, jetsetdanny said:

Thanks, Andy, especially for restoring my post with the announcement of the release of "Madam Blavskja's Carnival Macabre 48K", which is very important from my personal perspective! 👍

Most welcome. 🙂 You may wish to edit it to tidy the text formatting a little bit. But the data is there, that's the main thing.

On 4/20/2021 at 8:39 AM, Spider said:

... I've created a temporary topic in the 'testing forum'; please feel free to quote and copy/paste/type in non-standard characters, although I'd rather the topic does not span more than one page. In a few days I'll attempt to restore this on my own local server and provide a screenshot of its outcome.

Regarding my quote, I've taken a backup and restored it. The results match the test topic. 🙂 Job done!
Spider Posted April 24, 2021

The final elusive "missing posts" in the "Jet Set Jason: In Roddënwald" topic were finally restored properly and into the correct places within the topic. Some small loss of formatting has naturally occurred; however, the content is there! 🙂

For reference, the posts in question:

One post here by Korzy_iz_Adb from 9th March
One post here by IRF from 24th March
Two posts here and here by IRF from 26th March
IRF (Author) Posted April 24, 2021

Thanks for restoring all the missing posts, Andy! You've done a really good job of it! 🙂
jetsetdanny Posted April 24, 2021

Yes, thanks, Andy! 👍