
Reversed BASE64 Decoding Unmasks Stealthy PE Payload in JPEG

isc.sans.edu · @threat_watch · 3 hours ago · Cybersecurity · 3 comments

A statistical analysis of the BASE64 characters in a malicious JPEG revealed a custom encoding that swapped "A" for "#" and then reversed the entire string, hiding a Windows executable inside a blob of nearly a million characters.

sans · didier stevens · base64dumppy · byte statspy · cybersecurity · malware analysis

45.65% of the bytes in that JPEG were BASE64 characters, but the longest decodable string was barely 1000 characters, while a raw scan found a string nearly a million bytes long. That mismatch is the kind of thing that makes a reverse engineer reach for a better tool.

Xavier's earlier diary on the "Evil MSI Background" flagged a JPEG with a hidden payload. I pulled the same sample and ran my byte-stats.py to see what was really going on. The results threw up a clear anomaly: almost half the file was BASE64 characters, yet base64dump.py could only decode a trivial 1000-character string. Something was off.
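To make the mismatch concrete, here is a minimal Python sketch of the two measurements involved: the share of bytes that fall in the BASE64 alphabet, and the longest unbroken run of such bytes. This is not the code of byte-stats.py or base64dump.py. The names B64_CHARS, base64_char_ratio and longest_run are hypothetical, and counting "=" as a BASE64 character is an assumption.

```python
# Illustrative sketch only; not taken from byte-stats.py.
# Assumption: the BASE64 alphabet here includes the padding character "=".
B64_CHARS = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
)


def base64_char_ratio(data: bytes) -> float:
    """Fraction of bytes in data that belong to the BASE64 alphabet."""
    if not data:
        return 0.0
    hits = sum(1 for b in data if b in B64_CHARS)
    return hits / len(data)


def longest_run(data: bytes, alphabet: frozenset) -> int:
    """Length of the longest consecutive run of bytes drawn from alphabet."""
    best = current = 0
    for b in data:
        if b in alphabet:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best
```

Calling longest_run once with B64_CHARS and once with B64_CHARS | {ord("#")} shows how a single foreign character can chop one long string into many short ones.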

Base64 Statistics Catch a Custom Encoding

I added a --stats option to base64dump.py to inspect the distribution of BASE64 characters in the detected strings. The pattern was telling: all 64 standard BASE64 characters appeared, but the letter A showed up significantly less than the others. When I bumped the minimum string length, the letter A vanished entirely. The padding character = was also missing, which is unusual but not definitive. The most frequent character? #.
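The same idea can be sketched in a few lines of Python. This is not how --stats is implemented in base64dump.py; it is a hypothetical alphabet_report helper that counts characters in a candidate string and reports which standard BASE64 characters never occur and which non-BASE64 characters do.

```python
# Illustrative sketch only; not the base64dump.py --stats implementation.
from collections import Counter
import string

# The 64 standard BASE64 characters (padding "=" handled separately).
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def alphabet_report(text: str):
    """Return (missing, foreign) for a candidate BASE64 string.

    missing: standard BASE64 characters that never occur in text.
    foreign: counts of characters outside the standard alphabet and "=".
    """
    counts = Counter(text)
    missing = [c for c in STANDARD if counts[c] == 0]
    foreign = {c: n for c, n in counts.items() if c not in STANDARD and c != "="}
    return missing, foreign
```

A result such as (["A"], {"#": n}) with a large n is the signature described above: one standard character absent, one foreign character abundant.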

Standard BASE64 doesn't use #. That single observation pointed to a custom encoding where A had been replaced with #. I fed that substitution into the decoder, but it still failed to produce a clean output. The encoded string length also wasn't a multiple of 4, which explained why base64dump.py had ignored it in the first place.
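A strict validity check shows why neither the raw string nor the substituted one passes as ordinary BASE64. The sketch below is an assumption about what a strict check looks like, not the detection logic of base64dump.py, and is_canonical_base64 is a hypothetical name. It requires a length that is a multiple of 4, only standard characters, and at most two "=" at the very end.

```python
# Illustrative sketch only; a strict check, not base64dump.py's own logic.
import re

# Standard alphabet followed by zero, one or two padding characters.
_CANONICAL = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def is_canonical_base64(text: str) -> bool:
    """True if text has canonical BASE64 shape: length % 4 == 0,
    standard characters only, padding only at the end."""
    return len(text) % 4 == 0 and _CANONICAL.fullmatch(text) is not None
```

With this check, a string with "#" in place of "A" fails on the alphabet, and a reversed string still fails after the substitution because its padding sits at the front.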

The Real Trick: Reversed BASE64 and a Million-Character Payload

In the raw string of nearly a million characters, the pattern jumps out if you know your BASE64 markers. The string started with ==, but padding belongs at the end, not the start. The tail ended with ...qVT, which is TVq reversed. TVq is the BASE64 representation of the MZ executable header. The whole thing was a reversed BASE64 string.
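Both markers are easy to test for in code. The sketch below derives TVq by encoding the bytes MZ followed by 0x90, the usual third byte of a PE file, and then checks a string for leading padding or a trailing reversed marker. The names MZ_MARKER and looks_reversed are hypothetical, and the heuristic is an illustration rather than anything the tools are known to do.

```python
# Illustrative sketch only; a heuristic, not part of any published tool.
import base64

# "MZ" plus the usual third byte 0x90 encodes to "TVqQ"; keep the first three.
MZ_MARKER = base64.b64encode(b"MZ\x90")[:3].decode()


def looks_reversed(text: str) -> bool:
    """Heuristic: padding at the start, or a reversed MZ marker at the end,
    suggests the BASE64 string was written back to front."""
    return text.startswith("=") or text.endswith(MZ_MARKER[::-1])
```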

I reversed the encoded payload with translate.py, and with the # substitution undone as well, it decoded cleanly into a PE file. That PE file matched the hash Xavier extracted in his original analysis.
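Put together, the whole decoding fits in a few lines of Python. This is a sketch of the technique, not the translate.py or base64dump.py commands actually used. The function names are hypothetical, and SHA-256 is an assumption, since the hash algorithm is not stated.

```python
# Illustrative sketch only; not the commands used in the analysis.
import base64
import hashlib


def decode_reversed_custom(text: str, swapped: str = "#", original: str = "A") -> bytes:
    """Undo the character swap, reverse the string, then BASE64-decode it."""
    restored = text.replace(swapped, original)[::-1]
    # validate=True rejects any leftover non-BASE64 characters.
    return base64.b64decode(restored, validate=True)


def is_pe(data: bytes) -> bool:
    """A PE file starts with the two bytes MZ."""
    return data[:2] == b"MZ"


def sha256_hex(data: bytes) -> str:
    """Hex digest for comparison against a known sample hash (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()
```

Because the swap replaces one character with another, the order of the two undo steps does not matter: substituting before or after reversing gives the same result.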

This new --stats feature in base64dump.py makes custom encoding detection systematic, not guesswork. By looking at character frequency distributions, you can spot character swaps, reversals, or any other twist a threat actor throws in. Throwing statistics at obfuscation is the kind of approach that forces attackers to work harder for their hiding spots.

Didier Stevens (SANS ISC) released this update to base64dump.py with the --stats flag for exactly this kind of scenario. If you're analyzing suspicious image files, add this to your workflow now.


Source: Evil MSI Background: BASE64 Statistical Analysis, (Mon, Jun 15th)
Domain: isc.sans.edu
