Convert Octet-stream To Pdf
Tools that I'm using for this:
- Chrome
- Notepad++
- Sublime Text 3
- Fiddler
- WinMerge
- Adobe Acrobat Reader X
Synopsis
I have downloaded a PDF twice: once through Chrome, as an experimental control, and once through a raw GET request via Fiddler, which returns an octet-stream. I can save the octet-stream as a PDF and get the proper page count and some of the page headers and numbers, but very little of the body content loads. When I open my file in Adobe Reader X, I get this error:
Cannot extract the embedded font 'LFIDTH+ArialMT'. Some characters may not display or print correctly.
I cannot work out why the font can be extracted from the 'true' PDF but not from the one I am saving.
Details
For my manual pull of the file, I sent this request header:
Accept: application/pdf, application/x-pdf, application/x-gzpdf, application/x-bzpdf
The server sent back application/octet-stream with a Content-Disposition of attachment.
So to recap:
- Valid Foo.pdf sitting on my hard drive
- HTTP Response with an octet-stream version of same file, in UTF-8 encoding (I assume)
Here is what I know:
I pulled the message body of the response from the server and dropped it to a file. I then ran a WinMerge comparison of it against the contents of the PDF, and every line mismatched on line endings. I re-encoded the EOLs for Unix and the diff shrank to ~1k lines out of 160k. A close inspection of the mismatches indicates that the valid PDF maintains what looks like a NUL (00) character in places where my octet-stream contains literal spaces. Also, WinMerge reports the 'true' PDF as EOL: LF 1252 Mixed, while my 'raw' PDF is reported as 1252 Unix. When I homogenize the 'true' PDF to 1252 Unix, I get the same issue as I described for the 'raw' one.
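One possible explanation for the symptoms above, offered as an assumption rather than a confirmed diagnosis: a PDF is a binary format, so any tool that rewrites line endings or re-encodes characters changes bytes inside embedded streams such as fonts. The Python sketch below uses a made-up byte string (not data from the actual file) and two hypothetical helpers, first_difference and normalize_eols, to show how an EOL conversion alters binary content and how a byte-level comparison locates the first change.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other; they diverge where it ends.
        return min(len(a), len(b))
    return None


def normalize_eols(data: bytes) -> bytes:
    """Mimic a text tool converting CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical stand-in for binary stream data inside a PDF.
original = b"%PDF-1.4\r\nstream\r\n\x00\x01\r\n\x02endstream\r\n"
converted = normalize_eols(original)

print(first_difference(original, converted))  # offset of the first altered byte
print(len(original) - len(converted))         # number of bytes lost
```

Comparing the saved file and the known-good file this way, with both read in binary mode, avoids the line-ending noise that a text diff produces.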
Is there anything I can do to get this mess of an octet-stream straightened out?
Note that the PDF that was downloaded through Chrome is historic. I have it on my machine, but I downloaded it 'sometime in the past', and the request headers used for that GET are no longer available. Attempting to download it through the browser 'now' results in an error, but an explicit GET request against the resource through Fiddler returns the PDF as an octet-stream.
1 Answer
Well now....
In the Fiddler session list:
Right-click the HTTP response with the application/octet-stream body > Save > Response > Response Body
If Content-Disposition: attachment;filename has been set on the response, the File Save dialog will be prepopulated with the filename.
Easy after you know it's there.
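For anyone scripting this instead of using Fiddler's menu, the same idea can be sketched in Python: write the response body to disk as raw bytes and take the file name from the Content-Disposition header. The helper names filename_from_disposition and save_body, and the fallback name download.bin, are made up for illustration; fetching the response itself is left out.

```python
from email.message import Message
from pathlib import Path


def filename_from_disposition(header_value: str, default: str = "download.bin") -> str:
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or default


def save_body(body: bytes, directory: Path, disposition: str) -> Path:
    """Write the body unchanged; binary mode means no EOL or charset translation."""
    # Basic precaution: keep only the final path component of the suggested name.
    name = Path(filename_from_disposition(disposition)).name
    target = directory / name
    target.write_bytes(body)
    return target
```

The key point is write_bytes (or open(..., "wb")): the bytes on disk match the bytes on the wire, which is what the Save Response Body command also does.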
I am using Mozilla Firefox with a PDF viewer plug-in. In the settings, the plug-in has been correctly associated with Adobe Reader files so that they can be viewed in the browser.
I would like to be able to view PDF files in Firefox rather than downloading them. This already works correctly when a web server indicates that a file has the Content-Type of application/pdf. However, some web servers provide other Content-Types for PDFs, such as application/octet-stream. (See this example of a PDF served with a non-PDF Content-Type.)
I have looked at Firefox's mimeTypes.rdf file, and it appears to support mapping applications based on file extensions only for non-Internet-based files. (It looks like it uses only the Content-Type to map Internet-based files.)
How can I have Firefox view all PDF documents in-browser rather than only the ones with the application/pdf Content-Type?
2 Answers
Firefox has no content inspection code (like the Linux file command) to detect the actual content type; it relies on the Content-Type header.
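To illustrate what content inspection would mean here, the following Python sketch checks for the %PDF- signature that PDF files begin with and overrides a generic declared type when it is found. This is only a simplified illustration of the idea, not how the file command or any browser is implemented; looks_like_pdf and effective_content_type are hypothetical names.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data begins with the PDF signature '%PDF-'."""
    return data.startswith(b"%PDF-")


def effective_content_type(declared: str, data: bytes) -> str:
    """Override a generic declared type only when the bytes look like a PDF."""
    # Ignore parameters such as '; charset=...' when comparing the media type.
    media_type = declared.split(";")[0].strip().lower()
    if media_type == "application/octet-stream" and looks_like_pdf(data):
        return "application/pdf"
    return declared


print(effective_content_type("application/octet-stream", b"%PDF-1.7\n"))
```

Only the generic application/octet-stream type is overridden in this sketch, so a server that declares a specific type is still taken at its word.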
Workaround: mozplugger extension
See man 7 mozplugger.
Workaround: human interaction
Save the file and open it in the file explorer ;-)
Workaround: misconfiguration
An additional workaround is to hack mimeTypes.rdf and assign application/octet-stream to the same value as application/pdf.
I don't advise this workaround.
You can use the Force Content-Type extension for PDF files served with the wrong Content-Type response header.
For example, if a web server provides Content-Type: application/octet-stream, you can add a rule to transform it to Content-Type: application/pdf based on the PDF file URL.
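The extension's own rule syntax is not reproduced here. Conceptually, though, such a rule amounts to something like the Python sketch below, which is an illustration only: rewrite_content_type is a hypothetical function, the URL is an example, and header lookup is simplified to an exact "Content-Type" key.

```python
from urllib.parse import urlparse


def rewrite_content_type(url: str, headers: dict) -> dict:
    """Return a copy of headers with octet-stream replaced for .pdf URLs."""
    result = dict(headers)
    path = urlparse(url).path.lower()
    # Compare the bare media type, ignoring any parameters after ';'.
    declared = result.get("Content-Type", "").split(";")[0].strip().lower()
    if path.endswith(".pdf") and declared == "application/octet-stream":
        result["Content-Type"] = "application/pdf"
    return result


# Example URL is made up for illustration.
print(rewrite_content_type(
    "http://example.com/docs/report.pdf?download=1",
    {"Content-Type": "application/octet-stream"},
))
```

Matching on the URL path rather than the whole URL keeps query strings from hiding the .pdf extension.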