Thursday, February 28, 2008

Convert your PDF to Images


PDF is one of the most widely used formats for distributing electronic documents. My article on ASP Alliance, "Creating PDFs with C# using Ghostscript", gives a brief idea of how to convert different document formats into PDF.


Ghostscript can render PDF and PostScript (PS) files to a variety of image file formats. Each format is handled by an "output device" in Ghostscript terminology. The Ghostscript command-line executable gswin32c.exe does the conversion. The command is as follows:

gswin32c -dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg -o out_%d.jpg inputFile.pdf
  • -sDEVICE selects the output device (driver); here, jpeg.
  • -dBATCH and -dNOPAUSE suppress the interactive prompts, and -dSAFER restricts the file operations that the input file is allowed to perform.
  • -o specifies the output file name. The %d in the name is replaced with the page number, so a 10-page file produces 10 image files, out_1.jpg through out_10.jpg. In a batch file, write %%d instead.
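The options above can be assembled in code rather than typed by hand. The following is a minimal sketch, not part of the original command: the class and method names are hypothetical, and it adds Ghostscript's -r switch (output resolution in dpi), which the command above does not use.

```csharp
using System;
using System.Globalization;

public static class GhostscriptArgs
{
    // Builds the argument string to pass to gswin32c.exe.
    // The class name, method name and parameters are hypothetical, chosen for this sketch.
    public static string Build(string device, string outputPattern, string inputFile, int dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException("dpi");

        // Paths are quoted so that names containing spaces survive the command line.
        return string.Format(
            CultureInfo.InvariantCulture,
            "-dSAFER -dBATCH -dNOPAUSE -sDEVICE={0} -r{1} -o \"{2}\" \"{3}\"",
            device, dpi, outputPattern, inputFile);
    }
}
```

For example, GhostscriptArgs.Build("png16m", "page_%d.png", "inputFile.pdf", 150) would request 24-bit PNG pages at 150 dpi.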
The supported devices are as follows:

PNG
png16m : 24-bit RGB color
pnggray : grayscale
png256 : 8-bit color
png16 : 4-bit color
pngmono : black-and-white

JPG
jpeg : Produces color JPEG files.
jpeggray : Produces grayscale JPEG files.

TIFF
TIFF drivers that produce uncompressed grayscale or color output:
tiffgray : Produces 8-bit gray output.
tiff12nc : Produces 12-bit RGB output (4 bits per component).
tiff24nc : Produces 24-bit RGB output (8 bits per component).
tiff32nc : Produces 32-bit CMYK output (8 bits per component).
tiffsep : Produces multiple output files: one 32-bit composite CMYK file plus one grayscale file for each separation.

TIFF drivers that produce black-and-white output with different compression modes:
tiffcrle : G3 fax encoding with no EOLs
tiffg3 : G3 fax encoding with EOLs
tiffg32d : 2-D G3 fax encoding
tiffg4 : G4 fax encoding
tifflzw : LZW-compatible (tag = 5) compression
tiffpack : PackBits (tag = 32773) compression

BMP
The BMP drivers produce uncompressed images. The following devices are available:
bmpmono, bmpgray, bmpsep1, bmpsep8, bmp16, bmp256, bmp16m, bmp32b
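When the output format is chosen at run time, the device name for -sDEVICE can be looked up from the output file's extension. The sketch below is only an illustration: the class name and the choice of one full-color device per format are assumptions, and any other device from the lists above could be substituted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DeviceLookup
{
    // Hypothetical mapping: one full-color device per file extension.
    private static readonly Dictionary<string, string> Devices =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".png", "png16m" },
            { ".jpg", "jpeg" },
            { ".jpeg", "jpeg" },
            { ".tif", "tiff24nc" },
            { ".tiff", "tiff24nc" },
            { ".bmp", "bmp16m" }
        };

    // Returns the device name to use with -sDEVICE for the given output file name.
    public static string ForFile(string outputFile)
    {
        string extension = Path.GetExtension(outputFile);
        string device;
        if (extension != null && Devices.TryGetValue(extension, out device))
            return device;

        throw new NotSupportedException("No Ghostscript device mapped for '" + extension + "'.");
    }
}
```

The lookup ignores case, so "out_%d.PNG" and "out_%d.png" both resolve to png16m.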

We can also run this command from a C# application by launching the Ghostscript command-line tool as a separate process. For this to work, gswin32c.exe must be present in the application startup path. The code snippet below shows one way to do this.

CODE SNIPPET:
// Requires: using System; using System.Diagnostics; using System.IO;

// Converts every page of inputFile to a JPEG in outputPath and
// returns the text that Ghostscript wrote to standard output.
private string createImage(string inputFile, string outputPath)
{
    // Ghostscript replaces %d with the page number.
    string outputFiles = Path.Combine(outputPath, "out_%d.jpg");
    string arguments = "-dSAFER -dBATCH -dNOPAUSE -sDEVICE=jpeg -o \"" + outputFiles + "\" \"" + inputFile + "\"";

    // gswin32c.exe is expected in the application startup path.
    string baseDir = AppDomain.CurrentDomain.BaseDirectory;
    ProcessStartInfo info = new ProcessStartInfo(Path.Combine(baseDir, "gswin32c.exe"), arguments);
    info.WorkingDirectory = baseDir;
    info.CreateNoWindow = true;
    info.UseShellExecute = false;
    info.RedirectStandardOutput = true;

    using (Process pdfProcess = Process.Start(info))
    {
        // Read the output before waiting, so a full pipe cannot block Ghostscript.
        string ret = pdfProcess.StandardOutput.ReadToEnd();
        pdfProcess.WaitForExit();

        // A non-zero exit code means the conversion failed.
        if (pdfProcess.ExitCode != 0)
            throw new InvalidOperationException("Ghostscript failed (exit code " + pdfProcess.ExitCode + "): " + ret);

        return ret;
    }
}
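
Once the conversion has finished, the generated files usually need to be processed in page order. An alphabetical listing puts out_10.jpg before out_2.jpg, so the page number has to be parsed out of the file name. The helper below is a hypothetical sketch that assumes the out_%d.jpg naming used above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PageFiles
{
    // Extracts the page number from a name such as "out_12.jpg".
    // Returns -1 when the name does not end in "_<number>".
    public static int PageNumber(string fileName)
    {
        string name = Path.GetFileNameWithoutExtension(fileName);
        int underscore = name.LastIndexOf('_');
        int page;
        if (underscore >= 0 && int.TryParse(name.Substring(underscore + 1), out page))
            return page;
        return -1;
    }

    // Returns the files ordered by page number rather than alphabetically.
    public static List<string> SortByPage(IEnumerable<string> files)
    {
        List<string> sorted = new List<string>(files);
        sorted.Sort((a, b) => PageNumber(a).CompareTo(PageNumber(b)));
        return sorted;
    }
}
```

A typical call would pass in the result of Directory.GetFiles(outputPath, "out_*.jpg").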

REFERENCES
http://ghostscript.com/doc/8.54/Use.htm
http://pages.cs.wisc.edu/~ghost/doc/cvs/Devices.htm

DOWNLOAD
You can also download a small Windows application from here. It is a C# Windows application that takes a PDF file and converts it into a set of JPG files, one per page.

5 comments:

Vijay said...

Great work........
can it be used with ASP.net

Bhuban said...

Yes, this can also be used with ASP.NET. You will have to upload the PDF file to the server. The file can then be converted to images either by a web service or directly by the application.

The architecture depends on you and the requirement.

Elie said...

You write very well.

Anonymous said...

hi
I tried your window application for converting pdf to image file.
But this application cannot convert pdf file to image file.
What can i do? Any other installer need?

Anonymous said...

HI Bhuban,

I am not able to convert using the code you have mentioned. It doesn't give any errors either. Can you please help me with this? My email ID is kunu_sid@yahoo.com