How can I convert HTML content to PDF with images using iTextSharp in C#

Updated: Sep 15, 2020

In this blog I will explain the following:

  • Converting HTML to PDF using iTextSharp.

  • Adding CSS files while generating the PDF.

  • Converting HTML images or canvas elements to the PDF.

  • Converting the PDF base64 string to a blob using JavaScript.

About iTextSharp

  • iTextSharp is a NuGet package used to generate PDFs.

  • It supports various ways of generating a PDF, such as "convert html to pdf".

  • It also supports custom tag processing, which helps to "convert html with images".

In this blog I will use the output of my previous blog as the reference; see that post for more information.


Html Output


Conversion of html to pdf using itextsharp


Nuget Packages:

  • iTextSharp(v5.5.13.1)

  • itextsharp.xmlworker(v5.5.13.1)

Create a controller and add an "Index" action method with an "Index" view.

Index.cshtml file

@{
    ViewBag.Title = "Index";
    Layout = "~/Views/Shared/_Layout.cshtml";
}

<h2>Index</h2>

<h4 class="text-primary">Chart rendered from asp.net mvc</h4>
<div style="width: 900px; height: 800px">
    <canvas id="scatterChart" name="Img1"></canvas>
    <button id="downloadPdf">Generate pdf</button>
</div>

Index action method

 // GET: Chart
        public ActionResult Index()
        {
            return View();
        }

Note

iTextSharp has some limitations, so I created a second view, "pdf.cshtml", for the PDF. I explain the limitations at the end.


pdf.cshtml file

@{
    ViewBag.Title = "pdf";
}

<h2>Index</h2>

<h4 class="text-primary">Chart rendered from asp.net mvc</h4>
<img id="Img1" src="" />

Explanation

  • "text-primary" is a Bootstrap class, used here to show how to generate a PDF with CSS applied.

  • I replaced the canvas tag with an img tag so that the chart can be rendered in the PDF.

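In your controller, add the POST method below to return the PDF as a base64 string.

 // Namespaces required at the top of the controller file:
 // using System.Collections.Generic; using System.IO; using System.Text; using System.Web.Mvc;
 // using iTextSharp.text; using iTextSharp.text.pdf;
 // using iTextSharp.tool.xml; using iTextSharp.tool.xml.html; using iTextSharp.tool.xml.parser;
 // using iTextSharp.tool.xml.pipeline.css; using iTextSharp.tool.xml.pipeline.end; using iTextSharp.tool.xml.pipeline.html;
 [HttpPost]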
        public JsonResult GeneratePdf(DownloadPdf downloadPdf)
        {
            PdfImages = downloadPdf.PdfImages;
            string htmlString = HtmlToStringConverter.RenderViewToString(this, "pdf", null);
            var tagProcessors = (DefaultTagProcessorFactory)Tags.GetHtmlTagProcessorFactory();
            tagProcessors.RemoveProcessor(HTML.Tag.IMG); // remove the default processor
            tagProcessors.AddProcessor(HTML.Tag.IMG, new CustomImageTagProcessor()); // use our new processor

            var output = new MemoryStream();
           // css files code resolves the css while generating pdf
            List<string> cssFiles = new List<string>();
            cssFiles.Add(@"/Content/bootstrap.css");
            cssFiles.Add(@"/Content/Site.css");

            // Pass the HTML through as-is: string.Format would throw a FormatException on any '{' or '}' in the markup.
            var input = new MemoryStream(Encoding.UTF8.GetBytes(htmlString));
            var document = new Document();
            var writer = PdfWriter.GetInstance(document, output);
            writer.CloseStream = false;
            document.Open();
            var htmlContext = new HtmlPipelineContext(null);

            // htmlContext.SetTagFactory(iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory());

            htmlContext.SetTagFactory(tagProcessors);
            //map the css files to apply the css styles in the pdf.
            ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
            cssFiles.ForEach(i => cssResolver.AddCssFile(System.Web.HttpContext.Current.Server.MapPath(i), true));

            var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
            var worker = new XMLWorker(pipeline, true);
            var p = new XMLParser(worker);
            p.Parse(input);
            document.Close();

            // Encode the PDF bytes as base64 so they can be returned in the JSON response.
            string base64String = Convert.ToBase64String(output.ToArray());
            return new JsonResult { Data = new { success = true, pdfString = base64String } };
        }


In the above code, the following lines handle "Add css files while generating the pdf."

// css files code resolves the css while generating pdf
            List<string> cssFiles = new List<string>();
            cssFiles.Add(@"/Content/bootstrap.css");
            cssFiles.Add(@"/Content/Site.css");

            //map the css files to apply the css styles in the pdf.
            ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
            cssFiles.ForEach(i => cssResolver.AddCssFile(System.Web.HttpContext.Current.Server.MapPath(i), true));

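Add the static "PdfImages" property to your controller. Because it is static, it is shared across requests, so concurrent PDF requests can overwrite each other's images.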

 public static List<Models.PdfImage> PdfImages { get; set; }

Add a model class "DownloadPdf" which accepts the image data URLs together with their keys (ids).

DownloadPdf.cs file

public class DownloadPdf
    {
        public List<PdfImage> PdfImages { get; set; }
    }

    public class PdfImage
    {
        public string Key { get; set; }

        public string Base64ImgUrl { get; set; }
    }
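
For reference, the JSON body that binds to this model could look like the output of the sketch below. It assumes the default ASP.NET MVC model binder matches the property names PdfImages, Key and Base64ImgUrl; the helper name buildDownloadPdfPayload and the sample data URL are hypothetical and only serve as an illustration.

```javascript
// Hypothetical helper: builds the JSON body expected by the DownloadPdf model.
function buildDownloadPdfPayload(images) {
    // images: array of { key, dataUrl } pairs collected on the client
    var pdfImages = images.map(function (img) {
        return { Key: img.key, Base64ImgUrl: img.dataUrl };
    });
    return JSON.stringify({ PdfImages: pdfImages });
}

// Example: one image whose key matches the id of the <img> tag in pdf.cshtml.
// The data URL here is a shortened placeholder, not a real image.
var body = buildDownloadPdfPayload([
    { key: "Img1", dataUrl: "data:image/png;base64,AAAA" }
]);
console.log(body);
```

Each Key must match the id of an img tag in pdf.cshtml, because the id is what the image lookup uses on the server.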

Now use the static class HtmlToStringConverter to render your view's HTML as a string.

public static class HtmlToStringConverter
    {
        public static string RenderViewToString(this Controller controller, string viewName, object model)
        {
            var context = controller.ControllerContext;
            if (string.IsNullOrEmpty(viewName))
            {
                viewName = context.RouteData.GetRequiredString("action");
            }

            var viewData = new ViewDataDictionary(model);

            using (var sw = new StringWriter())
            {
                var viewResult = ViewEngines.Engines.FindPartialView(context, viewName);
                var viewContext = new ViewContext(context, viewResult.View, viewData, new TempDataDictionary(), sw);
                viewResult.View.Render(viewContext, sw);

                return sw.GetStringBuilder().ToString();
            }
        }
    } 

Note(Recommended)

Place the HtmlToStringConverter static class inside the same namespace as your controller.


Now create a "CustomImageTagProcessor.cs" file, which plays the key role in

"Convert html Images or canvas to the pdf."

 public class CustomImageTagProcessor : iTextSharp.tool.xml.html.Image
    {
        public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
        {
            IDictionary<string, string> attributes = tag.Attributes;
            string id;

            // Without an id there is nothing to look up, so emit no elements.
            if (!attributes.TryGetValue(HTML.Attribute.ID, out id))
                return new List<IElement>(1);

            // Look up the posted image whose key matches the <img> id.
            var images = ChartController.PdfImages;
            var pdfImage = images == null ? null : images.FirstOrDefault(e => e.Key == id);
            string src = pdfImage == null ? null : pdfImage.Base64ImgUrl;

            // No matching image was posted: fall back to the default <img> handling
            // instead of failing with a NullReferenceException.
            if (string.IsNullOrEmpty(src))
                return base.End(ctx, tag, currentContent);

            if (src.StartsWith("data:image/", StringComparison.InvariantCultureIgnoreCase))
            {
                // Keep only the payload after the "data:image/...;base64," prefix.
                var base64Data = Regex.Match(src, @"data:image/(?<type>.+?),(?<data>.+)").Groups["data"].Value;

                int rem = base64Data.Length % 4;
                switch (rem) // Pad with trailing '='s
                {
                    case 0: break; // No pad chars in this case
                    case 2: base64Data += "=="; break; // Two pad chars
                    case 3: base64Data += "="; break; // One pad char
                    default:
                        throw new System.Exception(
                 "Illegal base64 string!");
                }

                var imagedata = Convert.FromBase64String(base64Data);
                var image = iTextSharp.text.Image.GetInstance(imagedata);

                var list = new List<IElement>();
                var htmlPipelineContext = GetHtmlPipelineContext(ctx);
                list.Add(GetCssAppliers().Apply(new Chunk((iTextSharp.text.Image)GetCssAppliers().Apply(image, tag, htmlPipelineContext), 0, 0, true), tag, htmlPipelineContext));
                return list;
            }
            else
            {
                return base.End(ctx, tag, currentContent);
            }
        }

    }

The CustomImageTagProcessor class above resolves each img tag to the image data posted from the browser and embeds that image in the PDF.
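
To experiment with the data URL handling outside of iTextSharp, the same split-and-pad steps can be sketched in JavaScript. This is only an illustration of the logic, not part of the server code; the function name extractBase64FromDataUrl is hypothetical.

```javascript
// Sketch of the data URL handling done by the tag processor.
// Returns the base64 payload padded to a multiple of 4,
// or null if src is not an image data URL.
function extractBase64FromDataUrl(src) {
    if (typeof src !== "string" || !/^data:image\//i.test(src)) {
        return null;
    }
    var comma = src.indexOf(",");
    if (comma < 0) {
        return null; // no payload separator found
    }
    // Everything after the first comma is the base64 payload.
    var data = src.substring(comma + 1);
    switch (data.length % 4) {
        case 0: break;               // already padded
        case 2: data += "=="; break; // two pad chars
        case 3: data += "="; break;  // one pad char
        default: throw new Error("Illegal base64 string");
    }
    return data;
}

console.log(extractBase64FromDataUrl("data:image/png;base64,QUJD"));
```

A data URL produced by canvas.toDataURL is normally already padded, so the padding branch mainly guards against strings that were truncated or stripped along the way.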


The final JSON result contains a pdfString property, which is the PDF in base64-encoded form. Use a JavaScript file to call the "GeneratePdf" POST method, capture the JSON result, convert pdfString to a blob, and then render it in the browser or download it.


JavaScript file

$(document).ready(function () {
    $("#downloadPdf").click(function () {
        console.log("button clicked");
        var pdfImages = [];
        var pdfImage = new Object();
        console.log($("#scatterChart"));
        pdfImage["Key"] = $("#scatterChart").attr("name");
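        // toDataURL takes the MIME type as its first argument (the optional second argument is the quality).
        pdfImage["Base64ImgUrl"] = $("#scatterChart")[0].toDataURL("image/png");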
        console.log(pdfImage["Base64ImgUrl"]);
        pdfImages[0] = pdfImage;
        GetPdfString(pdfImages);
    })
})

GetPdfString = function (pdfImages) {
    var jsonObject = new Object;
    jsonObject["PdfImages"] = pdfImages;
    $.ajax({
        url: '/Chart/GeneratePdf',
        data: JSON.stringify(jsonObject),
        type: "post",
        dataType: "json",
        contentType: "application/json",
        success: function (response) {
            ConvertPdfStringToPdf(response.pdfString);
        }
    })
}

ConvertPdfStringToPdf = function (pdfString) {
    // Json.pdfString is base64 encoded

    const binaryString = window.atob(pdfString);
    const len = binaryString.length;
    const bytes = new Uint8Array(len);
    for (let i = 0; i < len; ++i) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    var file = new Blob([bytes], { type: 'application/pdf' });

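    if (window.navigator && window.navigator.msSaveOrOpenBlob) {
        // Legacy IE/Edge: hand the blob to the browser and stop here,
        // otherwise the link below would open the PDF a second time.
        window.navigator.msSaveOrOpenBlob(file, "test.pdf");
        return;
    }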

    var fileURL = URL.createObjectURL(file);
    let aEle = document.createElement('a');
    aEle.href = fileURL;
    aEle.setAttribute("target", "_blank");
    //aEle.download = "test.pdf";
    aEle.click();
}

ConvertPdfStringToPdf(pdfString) converts the base64 PDF string to a blob using JavaScript and opens it in a new tab.
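
The decoding step can be isolated into a small function, which makes it easier to check on its own. The sketch below assumes a hypothetical helper name, base64ToBytes, and falls back to Node's Buffer when atob is not available so that it can also be run outside the browser.

```javascript
// Decodes a base64 string into a Uint8Array of raw bytes.
function base64ToBytes(base64) {
    // atob is available in browsers; Buffer is the fallback for Node.js.
    var binary = typeof atob === "function"
        ? atob(base64)
        : Buffer.from(base64, "base64").toString("binary");
    var bytes = new Uint8Array(binary.length);
    for (var i = 0; i < binary.length; ++i) {
        bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
}

// "JVBERi0=" is the base64 encoding of "%PDF-", the marker a PDF file starts with.
var header = base64ToBytes("JVBERi0=");
console.log(String.fromCharCode.apply(null, header)); // prints "%PDF-"
```

In the browser, the returned bytes can be passed to new Blob([bytes], { type: 'application/pdf' }). Checking that the decoded data starts with "%PDF-" is a quick way to confirm that the server returned a PDF and not an error page.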


Pdf Output:


Limitations of ItextSharp

  • I used two files: Index.cshtml to render the view, and pdf.cshtml to render the HTML that is converted to PDF.

  • iTextSharp doesn't understand scripts, so I removed the layout from the pdf.cshtml file. Your HTML must also be well formed. For example, iTextSharp won't accept <img href="#" src=""> without an end tag; it should be written either as <img /> or as <img></img>.
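
To illustrate the well-formedness rule, the sketch below rewrites unclosed void tags into their self-closing form. It is a hypothetical helper (selfCloseVoidTags) based on a regular expression, so it assumes that attribute values do not contain the '>' character; the list of tag names is also an assumption and can be adjusted. The same regular expression could be ported to C# and applied to the rendered HTML string before parsing.

```javascript
// Hypothetical helper: rewrites void tags such as <img ...> as <img ... />.
// Regex-based, so it assumes attribute values do not contain '>'.
function selfCloseVoidTags(html) {
    return html.replace(
        /<(img|br|hr|input|meta|link)\b([^>]*?)\s*\/?>/gi,
        "<$1$2 />"
    );
}

console.log(selfCloseVoidTags('<img id="Img1" src="">')); // prints <img id="Img1" src="" />
```

Tags that are already self-closing are only normalized to the same form, and other tags are left unchanged.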


Thanks & Regards,

PRADEEP KUMAR BAISETTI

Associate Trainee – Enterprise Application Development 

Digital Transformation

w. www.mouritech.com
