
Making a PDF book from a webcomic in C#, using xkcd as an example

by admin

While reading the latest xkcd, I glanced at my newly purchased Sony PRS-650 e-reader and immediately thought: I want to read comics on it! xkcd strips are black and white and usually small. After a little googling I found only a collection of pictures on TPB and a bash script that was supposed to make a PDF. So I decided to do some programming and wrote a comic grabber in my beloved C#.
A console application would have been enough, but for the sake of clarity I made a simple interface in WPF.
A full walkthrough of the code would be unnecessary, so I will explain only the main points. I recommend opening or downloading the full application code from Google Code right away.

1. Get images, titles and alt text from the site

On xkcd the comics are conveniently located at addresses like xkcd.com/n, where n = 1…
My first thought was to scrape the data out of the page markup, but it turned out that all the information is available as JSON at xkcd.com/{0}/info.0.json, where {0} is the comic number.
For JSON, .NET provides DataContractJsonSerializer.
We create a corresponding DataContract:

[DataContract]
public class XkcdComic
{
    #region Public properties and indexers

    [DataMember]
    public string img { get; set; }

    [DataMember]
    public string title { get; set; }

    [DataMember]
    public string month { get; set; }

    [DataMember]
    public string num { get; set; }

    [DataMember]
    public string link { get; set; }

    [DataMember]
    public string year { get; set; }

    [DataMember]
    public string news { get; set; }

    [DataMember]
    public string safe_title { get; set; }

    [DataMember]
    public string transcript { get; set; }

    [DataMember]
    public string day { get; set; }

    [DataMember]
    public string alt { get; set; }

    #endregion
}

and use it:

private static XkcdComic GetComic(string url)
{
    // Dispose the client and the response stream once the JSON has been read.
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url))
    {
        if (stream == null) return null;
        var serializer = new DataContractJsonSerializer(typeof (XkcdComic));
        return serializer.ReadObject(stream) as XkcdComic;
    }
}

At xkcd.com/info.0.json you can get the latest comic strip, and its num field gives you the total number of comics.
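As a rough illustration of that counting step, the sketch below parses a JSON string and returns its num value. The LatestComicInfo contract and the ComicCounter class are hypothetical names made up for this example, not part of the application; the JSON is passed in as a string so that the sketch runs without network access.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical minimal contract: only the field needed for counting.
[DataContract]
public class LatestComicInfo
{
    [DataMember(Name = "num")]
    public int Num { get; set; }
}

public static class ComicCounter
{
    // Parses the JSON describing the latest comic and returns its number.
    // Members of the JSON object that the contract does not declare are ignored.
    public static int GetCountFromJson(string json)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            var serializer = new DataContractJsonSerializer(typeof(LatestComicInfo));
            var info = (LatestComicInfo)serializer.ReadObject(stream);
            return info.Num;
        }
    }
}
```

In the real application the string would presumably come from the stream returned for xkcd.com/info.0.json rather than from a literal.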
Now you need to download the picture itself, which is easy:

var imageBytes = WebRequest.Create(comicInfo.img).GetResponse().GetResponseStream().ToBytes();

where comicInfo is our JSON data and ToBytes() is a simple extension method that reads data from the stream into an array.
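The article does not show ToBytes() itself. A minimal sketch of such an extension method, assuming that copying through a MemoryStream is acceptable (the StreamExtensions class name is made up here), could look like this:

```csharp
using System.IO;

public static class StreamExtensions
{
    // Reads the stream from its current position to the end into a byte array.
    public static byte[] ToBytes(this Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            // ToArray copies only the bytes actually written,
            // unlike GetBuffer, which exposes the whole internal buffer.
            return buffer.ToArray();
        }
    }
}
```

Going through a MemoryStream avoids having to know the length in advance, which network response streams often cannot report.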
The Comic class represents a single comic (a comic strip, or whatever the singular should be). To validate the received image bytes (we could have downloaded the wrong thing, the server could have returned an error, and so on), the class constructor is private, and a Create factory method returns null if decoding fails. BitmapImage does the decoding; if it succeeds, the resulting image serves as a thumbnail for previewing the result:

public static Comic Create(byte[] imageBytes)
{
    try
    {
        // Validate image bytes by trying to create a Thumbnail.
        return new Comic {ImageBytes = imageBytes};
    }
    catch
    {
        // Failure, cannot decode bytes
        return null;
    }
}

public byte[] ImageBytes
{
    get { return _imageBytes; }
    private set
    {
        _imageBytes = value;
        var bmp = new BitmapImage();
        bmp.BeginInit();
        bmp.DecodePixelHeight = 100; // Do not store whole picture
        bmp.StreamSource = new MemoryStream(_imageBytes);
        bmp.EndInit();
        bmp.Freeze();
        Thumbnail = bmp;
    }
}

Putting everything together, we get a method for downloading a comic strip by its number :

protected override Comic GetComicByIndex(int index)
{
    // Download comic JSON
    var comicInfo = GetComic(string.Format(UrlFormatString, index + 1));
    if (comicInfo == null) return null;

    // Download picture
    var imageStream = WebRequest.Create(comicInfo.img).GetResponse().GetResponseStream().ToMemoryStream();

    // ToArray returns exactly the bytes written; GetBuffer may include unused trailing capacity
    var comic = Comic.Create(imageStream.ToArray());
    if (comic == null) return null;

    comic.Description = comicInfo.alt;
    comic.Url = comicInfo.link;
    comic.Index = index + 1;
    comic.Title = comicInfo.title;

    // Auto-rotate for best fit
    var t = comic.Thumbnail;
    if (t.Width > t.Height)
    {
        comic.RotationDegrees = 90;
    }
    return comic;
}

So we have the number of comics and a method to get a strip by its index.

Parallelizing downloads

I will use the Task Parallel Library, since I have been meaning to try it for a long time but never had the chance. At first glance it is simple: in the loop, instead of calling GetComicByIndex(i) directly, write var task = Task.Factory.StartNew(() => GetComicByIndex(i)) (copying i into a local variable first, because a lambda inside a for loop captures the loop variable itself rather than its current value). We collect all the running tasks in a tasks array, call Task.WaitAll(tasks), and then read each task's result from task.Result. But this approach does not let us track progress or show already loaded strips to the user. To solve this, we will use WaitAny and yield return to return the result of each task as soon as it finishes:

public IEnumerable<Comic> GetComics()
{
    var count = GetCount();
    var tasks = Enumerable.Range(0, count).Select(GetTask).ToList();
    while (tasks.Count > 0) // Iterate until all tasks complete
    {
        var task = tasks.WaitAnyAndPop();
        if (task.Result != null) yield return task.Result;
    }
}

Here the GetTask method returns the GetComicByIndex(i) task, plus error handling and caching (this is beyond the scope of this article). WaitAnyAndPop is an extension method that waits for one of the tasks to complete, removes it from the list, and returns it:

public static Task<T> WaitAnyAndPop<T>(this List<Task<T>> taskList)
{
    var array = taskList.ToArray();
    var task = array[Task.WaitAny(array)];
    taskList.Remove(task);
    return task;
}
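To see the pattern in isolation, here is a small sketch that replaces the downloads with tasks that merely sleep. StreamingDemo and InCompletionOrder are names invented for this example, and WaitAnyAndPop is repeated as a private helper only so that the sketch compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingDemo
{
    // Same idea as WaitAnyAndPop above: wait for any task, remove it, return it.
    private static Task<T> WaitAnyAndPop<T>(List<Task<T>> taskList)
    {
        var array = taskList.ToArray();
        var task = array[Task.WaitAny(array)];
        taskList.Remove(task);
        return task;
    }

    // Starts one sleeping task per delay and yields each delay
    // as soon as the corresponding task has finished.
    public static IEnumerable<int> InCompletionOrder(IEnumerable<int> delaysMs)
    {
        var tasks = delaysMs
            .Select(d => Task.Factory.StartNew(() => { Thread.Sleep(d); return d; }))
            .ToList();

        while (tasks.Count > 0) // Iterate until all tasks complete
        {
            yield return WaitAnyAndPop(tasks).Result;
        }
    }
}
```

Iterating over StreamingDemo.InCompletionOrder(new[] { 300, 50, 150 }) will usually produce the shorter delays first, although the exact order depends on how the thread pool schedules the tasks. The consumer receives each value without waiting for the slowest task, which is what makes progress reporting possible.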

I am not addressing architectural issues in this article, but MVVM (Model-View-ViewModel) is the de facto standard for WPF applications, and the code for grabbing, exporting and so on is of course split into appropriate classes. In the ViewModel code we can now iterate over the result of the GetComics method on a background thread and show strips to the user as they arrive:

private Dispatcher _dispatcher; // Assigned in StartGrabbing, so it cannot be readonly
private readonly ObservableCollection<Comic> _comics = new ObservableCollection<Comic>();

private void StartGrabbing()
{
    // ObservableCollection modifications should be performed on the UI thread
    _dispatcher = Dispatcher.CurrentDispatcher;
    ThreadPool.QueueUserWorkItem(o => DoGrabbing());
}

private void DoGrabbing()
{
    var grabber = new XkcdGrabber();
    foreach (var comic in grabber.GetComics())
    {
        var c = comic;
        _dispatcher.Invoke((Action) (() => Comics.Add(c)), DispatcherPriority.ApplicationIdle);
    }
}

2. Displaying comics in WPF

In the XAML code, all we have to do is make a Binding to our ObservableCollection, and prepare a corresponding DataTemplate to observe the loading process and the comics themselves, with alt-text in the Tooltip:

<ListView ItemsSource="{Binding Comics}" ScrollViewer.VerticalScrollBarVisibility="Disabled"
          x:Name="list" Margin="5, 0, 5, 0"
          ScrollViewer.HorizontalScrollBarVisibility="Visible" Grid.Row="1">
    <ItemsControl.ItemTemplate>
        <DataTemplate>
            <Border BorderBrush="Gray" CornerRadius="5" Padding="5" Margin="5" BorderThickness="1">
                <StackPanel Orientation="Vertical">
                    <StackPanel Orientation="Horizontal">
                        <TextBlock Text="{Binding Index}" FontWeight="Bold" />
                        <TextBlock Text="{Binding Title}" FontWeight="Bold" Margin="10, 0, 0, 0" />
                    </StackPanel>
                    <Image Source="{Binding Thumbnail}" ToolTip="{Binding Description}"
                           Height="{Binding Thumbnail.PixelHeight}" Width="{Binding Thumbnail.PixelWidth}" />
                </StackPanel>
            </Border>
        </DataTemplate>
    </ItemsControl.ItemTemplate>
    <ItemsControl.ItemsPanel>
        <ItemsPanelTemplate>
            <StackPanel Orientation="Horizontal" />
        </ItemsPanelTemplate>
    </ItemsControl.ItemsPanel>
</ListView>

3. Creating a PDF book

PDF was chosen for its popularity and good support on Sony e-readers. There is a handy open source library for working with PDF in .NET, iTextSharp (you will need to download it separately to build the project). Using it is pretty straightforward. Omitting the exception handling, image size fitting and fonts, we get the following:

var document = new Document(PageSize.LETTER);
var wri = PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
document.Open();
foreach (var comic in comics.OrderBy(c => c.Index).ToList())
{
    var image = Image.GetInstance(new MemoryStream(comic.ImageBytes));
    var title = new Paragraph(comic.Index + ". " + comic.Title, titleFont);
    title.SetAlignment("Center");
    document.Add(title);
    document.Add(image);
    document.Add(new Phrase(comic.Description, altFont));
    document.Add(Chunk.NEXTPAGE);
}
document.Close();
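The image size fitting omitted above is not shown in the article. As a rough sketch of the arithmetic such a step might involve, the helper below computes a scale factor that fits an image into a given area while preserving its aspect ratio, optionally accounting for the 90-degree auto-rotation. PageFit and GetScale are hypothetical names, and the decision never to upscale is an assumption made for this example.

```csharp
using System;

public static class PageFit
{
    // Returns the factor by which an image must be scaled to fit inside
    // the available area while keeping its aspect ratio. Never upscales.
    public static double GetScale(double imageWidth, double imageHeight,
                                  double areaWidth, double areaHeight,
                                  bool rotated)
    {
        // A 90-degree rotation swaps the image's width and height.
        var w = rotated ? imageHeight : imageWidth;
        var h = rotated ? imageWidth : imageHeight;

        // The tighter of the two constraints decides the scale.
        var scale = Math.Min(areaWidth / w, areaHeight / h);
        return Math.Min(scale, 1.0);
    }
}
```

For example, a US Letter page measures 612 by 792 points, so an unrotated 1224 by 792 image would need a scale of 0.5 to fit its full width, before margins and the space taken by the title and alt text are subtracted. iTextSharp's Image class also has a ScaleToFit method that may make a manual calculation unnecessary.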

Results

The result is an application that, in addition to exporting to PDF, lets you browse the comics conveniently enough.
What the result looks like on the e-reader can be seen in the first picture of the article.

What was left out of the article

Caching downloaded data between application runs (done using IsolatedStorage).
Support for other webcomics (for this purpose I provided an IGrabber interface in advance and put some of the functionality into TaskParallelGrabber. While writing this article, I added grabbers for WhatTheDuck and Cyanide & Happiness.)

Links

Application code (C#): Google Code
Working with PDF on .NET: iTextSharp
Comics : xkcd
UPD:
Thanks to XHunter for uploading the resulting PDF and the compiled program!
UPD2:
I'll just leave a link here to a good "reply" article that expands on the topic of downloading comics, this time using WCF: http://darren-brown.com/?p=37
