Archive for the ‘C#’ Category

Microsoft Office (Excel) Interop Crash Course

23/02/2013

There comes one day in your life that you will be requested to execute some Excel logic on the server side, so that the user (for instance) doesn’t have to manually refresh some pivot tables. Unfortunately, Excel Interop (EI) is a living hell full of gotchas and bugs, thus let me give you a few hints.

Use only if you have to

When dealing with Excel documents (server side) you may have two group of tasks: data based and logic based. Data modifications are simple changes to the contents of the sheets. Whereas logic transformations require faking a functionality of Excel, like entering 1 + 1 should give us a final value 2 when entered into the cell.

Modifications of data are usually handled easily by free tools (such as EPPlus). Unfortunately, faking Excel logic is not that simple, thus EI becomes you a very powerful tool. It allows you to basically perform the same steps that the user could do manually (in an Excel window).

Accept the fact that it might not work

Microsoft has stated that they do not support interop services, so if it doesn’t work for you then … you have a problem. If it doesn’t for you (and you are sure that you made everything you could to make it work) then you can always resort to VBA macros. While this solution is often frowned upon, it’s much more reliable and hundred times easier to implement.

ALWAYS properly clean up your objects

You will be dealing with COM objects which is a framework that I am not very found of, alas it’s even worse when called from C#. If you do not properly clean all the COM objects you referenced, you will end up with a zombie EXCEL.exe process left even after your process has been finished. There might be even other bad things that happened, so it even more serious.

Repeat after me:

  1. I shall not keep multiple references to the same COM object as after cleanup all of them will be invalidated.
  2. I shall Close(), Quit() or use any other cleanup method that an object provides.
  3. I shall always use FinalReleaseComObject() on every COM object I’ve stopped using.
  4. I shall null my reference.
  5. I shall call Garbage Collector multiple times due to “funny” behavior of COM objects:
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    
  6. I am aware of exceptions and thus the cleanup will always be done in finally block or by using & IDisposable wrapper class.

This rule is very tedious to follow, and most of the guidelines that I’ll present stem from this one.

Let’s make Excel headless

As I explained using EI is like sending messages to Excel process. It won’t spawn a window, but you will be attacked by a horde of pop ups asking you whether “You really wish to overwrite file foo.xlsx>” etc. To deal with them, I set the following properties for my Application instance.

var app = new Interop.Excel.Application();
app.Interactive = false;
app.DisplayAlerts = false;

There might be some other flags you might have to clear/set, thus you have to experiment yourself.

Double dot = your biggest enemy

Take a look at the following code:

var refToC = refToA.b.c;
// Use refToC
// Close, release, null and gc refToA and refToC

Have we properly cleaned up our objects? No, reference to B is dangling! WTF, you ask? Unfortunately, the refToB (which is not explicitly defined) is created, and we haven’t released it. Because of that lovely behavior every “going deeper” via a reference requires a separate object.

var a = comObj.a;
var b = a.b;
var c = b.c;
c.Foo(); c.Bar();
Release(c);
Release(b);
Release(a);

Forget about Enumerator and foreach

foreach is just a wrapper over Enumerator thus I will just deal with the latter. So what is the problem? Internally, the Enumerator object holds a private reference to a COM object. Sadly, the Enumerator doesn’t provide any cleanup mechanisms (Dispose() and similar), thus you can never be sure when the reference to the COM will be dropped.

Thus, prepare to manually iterate every container using Count and for.

Indices are 1-based

As the last guideline an easy one: every time you call get_Item(), Item[i] and Item(i) (and i is an integer) remember that you index from 1, not 0. If you forget about it, a nice exception will be raised.

Advertisements

Forcing Entity Framework to run outside of transaction

04/01/2013

When using transactions (via TransactionScope) and Entity Framework you might run into a problem when you would like to execute some commands outside of enclosing transaction.

using (var t1 = new TransactionScope()) {
    // ...
    logUsingEF("Done foo"); // Will not be called if an exception is thrown below
    // ...
    t1.Complete();
}

Problem here is that if anything wrong happens (exception), your log won’t be written. A simple solution:

using (var t1 = new TransactionScope()) {
    // ...
    using (var t2 = new TransactionScope(TransactionScopeOption.Suppress)) {
        logUsingEF("Done foo"); // Will always run
        t2.Complete(); // Not required but keep for consistency sake
    }
    // ...
    t1.Complete();
}

ASP.NET dynamic controls & one gotcha you might have missed

24/11/2012

One of the most unintuitive and bristling with gotchas topic regarding ASP.NET site development is the creation of dynamic controls. Dynamic controls are these kinds of controls which are not static on the page. In other words, one page can contain different number and types of controls, depending on its state.

There are several rules to keep in mind while creating such controls:

  • Due to stateless nature of websites controls have to be recreated every postback/page request. Creating them only once will result in empty page after postback.
  • When recreating the controls you have to keep in mind to create the same ones that you have created when rendering last client request. Failure in doing so will cost you a nice exception of form “Invalid View State“.
  • Recreation of dynamically rendered controls should be done in LoadViewState() which you should override.
  • Creation of new set of controls requires deletion of old ones and creation of new set in Load() method during which you will probably consume various events telling you to change the state of the page.

There is however one gotcha that is not mentioned very often, or not emphasized enough. Whenever you are recreating your dynamic controls remember to assign the same IDs to the controls in order the events to work. If you are sure that you done all correctly (regarding dynamic controls) and for some reason the events are not handled, then you better check the IDs.

FileInfo Exists lies! Or not?

20/11/2012

I’ve recently wrote a code that basically looks like that:

FileInfo GetFile()
{
    var info = new FileInfo(path_to_not_yet_created_file);
    
    var someObj = new SomeObj();

    // ... Logic gathering/preparing various pieces of data ...

    someObj.SaveToFile(info);

    return info;
}

void foo()
{
    FileInfo file = GetFile();
    
    // Defensive programming ftw!
    Debug.Assert(file.Exists);
    
    // ... More code ...
}

Seems reasonable.

  1. Create FileInfo to some file to which you gonna save your result.
  2. Do some computations.
  3. Write the result to a new file pointed by FileInfo.
  4. Validate that the file was created using Exists property.

But it doesn’t work. Exists always returns false. The file is definitely there, process have adequate access rights, the path matches … even your dog can see the file. Madness!

After playing around for some time I learned one simple truth: FileInfo isn’t automatically refreshed and won’t reflect the current state of the file when querying its state. In Debug.Assert(file.Exists) Exists will always return false as at the point of creation (var info = new FileInfo(path_to_not_yet_created_file)) the file wasn’t existing.

I will stick with plain ol’ string for now, and remember not to use FileInfo so much. Hope that saves you some time.

No translation to SQL

26/10/2012

When you start to like LINQ (to SQL) very much, someday you may find yourself in a nasty situation when “LINQ cannot translate method to SQL” during … runtime. That’s sad but quite logical: LINQ maps the logic to equivalent SQL statement. If there is no equivalence, it won’t work. In other words:

var foo = my_context.my_table.Where(val => SuperComplexMethod(val));

won’t work.

The usual answer is to use Expression<Func> but it will only help if it is possible to do (like converting Math.Pow()).

Another option is to restructure your function and let it use the supported logic.

Sometimes, however, you are to lazy to do all of that just to call some arbitrary function. The solution? Just use the AsEnumerable(). It basically tells the LINQ: “Ok, you can’t do what I want using the underlying database, so leave it to me”. There might be some performance issues due to not being able to use the power of the database, but many time this is the easiest way to go.

var foo = my_context.my_table.AsEnumerable().Where(val => SuperComplexMethod(val));

ASP.NET FileUpload doesn’t work

08/10/2012

So you added a FileUpload control in order to receive files from your clients. The problem? HasFile() returns false every time you send a file.

Reason of this problem is described in HTML specification. Basically, if the forms sent by your page will contain file data, multipart/form-data should be used.

To use it in ASP.NET we just need to specify following line in our code:

Page.Form.EncType = "multipart/form-data"

I am not 100% sure but I haven’t heard about a rule that forces you to use the above code in load, init, prerender or any other place. If you know more about this, feel free to write a comment.

Mutex-free C# application deadlock

01/10/2012

We’ve all heard about & encountered deadlocks. Nasty buggers they are, won’t you agree? After several long-night fights against them you should be well equipped to deal with them. You remember not to call external (delegate) code in synchronized context, you carefully add various logs to monitor behavior of mutexes, but to your surprise there is still an observable deadlock!

First, let’s make sure that we deal with a deadlock, not some funky system behavior, external library bug or endless loop. To test threads behavior you can basically do two things: change their number or add Thread.Yield(). We will focus on the first option.

In order to change the number of actively running threads we use the ThreadPool method. ThreadPool is the default way of using threads so I can safely assume that you will directly or indirectly (Tasks) use them.

We changed the number of threads, and we can observe that indeed the deadlock appears so we eagerly start debugging our poor parallel program, but the problem is not there.

What you should always check when using thread pools is whether the synchronization logic has taken all available threads. Take a look at the following example (assume the ThreadPool can run 10 threads concurrently).

The problem should be visible now.

  1. #10 waits for empty slot.
  2. Old threads won’t finish as they are long-running.
  3. Short-lived thread #9 (or a chain of inter-dependent threads) is waiting on #10
  4. #10 starves to death locking whole application.

You might not encounter this problem every day, but be careful when:

  • you frequently use BeginInvoke() in your code,
  • you port your application to older system on which the default number of threads is lower than you assumed.

While thread pools provide us with efficient creation of threads, they bring their own issues. Remember to enable sufficiently high number of max threads in pool, or reduce the number of threads that have to be run concurrently, in order to avoid synchronization starvation.

Simple cascading DropDownLists in ASP.NET

12/08/2012

A popular requirement for a page is to have a cascade-type filter using DropDownLists. For example:

Country: [list of all countries you support]
City: [list of all cities with your shops in the SELECTED country]
Store: [all stores in the select city]

The obvious solution is to use:

Everything is well and good. The problem, however, is that the ControlParameter doesn’t work in chain (cascade). Using our previous example we have such control relationship: Country <- City <- Store; where <- denotes dependency. Assume that the user chooses the country “France”. What we expect, is that both the city and store will be updated accordingly. The result is as follows:

Country: France
City: Paris
Shop: Store A at 52 Wall Street (!)

All the solutions that I was able to found on the internet were neither short and all of them suggested use of AJAX. I don’t have anything against AJAX but I don’t see a reason to implement something that I don’t really need.

After playing around for a bit I came up with the following solution. As I understand the logic behind ControlParameter is that when the user selects (explicitly) the observed drop down list’s index, the observing control rebinds its data using the new parameter value for the sql query. The third (second in cascade) control won’t update as the second one hasn’t self-select new index – it changed the underlying data source (thus the selected item value)! In order to fix this, I propose to override the DataBound event of the observed control so that it explicitly kicks its observers to DataBind their values.

.html:

<asp:DropDownList id="Foo" OnDataBound="Foo_DataBound" />

.cs:

protected void Foo_DataBound(object sender, EventArgs e)
{
    m_observingControl.DataBind();
}

If you have a better solution, I am glad to hear it!

Events + Locks = Problems

14/07/2012

TLDR; Never raise events from synchronized context.

Let’s imagine that you are creating some middle layer between your forms and business layer (logic). In order not to allow races and data corruption you introduce synchronized context. Obviously you won’t be synchronizing on this. So this is our first implementation:

class ThreadSafe {
    private Object m_lock = new Object();

    public void Foo() { lock(m_lock) { ... } }
}

Everything seems OK, but being event-lovers as we are we want to raise them in order to tell other components that our state changed. From now on I will mostly add only new parts of the code to reduce the clutter.

class ThreadSafe {
    public event EventHandler DoneFoo;

    // As only the DoneFoo enclosing class can call it we want to allow subclasses to call their base class events.
    protected virtual void OnFoo(EventArgs e) {
        if (DoneFoo != null) DoneFoo(this, e);
    }
}

Now we can finally hook the event to our Foo() method.

class ThreadSafe {
    public void Foo() {
        lock(m_lock) { ... DoneFoo(EventArgs.Empty); }
    }
}

And … boom! You’ve got a deadlock, sir. Why? Imagine that you have an event listener class called ListenOnFoo that operates on GUI elements.

class ListenOnFoo {
    private ThreadSafe m_ts;

    // Register event handler
    public ListenOnFoo(ThreadSafe ts) { ts.DoneFoo += OnDoneFoo; m_ts = ts; }

    // Remember that we have to adhere to the EventHandler delegate declaration.
    private void OnDoneFoo(Object sender, EventArgs e) {
        DoInternalLogic();
    }

    // As we are dealing with GUI we have to make sure we call the logic via [Begin]Invoke() in order not to corrupt the GUI.
    private void DoInternalLogic() {
        if (this.InvokeRequired) {
            this.Invoke(new MethodInvoker(DoInternalLogic));
        }

        m_threadSafe.Foo(); // DEADLOCK!
    }
}

Obviously, DoInternalLogic introduces infinite loop problem, but let’s skip this for the moment. As you probably see, we are locking m_lock with the first thread and the second one (GUI). We have got a deadlock.

The problem with such kind of problems is they are very difficult to track and in case of complex GUI logic it could take us much time to find the issue. Because of that, it’s important not to make such mistakes in the first place.

First, the general rule of thumb is not to use Control.Invoke() and instead use its younger sister Control.BeginInvoke(). The difference between these two (as you can probably guess from the popular naming convention) is that the latter calls the delegate asynchronously, allowing the first thread (in our case) to finish execution and unlock the m_lock.

The issue with BeginInvoke() is that it won’t return the result right away (as it is non-blocking). If you happen to have invokable methods with non-void return (or when you do not want to proceed before they successfully modify the inner object state), then consider handling IAsyncResult that is returned from BeginInvoke().

I won’t discuss ways of dealing with asynchronous programs as this is a very broad topic (decently described on MSDN). What I will say for now, is that you should think about logic of your code and find a place where you can block (definitely not in GUI thread!) – while waiting on AsyncResult.

Second general rule of thumb is to keep synchronized regions as small and simple as possible. In our case, the simplest solution would be to move OnFoo() call from the lock.

class ThreadSafe {
    public void Foo() {
        lock(m_lock) { ... }
        DoneFoo(EventArgs.Empty); // Not in lock anymore
    }
}

That was easy, but I would rather go even further and disallow calling methods in locked regions. I know it sounds crazy but if you allow code like the following, you might have serious deadlock issue in future.

void Bar() {
    lock(m_lock) {
        Baz();
        Bay();
        Bax();
    }
}

You won’t know if any of these methods raises an event or invokes a delegate. Obviously with enough care and comments you can safely use methods inside synchronized regions but, please, KISS!