Full BackUp

Gents

When will we see a full tracked backup solution, à la grandMA, for the Hog III?
This year?


Denis
  • If you take a look at the various HA (high availability) or cluster projects going on for web servers and databases, you can learn a lot about how such things work in other software projects.

    Since a full backup system (hot spare) should automatically take over within less than a second, there should be at least 2 showservers up and running at the same time, both sending their output to all DPs (a kind of redundant showserver cluster). The DP should ignore all packets coming from showserver 2 unless it no longer gets any data from showserver 1.
    All clients in the show (Hog3s, IPCs, Hog3PC) should connect to all showservers and send the same data to both of them. The showservers sync with each other by exchanging all "events" they got from the clients, in case something got lost.

    In the case of a crash of showserver 1 (running on a hog3 for example), I could still control the show from the desk if the other processes are still running and I have a second showserver active in the network.
    Or I could pull the network cable of the hog3 and go on with the show on another desk without any interruption in the show or dmx.

    The second problem is the backup DP: We'd need two DPs doing exactly the same thing. If one crashes, just plug the dmx cables into the backup DP. This should not be configured at the desk but at the DP itself (i.e. I do not want to duplicate my patch on another 4 outputs).

    For the clients: There should be a slave mode like an MSC slave, but without all the MIDI stuff. We have ethernet here, my backup console might not be standing right next to the main console for whatever reason, and I do not see any reason why an additional cable should be needed. Maybe just add a few network MIDI ports? So a console can send on "Network MIDI-Port 1" and another one listens on "Network MIDI-Port 1" in MSC slave mode?

    And although it doesn't really fit in here: The virtual DPs in Hog3PC should be available in the network just like every other DP, especially for these full backup strategies. So it should be possible to have the backup showserver running on a laptop and have superwidgets (or any mixture of usb-dmx-widgets and DP2000s) as backup DPs...

    just doing some brainstorming ;) hope it helps!
    Jan
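
A minimal sketch of the receiver-side rule described in the post above, where a DP ignores showserver 2 for as long as it still hears showserver 1. The class name, the server ids, and the 0.5 s silence threshold are assumptions made for illustration; nothing here reflects the actual Hog 3 network protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceSelector:
    """Receiver-side failover: prefer the first server that is still talking."""

    priority: List[str]          # hypothetical server ids, most preferred first
    timeout_s: float = 0.5       # assumed silence threshold, not a Hog 3 value
    last_seen: Dict[str, float] = field(default_factory=dict)

    def note_packet(self, server_id: str, now: float) -> None:
        # Remember when each server was last heard from.
        self.last_seen[server_id] = now

    def active_source(self, now: float) -> Optional[str]:
        # The highest-priority server heard within the timeout wins.
        for server_id in self.priority:
            seen = self.last_seen.get(server_id)
            if seen is not None and now - seen <= self.timeout_s:
                return server_id
        return None

    def accept(self, server_id: str, now: float) -> bool:
        # Record the packet, then decide whether the DP should act on it.
        self.note_packet(server_id, now)
        return self.active_source(now) == server_id
```

Because every DP makes this choice locally from what it actually receives, two servers that both believe they are primary cannot confuse it: packets from the lower-priority one are simply dropped while the preferred one is audible.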
  • Jan,

    I'm not the architect here, but it seems to me like having both servers always sending traffic to all DPs would be a lot of unnecessary network traffic, especially since the DPs would be ignoring half of the traffic until they decided that the primary server failed. I think it would make more sense to have the DPs listen to all traffic and have the backup server make the determination that the primary has failed and then begin sending packets to the DPs.
  • You are safer when you make this decision on the receiving side. It will end up in a mess if showserver 2 thinks that showserver 1 is offline and starts sending data when showserver 1 is still active but there is only some packet loss or a sync problem between server 1 and 2.

    I never measured the amount of data the hog3 sends to the DPs, but one dmx line is 250 kbit per second, and the hog3 does not send dmx but just commands to the DP, since the DPs calculate fades themselves, so the amount of data should be less (but there is of course some overhead, so maybe it's 250 to 300 kbit per second again). This would mean we could theoretically fit 300 universes on a 100 Mbit network; if we take some good switches and think about the network layout, even more will fit! 300 universes would be 150 redundant ones. I think that is no problem. And it would make the whole configuration very stable and easy to handle (in terms of software development).

    But that's something your developers have to think about. I'm just sharing my personal experiences from the software projects I have been programming and using... There are many theories and approaches, but I am always a big fan of very robust solutions. Better to send too much data than nothing. Always assume that part of the network will fail because of a broken switch or something like that; assume some nodes will see each other and some others will only see half of them. And keep in mind that this always happens in a show while the operator is pushing a dozen flash-keys for ACLs and audience blinders. We do not want any "1 second delay for network reconfiguration" or something like that.

    Jan
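
The bandwidth estimate in the post above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch only: the 300 kbit/s per-universe figure is the poster's guess, and Ethernet framing, switch behavior, and other traffic are ignored. The function and constant names are made up for this example.

```python
DMX_LINE_RATE_KBIT = 250      # raw DMX512 line rate for one universe
PER_UNIVERSE_KBIT = 300       # the post's upper guess including overhead
LINK_KBIT = 100_000           # 100 Mbit/s link, ignoring framing overhead


def max_universes(link_kbit: int, per_universe_kbit: int) -> int:
    """How many universes fit on the link at the given per-universe rate."""
    return link_kbit // per_universe_kbit


def redundant_universes(total_universes: int, servers: int = 2) -> int:
    """Usable universes when every universe is sent once per showserver."""
    return total_universes // servers


if __name__ == "__main__":
    print(max_universes(LINK_KBIT, PER_UNIVERSE_KBIT))   # 333
    print(max_universes(LINK_KBIT, DMX_LINE_RATE_KBIT))  # 400
    print(redundant_universes(300))                      # 150
```

At 300 kbit/s the link holds 333 universes in theory, so the round figure of 300 (150 when everything is sent twice) leaves a small margin.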
  • Jan,

    Why do you think it would be safer having the DP make the decision? You may very well be right, I just imagine that the decision is being made based on the network traffic that a device sees and that the secondary server could make the determination just as well as the DP could.
  • Tom, I too come from a large scale highly-available software design background. I believe the problem Jan is addressing by having the DP make the decision instead of the server is to avoid what is commonly referred to as a "split brain" situation. The challenge with having the servers make the decision is how to avoid the situation where they both think they are the "master/primary" and send packets confusing the receivers. In many software architectures this problem is addressed by having the "master/primary" exclusively acquire some resource. (For example, in database systems where data corruption can be catastrophic this exclusive resource can be a SCSI device or some other I/O fencing technique.)

    I would not really be able to suggest the best alternative for the WHIII without fully knowing the underlying architecture. Depending on some of the design already there, it might work to combine your suggestion of letting the server decide with Jan's suggestion of letting the client DPs decide when they receive packets from multiple servers. Sometimes in these cases we make an arbitrary decision on which to listen to -- it might be based on the server with the lowest address/serial #/boot time/etc.

    I have a great deal of large-scale, highly available architecture design experience. (I almost always seek to have no single points of failure [SPOFs] and no manual intervention required, to minimize any human-introduced delay.) If you would like to talk more about the subject of availability, feel free to contact me offline.
    _________________
    Kevin Montagne
    Litkam, Inc
    713-397-1930
    kevin (at) litkam.com
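
A small sketch of the "arbitrary but deterministic" tie-break mentioned in the post above: when a receiver hears several servers at once, it listens to the one with the lowest address, then serial number, then boot time. The `ServerInfo` fields and the assumption of dotted IPv4 addresses are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ServerInfo:
    address: str       # assumed to be a dotted IPv4 string
    serial: int        # hypothetical hardware serial number
    boot_time: float   # seconds since some shared epoch


def _address_key(address: str) -> Tuple[int, ...]:
    # Compare addresses numerically so that 10.0.0.3 sorts before 10.0.0.12.
    return tuple(int(part) for part in address.split("."))


def elect_primary(servers: Iterable[ServerInfo]) -> Optional[ServerInfo]:
    """Pick the same winner on every receiver, whatever the arrival order."""
    return min(
        servers,
        key=lambda s: (_address_key(s.address), s.serial, s.boot_time),
        default=None,
    )
```

The rule itself does not matter much. What matters is that every receiver applies the same one, so all of them settle on the same server without having to talk to each other.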
  • Kevin,

    Thanks for the offer. Rest assured that among the developers here are some of the best architects I have ever worked with. I have no doubt they will be able to implement the proper solution for our needs.
  • Adding my 2cents....

    There needs to be some way of indicating to the human element that a transfer to the back-up has happened. Maybe even something as simple as a macro to execute when the server takes control, similar to, but separate from, the startup macro.

    This becomes especially important if you try to add live back-up options to systems that don't have an operator.
  • That's a great point, Paul.
    I've added your comments to bug #9212.

    Thanks.
  • Tom,

    I would like the window frames of the client and the server to be different colors.

    That way, even when two or more systems exist, it is easy to tell which one is the server.

    Thanks,
    Akito
  • It should also be considered that many clients are run "unmanned" (cruise ships, etc.), so it would be necessary to have an automatic failover of the DMX as well. I was thinking you could make a "link" cable that connects both DPs (primary/secondary), where the secondary DP does NOT output anything until a failover occurs, and if that happens they switch automatically. I don't know if this is physically possible with the current layout of the DPs.

    The second option would be to make a "DP Merge" that is network-connected and that both DPs feed into (call it an advanced merger), which automatically changes over between the DPs if one goes down. It could be controlled from the server side!

    Regarding the last option, some would say it would be costly, but I would say the clients that REALLY need this would pay for a real redundant network (even I would).


    r:finger:

    Edit: Sorry for all the misspellings!
  • Akito,

    That's a great idea. I've added your comments to bug #9212.

    Jan,

    We will need to implement functionality to automatically switch to a backup DP if the primary fails, but the exact details will probably depend on the functional specification, which isn't complete yet. It will need to support our existing hardware, so a DP-to-DP link cable probably isn't feasible. The DP Merge could be a possibility, but in addition to the costs of developing a new piece of hardware and the fact that you would need to purchase this for auto-switching, this would also introduce another point of possible failure into the system.
  • Software: Hog III PC 1.4.1
    Console: Dell Inspiron 6400
    Single DMX Widget
    -----------------------------------------------------------------
    Hello,
    My company has recently purchased a single Hog widget and a PC to program and run our show on tour. Coming from a theatrical background, I have to say that I love the ease of programming that the Hog provides. Touring with the Hog has allowed me to walk into any French or Italian festival and have our show up and running in an hour, rather than have to re-program it into whatever console they may provide.
    That being said, the lack of stability of the Hog PC continues to worry me.
    The console crashed twice in programming during our recent tour. Thankfully, in both cases the festival dimmers "held" the dmx output.
    Now that we've returned home, I am looking into backup solutions for Hog PC.

    Based on the suggestions of Cat West at High End, I purchased two Midi-Sports and have "linked" two PCs running the same version of software and the same show. This works great; however, because I have only one widget, I can't get both laptops to recognize the widget. Has anyone figured out how to do this? Is it possible to use a USB switch so you can "choose" which laptop has control over the widget?

    Alternately, I am looking into a small dmx device such as the MicroTech DMX or Doug Fleenor's Preset 10 that could be plugged in-line between the PC and the widget. I could then record several dmx snapshots that could be used as backup looks if the PC console fails. Any suggestions or recommendations would be greatly appreciated.

    Thanks all. :hogsign:
    Laura Bickford
    Lighting Supervisor
    Bill T. Jones/Arnie Zane Dance Company
    lbickford@billtjones.org
  • While I don't have anything to add to the implementation of the fail-over/backup system, I think it would also be good to consider what the actual DMX output is during fail-overs, saves, backups, etc.

    I would really like to have the option of setting a 'last look' or 'panic look' for the DP to catch if it goes into a reset. If possible, it would be nice to see the DP output a static 'panic look' or the last values output prior to the system rebooting.

    Example:

    If I'm in a rehearsal and I have to back up the show or reboot the console, the biggest hiccup for me is the stage lighting having a 'hiccup' or having to go dark. One way I avoid this is to record a panic look into a local DataLynx and activate that during system resets and backups.

    As the nodes become more distributed in larger systems, this becomes very difficult.

    It would be nice if I could tell the console/DPs to freeze a look during reboot or back-up.


    Bonus Prize: Have the console alert you with a pop-up (upon reboot) that there is a look frozen, with the options to cross-fade into the console's current state (with a fade time) or remain in the frozen look until a later time (if you still think there may be a problem at the console).

    Any thoughts?

    Phil
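
The "Bonus Prize" idea in the post above, cross-fading from a frozen look into the console's current state over a fade time, comes down to a linear interpolation per DMX channel. The sketch below assumes plain lists of 0-255 channel levels and a hypothetical `crossfade` helper; it is not how the console or DP computes fades.

```python
from typing import List, Sequence


def crossfade(
    frozen: Sequence[int],
    live: Sequence[int],
    elapsed_s: float,
    fade_s: float,
) -> List[int]:
    """Blend the frozen look into the live look, channel by channel."""
    if len(frozen) != len(live):
        raise ValueError("frozen and live frames must have the same length")
    if fade_s <= 0:
        progress = 1.0  # a zero fade time snaps straight to the live look
    else:
        # Clamp so the output holds at the live look once the fade is done.
        progress = min(max(elapsed_s / fade_s, 0.0), 1.0)
    return [round(f + (l - f) * progress) for f, l in zip(frozen, live)]
```

Choosing to "remain in the frozen look" is then just a matter of not starting the clock: as long as `elapsed_s` stays at 0, the output equals the frozen frame.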
  • I like your "Bonus Prize" idea, Phil.

    FYI - You can set a DataLynx to "Auto Backup" (under "Receive Mode" - "Auto-Backup" "ENTER"), which means it will automatically keep the last known good DMX as Panic Scene #1 and activate it if DMX is lost. This helps if your DP and DataLynx live in a rack in dimmer beach, for example. You do need to hit the "Exit" key to get out of this mode though.

    Also, my experience has been that if a DP loses communication with the console, it will "freeze" its last known good DMX. It will drop this, though, once its loading processes have finished after communication is restored. If you are able to fire off cues before the loading processes finish, the DMX will "jump" to the new look. There is no good way to time this, but it beats going to black when you don't want to. :hogsign:
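
The hold-last-look behavior described in the post above (keep the last good DMX frame, activate it when the signal is lost, stay there until the operator exits) can be modeled roughly as follows. This is an illustration of the behavior as the post describes it, not DataLynx or DP firmware; the class name, the 1 s timeout, and the `exit_hold` method are assumptions.

```python
from typing import List, Optional, Sequence


class HoldLastLook:
    """Keep the last good frame and latch onto it when input goes silent."""

    def __init__(self, timeout_s: float = 1.0) -> None:
        self.timeout_s = timeout_s            # assumed loss-detection window
        self.last_frame: Optional[List[int]] = None
        self.last_rx: Optional[float] = None
        self.holding = False                  # latched until exit_hold()

    def _check_timeout(self, now: float) -> None:
        if self.last_rx is not None and now - self.last_rx > self.timeout_s:
            self.holding = True

    def on_frame(self, frame: Sequence[int], now: float) -> None:
        # While holding, incoming frames are ignored so the look cannot jump.
        self._check_timeout(now)
        if self.holding:
            return
        self.last_frame = list(frame)
        self.last_rx = now

    def output(self, now: float) -> Optional[List[int]]:
        # Always emit the stored frame; None until a first frame has arrived.
        self._check_timeout(now)
        return None if self.last_frame is None else list(self.last_frame)

    def exit_hold(self) -> None:
        # Operator action: leave the held look and follow the input again.
        self.holding = False
        self.last_rx = None
```

Latching the hold avoids the "jump" to a new look when the console comes back; the trade-off is that someone, or some automated step, has to release it.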