
Building Windows Server Failover Cluster on Azure IaaS VM – Part 2 (Network and Creation)


Hello, cluster fans. In my previous blog, Part 1, I talked about how to work around the storage blocker in order to implement a Windows Server Failover Cluster on an Azure IaaS VM. Now let's discuss another important part – networking for a Cluster on Azure.

Before that, you should know some basic concepts of Azure networking. Here are a few Azure terms we need to use to set up the Cluster.

VIP (Virtual IP address): A public IP address that belongs to the cloud service. It also serves as the address of the Azure Load Balancer, which determines how network traffic is directed before being routed to the VM.

DIP (Dynamic IP address): An internal IP assigned by Microsoft Azure DHCP to the VM.

Internal Load Balancer: It is configured to port-forward or load-balance traffic inside a VNET or cloud service to different VMs.

Endpoint: It associates a VIP/DIP + port combination on a VM with a port on either the Azure Load Balancer for public-facing traffic or the Internal Load Balancer for traffic inside a VNET (or cloud service).

You can refer to this blog for more details about these Azure networking terms:

VIPs, DIPs and PIPs in Microsoft Azure
http://blogs.msdn.com/b/cloud_solution_architect/archive/2014/11/08/vips-dips-and-pips-in-microsoft-azure.aspx

OK, enough reading. Storage is ready and we know the basics of Azure networking, so can we start building the Cluster? Yes!

Instead of using Failover Cluster Manager, the preferred method is to use the New-Cluster PowerShell cmdlet and specify a static IP during Cluster creation. When doing it this way, you can add all the nodes and use the proper IP Address from the get-go, without the extra steps required in Failover Cluster Manager.

Take the above environment as an example:

New-Cluster -Name DEMOCLUSTER -Node node1,node2 -StaticAddress 10.0.0.7

Note: The static IP Address that you assign to the CNO is not for network communication. Its only purpose is to bring the CNO online to satisfy the dependency. Therefore, you cannot ping that IP, cannot resolve its DNS name, and cannot use the CNO for management, since its IP is an unusable IP.

If for some reason you do not want to use PowerShell, or you used Failover Cluster Manager instead, there are additional steps that you must take.  The difference with FCM versus PowerShell is that you need to create the Cluster with one node and add the other nodes afterwards. This is because the Cluster Name Object (CNO) cannot come online, since it cannot acquire a unique IP Address from the Azure DHCP service. Instead, the IP Address assigned to the CNO is a duplicate of the address of the node that owns the CNO. That IP fails as a duplicate and can never be brought online. This eventually causes the Cluster to lose quorum, because the nodes cannot properly connect to each other. To prevent the Cluster from losing quorum, you start with a one-node Cluster, let the CNO's IP Address fail, and then manually set up the IP address.

Example:

The CNO DEMOCLUSTER is offline because the IP Address it depends on has failed. 10.0.0.4 is the VM's DIP, which is the address the CNO's IP duplicated.


In order to fix this, we will need to go into the properties of the IP Address resource and change the address to another address in the same subnet that is not currently in use, for example, 10.0.0.7.

To change the IP address, right-click on the IP Address resource, choose Properties, and specify the new 10.0.0.7 address.


Once the address is changed, right-click on the Cluster Name resource and bring it online.
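If you prefer PowerShell for this step, the equivalent is along these lines (a sketch; "Cluster IP Address" and "Cluster Name" are the default resource names, so verify yours with Get-ClusterResource first):

Get-ClusterResource "Cluster IP Address" | Set-ClusterParameter -Name Address -Value 10.0.0.7
Start-ClusterResource "Cluster Name"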


Now that these two resources are online, you can add more nodes to the Cluster.
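For example, a sketch using the node name from this environment:

Add-ClusterNode -Name node2 -Cluster DEMOCLUSTER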

Now you've successfully created a Cluster. Let's add a highly available role to it. For demo purposes, I'll use the File Server role as an example, since it is the most common role and one most of us understand.

Note: In a production environment, we do not recommend a File Server Cluster in Azure because of cost and performance. Take this example as a proof of concept.

Unlike a Cluster on-premises, I recommend that you pause all other nodes and keep only one node up. This prevents the new File Server role from moving among the nodes, since the file server's VCO (Virtual Computer Object) will automatically be assigned a duplicate of the IP Address of the node that owns the VCO. That IP Address fails and keeps the VCO from coming online on any node. This is the same scenario as for the CNO that we just talked about.
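To pause the other nodes with PowerShell, a minimal sketch (assuming node2 is the node to pause) looks like this:

# Pause node2 so the new role stays on the current node
Suspend-ClusterNode -Name node2
# After the role is created and its IP Address resource is fixed, resume it
Resume-ClusterNode -Name node2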

The VCO DEMOFS won't come online because its IP Address resource is in a failed state. This is expected, because the dynamic IP address duplicates the IP of the owner node.

After manually editing the IP to an unused static address, 10.0.0.8 in this example, the whole resource group comes online.

But remember, that IP Address is the same kind of unusable IP address as the CNO's IP. You can use it to bring the resource online, but it is not a real IP for network communication. If this is a File Server, none of the VMs except the owner node of this VCO can access the File Share.  The way Azure networking works is that it will loop the traffic back to the node it originated from.

Show time starts. We need to utilize the Load Balancer in Azure so this IP Address can communicate with other machines and carry the client-server traffic.

Load Balancer is an Azure IP resource that can route network traffic to different Azure VMs. The IP can be a public-facing VIP, or internal only, like a DIP. Each VM needs to have the endpoint(s) so the Load Balancer knows where the traffic should go. In an endpoint, there are two kinds of ports. The first is a regular port, used for normal client-server communications. For example, port 445 is for SMB file sharing, port 80 is HTTP, port 1433 is for MSSQL, etc. The other kind is a probe port; the default port number for this is 59999. The probe port's job is to find out which node in the Cluster actively hosts the VCO. The Load Balancer sends probe pings over TCP port 59999 to every node in the cluster, by default every 10 seconds. When you configure a role in a Cluster on an Azure VM, you need to know what port(s) the application uses, because you will need to add the port(s) to the endpoint. Then, you add the probe port to the same endpoint. After that, you need to update the parameters of the VCO's IP address with that probe port. Finally, the Load Balancer performs its port-forwarding task and routes the traffic to the VM that owns the VCO. As of this writing, all of the above settings must be completed using PowerShell.

Note: At the time of this blog (written and posted), Microsoft supports only one resource group per cluster on Azure, in an Active/Passive model only. This is because the VCO's IP can only use the Cloud Service IP address (VIP) or the IP address of the Internal Load Balancer. This limitation is still in effect, although Azure now supports the creation of multiple VIP addresses in a given Cloud Service.

Here is how an Internal Load Balancer (ILB) in a Cluster puts the above theory into practice:

The application in this Cluster is a File Server, which is why we use port 445 and make the IP for the VCO (10.0.0.8) the same as the ILB's. There are three steps to configure this:

Step 1: Add the ILB to the Azure cloud service.

Run the following PowerShell commands on your on-premises machine which can manage your Azure subscription.

# Define variables.

$ServiceName = "demovm1-3va468p3" # the name of the cloud service that contains the VM nodes. Your cloud service name is unique; use the Azure portal or Get-AzureVM to find it.

$ILBName = "DEMOILB" # newly chosen name for the new ILB

$SubnetName = "Subnet-1" # subnet name that the VMs use in the VNet

$ILBStaticIP = "10.0.0.8" # static IP address for the ILB in the subnet

# Add Azure ILB using the above variables.

Add-AzureInternalLoadBalancer -InternalLoadBalancerName $ILBName -SubnetName $SubnetName -ServiceName $ServiceName -StaticVNetIPAddress $ILBStaticIP

# Check the settings.

Get-AzureInternalLoadBalancer -ServiceName $ServiceName


Step 2: Configure the load balanced endpoint for each node using ILB.

Run the following PowerShell commands on your on-premises machine which can manage your Azure subscription.

# Define variables.

$VMNodes = "DEMOVM1", "DEMOVM2" # the cluster nodes' names, separated by commas. Your nodes' names will be different.

$EndpointName = "SMB" # newly chosen name of the endpoint

$EndpointPort = "445" # public port to use for the endpoint; 445 is for SMB file sharing. If the cluster is used for another purpose, e.g., HTTP, change the port number to 80.

# Add an endpoint with port 445 and probe port 59999 to each node. This will take a few minutes to complete. Pay attention to the ProbeIntervalInSeconds parameter; it determines how often the Load Balancer probes to find out which node is active.

ForEach ($node in $VMNodes)
{
    Get-AzureVM -ServiceName $ServiceName -Name $node | Add-AzureEndpoint -Name $EndpointName -LBSetName "$EndpointName-LB" -Protocol tcp -LocalPort $EndpointPort -PublicPort $EndpointPort -ProbePort 59999 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName $ILBName -DirectServerReturn $true | Update-AzureVM
}

# Check the settings.

ForEach ($node in $VMNodes)
{
    Get-AzureVM -ServiceName $ServiceName -Name $node | Get-AzureEndpoint | Where-Object {$_.Name -eq "SMB"}
}

Step 3: Update the parameters of VCO’s IP address with Probe Port.

Run the following commands from one of the cluster nodes if you are using Windows Server 2008 R2.

# Define variables

$ClusterNetworkName = "Cluster Network 1" # the cluster network name (use Get-ClusterNetwork or the GUI to find the name)

$IPResourceName = "IP Address 10.0.0.0" # the IP Address resource name (use Get-ClusterResource | Where-Object {$_.ResourceType -eq "IP Address"} or the GUI to find the name)

$ILBIP = "10.0.0.8" # the IP Address of the Internal Load Balancer (ILB)

# Update cluster resource parameters of VCO’s IP address to work with ILB.

cluster res $IPResourceName /priv enabledhcp=0 overrideaddressmatch=1 address=$ILBIP probeport=59999  subnetmask=255.255.255.255

Run the following PowerShell commands inside one of the cluster nodes if you are using Windows Server 2012/2012 R2.

# Define variables

$ClusterNetworkName = "Cluster Network 1" # the cluster network name (Use Get-ClusterNetwork or GUI to find the name)

$IPResourceName = "IP Address 10.0.0.0" # the IP Address resource name (use Get-ClusterResource | Where-Object {$_.ResourceType -eq "IP Address"} or the GUI to find the name)

$ILBIP = "10.0.0.8" # the IP Address of the Internal Load Balancer (ILB)


$params = @{"Address"="$ILBIP";
          "ProbePort"="59999";
          "SubnetMask"="255.255.255.255";
          "Network"="$ClusterNetworkName";
          "OverrideAddressMatch"=1; 
          "EnableDhcp"=0}

# Update cluster resource parameters of VCO’s IP address to work with ILB

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple $params

You should see a warning that the properties were stored, but that changes will not take effect until the IP Address resource is taken offline and then online again.

Take the IP Address resource offline and bring it online again. Start the clustered role.

Now you have an Internal Load Balancer working with the VCO's IP. One last task involves the Windows Firewall: you need to at least open port 59999 on all nodes for probe port detection, or turn the firewall off. Then you should be all set. It may take about 10 seconds to establish the connection to the VCO the first time, or after you fail the resource group over to another node, because of the ProbeIntervalInSeconds we set up previously.
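As a sketch, the firewall rule can be created like this on Windows Server 2012/2012 R2 (the rule name is my own choice); on Windows Server 2008 R2, netsh accomplishes the same thing:

New-NetFirewallRule -DisplayName "Azure LB Probe Port" -Direction Inbound -Protocol TCP -LocalPort 59999 -Action Allow
netsh advfirewall firewall add rule name="Azure LB Probe Port" dir=in action=allow protocol=TCP localport=59999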

In this example, the VCO has an internal IP of 10.0.0.8. If you want to make your VCO public-facing, you can use the Cloud Service's IP Address (VIP). The steps are similar and easier, because you can skip Step 1; the VIP is already an Azure Load Balancer. You just need to add the endpoint with a regular port plus the probe port to each VM (Step 2), then update the VCO's IP in the Cluster (Step 3). Please be aware that your clustered resource group will be exposed to the Internet, since the VCO has a public IP. You may want to protect it with additional security measures.

Great! Now you've completed all the steps of building a Windows Server Failover Cluster on an Azure IaaS VM. It is a bit of a longer journey; however, you'll find it useful and worthwhile. Please leave me comments if you have questions.

Happy Clustering!

Mario Liu
Support Escalation Engineer
CSS Americas | WINDOWS | HIGH AVAILABILITY  


Windows Server Failover Cluster on Azure IaaS VM – Part 1 (Storage)


Hello, cluster fans. This is Mario Liu and I am a Support Escalation Engineer on the Windows High Availability team in Microsoft CSS Americas. I have good news for you: starting in April 2015, Microsoft supports Windows Server Failover Cluster (WSFC) on Azure IaaS Virtual Machines. Here is the supportability announcement for Windows Server on Azure VMs:

Microsoft server software support for Microsoft Azure virtual machines
https://support.microsoft.com/en-us/kb/2721672

The Failover Cluster feature is part of that announcement. The above Knowledge Base article is subject to change as more improvements for WSFC on Azure IaaS VMs are made, so please check the above link for the latest updates.

Today, I'd like to share the main differences between deploying WSFC on-premises and deploying it within Azure. First, the Azure VM operating system must be Windows Server 2008 R2, Windows Server 2012, or Windows Server 2012 R2.  Please note that Windows Server 2008 R2 and 2012 both require this hotfix to be installed.

At a high level, the Failover Cluster feature does not change inside the VM and is still a standard Server OS feature. The challenges are outside, and they relate to Storage and Network. In this blog, I will be discussing Storage.

The biggest challenge to implementing Failover Clustering in Azure is that Azure does not provide native shared block storage to VMs, unlike on-premises options such as Fibre Channel SAN, SAS, or iSCSI. That limits the primary use case scenario in Azure to SQL Server AlwaysOn Availability Groups (AG), since SQL AG does not utilize shared storage. Instead, it leverages its own replication at the application layer to replicate the SQL data across the Azure IaaS VMs.


Now we have a few more options to work around the shared storage limitation, and that is how we can expand the scenarios beyond SQL AlwaysOn.

Option 1: Application-level replication for non-shared storage

Some applications leverage replication through their own means at the application layer.  SQL Server AlwaysOn Availability Groups uses this method.

Option 2: Volume-level replication for non-shared storage

In other words, 3rd-party storage replication.


A common 3rd-party solution is SIOS DataKeeper Cluster Edition. There are other solutions on the market; this is just one example. For more details, please check SIOS's website:

DataKeeper Cluster Edition: Real-Time Replication of Windows Server Environments
http://us.sios.com/products/datakeeper-cluster/

Option 3: Leverage ExpressRoute for remote iSCSI Target shared block storage presented to Azure IaaS VMs

ExpressRoute is an Azure-exclusive feature. It enables you to create dedicated private connections between Azure datacenters and infrastructure that's on your premises. It provides high-throughput network connectivity to guarantee that disk performance won't be degraded.

One of the existing examples is NetApp Private Storage (NPS).  NPS exposes an iSCSI Target via ExpressRoute with Equinix to Azure IaaS VMs.

Availability on Demand - ASR with NetApp Private Storage
http://channel9.msdn.com/Blogs/Windows-Azure/Availability-on-Demand-ASR-with-NetApp-Private-Storage


For more details about ExpressRoute, please see

ExpressRoute
http://azure.microsoft.com/en-us/services/expressroute/

There will be more options to present "shared storage" to Failover Clusters as new scenarios appear in the future. We'll update this blog along with the KB once new announcements become available. Once you solve storage, you've built the foundation of the Cluster.

In my next blog, Part 2, I’ll go through the network part and the creation of a Cluster.

Stay tuned and enjoy the Clustering in Azure!

Mario Liu
Support Escalation Engineer
CSS Americas | WINDOWS | HIGH AVAILABILITY

Unable to add file shares in a Windows 2012 R2 Failover Cluster


My name is Chinmoy Joshi and I am a Support Escalation Engineer with the Windows Core team. I’m writing today to share information regarding an issue which I came across with multiple customers recently.

Consider a two-node 2012 R2 Failover Cluster using shared disks to host a File Server role. To add shares to the File Server role, select the role and right-click on it to get the Add File Share option. The Add File Share option is also available along the far right column. Upon doing this, you may receive the error "There were errors retrieving file shares", or the Add Share wizard gets stuck with "Unable to retrieve all data needed to run the wizard".


When starting the Add Share wizard, it is going to try to enumerate all current shares on the node and across the Cluster. There can be multiple reasons why Failover Cluster Manager would throw these errors. We will be covering two of the known scenarios that can cause this.

Scenario 1:

Domain users/admins can be part of nested groups; meaning, they are in a group that is part of another group. As part of security, a token header is passed, and that header can become bloated. Bloated headers can occur when the user/admin is part of nested groups, or has been migrated from an old domain to a new domain carrying older SIDs. In our case, the domain user was a member of a large number of Active Directory groups. There are three ways to resolve this:

A)  Reduce the number of Active Directory groups the user is a member of,
B)  Clean up the SID history, or
C)  Modify the HTTP service registry with the following registry values:

Caution: Please backup the registry before modifying in case you need to revert the changes.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters
"MaxFieldLength"=dword:0000fffe
"MaxRequestBytes"=dword:00010000

Note that these keys may not be there, so they will need to be created.
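If you would rather script the change, a minimal PowerShell sketch (using the values above; back up the registry first and reboot afterwards) is:

$path = "HKLM:\SYSTEM\CurrentControlSet\Services\HTTP\Parameters"
# -Force creates the values if missing and overwrites them if present
New-ItemProperty -Path $path -Name MaxFieldLength -PropertyType DWord -Value 0xfffe -Force
New-ItemProperty -Path $path -Name MaxRequestBytes -PropertyType DWord -Value 0x10000 -Force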

Here, the HTTP service uses Kerberos for authentication, and the generated token header was too large, throwing an error.  When this is the case, you will see the following event:

Log Name: Microsoft-Windows-FileServices-ServerManager-EventProvider/Operational
Source: Microsoft-Windows-FileServices-ServerManager-EventProvider
Event ID: 0
Level: Error
Description: Exception: Caught exception Microsoft.Management.Infrastructure.CimException: The WinRM client received an HTTP bad request status (400), but the remote service did not include any other information about the cause of the failure.
   at Microsoft.Management.Infrastructure.Internal.Operations.CimSyncEnumeratorBase`1.MoveNext()
   at Microsoft.FileServer.Management.Plugin.Services.FSCimSession.PerformQuery(String cimNamespace, String queryString)
   at Microsoft.FileServer.Management.Plugin.Services.ClusterEnumerator.RetrieveClusterConnections(ComputerName serverName, ClusterMemberTypes memberTypeToQuery)

References:

Problems with Kerberos authentication when a user belongs to many groups

http://blogs.technet.com/b/askds/archive/2012/09/12/maxtokensize-and-windows-8-and-windows-server-2012.aspx

http://blogs.technet.com/b/surama/archive/2009/04/06/kerberos-authentication-problem-with-active-directory.aspx

Scenario 2:

The second most popular reason for not being able to create file shares is the WinRM policy being enabled with only the IPv4 filter set.

To see if it is set on the Cluster nodes, open the Local Group Policy Editor (you can also get there via Local Security Policy from Administrative Tools or Server Manager). The policy is located at:

Local Computer Policy
Computer Configuration
Administrative Templates
Windows Components
Windows Remote Management (WinRM)
WinRM Service
Allow remote server management through WinRM

If it is enabled, open that policy up and check to see if the box for IPv6 has an asterisk in it.


You will run into this error if only IPv4 is selected.  To resolve this, you need to either disable the policy or also add an asterisk for IPv6.  For the change to take effect, you will need to reboot the system.  After the reboot, go back into the Group Policy Editor to see if the setting has been reverted.  If it has, you will need to check your domain policies and make the change there.

Hope this helps you save time in resolving the issue, Good Luck!!

Chinmoy Joshi
Support Escalation Engineer

CROSS POST: How Shared VHDX Works on Server 2012 R2


A while back, Matthew Walker wrote a blog that we felt needed to be on the AskCore site as well, due to the nature and popularity of the article, so we are cross-posting it here.  Please keep in mind that the latest changes/updates will be in the original blog post.

CROSS POST: How Shared VHDX Works on Server 2012 R2
http://blogs.technet.com/b/askpfeplat/archive/2015/06/01/how-shared-vhdx-works-on-server-2012-r2.aspx

Hi, Matthew Walker here, I’m a Premier Field Engineer here at Microsoft specializing in Hyper-V and Failover Clustering. In this blog I wanted to address creating clusters of VMs using Microsoft Hyper-V with a focus on Shared VHDX files.

From the advent of Hyper-V, we have supported creating clusters of VMs; however, the means of adding shared storage has changed. In Windows 2008/R2 we only supported using iSCSI for shared volumes; with Windows Server 2012 we added the capability to use virtual Fibre Channel and SMB file shares, depending on the workload; and finally, in Windows Server 2012 R2, we added shared VHDX files.

Shared Storage for Clustered VMs:

Windows Version           2008/R2   2012   2012 R2
iSCSI                     Yes       Yes    Yes
Virtual Fibre Channel     No        Yes    Yes
SMB File Share            No        Yes    Yes
Shared VHDX               No        No     Yes

So this provides a great deal of flexibility when creating clusters that require shared storage with VMs. Not all clustered applications or services require shared storage, so review the requirements of your app. Clusters that might require shared storage would be file server clusters, traditional clustered SQL instances, or Distributed Transaction Coordinator (MSDTC) instances. Now to decide which option to use. These solutions all work with live migration, but not with items like VM checkpoints, host-based backups, or VM replication, so they are pretty even there. If there is an existing infrastructure with an iSCSI or FC SAN, then one of those two may make more sense, as it works well with the existing processes for allocating storage to servers. SMB file shares work well, but only for a few workloads, as the application has to support data residing on a UNC path. This brings us to Shared VHDX.

Available Options:

Hyper-V Capability       Shared VHDX used   iSCSI Drives   Virtual Fibre Channel Drives   SMB Shares used in VM   Non-Shared VHD/X used
Host based backups       No                 No             No                             No                      Yes
Snapshots/Checkpoints    No                 No             No                             No                      Yes
VM Replication           No                 No             No                             No                      Yes
Live Migration           Yes                Yes            Yes                            Yes                     Yes

Shared VHDX files are attached to the VMs via a virtual SCSI controller, so they show up in the OS as shared SAS drives; they can be shared with multiple VMs, so you aren't restricted to a two-node cluster. There are some prerequisites to using them, however.

Requirements for Shared VHDX:

2012 R2 Hyper-V hosts
Shared VHDX files must reside on Cluster Shared Volumes (CSV)
SMB 3.02

It may be possible to host a shared VHDX on a vendor NAS if that appliance supports SMB 3.02 as defined in Windows Server 2012 R2. Just because a NAS supports SMB 3.0 is not sufficient; check with the vendor to ensure they support the shared VHDX components and that you have the correct firmware revision to enable that capability. Information on the different versions of SMB and their capabilities is documented in a blog by Jose Barreto that can be found here.

Adding Shared VHDX files to a VM is relatively easy, through the settings of the VM you simply have to select the check box under advanced features for the VHDX as below.


For SCVMM you have to deploy it as a service template and select to share the VHDX across the tier for that service template.


And of course you can use PowerShell to create and share the VHDX between VMs.

PS C:\> New-VHD -Path C:\ClusterStorage\Volume1\Shared.VHDX -Fixed -SizeBytes 30GB

PS C:\> Add-VMHardDiskDrive -VMName Node1 -Path C:\ClusterStorage\Volume1\Shared.VHDX -ShareVirtualDisk

PS C:\> Add-VMHardDiskDrive -VMName Node2 -Path C:\ClusterStorage\Volume1\Shared.VHDX -ShareVirtualDisk

Pretty easy right?

At this point you can set up the disks as normal in the VM and add them to your cluster, then install whatever application is to be clustered in your VMs; if you need to, you can add additional nodes to scale out your cluster.
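Inside the guests, that setup is the same as for any new shared disk; a sketch, assuming the shared VHDX shows up as disk 1, is:

# Run inside a guest VM: bring the disk online, initialize and format it, then add it to the cluster
Set-Disk -Number 1 -IsOffline $false
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -UseMaximumSize -AssignDriveLetter | Format-Volume -FileSystem NTFS -Confirm:$false
Get-ClusterAvailableDisk | Add-ClusterDisk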

Now that things are all set up, let's look at the underlying architecture to see how we can get the best performance from our setup. Before we can get into the shared VHDX scenarios, we first need a brief look at how CSV works in general. If you want a more detailed explanation, please refer to Vladimir Petter's excellent blogs, starting with this one.


To simplify the way we handle data flow for CSV: access to the shared storage in this clustered environment is handled through the Cluster Shared Volume File System (CSVFS) filter driver and supporting components, and this system handles how we access the underlying storage. Because CSV is a clustered file system, we need this orchestration of file access. When possible, I/O travels a direct path to the storage, but if that is not possible, we redirect it over the network to a coordinator node. The coordinator node shows up in Failover Cluster Manager as the owner of the CSV.

With Shared VHDX we also have to have orchestration of shared file access. To achieve this, all I/O requests for a Shared VHDX are centralized and funneled through the coordinator node for that CSV. This means I/O from VMs on hosts other than the coordinator node is redirected to the coordinator. This is different from a traditional VHD or VHDX file that is not shared.
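You can check, and if necessary move, the coordinator from PowerShell; a quick sketch (the CSV and node names are hypothetical):

Get-ClusterSharedVolume | Select-Object Name, OwnerNode
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node Node1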

First let's look at this from the perspective of a Hyper-V compute cluster using a Scale-Out File Server as our storage. For the following examples I have simplified things by bringing it down to two nodes, and we will follow the data path from the VM that currently owns our clustered workload. I am making some assumptions: the workload being clustered is configured in an Active/Passive configuration with a single shared VHDX file, and we are only concerned with the data flow to that single file from one node or the other. For simplicity, I have called the VMs Active and Passive just to indicate which one owns the Shared VHDX in the clustered VMs and is transferring I/O to the storage where the shared VHDX resides.


So we have Node 1 in our Hyper-V cluster accessing the Shared VHDX over SMB, connecting to the coordinator node of the Scale-Out File Server cluster (SOFS). Now let's move the active workload.


Even when we move the active workload, SMB and the CSVFS drivers will connect to the coordinator node in the SOFS cluster, so in this configuration our performance is going to be consistent. Ideally you should have high-speed connections between your SOFS nodes and on the network connections used by the Hyper-V compute nodes to access the shares: 10 Gb NICs or even RDMA NICs. Some examples of RDMA NICs are InfiniBand, iWARP, and RDMA over Converged Ethernet (RoCE) NICs.

Now we change things up a bit and move the compute onto the same servers that are hosting the storage.


As you can see, access to the VHDX is sent through the CSVFS and SMB drivers to the storage, and everything works like we expect as long as the active VM of the clustered VMs is on the same node as the coordinator node of the underlying CSV. So now let's look at how the data flows when the active VM is on a different node.


Here things take a different path than we might expect. Since SMB and CSVFS are an integral part of ensuring properly orchestrated access to the Shared VHDX, we send the data across the interconnects between the cluster nodes rather than straight down to storage. This can have a significant impact on your performance, depending on how you have scaled your connections.

If the direct access to storage is a 4Gb fibre connection and the interconnect between nodes is a 1Gb connection, there is going to be a serious difference in performance when the active workload is not on the same node that owns the CSV. This is exacerbated when we have 8Gb or 10Gb bandwidth to storage and the interconnects between nodes are only 1Gb. To help mitigate this behavior, make sure to scale up your cluster interconnects to match, using options such as 10 Gb NICs, SMB Multichannel, and/or RDMA-capable devices that will improve your bandwidth between the nodes.

One final note addresses scenarios where you may have an application active on multiple clustered VMs that are accessing the same Shared VHDX file: the same data paths described above apply, whether the compute and storage nodes are separate or everything runs together on the same servers.

So we can even implement a Scale-Out File Server or other multi-access scenarios using clustered VMs.

So the big takeaway here is about understanding the architecture: knowing when you will see certain types of performance, and how to set proper expectations based on where and how we access the final storage repository for the shared VHDX. By moving some of the responsibility for handling access to the VHDX to SMB and CSVFS, we get a more flexible architecture and more options; but without proper planning and an understanding of how it works, there can be significant differences in performance based on what type of separation there is between the compute side and the storage side. For the best performance, ensure you have high-speed and high-bandwidth interconnects from the running VM all the way to the final storage by using 10 Gb or RDMA NICs, and try to take advantage of SMB Multichannel.

--- Matthew Walker

So what exactly is the CLIUSR account?


From time to time, people stumble across the local user account called CLIUSR and wonder what it is. While you really don't need to worry about it, we will cover it for the curious in this blog.

The CLIUSR account is a local user account created by the Failover Clustering feature when it is installed on Windows Server 2012 or later. Well, that's easy enough, but why is this account here? Taking a step back, let's take a look at why we are using this account.

In Windows Server 2003 and previous versions of the Cluster Service, a domain user account was used to start the Cluster Service. This Cluster Service Account (CSA) was used for forming the Cluster, joining a node, registry replication, etc. Basically, any kind of authentication that was done between nodes used this user account as a common identity.

A number of support issues were encountered as domain administrators were pushing down group policies that stripped rights away from domain user accounts, not taking into consideration that some of those user accounts were used to run services. An example of this is the Log on as a service right. If the Cluster Service account did not have this right, it was not going to be able to start the Cluster Service. If you were using the same account for multiple clusters, you could incur production downtime across a number of critical systems. You also had to deal with password changes in Active Directory: if you changed the user account's password in AD, you also needed to change the password across all Clusters/nodes that use the account.

In Windows Server 2008, we learned from this and redesigned the way we start the service to make it more resilient, less error-prone, and easier to manage. We started using the built-in Network Service to start the Cluster Service. Keep in mind that this is not the full-blown account, just a reduced privilege set. Changing to this reduced account was a solution for the group policy issues.

For authentication purposes, we switched over to using the computer object associated with the Cluster Name, known as the Cluster Name Object (CNO), as the common identity. Because the CNO is a machine account in the domain, its password will automatically rotate as defined by the domain's policy (which is every 30 days by default).

Great!! No more domain user account and its password changes we have to account for. No more trying to remember which Cluster was using which account. Yes!! Ah, not so fast my friend. While this solved some major pain, it did have some side effects.

Starting in Windows Server 2008 R2, admins started virtualizing everything in their datacenters, including domain controllers. Cluster Shared Volumes (CSV) was also introduced and became the standard for private cloud storage. Some admins completely embraced virtualization and virtualized every server in their datacenter, going so far as to add domain controllers as virtual machines to a Cluster and utilize the CSV drive to hold the VHD/VHDX of the VMs.

This created a “chicken or the egg” scenario that many companies ended up in. In order to mount the CSV drive to get to the VMs, you had to contact a domain controller to get the CNO. However, you couldn’t start the domain controller because it was running on the CSV.

Having slow or unreliable connectivity to domain controllers also had an effect on I/O to CSV drives. CSV does intra-cluster communication via SMB, much like connecting to file shares. To connect with SMB, it needs to authenticate, and in Windows Server 2008 R2, that involved authenticating the CNO with a remote domain controller.

For Windows Server 2012, we had to think about how we could take the best of both worlds and get around some of the issues we were seeing. We still use the reduced Network Service privilege to start the Cluster Service, but now, to remove all external dependencies, we use a local (non-domain) user account for authentication between the nodes.

This local “user” account is not an administrative account or domain account. This account is automatically created for you on each of the nodes when you create a cluster or on a new node being added to the existing Cluster. This account is completely self-managed by the Cluster Service and handles automatically rotating the password for the account and synchronizing all the nodes for you. The CLIUSR password is rotated at the same frequency as the CNO, as defined by your domain policy (which is every 30 days by default). With it being a local account, it can authenticate and mount CSV so the virtualized domain controllers can start successfully. You can now virtualize all your domain controllers without fear. So we are increasing the resiliency and availability of the Cluster by reducing external dependencies.

This account is the CLIUSR account, and it is identified by its description.
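For example, you can list the account from an elevated prompt on any node and look at the description field:

net user CLIUSR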


One question that we get asked is if the CLIUSR account can be deleted. From a security standpoint, additional local accounts (not default) may get flagged during audits. If the network administrator isn’t sure what this account is for (i.e. they don’t read the description of “Failover Cluster Local Identity”), they may delete it without understanding the ramifications. For Failover Clustering to function properly, this account is necessary for authentication.

The node join process works like this:

1. The joining node starts the Cluster Service and passes the CLIUSR credentials across.

2. Everything passes, so the node is allowed to join.

There is one extra safeguard we added to ensure continued success: if you accidentally delete the CLIUSR account, it will be recreated automatically when a node tries to join the Cluster.

Short story: the CLIUSR account is an internal component of the Cluster Service. It is completely self-managing, and there is nothing you need to worry about regarding configuring and managing it. So leave it alone and let it do its job.

In Windows Server 2016, we take this a step further by leveraging certificates to allow Clusters to operate without any external dependencies of any kind. This allows you to create Clusters out of servers that reside in different domains, or in no domain at all. But that's a blog for another day.

Hopefully, this answers any questions you have regarding the CLIUSR account and its use.

Enjoy,
John Marlin
Senior Support Escalation Engineer
Microsoft Enterprise Cloud Group

How to convert Windows 10 Pro to Windows 10 Enterprise using ICD


Windows 10 makes life easier and brings a lot of benefits to the enterprise world. Converting Windows 10 editions without an ISO image or DVD is one such benefit. In this blog, we'll take the example of upgrading the Windows 10 Professional edition to the Windows 10 Enterprise edition.

Let's consider a scenario wherein you purchase a few computers. These computers come pre-installed with Windows 10 Pro, and you would like to convert them to Windows 10 Enterprise.

The simpler way is to run changepk.exe on each machine from an elevated command prompt. A wizard shows up, and once we follow the prompts, the machine gets converted to Enterprise with very little effort.

The second option is to use the Windows Imaging and Configuration Designer (ICD). You can get the Windows ICD as part of the Windows 10 Assessment and Deployment Kit (ADK), which is available for download here.

With the help of ICD, admins can create a provisioning package (.ppkg) that can help configure Wi-Fi networks, add certificates, connect to Active Directory, enroll a device in Mobile Device Management (MDM), and even update the Windows 10 edition – all without the need to format the drive and reinstall Windows.

Install Windows ICD from the Windows 10 ADK

The Windows ICD relies on some other tools in the ADK kit, so you need to select the options to install the following:

  • Deployment Tools
  • Windows Preinstallation Environment (Windows PE)
  • Imaging and Configuration Designer (ICD)

Before proceeding any further, let's ensure you meet the prerequisite: you must have the required licenses to install Windows 10 Enterprise.

The below steps require KMS license keys; you cannot use MAK license keys to convert. Since you are using KMS keys for the conversion, you need a KMS host capable of activating Windows 10 computers, or you will need to change to a MAK key after the upgrade is complete.
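For reference, the KMS-based edition change that the provisioning package performs can also be done manually with slmgr (a sketch; substitute the published KMS client setup key for Windows 10 Enterprise):

slmgr.vbs /ipk <Windows 10 Enterprise KMS client setup key>
slmgr.vbs /ato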

Follow the steps below to convert:

Click on the File menu and select New Project.

It will ask you to enter the following details. You may name the package as you like, and save it to a different location if you would like to.

Navigate to the path Runtime Settings –> EditionUpgrade –> UpgradeEditionWithProductKey


Enter the product key (use the KMS client key for Windows 10 Enterprise, available here), then click File –> Save.

Click on Export –> Provisioning Package.

The above step launches the wizard that builds the provisioning package.

In the next step of the wizard, you can optionally protect the package with a password or a certificate.

Select any location to save the provisioning package.


Once complete, it will show a summary of all the selected choices. Now we just need to click the Build button.

Navigate to the output folder and note that the .ppkg file has been created; we will use this file to upgrade Windows 10 Professional.

We now need to connect the Windows 10 Professional machine to the above share and run the .ppkg file.

Before running the package, the machine shows that it is installed with the Windows 10 Professional edition.

Run the file “Upgrade_Win10Pro_To_Win10Ent.ppkg” to complete the upgrade process.


After double-clicking the .ppkg file, we get a trust prompt similar to UAC.

Just select "Yes, add it" and proceed. After this, we wait while the system prepares for the upgrade.

After the upgrade is complete, the machine reboots, and the OS is now Windows 10 Enterprise, confirmed by the completion screen.

The system properties page then confirms that the upgrade was successful.

The .ppkg file can be sent to the user through email. The package can be located on an internal share and run from there, or copied to a USB drive and run from that drive.

There are a few ways to automate the above process:

  1. Use MDT, adding an Install Application step under Add –> General.
  2. Use SCCM, following the steps in the blog below:

Apply a provisioning package from a SCCM Task Sequence
http://blogs.msdn.com/b/beanexpert/archive/2015/09/29/apply-a-provisioning-package-from-a-sccm-task-sequence.aspx

Happy Upgrading!

Thanks
Amrik Kalsi

Errors Retrieving File Shares on Windows Failover Cluster


Hi AskCore, Chinmoy here again. In today’s blog, I would like to share one more scenario in continuation to my previous blog on Unable to add file shares in Windows 2012 R2 Failover Cluster.

This is about a WinRM setting that could lead to failures when adding file shares using the Windows 2012/2012 R2 Failover Cluster Manager.

Consider a two-node Windows Server 2012 R2 Failover Cluster using shared disks to host a File Server role. To access the shares, we click on the File Server role and go to the Shares tab at the bottom.  We see the error in the Information column next to the role:

“There were errors retrieving the file shares.”


There can be multiple reasons why Failover Cluster Manager would throw these errors. We will be covering two scenarios related to the WinRM configuration.

Scenario 1:

We cannot add new shares using Failover Cluster Manager, but we can via PowerShell.  This may occur if WinRM is not correctly configured.  WinRM is the Microsoft implementation of the WS-Management protocol, and more can be found here.

If we have WinRM configuration issues, we may even fail to connect to remote servers or other Cluster nodes using Server Manager.

The equivalent PowerShell cmdlet reports the below error:

PS X:\> Enter-PSSession Hostname
Enter-PSSession : Connecting to remote server hostname failed with the following error message : The client cannot connect to the destination specified in the request. Verify that the service on the destination is running and is accepting requests. Consult the logs and documentation for the WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM service, run the following command on the destination to analyze and configure the WinRM service: “winrm quickconfig”. For more information, see the about_Remote_Troubleshooting Help topic.

At line:1 char:1
+ Enter-PSSession hostname
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : InvalidArgument: (hostname:String) [Enter-PSSession], PSRemotingTransportException
+ FullyQualifiedErrorId : CreateRemoteRunspaceFailed

The above is a sign of WinRm being unable to connect to the remote server.

Let’s dig more, and check the event logs:

Log Name: Microsoft-Windows-FileServices-ServerManager-EventProvider/Operational
Event ID: 0
Source: Microsoft-Windows-FileServices-ServerManager-EventProvider
Description: Exception: Caught exception Microsoft.Management.Infrastructure.CimException: The client cannot connect to the destination specified in the request. Verify that the service on the destination is running and is accepting requests. Consult the logs and documentation for the WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM service, run the following command on the destination to analyze and configure the WinRM service: “winrm quickconfig”.

The above event states that there is a communication issue with the WinRM component. A quick way to configure WinRM is to run the command:

winrm quickconfig

This command starts the WinRM service and sets the service startup type to Auto-start. It also configures a listener for the ports that send and receive WS-Management protocol messages using either HTTP or HTTPS on any IP address. If it returns the following message:

WinRM service is already running on this machine.
WinRM is already set up for remote management on this computer.

Then try running the below command:

winrm id -r:ComputerName

The command may fail with an error if WinRM is not able to communicate with the WinRS client. This can also mean the destination cannot be resolved because a loopback IP is configured in the "IP Listen List" for HTTP communications.

You can check whether a loopback adapter IP is configured in the "IP Listen List" for HTTP communication and, if so, remove it.
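A sketch of the check and the fix, assuming 127.0.0.1 is the loopback entry to remove:

netsh http show iplisten
netsh http delete iplisten ipaddress=127.0.0.1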

After removing the loopback IP, we should be able to add the file shares successfully using the Failover Cluster console.

Scenario 2:

If we see the below event:

Log Name:      Microsoft-Windows-FileServices-ServerManager-EventProvider/Operational
Source:        Microsoft-Windows-FileServices-ServerManager-EventProvider
Level:         Error
Description:
Exception: Caught exception Microsoft.Management.Infrastructure.CimException: The WinRM client received an HTTP status code of 504 from the remote WS-Management service.
   at Microsoft.Management.Infrastructure.Internal.Operations.CimSyncEnumeratorBase`1.MoveNext()
   at Microsoft.FileServer.Management.Plugin.Services.FSCimSession.PerformQuery(String cimNamespace, String queryString)
   at Microsoft.FileServer.Management.Plugin.Services.ClusterEnumerator.RetrieveClusterConnections(ComputerName serverName, ClusterMemberTypes memberTypeToQuery)

In the description of the event, you see that it is reporting a status code of 504.  What does this code mean?

Symbolic Name:       HTTP_STATUS_GATEWAY_TIMEOUT
Error description:       timed out waiting for gateway

It's important to understand that Failover Cluster Manager cannot show information for every component (i.e., RPC, WMI, etc.) that gets utilized in the background while enumerating the file shares.  Therefore, we need to depend on the event logs.

Resolution:

Try the below command to validate if you have any proxy set:

C:\Windows\system32>netsh winhttp show proxy

The output may look similar to the below with an IP Address listed against the proxy server(s):

Current WinHTTP proxy settings:

    Proxy Server(s) :  136.105.214.3:3128
    Bypass List     :  <local>

If you have a value set for proxy server(s), run the below command to fix the issue:

netsh winhttp reset proxy

After running the above command to reset the proxy, running "netsh winhttp show proxy" again should show the following:

C:\>netsh winhttp show proxy

Current WinHTTP proxy settings:

Direct access (no proxy server).

Hope it helps to fix the issue. Good Luck!

Chinmoy Joshi
Support Escalation Engineer


Troubleshooting Activation Issues


Today, Henry Chen and I are going to talk about troubleshooting some activation issues that we often run into.

To begin, here is an article which talks about what Microsoft Product Activation is and why it is important. Also, this article explains KMS activation.

Now, let’s jump into some common activation scenarios.

Scenario 1 – Security Processor Loader Driver

You get error 0x80070426 when you try to activate a Windows 7 SP1 or Windows Server 2008 R2 SP1 KMS client by running slmgr /ato.


When you try to start the Software Protection service, it fails with a popup error.

If you review the Application Event log, you will see Event 1001:

Source:  Microsoft-Windows-Security-SPP
Event ID:  1001
Level:  Error
Description:  The Software Protection service failed to start. 0x80070002

To resolve this, make sure the Security Processor Loader Driver is started.

  1. Go to Device Manager.
  2. Click on View –> Show hidden devices.
  3. Expand Non-Plug and Play Drivers.


In this case, it is disabled.  The startup type could also be Automatic, Demand, or System, with the driver not started.

If it's set to anything other than Boot, change the startup type to Boot and then start the driver.

You could also change it from the registry by browsing to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\spldr and changing the Start value to 0, then rebooting.
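From a command prompt, the same registry change can be sketched as:

reg add HKLM\SYSTEM\CurrentControlSet\services\spldr /v Start /t REG_DWORD /d 0 /f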

If it fails to start, uninstall and re-install the driver and reboot your machine. In almost every case that we have seen, reinstalling the driver fixes the issue (i.e. you are able to start the driver).

Once it’s started, you will be able to start Software Protection Service and then activate Windows successfully.

Scenario 2 – Plug & Play

When trying to activate using slmgr /ato, you get the following error even when running the command elevated:

—————————
Windows Script Host
—————————

Activating Windows Server(R), ServerStandard edition (68531fb9-5511-4989-97be-d11a0f55633f) …Error: 0x80070005 Access denied: the requested action requires elevated privileges

—————————
OK  
—————————

And the below is shown when you try to display activation information using slmgr /dlv:

—————————
Windows Script Host
—————————

Script: C:\Windows\system32\slmgr.vbs
Line:   1131
Char:   5
Error:  Permission denied
Code:   800A0046
Source: Microsoft VBScript runtime error

—————————
OK  
—————————

We do have an article, KB2008385, which talks about the cause of the issue. While a missing permission is the root cause, we have seen instances where the GPO is not enabled and the permissions still do not seem to be correct. We also have a blog, written by a member of our Office team, on how to set the permissions using the command line, which we have found to be useful. We often combine both of these articles to resolve issues.

First, to verify you have the right permissions, run the below command.

sc sdshow plugplay

Below is how the correct permissions should look:

On Windows 7 SP1 or Windows Server 2008 R2 SP1

D:(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;SY)
(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)
(A;;CCLCSWLOCRRC;;;IU)
(A;;CCLCSWLOCRRC;;;SU) <——– This is the permission that seems to be missing in almost all instances.
S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)

On a broken machine, this is what we see:

D:(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;SY)
(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)
(A;;CCLCSWLOCRRC;;;IU)
S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)

In order to set the correct permissions, run the following command, as given in the blog for Office:

sc sdset plugplay D:(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;BA)(A;;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;SY)(A;;CCLCSWLOCRRC;;;IU)(A;;CCLCSWLOCRRC;;;SU)S:(AU;FA;CCDCLCSWRPWPDTLOCRSDRCWDWO;;;WD)

Then run sc sdshow plugplay to make sure the permissions have been set. Once they are set, you will be able to activate Windows successfully.

There have also been instances where we have seen a combination of Scenarios 1 and 2, so you might have to check whether the spldr driver is started as well as the permissions on the plugplay service.

On Windows Server 2012 R2

When you run slmgr /ato, you get the below error on a machine that is domain-joined. Other commands, like slmgr /dlv, work.

—————————
Windows Script Host
—————————

Activating Windows(R), ServerDatacenter edition (00091344-1ea4-4f37-b789-01750ba6988c) …

Error: 0x80070005 Access denied: the requested action requires elevated privileges

—————————
OK
—————————

This happens when the SELF account is missing access permissions in COM Security.

To add the permission back, type dcomcnfg in the Run box and hit OK.

clip_image014

Under Component Services, expand Computers, right-click My Computer, and then click Properties.

clip_image016

Click the COM Security tab, and then click Edit Default under Access Permissions.

clip_image018

If SELF does not appear in the Group or user names list, click Add, type SELF, click Check Names, and then click OK.

clip_image020

Click SELF, and then click to select the following check boxes in the Allow column:

· Local Access

· Remote Access

clip_image022

Then click OK on Access Permission and then OK on My Computer Properties.

Reboot the machine.

Scenario 3 – Read-only attribute

As in scenario 1, we may get error 0x80070426, where a user gets the following when trying to activate Windows 2008 R2 SP1 or Windows 7 SP1.

clip_image024

When trying to start the Software Protection service, you get an "access is denied" error message.

clip_image026

To get more details on the error, we open the Application Event Log which shows the following error:

Source: Microsoft-Windows-Security-SPP
Event ID: 1001
Level: Error
Description: The Software Protection service failed to start. 0xD0000022
6.1.7601.17514

To resolve this issue, browse to %windir%\system32 and make sure the following files have the Read-Only file attribute unchecked.

7B296FB0-376B-497e-B012-9C450E1B7327-5P-0.C7483456-A289-439d-8115-601632D005A0

7B296FB0-376B-497e-B012-9C450E1B7327-5P-1.C7483456-A289-439d-8115-601632D005A0

clip_image028
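You can also clear the attribute from an elevated command prompt; a sketch covering the two files listed above:

attrib -r "%windir%\system32\7B296FB0-376B-497e-B012-9C450E1B7327-5P-0.C7483456-A289-439d-8115-601632D005A0"
attrib -r "%windir%\system32\7B296FB0-376B-497e-B012-9C450E1B7327-5P-1.C7483456-A289-439d-8115-601632D005A0"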

The Software Protection service should start now.

Scenario 4 – Troubleshooting with Procmon

Here, we will give an idea of how to use Procmon to troubleshoot activation issues.

Windows Server 2012 R2

On a Windows Server 2012 R2 server, when we try to run any slmgr switches, we get the error below.

clip_image030

When you try to start the Software Protection service, you get the following error.

clip_image032

Launch Process Monitor and stop the capture by clicking on the Capture icon.

clip_image034

Click on the Filter icon.

clip_image036

Choose Process Name, is, type sppsvc.exe (the Software Protection service), and click Add.

clip_image038

We will add another filter: choose Result, contains, denied, then click Add and OK.

clip_image040

Start the capture by clicking on the Capture icon as shown above and start the Software Protection service.

Once you get the error, we should see entries similar to what is shown below. In this case it’s a folder but could be a registry path too based on where we are missing permissions.

clip_image042

As per the result, it looks like we have a permissions issue on C:\Windows\System32\spp\store\2.0. We could be missing permissions on any of the folders in the path.

Usually we start with the last folder so in this case it would be 2.0.

Comparing permissions on the broken machine (left) and the working machine (right), we can see that sppsvc is missing.

clip_image044

clip_image046

As you already guessed, the next step is to add sppsvcback and give it full control.

Click Edit and, from Locations, choose your local machine name. Then, under Enter the object names to select, type NT Service\sppsvc, click Check Names, and then OK.

clip_image048

Make sure you give the service account Full Control, click OK on the warning message, and OK to close the Permissions box.

clip_image050
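Alternatively, the same grant can be made from an elevated command prompt with icacls; a sketch assuming the store path identified in Procmon above (the (OI)(CI)F flags give full control over the folder and everything beneath it):

icacls "C:\Windows\System32\spp\store\2.0" /grant "NT SERVICE\sppsvc:(OI)(CI)F"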

Now try starting the Software Protection service. It should start successfully, and you will be able to activate Windows.

We hope this blog was useful in troubleshooting some of your activation issues.

Saurabh Koshta
Henry Chen

Display Scaling in Windows 10


Hope everyone is having a good day.  Today, we have a guest among us.  Steve Wright is a Senior Program Manager in the Developer Platform Group.  He is authoring this blog regarding scaling in Windows 10: how it works and how users will benefit from the work we have done.

Overview/Introduction

Windows 10 is an important release for Windows display scaling. It implements a unified approach to display scaling across all SKUs and devices aimed at these goals:

1) Our end users enjoy a mix of UWP and classic desktop applications on desktop SKUs which reliably provide content at a consistent size

2) Our developers can create UWP applications that deliver high quality reliably-sized content across all display devices and all Windows SKUs

Windows 10 also delivers desktop and mobile UI which looks polished and crisp across a wider range of display densities and viewing distances than we have ever before supported on Windows. Finally, Windows 10 drives support for high quality multi-monitor scaling for docking and projection into more of both our desktop and our mobile UI.

This article covers the basics of scaling in Windows 10, how it works, and how users will benefit from the work we have done. It wraps up by charting the course forward and showing what we want to tackle in future updates to Windows 10.

Our vision for display scaling

For our end users, display scaling is a platform technology ensuring that content is presented at a consistent and optimal–yet easily adjustable–size for readability and comprehension on every device. For our developers, display scaling is an abstraction layer in the Windows presentation platform, making it easy for them to design and build apps, which look great on both high and low density displays.

Basic concepts and terms

We need a basic glossary of terms and some examples to show why scaling is important:

image

image

image

image

While these examples use phones for the sake of simplicity, the same concepts apply to wearables, tablets, laptops, desktop displays and even conference room wall-mounted TVs and projectors.

Dynamic scaling scenarios

Note that more than one display may be used on the same device—either all at the same time, or at different times in sequence. Scale factor and effective resolution are therefore dynamic concepts and depend on where content is displayed at a particular time.

Some everyday scenarios where this dynamic scaling can take place include projecting, docking, moving apps between different monitors, and using remote desktop to connect your local display to a remote device.

Who does the scaling, and how do they do it

Because Windows supports many different kinds of applications and presentation platforms, scaling can occur in different places. This table illustrates the major scaling categories:

Scaling class: Dynamically scaling apps

  • Apps that scale themselves on the fly no matter where they are presented

Examples:

  • UWP apps (the XAML and HTML frameworks and MRT handle this for the developer; DX-based UWPs need to do the work to scale themselves)
  • Desktop UI built on XAML and HTML (Start menu, notifications)
  • Some classic desktop apps (File Explorer, taskbar, cmd shell, IE canvas, not UI chrome)

Pros and cons:

+ Crisp and right-sized content everywhere
+ Very easy to support for UWP apps (developer can rely entirely on framework support)
- Very hard to support for Win32 apps

Scaling class: “System scale factor” apps

  • Apps that understand a single system-wide scale factor (usually taken from the primary display at logon time)
  • When these apps are presented on a display that doesn’t match the system scale factor, Windows bitmap-stretches them to the right size

Examples:

  • A small number of top-tier classic desktop apps (about 50% of them, weighted by user “face time”):
    • Microsoft products: Office and Visual Studio
    • Browsers: Chrome and Firefox
    • Photoshop and Illustrator (support for some scale factors, not all)
    • Notepad++, EditPad Pro, etc.
  • WPF apps: all WPF apps support this

Pros and cons:

+ Crisp and right-sized on the primary display
+ Comes for free in WPF apps
- Right-sized but somewhat blurry on other displays
- Moderately hard for the Win32 developer

Scaling class: “Scaling unaware” apps

  • Apps that only understand low DPI displays
  • On any other display, Windows bitmap-stretches them to the right size

Examples:

  • The majority of classic apps, weighted by app count
  • Some Windows tools (Device Manager)

Pros and cons:

+ Crisp and right-sized on low DPI displays
- Right-sized but somewhat blurry on any high DPI display

What this means for the user:

  1. UWPs and most Windows UI look great on high DPI displays and in any multi-monitor scenario where different display scale factors are in play
  2. A few important classic desktop apps (and all WPF apps) look great on high DPI primary displays but a little blurry on secondary displays
  3. A large number of older classic desktop apps look blurry on high DPI displays.

What we have done in Windows 10

Now we can talk about the work done in Windows 10 to improve our support for both high DPI displays and for dynamic scaling scenarios. This work falls into several major areas:

  1. Unifying how content is scaled across all devices running Windows to ensure it consistently appears at the right size
  2. Extending the scaling system and important system UI to ensure we can handle very large (8K) and very dense (600 DPI) displays
  3. Adding scaling support to the mobile UX
  4. Improving Windows support for dynamic scaling: more OS and application content scales dynamically, and the user has greater control over each display’s scaling

Let’s take a closer look at each of these.

Unified and extended scaling system

In Windows 8.1 the set of supported scale factors was different for different kinds of content. Classic desktop applications scaled to 100%, 125%, 150%, 200% and 250%; Store apps scaled to 100%, 140% and 180%. As a result, when running different apps side by side in productivity scenarios, content could have inconsistent sizes in different apps. In addition, on very dense displays, the scaling systems “capped out” at different points, making some apps too small on them.

This chart shows the complexity and limits of the 8.1 scaling systems:

image

For Windows 10, we unified all scaling to a single set of scale factors for both UWP and classic applications on both the Desktop and Mobile SKU:

image

In Windows 8.1 all scaling topped out at 180% or 250%. For Windows 10 we knew that devices like 13.3” 4K laptops and 5.2” and 5.7” QHD phones would require even higher scale factors. Our unified scaling model for Windows 10 runs all the way up to 450%, which gives us enough headroom to support future displays like 6” 4K phones and 23” 8K desktop monitors.

As part of this effort, Windows 10 has polished the most commonly used desktop UI to look beautiful and clear even at 400% scaling.

Making the mobile shell scalable

We have also overhauled our Mobile SKU so that the mobile shell UI and UWP apps will scale to the Windows 10 scale factors. This work ensures that UWP apps run at the right size on phones and phablets as well as desktop displays, and that the mobile shell UI is presented at the right size on phones of different sizes, resolutions and pixel densities. This provides our users with a more consistent experience, and makes it easier to support new screen sizes and resolutions.

Improve Windows’ support for dynamic scaling

When we added dynamic scaling support in Windows 8.1, there was relatively little inbox UI that worked well with dynamic scaling, but in Windows 10, we have done work in many areas of the Windows UI to handle dynamic scaling.

UWP application dynamic scaling

As noted above, UWP HTML and XAML apps are designed to be dynamically scalable. As a result, these applications render crisply and with the right size content on all connected displays.

Windows “classic” desktop UI

Windows 10 makes large parts of the most important system UI scale properly in multi-monitor setups and other dynamic scaling scenarios so that it will be the right size on any display.

Start Experience

For example, the desktop Start and Cortana experiences are built on the XAML presentation platform, and because of that, they scale crisply to the right size on every display.

File Explorer

File Explorer—a classic desktop application built on the Win32 presentation platform—was not designed to dynamically rescale itself. In Windows 10, however, the file explorer app has been updated to support dynamic scaling.

Windows Taskbar

In Windows 8.1 the Windows taskbar had similar historical limitations. In Windows 10, the taskbar renders itself crisply at every scale factor and the correct size on all connected displays in all different scenarios. Secondary taskbar UI like the system clock, jumplists and context menus also scale to the right size in these scenarios.

Command shells et al.

We have done similar work elsewhere in commonly used parts of the desktop UI. For example, in Windows 10 “console windows” like the command prompt scale correctly on all monitors (provided you choose to use scalable fonts), and other secondary UI like the “run dialog” now scales correctly on each monitor.

Mobile shell and frameworks

In Windows 10 the mobile platform also supports dynamic scaling scenarios. In particular, with Continuum, the phone can run apps on a second attached display. In most cases external monitors have different scale factors than the phone’s display. UWP apps and shell UI can now scale to a different DPI on the secondary display, so Continuum works correctly at the right size on the Mobile SKU.

User scaling setting

Windows 8.1 users reported frustration with the user setting for scaling:

  1. There was a single slider for multiple monitors. The slider changed the scale factor for every connected monitor, making it impossible to reliably tweak the scale factor for only one of the displays.
  2. Users found it confusing that there were two scale settings, one for modern apps/UI and another for classic apps/UI, and that the two settings worked in significantly different ways.

In Windows 10, there is a single scale setting that applies to all applications, and the user applies it to a single display at a time. In the fall update, this setting has been streamlined to apply instantly.

What we didn’t get to

We are already seeing a number of common feedback issues that we’re working on for future releases of Windows. Here are some of the biggest ones we are tracking for future releases:

Unscaled content: Lync, desktop icons

Some applications (for example, Lync) choose to disable bitmap scaling for a variety of technical reasons, but do not take care of all their own scaling in dynamic scaling scenarios. As a result, these apps can display content that is too large or too small. We are working to improve these apps for a future release. For example, desktop icons are not per-monitor scaled in Windows 10, but in the fall update they are properly scaled in several common cases, such as docking, undocking, and projection scenarios.

Blurry bitmap-scaled content: Office apps

Although the UWP Office applications are fully per-monitor scaled in Windows 10, the classic desktop apps are “System scale factor apps”, as described in the table above. They generally look great on a high DPI device, but when used on secondary displays at different scale factors (including docking and projection), they may be somewhat blurry due to bitmap scaling. A number of popular desktop applications (Notepad++, Chrome, Firefox) have similar blurriness issues in these scenarios. We have ongoing work on improving migration tools for developers with these complex Win32 desktop applications.

Conclusion

Scaling is a complex problem for the open Windows ecosystem, which has to support devices ranging in size from roughly 4” to 84”, with densities ranging from 50 DPI to 500 DPI. In Windows 10 we took steps to consolidate and simplify our developer story for scaling and to improve the end-user visual experience. Stay tuned for future releases!

Users can't access the desktop and other resources through Quick Access in Windows 10


If you use CopyProfile when customizing your Windows 10 profiles, you may encounter a scenario where pinned icons, such as Desktop under Quick Access, are not accessible, and users may encounter an error similar to the following when attempting to access or save an item to that location.

“Location is not available. C:\Users\Administrator\Desktop is not accessible. Access is denied.”

Microsoft is aware of the issue and is investigating further. To work around this issue, or to fix the issue if user profiles are already deployed and experiencing this behavior, consider implementing any of the following options depending on your deployment scenario and requirements.

1. Before the image is created: Unpin the "desktop" shortcut from Quick Access prior to sysprep/CopyProfile. The "desktop" shortcut under This PC will not be available upon profile creation. All other customizations will be retained.

2. After the image is created and deployed, to address new logons: After sysprep (e.g. while in OOBE or logged in), delete the following file from the default profile. This will remove any customizations made to the Quick Access list prior to sysprep/CopyProfile.

a. %systemdrive%\users\default\appdata\roaming\microsoft\windows\Recent\AutomaticDestinations\f01b4d95cf55d32a.automaticDestinations-ms

3. After the image is created and deployed, to address existing logons: Delete the file per-user so it is regenerated the next time Explorer is opened (again, losing any customizations):

a. %appdata%\microsoft\windows\Recent\AutomaticDestinations\f01b4d95cf55d32a.automaticDestinations-ms

4. After the image is created and deployed, to address existing logons: Have the user unpin and re-pin the Desktop from Quick Access after logon.

For steps 2a and 3a, you can utilize Group Policy Preferences to deploy this to users that might already be experiencing the issue in their environment; a scripted example follows the screenshots below.

2a: %systemdrive%\users\default\appdata\roaming\microsoft\windows\Recent\AutomaticDestinations\f01b4d95cf55d32a.automaticDestinations-ms

image

3a: %appdata%\microsoft\windows\Recent\AutomaticDestinations\f01b4d95cf55d32a.automaticDestinations-ms

image
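For step 3a, the per-user cleanup could also be scripted, for example as a logon action; a minimal sketch (the file is regenerated the next time Explorer opens):

del /f "%appdata%\microsoft\windows\Recent\AutomaticDestinations\f01b4d95cf55d32a.automaticDestinations-ms"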

Using the Windows 10 Compatibility Reports to understand upgrade issues


This blog discusses how to obtain and review the Compatibility Reports to troubleshoot Windows 10 upgrades.

On a PC that is eligible for the free upgrade offer, you can use the "Get Windows 10" app and choose "Check your upgrade status". The report will be displayed within the app, showing issues in separate categories for devices and apps that have potential issues.

If your PC/tablet does not qualify for the free Windows 10 upgrade offer (for example, you are running Windows 8.1 Enterprise or Windows 7 Enterprise), you will not be able to launch the app to get the Compatibility Reports. In that case, use the following steps:

1. Use the Windows 10 installation media that you intend to use and launch the Windows 10 Setup Program.

image

2. After checking for the most recent Dynamic updates for the Windows 10 installation, the installation will run a compatibility check in the background and you should see:

image

3. You can see the full list of potential compatibility issues in the files located in the folder:

C:\$Windows.~BT\Sources\Panther

The files are named CompatData_YEAR_MONTH_DATE_HOUR_MIN_SEC… and so on. These CompatData files provide information about hardware and software compatibility issues.

You can also get the setupact.log file from the C:\$Windows.~BT\Sources\Panther folder and use the below steps to get only the Compatibility information from the logs.

To view the details included in the Setupact.log file, you can copy the information to a Setupactdetails.txt file by using the findstr command, and then view the details in Setupactdetails.txt. To do this, follow these steps:

A. Open an elevated command prompt.

B. At the command prompt, type the following command, and then press ENTER:

findstr /c:"CONX" C:\$Windows.~BT\Sources\Panther\setupact.log  >"%userprofile%\Desktop\Setupactdetails.txt"

C. Open the Setupactdetails.txt file from your desktop to review the details.
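If you prefer PowerShell, a sketch of an equivalent extraction (the single quotes keep the $ in the path from being interpreted as a variable):

Select-String -SimpleMatch 'CONX' -Path 'C:\$Windows.~BT\Sources\Panther\setupact.log' | ForEach-Object Line | Set-Content "$env:USERPROFILE\Desktop\Setupactdetails.txt"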

Also, see:
Troubleshooting common Windows 10 upgrade errors and issues
https://support.microsoft.com/en-us/kb/3107983

Ram Malkani
Support Escalation Engineer
Windows Core Team

Failover Cluster Node Startup Order in Windows Server 2012 R2


In this blog, my colleague JP and I would like to talk about how to start a Cluster without the need for the ForceQuorum (FQ) switch.  We have identified three different scenarios showing how the Cluster behaves when you turn on the nodes in a certain order on Windows Server 2012 R2.  First, I want to mention two articles that you should be familiar with.

How to Properly Shutdown a Failover Cluster or a Node
http://blogs.msdn.com/b/clustering/archive/2013/08/23/10443912.aspx

Microsoft’s recommendation is to configure the Cluster with a witness
https://technet.microsoft.com/en-us/library/dn265972.aspx#BKMK_Witness

Now, on to the scenarios.

Scenario 1: Cluster without a witness (Node majority)
Scenario 2: Cluster with a disk witness
Scenario 3: cluster with a file share witness

In the scenarios below, we try starting the Cluster with and without a witness.

Scenario 1: Cluster without a witness (Node Majority)
=====================================================

Let’s use the name of the cluster as ‘CLUSTER’ and the names of the nodes as ‘A’, ‘B’ and ‘C’.  We have set up the quorum type as Node Majority.  All nodes have a weighted vote (meaning an assigned and a current vote).  The core Cluster Group and the other resources (two Cluster Shared Volumes) are on Node A.  We also have not defined any Preferred Owners for any group.  For simplicity’s sake, the Node ID of each is as follows.  You can get the Node ID with the PowerShell cmdlet Get-ClusterNode.

Name ID State
==== == =====
A    1  Up
B    2  Up
C    3  Up

When we gracefully shut down Node A, all the resources on the node fail over to Node B, which has the next highest Node ID.  By a graceful shutdown, we mean shutting down the machine from the Start Menu or shutting down after applying patches.  All the resources are now on Node B.  So the current votes would be:

Node A = 0
Node B = 1
Node C = 1

Now, let’s gracefully shut down Node B.  All the resources now fail over to Node C.  As per the way dynamic quorum works in Windows Server 2012 R2, the Cluster survives on one node as the “last man standing”.  So our current votes are:

Node A = 0
Node B = 0
Node C = 1

Now we want to gracefully shut down Node C as well.  Since all the nodes are down, the Cluster is down. 

When we start Node A, which was shut down first, the Cluster is not formed and we see the below in the Cluster log:

INFO  [NODE] Node 3: New join with n1: stage: ‘Attempt Initial Connection’ status (10060) reason: ‘Failed to connect to remote endpoint 192.168.1.101:~3343~’
DBG   [HM] Connection attempt to C failed with error (10060): Failed to connect to remote endpoint 192.168.1.101:~3343~.
INFO  [NODE] Node 3: New join with n2: stage: ‘Attempt Initial Connection’ status (10060) reason: ‘Failed to connect to remote endpoint 192.168.1.100:~3343~’
DBG   [HM] Connection attempt to C failed with error (10060): Failed to connect to remote endpoint 192.168.1.100:~3343~.
DBG   [VER] Calculated cluster versions: highest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00082580], lowest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00082580] with exclude node list: ()

When we start Node B, which was shut down second, the Cluster is not formed and below are the entries we see in the Cluster log:

INFO  [NODE] Node 1: New join with n2: stage: ‘Attempt Initial Connection’ status (10060) reason: ‘Failed to connect to remote endpoint 192.168.1.100:~3343~’
DBG   [HM] Connection attempt to C failed with error (10060): Failed to connect to remote endpoint 192.168.1.100:~3343~.
DBG   [VER] Calculated cluster versions: highest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00082580], lowest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00082580] with exclude node list: ()

Both nodes are trying to connect to Node C, which is shut down.  Since they are unable to connect to Node C, it does not form the Cluster.  Even though we have two nodes up (A and B) and configured for Node Majority, the Cluster is not formed.

WHY??  Well, let’s see.

We start Node C and now the Cluster is formed.

Again, WHY??  Why did this happen when the others wouldn’t??

This is because the last node that was shut down (Node C) was holding the Cluster Group.  So to answer the question: the node that was shut down last should be the first node to be turned on.  When a node is shut down, its vote is changed to 0 in the Cluster registry.  When a node goes to start the Cluster Service, it will check its vote.  If it is 0, it will only join a Cluster.  If it is 1, it will first try to join a Cluster and, if it cannot connect to the Cluster to join, it will form the Cluster.

This is by design.

Shut down all three nodes again in the same order.

Node A first
Node B second
Node C last

Power up Node C and the Cluster is formed with the current votes as:

Node A = 0
Node B = 0
Node C = 1

Turn on Node B.  It joins and is given a vote.  Turn on Node A.  It joins and is given a vote. 

If you start any node other than the one that was last to shut down, the ForceQuorum (FQ) switch must be used to form the Cluster.  Once it is formed, you can start the other nodes in any order and they will join.
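For reference, a sketch of forcing quorum on the node that should form the Cluster; either form should work on Windows Server 2012 R2:

net start clussvc /forcequorum

or, from PowerShell:

Start-ClusterNode -ForceQuorum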

Scenario 2: Cluster with a disk witness
=======================================
We take the same three nodes and the same environment, but add a disk witness.

Let’s observe the difference and the advantage of adding the witness.  To view the dynamic witness weight, check the WitnessDynamicWeight property in PowerShell: (Get-Cluster).WitnessDynamicWeight.

PS C:\> (Get-Cluster).WitnessDynamicWeight
0

NOTE:
A setting of 1 means it has a vote.  A setting of 0 means it does not have a vote.  Remember, we still go by the old rule of keeping the total number of votes odd.

Initially, the Cluster Group and all the resources are on Node A with the other 2 nodes adding votes to it.  The Disk Witness also adds a vote dynamically when it is needed.

Node A = 1 vote = Node ID 1
Node B = 1 vote = Node ID 2
Node C = 1 vote = Node ID 3
Disk Witness = 0 vote

We gracefully shut down Node A.  All the resources and the Cluster Group move to Node B while Node A loses its vote.  Next, we gracefully shut down Node B and it loses its vote.  All the resources and Cluster Group move to Node C.  This leaves Node C as the “Last Man Standing” as in the previous scenario.  Gracefully shut down Node C as well and the Cluster is down.

This time, instead of powering on the last node that was shut down (Node C), power on Node B, which was shut down second.

THE CLUSTER IS UP !!!!!

This is because we have a witness configured and the Dynamic Quorum comes into play.  If you check the witness dynamic weight now, you will see that it has a vote.

PS C:\> (Get-Cluster).WitnessDynamicWeight
1

Because it has a vote, the Cluster forms.

Scenario 3: Cluster with a file share witness
=============================================
Again, we take the same 3 nodes, with the same environment and add a file share witness to it.

Presently, Node A is holding the Cluster Group and the other resources, with the other two nodes voting and the file share witness able to dynamically gain a vote if needed.

The votes are as follows:

Node A = 1 vote = Node ID 1
Node B = 1 vote = Node ID 2
Node C = 1 vote = Node ID 3
File Share Witness = 0 vote

We gracefully shut down Node A.  The resources move over to Node B and Node A loses its vote.  Because Node A lost the vote, the file share witness dynamically adjusted and gave itself a vote to keep the total at an odd number.  Next, we gracefully shut down Node B.  The resources move over to Node C and Node B also loses its vote.

Node C is now the “Last Man Standing” which is holding the Cluster Group and all other resources.  When we shut down Node C, the Cluster shuts down.

Let’s take a look back at the second scenario, where we could turn on any node and the Cluster would form with all the resources coming online, because we had a disk witness in place.  In the case of a file share witness, this does not happen.

If we turn on Node A, which was shut down first, the Cluster would not form even though we have a file share witness.  We need to revert back to turning on the node that was shut down last, i.e. Node C (the “Last Man Standing”), to automatically form the Cluster. 

So what is the difference?  We have a witness configured….  This is because the file share witness does not hold a copy of the Cluster Database.

So why do we do it this way?

To answer this, we have to go back in time to the way the Cluster starts and which copy of the database the Cluster uses when a form takes place.

In Windows 2003 and below, we had the quorum drive.  The quorum drive always had the latest copy of the database.  The database holds all configurations, resources, etc. for the Cluster.  It also took care of replicating any changes to all nodes so they would have up-to-date information.  So when the Cluster formed, it would download the copy on the quorum drive and then start.  This actually wasn’t the best way of doing things, as there was really only one copy, and if it went down, everything went down.

In Windows 2008, this changed.  Now, any of the nodes or the disk witness can have the latest copy.  We track this with a “paxos” tag.  When a change is made on a node (add resource, delete resource, node join, etc.), that node’s paxos tag is updated.  It then sends out a message to all other nodes (and the disk witness, if available) to update their databases.  This way, everything is current.

When you start a node in the Cluster to form the Cluster, it is going to compare its paxos tag with the one on the disk witness.  Whichever is later determines the direction in which the Cluster database is copied.  If the paxos tag is later on the disk witness, the node downloads the latest copy and uses it.  If the local node’s copy is later, the node uploads it to the disk witness and runs with it.

We do things in this fashion so that you will not lose any configuration.  For example, you have a 7-node Hyper-V Cluster with 600 virtual machines running.  Node 6 is powered down, for whatever reason, and is down for a while.  In the meantime, you add an additional 200 virtual machines.  All nodes and the disk witness know about this.  Say that the rack or datacenter the Cluster is in loses power.  Power is restored and Node 6 gets powered up first.  If there is a disk witness, it is going to have a copy of the Cluster database with all 800 virtual machines, so this node that has been down for so long will have them.  If you had a file share witness (or no witness), which does not contain the Cluster database, you would lose the 200 and have to reconfigure them.

The ForceQuorum (FQ) switch overrides this and starts with whatever Cluster database (and configuration) is on the node, regardless of paxos tag numbers.  When you use this, it makes that node’s Cluster database the “golden” copy and replicates it to all other nodes (and the disk witness) as they come up.  So be cautious when using this switch.  In the above example, if you force quorum on Node 6, you lose the 200 virtual machines and will need to recreate them in the Cluster.

As a side note, Windows Server 2016 Failover Cluster follows this same design.  If you haven’t had a chance to test it out and see all the new features, come on aboard and try it out.

https://www.microsoft.com/en-us/server-cloud/products/windows-server-2016/default.aspx

Regards,
Santosh Bathija and S Jayaprakash
Microsoft India GTSC

New Guided Walkthrough for troubleshooting problems relating to Event ID 1135 in a Failover Clustering environment


I wanted to post about a new walkthrough that we have to help in troubleshooting an Event 1135 on a Failover Cluster. 

As a bit of background, Failover Clustering sends a heartbeat to and from each node of a Cluster to determine its health and whether it is responding.  If a node does not respond within a certain time period, it is considered down and will be removed from Cluster membership.  In the System Event Log of the remaining nodes, an Event 1135 will be logged stating that the non-responding node was removed.

There is now a guided walkthrough to step you through troubleshooting and aid in determining the cause.  The walkthrough covers a number of things, including Cluster networks, antivirus, etc.  Check it out and see what you think, and try it the next time this issue comes up.

Troubleshooting cluster issue with event ID 1135
http://support.microsoft.com/kb/3036113

Windows 10 Deployment Links


Hi, my name is Scott McArthur and I am the Supportability Program Manager for Commercial Surface and Windows products.  A common question we get from customers is “Can you direct me to information regarding what I should be thinking about when deploying Windows 10?”  All this information is available online, but it helps sometimes if someone organizes all the links.  The areas below are based on what I have seen as the most common questions and topics of discussion with Windows 10.  I hope this helps with your planning and deployment of Windows 10.

Windows 10 General Information

Windows 10 release info:  https://technet.microsoft.com/en-us/windows/release-info

Windows 10 Update history:

http://windows.microsoft.com/en-us/windows-10/update-history-windows-10

Windows 10 Servicing or updating

Windows 10 Servicing options:  https://technet.microsoft.com/library/mt574263(v=vs.85).aspx

Windows 10 servicing options for updates and upgrades:  https://technet.microsoft.com/en-us/library/mt598226(v=vs.85).aspx

Windows 10 Privacy and Telemetry

Windows 10 privacy:  http://windows.microsoft.com/en-us/windows-10/windows-privacy-faq

Configure Windows 10 Telemetry settings:  https://technet.microsoft.com/library/mt577208(v=vs.85).aspx

Windows 10 Deployment

Windows 10 unattend.xml reference:  https://msdn.microsoft.com/en-us/library/windows/hardware/dn923277.aspx

Microsoft Deployment Toolkit 2013 Update 2:  http://blogs.technet.com/b/msdeployment/archive/2015/12/22/mdt-2013-update-2-now-available.aspx

Windows 10 Group Policy settings reference:  https://www.microsoft.com/en-us/download/details.aspx?id=25250

Windows 10 deployment using Configuration Manager: https://blogs.technet.microsoft.com/configmgrteam/2015/12/08/now-generally-available-system-center-configuration-manager-and-endpoint-protection-version-1511/

Removing Windows 10 inbox applications in MDT task sequence:  http://blogs.technet.com/b/mniehaus/archive/2015/11/11/removing-windows-10-in-box-apps-during-a-task-sequence.aspx

Remove apps such as Candy Crush, twitter, etc..:  http://blogs.technet.com/b/mniehaus/archive/2015/11/23/seeing-extra-apps-turn-them-off.aspx

.Net Framework 3.5 deployment considerations:  https://msdn.microsoft.com/en-us/library/windows/hardware/dn898590(v=vs.85).aspx

.NET Framework 3.5 deployment considerations with MDT:  http://blogs.technet.com/b/mniehaus/archive/2015/09/01/adding-features-including-net-3-5-to-windows-10.aspx

Microsoft Edge customization and configuration:   https://technet.microsoft.com/en-us/library/mt270205.aspx

Windows 10 Activation

Windows 10 volume activation tips:  https://blogs.technet.microsoft.com/askcore/2015/09/15/windows-10-volume-activation-tips/

Windows 10 KMS Host Update for Windows 10:  https://support.microsoft.com/en-us/kb/3058168

Customizing the Start Menu or TaskBar in Windows 10

Windows 10 Start Layout Customization Deployment Guys Blog:  http://blogs.technet.com/b/deploymentguys/archive/2016/03/07/windows-10-start-layout-customization.aspx

Manage Windows 10 start layout options:  https://technet.microsoft.com/en-us/library/mt484194(v=vs.85).aspx

Customize Start Menu using group policy:  https://technet.microsoft.com/en-us/library/mt431718(v=vs.85).aspx

Customize Start Menu using export/import: https://technet.microsoft.com/en-us/library/mt592638(v=vs.85).aspx

Customizing ICD or provisioning packages:  https://technet.microsoft.com/en-us/library/mt484193(v=vs.85).aspx

Changes to group policy settings for start menu in Windows 10:  https://technet.microsoft.com/en-us/library/mt484191(v=vs.85).aspx

Customizing TaskBarLinks:  https://msdn.microsoft.com/en-us/library/windows/hardware/dn923245.aspx

Windows 10 Other

Locking down Windows 10:  https://technet.microsoft.com/en-us/library/mt592641(v=vs.85).aspx

Managing Cortana in Enterprise:  https://technet.microsoft.com/en-us/library/mt637066(v=vs.85).aspx

Intro to Configuration Service Providers(CSP):  https://technet.microsoft.com/en-us/library/mt683468(v=vs.85).aspx

Windows 10 Display Scaling:  https://blogs.technet.microsoft.com/askcore/2015/12/08/display-scaling-in-windows-10/

Windows Store for Business

Windows Store for Business:  https://technet.microsoft.com/en-us/library/mt606951(v=vs.85).aspx

Using Windows Store for Business with MDT 2013: http://blogs.technet.com/b/mniehaus/archive/2016/01/11/using-the-windows-store-for-business-with-mdt-2013.aspx

 

Windows Update for Business

Windows Update for Business:  https://technet.microsoft.com/en-us/library/mt622730(v=vs.85).aspx

Videos

Windows 10: Deploying and Staying Up To Date [WIN326]:  https://msftignite.com.au/sessions/session-details/1706/windows-10-deploying-and-staying-up-to-date-win326

Using the Windows Store for Business: New Capabilities for Managing Apps in the Enterprise [WIN335]:  https://msftignite.com.au/sessions/session-details/1707/using-the-windows-store-for-business-new-capabilities-for-managing-apps-in-the-enterprise-win335

Upgrading to Windows 10: In Depth:  https://channel9.msdn.com/events/Ignite/2015/BRK3307

Preparing Your Enterprise for Windows 10 as a Service:  https://mva.microsoft.com/en-US/training-courses/preparing-your-enterprise-for-windows-10-as-a-service-11813?l=uUf1tqtQB_8505094681

Windows 10 for Mobile Devices: Provisioning Is Not Imaging:  https://channel9.msdn.com/Events/Ignite/2015/BRK3301

 

Scott McArthur
Senior Supportability Program Manager


Behavior of Dynamic Witness on Windows Server 2012 R2 Failover Clustering


With Failover Clustering on Windows Server 2012 R2, we introduced the concept of Dynamic Witness and enhanced how we handle the tie breaker when the nodes are in a 50% split.

Today, I would like to explain how we handle the scenario where you are left with one node and the witness as the only votes on your Windows Server 2012 R2 Cluster.

Let’s assume that we have a 4 Node cluster with dynamic quorum enabled.

All Nodes Up

To see the status of dynamic quorum and the witness’ dynamic weight, you can use this PowerShell command:
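A sketch of the likely command, where a value of 1 means enabled or holding a vote:

(Get-Cluster).DynamicQuorum
(Get-Cluster).WitnessDynamicWeight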

Get-Cluster Command 1


Now, let’s use PowerShell to look at the weights of the nodes:
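A sketch of the likely command; NodeWeight is the assigned vote and DynamicWeight the current vote:

Get-ClusterNode | Format-Table Name, Id, State, NodeWeight, DynamicWeight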

Get-ClusterNode Command 1


Now, let’s take down the nodes one by one until we have just one of the nodes and the witness standing. I turned off Node 4, followed by Node 3, and finally Node 2:

One Node Up

The cluster will continue to remain functional thanks to dynamic quorum, assuming that all of the resources can run on the single node.

Let’s look at the node weights and dynamic witness weight now:

Get-Cluster Command-2


Let’s take this further and assume that for some reason, the witness also sees a failure and you see the event ID 1069 for the failure of the witness:

Log Name: System
Source: Microsoft-Windows-FailoverCluster
Date: Date
Event ID: 1069
Level: Error
User: NT AUTHORITY\SYSTEM
Computer: servername
Description:
Cluster resource ‘Cluster Disk x’ in clustered service or application ‘Cluster Group’ failed.

One Node Up Witness Down

We really do not expect that this would happen on a production cluster where nodes go offline until there is one left and the witness also suffers a failure. Unfortunately, in this scenario, the cluster will not continue running and the cluster service will terminate because we can no longer achieve quorum.  We will not dynamically adjust the votes below three in a multi-node cluster with a witness, so that means we need two votes active to continue functioning.

When you configure a cluster with a witness, we want to ensure that the cluster is able to recover from a partitioned scenario. The philosophy is that two replicas of the cluster configuration are better than one. If we adjusted the quorum weight after we suffered the loss of Node 2 in our scenario above (when we had two nodes and the witness), then your data would be subject to loss with a single failure. This is intentional: we are keeping two copies of the cluster configuration, and either copy can start the cluster back up. You have a much better chance of surviving and recovering from a loss.

That’s the little secret to keeping your cluster fully functional.

Until next time!

Ram Malkani
Technical Advisor
Windows Core Team

Customizing the recovery partition after upgrading the OS from Windows 8.1 to Windows 10


Hi everyone, my name is Suganya and I am from the Windows Devices and Deployment Team. I would like to discuss one of the common issues customers face today: “Customizing the recovery partition after upgrading from Windows 8.1 to Windows 10”.

What is Windows RE and why is it used?

The Windows Recovery Environment (Windows RE) is a recovery environment that can repair common causes of unbootable operating systems. Windows RE is based on Windows Preinstallation Environment (Windows PE), and can be customized with additional drivers, languages, Windows PE optional components, and other troubleshooting and diagnostic tools. By default, Windows RE is preloaded into Windows 8.1 and Windows Server 2012 R2 installations. For more information, please refer to the following article:

Windows Recovery Environment (Windows RE) Overview
https://technet.microsoft.com/en-in/library/hh825173.aspx

Consider the following scenario:  You are planning an upgrade from Windows 8.1 to Windows 10.  Before upgrading the OS, you see the following partitions in diskpart:

  • System reserved partition (350 MB)
  • OS partition (126 GB)
  • Recovery partition (300 MB)

image

After the upgrade you see the following partitions:

  • Recovery partition from the old OS (300 MB)
  • System reserved partition (350 MB)
  • OS partition (125GB)
  • New Recovery partition (450MB)

clip_image003

Now we have two recovery partitions, but we would like to only have one and customize the partition based on our requirements.

Delete both of the current recovery partitions on the drive and follow these steps to create a new recovery partition and customize it.

  1. Open the command prompt with admin privileges and run the following commands:
    1. diskpart
    2. sel disk 0
    3. create partition primary size=450
    4. format quick fs=ntfs label="Recovery tools"
    5. assign letter="R" (this assumes that the drive letter R is not already in use)
  2. Create the folders “Recovery” and “WindowsRE” on the R:\ drive.
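For example, from the same elevated prompt; a one-line sketch (md creates the intermediate directory, so one command covers both folders):

md R:\Recovery\WindowsRE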

 

Use the Windows 10 ISO to copy the winre.wim to a local drive using the following command:

  • dism /mount-wim /wimfile:”D:\sources\install.wim” /index:1 /mountdir:C:\test\ /readonly

You will now have the winre.wim file in the following location: “C:\Test\Windows\System32\Recovery”

  • Copy the winre.wim to the c:\perflogs folder
  • Unmount the image with this command:  dism /unmount-wim /mountdir:c:\test /discard

Note: This is an example as drive letters may differ in your environment. Verify drive letters in WinRE with Diskpart.

  • Copy the Winre.wim to the “R:\Recovery\WindowsRE” folder.

Now, to configure the Windows® Recovery Environment, run the following command.

  • reagentc /setreimage /path R:\Recovery\WindowsRE

Now if you run the command “reagentc /info” from a command prompt it shows the Status as Disabled.  When you enable it using the command “reagentc /enable”, the Status will still show as Disabled.

clip_image005

This can happen if the Windows Boot Loader is not updated with the correct device information.  You will have to edit ReAgent.xml, which is located in C:\Windows\System32\Recovery, as it contains the older WinRE and OS image configurations.  You will have to give Everyone full control of the XML file before copying the following lines into it.

ReAgent.xml should be edited to reflect these changes:

<?xml version='1.0' encoding='utf-8'?>
<WindowsRE version="2.0">
  <WinreBCD id=""/>
  <WinreLocation path="" id="0" offset="0"/>
  <ImageLocation path="" id="0" offset="0"/>
  <PBRImageLocation path="" id="0" offset="0" index="0"/>
  <PBRCustomImageLocation path="" id="0" offset="0" index="0"/>
  <InstallState state="0"/>
  <OsInstallAvailable state="0"/>
  <CustomImageAvailable state="0"/>
  <WinREStaged state="0"/>
  <ScheduledOperation state="4"/>
  <OperationParam path=""/>
  <OsBuildVersion path=""/>
  <OemTool state="0"/>
</WindowsRE>

Copy the edited ReAgent.xml to R:\Recovery\WindowsRE.  You will need to set the location of the Windows RE boot image using the following command:

  • Reagentc /setreimage /path R:\RECOVERY\WINDOWSRE /target c:\windows

clip_image006

When you run the command “reagentc /info”, you will still see the Status as Disabled.  After you run the command “reagentc /enable”, the Windows RE status will show as Enabled.

clip_image008

Also, when you check the output of the command “bcdedit /enum all”, you can see that the Windows Boot Loader is updated with the correct device information.

clip_image010

Now you know how to customize the recovery partitions.  For more information, please refer to the article:

Windows PE (WinPE)
https://msdn.microsoft.com/en-us/library/windows/hardware/dn938389(v=vs.85).aspx

I hope this was helpful.

Suganya Natarajan
Technical Advisor
Windows Devices and Deployment Team

FREE EBOOK: Introducing Windows Server 2016 Technical Preview


The Introducing Windows Server 2016 Technical Preview book from Microsoft Press is now available. You can download it for FREE and it’s available in both standard and mobile PDF formats. This is a great way for you and our valued customers and partners to see what’s new in Windows Server 2016.

ebook

There are multiple ways to download it, depending on the device you are using.

Description

Windows Server has powered a generation of organizations, from small businesses to large global enterprises. No matter what your role in IT, you can be guaranteed that you have touched Windows Server at some point in your career or, at the very least, you have seen it from afar! No matter what your area of expertise, this ebook introduces you to Windows Server 2016 Technical Preview, the next version of Windows Server, and its latest developments.

Each chapter has been written by either field experts or members of the product group, who provide you with the latest information on every improvement or new feature that is coming in Windows Server. This information will help you to get ready for Windows Server 2016 Technical Preview and give you an opportunity to develop and design a path to introduce this powerful technology into your environment and take full advantage of what is to come. This book was written at a time when the product was still evolving, and it should be noted that things might change or not appear in the final version of Windows Server 2016 when it is released. All guidance in these chapters is meant to be tried and evaluated in a test setting; you should not implement this in a production environment.

This book assumes that you are familiar with key Windows Server concepts (i.e., Hyper-V, networking, and storage) as well as cloud technologies such as Microsoft Azure. In this ebook, we cover a variety of concepts related to the technology and present scenarios with a customer focus, but it is not intended as a how-to guide or design manual. You should use other sources including the online Microsoft resources to stay up to date with the latest developments on the roles and features of Windows Server 2016 Technical Preview. The online resources will also contain the latest how-to procedures and information about designing a Windows Server 2016 infrastructure for your business.

Enjoy,
John Marlin
Enterprise Cloud Group

Updates taking a long time to install in Windows Server 2008 R2


Today I would like to discuss one of the issues that you may face while installing updates on Windows Server 2008 R2.  When you install an update, it may hang for a very long period of time without showing any progress.  If you open Event Viewer and look in the Setup event log, you will see entries like the ones below (in this example, hotfix KB2545850 is the update facing the issue):

2/2/2016 10:29:17 AM Information server.contoso.com 1 Microsoft-Windows-Servicing N/A NT AUTHORITY\SYSTEM Initiating changes for package KB2545850. Current state is Absent. Target state is Installed. Client id: WindowsUpdateAgent.

2/2/2016 1:51:06 PM Error server.contoso.com 3 Microsoft-Windows-WUSA N/A Contoso\User Windows update ‘Hotfix for Windows (KB2545850)’ could not be installed because of error 2149842953 ” (Command line: ”C:\Windows\system32\wusa.exe’ ‘D:\temp\Windows6.1-KB2545850-x64.msu’ ‘)

2/2/2016 2:02:08 PM Information server.contoso.com 1 Microsoft-Windows-Servicing N/A NT AUTHORITY\SYSTEM Initiating changes for package KB2545850. Current state is Staged. Target state is Installed. Client id: WindowsUpdateAgent.

In these situations, checking the CBS.log (located in C:\Windows\Logs\CBS) can be really helpful, as it captures all the details regarding update installation failures.

If you look in the CBS log, you will see entries such as the following:

2016-02-02 10:29:19, Info                  CBS    WatchList: Add package Package_57_for_KB2868725~31bf3856ad364e35~amd64~~6.1.1.1.2868725-165_neutral_LDR to re-evaluation(install) due to the change on Component Family: amd64_microsoft-windows-lsa_31bf3856ad364e35_0.0.0.0_none_26431bf35d52e5a2, Version: 6.1.7601.21728, change: (Owner: Package_5_for_KB2545850~31bf3856ad364e35~amd64~~6.1.1.0.2545850-5_neutral_LDR, Flag: 5, Action: 3)

2016-02-02 10:29:19, Info                  CBS    WatchList: Add package Package_33_for_KB2871997~31bf3856ad364e35~amd64~~6.1.2.5.Trigger_1 to re-evaluation(Always) due to the change on Component Family: amd64_microsoft-windows-lsa_31bf3856ad364e35_0.0.0.0_none_26431bf35d52e5a2, Version: 6.1.7601.21728, change: (Owner: Package_5_for_KB2545850~31bf3856ad364e35~amd64~~6.1.1.0.2545850-5_neutral_LDR, Flag: 5, Action: 3)

2016-02-02 10:29:19, Info                  CBS    WatchList: Add package Package_120_for_KB3121212~31bf3856ad364e35~amd64~~6.1.1.2.Trigger_1 to re-evaluation(Always) due to the change on Component Family: wow64_microsoft-windows-lsa_31bf3856ad364e35_0.0.0.0_none_3097c64591b3a79d, Version: 6.1.7601.21728, change: (Owner: Package_5_for_KB2545850~31bf3856ad364e35~amd64~~6.1.1.0.2545850-5_neutral_LDR, Flag: 5, Action: 3)

2016-02-02 10:29:19, Info                  CBS    WatchList: Add package Package_121_for_KB3121212~31bf3856ad364e35~amd64~~6.1.1.2.Trigger_1 to re-evaluation(Always) due to the change on Component Family: wow64_microsoft-windows-lsa_31bf3856ad364e35_0.0.0.0_none_3097c64591b3a79d, Version: 6.1.7601.21728, change: (Owner: Package_5_for_KB2545850~31bf3856ad364e35~amd64~~6.1.1.0.2545850-5_neutral_LDR, Flag: 5, Action: 3)

Why does this happen?

When you check the file version on the machine, it is currently using the GDR (General Distribution Release) branch and the date stamp is marked as 12/30/2015 (the later date).

clip_image002

When you check the file version in KB2545850, it is an LDR (Limited Distribution Release) branch hotfix that has a date stamp of 03/14/2011 (the older date).

image

When we try to update a binary that is using the GDR branch of the hot-fix with the LDR version of the hot-fix, we cannot compare the version numbers to see if the file needs to be updated.  At this point we have to compare the dates of the files.  In this situation, the date of the file in the hot-fix we are trying to install from KB2545850 is older than the binary that is already on the machine.  This triggers a re-evaluation during the applicability checks and takes more time.  When we have to compare the dates on a lot of binaries like the example above, it causes the process of installing to take longer.

This behavior can be expected in Windows Server 2008 R2.

Without intervention, we may have to wait hours, or even days, for the installation to complete.  So, how do we speed up the servicing mechanism?

The following workarounds can help speed up the operation. The steps below may vary for each KB; they are proposed based on the behavior noticed in each scenario.

Workaround steps

  • Cancel the update when it is stuck, reinitiate it right away, and check again. At times the re-evaluation results are cached, which allows the installation of the update to complete.
  • Check if the update/hotfix has been superseded by checking the Microsoft Update Catalog site: http://catalog.update.microsoft.com. Since the superseding update will have all of its binaries with the latest date stamp, it is less likely to trigger re-evaluation.
  • To speed up the installation we can also install the hotfix/update manually using DISM and check if that helps:

1. Download problematic update from Microsoft Download Center and place it to any folder i.e. C:\Temp
2. Run cmd.exe with elevated privileges (right click on cmd.exe and choose “run as administrator”)
3. Unpack the update by using an administrative command prompt and running the following commands:

  • expand -f:* {update name}.msu {destination folder}
    • Example: To unpack the update Windows6.1-KB2545850-v2.msu stored in the C:\Temp folder, the command is the following:
    • expand -f:* C:\Temp\Windows6.1-KB2545850-v2.msu  C:\Temp
    • This should result in a .cab file in the C:\Temp directory

4. Now from the elevated command prompt, run the following command:

Dism.exe /Online /Add-Package /PackagePath:C:\Temp\Windows6.1-KB2545850-v2.cab

More Information on Re-evaluation:

Re-evaluation of the applicability rules varies from machine to machine and cannot be predicted; the applicability and installation phases are working as designed.

Below are some of the Applicability Evaluation Scenarios.

1. Package being installed by CBS:
Updates could be installed, possibly triggering other updates to change state, or they could be staged, waiting for their applicability to be satisfied, or they could be absent with no presence on the system.

2. Package being uninstalled by CBS:
Components being removed can cause applicability rules targeting them to change state, triggering additional servicing operations. Child packages could be triggered to be removed when their applicable parent package is uninstalled.

3. Update being installed by CBS, or selectable feature selected on:
Other updates may be triggered for installation

4. Update being uninstalled by CBS, or selectable feature selected off:
Other updates may be triggered for uninstallation.

5. Normal component being upgraded (on existing branch):
Reevaluation based on the component version change may uninstall an older version.

6. New component being installed:
Other updates may be triggered for installation.

7. Branch-forcing component being installed:
Reevaluation based on component version change may trigger patches for elevated branch.

8. Normal component being removed:
Reevaluation based on component version removal may install an older version.

9. Branch forcing component being removed (determined by mapping back to update):
If another component is forcing the same branch, re-evaluation is based on the current branch.
If the branch is released, the new active branch needs to be determined, which then triggers re-evaluation.

More Information on LDR and GDR:

How does Windows choose which version of a file to install?
https://blogs.technet.microsoft.com/joscon/2011/11/30/how-does-windows-choose-which-version-of-a-file-to-install

What is the difference between general distribution and limited distribution releases?
https://blogs.msdn.microsoft.com/windowsvistanow/2008/03/11/what-is-the-difference-between-general-distribution-and-limited-distribution-releases

QFE vs GDR/LDR hotfixes
https://blogs.technet.microsoft.com/instan/2009/03/04/qfe-vs-gdrldr-hotfixes

GDR, QFE, LDR… WTH?
http://blogs.technet.com/b/mrsnrub/archive/2009/05/14/gdr-qfe-ldr-wth.aspx

Branching Out
http://blogs.technet.com/b/mrsnrub/archive/2009/05/18/branching-out.aspx

Hope this information was helpful!

Poornima Venkataraman
Support Engineer
Windows Core Team, Microsoft Enterprise Platforms

Suganya Natarajan
Technical Advisor
Windows Core Team, Microsoft Enterprise Platforms

Establishing Network Connectivity to a Share in the Windows Recovery Environment


Hi there! My name is Neil Dsouza and I’m a Support Escalation Engineer with the Windows Core team.

Today I’m going to cover a scenario where you have a server that fails to boot and all you want to do is copy the data off the machine to a network share.  In most cases connecting a USB flash drive/hard drive is the easiest solution to copy off the data.  However, if you don’t have physical access to the server, but you do have remote console access, then you can copy the data to a network share. These steps will also help gather logs or data when troubleshooting Windows in a no-boot scenario.

For operating systems newer than Windows 7, the Windows Recovery Environment (WinRE) is installed by default, unless this was changed during deployment/installation of Windows. The steps below should work on most operating systems from Windows 7 onward.

When the operating system fails to boot, by default it takes you to a boot menu with an option to boot into WinRE, labeled ‘Repair your computer’ or ‘Launch Startup Repair’.

image

Image 1: Boot menu to go to WinRE in Windows 7 or Windows Server 2008 R2

image

Image 2: Boot Menu to go to WinRE in Windows 8 / 2012 / 2012 R2 / 8.1 / 10

Choosing the ‘Startup Repair’ option will run the ‘Startup Repair Wizard’ and attempt to fix the most common issues that cause operating system boot failures.

image

Image 3: Startup Repair running in Windows 8 and newer OS

image

image

image

Images 4: Startup Repair running in Windows versions before Windows 8

When it finishes, a report lists the tests that were run to detect issues and their results. This information can be useful for understanding why Windows failed to boot.

image

Image 5: Startup Repair results

If you miss seeing this in the wizard, you can always go to the Command Prompt in WinRE and open the file where this information is logged: %WINDIR%\System32\LogFiles\SrtTrail.txt
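
For example, from the WinRE command prompt you can page through the log directly. A minimal sketch, assuming the offline OS volume is C: (inside WinRE, %WINDIR% points at the recovery image itself, so spell out the offline drive letter):

rem View the Startup Repair diagnostic log one page at a time
type C:\Windows\System32\LogFiles\SrtTrail.txt | more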

If you do not see the ‘Repair Your Computer’ or ‘Launch Startup Repair’ option, it means that WinRE was not installed when the OS was installed. In such cases you can still boot to WinRE by using the operating system disk and selecting ‘Repair your computer’ at the install screen.

image

Image 6: Boot from CD/DVD/ISO screen for Windows 7

image

Image 7: Boot from CD/DVD/ISO screens for Windows 8 / 2012 / 8.1 / 2012 R2 / 10

 

On Windows 8 and newer OSes, you have to navigate further through the options, as shown below:

Select ‘Troubleshoot’

image

Select ‘Advanced Options’

image

Select ‘Command Prompt’, or you could run the ‘Startup Repair’ from here

image

For OS versions from Windows Vista through Windows Server 2008 R2:

Click ‘Next’

image

Select ‘Command Prompt’, or you could run the ‘Startup Repair’ from here

image

Once we are at the command prompt, we can do our magic.

The first thing we want to do is see what the drive letters are for each volume. DISKPART is our friend here.

Run the command: diskpart

Then, at the DISKPART prompt, run: list volume

image
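
As a minimal sketch, the whole session looks like the following. Note the drive letter of the volume holding your data, since letters inside WinRE often differ from those in the installed OS:

diskpart
rem At the DISKPART> prompt, list all volumes and their drive letters
list volume
rem Leave diskpart when done
exit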

Now we come to the most interesting part.

To establish network connectivity to a file share on another machine, so you can copy off your data, log files, or that memory dump Microsoft Support is asking for when the machine blue screens on startup, run ‘wpeinit’ from the command prompt. This is a program built into WinPE, from which WinRE is created.

Now you can run ‘ipconfig’ and you will see that an IP address has been assigned to the WinRE session. This will work only if you have a DHCP server assigning IP addresses.
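
As a quick sketch, the two commands together look like this:

rem Initialize WinPE/WinRE networking (loads NIC drivers and requests a DHCP address)
wpeinit

rem Verify that an IP address was assigned
ipconfig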

In certain cases, ‘wpeinit’ runs but does not initialize the NIC or does not assign an IP address. There are a couple of reasons why that happens.

1. The NIC driver is not loaded

In this scenario you can manually load the NIC driver. First you need to identify the right driver, which may already reside on the machine. All drivers that were installed on the machine are maintained, unless explicitly removed, under %WINDIR%\System32\DriverStore\FileRepository, in a folder whose name starts with the driver’s inf file name followed by a unique identifier. You may have multiple folders starting with the same inf filename if multiple versions of the same driver have been installed. Alternatively, you can download the driver and extract it onto a USB stick. We need the .sys, .inf, and related files uncompressed to be able to load the driver manually.

An example of the driver files in FileRepository is below.

image

Run the command below to load the NIC driver shown in the image above:

drvload c:\Windows\system32\DriverStore\FileRepository\netwew01.inf_amd64_9963f911be06feae\netwew01.inf
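
If you are not sure which folder holds your NIC driver, you can list the candidates first. A hedged sketch follows; it assumes, as is typical, that the network driver’s inf name starts with “net” (the folder name used here is the one from the image above):

rem List driver store folders whose inf names start with "net"
dir /b C:\Windows\System32\DriverStore\FileRepository\net*

rem Load the chosen driver by pointing drvload at its .inf file
drvload C:\Windows\System32\DriverStore\FileRepository\netwew01.inf_amd64_9963f911be06feae\netwew01.inf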

2. There’s no DHCP Server in the environment that could automatically assign an IP address

image

What do you do if there isn’t a DHCP server assigning IP addresses? Well, you can assign a static IP address using the netsh command below. You may use the server’s own IP address; however, if you have trouble with that, use a different unused IP address.

netsh int ipv4 set address “<Connection Name>” static <IP> <Subnet Mask> <Default Gateway>

The Connection Name can be obtained by running the ‘ipconfig /all’ command. It’s the text highlighted in blue in the image below.

image
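
As a worked example with placeholder values (the connection name “Ethernet” and all addresses below are hypothetical; substitute your own from ‘ipconfig /all’ and your network):

rem Assign a static IPv4 address: <IP> <Subnet Mask> <Default Gateway> (example values only)
netsh int ipv4 set address "Ethernet" static 192.168.1.50 255.255.255.0 192.168.1.1

rem Optionally set a DNS server so that \\ServerName resolves (example address)
netsh interface ip set dns "Ethernet" static 192.168.1.10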

Once you have an IP address, you can map a network drive to a file server or a simple share on another machine using the command below.

net use y: \\ServerName\ShareName

ServerName is the computer name of the server, or its IP address in case name resolution is not working, and ShareName is the name of the share. You will be asked for credentials to access the network share.

Alternatively, you can run the command below to have it take the next available drive letter and display it.

net use * \\ServerName\ShareName
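
If you prefer to pass credentials on the command line instead of waiting for the prompt, ‘net use’ also accepts a /user switch. The server, share, and account names below are hypothetical:

rem Map the next free drive letter, supplying credentials explicitly (example names)
net use * \\192.168.1.10\Backup /user:CONTOSO\BackupAdmin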

Now you can copy files and folders from the non-booting machine to the network share using copy, xcopy or, better still, robocopy.
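
A robocopy sketch, assuming the data lives under C:\Data and Y: is the mapped share (paths and switches here are illustrative):

rem /E copies all subfolders including empty ones; /R:1 /W:1 keep retries short on a flaky link;
rem /LOG writes a progress log to the share for later review
robocopy C:\Data Y:\ServerBackup /E /R:1 /W:1 /LOG:Y:\robocopy.log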

I hope this helps you save some time when you have a machine that is not booting up, whether it’s a server or a client, and helps you copy or back up important data and logs to investigate the issue.

Neil Dsouza
Support Escalation Engineer
Windows Core Team
